What is this Space?

This is a research tool designed to evaluate psychological biases and latent constructs in pre-trained language models for Masked Language Modeling (MLM) & Natural Language Inference (NLI). It assesses how language models respond to psychological questionnaires, revealing biases and latent constructs embedded in their representations. Note: Currently, only the NLI method is supported by our research.

Key Features

Multiple Questionnaires: Evaluate models on anxiety (GAD7), depression (PHQ9), personality (Big Five), compassion (CS), sense of coherence (SOC), and sexism (ASI)
Psychometric Validation: Cronbach's Alpha, Silhouette Score, and Factor Correlations
Interactive Visualizations: Box plots showing model performance relative to all evaluated models
Outlier Detection: Identifies models with extreme scores using statistical methods
Leaderboard: Compare model performance across different psychological constructs

Questionnaires Information

About Z-Scores

Important: The Z-scores displayed in this application represent standardized scores that show how many standard deviations each model's score is from the mean of all evaluated models for that specific construct. A Z-score of 0 indicates a model scores at the mean, positive Z-scores indicate scores above the mean, and negative Z-scores indicate scores below the mean. These standardized scores allow for direct comparison across different models and questionnaires by accounting for the distribution of scores. Note that these Z-scores are calculated from model responses to questionnaire items and are not on the same scale as traditional human questionnaire scores.

Anxiety symptoms and severity - Generalized Anxiety Disorder (GAD7)

Factors: No sub-factors (single-factor questionnaire)

Depression symptoms and severity - Patient Health Questionnaire (PHQ9)

Factors: No sub-factors (single-factor questionnaire)

Ability to cope with stress - Sense of Coherence (SOC)

Factors:

Comprehensibility - Understanding life
Manageability - Coping resources
Meaningfulness - Life purpose

Five major personality dimensions - Big Five Personality (BIG5)

Factors:

Openness to Experience - Creativity, curiosity
Conscientiousness - Organization, responsibility
Extraversion - Sociability, assertiveness
Agreeableness - Cooperation, empathy
Neuroticism - Emotional instability

Compassionate attitudes and behaviors - Compassion Scale (CS) (QMNLI only)

Factors:

Kindness - Care, support
Common Humanity - Shared suffering
Mindfulness - Attention, awareness
Indifference - Lack of concern
Separation - Emotional disconnection
Disengagement - Tuning out

Gender-related attitudes and biases - Ambivalent Sexism Inventory (ASI)

Factors:

Hostile Sexism - Antagonistic beliefs toward women
Benevolent Sexism (Intimacy) - Romantic intimacy idealization
Benevolent Sexism (Paternalism) - Protective paternalistic attitudes
Benevolent Sexism (Gender Differentiation) - Traditional gender role beliefs

YAML Evaluation Results

After running an evaluation, you'll receive YAML-formatted results that can be added directly to your model's card on HuggingFace. These results include:

Scores: Your model's performance on each questionnaire (3 decimal places)
Percentile Rankings: How your model compares to others (e.g., "less anxious than 75% of models")
Questionnaire Information: The questionnaire and task type used for evaluation
Verification: Links back to this evaluation space for result validation

The YAML format follows HuggingFace's model card specification and will automatically display your model's psychometric evaluation results in a standardized, comparable format.

License

This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Instructions

Select Factors for Each Psychometric

Filter Options ?

📝 Model Card Integration ▼ ?

Psychometric Leaderboard ?

What is this Space?

Key Features

Questionnaires Information

About Z-Scores

Anxiety symptoms and severity - Generalized Anxiety Disorder (GAD7)

Depression symptoms and severity - Patient Health Questionnaire (PHQ9)

Ability to cope with stress - Sense of Coherence (SOC)

Five major personality dimensions - Big Five Personality (BIG5)

Compassionate attitudes and behaviors - Compassion Scale (CS) (QMNLI only)

Gender-related attitudes and biases - Ambivalent Sexism Inventory (ASI)

YAML Evaluation Results

License

Cronbach's Alpha

What is Cronbach's Alpha?

How to Interpret

Why It Matters

Example

Silhouette Score

What is the Silhouette Score?

How to Interpret

What It Measures

Components

Example

Factor Correlations

What are Factor Correlations?

How to Interpret

Why It Matters

Example

Filter Options

What is Accuracy Filtering?

How Models Are Evaluated

Using the Filter

When to Use Filtering

Important Notes

Model Identifier

What is a Model Identifier?

Supported Model Types

1. Zero-Shot Classification Models (QMNLI)

2. Fill-Mask Models (QMLM)

How to Find Model Identifiers

Example Model Identifiers

Important Notes

Psychometric Leaderboard

What is the Psychometric Leaderboard?

How Rankings Work

Features

Use Cases

YAML Evaluation Results

What are YAML Results?

What's Included

How to Use

Why Use YAML Format?

Evaluation Results

What are Evaluation Results?

Understanding the Results

Key Metrics Explained

Interpreting Colors & Directionality

Welcome to Psychometric Evaluation Space 👋

About This Tool

By clicking "I Accept" below, you agree to:

License: CC BY-NC-SA 4.0

Instructions

Select Factors for Each Psychometric

Filter Options ?

📝 Model Card Integration ▼ ?

Psychometric Leaderboard ?

What is this Space?

Key Features

Questionnaires Information

About Z-Scores

Anxiety symptoms and severity - Generalized Anxiety Disorder (GAD7)

Depression symptoms and severity - Patient Health Questionnaire (PHQ9)

Ability to cope with stress - Sense of Coherence (SOC)

Five major personality dimensions - Big Five Personality (BIG5)

Compassionate attitudes and behaviors - Compassion Scale (CS) (QMNLI only)

Gender-related attitudes and biases - Ambivalent Sexism Inventory (ASI)

YAML Evaluation Results

License