If you’ve ever collected survey data using Likert scales (e.g., “Strongly Disagree” → “Strongly Agree”), you might wonder: Are my questions really measuring what I think they are?
This is where factor analysis comes in. It’s a statistical tool that checks whether your survey items cluster together into meaningful groups (called factors). For example, ten questions about “job satisfaction” might actually form two factors: career growth and work environment.
But to do factor analysis correctly, you need to understand the metrics that tell you if your model is good. This guide covers all the essentials, step by step.
Step 1: Check Data Suitability
Before running factor analysis, you need to check if your data is appropriate.
Metrics to use:
- Kaiser-Meyer-Olkin (KMO) Test
  - Measures sampling adequacy: how much of the variance among your items could be common variance.
  - Threshold: ≥ 0.6 acceptable, ≥ 0.7 good, ≥ 0.8 great, ≥ 0.9 superb.
- Bartlett’s Test of Sphericity
  - Tests whether your variables are sufficiently correlated (i.e., the correlation matrix is not an identity matrix).
  - Threshold: p < 0.05 (significant = suitable for factor analysis).
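Both checks can be computed directly from the item correlation matrix. The sketch below implements them with NumPy/SciPy; the data matrix `X` (rows = respondents, columns = Likert items) is simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulate 300 respondents answering 6 correlated 5-point items.
latent = rng.normal(size=(300, 1))
X = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(300, 6))), 1, 5)

R = np.corrcoef(X, rowvar=False)          # item correlation matrix
n, p = X.shape

# --- Bartlett's test of sphericity ---
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

# --- KMO measure of sampling adequacy ---
R_inv = np.linalg.inv(R)
# Partial correlations come from the inverse of the correlation matrix.
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d
np.fill_diagonal(partial, 0)
np.fill_diagonal(R, 0)                    # zero diagonals before summing squares
kmo = (R ** 2).sum() / ((R ** 2).sum() + (partial ** 2).sum())

print(f"Bartlett chi2={chi2:.1f}, p={p_value:.4f}, KMO={kmo:.3f}")
```

Because the simulated items share one latent factor, Bartlett's test should come out highly significant and the KMO should clear the 0.6 bar comfortably.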
Step 2: Extract Factors
Now, the software will start grouping items. But how many factors should you keep?
Metrics to use:
- Eigenvalues
  - Reflect the variance explained by each factor.
  - Threshold: Keep factors with eigenvalues > 1 (Kaiser’s rule).
- Scree Plot
  - A visual check: look for the “elbow” point where the slope levels off. Keep the factors before the elbow.
- Parallel Analysis (more robust)
  - Compares your actual eigenvalues with those from random data of the same size. Keep factors whose eigenvalues exceed the random cutoffs.
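Kaiser's rule and parallel analysis can both be sketched in a few lines of NumPy. The data below is simulated with two known underlying factors, so both rules should agree on keeping two:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 8
# Two latent factors: one drives items 0-3, the other items 4-7.
loadings = np.zeros((2, p))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
X = rng.normal(size=(n, 2)) @ loadings + rng.normal(scale=0.7, size=(n, p))

eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
kaiser_k = int(np.sum(eig > 1))           # Kaiser's rule: eigenvalues > 1

# Parallel analysis: compare against eigenvalues of pure-noise data
n_sims = 100
rand_eigs = np.empty((n_sims, p))
for i in range(n_sims):
    Z = rng.normal(size=(n, p))
    rand_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
cutoff = np.percentile(rand_eigs, 95, axis=0)
parallel_k = int(np.sum(eig > cutoff))    # keep factors above the random cutoff

print(f"Kaiser retains {kaiser_k} factors, parallel analysis retains {parallel_k}")
```

Plotting `eig` against its index would give you the scree plot; the elbow sits right after the two large eigenvalues.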
Step 3: Assess Factor Loadings
Factor loadings show how strongly each question relates to its factor.
Thresholds:
- ≥ 0.30 → minimal
- ≥ 0.40 → acceptable
- ≥ 0.50 → good
- ≥ 0.70 → very strong
If an item has a low loading (<0.4) or loads on multiple factors (cross-loading), consider revising or removing it.
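Screening for weak items and cross-loadings is easy to automate once you have the loadings matrix. The matrix below is made up for illustration:

```python
import numpy as np

# Hypothetical rotated loadings (5 items x 2 factors), invented for this example.
loadings = np.array([
    [0.78, 0.12],   # clean: loads only on factor 0
    [0.81, 0.05],
    [0.35, 0.20],   # weak: no loading reaches 0.4
    [0.10, 0.72],   # clean: loads only on factor 1
    [0.52, 0.48],   # cross-loading: >= 0.4 on both factors
])

weak, cross = [], []
for i, row in enumerate(loadings):
    strong = np.flatnonzero(np.abs(row) >= 0.4)
    if strong.size == 0:
        weak.append(i)       # candidate for revision or removal
    elif strong.size > 1:
        cross.append(i)      # loads meaningfully on multiple factors

print("revise or drop:", weak, "cross-loading:", cross)
```

Here item 2 is flagged as weak and item 4 as a cross-loader, matching the thresholds above.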
Step 4: Improve Interpretability with Rotation
Rotation makes your factor solution easier to read.
- Varimax (orthogonal): Use when you assume the factors are uncorrelated.
- Oblimin/Promax (oblique): Use when the factors are allowed to correlate (common in social science).
Threshold for clean interpretation: each item should load highly on one factor and low on the others (a pattern known as “simple structure”).
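To make the idea concrete, here is a minimal sketch of Kaiser's classic varimax algorithm (the SVD-based iteration used by most statistics packages). The unrotated loadings are invented so that a rotation visibly separates the two factors:

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Orthogonally rotate a loadings matrix L (items x factors) toward simple structure."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        )
        R = u @ vt                     # best orthogonal rotation this step
        d_new = s.sum()
        if d_new < d * (1 + tol):      # stop once the criterion plateaus
            break
        d = d_new
    return L @ R

# Unrotated loadings where every item mixes both factors; rotation untangles them.
L = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.6], [0.5, -0.7]])
rotated = varimax(L)
print(np.round(rotated, 2))
```

Note that rotation only redistributes variance between factors: each item's communality (the row sum of squared loadings) is unchanged, which is a handy sanity check on any rotation code.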
Step 5: Check Reliability
Once factors are identified, test if they’re reliable.
Metrics to use:
- Cronbach’s Alpha
  - Checks internal consistency (how closely the items in a scale hang together).
  - Threshold: ≥ 0.7 acceptable, ≥ 0.8 good, ≥ 0.9 excellent.
- Composite Reliability (CR)
  - Computed from the factor loadings; preferred in CFA because, unlike alpha, it does not assume all items load equally.
  - Threshold: ≥ 0.7 good.
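Both formulas are short enough to write out. In this sketch, `X` holds simulated responses for one scale, and the standardized loadings passed to the CR function are illustrative values rather than estimates from a real model:

```python
import numpy as np

def cronbach_alpha(X):
    """X: respondents x items for a single scale."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def composite_reliability(loadings):
    """CR from one factor's standardized loadings: (Σλ)² / ((Σλ)² + Σ(1 - λ²))."""
    lam = np.asarray(loadings)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))
X = latent + rng.normal(scale=0.7, size=(300, 4))   # 4 internally consistent items

alpha = cronbach_alpha(X)
cr = composite_reliability([0.72, 0.80, 0.68, 0.75])
print(f"alpha = {alpha:.3f}, CR = {cr:.3f}")
```

With four items all driven by the same latent variable, alpha lands comfortably above the 0.7 bar, and the illustrative loadings give a CR of about 0.83.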
Step 6: Assess Validity
Beyond reliability, you need to ensure constructs are valid.
- Convergent Validity
  - Items measuring the same construct should correlate strongly.
  - Measured using Average Variance Extracted (AVE).
  - Threshold: AVE ≥ 0.5.
- Discriminant Validity
  - Constructs should be distinct from each other.
  - Check the Fornell-Larcker criterion: the square root of each construct’s AVE should exceed its correlations with the other constructs.
- Model Fit (for CFA/SEM):
  - χ²/df (Chi-square/df): < 3 good.
  - RMSEA (Root Mean Square Error of Approximation): < 0.08 acceptable, < 0.05 excellent.
  - CFI (Comparative Fit Index): ≥ 0.90 good, ≥ 0.95 excellent.
  - TLI (Tucker-Lewis Index): ≥ 0.90 good, ≥ 0.95 excellent.
  - SRMR (Standardized Root Mean Square Residual): < 0.08 acceptable.
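The AVE and Fornell-Larcker checks reduce to simple arithmetic on standardized loadings. The sketch below reuses the two-factor job-satisfaction example from the introduction; all loadings and the factor correlation are illustrative values, not estimates from real data:

```python
import numpy as np

def ave(loadings):
    """Average Variance Extracted: mean of the squared standardized loadings."""
    lam = np.asarray(loadings)
    return (lam ** 2).mean()

# Hypothetical standardized loadings for two factors.
career_growth = [0.75, 0.80, 0.70, 0.78]
work_env      = [0.72, 0.68, 0.81]
factor_corr   = 0.45    # assumed correlation between the two factors

ave1, ave2 = ave(career_growth), ave(work_env)
print(f"AVE: {ave1:.3f}, {ave2:.3f}")   # convergent validity: both should be >= 0.5

# Fornell-Larcker criterion: sqrt(AVE) must exceed the inter-factor correlation.
ok = np.sqrt(ave1) > factor_corr and np.sqrt(ave2) > factor_corr
print("discriminant validity:", "pass" if ok else "fail")
```

Both AVEs clear 0.5 and both square roots (about 0.76 and 0.74) exceed the 0.45 factor correlation, so these hypothetical constructs would pass both checks.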
Final Checklist
Here’s a quick cheat sheet for factor analysis on Likert-scale data:
- KMO ≥ 0.6 → Sampling adequacy.
- Bartlett’s Test p < 0.05 → Data suitable.
- Eigenvalues > 1 & Scree Plot → Factor retention.
- Factor Loadings ≥ 0.4 → Keep strong items.
- Cronbach’s Alpha ≥ 0.7 → Reliability check.
- AVE ≥ 0.5, CR ≥ 0.7 → Convergent validity.
- Square root of AVE > correlations → Discriminant validity.
- CFI/TLI ≥ 0.90, RMSEA < 0.08 → Model fit.
Final Thoughts
Factor analysis helps you turn messy Likert-scale responses into clear, validated constructs you can trust. The key is not just running the analysis but also checking the metrics carefully.
In short:
- First, ensure your data is suitable (KMO, Bartlett).
- Then, extract and refine factors (eigenvalues, loadings, rotation).
- Finally, test reliability and validity (Alpha, AVE, CR, CFI, RMSEA).
Do all that, and your survey will stand on solid statistical ground, ready for publication or professional reporting.