Correlation Tests
SciPy - Statistical Testing
Published Nov 17 2025
Correlation tests answer a simple question:
“Are these two variables related — and how strongly?”
SciPy provides three main correlation measures:
Correlation Type | Measures | Best For |
Pearson | Linear relationship | Continuous, normally distributed data |
Spearman | Monotonic relationship | Ordinal, ranked, or non-normal data |
Kendall | Rank agreement | Small samples, ties in data |
Pearson Correlation
Measures the strength and direction of the linear relationship between two continuous variables.
Use when:
- Both variables are continuous
- Relationship looks linear
- Normality assumption is reasonable
- No major outliers
Example
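A minimal sketch of a Pearson test with `scipy.stats.pearsonr`. The data are synthetic and the variable names are chosen for illustration only; the result is unpacked into `corr` and `p`.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): y is roughly linear in x.
rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=100)
y = 2.0 * x + rng.normal(loc=0, scale=5, size=100)

# pearsonr returns the correlation coefficient and a two-sided p-value.
corr, p = stats.pearsonr(x, y)
print(f"Pearson r = {corr:.3f}, p-value = {p:.3g}")
```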
Outputs
- corr → correlation coefficient (range: -1 to +1)
- p → p-value for the significance of the correlation
Interpretation
- corr = 1.0 → perfect positive linear relationship
- corr = -1.0 → perfect negative linear relationship
- corr = 0 → no linear relationship
Practical notes
- Sensitive to outliers
- Only captures linear relationships
- If data is skewed → use Spearman instead
Spearman Rank Correlation
Monotonic relationship (non-parametric). Spearman correlates the rank order of the values, not the raw values.
Use when:
- Data is not normal
- Relationship is monotonic (always increasing or decreasing, not necessarily linear)
- Data is ordinal (e.g., Likert scales)
- There are outliers
Example
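A short sketch using `scipy.stats.spearmanr` on synthetic data where the relationship is monotonic but clearly not linear. The exponential shape and the noise level are assumptions made for the demonstration; Pearson is computed alongside only for comparison.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption): y rises with x, but exponentially rather than linearly.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=80)
y = np.exp(x) + rng.normal(0, 1, size=80)

# Spearman works on the ranks of x and y, so the curvature does not matter.
rho, p = stats.spearmanr(x, y)

# Pearson on the raw values, for comparison only.
r, _ = stats.pearsonr(x, y)

print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
print(f"Pearson r    = {r:.3f}")
```

On data like this, Spearman will typically report a stronger association than Pearson, because the ranks are almost perfectly ordered even though the raw values curve.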
Interpretation
- Same coefficient range (-1 to +1)
- But correlation is based on ranking
Practical notes
- More robust to outliers and skew than Pearson
- Does not assume normality
- Great for messy real-world data
Kendall’s Tau
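Rank correlation robust to ties (non-parametric). Kendall's tau compares every pair of observations and measures how often the two variables order the pair the same way (concordant) rather than the opposite way (discordant).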
Use when:
- Sample sizes are small (< 20)
- Data contains many ties (duplicate values)
- You want a robust rank-based measure
Example
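A small sketch with `scipy.stats.kendalltau`. The two rating lists are hypothetical survey answers invented for illustration, kept small and full of ties on purpose. By default SciPy computes the tau-b variant, which adjusts for ties.

```python
from scipy import stats

# Hypothetical small survey: two 1-5 ratings from 12 respondents, with many ties.
satisfaction = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
recommend = [1, 1, 2, 2, 3, 4, 3, 4, 5, 4, 5, 5]

# kendalltau returns the tau statistic and a two-sided p-value.
tau, p = stats.kendalltau(satisfaction, recommend)
print(f"Kendall tau = {tau:.3f}, p-value = {p:.3g}")
```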
Interpretation
- The coefficient is usually smaller in magnitude than Pearson or Spearman on the same data
- Works well on small or noisy datasets
Practical notes
- Most robust to ties
- Can be the slowest of the three on large datasets
- Rarely used in large-sample applied work, but great for small surveys
When to Use Which Correlation Test
Situation | Best Test |
Data is continuous and the relationship is linear | Pearson |
Data is continuous but non-normal | Spearman |
Relationship is monotonic but not linear | Spearman |
Data contains many ties | Kendall |
Sample size is small | Kendall |
Data is ordinal (Likert scale) | Spearman |
Scatterplots
Always pair correlation with a plot.
Example:
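One possible way to do this with Matplotlib, sketched on synthetic data. The output filename `scatter.png` and the non-interactive `Agg` backend are assumptions made so the script also runs without a display; in a notebook you would normally just show the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic data (an assumption): a moderate linear relationship with noise.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, size=200)
y = 0.6 * x + rng.normal(0, 0.8, size=200)

r, p = stats.pearsonr(x, y)

# Put the coefficient in the title so the number and the picture stay together.
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(x, y, alpha=0.6)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Pearson r = {r:.2f}")
fig.savefig("scatter.png", dpi=100)  # hypothetical output path
```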
Visual checks reveal:
- Non-linear patterns
- Outliers
- Clusters
- Heteroscedasticity (unequal variance)
Correlation Matrices
Useful for exploring many variables at once.
Using NumPy + SciPy:
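A sketch of both approaches on a synthetic three-column array (the columns `a`, `b`, `c` are made up for illustration). `np.corrcoef` gives a Pearson matrix, and `scipy.stats.spearmanr` returns full coefficient and p-value matrices when given a 2-D input with more than two columns.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption): b depends on a, c is independent noise.
rng = np.random.default_rng(7)
n = 150
a = rng.normal(size=n)
b = 0.7 * a + rng.normal(scale=0.5, size=n)
c = rng.normal(size=n)
data = np.column_stack([a, b, c])  # shape (n, 3): one column per variable

# Pearson matrix: rowvar=False tells NumPy that columns are the variables.
pearson_matrix = np.corrcoef(data, rowvar=False)

# Spearman matrix: columns are treated as variables by default (axis=0).
rho_matrix, p_matrix = stats.spearmanr(data)

print(np.round(pearson_matrix, 2))
print(np.round(rho_matrix, 2))
```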
Spearman matrix via pandas:
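With pandas, `DataFrame.corr(method="spearman")` returns a labelled matrix directly. The column names and the synthetic data below are assumptions for illustration. Note that `corr` returns coefficients only, with no p-values.

```python
import numpy as np
import pandas as pd

# Synthetic, skewed data (an assumption); column names are hypothetical.
rng = np.random.default_rng(3)
df = pd.DataFrame({"time_on_site": rng.exponential(scale=5.0, size=120)})
df["purchase_amount"] = df["time_on_site"] ** 2 + rng.normal(0, 5, size=120)
df["page_views"] = rng.poisson(lam=4, size=120)

# Pairwise Spearman correlations between all numeric columns.
spearman_matrix = df.corr(method="spearman")
print(spearman_matrix.round(2))
```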
Multiple Correlation Tests (Loop Example)
When testing many variable pairs:
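A sketch of one way to loop over every pair of columns with `itertools.combinations`, collecting the coefficient and p-value for each. The DataFrame and its column names are synthetic assumptions; swap `stats.pearsonr` for `stats.spearmanr` or `stats.kendalltau` as the data require.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data (an assumption): a-b and c-d are related, other pairs are not.
rng = np.random.default_rng(11)
n = 100
df = pd.DataFrame({"a": rng.normal(size=n)})
df["b"] = 0.8 * df["a"] + rng.normal(scale=0.5, size=n)
df["c"] = rng.normal(size=n)
df["d"] = -0.5 * df["c"] + rng.normal(scale=1.0, size=n)

# Test each unordered pair of columns exactly once.
rows = []
for col_x, col_y in combinations(df.columns, 2):
    r, p = stats.pearsonr(df[col_x], df[col_y])
    rows.append({"x": col_x, "y": col_y, "r": r, "p_value": p})

# Most significant pairs first.
results = pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)
print(results)
```

Keep in mind that the more pairs you test, the more likely it is that some p-values fall below 0.05 by chance alone, so the raw p-values from a loop like this should be read with care.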
Effect Sizes for Correlation
The correlation coefficient is itself an effect size, so no separate effect-size statistic is needed.
Heuristic (Cohen's guidelines):
- 0.10 → small
- 0.30 → medium
- 0.50 → large
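These thresholds were proposed for Pearson's r and are commonly applied to Spearman's rho as well. Kendall's tau tends to be smaller in magnitude on the same data, so slightly lower thresholds are used for it.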
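The heuristic can be written as a small helper. This is an illustrative sketch: the function name `cohen_label` and the "negligible" label for values below 0.10 are assumptions, not part of Cohen's guidelines or of SciPy.

```python
def cohen_label(r: float) -> str:
    """Label a Pearson or Spearman coefficient using the 0.10 / 0.30 / 0.50 heuristic."""
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # hypothetical label for values below the smallest threshold


for value in (0.05, 0.35, -0.62):
    print(value, cohen_label(value))
```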
Partial Correlation (Not in SciPy)
To measure correlation while controlling for another variable, use Pingouin or Statsmodels.
Example (Pingouin):
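Pingouin exposes this as `pingouin.partial_corr`, which takes a DataFrame plus the names of the two variables and the covariate. To stay runnable without that dependency, the sketch below computes the same quantity by hand with NumPy: regress both variables on the covariate, then correlate the residuals. The helper `partial_corr` and the synthetic data are assumptions for illustration, and the helper returns only the coefficient, not a p-value.

```python
import numpy as np
from scipy import stats

# With Pingouin installed, the equivalent call on a DataFrame df would be:
#   import pingouin as pg
#   pg.partial_corr(data=df, x="x", y="y", covar="z")


def partial_corr(x, y, z):
    """Pearson correlation of x and y after removing the linear effect of z."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])  # intercept + covariate
    # Residuals from least-squares fits of x on z and y on z.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Synthetic data (an assumption): x and y are related only through z.
rng = np.random.default_rng(5)
z = rng.normal(size=300)
x = z + rng.normal(scale=0.5, size=300)
y = z + rng.normal(scale=0.5, size=300)

raw_r, _ = stats.pearsonr(x, y)
partial_r = partial_corr(x, y, z)
print(f"raw r = {raw_r:.3f}, partial r (controlling for z) = {partial_r:.3f}")
```

Here the raw correlation is strong, but it should shrink toward zero once `z` is controlled for, since `z` drives both variables.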
Practical Examples — When Correlation Tests Matter
Example 1 — Customer behaviour
- Time on site vs purchase amount
- Use Spearman (non-normal)
Example 2 — Medical data
- Age vs blood pressure
- Likely Pearson
Example 3 — Finance
- Returns of two stocks
- Pearson or Spearman depending on distribution
Example 4 — Survey analysis
- Satisfaction rating vs recommendation likelihood
- Spearman or Kendall (ordinal data)
Example 5 — Performance metrics
- CPU usage vs response time
- Often non-linear → Spearman














