Correlation Tests

SciPy - Statistical Testing

Published Nov 17 2025

Tags: Python, SciPy, Statistics

Correlation tests answer a simple question:

“Are these two variables related — and how strongly?”

SciPy provides three main correlation measures:

Correlation Type	Measures	Best For
Pearson	Linear relationship	Continuous, normally distributed data
Spearman	Monotonic relationship	Ordinal, ranked, or non-normal data
Kendall	Rank agreement	Small samples, ties in data

Pearson Correlation

Measures the strength and direction of the linear relationship between two continuous variables.

Use when:

Both variables are continuous
Relationship looks linear
Normality assumption is reasonable
No major outliers

Example

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# pearsonr returns the correlation coefficient and its p-value
corr, p = stats.pearsonr(x, y)
print(corr, p)

Outputs

corr → correlation coefficient (range: -1 to +1)
p → two-sided p-value for the null hypothesis that the two variables are uncorrelated

Interpretation

corr = 1.0 → perfect positive linear relationship
corr = -1.0 → perfect negative linear relationship
corr = 0 → no linear relationship
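
The corr = 0 case deserves a caution: a coefficient near zero rules out a linear relationship, not a relationship of any kind. A minimal sketch, using made-up data (the names `x_sym` and `y_sym` are illustrative) where y is exactly x squared:

```python
from scipy import stats

# Illustrative data, not from a real dataset: y is fully determined by x,
# but the dependence is symmetric around zero rather than linear.
x_sym = [-2, -1, 0, 1, 2]
y_sym = [v ** 2 for v in x_sym]

r_sym, p_sym = stats.pearsonr(x_sym, y_sym)
print(r_sym)  # effectively 0, up to floating-point rounding
```

Plotting the data first is usually the quickest way to catch this kind of pattern.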

Practical notes

Sensitive to outliers
Only captures linear relationships
If data is skewed → use Spearman instead
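
To see the outlier sensitivity in practice, the sketch below compares Pearson and Spearman on the same invented data, where a single extreme value sits at the end of an otherwise perfectly linear series. The data and variable names are assumptions chosen for illustration.

```python
from scipy import stats

# Illustrative data: y follows x exactly, except for one extreme final value.
x_out = list(range(1, 11))
y_out = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

r_out, _ = stats.pearsonr(x_out, y_out)
rho_out, _ = stats.spearmanr(x_out, y_out)

print(f"Pearson:  {r_out:.2f}")    # about 0.59, dragged down by the outlier
print(f"Spearman: {rho_out:.2f}")  # 1.00, because the rank order is unchanged
```

The outlier keeps its rank (largest y for largest x), so the rank-based measure is unaffected while the linear measure drops sharply.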

Spearman Rank Correlation

Measures the strength of a monotonic relationship (non-parametric). Spearman correlates the rank order of the values, not the raw values.

Use when:

Data is not normal
Relationship is monotonic (always increasing or decreasing, not necessarily linear)
Data is ordinal (e.g., Likert scales)
There are outliers

Example

x = [1, 2, 3, 4, 5]

y = [10, 20, 30, 40, 45]

corr, p = stats.spearmanr(x, y)

print(corr, p)

Interpretation

Same coefficient range (-1 to +1)
But correlation is based on ranking
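
One way to make "based on ranking" concrete: Spearman's coefficient is the Pearson coefficient computed on the ranks of the data. The sketch below checks this with `stats.rankdata` on invented values; the variable names are illustrative.

```python
from scipy import stats

# Illustrative data, chosen so the relationship is increasing but not perfect.
x_raw = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
y_raw = [2.0, 0.5, 7.0, 1.0, 3.0, 8.0, 4.0, 6.5]

# Spearman computed directly on the raw values
rho, _ = stats.spearmanr(x_raw, y_raw)

# Pearson computed on the ranks of the same values
r_ranks, _ = stats.pearsonr(stats.rankdata(x_raw), stats.rankdata(y_raw))

print(rho, r_ranks)  # the two coefficients agree up to rounding
```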

Practical notes

More robust than Pearson
Does not assume normality
Great for messy real-world data

Kendall’s Tau

Measures rank agreement between two variables and handles ties well (non-parametric).

Use when:

Sample sizes are small (< 20)
Data contains many ties (duplicate values)
You want a robust rank-based measure

Example

# x and y are the same lists used in the Spearman example
corr, p = stats.kendalltau(x, y)
print(corr, p)

Interpretation

Correlation coefficient usually smaller in magnitude than Pearson or Spearman
Works well on small or noisy datasets

Practical notes

Most robust to ties
Tends to be the slowest of the three on large datasets
Rarely used in large-sample applied work, but great for small surveys
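
To show how ties enter the calculation, here is a rough sketch that counts concordant, discordant, and tied pairs by hand and compares the result with `stats.kendalltau`, whose default variant is tau-b. The helper `tau_b_by_hand` and the ratings are hypothetical, written only for this illustration, and the pairwise loop is quadratic, so it is suited to small samples only.

```python
from itertools import combinations
from math import sqrt

from scipy import stats


def tau_b_by_hand(x, y):
    """Hypothetical helper: tau-b from explicit pair counts (O(n^2))."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied in both variables: not counted
        elif dx == 0:
            tied_x_only += 1        # tied in x only
        elif dy == 0:
            tied_y_only += 1        # tied in y only
        elif dx * dy > 0:
            concordant += 1         # both move in the same direction
        else:
            discordant += 1         # they move in opposite directions
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt(
        (pairs + tied_x_only) * (pairs + tied_y_only)
    )


# Illustrative 1-5 ratings with many repeated values
ratings_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
ratings_b = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]

tau_manual = tau_b_by_hand(ratings_a, ratings_b)
tau_scipy, p_tau = stats.kendalltau(ratings_a, ratings_b)
print(tau_manual, tau_scipy)  # the two values agree up to rounding
```

The tied pairs enlarge the denominator, which is how tau-b adjusts the coefficient when ties are present.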

When to Use Which Correlation Test

Situation	Best Test
Data is continuous and the relationship is linear	Pearson
Data is continuous but non-normal	Spearman
Relationship is monotonic but not linear	Spearman
Data contains many ties	Kendall
Sample size is small	Kendall
Data is ordinal (Likert scale)	Spearman