Correlation Tests

SciPy - Statistical Testing


Published Nov 17 2025



Correlation tests answer a simple question:

“Are these two variables related — and how strongly?”


SciPy provides three main correlation measures:

Correlation Type | Measures               | Best For
-----------------|------------------------|--------------------------------------
Pearson          | Linear relationship    | Continuous, normally distributed data
Spearman         | Monotonic relationship | Ordinal, ranked, or non-normal data
Kendall          | Rank agreement         | Small samples, ties in data






Pearson Correlation

Measures the strength and direction of the linear relationship between two continuous variables.


Use when:

  • Both variables are continuous
  • Relationship looks linear
  • Normality assumption is reasonable
  • No major outliers

Example

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# pearsonr returns the coefficient and a two-sided p-value
corr, p = stats.pearsonr(x, y)
print(corr, p)


Outputs

  • corr → correlation coefficient (range: -1 to +1)
  • p → p-value for the test of the null hypothesis that the true correlation is zero
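The two outputs can also be read from the result object by name. The sketch below assumes SciPy 1.9 or later, where pearsonr returns an object with statistic and pvalue attributes and a confidence_interval method; on older versions only the tuple unpacking shown above is available.

```python
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Assumes SciPy >= 1.9: the result exposes named attributes
result = stats.pearsonr(x, y)
r = result.statistic
p = result.pvalue

# Confidence interval for the coefficient (very wide here, since n = 5)
ci = result.confidence_interval(confidence_level=0.95)

print(f"r = {r:.3f}, p = {p:.3f}")
print(f"95% CI: [{ci.low:.3f}, {ci.high:.3f}]")
```

With only five points the interval is wide, which is a useful reminder that a coefficient from a tiny sample is a rough estimate.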

Interpretation

  • corr = 1.0 → perfect positive linear relationship
  • corr = -1.0 → perfect negative linear relationship
  • corr = 0 → no linear relationship
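A coefficient near zero rules out a linear relationship only, not a relationship in general. As a small illustration with made-up data, the sketch below builds a perfect quadratic dependence whose Pearson coefficient is essentially zero.

```python
import numpy as np
from scipy import stats

# Made-up data: y depends on x exactly, but not linearly
x = np.arange(-5, 6)   # -5, -4, ..., 5 (symmetric around zero)
y = x ** 2

r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}")  # essentially zero despite the exact dependence
```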

Practical notes

  • Sensitive to outliers
  • Only captures linear relationships
  • If the data are skewed or contain outliers → consider Spearman instead
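To see the outlier sensitivity in practice, here is a minimal sketch on made-up data: a perfectly linear series, then the same series with one value replaced by an extreme outlier. The numbers are illustrative only.

```python
import numpy as np
from scipy import stats

# Made-up, perfectly linear data
x = np.arange(1, 11, dtype=float)
y = 2.0 * x + 1.0
r_clean, _ = stats.pearsonr(x, y)

# Replace the last observation with a single extreme outlier
y_out = y.copy()
y_out[-1] = -50.0
r_out, _ = stats.pearsonr(x, y_out)
rho_out, _ = stats.spearmanr(x, y_out)

print(f"Pearson, clean data:    {r_clean:.3f}")
print(f"Pearson, one outlier:   {r_out:.3f}")
print(f"Spearman, one outlier:  {rho_out:.3f}")
```

In this example the single outlier is enough to turn the Pearson coefficient negative, while the rank-based Spearman coefficient drops but stays positive.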





Spearman Rank Correlation

Measures monotonic relationships (non-parametric). Spearman correlates the rank order of the values, not the raw values.


Use when:

  • Data is not normal
  • Relationship is monotonic (always increasing or decreasing, not necessarily linear)
  • Data is ordinal (e.g., Likert scales)
  • There are outliers

Example

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 45]

# y always increases with x, so the rank correlation is perfect
corr, p = stats.spearmanr(x, y)
print(corr, p)

Interpretation

  • Same coefficient range (-1 to +1)
  • The coefficient is computed from ranks, so +1 means y always increases as x increases, even if the increase is not linear

Practical notes

  • More robust to outliers and skew than Pearson
  • Does not assume normality
  • Great for messy real-world data
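The statement that Spearman works on rank order can be checked directly: ranking both variables and then applying Pearson to the ranks should give the same coefficient. The sketch below uses scipy.stats.rankdata for the ranking step and the same data as the example above.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 20, 30, 40, 45])

# Spearman computed directly
rho, _ = stats.spearmanr(x, y)

# Pearson applied to the ranks (rankdata assigns average ranks to ties)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

print(rho, r_on_ranks)
print(np.isclose(rho, r_on_ranks))
```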





Kendall’s Tau

Rank correlation that handles ties well (non-parametric). SciPy's kendalltau computes the tau-b variant by default, which adjusts for ties.


Use when:

  • Sample sizes are small (as a rough guide, fewer than 20 observations)
  • Data contains many ties (duplicate values)
  • You want a robust rank-based measure

Example

# x and y as defined in the Spearman example above
corr, p = stats.kendalltau(x, y)
print(corr, p)

Interpretation

  • Correlation coefficient usually smaller in magnitude than Pearson or Spearman
  • Works well on small or noisy datasets

Practical notes

  • Most robust to ties
  • Typically the slowest of the three to compute on large datasets
  • Rarely used in large-sample applied work, but great for small surveys
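Kendall's tau has a simple reading in terms of pairs of observations: a pair is concordant when both variables move in the same direction and discordant when they move in opposite directions. The sketch below counts pairs by hand on hypothetical data with no ties, where the plain formula (concordant minus discordant, divided by the number of pairs) should agree with SciPy's result.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data with no tied values
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

concordant = 0
discordant = 0
for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
    direction = (x2 - x1) * (y2 - y1)
    if direction > 0:
        concordant += 1   # both variables move the same way
    elif direction < 0:
        discordant += 1   # the variables move in opposite ways

n_pairs = len(x) * (len(x) - 1) // 2
tau_manual = (concordant - discordant) / n_pairs

tau_scipy, p = stats.kendalltau(x, y)
print(concordant, discordant, tau_manual, tau_scipy)
```

When the data do contain ties, the tau-b variant adjusts the denominator, so the hand count above would no longer match exactly.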





When to Use Which Correlation Test

Situation                                         | Best Test
--------------------------------------------------|----------
Data is continuous and the relationship is linear | Pearson
Data is continuous but non-normal                 | Spearman
Relationship is monotonic but not linear          | Spearman
Data contains many ties                           | Kendall
Sample size is small                              | Kendall
Data is ordinal (Likert scale)                    | Spearman






Scatterplots

Always pair correlation with a plot.

Example:

import matplotlib.pyplot as plt

plt.scatter(x, y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()


Visual checks reveal:

  • Non-linear patterns
  • Outliers
  • Clusters
  • Heteroscedasticity (unequal variance)





Correlation Matrices

Useful for exploring many variables at once.


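Using NumPy (Pearson coefficients):

import numpy as np

# z is a third variable with the same length as x and y
data = np.column_stack((x, y, z))
corr_matrix = np.corrcoef(data, rowvar=False)  # each column is a variable
print(corr_matrix)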


Spearman matrix via pandas:

import pandas as pd

df = pd.DataFrame({"x": x, "y": y, "z": z})
print(df.corr(method="spearman"))
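Neither of the matrices above includes p-values. One way to get them, sketched here with randomly generated stand-in data, is to pass a 2-D array with more than two columns to scipy.stats.spearmanr, which then returns a matrix of coefficients and a matching matrix of p-values.

```python
import numpy as np
from scipy import stats

# Randomly generated stand-in data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = x + rng.normal(scale=0.5, size=30)   # related to x
z = rng.normal(size=30)                  # independent of x and y

data = np.column_stack((x, y, z))        # one column per variable

# With more than two columns, both results are square matrices
rho_matrix, p_matrix = stats.spearmanr(data)

print(np.round(rho_matrix, 2))
print(np.round(p_matrix, 3))
```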






Multiple Correlation Tests (Loop Example)

When testing many variable pairs:

import numpy as np
import pandas as pd
from itertools import combinations
from scipy import stats

df = pd.DataFrame({
    "a": np.random.randn(20),
    "b": np.random.randn(20),
    "c": np.random.randn(20)
})

# Test each unordered pair of columns exactly once
for col1, col2 in combinations(df.columns, 2):
    corr, p = stats.spearmanr(df[col1], df[col2])
    print(col1, col2, corr, p)
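Running many tests raises the chance that at least one p-value looks significant by luck alone. The article does not prescribe a correction, so the following is only a sketch of one common option, the Bonferroni adjustment (multiply each p-value by the number of tests, capped at 1). The data and the variable names are made up for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up data: three unrelated variables
rng = np.random.default_rng(42)
data = {name: rng.normal(size=20) for name in ("a", "b", "c")}

pairs = list(combinations(data, 2))   # each unordered pair once
n_tests = len(pairs)

results = []
for col1, col2 in pairs:
    rho, p = stats.spearmanr(data[col1], data[col2])
    # Bonferroni adjustment (an assumption here, not the only choice)
    p_adjusted = min(p * n_tests, 1.0)
    results.append((col1, col2, rho, p, p_adjusted))

for col1, col2, rho, p, p_adjusted in results:
    print(f"{col1}-{col2}: rho={rho:.3f}, p={p:.3f}, adjusted p={p_adjusted:.3f}")
```

Bonferroni is conservative; with many pairs a false-discovery-rate method may be a better fit.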





Effect Sizes for Correlation

The correlation coefficient is itself an effect size, so no separate measure is needed.


Heuristic (Cohen's guidelines, applied to the absolute value of the coefficient):

  • 0.10 → small
  • 0.30 → medium
  • 0.50 → large

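These thresholds are usually applied to Pearson and Spearman coefficients. Kendall's tau tends to be smaller in magnitude on the same data, so somewhat lower thresholds are commonly used for it.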
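The guidelines translate into a small helper. This is a sketch only: the function name effect_size_label and the "negligible" label for values below 0.10 are assumptions made for illustration, not part of Cohen's wording or of SciPy.

```python
def effect_size_label(r):
    """Label a Pearson or Spearman coefficient using Cohen's rough thresholds."""
    size = abs(r)   # the sign gives direction, not strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"   # hypothetical label for values under 0.10


for value in (0.05, -0.25, 0.42, -0.77):
    print(value, effect_size_label(value))
```

The cut-offs are conventions, and what counts as a large effect varies by field.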






Partial Correlation (Not in SciPy)

To measure correlation while controlling for another variable, use Pingouin or Statsmodels.


Example (Pingouin):

# Install first, from a shell: pip install pingouin
import pingouin as pg

# df must contain the columns 'x', 'y' and 'z'
pg.partial_corr(data=df, x='x', y='y', covar='z')
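If installing another package is not an option, a single-covariate partial correlation can be sketched with NumPy alone. The example below uses the standard first-order formula and cross-checks it by correlating the residuals left after regressing each variable on z. The data are simulated and the function names are hypothetical.

```python
import numpy as np

# Simulated data: x and y are both driven by z
rng = np.random.default_rng(1)
z = rng.normal(size=200)
x = z + rng.normal(scale=0.5, size=200)
y = z + rng.normal(scale=0.5, size=200)


def partial_corr_xy_given_z(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r = np.corrcoef([x, y, z])           # rows are variables
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def residuals(a, z):
    """Residuals of a after a straight-line fit on z."""
    slope, intercept = np.polyfit(z, a, 1)
    return a - (slope * z + intercept)


r_raw = np.corrcoef(x, y)[0, 1]
r_partial = partial_corr_xy_given_z(x, y, z)
r_resid = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"raw r = {r_raw:.3f}")
print(f"partial r (formula) = {r_partial:.3f}")
print(f"partial r (residuals) = {r_resid:.3f}")
```

Here the raw correlation is strong only because both variables follow z; once z is controlled for, little correlation remains. This sketch gives the coefficient only, not a p-value.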





Practical Examples — When Correlation Tests Matter

Example 1 — Customer behaviour

  • Time on site vs purchase amount
  • Use Spearman (non-normal)

Example 2 — Medical data

  • Age vs blood pressure
  • Likely Pearson

Example 3 — Finance

  • Returns of two stocks
  • Pearson or Spearman depending on distribution

Example 4 — Survey analysis

  • Satisfaction rating vs recommendation likelihood
  • Spearman or Kendall (ordinal data)

Example 5 — Performance metrics

  • CPU usage vs response time
  • Often non-linear but monotonic → Spearman

© 2025 SimpleSteps.guide