Pearson’s correlation test

Maths: Statistics for machine learning

2 min read

Published Oct 22 2025, updated Oct 23 2025



Machine Learning, Maths, NumPy, Pandas, Python, Statistics

Pearson’s correlation measures the strength and direction of the linear relationship between two continuous variables.


It tells you:

“As one variable changes, how does the other tend to change — and how strongly?”



The Formula

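$$
r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}
$$

Where:

  • x_i, y_i = individual data points (the sums run over all paired observations)
  • x̄, ȳ = sample means
  • r ranges between –1 and +1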
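To make the definition concrete, here is a minimal NumPy sketch that computes r as the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` is made up for this illustration, and the data is the hours-studied example used later in this article.

```python
import numpy as np


def pearson_r(x, y):
    """Compute Pearson's r from deviations about the sample means.

    Hypothetical helper for illustration; not a library function.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")

    dx = x - x.mean()  # x_i - x̄
    dy = y - y.mean()  # y_i - ȳ

    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx**2) * np.sum(dy**2))
    return numerator / denominator


hours_studied = [2, 3, 4, 5, 6, 7, 8, 9]
exam_score = [50, 55, 61, 65, 70, 74, 80, 85]

r_manual = pearson_r(hours_studied, exam_score)
# NumPy's built-in correlation matrix, used here only as a cross-check
r_numpy = np.corrcoef(hours_studied, exam_score)[0, 1]

print(f"manual: {r_manual:.3f}, np.corrcoef: {r_numpy:.3f}")
```

Both values should agree to floating-point precision, which is a quick way to check that the hand-written version matches the formula.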



Interpretation of r

| r value | Relationship | Description |
|---|---|---|
| +1.0 | Perfect positive | As X increases, Y increases perfectly |
| +0.7 to +0.9 | Strong positive | X and Y rise together strongly |
| +0.3 to +0.6 | Moderate positive | X and Y loosely rise together |
| 0 | None | No linear relationship |
| –0.3 to –0.6 | Moderate negative | X increases → Y decreases moderately |
| –0.7 to –0.9 | Strong negative | X increases → Y decreases strongly |
| –1.0 | Perfect negative | As X increases, Y decreases perfectly |
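The interpretation bands can be turned into a small lookup, sketched below. The function `describe_correlation` is hypothetical, and the cut points are assumptions: values between the listed bands (for example 0.65) are assigned to the lower band, and non-zero values below 0.3 are labelled "weak", a label this article does not define. Treat it as a rule of thumb, not a standard.

```python
def describe_correlation(r):
    """Map a Pearson r value to a rough verbal label.

    Thresholds (0.3 and 0.7) are illustrative assumptions.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "none"

    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"  # assumed label for 0 < |r| < 0.3

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for value in (1.0, 0.85, 0.45, 0, -0.45, -0.85, -1.0):
    print(f"{value:+.2f} -> {describe_correlation(value)}")
```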




Assumptions of Pearson’s r

  • Continuous variables - Both X and Y are numeric and measured on an interval or ratio scale
  • Linearity - The relationship between X and Y is linear
  • Normality - Each variable is approximately normally distributed (this matters mainly for the p-value)
  • No significant outliers - Outliers can distort the correlation
  • Homoscedasticity - The variance of Y is roughly constant across X values

If these assumptions don’t hold, consider Spearman’s rank correlation instead, which only assumes a monotonic relationship.
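As a rough illustration of when Spearman is the better fit, the sketch below uses made-up data where Y always increases with X but not in a straight line. It relies on `scipy.stats.spearmanr`, which works on ranks rather than raw values.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up data: y grows exponentially with x, so the relationship is
# monotonic (always increasing) but clearly not linear.
x = np.arange(1, 11)
y = np.exp(x)

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

print(f"Pearson's r:    {r:.3f}")
print(f"Spearman's rho: {rho:.3f}")
```

Because the ranks of X and Y agree perfectly, Spearman's rho comes out as 1, while Pearson's r is noticeably lower since a straight line fits this data poorly.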




Example in Python

Let’s test the relationship between hours studied and exam score.

import numpy as np
from scipy.stats import pearsonr

# Example data
hours_studied = np.array([2, 3, 4, 5, 6, 7, 8, 9])
exam_score = np.array([50, 55, 61, 65, 70, 74, 80, 85])

# Calculate Pearson correlation
r, p = pearsonr(hours_studied, exam_score)

print(f"Pearson's r: {r:.3f}")
print(f"P-value: {p:.4f}")

if p < 0.05:
    print("Reject H₀: there is a significant correlation.")
else:
    print("Fail to reject H₀: no significant correlation.")

Example Output:

Pearson's r: 0.999
P-value: 0.0000
Reject H₀: there is a significant correlation.

The p-value is not exactly zero; it is just too small to show at four decimal places.



Hypothesis Testing

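  • H₀ (Null Hypothesis) - There is no linear relationship in the population (ρ = 0)
  • H₁ (Alternative Hypothesis) - There is a linear relationship in the population (ρ ≠ 0)

Here ρ is the population correlation, which the sample value r estimates.

If p < 0.05, reject H₀ → significant linear correlation at the 5% level

If p ≥ 0.05, fail to reject H₀ → no significant linear correlation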
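For readers curious where the p-value comes from, here is a sketch of the standard t-test for a correlation: under H₀ the statistic t = r·√(n − 2) / √(1 − r²) follows a t-distribution with n − 2 degrees of freedom. The code reuses the hours-studied data and compares the hand-computed p-value with the one `pearsonr` reports; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

hours_studied = np.array([2, 3, 4, 5, 6, 7, 8, 9])
exam_score = np.array([50, 55, 61, 65, 70, 74, 80, 85])

r, p_scipy = stats.pearsonr(hours_studied, exam_score)
n = len(hours_studied)

# Test statistic under H0, with n - 2 degrees of freedom
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

# Two-sided p-value: chance of seeing |t| at least this large if H0 were true
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"t = {t_stat:.2f}")
print(f"p (manual) = {p_manual:.2e}")
print(f"p (scipy)  = {p_scipy:.2e}")
```

The two p-values should match closely, since SciPy's default calculation is mathematically equivalent to this t-test.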




Visual example

[Figure: Pearson’s correlation visualisation (scatter plot with a red line)]

The closer the points are to the red line, the stronger the correlation.
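A plot like this can be sketched as follows, assuming the red line is an ordinary least-squares fit (the article does not say how the line was drawn). The helper `plot_with_fit` and the output filename are made up for illustration, and matplotlib is imported inside the helper so the fitting step runs even where plotting is unavailable.

```python
import numpy as np

hours_studied = np.array([2, 3, 4, 5, 6, 7, 8, 9])
exam_score = np.array([50, 55, 61, 65, 70, 74, 80, 85])

# Least-squares straight line through the points (degree-1 polynomial fit)
slope, intercept = np.polyfit(hours_studied, exam_score, deg=1)


def plot_with_fit(x, y, slope, intercept, path="pearson_scatter.png"):
    """Save a scatter plot of (x, y) with the fitted line drawn in red."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, works without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="observations")
    ax.plot(x, slope * x + intercept, color="red", label="least-squares line")
    ax.set_xlabel("Hours studied")
    ax.set_ylabel("Exam score")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
# Uncomment to write the PNG file:
# plot_with_fit(hours_studied, exam_score, slope, intercept)
```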




Limitations

  • Measures only linear relationships - Won’t capture non-linear patterns
  • Sensitive to outliers - One extreme point can distort r
  • Correlation ≠ causation - X and Y may move together without direct influence
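The first two limitations are easy to demonstrate with synthetic data. The sketch below uses a perfect but curved relationship (y = x²) and then random noise with a single extreme point added; the data, seed, and variable names are all made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# 1) A perfect but non-linear relationship: y = x^2 on a range symmetric around 0
x = np.linspace(-3, 3, 61)
y = x**2
r_curve, _ = pearsonr(x, y)

# 2) Two unrelated noise variables, then the same data plus one extreme point
rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = rng.normal(size=30)
r_noise, _ = pearsonr(a, b)
r_outlier, _ = pearsonr(np.append(a, 20.0), np.append(b, 20.0))

print(f"y = x^2:             r = {r_curve:.3f}")
print(f"noise only:          r = {r_noise:.3f}")
print(f"noise + one outlier: r = {r_outlier:.3f}")
```

Y is fully determined by X in the first case, yet r is essentially zero because the pattern is not linear. In the second case a single point at (20, 20) is enough to push r close to 1 even though the other 30 points are unrelated.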

© 2025 SimpleSteps.guide