Pearson's correlation test
Maths: Statistics for machine learning
2 min read
Published Oct 22 2025, updated Oct 23 2025
Pearson’s correlation measures the strength and direction of the linear relationship between two continuous variables.
It tells you:
“As one variable changes, how does the other tend to change — and how strongly?”
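The Formula
$$ r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2}\,\sqrt{\sum_{i} (y_i - \bar{y})^2}} $$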
Where:
- xᵢ, yᵢ = individual data points
- x̄, ȳ = sample means of X and Y
- r ranges between –1 and +1
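To make the definition concrete, here is a minimal sketch that computes r by hand: the sum of the products of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample values are illustrative assumptions, not part of any library.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation computed directly from the definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of paired deviations from the means
    cov_sum = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov_sum / math.sqrt(ss_x * ss_y)

# Made-up values, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")  # prints r = 0.775
```

Here the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / √60 ≈ 0.775.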
Interpretation of r
| r value | Relationship | Description |
|---|---|---|
| +1.0 | Perfect positive | As X increases, Y increases perfectly |
| +0.7 to +0.9 | Strong positive | X and Y rise together strongly |
| +0.3 to +0.6 | Moderate positive | X and Y loosely rise together |
| 0 | None | No linear relationship |
| –0.3 to –0.6 | Moderate negative | X increases → Y decreases moderately |
| –0.7 to –0.9 | Strong negative | X increases → Y decreases strongly |
| –1.0 | Perfect negative | As X increases, Y decreases perfectly |
Assumptions of Pearson’s r
- Continuous variables - Both X and Y are numeric and measured on an interval or ratio scale
- Linearity - Relationship between X and Y is linear
- Normality - Each variable is approximately normally distributed (this matters mainly for the significance test, not for computing r itself)
- No significant outliers - Outliers can distort correlation
- Homoscedasticity - Constant variance of Y across X values
If these assumptions don’t hold → consider Spearman’s rank correlation instead, which works on ranks and only requires a monotonic relationship.
Example in Python
Let’s test the relationship between hours studied and exam score.
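A minimal sketch of such a test, assuming SciPy is installed. The hours and scores below are invented purely for illustration. `scipy.stats.pearsonr` returns the correlation coefficient together with a two-sided p-value.

```python
from scipy.stats import pearsonr

# Invented data for illustration: hours studied and exam score for 8 students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 74, 79, 85]

# pearsonr returns the sample correlation and the two-sided p-value
r, p_value = pearsonr(hours, scores)

print(f"Pearson r = {r:.3f}")
print(f"p-value   = {p_value:.6f}")
```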
Example Output:
Hypothesis Testing
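- H₀ (Null Hypothesis) - There is no linear relationship in the population (ρ = 0)
- H₁ (Alternative Hypothesis) - There is a linear relationship in the population (ρ ≠ 0)
If p ≤ 0.05 (the conventional 5% significance level), reject H₀ → statistically significant linear correlation
If p > 0.05, fail to reject H₀ → no statistically significant evidence of a linear correlation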
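The decision rule can be sketched in code as follows. Under H₀, the statistic t = r·√((n − 2) / (1 − r²)) follows a t distribution with n − 2 degrees of freedom, which is where the p-value comes from. The function name `pearson_test`, the `alpha` parameter, and the data are assumptions made for illustration, and the sketch assumes |r| < 1 so that the division is defined.

```python
import math
from scipy import stats

def pearson_test(x, y, alpha=0.05):
    """Return (r, two-sided p-value, reject H0?) for Pearson's correlation test."""
    n = len(x)
    r, _ = stats.pearsonr(x, y)
    # t statistic with n - 2 degrees of freedom (assumes |r| < 1)
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Two-sided p-value: probability of a |t| at least this large under H0
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return r, p_value, bool(p_value <= alpha)

# Invented data for illustration
hours = [2, 4, 5, 7, 8, 10, 11, 13]
scores = [60, 58, 67, 63, 72, 70, 78, 75]

r, p_value, reject = pearson_test(hours, scores)
print(f"r = {r:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The p-value computed this way should agree with the one reported by `scipy.stats.pearsonr` up to floating-point error.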
Visual example

The closer the points are to the red line, the stronger the correlation.
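A plot of this kind could be produced along these lines, assuming NumPy and Matplotlib are available. The data are invented, and the red line is an ordinary least-squares fit obtained with `numpy.polyfit`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented data for illustration
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85], dtype=float)

# Least-squares straight line and the sample correlation
slope, intercept = np.polyfit(hours, scores, deg=1)
r = np.corrcoef(hours, scores)[0, 1]

fig, ax = plt.subplots()
ax.scatter(hours, scores, label="observations")
ax.plot(hours, slope * hours + intercept, color="red", label="least-squares line")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title(f"r = {r:.2f}")
ax.legend()
plt.show()
```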
Limitations
- Measures only linear relationships - Won’t capture non-linear patterns
- Sensitive to outliers - One extreme point can distort r
- Correlation ≠ causation - X and Y may move together without direct influence














