P-Value (Probability Value)
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
In hypothesis testing, the p-value measures the probability of observing results as extreme as (or more extreme than) the ones in your sample, if the null hypothesis (H₀) were true.
In simple terms:
“The p-value tells you how surprising your data are, assuming the null hypothesis is correct.”
How It Fits in Hypothesis Testing
- You start by assuming H₀ is true (for example, “there’s no difference between two groups”).
- You then use your data to calculate a test statistic (like a z, t, or χ² value).
- The p-value is the probability of getting that statistic (or something more extreme) through random sampling variation alone, assuming H₀ is true.
Small p-value → data are unlikely under H₀ → evidence against H₀
Large p-value → data are consistent with H₀ → not enough evidence to reject H₀
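The steps above can be sketched with a one-sample z-test. This is a minimal illustration, not the only way to compute a p-value: the function name `z_test_p_value`, the assumption of a known population standard deviation, and all of the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a one-sample z-test (sigma assumed known)."""
    # Test statistic: how many standard errors the sample mean lies from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least this far from zero.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers chosen only for illustration.
z, p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Here the sample mean sits two standard errors above the value stated by H₀, and a gap that large (in either direction) would occur in roughly 4.6% of samples if H₀ were true.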
Interpretation Rules
| p-value | Interpretation | Decision (if α = 0.05) |
|---|---|---|
| p ≤ 0.05 | Data are unlikely under H₀; enough evidence against H₀ at this level | Reject H₀ |
| p > 0.05 | Not enough evidence against H₀ | Fail to reject H₀ |
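(The threshold α = 0.05 means that, when H₀ is actually true, you accept a 5% chance of rejecting it by mistake, which is a Type I error. It is not the probability that you are wrong given that you rejected H₀.)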
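The threshold can also be read as a long-run error rate: if H₀ is true and the same test is repeated on fresh data many times, about α of those tests reject H₀ anyway. The simulation below is a rough sketch of that idea. The function name `false_positive_rate`, the normal data model, and the sample size and trial count are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(alpha=0.05, n=30, trials=5000, seed=0):
    """Fraction of simulated experiments that reject H0 when H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Data generated with H0 true: mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"Rejected a true H0 in {rate:.1%} of simulated experiments")
```

The printed rate should land close to 5%, with some simulation noise.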
Example
Suppose you’re testing whether a new drug lowers blood pressure.
- H₀: The drug has no effect (mean difference = 0).
- You collect sample data and calculate p = 0.03.
Interpretation:
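There’s a 3% chance of observing a result at least this extreme if the drug truly had no effect.
Because 0.03 < 0.05, the result is statistically significant at α = 0.05, so you reject H₀: the data provide evidence that the drug lowers blood pressure. This does not mean there is a 97% chance that the drug works.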
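To make the example concrete, here is one hedged way a p-value could be obtained for such a study. The guide does not say which test was used, so this sketch swaps in an exact sign-flip (randomisation) test on paired differences. It assumes that, if the drug had no effect, each patient's change would be equally likely to be positive or negative. The data and the function name `sign_flip_p_value` are hypothetical and are not meant to reproduce p = 0.03.

```python
from itertools import product

def sign_flip_p_value(differences):
    """One-sided exact sign-flip p-value for H1: mean difference < 0."""
    observed = sum(differences)
    magnitudes = [abs(d) for d in differences]
    count = 0
    total = 0
    # Under H0 (no effect), each difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(magnitudes)):
        total += 1
        flipped = sum(s * m for s, m in zip(signs, magnitudes))
        # Count sign patterns at least as extreme (as low) as the observed sum.
        if flipped <= observed:
            count += 1
    return count / total

# Hypothetical changes in systolic pressure (after minus before, mmHg).
differences = [-8, -5, 2, -6, -3, 1, -7, -4, -2, 3]
p = sign_flip_p_value(differences)
alpha = 0.05
print(f"p = {p:.4f}:", "reject H0" if p <= alpha else "fail to reject H0")
```

The test is one-sided because H₁ here is directional (the drug lowers blood pressure). With ten patients the loop enumerates all 1,024 sign patterns, so no random sampling is needed.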
Common Misunderstandings
| Misconception | Correct Understanding |
|---|---|
| “p = 0.03 means H₀ is false.” | No. It means data at least this extreme would be unlikely if H₀ were true. |
| “1 − p is the probability H₁ is true.” | No. P-values don’t give probabilities of hypotheses. |
| “A smaller p means a bigger effect.” | Not necessarily. P-values also depend on sample size and variance. |
| “p > 0.05 means H₀ is true.” | No. It just means there’s not enough evidence to reject it. |
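The point about sample size can be checked numerically. In this sketch the effect (a mean shift of 0.2 standard deviations) is held fixed while only the sample size changes. The helper name `two_sided_p`, the z-test setting, and the numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect, sigma, n):
    """Two-sided z-test p-value for an observed mean shift of `effect`."""
    z = effect / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect, three hypothetical sample sizes.
effect, sigma = 0.2, 1.0
for n in (25, 100, 400):
    print(f"n = {n:>3}: p = {two_sided_p(effect, sigma, n):.4f}")
```

The p-value falls as n grows even though the effect is identical, which is why a small p-value on its own says little about how large an effect is.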
Relationship with α (Significance Level)
- α (alpha) is the cutoff you set before testing (e.g., 0.05).
- The p-value is what you calculate from your data.
- If p ≤ α, reject H₀; otherwise, fail to reject H₀.
Think of it like:
“α is your threshold for evidence; p is the actual evidence you got.”
Graphical Intuition
Imagine a bell curve (sampling distribution under H₀):
- The centre is where results are most likely if H₀ is true.
- The tails represent rare, extreme outcomes.
- The p-value is the area under the curve in the tail (or tails) beyond your observed test statistic: the probability, under H₀, of getting results at least as extreme as yours.
Smaller p-value → smaller tail area → stronger evidence against H₀
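The "area in the tail" picture can be taken literally. The sketch below approximates the right-tail area under a standard normal curve with the trapezoid rule. The function name `right_tail_area`, the cut-off at 10, and the step count are assumptions chosen for illustration, and in practice you would read the area from the CDF instead.

```python
from statistics import NormalDist

def right_tail_area(z_obs, upper=10.0, steps=20000):
    """Trapezoid-rule area under the standard normal curve from z_obs to upper."""
    pdf = NormalDist().pdf
    width = (upper - z_obs) / steps
    # The two endpoints get half weight; interior points get full weight.
    area = 0.5 * (pdf(z_obs) + pdf(upper))
    for i in range(1, steps):
        area += pdf(z_obs + i * width)
    return area * width

# The further out the observed statistic, the smaller the tail area.
for z_obs in (1.0, 2.0, 3.0):
    print(f"z = {z_obs}: right-tail area = {right_tail_area(z_obs):.4f}")
```

Each printed area should agree closely with `1 - NormalDist().cdf(z_obs)`, the one-tailed p-value for that statistic.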
Two-Tailed Example
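If you’re testing whether a mean is different from the hypothesised value (not specifically higher or lower), then:
- You measure how far your test statistic is from the value expected under H₀, and find the probability of a result at least that far out in one tail.
- Then you double that probability, because extreme results in either tail count as evidence against H₀. Doubling assumes a symmetric sampling distribution, such as z or t.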
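A small sketch of the doubling step, assuming a z statistic with a standard normal sampling distribution. The function name `tail_p_values` and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic."""
    # Area in the single tail beyond |z|.
    one_tailed = 1 - NormalDist().cdf(abs(z))
    # The symmetric curve has an equal area in the opposite tail.
    two_tailed = 2 * one_tailed
    return one_tailed, two_tailed

one, two = tail_p_values(1.8)
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

For this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the choice between them should be made before looking at the data.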
In Machine Learning
- A/B testing - To check if model A significantly outperforms model B
- Feature importance - To test if a feature significantly affects the target
- Model evaluation - To see if differences in accuracy are statistically significant
- Data analysis - To detect real patterns vs random noise
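As a rough illustration of the model-comparison use case, the sketch below applies a two-proportion z-test to the accuracies of two models. It assumes each model was scored on its own independent test set. When both models are scored on the same examples, a paired test such as McNemar's is the usual choice. The function name `two_proportion_p_value` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for H0: both models have the same accuracy."""
    acc_a, acc_b = correct_a / n_a, correct_b / n_b
    # Pooled accuracy is the best estimate of the shared rate under H0.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical results: model A 87.0% and model B 84.0% on 1,000 examples each.
z, p = two_proportion_p_value(correct_a=870, n_a=1000, correct_b=840, n_b=1000)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}:",
      "reject H0" if p <= alpha else "fail to reject H0")
```

A three-point accuracy gap can look decisive yet still fail to clear α = 0.05 at this sample size, which is the kind of check this section has in mind.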














