Type I and Type II Errors
When we perform hypothesis testing, we make decisions based on sample data - but there’s always a chance we make a wrong decision about the population.
These mistakes are called Type I and Type II errors.
Hypothesis Setup Reminder
- H₀ (Null Hypothesis): “There is no effect or difference.”
- H₁ (Alternative Hypothesis): “There is an effect or difference.”
The test decides whether to reject H₀ or fail to reject H₀ - but since we’re working with samples, we can’t be 100% certain.
Two Possible Truths vs Two Possible Decisions
| Reality (Truth) | Decision | Result |
| --- | --- | --- |
| H₀ is true | Reject H₀ | Type I Error (false positive) |
| H₀ is true | Fail to reject H₀ | Correct |
| H₀ is false | Reject H₀ | Correct |
| H₀ is false | Fail to reject H₀ | Type II Error (false negative) |
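To make the table concrete, here is a minimal simulation sketch, assuming NumPy and SciPy are available; the sample size (30 per group), effect size (0.5), α (0.05), and trial count are illustrative choices, not values from any real study. It runs many two-sample t-tests under each reality and tallies how often each kind of error occurs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, trials = 0.05, 30, 5_000

# Reality 1: H0 is true - both groups come from the same distribution.
# Every rejection here is a Type I error (false positive).
type_i = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        type_i += 1

# Reality 2: H0 is false - group b has a real effect of 0.5.
# Every failure to reject here is a Type II error (false negative).
type_ii = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        type_ii += 1

print(f"Type I rate:  {type_i / trials:.3f} (should sit near alpha = {alpha})")
print(f"Type II rate: {type_ii / trials:.3f} (so power is {1 - type_ii / trials:.3f})")
```

The Type I rate should hover near α by construction; the Type II rate depends on how large the real effect is relative to the sample size.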
Type I Error (False Positive)
- You reject the null hypothesis when it’s actually true.
- You think there’s an effect or difference, but there isn’t.
Symbol: α (alpha)
Examples:
- Concluding a drug works when it doesn’t.
- Detecting a pattern in data that’s just random noise.
- Believing a model improvement is real when it’s due to chance.
Analogy:
“You convicted an innocent person.”
Controlled By:
The significance level (α), usually set at 0.05, meaning you accept a 5% chance of wrongly rejecting a true null hypothesis.
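To see how α maps to a concrete cutoff, here is a small sketch (assuming SciPy, and a two-sided z-test purely for illustration): lowering α pushes the critical value outward, so rejecting a true H₀ becomes rarer.

```python
from scipy import stats

# Two-sided critical z-values for several significance levels.
# Lowering alpha pushes the cutoff outward, so rejecting H0 gets
# harder and false positives become rarer.
for alpha in (0.10, 0.05, 0.01):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f} -> reject H0 when |z| > {z_crit:.3f}")
```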
Type II Error (False Negative)
- You fail to reject H₀ when it’s actually false.
- You miss a real effect.
Symbol: β (beta)
Examples:
- Concluding a drug has no effect when it actually does.
- Failing to detect a real improvement in model performance.
- Missing a real correlation between two variables.
Analogy:
“You let a guilty person go free.”
Controlled By:
The power of the test (1 − β): higher power means a lower chance of a Type II error. Power increases with larger samples, bigger true effects, and a less strict α.
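As a rough illustration of that relationship, the sketch below approximates power for a two-sided, two-sample z-test; the normal approximation, the effect size of d = 0.5, and the sample sizes are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is Cohen's d; n is the per-group sample size.
    """
    z_crit = stats.norm.ppf(1 - alpha / 2)
    shift = effect_size * np.sqrt(n / 2)  # standardized shift of the test statistic
    return stats.norm.cdf(shift - z_crit) + stats.norm.cdf(-shift - z_crit)

# More data -> higher power -> lower beta, for the same effect and alpha.
for n in (10, 30, 100):
    power = approx_power(effect_size=0.5, n=n)
    print(f"n = {n:>3} per group: power = {power:.3f}, beta = {1 - power:.3f}")
```

At d = 0.5, going from 30 to 100 samples per group lifts power from roughly 0.5 to above 0.9 - the output makes the n-versus-β trade visible.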
Visual Summary
Imagine overlapping curves:
- The left curve = sampling distribution if H₀ is true
- The right curve = sampling distribution if H₁ is true
The critical region (α) is the area where you reject H₀.
- If your data fall there when H₀ is true → Type I error
- If your data fall outside it when H₁ is true → Type II error
Increasing the sample size narrows both curves and shrinks their overlap → for a fixed α, the Type II error rate (β) drops.
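That picture translates directly into arithmetic. In this sketch the curve centres (0 under H₀, 2.5 under H₁), their common SD, and the one-sided α = 0.05 are illustrative assumptions: α is the tail of the H₀ curve beyond the cutoff, and β is the part of the H₁ curve that falls short of it.

```python
from scipy import stats

# Two sampling distributions for the test statistic (numbers are illustrative):
# centred at 0 if H0 is true, at 2.5 if H1 is true, both with SD 1.
h0 = stats.norm(0.0, 1.0)
h1 = stats.norm(2.5, 1.0)

cutoff = h0.ppf(0.95)        # one-sided critical value for alpha = 0.05
alpha = 1 - h0.cdf(cutoff)   # tail of the H0 curve beyond the cutoff
beta = h1.cdf(cutoff)        # part of the H1 curve below the cutoff
print(f"cutoff = {cutoff:.3f}, alpha = {alpha:.3f}, beta = {beta:.3f}")
```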
Key Metrics
| Concept | Symbol | Definition |
| --- | --- | --- |
| Type I Error Rate | α | Probability of rejecting H₀ when true |
| Type II Error Rate | β | Probability of failing to reject H₀ when false |
| Power of Test | 1 − β | Probability of correctly rejecting a false H₀ |
Goal:
Keep α low (e.g., 0.05) and power high (e.g., ≥ 0.8).
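Hitting both targets is really a sample-size question. If statsmodels is installed, its power calculator can solve for the per-group sample size directly; the medium effect size (Cohen's d = 0.5) is an illustrative assumption.

```python
from statsmodels.stats.power import TTestIndPower

# Per-group sample size needed to detect a medium effect (Cohen's d = 0.5,
# an illustrative assumption) at alpha = 0.05 with power 0.8.
n_needed = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Need about {n_needed:.0f} samples per group")  # roughly 64
```

For a medium effect this comes out at roughly 64 samples per group; since the requirement scales like 1/d², halving the effect size roughly quadruples it.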
Real-World Examples
| Scenario | Type I Error | Type II Error |
| --- | --- | --- |
| Medical test | Diagnosing a healthy person as sick (false alarm) | Missing a real illness |
| Spam filter | Marking a real email as spam | Missing a spam email |
| Website A/B test | Thinking version B improves conversion when it doesn't | Failing to detect that version B really helps |
| Machine learning feature test | Believing a feature improves accuracy when it doesn't | Missing a truly useful feature |
Balancing Errors
Reducing one type of error usually increases the other (the sketch at the end of this section shows the trade-off in numbers):
- Lowering α (making the test stricter) → fewer false positives, but more false negatives.
- Increasing α (being more lenient) → fewer false negatives, but more false positives.
The balance depends on context:
- In medicine → minimise Type I (avoid claiming a treatment works when it doesn't).
- In fraud detection → minimise Type II (a missed fraud usually costs more than a false alarm).
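Here is the trade-off in numbers, reusing the normal-approximation setup from earlier (effect size d = 0.5 and n = 30 per group remain illustrative assumptions): tightening α steadily inflates β.

```python
import numpy as np
from scipy import stats

# Same illustrative setup as before: effect size d = 0.5, n = 30 per group.
d, n = 0.5, 30
shift = d * np.sqrt(n / 2)

# Tightening alpha moves the cutoff outward: fewer false positives,
# but a larger chance of missing the real effect.
for alpha in (0.10, 0.05, 0.01):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    beta = stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```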