Type I and Type II Errors
When we perform hypothesis testing, we make decisions based on sample data - but there’s always a chance we make a wrong decision about the population.
These mistakes are called Type I and Type II errors.
Hypothesis Setup Reminder
- H₀ (Null Hypothesis): “There is no effect or difference.”
- H₁ (Alternative Hypothesis): “There is an effect or difference.”
The test decides whether to reject H₀ or fail to reject H₀ - but since we’re working with samples, we can’t be 100% certain.
Two Possible Truths vs Two Possible Decisions
| Reality (Truth) | Decision | Result |
| --- | --- | --- |
| H₀ is true | Reject H₀ | Type I Error (false positive) |
| H₀ is true | Fail to reject H₀ | Correct |
| H₀ is false | Reject H₀ | Correct |
| H₀ is false | Fail to reject H₀ | Type II Error (false negative) |
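To make the table concrete, here is a minimal simulation sketch, assuming NumPy and SciPy are available; the sample size (30 per group), effect size (0.5), α (0.05), and trial count are illustrative choices, not values from any real study. It runs many two-sample t-tests under each reality and tallies how often each kind of error occurs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, trials = 0.05, 30, 5_000

# Reality 1: H0 is true - both groups come from the same distribution.
# Every rejection here is a Type I error (false positive).
type_i = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        type_i += 1

# Reality 2: H0 is false - group b has a real effect of 0.5.
# Every failure to reject here is a Type II error (false negative).
type_ii = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        type_ii += 1

print(f"Type I rate:  {type_i / trials:.3f} (should sit near alpha = {alpha})")
print(f"Type II rate: {type_ii / trials:.3f} (so power is {1 - type_ii / trials:.3f})")
```

The Type I rate should hover near α by construction; the Type II rate depends on how large the real effect is relative to the sample size.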
Type I Error (False Positive)
- You reject the null hypothesis when it’s actually true.
- You think there’s an effect or difference, but there isn’t.
Symbol: α (alpha)
Examples:
- Concluding a drug works when it doesn’t.
- Detecting a pattern in data that’s just random noise.
- Believing a model improvement is real when it’s due to chance.
Analogy:
“You convicted an innocent person.”
Controlled By:
The significance level (α), usually set at 0.05, meaning you accept a 5% chance of wrongly rejecting a true null hypothesis.
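To see how α maps to a concrete cutoff, here is a small sketch (assuming SciPy, and a two-sided z-test purely for illustration): lowering α pushes the critical value outward, so rejecting a true H₀ becomes rarer.

```python
from scipy import stats

# Two-sided critical z-values for several significance levels.
# Lowering alpha pushes the cutoff outward, so rejecting H0 gets
# harder and false positives become rarer.
for alpha in (0.10, 0.05, 0.01):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f} -> reject H0 when |z| > {z_crit:.3f}")
```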
Type II Error (False Negative)
- You fail to reject H₀ when it’s actually false.
- You miss a real effect.
Symbol: β (beta)
Examples:
- Concluding a drug has no effect when it actually does.
- Failing to detect a real improvement in model performance.
- Missing a real correlation between two variables.
Analogy:
“You let a guilty person go free.”
Controlled By:
The power of the test (1 − β): higher power means a lower chance of a Type II error. Power increases with larger samples, bigger true effects, and a less strict α.
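As a rough illustration of that relationship, the sketch below approximates power for a two-sided, two-sample z-test; the normal approximation, the effect size of d = 0.5, and the sample sizes are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is Cohen's d; n is the per-group sample size.
    """
    z_crit = stats.norm.ppf(1 - alpha / 2)
    shift = effect_size * np.sqrt(n / 2)  # standardized shift of the test statistic
    return stats.norm.cdf(shift - z_crit) + stats.norm.cdf(-shift - z_crit)

# More data -> higher power -> lower beta, for the same effect and alpha.
for n in (10, 30, 100):
    power = approx_power(effect_size=0.5, n=n)
    print(f"n = {n:>3} per group: power = {power:.3f}, beta = {1 - power:.3f}")
```

At d = 0.5, going from 30 to 100 samples per group lifts power from roughly 0.5 to above 0.9 - the output makes the n-versus-β trade visible.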
Visual Summary
Imagine overlapping curves:
- The left curve = sampling distribution if H₀ is true
- The right curve = sampling distribution if H₁ is true
The critical region (α) is the area where you reject H₀.
- If your data fall there when H₀ is true → Type I error
- If your data fall outside it when H₁ is true → Type II error
Increasing the sample size narrows both curves and shrinks their overlap → for a fixed α, the Type II error rate (β) drops.
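That picture translates directly into arithmetic. In this sketch the curve centres (0 under H₀, 2.5 under H₁), their common SD, and the one-sided α = 0.05 are illustrative assumptions: α is the tail of the H₀ curve beyond the cutoff, and β is the part of the H₁ curve that falls short of it.

```python
from scipy import stats

# Two sampling distributions for the test statistic (numbers are illustrative):
# centred at 0 if H0 is true, at 2.5 if H1 is true, both with SD 1.
h0 = stats.norm(0.0, 1.0)
h1 = stats.norm(2.5, 1.0)

cutoff = h0.ppf(0.95)        # one-sided critical value for alpha = 0.05
alpha = 1 - h0.cdf(cutoff)   # tail of the H0 curve beyond the cutoff
beta = h1.cdf(cutoff)        # part of the H1 curve below the cutoff
print(f"cutoff = {cutoff:.3f}, alpha = {alpha:.3f}, beta = {beta:.3f}")
```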
Key Metrics
| Concept | Symbol | Definition |
| --- | --- | --- |
| Type I Error Rate | α | Probability of rejecting H₀ when true |
| Type II Error Rate | β | Probability of failing to reject H₀ when false |
| Power of Test | 1 − β | Probability of correctly rejecting a false H₀ |
Goal:
Keep α low (e.g., 0.05) and power high (e.g., ≥ 0.8).
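Hitting both targets is really a sample-size question. If statsmodels is installed, its power calculator can solve for the per-group sample size directly; the medium effect size (Cohen's d = 0.5) is an illustrative assumption.

```python
from statsmodels.stats.power import TTestIndPower

# Per-group sample size needed to detect a medium effect (Cohen's d = 0.5,
# an illustrative assumption) at alpha = 0.05 with power 0.8.
n_needed = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Need about {n_needed:.0f} samples per group")  # roughly 64
```

For a medium effect this comes out at roughly 64 samples per group; since the requirement scales like 1/d², halving the effect size roughly quadruples it.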
Real-World Examples
| Scenario | Type I Error | Type II Error |
| --- | --- | --- |
| Medical test | Diagnosing a healthy person as sick (false alarm) | Missing a real illness |
| Spam filter | Marking a real email as spam | Missing a spam email |
| Website A/B test | Thinking version B improves conversion when it doesn't | Failing to detect that version B really helps |
| Machine learning feature test | Believing a feature improves accuracy when it doesn't | Missing a truly useful feature |
Balancing Errors
Reducing one type of error usually increases the other (the sketch at the end of this section shows the trade-off in numbers):
- Lowering α (making the test stricter) → fewer false positives, but more false negatives.
- Increasing α (being more lenient) → fewer false negatives, but more false positives.
The balance depends on context:
- In medicine → minimise Type I (avoid claiming a treatment works when it doesn't).
- In fraud detection → minimise Type II (a missed fraud usually costs more than a false alarm).
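Here is the trade-off in numbers, reusing the normal-approximation setup from earlier (effect size d = 0.5 and n = 30 per group remain illustrative assumptions): tightening α steadily inflates β.

```python
import numpy as np
from scipy import stats

# Same illustrative setup as before: effect size d = 0.5, n = 30 per group.
d, n = 0.5, 30
shift = d * np.sqrt(n / 2)

# Tightening alpha moves the cutoff outward: fewer false positives,
# but a larger chance of missing the real effect.
for alpha in (0.10, 0.05, 0.01):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    beta = stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```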