Statistical Testing and Hypothesis Testing

Maths: Statistics for machine learning

Published Oct 22 2025, updated Oct 23 2025

Machine Learning, Maths, NumPy, Pandas, Python, Statistics

Statistical testing (or hypothesis testing) is a structured method used to make decisions or draw conclusions about a population based on data from a sample.

It helps answer questions like:

“Is there really a difference between two groups, or could the difference be due to random chance?”

In simple terms:

“We test whether the data provide enough evidence to support a claim.”

Key Idea

Statistical testing uses sample data to test a hypothesis about a population parameter (like the mean, proportion, or variance).
It’s a way to quantify uncertainty and make objective decisions rather than relying on guesswork.

Steps in Hypothesis Testing

State the hypotheses - Formulate the null (H₀) and alternative (H₁) hypotheses.
Choose significance level (α) - Decide how much risk of error you’ll accept (commonly 0.05).
Collect and summarise data - Compute sample statistics (mean, variance, etc.).
Calculate test statistic - Use an appropriate formula (z, t, χ², F, etc.) to measure difference.
Compute p-value - Probability of observing a result at least as extreme as the one in the sample, assuming H₀ is true.
Make a decision - If p-value ≤ α → reject H₀; otherwise, fail to reject H₀.
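The six steps above can be sketched end to end with a one-sample z-test, using only Python's standard library. The weights below and the "known" population standard deviation `sigma = 3.0` are made-up values for illustration, and the z-test is just one possible choice of test.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample of body weights in kg (invented for this sketch).
weights = [72.1, 69.8, 74.3, 71.5, 68.9, 73.2, 70.6, 75.0, 72.8, 71.7]

# Step 1: state the hypotheses. H0: mu = 70, H1: mu != 70 (two-sided).
mu_0 = 70.0

# Step 2: choose the significance level.
alpha = 0.05

# Assumption: the population standard deviation is known (required for a z-test).
sigma = 3.0

# Step 3: summarise the data.
n = len(weights)
sample_mean = mean(weights)

# Step 4: test statistic = distance from mu_0 in standard-error units.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Step 5: two-sided p-value under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 6: decision.
reject_h0 = p_value <= alpha
print(f"mean={sample_mean:.2f}, z={z:.3f}, p={p_value:.4f}, reject H0: {reject_h0}")
```

With these invented numbers the p-value falls below 0.05, so H₀ is rejected. A different sample or a different assumed `sigma` could easily lead to the opposite decision.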

1. Null and Alternative Hypotheses

Symbol	Meaning	Example
H₀ (Null Hypothesis)	Assumes no effect or no difference.	“The mean weight = 70 kg.”
H₁ (Alternative Hypothesis)	Suggests there is an effect or difference.	“The mean weight ≠ 70 kg.”

We always start by assuming H₀ is true, then use data to decide whether there’s enough evidence to reject it.

2. Significance Level (α)

The significance level (α) represents the probability of rejecting H₀ when it’s actually true (Type I error).

Common choices:

α = 0.05 (5%) → 95% confidence level (1 − α)
α = 0.01 (1%) → 99% confidence level (1 − α)

Smaller α → stricter test (less chance of false positives).
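To see what "stricter" means in practice, the sketch below converts each α into the critical value for a two-sided z-test. It assumes the test statistic follows a standard normal distribution under H₀. The `critical` dictionary is only a helper name for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

critical = {}
for alpha in (0.05, 0.01):
    # Two-sided test: split alpha equally between the two tails.
    critical[alpha] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"alpha={alpha:.2f} -> reject H0 when |z| > {critical[alpha]:.3f}")
```

The threshold rises from about 1.96 at α = 0.05 to about 2.58 at α = 0.01, so the sample has to sit further from what H₀ predicts before the null is rejected.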

3. Test Statistic

A test statistic condenses the sample into a single number that measures how far your result is from what H₀ predicts, scaled so that it can be compared against a known reference distribution (z, t, F, or χ²).

Test	Use Case	Example
Z-test	Known σ, large n	Comparing a sample mean to a known population mean
T-test	Unknown σ (estimated from the sample), especially small n	Comparing a sample mean to a reference value, or two sample means
Chi-square test (χ²)	Categorical data	Testing independence or goodness-of-fit
ANOVA (F-test)	Comparing >2 group means	Checking if at least one group differs
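As a rough illustration of the first two rows, the sketch below computes a one-sample z statistic and a one-sample t statistic on the same data. The sample values, `mu_0`, and `sigma` are hypothetical, and the function names are chosen for this example only.

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(sample, mu_0, sigma):
    """One-sample z statistic: uses a known population standard deviation."""
    return (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))

def t_statistic(sample, mu_0):
    """One-sample t statistic: estimates the spread from the sample itself."""
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (mean(sample) - mu_0) / (s / sqrt(len(sample)))

sample = [72.1, 69.8, 74.3, 71.5, 68.9]  # hypothetical weights in kg
z = z_statistic(sample, mu_0=70.0, sigma=3.0)  # sigma assumed known
t = t_statistic(sample, mu_0=70.0)
print(f"z = {z:.3f}, t = {t:.3f}")
```

The two formulas differ only in where the standard deviation comes from. Turning the t statistic into a p-value requires Student's t distribution with n − 1 degrees of freedom, which is not in the standard library. A package such as SciPy provides it.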

4. P-value and Decision Rule

p-value = Probability of observing a test statistic as extreme as (or more extreme than) the one from your data, if H₀ is true.
Decision:
- If p-value ≤ α → Reject H₀ (evidence supports H₁)
- If p-value > α → Fail to reject H₀ (insufficient evidence against H₀; this does not prove H₀ is true)

Smaller p-value → stronger evidence against H₀.
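A minimal sketch of the decision rule, assuming the test statistic is a z value that follows a standard normal distribution under H₀. The helper `p_value_from_z` and its `alternative` argument are hypothetical names for this example, not part of any library.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Tail probability of a z statistic at least as extreme as the one observed."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))  # both tails count as extreme
    if alternative == "greater":
        return 1 - cdf(z)  # only large positive values count as extreme
    if alternative == "less":
        return cdf(z)  # only large negative values count as extreme
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

alpha = 0.05
for z in (1.0, 2.0, 3.0):
    p = p_value_from_z(z)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"z={z:.1f}: p={p:.4f} -> {decision}")
```

As the statistic moves further from zero, the p-value shrinks. Under these assumptions, z = 1.0 does not clear the 0.05 threshold, while z = 2.0 and z = 3.0 do.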

5. Types of Errors

Type	Description	Example
Type I Error (α)	Rejecting H₀ when it’s true	Concluding a drug works when it doesn’t
Type II Error (β)	Failing to reject H₀ when it’s false	Missing that a drug actually works

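In practice, these two errors trade off: for a fixed sample size, lowering α reduces Type I errors but raises β. The power of a test (1 − β) is the probability of correctly rejecting a false H₀, and it increases with a larger sample size or a larger true effect.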
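The relationship between α, sample size, and power can be made concrete for a two-sided one-sample z-test. The sketch below is illustrative only: the true mean shift `effect = 1.0` and the known `sigma = 3.0` are made-up values, and `z_test_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu_0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma  # mean of the z statistic when H0 is false
    # Probability that |Z| exceeds the critical value when Z ~ N(shift, 1).
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

for n in (10, 30, 100):
    power = z_test_power(effect=1.0, sigma=3.0, n=n)
    print(f"n={n:>3}: power={power:.2f}, beta={1 - power:.2f}")
```

Under these assumptions, power grows with the sample size, and tightening α from 0.05 to 0.01 at the same n lowers it. When `effect` is 0, the function returns α itself, which is the Type I error rate.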