Statistical Testing and Hypothesis Testing

Maths: Statistics for machine learning


Published Oct 22 2025, updated Oct 23 2025




Statistical testing (or hypothesis testing) is a structured method used to make decisions or draw conclusions about a population based on data from a sample.


It helps answer questions like:

“Is there really a difference between two groups, or could the difference be due to random chance?”

In simple terms:

“We test whether the data provide enough evidence to support a claim.”


Key Idea

Statistical testing uses sample data to test a hypothesis about a population parameter (like the mean, proportion, or variance).
It’s a way to quantify uncertainty and make objective decisions rather than relying on guesswork.




Steps in Hypothesis Testing

  1. State the hypotheses - Formulate the null (H₀) and alternative (H₁) hypotheses.
  2. Choose significance level (α) - Decide how much risk of error you’ll accept (commonly 0.05).
  3. Collect and summarise data - Compute sample statistics (mean, variance, etc.).
  4. Calculate test statistic - Use an appropriate formula (z, t, χ², F, etc.) to measure difference.
  5. Compute p-value - Probability of observing a result at least as extreme as the one in your sample, assuming H₀ is true.
  6. Make a decision - If p-value ≤ α → reject H₀; otherwise, fail to reject H₀.
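The six steps can be sketched in Python as follows. This is a minimal illustration, assuming SciPy is available; the two groups and their values are made up, and Welch's two-sample t-test is just one possible choice of test for comparing two means.

```python
import numpy as np
from scipy import stats

# Step 1: H0: the two groups have equal means; H1: the means differ.
# Step 2: choose the significance level.
alpha = 0.05

# Step 3: collect and summarise data (made-up illustrative values).
group_a = np.array([5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.4, 5.2])
group_b = np.array([5.9, 6.1, 5.7, 6.3, 6.0, 5.8, 6.2, 6.4])
mean_a, mean_b = group_a.mean(), group_b.mean()

# Steps 4 and 5: Welch's t-test returns the test statistic and a two-sided p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Step 6: apply the decision rule.
reject_h0 = p_value <= alpha

print(f"mean A = {mean_a:.3f}, mean B = {mean_b:.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Setting `equal_var=False` avoids assuming that both groups share the same variance, which is often the safer default when little is known about the data.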



1. Null and Alternative Hypotheses

| Symbol | Meaning | Example |
|---|---|---|
| H₀ (Null Hypothesis) | Assumes no effect or no difference. | “The mean weight = 70 kg.” |
| H₁ (Alternative Hypothesis) | Suggests there is an effect or difference. | “The mean weight ≠ 70 kg.” |


We always start by assuming H₀ is true, then use data to decide whether there’s enough evidence to reject it.




2. Significance Level (α)

The significance level (α) represents the probability of rejecting H₀ when it’s actually true (Type I error).

Common choices:

  • α = 0.05 (5%) → 95% confidence
  • α = 0.01 (1%) → 99% confidence

Smaller α → stricter test (less chance of false positives, but a higher chance of missing a real effect at the same sample size).
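One way to see what α means is to simulate many experiments in which H₀ really is true and count how often a test rejects it anyway. The sketch below assumes a two-sided z-test with known σ; the values of `mu0`, `sigma`, `n` and the number of simulated experiments are arbitrary choices for illustration.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

alpha = 0.05
mu0, sigma, n = 70.0, 10.0, 30      # hypothetical population under H0
n_experiments = 20_000

# Draw every sample from a population where H0 is true (mean really is mu0).
samples = rng.normal(mu0, sigma, size=(n_experiments, n))

# z statistic for each simulated experiment.
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))

# Two-sided critical value for the chosen alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Fraction of experiments that (wrongly) reject H0.
false_positive_rate = np.mean(np.abs(z) > z_crit)
print(f"Simulated Type I error rate: {false_positive_rate:.4f} (alpha = {alpha})")
```

The simulated rejection rate should land close to α, which is exactly the risk of a false positive accepted when choosing the significance level.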




3. Test Statistic

A test statistic measures how far your sample result is from what H₀ predicts, expressed on a standardised scale (such as z, t, F, or χ²).
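For a test about a single mean, the standard textbook forms of the statistic are shown below. The symbols x̄ (sample mean), s (sample standard deviation), n (sample size) and μ₀ (the mean claimed by H₀) are notation introduced here for illustration.

```latex
% z statistic: population standard deviation \sigma is known
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}

% t statistic: \sigma is unknown and estimated by s, with n - 1 degrees of freedom
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```

In both cases the numerator is the gap between the sample and H₀, and the denominator is the standard error, so the statistic counts how many standard errors the sample mean lies from μ₀.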

| Test | Use Case | Example |
|---|---|---|
| Z-test | σ known (or n large) | Comparing a sample mean to a population mean |
| T-test | σ unknown (matters most when n is small) | Comparing sample means |
| Chi-square test (χ²) | Categorical data | Testing independence or goodness-of-fit |
| ANOVA (F-test) | Comparing >2 group means | Checking if at least one group differs |
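As a rough guide to how each of these tests might be run in Python, the sketch below uses `scipy.stats` on randomly generated data. The group means, the known `sigma`, and the contingency table are invented for illustration, and the z-test is computed by hand from its formula.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(50, 5, size=40)   # made-up groups
b = rng.normal(52, 5, size=40)
c = rng.normal(55, 5, size=40)

# Z-test (sigma assumed known): computed directly from the formula.
sigma = 5.0
z = (a.mean() - 50) / (sigma / np.sqrt(a.size))
p_z = 2 * stats.norm.sf(abs(z))

# One-sample t-test (sigma unknown, estimated from the sample).
t_res = stats.ttest_1samp(a, popmean=50)

# Chi-square test of independence on a 2x2 table of counts.
table = np.array([[30, 10],
                  [20, 20]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA (F-test) across three groups.
f_stat, p_f = stats.f_oneway(a, b, c)

print(f"z-test:     z = {z:.3f}, p = {p_z:.4f}")
print(f"t-test:     t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")
print(f"chi-square: chi2 = {chi2:.3f}, dof = {dof}, p = {p_chi2:.4f}")
print(f"ANOVA:      F = {f_stat:.3f}, p = {p_f:.4f}")
```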




4. P-value and Decision Rule

  • p-value = Probability of observing a test statistic as extreme as (or more extreme than) the one from your data, if H₀ is true.
  • Decision:
    • If p-value ≤ α → Reject H₀ (evidence supports H₁)
    • If p-value > α → Fail to reject H₀ (no strong evidence)

Smaller p-value → stronger evidence against H₀.
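To make the definition concrete, here is a small sketch that converts a z statistic into a two-sided p-value and applies the decision rule. It uses only the Python standard library; the helper name `two_sided_p_value` and the example z values are illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
for z in (0.5, 1.96, 3.0):
    p = two_sided_p_value(z)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"z = {z:>4}: p = {p:.4f} -> {decision}")
```

The further the statistic sits from zero, the smaller the tail area beyond it, which is why larger |z| values give smaller p-values.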




5. Types of Errors

| Type | Description | Example |
|---|---|---|
| Type I Error (α) | Rejecting H₀ when it’s true | Concluding a drug works when it doesn’t |
| Type II Error (β) | Failing to reject H₀ when it’s false | Missing that a drug actually works |


In testing, we balance these errors through the choice of α and sample size, which together with the size of the true effect determine the test’s power.




6. Confidence Level & Power

  • Confidence Level - 1 − α (probability of not rejecting H₀ when it is true)
  • Power of a Test - 1 − β (probability of correctly rejecting a false H₀)

A powerful test (power close to 1) is more likely to detect real effects.
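Power can be computed directly for simple cases. The sketch below does this for a two-sided z-test with known σ, using only the standard library; the function name `z_test_power` and the effect size, σ and sample sizes are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided z-test when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the z statistic is centred at this shift instead of 0.
    shift = effect / (sigma / sqrt(n))
    # Probability that the statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (10, 30, 100):
    power = z_test_power(effect=2.0, sigma=5.0, n=n)
    print(f"n = {n:>3}: power = {power:.3f}")
```

With the effect size and α held fixed, increasing the sample size shrinks the standard error and pushes the power towards 1.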




Example (Two-tailed t-test)

Suppose you’re using a sample to test whether the average height of a population differs from 170 cm.

  1. H₀: μ = 170 , H₁: μ ≠ 170
  2. α = 0.05
  3. Compute t-statistic
  4. Compare p-value to 0.05
  5. If p ≤ 0.05 → reject H₀ (significant difference)

The result tells you whether your sample mean differs significantly from 170 cm.
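A minimal version of this example in Python might look like the following, assuming SciPy is available. The ten height values are made up; the t statistic is computed both by hand and with `scipy.stats.ttest_1samp` so the two can be compared.

```python
import numpy as np
from scipy import stats

# Made-up sample of heights in cm.
heights = np.array([172.1, 168.4, 175.3, 171.0, 169.8,
                    174.2, 173.5, 170.6, 176.0, 172.9])
mu0 = 170.0     # value claimed by H0
alpha = 0.05

# Manual calculation: (sample mean - mu0) / standard error.
n = heights.size
t_manual = (heights.mean() - mu0) / (heights.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Same test using SciPy's one-sample t-test (two-sided by default).
t_stat, p_value = stats.ttest_1samp(heights, popmean=mu0)

print(f"manual: t = {t_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Note `ddof=1` in the standard deviation: the t-test uses the sample standard deviation, which divides by n − 1 rather than n.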




In Machine Learning

  • Model validation - Testing if two models perform significantly differently
  • Feature selection - Checking if a feature significantly affects the target
  • A/B testing - Comparing conversion rates, engagement, etc.
  • Error analysis - Checking if residuals are normally distributed
  • Experiment design - Quantifying uncertainty and statistical significance
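As one illustration of the A/B testing case, the sketch below compares two conversion rates with a pooled two-proportion z-test, using only the standard library. The function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for H0: both variants have the same conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if H0 is true.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 5,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)

alpha = 0.05
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The normal approximation used here is reasonable when each variant has plenty of conversions and non-conversions; for very small counts an exact test would be a safer choice.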

© 2025 SimpleSteps.guide