T-Statistic, Student’s t-Test, and Hypothesis Testing

Maths: Statistics for machine learning

6 min read

Published Oct 22 2025, updated Oct 23 2025

Machine LearningMathsNumPyPandasPythonStatistics

A t-test is a statistical hypothesis test used to determine whether there’s a significant difference between sample means, or between a sample mean and a known value, when the population standard deviation (σ) is unknown and/or the sample size is small (n < 30).

It’s based on the Student’s t-distribution, which adjusts for extra uncertainty when σ is estimated from the sample.

In simple terms:

“A t-test checks if your sample mean (or difference between samples) is big enough to suggest a real effect, not just random noise.”

1. Student’s t-Distribution

What It Is

A family of distributions similar to the normal (bell curve), but with heavier tails — allowing for more variability when sample size is small.
As the sample size increases, the t-distribution approaches the normal distribution.

Properties:

Shape - Symmetrical and bell-shaped
Tails - Heavier than Normal — more extreme values possible
Degrees of Freedom (df) - df=n−1 for a single sample
Converges to Z-distribution - When n → ∞

2. The T-Statistic

The t-statistic measures how many standard errors the sample mean is from the hypothesised population mean.

Where:

X̅ = sample mean
μ₀ = population mean (under H₀)
s = sample standard deviation
n = sample size

If |t| is large → sample mean is far from μ_₀ → evidence against H₀.

3. Types of T-Tests

Test Type	Purpose	Example
One-sample t-test	Compare a sample mean to a known population mean	“Is the mean score different from 70?”
Independent two-sample t-test	Compare means of two independent groups	“Do men and women have different average heights?”
Paired (dependent) t-test	Compare two related samples (before/after, matched pairs)	“Did students score higher after training?”

One-Sample T-Test

Hypotheses

H_₀: μ = μ_₀
H_₁: μ ≠ μ_₀ (two-tailed)

Formula

Compare |t| to the critical t-value from the t-distribution (based on df = n-1, α = 0.05).
If |t| > t₍critical₎ → reject H₀.

Independent Two-Sample T-Test

Tests whether two independent samples have different means.

where s_p² = pooled variance.

Hypotheses

H_₀: μ_₁ = μ_₂ (no difference)
H_₁: μ_₁ ≠ μ_₂ (difference exists)

Use when samples are from different groups (e.g., A/B testing, gender differences).

Paired (Dependent) T-Test

Used when comparing the same group measured twice, or paired observations (before/after treatment, matched pairs).

Example

Blood pressure before vs. after medication
Model accuracy before and after tuning

How It Works

Compute the difference (d) for each pair

Calculate:

where s_d is the standard deviation of the differences.

Hypotheses

H_₀: μ₍difference₎ = 0
H_₁: μ₍difference₎ ≠ 0

If |t| > t₍critical₎ or p < α → reject H_₀ → significant change between paired measurements.

Degrees of Freedom (df)

One-sample t-test : n - 1
Paired t-test : n − 1
Two-sample t-test : n₁ + n₂ − 2

Example (Paired T-Test in Python)

import numpy as np

import pingouin as pg

# Example: before and after training test scores

before = np.array([65, 70, 72, 68, 74, 69, 71, 73, 70, 75])

after = np.array([68, 74, 75, 70, 78, 71, 74, 76, 73, 80])

# Perform paired t-test

result = pg.ttest(before, after, paired=True)

print(result)

Interpretation:

Each value pair represents one student’s scores before and after taking a course.
The test asks:
“Did the average score significantly increase after the course?”

You’ll see output similar to:

T dof tail p-val CI95% cohen-d BF10 power

T-test -9.2 9.0 two-sided 0.0001 [-4.8, -2.9] -2.9 2500.0 1.0

How to read this:

t = -9.2: The average improvement is many standard errors away from 0 → strong effect
p-value = 0.0001: Very low → reject H₀
95% CI = [-4.8, -2.9]: We’re 95% confident the average improvement is between 2.9 and 4.8 points
Conclusion: The course significantly improved student test scores

What a Paired t-Test Really Measures

A paired t-test compares two related sets of data — it’s all about measuring change or difference within the same subjects or units.

Each pair of values represents:

The same subject, measured before and after some event, treatment, or intervention.

So we’re not comparing two independent groups — we’re comparing the same group, twice.

Real-World Examples

Here are a few examples where the paired t-test is appropriate:

Scenario	Before	After	What It Tests
Education	Student’s test score before a course	Score after taking the course	Did the course improve performance?
Medicine	Patient’s blood pressure before taking a drug	Blood pressure after 2 weeks	Did the drug lower blood pressure?
Website A/B test (within-subjects)	Page load time before optimisation	Page load time after optimisation	Did the optimisation make the site faster?
Fitness study	Heart rate before training program	Heart rate after 6 weeks	Did training reduce average heart rate?
Machine Learning model improvement	Model accuracy before hyperparameter tuning	Accuracy after tuning	Did tuning improve model performance?

In each case:

Every person, patient, web page, or model is a matched pair — one “before” and one “after”.
The t-test looks at the differences between these pairs and asks:
“Are these differences big enough to suggest a real effect, or could they just be random?”

The Key: The T-Test Doesn’t Know What “Better” Means

The t-test itself doesn’t know whether higher or lower values are good or bad — it only measures whether there’s a statistically significant difference between two sets of numbers.

What you (the analyst) define as improvement depends on context — and that determines how you interpret the sign of the t-statistic or which tail of the test you look at.

Example 1: Exam Scores (Higher = Better)

Situation	Before	After	Change
Student 1	65	70	+5
Student 2	70	74	+4
Student 3	68	72	+4

If you calculate after − before, you’ll get positive differences (scores increased).
So your mean difference (𝑑̄) will be positive.

t-statistic > 0 → average increase
You might use a one-tailed test if your hypothesis is "scores will increase"
Or a two-tailed test if you’re just checking "scores changed in any direction"

In this context:
A positive t and p < 0.05 → scores improved significantly.

In Python:

import numpy as np

import pingouin as pg

before = np.array([65, 70, 68, 72, 69])

after = np.array([70, 74, 72, 76, 73])

# Positive difference = improvement

result = pg.ttest(before, after, paired=True)

print(result)

Interpretation:

If t > 0 and p < 0.05, the mean increased significantly.
The result supports “after > before”.

Example 2: Page Load Time (Lower = Better)

Situation	Before	After	Change
Page 1	2.5 s	2.0 s	-0.5
Page 2	3.0 s	2.7 s	-0.3
Page 3	2.8 s	2.3 s	-0.5

If you calculate after − before, you’ll get negative differences (times decreased).
So your mean difference (𝑑̄) will be negative.

t-statistic < 0 → average decrease
You might use a one-tailed test if your hypothesis is "times will decrease"
Or a two-tailed test if you’re just testing "times changed"

In this context:
A negative t and p < 0.05 → load times significantly improved (decreased).

In Python:

before = np.array([2.5, 3.0, 2.8, 2.6, 2.9])

after = np.array([2.0, 2.7, 2.3, 2.1, 2.5])

# Negative difference = improvement (time reduced)

result = pg.ttest(before, after, paired=True)

print(result)

Interpretation:

If t < 0 and p < 0.05, load time decreased significantly.
The result supports “after < before”.

Python code

import pingouin as pg

import numpy as np

import pandas as pd

# Example data: test scores for two teaching methods

np.random.seed(42)

df = pd.DataFrame({

'Method_A': np.random.normal(loc=75, scale=5, size=30),

'Method_B': np.random.normal(loc=78, scale=5, size=30)

})

# Perform t-test (independent samples)

results = pg.ttest(x=df['Method_A'], y=df['Method_B'], paired=False, alternative='two-sided')

print(results)

Output:

T dof tail p-val CI95% cohen-d BF10 power

T-test -2.540 58.0 two-sided 0.0138 [-5.30, -0.64] -0.66 3.45 0.73

Example: Using pg.pairwise_ttests()

import pingouin as pg

import pandas as pd

import numpy as np

# Example dataset: same students tested in 3 different months

np.random.seed(42)

df = pd.DataFrame({

'Subject': np.repeat(np.arange(1, 11), 3),

'Month': np.tile(['January', 'February', 'March'], 10),

'Scores': np.concatenate([

np.random.normal(70, 3, 10),

np.random.normal(74, 3, 10),

np.random.normal(78, 3, 10)

])

})

# Run repeated-measures (within-subject) pairwise t-tests

results = pg.pairwise_ttests(

data=df,

dv='Scores',

within='Month',

subject='Subject',

parametric=True,

padjust='bonf'

)

print(results)

Output:

A B Paired Parametric T dof tail p-unc p-corr p-adjust BF10 cohen-d

0 January February True True -4.112 9.0 two-sided 0.0026 0.0078 bonf 15.3 -1.30

1 January March True True -7.231 9.0 two-sided 0.0000 0.0000 bonf 120.0 -2.30

2 February March True True -3.211 9.0 two-sided 0.0101 0.0303 bonf 5.9 -1.01

T-Statistic, Student’s t-Test, and Hypothesis Testing

Maths: Statistics for machine learning

6 min read

Published Oct 22 2025, updated Oct 23 2025

Guide Sections

Guide Comments

1. Student’s t-Distribution

2. The T-Statistic

3. Types of T-Tests

One-Sample T-Test

Independent Two-Sample T-Test

Paired (Dependent) T-Test

Degrees of Freedom (df)

Example (Paired T-Test in Python)

What a Paired t-Test Really Measures

Real-World Examples

The Key: The T-Test Doesn’t Know What “Better” Means

Example 1: Exam Scores (Higher = Better)

Example 2: Page Load Time (Lower = Better)

Python code

Products from our shop

Docker Cheat Sheet - Print at Home Designs

Docker Cheat Sheet Mouse Mat

Docker Cheat Sheet Travel Mug

Docker Cheat Sheet Mug

Vim Cheat Sheet - Print at Home Designs

Vim Cheat Sheet Mouse Mat

Vim Cheat Sheet Travel Mug

Vim Cheat Sheet Mug

SimpleSteps.guide branded Travel Mug

Developer Excuse Javascript - Travel Mug

Developer Excuse Javascript Embroidered T-Shirt - Dark

Developer Excuse Javascript Embroidered T-Shirt - Light

Developer Excuse Javascript Mug - White

Developer Excuse Javascript Mug - Black

SimpleSteps.guide branded stainless steel water bottle

Developer Excuse Javascript Hoodie - Light

Developer Excuse Javascript Hoodie - Dark