Normality Tests

SciPy - Statistical Testing

Published Nov 17 2025

Tags: Python, SciPy, Statistics

Normality tests help you answer a simple but crucial question:

“Does this data look like it came from a normal distribution?”

Why this matters:

Parametric tests (t-tests, ANOVA) assume approximately normal data or residuals
Non-parametric tests don’t
Many statistical decisions begin here

SciPy provides several ways to test normality, each with its strengths.

Practical Interpretation of Normality Tests

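Most of SciPy's normality tests return:

test statistic
p-value

(Anderson–Darling reports critical values instead; see its section.)

Interpretation is the same for every test that reports a p-value:

If p ≥ 0.05

No evidence against normality at the 5% level (this does not prove the data is normal)
You can reasonably use parametric tests

If p < 0.05

Reject the hypothesis that the data came from a normal distribution
Consider non-parametric tests (Chapter 3)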

Important: Normality tests are very sensitive to sample size:

Large samples → even tiny deviations become “significant”
Small samples → tests often have low power

Always pair normality tests with a plot (histogram, Q-Q plot).
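The sample-size effect can be seen with a small simulation. The sketch below is illustrative only: the Student's t distribution with 20 degrees of freedom, the seed, and the two sample sizes are assumptions chosen for the demonstration, not values from this article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Student's t with 20 degrees of freedom: close to normal, slightly heavier tails
small = rng.standard_t(df=20, size=30)
large = rng.standard_t(df=20, size=5000)

for name, sample in [("n=30", small), ("n=5000", large)]:
    stat, p = stats.shapiro(sample)
    print(f"{name}: W={stat:.4f}, p={p:.4f}")
```

Both samples come from the same distribution, but the larger one gives the test far more power to flag the mild tail deviation, so its p-value tends to be much smaller. Exact values depend on the seed.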

Shapiro–Wilk Test (Recommended)

The go-to normality test for small to medium samples.

Works best for:

Sample sizes up to ~5000
General-purpose normality checking
Pre-checking assumptions for t-tests or ANOVA

Example

from scipy import stats

data = [12.3, 11.5, 12.1, 12.6, 11.8, 12.0]

stat, p = stats.shapiro(data)

print(stat, p)

Interpretation

p < 0.05 → reject normality
p ≥ 0.05 → no evidence against normality (normal enough for most purposes)

Advantages

Among the most powerful normality tests in many comparisons
Works well for small samples

Disadvantages

Too sensitive for huge datasets (SciPy also notes that the p-value may be inaccurate above 5000 samples)
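One way the pre-check for a t-test might look in practice is sketched below. The two groups are made-up numbers, and the 0.05 cutoff and the Mann-Whitney U fallback are assumptions for illustration rather than a prescribed workflow.

```python
from scipy import stats

# Hypothetical measurements for two groups (made up for illustration)
group_a = [12.3, 11.5, 12.1, 12.6, 11.8, 12.0]
group_b = [12.9, 13.1, 12.4, 13.0, 12.7, 13.3]

alpha = 0.05
normal_a = stats.shapiro(group_a).pvalue >= alpha
normal_b = stats.shapiro(group_b).pvalue >= alpha

if normal_a and normal_b:
    # No evidence against normality in either group: use a parametric test
    result = stats.ttest_ind(group_a, group_b)
    test_name = "t-test"
else:
    # Normality rejected for at least one group: fall back to a rank-based test
    result = stats.mannwhitneyu(group_a, group_b)
    test_name = "Mann-Whitney U"

print(test_name, result.statistic, result.pvalue)
```

With only six points per group the Shapiro–Wilk test has little power, so a plot of each group is still worth a look before trusting the branch taken.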

Kolmogorov–Smirnov (K–S Test)

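Compares data to a fully specified distribution.

For normality, you must supply in advance:

mean
standard deviation

If these are estimated from the same sample (as in the short example below), the reported p-value is too large, so treat it as a rough guide only.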

Example

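import numpy as np
from scipy import stats

data = np.array([4.8, 5.0, 5.1, 4.9, 5.2])

# Caution: mu and sigma are estimated from the data here, which inflates the p-value
mu, sigma = np.mean(data), np.std(data, ddof=1)  # ddof=1 gives the sample standard deviation

stat, p = stats.kstest(data, 'norm', args=(mu, sigma))

print(stat, p)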

Interpretation

Same general rule:

p < 0.05 → not normal
p ≥ 0.05 → consistent with normality

Notes

Less sensitive than Shapiro–Wilk
Not recommended for small samples
Good for checking against any distribution, not just normal
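To illustrate the last note, here is a minimal sketch that tests a sample against an exponential distribution whose parameters are fixed in advance. The synthetic data, the seed, and the scale of 2.0 are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
waiting_times = rng.exponential(scale=2.0, size=200)  # synthetic data

# Parameters are fixed in advance (loc=0, scale=2.0), not estimated from the sample
stat, p = stats.kstest(waiting_times, "expon", args=(0, 2.0))
print(stat, p)

# The same sample tested against a standard normal, for contrast
stat_norm, p_norm = stats.kstest(waiting_times, "norm", args=(0, 1))
print(stat_norm, p_norm)
```

Because every exponential value is positive while half of a standard normal lies below zero, the second test reports a large statistic and rejects normality.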

Anderson–Darling Test

This test returns a test statistic together with critical values at several fixed significance levels, rather than a single p-value.

Example

result = stats.anderson(data, dist='norm')

print(result.statistic)

print(result.critical_values)

print(result.significance_level)

Interpretation

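Compare the test statistic with the critical value for your chosen significance level (the levels are reported in percent). If the test statistic is:

> critical value → reject normality at that level
<= critical value → fail to reject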

Advantages

More sensitive in the tails
Good for moderate to large samples

Disadvantages

Slightly more complicated interpretation
No simple p-value
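The comparison against each critical value can be written as a short loop. This is a sketch that assumes the result object exposes critical_values and significance_level as in the example above; the synthetic sample and seed are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
data = rng.normal(loc=5.0, scale=0.2, size=100)  # synthetic sample

result = stats.anderson(data, dist="norm")

# significance_level is given in percent (for example 5.0 means alpha = 0.05)
for level, critical in zip(result.significance_level, result.critical_values):
    decision = "reject normality" if result.statistic > critical else "fail to reject"
    print(f"{level:>4}% level: critical value {critical:.3f} -> {decision}")
```

Reading the table from top to bottom shows the strictest level at which normality would still be rejected, which is the closest this interface comes to a p-value.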

D’Agostino and Pearson’s Test (K2 Test)

Combines skewness and kurtosis to test normality.

Example

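# Synthetic 50-point sample (illustrative); the 5-point array above is too small for this test
data = stats.norm.rvs(loc=5.0, scale=0.2, size=50, random_state=0)

stat, p = stats.normaltest(data)

print(stat, p)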

Use when:

Sample size ≥ 20
You want a test sensitive to deviations in skewness and kurtosis

Avoid when:

Very small samples (< 20)
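To see how skewness and kurtosis are combined, the sketch below runs SciPy's separate skewness and kurtosis tests and checks that their squared z-scores add up to the K2 statistic. The synthetic sample and seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed
data = rng.normal(size=50)  # synthetic sample, large enough for both component tests

z_skew, p_skew = stats.skewtest(data)      # z-score for skewness
z_kurt, p_kurt = stats.kurtosistest(data)  # z-score for kurtosis
k2, p = stats.normaltest(data)             # combined statistic

# K2 is the sum of the two squared z-scores
print(z_skew**2 + z_kurt**2, k2)
```

Looking at the two component p-values separately can hint at whether a rejection is driven by asymmetry or by tail weight.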

Visual Normality Checks (Highly Recommended)

Don’t rely solely on p-values — always look at the distribution.

Histogram

import matplotlib.pyplot as plt

plt.hist(data, bins=10)

plt.show()

Q-Q Plot

import scipy.stats as stats

import matplotlib.pyplot as plt

stats.probplot(data, dist="norm", plot=plt)

plt.show()

If points ≈ straight line → data is approx. normal
If points bend or curve → non-normal
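The straightness of the Q-Q line can also be summarized numerically. As a rough sketch, calling probplot without a plot argument returns the plotted points plus a least-squares fit, including the correlation coefficient r. The two synthetic samples and the seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)

# Without plot=..., probplot returns the plotted points and a least-squares line fit
(osm, osr), (slope, intercept, r_normal) = stats.probplot(normal_sample, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(r_normal, r_skewed)
```

An r close to 1 means the points hug the line; the skewed sample gives a visibly lower value. This is a complement to looking at the plot, not a replacement for it.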

Choosing the Right Normality Test

Test	Best For	Avoid When	Notes
Shapiro–Wilk	Small–medium samples (<5000)	Very large samples	Most widely used
K–S Test	Comparing to any distribution	Small samples	Requires specifying mean & sd in advance
Anderson–Darling	Mild deviations in tails	Need simple p-value	Very sensitive
D’Agostino K2	Sample ≥ 20, skew/kurtosis detection	Small samples	Good for moderate sizes
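The sample-size guidance in the table could be wrapped in a small helper. The function name normality_report, its thresholds, and its return format are hypothetical choices made for this sketch, not part of SciPy.

```python
from scipy import stats

def normality_report(sample, alpha=0.05):
    """Run the p-value tests from the table that suit the sample size (hypothetical helper).

    Returns a dict mapping test name to True when there is no evidence
    against normality at the given alpha.
    """
    n = len(sample)
    report = {}
    if 3 <= n <= 5000:
        # Shapiro-Wilk: small to medium samples
        report["shapiro"] = bool(stats.shapiro(sample).pvalue >= alpha)
    if n >= 20:
        # D'Agostino K2: needs at least about 20 observations
        report["normaltest"] = bool(stats.normaltest(sample).pvalue >= alpha)
    return report

report = normality_report([12.3, 11.5, 12.1, 12.6, 11.8, 12.0])
print(report)
```

For the six-point sample only the Shapiro–Wilk entry appears, since the sample is below the 20-observation threshold for the K2 test.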