Shapiro–Wilk Test
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
The Shapiro–Wilk test is a statistical test for normality: it checks whether a given dataset is plausibly drawn from a normal (Gaussian) distribution.
It’s one of the most powerful and widely used normality tests, especially for small to medium sample sizes (n < 5000).
In simple terms:
“The Shapiro–Wilk test checks if your data follow a bell-shaped normal distribution.”
When to Use It
- Continuous data - Works on numerical values
- Small to medium samples - Best for n < 5000
- Need to choose test type - Helps decide between parametric and non-parametric tests
When Not to Use It
- Categorical data - Not suitable
- Large samples - Even tiny, practically irrelevant deviations become statistically significant (combine the test with visual checks and other tests)
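The large-sample caveat can be explored with a small experiment. This is only a sketch under assumptions: it uses SciPy's `scipy.stats.shapiro`, and the Student's t distribution with 10 degrees of freedom, the seed, and the sample sizes are illustrative choices, not part of the original guide. One would generally expect the p-value to shrink as n grows, but the exact values depend on the draw.

```python
import numpy as np
from scipy import stats

# Illustrative assumption: a bell-shaped but slightly heavy-tailed population
# (Student's t with 10 degrees of freedom).
rng = np.random.default_rng(0)
population = rng.standard_t(df=10, size=4000)

# Run the test on growing prefixes of the same data.
results = {}
for n in (50, 500, 4000):
    w, p = stats.shapiro(population[:n])
    results[n] = (w, p)
    print(f"n = {n:4d}  W = {w:.4f}  p = {p:.4g}")
```

Comparing the rows shows how the same mild deviation is judged at different sample sizes, which is why a plot should accompany the test for large n.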
Hypotheses
- H₀ (Null Hypothesis) - The data are normally distributed
- H₁ (Alternative Hypothesis) - The data are not normally distributed
Test Statistic
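The Shapiro–Wilk test computes a W statistic, which measures how close your sample’s distribution is to a normal one.

$$W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

Where:
- x(i): ordered sample values (sorted smallest → largest)
- ai: constants derived from the expected order statistics of a standard normal sample
- x̄: sample mean
- n: sample size
W lies between 0 and 1. If W ≈ 1, data are close to normal.
If W is well below 1, data deviate from normality.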
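To make the structure of W concrete, here is a rough sketch that is not the exact Shapiro–Wilk computation. It uses the simpler Shapiro–Francia variant, in which the constants are approximated from normal quantiles alone (Blom's plotting positions); the exact Shapiro–Wilk coefficients also account for the covariances of the order statistics, and libraries use numerical approximations for them. The function name `shapiro_francia_w` and the sample are hypothetical, chosen for illustration.

```python
import numpy as np
from scipy import stats

def shapiro_francia_w(x):
    """Approximate W using Shapiro-Francia coefficients (not exact Shapiro-Wilk)."""
    x = np.sort(np.asarray(x, dtype=float))   # x_(i): ordered sample values
    n = x.size
    i = np.arange(1, n + 1)
    # Approximate expected normal order statistics (Blom's formula).
    m = stats.norm.ppf((i - 0.375) / (n + 0.25))
    a = m / np.sqrt(np.sum(m ** 2))           # normalised so that sum(a_i^2) = 1
    return np.sum(a * x) ** 2 / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(1)
sample = rng.normal(size=50)                  # illustrative synthetic data

w_sketch = shapiro_francia_w(sample)
w_scipy, _ = stats.shapiro(sample)
print(f"Sketch W' = {w_sketch:.4f}, SciPy W = {w_scipy:.4f}")
```

The two values should be close but not identical, since the sketch uses a different set of coefficients.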
Decision Rule
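- p > 0.05 - Fail to reject H₀ → no significant evidence against normality (data look normal)
- p ≤ 0.05 - Reject H₀ → data are unlikely to be normal
Here 0.05 is the conventional significance level (α); failing to reject H₀ does not prove that the data are normal.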
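The rule can be written as a tiny helper. The function name `interpret_shapiro` and its `alpha` parameter are hypothetical, introduced here only for illustration.

```python
def interpret_shapiro(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule to a Shapiro-Wilk p-value."""
    if p_value > alpha:
        return "Fail to reject H0: data look normal"
    return "Reject H0: data not normal"

print(interpret_shapiro(0.32))    # Fail to reject H0: data look normal
print(interpret_shapiro(0.003))   # Reject H0: data not normal
```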
Example in Python
Let’s check if a dataset follows a normal distribution.
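A minimal sketch of such a check, assuming SciPy is installed and using a synthetic sample; the seed, mean, standard deviation, and sample size are illustrative choices rather than values from the original guide.

```python
import numpy as np
from scipy import stats

# Illustrative synthetic data: 100 draws from a normal distribution.
rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=5, size=100)

# shapiro returns the W statistic and the p-value.
stat, p_value = stats.shapiro(data)
print(f"W = {stat:.4f}, p-value = {p_value:.4f}")
print("p > 0.05:", p_value > 0.05)
```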
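Example output: the exact W and p-value depend on the sample. For data drawn from a normal distribution, W is close to 1 and the p-value is usually above 0.05, so H₀ is not rejected.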
Example with Non-Normal Data
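A comparable sketch for clearly non-normal data, again assuming SciPy. The exponential distribution, seed, and sample size are assumptions chosen because the exponential is strongly right-skewed.

```python
import numpy as np
from scipy import stats

# Illustrative synthetic data: 200 draws from a right-skewed exponential distribution.
rng = np.random.default_rng(42)
skewed = rng.exponential(scale=2.0, size=200)

w_skewed, p_skewed = stats.shapiro(skewed)
print(f"W = {w_skewed:.4f}, p-value = {p_skewed:.3g}")
print("p <= 0.05:", p_skewed <= 0.05)
```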
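Example output: for clearly non-normal data, W falls noticeably below 1 and the p-value drops far below 0.05, so H₀ is rejected.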
Visual Check:

Histogram and KDE
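A sketch of how such a plot might be produced, assuming Matplotlib and SciPy are available. The synthetic data, bin count, and output file name `hist_kde.png` are illustrative assumptions; the dashed curve is a normal density fitted with the sample mean and standard deviation, added for reference.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=5, size=200)  # illustrative synthetic data

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(values, bins=20, density=True, alpha=0.5, label="Histogram")

# Kernel density estimate evaluated on a grid spanning the data.
kde = stats.gaussian_kde(values)
grid = np.linspace(values.min(), values.max(), 200)
density = kde(grid)
ax.plot(grid, density, label="KDE")

# Reference normal curve using the sample mean and standard deviation.
ax.plot(grid, stats.norm.pdf(grid, values.mean(), values.std(ddof=1)),
        linestyle="--", label="Fitted normal")
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend()
fig.savefig("hist_kde.png", dpi=100)
```

If the histogram and KDE roughly track the dashed curve, the data look approximately bell-shaped.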

Q–Q Plot
If the points in the Q–Q plot follow a straight line, the data are roughly normal. Curved or S-shaped patterns indicate that the data are not normal.
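One way to draw such a plot is SciPy's `scipy.stats.probplot`, sketched below with synthetic data. The sample, seed, and output file name `qq_plot.png` are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to use an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
observations = rng.normal(loc=50, scale=5, size=100)  # illustrative synthetic data

fig, ax = plt.subplots(figsize=(5, 5))
# probplot draws ordered data against theoretical normal quantiles
# and returns the points plus a least-squares line fit.
(theoretical, ordered), (slope, intercept, r) = stats.probplot(
    observations, dist="norm", plot=ax
)
ax.set_title("Q-Q plot against the normal distribution")
fig.savefig("qq_plot.png", dpi=100)

print(f"Correlation of points with the fitted line: r = {r:.4f}")
```

A correlation `r` close to 1 corresponds to points lying near the straight line.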














