Shapiro–Wilk Test

Maths: Statistics for machine learning

Published Oct 22 2025, updated Oct 23 2025

Machine Learning, Maths, NumPy, Pandas, Python, Statistics

The Shapiro–Wilk test is a statistical test for normality: it checks whether a sample is consistent with having been drawn from a normal (Gaussian) distribution.

It’s one of the most powerful and widely used normality tests, especially for small to medium sample sizes (n < 5000; SciPy’s `shapiro` warns that its p-value may be inaccurate for larger samples).


In simple terms:

“The Shapiro–Wilk test checks if your data follow a bell-shaped normal distribution.”




When to Use It

  • Continuous data - Works on numerical values
  • Small to medium samples - Best for n < 5000
  • Choosing a test - Helps decide between parametric tests (which assume normality) and non-parametric alternatives
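The choice between parametric and non-parametric tests can be sketched as follows, assuming two independent groups. The helper name `compare_groups`, the 0.05 threshold, and the pairing of Welch's t-test with the Mann–Whitney U test are illustrative assumptions; pre-testing for normality before picking a test is a common rule of thumb, not the only accepted practice.

```python
import numpy as np
from scipy.stats import mannwhitneyu, shapiro, ttest_ind


def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-sample test based on Shapiro–Wilk results."""
    _, p_a = shapiro(a)
    _, p_b = shapiro(b)
    if p_a > alpha and p_b > alpha:
        # Neither group shows evidence of non-normality: use a parametric test
        # (Welch's t-test, which does not assume equal variances)
        result = ttest_ind(a, b, equal_var=False)
        return "Welch t-test", result.pvalue
    # At least one group looks non-normal: fall back to a rank-based test
    result = mannwhitneyu(a, b, alternative="two-sided")
    return "Mann–Whitney U", result.pvalue


rng = np.random.default_rng(1)
# Two right-skewed (exponential) groups, so Shapiro–Wilk should reject normality
group_a = rng.exponential(scale=2, size=80)
group_b = rng.exponential(scale=3, size=80)

test_name, p_value = compare_groups(group_a, group_b)
print(test_name, round(p_value, 4))
```

Because both groups here are exponential, the helper falls back to the Mann–Whitney U test.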

When Not to Use It

  • Categorical data - Not suitable
  • Large samples - Even tiny, practically irrelevant deviations become statistically significant, so combine the test with visual checks (histogram, Q–Q plot) and other tests
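To illustrate the large-sample effect, the sketch below draws from a Student’s t distribution with 10 degrees of freedom: symmetric and bell-shaped, but with slightly heavier tails than a normal distribution. The distribution, seed and sample sizes are illustrative assumptions; with a small sample the test will often fail to reject H₀, while with thousands of points the same mild departure is usually flagged.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)

results = {}
for n in (30, 4000):  # stay below 5000, where SciPy's p-value is reliable
    # Student's t with df=10: close to normal, slightly heavier tails
    sample = rng.standard_t(df=10, size=n)
    w, p = shapiro(sample)
    results[n] = (w, p)
    print(f"n={n:>4}: W={w:.4f}, p={p:.4g}")
```

At n=4000, W typically stays very close to 1 even though p is tiny, which is why a significant result on a large sample should be checked against a plot before concluding the deviation matters.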



Hypotheses

  • H₀ (Null Hypothesis) - The data are normally distributed
  • H₁ (Alternative Hypothesis) - The data are not normally distributed



Test Statistic

The Shapiro–Wilk test computes a W statistic, which measures how close your sample’s distribution is to a normal one.

$$W = \frac{\left(\sum_{i=1}^{n} a_i\, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

Where:

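  • $x_{(i)}$: ordered sample values (sorted smallest → largest), so $x_{(1)}$ is the minimum and $x_{(n)}$ the maximum
  • $a_i$: constants derived from the expected values and covariances of the order statistics of a standard normal sample of size n
  • $\bar{x}$: sample mean
  • n: sample size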

W always lies between 0 and 1. If W ≈ 1, the data are close to normal; the further W falls below 1, the stronger the evidence against normality.
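Computing the exact $a_i$ requires an approximation algorithm, which SciPy runs internally. The sketch below swaps in the simpler Shapiro–Francia weights (Blom’s approximations to the expected normal order statistics, normalised to unit length), so it shows the structure of the formula rather than reproducing SciPy’s W exactly. The function name `shapiro_francia_w` and the simulated sample are hypothetical.

```python
import numpy as np
from scipy.stats import norm, shapiro


def shapiro_francia_w(x):
    """Approximate W using Shapiro–Francia weights (not the exact Shapiro–Wilk a_i)."""
    x = np.sort(np.asarray(x, dtype=float))  # x_(1) <= x_(2) <= ... <= x_(n)
    n = x.size
    i = np.arange(1, n + 1)
    # Blom's approximation to the expected standard-normal order statistics
    m = norm.ppf((i - 0.375) / (n + 0.25))
    a = m / np.sqrt(np.sum(m ** 2))  # weights scaled so that sum(a**2) == 1
    numerator = np.sum(a * x) ** 2
    denominator = np.sum((x - x.mean()) ** 2)
    return numerator / denominator


rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=50)

w_approx = shapiro_francia_w(sample)
w_exact, _ = shapiro(sample)
print(f"Shapiro–Francia W' = {w_approx:.4f}")
print(f"SciPy Shapiro–Wilk W = {w_exact:.4f}")
```

For normally distributed data the two statistics are usually very close; they differ mainly in how much weight the most extreme observations receive.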




Decision Rule

  • p > 0.05 - Fail to reject H₀ → no evidence against normality (this does not prove the data are normal)
  • p ≤ 0.05 - Reject H₀ → the data are unlikely to come from a normal distribution
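The rule is simple enough to encode directly. A minimal sketch, using a hypothetical helper `shapiro_decision` whose `alpha` defaults to the conventional 0.05; a p-value exactly equal to `alpha` rejects H₀, matching the ≤ in the rule.

```python
def shapiro_decision(p_value, alpha=0.05):
    """Hypothetical helper applying the Shapiro–Wilk decision rule at level alpha."""
    if p_value <= alpha:
        return "reject H0: data are unlikely to be normal"
    return "fail to reject H0: no evidence against normality"


for p in (0.8234, 0.05, 0.0001):
    print(p, "->", shapiro_decision(p))
```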



Example in Python

Let’s check if a dataset follows a normal distribution.

import numpy as np
from scipy.stats import shapiro

# Example data
data = np.array([10, 12, 11, 14, 13, 15, 12, 11, 13, 14])

# Perform Shapiro–Wilk Test
stat, p = shapiro(data)
print(f"Shapiro–Wilk Statistic: {stat:.3f}")
print(f"P-value: {p:.4f}")

# Interpret
if p > 0.05:
    print("Data look normally distributed (fail to reject H₀).")
else:
    print("Data are not normally distributed (reject H₀).")

Example Output:

Shapiro–Wilk Statistic: 0.967
P-value: 0.8234
Data look normally distributed (fail to reject H₀).


Example with Non-Normal Data

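import numpy as np
from scipy.stats import shapiro

# Create right-skewed data: 100 draws from an exponential distribution
skewed = np.random.exponential(scale=2, size=100)

stat, p = shapiro(skewed)
print(f"W={stat:.3f}, p={p:.4f}")

if p > 0.05:
    print("Looks normal (fail to reject H₀)")
else:
    print("Not normal (reject H₀)")

Example output (exact values vary from run to run because the data are random):

W=0.812, p=0.0001
Not normal (reject H₀)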

Visual Check:

Figure: Histogram and KDE

Figure: Q–Q plot

If the points in the Q–Q plot lie close to a straight line, the data are roughly normal; curved or S-shaped patterns indicate departures from normality such as skewness or heavy tails.
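The straightness of a Q–Q plot can also be summarised numerically. A minimal sketch, assuming SciPy’s `probplot`, which returns the Q–Q points plus a least-squares line and its correlation r (values near 1 mean a nearly straight line). The seed, sample sizes and the `qq_r` dictionary are illustrative assumptions.

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(7)
normal_sample = rng.normal(loc=0, scale=1, size=200)
skewed_sample = rng.exponential(scale=2, size=200)

qq_r = {}
for name, sample in (("normal", normal_sample), ("skewed", skewed_sample)):
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
    _, (_, _, r) = probplot(sample, dist="norm")
    qq_r[name] = r
    print(f"{name}: Q–Q correlation r = {r:.4f}")
```

To draw the chart itself, pass a Matplotlib `Axes` (or `matplotlib.pyplot`) as `plot=` to `probplot`; the r value quantifies the same straightness the eye judges in the plot.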



© 2025 SimpleSteps.guide