Kruskal–Wallis H Test
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
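The Kruskal–Wallis H test is a non-parametric, rank-based statistical test used to determine whether three or more independent groups come from the same distribution. When the group distributions have a similar shape, this amounts to testing for statistically significant differences between the group medians.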
It’s the non-parametric alternative to a one-way ANOVA.
In simple terms:
“The Kruskal–Wallis test checks whether the distributions of three or more groups are the same — without assuming a normal distribution.”
When to Use It
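- Three or more groups - The samples must be independent of each other
- Ordinal or continuous data - Especially data that do not follow a normal distribution
- Same shape of distribution - Needed to read the result as a difference in medians; if the shapes differ, a significant result only says that the distributions differ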
When Not to Use It
- Paired or repeated-measures data - Use the Friedman test instead
- Normally distributed data - Use one-way ANOVA instead
Example Question
“Do customers in different regions (North, South, East, West) spend the same amount on average?”
If the spending data are skewed or contain outliers (that is, they are clearly non-normal), the Kruskal–Wallis test is an appropriate choice.
Hypotheses
- H₀ (Null Hypothesis) - All group medians are equal (no difference between groups)
- H₁ (Alternative Hypothesis) - At least one group median is different
How It Works
- Combine all group data together.
- Rank all data from smallest to largest (1 = smallest); tied values each receive the average of the ranks they occupy.
- Compute the sum of ranks (Rᵢ) for each group.
- Calculate the test statistic H:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

Where:
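- N = total number of observations across all groups
- k = number of groups
- Rᵢ = sum of ranks for group i
- nᵢ = size of group i
Under H₀, H follows an approximate chi-squared (χ²) distribution with k − 1 degrees of freedom. The approximation is usually considered adequate when each group has about five or more observations.
If H exceeds the critical χ² value (equivalently, if the p-value is below the chosen significance level), reject H₀ and conclude that at least one group differs from the others.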
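The steps above can be sketched in plain Python to make the arithmetic concrete. This is a minimal illustration, not a reference implementation: the function names `average_ranks` and `kruskal_h` are made up for this sketch, and it computes H without the tie correction that statistical libraries typically apply.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_h(*groups):
    """Compute the H statistic (no tie correction) from the group rank sums."""
    pooled = list(chain.from_iterable(groups))  # step 1: combine all groups
    n_total = len(pooled)
    ranks = average_ranks(pooled)               # step 2: rank the pooled data
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])  # step 3: R_i
        weighted += rank_sum ** 2 / len(group)           # R_i^2 / n_i
        start += len(group)
    # step 4: H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Three fully separated toy groups: rank sums are 6, 15 and 24, so H is about 7.2.
print(kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

When the data contain no ties, this value should agree with a library routine; with ties, a tie-corrected H will be slightly larger.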
Example in Python
Let’s test if three different marketing campaigns lead to different customer spending.
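The original code listing is not available here, so the following is a minimal sketch of how such a test could be run with SciPy's `scipy.stats.kruskal`. The spending values and the variable names (`campaign_a`, `campaign_b`, `campaign_c`) are invented purely for illustration and are not real data.

```python
from scipy.stats import kruskal

# Made-up customer spending per campaign (illustrative values only).
campaign_a = [20, 22, 19, 24, 25, 21]
campaign_b = [28, 30, 27, 35, 32, 29]
campaign_c = [40, 38, 45, 42, 39, 41]

# kruskal accepts one sequence per group and returns the H statistic and p-value.
h_stat, p_value = kruskal(campaign_a, campaign_b, campaign_c)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")

alpha = 0.05  # significance level chosen for this example
if p_value < alpha:
    print("Reject H0: at least one campaign differs.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

A significant result does not say which groups differ; that requires a follow-up pairwise comparison.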
Interpretation:
- p < 0.05 → Reject H₀ → At least one group’s median differs.
- p ≥ 0.05 → Fail to reject H₀ → No significant difference between groups.
Output Example:
Since p < 0.05, at least one group’s distribution differs significantly from the others.
Visual Results:
[Figure: boxplots of the values in each group]
This shows each group’s spread. If one boxplot sits clearly higher or lower than the others, that group is likely what drives the significant difference.
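A plot of this kind could be produced with Matplotlib along the following lines. This is only a sketch under assumptions: the data are the same invented values used for illustration, and the labels and output filename `kruskal_boxplot.png` are hypothetical choices, not taken from the original guide.

```python
import matplotlib.pyplot as plt

# Made-up customer spending per campaign (illustrative values only).
groups = {
    "Campaign A": [20, 22, 19, 24, 25, 21],
    "Campaign B": [28, 30, 27, 35, 32, 29],
    "Campaign C": [40, 38, 45, 42, 39, 41],
}

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(list(groups.values()))            # one box per group, at x = 1, 2, 3
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))      # label each box with its group name
ax.set_ylabel("Customer spending")
ax.set_title("Spending by campaign")
fig.savefig("kruskal_boxplot.png", dpi=150)  # write the figure to an image file
```

Boxes that barely overlap vertically, as with these toy values, are the visual counterpart of a small p-value.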