Mann–Whitney U Test
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
The Mann–Whitney U Test is a non-parametric statistical test used to compare two independent groups to determine whether their distributions differ — typically in terms of central tendency (median).
It’s an alternative to the independent samples t-test, but unlike the t-test, it does not assume normality of data.
In simple terms:
“The Mann–Whitney test checks if one group tends to have higher or lower values than another, without assuming the data are normally distributed.”
When to Use It
- Two groups: independent (not paired) samples
- Data type: ordinal or continuous (but not necessarily normal)
- Goal: compare central tendency (medians) between groups
- Not for paired data: use the Wilcoxon signed-rank test instead
Example Question
“Do customers from Region A spend more on average than customers from Region B?”
If the spending data are not normally distributed (e.g., skewed or containing outliers), the Mann–Whitney U test is a better choice than a t-test.
How It Works
Instead of comparing means directly, the Mann–Whitney test:
- Combines all data from both groups
- Ranks the data from smallest to largest (1 = smallest)
- Calculates the sum of ranks for each group
- Computes a U statistic — the number of times observations in one group precede observations in the other
- Uses that U value to test whether the groups’ rank distributions are significantly different
Because it uses ranks, it’s robust to non-normal distributions and outliers.
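The steps above can be sketched directly in Python. The numbers are made up for illustration, and for simplicity the sketch assumes no tied values (ties would share the average of their ranks):

```python
# Minimal sketch of the ranking procedure (illustrative data, no ties)
group_a = [12, 15, 11, 18]
group_b = [14, 20, 22, 19]

# Steps 1-2: pool both groups and rank from smallest to largest (1 = smallest)
pooled = sorted(group_a + group_b)
rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}

# Step 3: sum of ranks for each group
r1 = sum(rank_of[v] for v in group_a)  # 12
r2 = sum(rank_of[v] for v in group_b)  # 24

# Step 4: for every (a, b) pair, count which group's observation comes first
u1 = sum(1 for a in group_a for b in group_b if a > b)  # 2
u2 = sum(1 for a in group_a for b in group_b if a < b)  # 14

# Step 5: the smaller of the two counts is the test statistic
u = min(u1, u2)  # 2
```

Note that u1 + u2 always equals n₁ × n₂ (here 4 × 4 = 16), which is a handy sanity check.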
Hypotheses
- H₀ (Null Hypothesis): the two groups come from identical distributions (no difference in medians)
- H₁ (Alternative Hypothesis): the two groups come from different distributions (one tends to be higher or lower)
Test Statistic
For samples of sizes n₁ and n₂:

U₁ = R₁ − n₁(n₁ + 1)/2
U₂ = n₁n₂ − U₁

where R₁ is the sum of ranks for group 1.
The smaller of U₁ and U₂ is used as the test statistic.
For large samples (n₁, n₂ > 20), U is approximately normally distributed, so we can convert it to a Z-score.
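As a sketch, the rank-sum formula and the normal approximation can be checked numerically. The data and variable names below are illustrative, and the z-score omits the tie correction:

```python
# Sketch: U from the rank-sum formula plus the large-sample normal
# approximation (illustrative data; no tie correction applied)
import math

from scipy.stats import rankdata  # assumes SciPy is installed

group_1 = [3.1, 4.5, 2.8, 5.0, 3.9]
group_2 = [4.8, 5.6, 6.1, 5.3, 4.9]
n1, n2 = len(group_1), len(group_2)

ranks = rankdata(group_1 + group_2)  # tied values would share the average rank
r1 = ranks[:n1].sum()                # sum of ranks for group 1

u1 = r1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1
u = min(u1, u2)

# Under H0, U is approximately normal with this mean and standard deviation
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu) / sigma
```

The z-score can then be compared against the standard normal distribution in the usual way (strictly, the approximation is only recommended for larger samples than the five-per-group shown here).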
Example in Python
Let’s say you want to test whether two marketing campaigns lead to different spending amounts.
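A minimal sketch using SciPy's `mannwhitneyu` (the campaign spending figures below are invented for illustration):

```python
# Sketch: comparing spending under two campaigns with SciPy
# (the spending amounts are made-up example data)
from scipy.stats import mannwhitneyu

campaign_a = [25, 32, 28, 40, 55, 30, 27, 45]
campaign_b = [60, 48, 52, 70, 44, 65, 58, 50]

stat, p_value = mannwhitneyu(campaign_a, campaign_b, alternative="two-sided")
print(f"U statistic = {stat}, p-value = {p_value:.4f}")
```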
Interpretation:
- p < 0.05 → reject H₀ → significant difference between groups
- p ≥ 0.05 → fail to reject H₀ → no significant difference
Advantages
- Doesn’t assume normality
- Can handle ordinal data (ranks, ratings, etc.)
- Robust to outliers
- Works with unequal sample sizes
Limitations
- Interpreting the result as a difference in medians assumes the two distributions have the same shape
- Less powerful than the t-test when the data really are normal
- Doesn’t tell you by how much the groups differ, only that they do