Wilcoxon Signed-Rank Test
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
The Wilcoxon Signed-Rank Test is a non-parametric statistical test used to compare two related (paired) samples to determine whether the differences between the pairs are centered at zero.
It’s the non-parametric equivalent of the paired t-test, used when the differences between pairs are not normally distributed.
In simple terms:
“The Wilcoxon test checks whether the median difference between paired observations is zero — without assuming the data follow a normal distribution.”
When to Use It
- Two sets of paired data - e.g., before/after, left/right, matched subjects
- Data are continuous or at least ordinal, with differences that can be meaningfully ranked - but not necessarily normal
- Goal - Test if the median of the differences is zero
When Not to Use It
- Independent groups - Use Mann–Whitney U test instead
- Normal differences - You can use a paired t-test instead
Example Scenario
“Did a training course significantly improve students’ test scores?”
Each student is tested before and after the course.
If the differences in scores aren’t normally distributed, use the Wilcoxon Signed-Rank Test instead of a paired t-test.
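One way to make this decision concrete is to run a normality check on the paired differences. The sketch below uses the Shapiro-Wilk test from SciPy, which is only one possible check and is not prescribed by this guide. The helper name differences_look_normal, the 0.05 threshold, and the scores are assumptions made up for illustration.

```python
import numpy as np
from scipy.stats import shapiro

def differences_look_normal(before, after, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk test on the paired differences.

    Returns (looks_normal, p_value). A small p-value is evidence
    against normality of the differences.
    """
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    _, p_value = shapiro(d)
    return bool(p_value >= alpha), float(p_value)

# Made-up test scores; one student improves far more than the rest,
# which makes the differences strongly skewed.
before = [55, 60, 48, 70, 65, 52, 58, 62, 50, 68]
after = [57, 61, 50, 71, 90, 53, 60, 63, 52, 69]

looks_normal, p_value = differences_look_normal(before, after)
if looks_normal:
    print(f"p = {p_value:.4f}: no evidence against normality, a paired t-test is reasonable")
else:
    print(f"p = {p_value:.4f}: differences look non-normal, consider the Wilcoxon Signed-Rank Test")
```

With small samples a normality test has little power, so a histogram or Q-Q plot of the differences is a useful companion to this check.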
Hypotheses
- H₀ (Null Hypothesis) - The median difference between pairs = 0 (no change)
- H₁ (Alternative Hypothesis) - The median difference ≠ 0 (a change exists)
How It Works (Step-by-Step)
- Calculate the difference (d) for each pair: dᵢ = afterᵢ − beforeᵢ
- Ignore pairs where dᵢ = 0 (no change).
- Take the absolute value of each difference.
- Rank the absolute differences (1 = smallest); tied values share the average of their ranks.
- Assign the original signs (+/–) back to each rank.
- Compute:
- W+ = sum of positive ranks
- W− = sum of negative ranks
- The test statistic (W) is the smaller of W+ and W−.
- Compare W to the critical value from a Wilcoxon signed-rank table (reject H₀ if W is less than or equal to it), or compute the p-value.
If p < 0.05, reject H₀ → the difference is statistically significant.
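The steps above can be sketched directly in Python. This is a minimal illustration rather than a replacement for a library routine: the function name signed_rank_statistic and the scores are made up for the example, differences are taken as after − before, and tied absolute differences receive average ranks via scipy.stats.rankdata.

```python
import numpy as np
from scipy.stats import rankdata

def signed_rank_statistic(before, after):
    """Hypothetical helper: return (W_plus, W_minus, W) for paired samples."""
    # Step 1: difference for each pair
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    # Step 2: drop pairs with no change
    d = d[d != 0]
    # Steps 3-4: rank the absolute differences (ties get the average rank)
    ranks = rankdata(np.abs(d))
    # Steps 5-6: sum the ranks belonging to positive and negative differences
    w_plus = float(ranks[d > 0].sum())
    w_minus = float(ranks[d < 0].sum())
    # Step 7: the two-sided test statistic is the smaller of the two sums
    return w_plus, w_minus, min(w_plus, w_minus)

# Made-up scores for 8 subjects (illustrative only)
before = [72, 65, 80, 58, 90, 77, 68, 84]
after = [78, 64, 85, 66, 90, 80, 75, 82]

w_plus, w_minus, w = signed_rank_statistic(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, W = {w}")
# W+ = 25.0, W- = 3.0, W = 3.0
```

Here the differences are 6, −1, 5, 8, 0, 3, 7, −2. The zero is dropped, the remaining seven absolute values are ranked 1 to 7, and the two negative differences hold ranks 1 and 2, so W− = 3 and W+ = 25.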
Example in Python
Let’s test if a meditation program reduced stress levels (lower scores = less stress):
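A minimal sketch of such a test using SciPy's scipy.stats.wilcoxon. The stress scores are invented purely for illustration, and the samples are passed as (after, before) so that the differences are after − before.

```python
from scipy.stats import wilcoxon

# Made-up stress scores for 10 participants (illustrative, not real data)
before = [30, 28, 35, 40, 32, 29, 36, 31, 38, 34]
after = [25, 27, 29, 33, 34, 26, 28, 27, 29, 24]

# Two-sided test: is the median of (after - before) different from zero?
stat, p_value = wilcoxon(after, before, alternative="two-sided")
print(f"W = {stat}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject H0: stress levels changed after the program")
else:
    print("Fail to reject H0: no significant change detected")
```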
Interpretation:
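- If p < 0.05, reject H₀ → stress levels changed significantly; to conclude specifically that the program reduced them, check that the differences are mostly negative or use a one-sided test.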
- If p ≥ 0.05, fail to reject H₀ → no significant change.
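Alternative Options:
The direction below assumes the differences are computed as after − before, i.e. the samples are passed to the test in the order (after, before).
- 'two-sided' → test for any change (default)
- 'greater' → test if after > before
- 'less' → test if after < before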
If most signed ranks are negative (after < before), the sum of negative ranks (W−) will be much larger than W+, which points to a decrease.
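A short sketch of how a one-sided alternative might be used with scipy.stats.wilcoxon to test specifically for a decrease. The scores are made up for illustration, and passing the samples as (after, before) is an assumption that fixes the differences as after − before.

```python
from scipy.stats import wilcoxon

# Made-up stress scores (illustrative only); most participants score lower afterwards
before = [30, 28, 35, 40, 32, 29, 36, 31, 38, 34]
after = [25, 27, 29, 33, 34, 26, 28, 27, 29, 24]

# 'less': alternative hypothesis is that (after - before) tends to be below zero
stat_less, p_less = wilcoxon(after, before, alternative="less")
# 'greater': alternative hypothesis is that (after - before) tends to be above zero
stat_greater, p_greater = wilcoxon(after, before, alternative="greater")

print(f"alternative='less':    p = {p_less:.4f}")
print(f"alternative='greater': p = {p_greater:.4f}")
```

Because nine of the ten differences are negative, the 'less' test gives a small p-value while the 'greater' test gives a p-value close to 1.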
Advantages
- Doesn’t require normality
- Robust to outliers, because it uses ranks rather than raw magnitudes
- Works with small samples
- Simple and intuitive (rank-based)
Limitations
- Only works for paired data
- Less powerful than the paired t-test when data are normal
- Assumes the paired differences are symmetrically distributed about their median