Effect Sizes

SciPy - Statistical Testing

Published Nov 17 2025

Python, SciPy, Statistics

Statistical significance (a p-value) tells you how surprising the observed data would be if there were no real effect.

Effect sizes tell you how large the effect is.

Both are essential.

Effect sizes are widely used in:

A/B testing
Clinical research
Behavioural science
Social science
Machine learning evaluation
Business analytics

For mean differences:

Cohen's d
Glass's Δ
Hedges’ g

For non-parametric tests:

Rank-biserial correlation
Cliff’s delta

For categorical tests:

Cramér’s V
Phi coefficient

For ANOVA:

Eta-squared (η²)
Partial eta-squared

Cohen’s d (Independent Samples)

For two independent groups, Cohen’s d measures the standardised mean difference: the gap between the group means expressed in units of the pooled standard deviation.

Formula (simplified):

(mean1 - mean2) / pooled_standard_deviation
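Written out in full, with the pooled standard deviation expanded in the same way as the implementation in this section computes it. The symbols are notation introduced here for illustration: the group means are $\bar{x}_1, \bar{x}_2$, the sample standard deviations are $s_1, s_2$, and the group sizes are $n_1, n_2$.

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```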

Example Implementation

import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    # Pooled standard deviation from the unbiased sample variances (ddof=1)
    pooled_std = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_std

group1 = [5.1, 5.3, 5.0, 5.4]
group2 = [6.2, 6.0, 6.3, 6.1]

print(cohens_d(group1, group2))

Interpretation

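Conventional benchmarks for |d| (rough guides rather than strict cut-offs; the sign only reflects the order of the groups):

0.20 = small
0.50 = medium
0.80 = large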
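To illustrate why a p-value and an effect size answer different questions, here is a minimal sketch using simulated data. The group names, the sample size and the true difference of 0.02 standard deviations are all assumptions chosen for the illustration, not values from a real experiment.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: two very large groups whose true means differ by only 0.02 SD
n = 1_000_000
control = rng.normal(loc=0.00, scale=1.0, size=n)
variant = rng.normal(loc=0.02, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(variant, control)

# Cohen's d with the pooled standard deviation (group sizes are equal here)
pooled_std = np.sqrt((np.var(variant, ddof=1) + np.var(control, ddof=1)) / 2)
d = (variant.mean() - control.mean()) / pooled_std

print(f"p-value:   {p_value:.2e}")
print(f"Cohen's d: {d:.3f}")
```

With samples this large the test is highly significant, yet d stays far below the 0.20 "small" benchmark.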

Cohen’s d (Paired Samples)

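Use this alongside a paired t-test. This variant divides the mean of the paired differences by the standard deviation of those differences, and is often written d_z.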

Example

def cohens_d_paired(before, after):
    # Work on the per-pair differences
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    return np.mean(diff) / np.std(diff, ddof=1)

before = [100, 102, 98, 105]
after = [103, 107, 101, 110]

print(cohens_d_paired(before, after))
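The paired effect size is tied directly to the paired t statistic: since t = mean(diff) / (sd(diff) / sqrt(n)), dividing t by sqrt(n) recovers the same value. A minimal sketch checking this with `scipy.stats.ttest_rel` on the same data (the variable name `d_z` is introduced here for illustration):

```python
import numpy as np
from scipy import stats

before = np.array([100, 102, 98, 105], dtype=float)
after = np.array([103, 107, 101, 110], dtype=float)

diff = after - before
d_z = diff.mean() / diff.std(ddof=1)

# Paired t-test on the same data; the statistic is mean(diff) / (sd(diff) / sqrt(n))
t_stat, p_value = stats.ttest_rel(after, before)

print(d_z)
print(t_stat / np.sqrt(len(diff)))  # same value as d_z
```

This is handy when a report gives only the t statistic and the number of pairs.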

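Hedges’ g (Corrected Cohen’s d)

Cohen’s d tends to overestimate the population effect in small samples. Hedges’ g multiplies d by a correction factor and is the more accurate choice for small samples (n < 20).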

Function

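def hedges_g(x, y):
    """Hedges' g: Cohen's d multiplied by a small-sample bias correction."""
    d = cohens_d(x, y)
    n1, n2 = len(x), len(y)
    # Approximate correction factor; it approaches 1 as the total sample size grows
    correction = 1 - (3 / (4 * (n1 + n2) - 9))
    return d * correction

print(hedges_g(group1, group2))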
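To get a feel for how much the correction matters, the factor can be tabulated for a few group sizes. This is a small sketch; `hedges_correction` is a hypothetical helper that isolates the factor used in `hedges_g`.

```python
def hedges_correction(n1, n2):
    # Approximate small-sample correction factor, as used in hedges_g
    return 1 - 3 / (4 * (n1 + n2) - 9)

for n in (4, 10, 20, 50, 200):
    print(f"n per group = {n:>3}: correction = {hedges_correction(n, n):.4f}")
```

With 4 observations per group the factor is about 0.87, so g is roughly 13% smaller than d. By 50 per group the two are nearly identical.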

Glass’s Δ

Use this when the group variances differ greatly.
It standardises the mean difference by the control group's standard deviation only.

Function

def glass_delta(x, y):
    # y = control group; only its standard deviation is used
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return (np.mean(x) - np.mean(y)) / np.std(y, ddof=1)
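A short usage sketch, repeating the function so the block runs on its own. The `treatment` and `control` values are made-up numbers chosen so that the treatment group is far more variable than the control group.

```python
import numpy as np

def glass_delta(x, y):
    # y is the control group; only its standard deviation is used
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return (np.mean(x) - np.mean(y)) / np.std(y, ddof=1)

# Hypothetical data: the treatment group is far more variable than the control
treatment = [12.0, 18.0, 9.0, 21.0, 15.0]
control = [10.0, 11.0, 10.5, 9.5, 9.0]

print(glass_delta(treatment, control))
```

Argument order matters here: swapping the two groups changes the denominator as well as the sign.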

Rank-Biserial Correlation (Mann–Whitney U)

Effect size for Mann–Whitney U test.

Function

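def rank_biserial(u_stat, n1, n2):
    # Sign convention: with U computed for the first sample (what current SciPy
    # versions return), the result is positive when the first sample tends to
    # have the smaller values. Negate it to match the sign of Cliff's delta.
    return 1 - (2 * u_stat) / (n1 * n2)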

Example

from scipy import stats

u, p = stats.mannwhitneyu(group1, group2)

# Every value in group1 is below every value in group2, so U = 0 and this prints 1.0
print(rank_biserial(u, len(group1), len(group2)))

Cliff’s Delta (Non-Parametric Effect Size)

Measures how often values in one group exceed values in another.

Function

def cliffs_delta(x, y):
    x, y = np.asarray(x), np.asarray(y)
    n1, n2 = len(x), len(y)
    # Count the pairs (xi, yj) where x is larger and where x is smaller; ties count for neither
    greater = sum(xi > yj for xi in x for yj in y)
    less = sum(xi < yj for xi in x for yj in y)
    return (greater - less) / (n1 * n2)

Interpretation

Commonly cited thresholds for |δ|:

0.147 = small
0.330 = medium
0.474 = large
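The double loop over all pairs is easy to read but slow for large samples. A minimal sketch of a vectorised alternative using NumPy broadcasting follows. The function name `cliffs_delta_vectorised` and the sample values are assumptions for illustration. The same block also checks the link to the Mann–Whitney statistic: when U is computed for the first sample, δ = 2U / (n1 · n2) − 1.

```python
import numpy as np
from scipy import stats

def cliffs_delta_vectorised(x, y):
    # Compare every x with every y at once via broadcasting (an n1 x n2 matrix)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / (len(x) * len(y))

# Hypothetical samples with some overlap and one tie (6.1 appears in both)
a = [5.1, 6.4, 5.0, 6.1, 5.8]
b = [6.2, 6.0, 6.3, 6.1, 5.5, 5.9]

delta = cliffs_delta_vectorised(a, b)

# U for the first sample counts pairs with a > b, plus half of the tied pairs
u_a, _ = stats.mannwhitneyu(a, b)
delta_from_u = 2 * u_a / (len(a) * len(b)) - 1

print(delta, delta_from_u)
```

The broadcasted matrix needs n1 × n2 values in memory, so for very large samples a rank-based computation via U is the safer route.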

Cramér’s V (Chi-Square Effect Size)

Used for chi-square tests of independence.

Function

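def cramers_v(table):
    table = np.asarray(table)  # accept nested lists as well as arrays
    # correction=False skips Yates' continuity correction, which SciPy applies
    # to 2×2 tables by default and which would shrink the effect size
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1  # smaller table dimension minus one
    return np.sqrt(chi2 / (n * k))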

Interpretation

Benchmarks when the smaller table dimension is 2 (the thresholds shrink for larger tables):

0.10 = small
0.30 = medium
0.50 = large
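A short usage sketch on a table larger than 2×2. The counts and the row and column meanings are invented for illustration, and the calculation is written inline so the block runs on its own.

```python
import numpy as np
from scipy import stats

# Hypothetical 3x2 table: rows = plan (basic/plus/pro), columns = churned (no/yes)
table = np.array([[50, 30],
                  [45, 15],
                  [40, 5]])

chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
n = table.sum()
k = min(table.shape) - 1  # smaller dimension minus one
v = np.sqrt(chi2 / (n * k))

print(f"chi2 = {chi2:.2f}, dof = {dof}, V = {v:.3f}")
```

Dividing by `n * k` is what keeps V between 0 and 1 regardless of the table's shape.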

Phi Coefficient (2×2 Tables Only)

Special case of Cramér’s V for 2×2 tables.

Function

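def phi_coefficient(table):
    table = np.asarray(table)  # accept nested lists as well as arrays
    # correction=False: Yates' continuity correction (SciPy's default for 2×2
    # tables) would make phi smaller than the true correlation
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    return np.sqrt(chi2 / n)

Equivalent in magnitude to the Pearson correlation between the two binary variables. The square root discards the sign, so this version is always non-negative.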
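The link to the Pearson correlation can be checked numerically. The sketch below uses an invented 2×2 table. It rebuilds the individual binary observations from the cell counts and compares phi, computed without Yates' correction, with the absolute Pearson correlation.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = exposed (no/yes), columns = outcome (no/yes)
table = np.array([[30, 10],
                  [15, 45]])

# Phi from the uncorrected chi-square statistic
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
phi = np.sqrt(chi2 / table.sum())

# Rebuild the underlying 0/1 observations from the cell counts and correlate them
rows, cols = np.indices(table.shape)
exposed = np.repeat(rows.ravel(), table.ravel())
outcome = np.repeat(cols.ravel(), table.ravel())
r = np.corrcoef(exposed, outcome)[0, 1]

print(phi, abs(r))
```

If the direction of the association matters, report r itself, since phi computed from chi-square is unsigned.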

Eta-Squared (η²) for ANOVA

Effect size for one-way ANOVA: the proportion of the total variance that is explained by group membership.

Function

def eta_squared_anova(groups):
    # Flatten groups
    all_data = np.concatenate(groups)
    grand_mean = np.mean(all_data)
    # Between-group sum of squares
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares
    ss_total = np.sum((all_data - grand_mean) ** 2)
    return ss_between / ss_total

group_a = [5.1, 5.3, 5.2]
group_b = [6.2, 6.1, 6.3]
group_c = [4.9, 5.0, 4.8]

print(eta_squared_anova([group_a, group_b, group_c]))

Interpretation

0.01 = small
0.06 = medium
0.14 = large
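When only the F statistic and its degrees of freedom are available, eta-squared can be recovered as F · df_between / (F · df_between + df_within). The sketch below checks that identity against the sums-of-squares definition using `scipy.stats.f_oneway` on the same three groups. The variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

groups = [
    [5.1, 5.3, 5.2],
    [6.2, 6.1, 6.3],
    [4.9, 5.0, 4.8],
]

f_stat, p_value = stats.f_oneway(*groups)

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
df_between = k - 1
df_within = n_total - k

# eta^2 = SS_between / SS_total, rewritten in terms of F and its degrees of freedom
eta_sq_from_f = (f_stat * df_between) / (f_stat * df_between + df_within)

# Direct computation from the sums of squares, for comparison
all_data = np.concatenate(groups)
grand_mean = all_data.mean()
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
ss_total = np.sum((all_data - grand_mean) ** 2)

print(eta_sq_from_f, ss_between / ss_total)
```

Both routes give about 0.979 for this data, far above the 0.14 "large" benchmark because the groups barely overlap.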

Partial Eta-Squared

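Used in multi-factor ANOVA (via Statsmodels).

The Statsmodels ANOVA table does not report partial eta-squared directly, but it contains the sums of squares needed to compute it:

import statsmodels.api as sm
from statsmodels.formula.api import ols

# df is assumed to be a pandas DataFrame with a numeric "value" column and a "group" column
model = ols("value ~ C(group)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)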

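Compute it manually from the table:

ss_effect = anova_table.loc['C(group)', 'sum_sq']
ss_error = anova_table.loc['Residual', 'sum_sq']
partial_eta_sq = ss_effect / (ss_effect + ss_error)

With a single factor, as in this model, partial eta-squared equals eta-squared. The two differ once the model includes more than one factor.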

Summary: When to Use Which Effect Size

Scenario	Best Effect Size
Two independent means	Cohen’s d / Hedges’ g
Two paired means	Cohen’s d (paired)
Non-parametric 2-group	Rank-biserial / Cliff’s delta
Chi-square independence	Cramér’s V
2×2 categorical	Phi coefficient
One-way ANOVA	Eta-squared
Multi-factor ANOVA	Partial eta-squared
Conversion rates (binary)	Cohen’s h
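Cohen’s h appears in the table but is not defined elsewhere in this article. Its standard definition is the difference between two arcsine-transformed proportions, h = 2·asin(√p1) − 2·asin(√p2). A minimal sketch, with `cohens_h` as an illustrative function name and made-up conversion rates:

```python
import math

def cohens_h(p1, p2):
    # Difference between arcsine-transformed proportions
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical conversion rates: 12% for the variant, 10% for the control
print(cohens_h(0.12, 0.10))
```

The benchmarks usually quoted for h are the same 0.20 / 0.50 / 0.80 values as for Cohen’s d, so a two-point lift from 10% to 12% counts as well below "small".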