ANOVA (Analysis of Variance)

Maths: Statistics for machine learning


Published Oct 22 2025, updated Oct 23 2025



Tags: Machine Learning, Maths, NumPy, Pandas, Python, Statistics

ANOVA (short for Analysis of Variance) is a parametric statistical test used to determine whether there are significant differences between the means of three or more independent groups.

It compares the variance between groups to the variance within groups. If the between-group variance is much larger than the within-group variance, at least one group mean is likely to differ from the others.


In simple terms:

“ANOVA tests whether the average values across multiple groups are all the same, or if at least one is significantly different.”




When to Use It

  • Comparing 3 or more groups - E.g., three teaching methods, product versions, or model types
  • Continuous dependent variable - E.g., scores, revenue, accuracy
  • Categorical independent variable(s) - E.g., group, gender, treatment type
  • Normal distribution and equal variances - Assumptions of ANOVA

When Not to Use It

  • Non-normal data - Use Kruskal–Wallis test instead
  • Paired/repeated samples - Use Repeated-Measures ANOVA or Friedman test



Types of ANOVA

  • One-Way ANOVA - One independent variable (factor), e.g., compare test scores across 3 teaching methods
  • Two-Way ANOVA - Two independent variables (factors), e.g., compare test scores across methods and genders
  • Repeated Measures ANOVA - Same subjects tested under multiple conditions, e.g., compare model accuracy before/after feature tuning



Hypotheses

  • H₀ (Null Hypothesis) - All group means are equal (no difference)
  • H₁ (Alternative Hypothesis) - At least one group mean is different



How It Works

ANOVA partitions the total variation in the data into:

  1. Between-group variation (SSB) → how much group means differ from the overall mean
  2. Within-group variation (SSW) → how much individual observations differ within each group

It then compares the ratio of these two:

F = MSB / MSW

Where:

  • MSB = SSB / (k − 1), the mean square between groups
  • MSW = SSW / (N − k), the mean square within groups
  • k = number of groups
  • N = total number of observations

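If the F-ratio is large, the between-group variance is much greater than the within-group variance, which is evidence that the group means differ. The p-value is obtained by comparing F to the F-distribution with (k − 1, N − k) degrees of freedom.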
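The partition described above can be worked through by hand. The following is a minimal sketch, assuming three illustrative groups of accuracy scores (the variable names are chosen for this example only); it computes SSB, SSW and F directly and then checks the result against SciPy's f_oneway.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Illustrative accuracy scores for three groups (assumed values)
groups = [
    np.array([0.82, 0.80, 0.79, 0.83, 0.81]),
    np.array([0.76, 0.77, 0.74, 0.78, 0.75]),
    np.array([0.85, 0.88, 0.84, 0.87, 0.86]),
]

k = len(groups)                    # number of groups
N = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: group means vs. the overall mean
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations vs. their own group mean
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_manual = msb / msw
# Upper-tail probability of the F-distribution with (k - 1, N - k) df
p_manual = f.sf(f_manual, k - 1, N - k)

# Cross-check against SciPy's implementation
f_scipy, p_scipy = f_oneway(*groups)

print(f"SSB={ssb:.4f}, SSW={ssw:.4f}")
print(f"Manual F={f_manual:.3f}, p={p_manual:.2e}")
print(f"SciPy  F={f_scipy:.3f}, p={p_scipy:.2e}")
```

For these scores SSB is 0.025 and SSW is 0.003, so MSB = 0.0125, MSW = 0.00025 and F = 50.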




Example: Comparing Model Performance

You trained three machine learning models and recorded their accuracy scores.
You want to know if their mean accuracies differ significantly.




Python Example

import numpy as np
from scipy.stats import f_oneway

# Accuracy scores for 3 different models
model_A = np.array([0.82, 0.80, 0.79, 0.83, 0.81])
model_B = np.array([0.76, 0.77, 0.74, 0.78, 0.75])
model_C = np.array([0.85, 0.88, 0.84, 0.87, 0.86])

# Perform One-Way ANOVA
f_stat, p = f_oneway(model_A, model_B, model_C)
print(f"F-Statistic: {f_stat:.3f}")
print(f"P-value: {p:.4f}")

if p < 0.05:
    print("Reject H₀: at least one model mean differs significantly.")
else:
    print("Fail to reject H₀: model means are not significantly different.")

Example Output:

F-Statistic: 50.000
P-value: 0.0000
Reject H₀: at least one model mean differs significantly.

Interpretation:

  • There is a significant difference in average accuracy between the models.
  • However, ANOVA doesn’t tell which models differ — only that a difference exists.
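To find out which pairs of models differ, a common follow-up is a post-hoc test such as Tukey's HSD. The sketch below is one possible approach rather than part of the original analysis; it assumes the same accuracy scores and uses pairwise_tukeyhsd from statsmodels.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same accuracy scores as the one-way example, flattened into one array
scores = np.array([
    0.82, 0.80, 0.79, 0.83, 0.81,   # model A
    0.76, 0.77, 0.74, 0.78, 0.75,   # model B
    0.85, 0.88, 0.84, 0.87, 0.86,   # model C
])
labels = np.repeat(["A", "B", "C"], 5)

# Tukey's HSD compares every pair of groups while controlling
# the family-wise error rate at alpha
tukey = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(tukey.summary())

# tukey.reject holds one boolean per pair (A-B, A-C, B-C)
print(tukey.reject)
```

Each row of the summary reports the mean difference for one pair, its adjusted p-value, and whether the null hypothesis of equal means is rejected for that pair.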

Visualisation:

[Figure: box plot of accuracy scores for each model]

Each box shows the distribution of accuracy scores per model. If the boxes don’t overlap much, ANOVA will likely find a significant difference.
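A box plot of this kind can be drawn with matplotlib. This is a minimal sketch, assuming the same three sets of scores; the labels and figure size are arbitrary choices for illustration and may not match the original figure.

```python
import matplotlib.pyplot as plt
import numpy as np

# Accuracy scores per model (same values as the one-way example)
scores = {
    "Model A": np.array([0.82, 0.80, 0.79, 0.83, 0.81]),
    "Model B": np.array([0.76, 0.77, 0.74, 0.78, 0.75]),
    "Model C": np.array([0.85, 0.88, 0.84, 0.87, 0.86]),
}

fig, ax = plt.subplots(figsize=(6, 4))
# One box per model, drawn at x positions 1, 2, 3
boxes = ax.boxplot(list(scores.values()))
ax.set_xticklabels(list(scores.keys()))
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy scores per model")
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```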




Assumptions of ANOVA

Assumption                   Description                                   How to Check
1. Independence              Observations are independent                  Experimental design
2. Normality                 Data in each group are normally distributed   Shapiro–Wilk test
3. Homogeneity of variance   Variances are equal across groups             Levene’s test


If assumptions are violated:

  • Use Kruskal–Wallis (non-parametric ANOVA)
  • Or transform the data (log, sqrt)
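The checks and the fallback listed above are all available in scipy.stats. The following is a minimal sketch, assuming the same three groups of accuracy scores; with only five observations per group the tests have little power, so the results are illustrative rather than conclusive.

```python
import numpy as np
from scipy.stats import kruskal, levene, shapiro

groups = {
    "A": np.array([0.82, 0.80, 0.79, 0.83, 0.81]),
    "B": np.array([0.76, 0.77, 0.74, 0.78, 0.75]),
    "C": np.array([0.85, 0.88, 0.84, 0.87, 0.86]),
}

# Normality: Shapiro-Wilk per group (p > 0.05 means no evidence against normality)
shapiro_p = {}
for name, values in groups.items():
    stat, p = shapiro(values)
    shapiro_p[name] = p
    print(f"Shapiro-Wilk, model {name}: W={stat:.3f}, p={p:.3f}")

# Homogeneity of variance: Levene's test across all groups
lev_stat, lev_p = levene(*groups.values())
print(f"Levene: W={lev_stat:.3f}, p={lev_p:.3f}")

# Non-parametric fallback: Kruskal-Wallis compares the groups using ranks
h_stat, kw_p = kruskal(*groups.values())
print(f"Kruskal-Wallis: H={h_stat:.3f}, p={kw_p:.4f}")
```

A small p-value from Shapiro–Wilk or Levene's test suggests the corresponding assumption is violated, in which case the Kruskal–Wallis result is the safer one to report.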





Two-Way ANOVA (Factorial ANOVA)

The Two-Way ANOVA extends the One-Way ANOVA by including two independent variables (factors) instead of one.

It tests:

  1. Whether each factor individually affects the mean of the dependent variable, and
  2. Whether there’s an interaction effect between the two factors — i.e., if the effect of one depends on the level of the other.

In simple terms:

“Two-Way ANOVA tests if two categorical variables (and their combination) significantly influence a continuous outcome.”



When to Use It

  • Two categorical independent variables - e.g., Model Type and Dataset Size
  • One continuous dependent variable - e.g., Accuracy, Revenue, Test Score
  • Data are normally distributed - (within each group)
  • Variances are equal - Across all combinations of factors

When Not to Use It

  • Violated assumptions - Consider a rank-based alternative such as the Scheirer–Ray–Hare test, or the Friedman test for blocked or repeated-measures designs



Example Scenario

You test three ML models (A, B, C)
on two dataset sizes (Small, Large)
and record the accuracy scores.

You want to know:

  1. Does the model type matter?
  2. Does dataset size matter?
  3. Is there an interaction effect (does model performance depend on dataset size)?



Example in Python

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = {
    'Model': ['A','A','A','A','A','B','B','B','B','B','C','C','C','C','C']*2,
    'DatasetSize': ['Small']*15 + ['Large']*15,
    'Accuracy': [
        0.80,0.82,0.83,0.81,0.79, 0.75,0.76,0.74,0.77,0.75, 0.86,0.88,0.85,0.87,0.84, # Small datasets
        0.84,0.85,0.83,0.86,0.85, 0.80,0.82,0.78,0.81,0.80, 0.88,0.90,0.87,0.89,0.88 # Large datasets
    ]
}

df = pd.DataFrame(data)

# Run Two-Way ANOVA
model = ols('Accuracy ~ C(Model) + C(DatasetSize) + C(Model):C(DatasetSize)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

Example Output (rounded to four decimal places):

                          sum_sq    df         F  PR(>F)
C(Model)                  0.0442   2.0  119.5676  0.0000
C(DatasetSize)            0.0097   1.0   52.5405  0.0000
C(Model):C(DatasetSize)   0.0007   2.0    1.9459  0.1647
Residual                  0.0044  24.0       NaN     NaN

Interpretation:

  • Model type (p < 0.001) → significant difference between model means
  • Dataset size (p < 0.001) → dataset size significantly affects accuracy
  • Interaction (p ≈ 0.16) → no significant interaction (i.e., model ranking is similar across sizes)

Visualisation:

[Figure: interaction plot of mean accuracy by dataset size, one line per model]

Interpretation of Plot:

  • If lines between “Small” and “Large” datasets are parallel, there’s no interaction.
  • If lines cross or diverge, that suggests an interaction — some models perform better on specific dataset sizes.
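An interaction plot like the one described can be built from the cell means. This is a minimal sketch, assuming the same data as the two-way example and using pandas with matplotlib; the styling is an arbitrary choice and may differ from the original figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same data as the two-way ANOVA example
df = pd.DataFrame({
    "Model": (["A"] * 5 + ["B"] * 5 + ["C"] * 5) * 2,
    "DatasetSize": ["Small"] * 15 + ["Large"] * 15,
    "Accuracy": [
        0.80, 0.82, 0.83, 0.81, 0.79, 0.75, 0.76, 0.74, 0.77, 0.75,
        0.86, 0.88, 0.85, 0.87, 0.84,  # Small datasets
        0.84, 0.85, 0.83, 0.86, 0.85, 0.80, 0.82, 0.78, 0.81, 0.80,
        0.88, 0.90, 0.87, 0.89, 0.88,  # Large datasets
    ],
})

# Mean accuracy for every (dataset size, model) cell, one column per model
cell_means = (
    df.groupby(["DatasetSize", "Model"])["Accuracy"]
    .mean()
    .unstack("Model")
    .loc[["Small", "Large"]]   # put Small before Large on the x-axis
)
print(cell_means)

fig, ax = plt.subplots(figsize=(6, 4))
for model_name in cell_means.columns:
    # One line per model; roughly parallel lines indicate no interaction
    ax.plot(list(cell_means.index), cell_means[model_name],
            marker="o", label=f"Model {model_name}")
ax.set_xlabel("Dataset size")
ax.set_ylabel("Mean accuracy")
ax.legend()
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

For this data the three lines rise by similar amounts from Small to Large, which matches the non-significant interaction term.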






One-Way ANOVA with Pingouin

import pingouin as pg
import pandas as pd
import numpy as np

# Example data: Pain threshold for different hair colors
np.random.seed(42)
df = pd.DataFrame({
    'Hair color': np.repeat(['Blonde', 'Brunette', 'Redhead'], 20),
    'Pain threshold': np.concatenate([
        np.random.normal(55, 5, 20),
        np.random.normal(60, 5, 20),
        np.random.normal(65, 5, 20)
    ])
})

# Run one-way ANOVA
anova = pg.anova(data=df, dv='Pain threshold', between='Hair color', detailed=True)
print(anova)

Output:

       Source           SS  DF          MS          F         p-unc       np2
0  Hair color  1159.130250   2  579.565125  27.461636  4.448381e-09  0.490723
1      Within  1202.958614  57   21.104537        NaN           NaN       NaN
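The F and np2 columns follow directly from the sums of squares in this table. The sketch below copies the SS and DF values from the output above and recomputes them; it treats np2 as SS_between / (SS_between + SS_within), which is the effect size (partial eta-squared) for a one-way design.

```python
# Values copied from the pingouin output above
ss_between = 1159.130250   # SS for "Hair color"
ss_within = 1202.958614    # SS for "Within"
df_between, df_within = 2, 57

# Mean squares are sums of squares divided by their degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares
f_ratio = ms_between / ms_within

# Eta-squared: share of total variation explained by the grouping factor
eta_sq = ss_between / (ss_between + ss_within)

print(f"MS between = {ms_between:.6f}")
print(f"MS within  = {ms_within:.6f}")
print(f"F          = {f_ratio:.4f}")
print(f"eta^2      = {eta_sq:.4f}")
```

An eta-squared of about 0.49 means that roughly half of the variation in pain threshold in this simulated data is associated with hair color.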

© 2025 SimpleSteps.guide