Chi-Square Tests

SciPy - Statistical Testing


Published Nov 17 2025



Python, SciPy, Statistics

Chi-square tests are used for categorical data, not numeric measurements.


They help answer two major questions:

  1. Do observed frequencies match expected frequencies? - Goodness of Fit Test
  2. Are two categorical variables associated? - Test of Independence (Contingency Table)





Chi-Square Goodness of Fit Test

“Does my observed distribution match an expected distribution?”


Use when:

  • You have one categorical variable
  • You want to compare observed counts to expected counts
  • Example: Dice rolls, survey choices, defect types, etc.

Example

A six-sided die was rolled 60 times. Here are the counts:

import numpy as np
from scipy import stats

observed = np.array([8, 12, 10, 11, 9, 10])

# perfectly fair die
expected = np.array([10, 10, 10, 10, 10, 10])

Run the test:

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)

Interpretation

  • p < 0.05 → reject the null hypothesis: the observed counts differ from the expected distribution
  • p ≥ 0.05 → not enough evidence that the observed counts differ from the expected distribution
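
To make the statistic less of a black box, the die example can be recomputed by hand. This is a minimal sketch: the names `stat_manual` and `p_manual` are illustrative, and the degrees of freedom are taken as the number of categories minus one.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 12, 10, 11, 9, 10])
expected = np.full(6, 10.0)  # fair die: 60 rolls / 6 faces

# Pearson chi-square statistic: sum of (O - E)^2 / E over all categories
stat_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom = number of categories - 1
dof = len(observed) - 1

# p-value = upper-tail probability of the chi-square distribution
p_manual = stats.chi2.sf(stat_manual, dof)

# SciPy's result, for comparison
stat_scipy, p_scipy = stats.chisquare(f_obs=observed, f_exp=expected)

print(stat_manual, p_manual)
print(stat_scipy, p_scipy)
```

The statistic works out to 1.0 with 5 degrees of freedom, giving a p-value of roughly 0.96, so these rolls are consistent with a fair die.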

If expected = uniform distribution

You can skip f_exp; chisquare then assumes every category is equally likely:

chi2, p = stats.chisquare(observed)
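
When the expected distribution is not uniform, the expected counts are usually derived from hypothesised proportions. The sketch below uses made-up survey counts and proportions (both are assumptions for illustration). It scales the proportions to the observed total, because `chisquare` expects `f_obs` and `f_exp` to sum to the same value.

```python
import numpy as np
from scipy import stats

# Hypothetical survey: counts of respondents choosing options A, B, C
observed = np.array([45, 35, 20])

# Hypothesised population shares for A, B, C (assumed for illustration)
proportions = np.array([0.5, 0.3, 0.2])

# Convert shares to expected counts with the same total as the observed data
expected = proportions * observed.sum()

# Sanity check: recent SciPy versions reject mismatched totals
assert np.isclose(observed.sum(), expected.sum())

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)
```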






Chi-Square Test of Independence

“Are these two categorical variables related?”


Use when:

  • You have two categorical variables
  • You want to test whether they are associated
  • Example: Gender vs purchase decision, education vs voting preference, etc.

Requires a contingency table (cross-tabulation).






Test of Independence — Example (Manual Table)

Consider data on whether people bought a product:


          Bought   Not Bought
Male        30         10
Female      20         40


Represent this as:

table = np.array([
    [30, 10],
    [20, 40]
])

Run the test

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
print("Expected frequencies:\n", expected)
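
One detail worth knowing: for 2×2 tables (one degree of freedom), `chi2_contingency` applies Yates' continuity correction by default through its `correction` parameter. The sketch below compares both settings on the same table. Which one is appropriate depends on your field's conventions.

```python
import numpy as np
from scipy import stats

table = np.array([
    [30, 10],
    [20, 40]
])

# Default: Yates' continuity correction is applied because dof == 1
chi2_yates, p_yates, dof, expected = stats.chi2_contingency(table)

# Plain Pearson chi-square, no continuity correction
chi2_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)

print("With correction:   ", chi2_yates, p_yates)
print("Without correction:", chi2_plain, p_plain)
```

The corrected statistic is always the smaller of the two, so the default is the more conservative choice.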

Interpretation

  • p < 0.05 → evidence that the variables are associated (dependent)
  • p ≥ 0.05 → not enough evidence of an association (this does not prove the variables are independent)

The expected frequencies are the counts we would see in each cell if the variables were independent.
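
The expected frequencies and degrees of freedom can be reproduced by hand, which helps when checking results. A minimal sketch, assuming the 2×2 purchase table from this example (the `_manual` names are illustrative):

```python
import numpy as np
from scipy import stats

table = np.array([
    [30, 10],
    [20, 40]
])

row_totals = table.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = table.sum(axis=0, keepdims=True)   # shape (1, 2)
n = table.sum()

# Under independence: E[i, j] = row_total[i] * col_total[j] / n
expected_manual = row_totals * col_totals / n

# Degrees of freedom = (rows - 1) * (columns - 1)
dof_manual = (table.shape[0] - 1) * (table.shape[1] - 1)

_, _, dof, expected = stats.chi2_contingency(table)
print(expected_manual)
print(np.allclose(expected_manual, expected), dof_manual == dof)
```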






Test of Independence — Example with Pandas

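With real datasets, you usually start with a DataFrame. The seven-row example below only demonstrates the mechanics; with so few observations the expected counts are far too small for a reliable chi-square test:

import pandas as pd

df = pd.DataFrame({
    "gender": ["M","M","F","F","F","M","F"],
    "bought": ["yes","no","yes","no","no","yes","no"]
})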

Create the contingency table

table = pd.crosstab(df['gender'], df['bought'])
print(table)


Run the test

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
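
Before trusting the p-value, it is worth looking at the expected counts. This is a small sketch using the same seven-row DataFrame. `margins=True` adds row and column totals to the crosstab for display only; the table passed to the test should not include them.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F", "M", "F"],
    "bought": ["yes", "no", "yes", "no", "no", "yes", "no"]
})

# Totals are handy for reading the table, but keep them out of the test
print(pd.crosstab(df["gender"], df["bought"], margins=True))

table = pd.crosstab(df["gender"], df["bought"])
chi2, p, dof, expected = stats.chi2_contingency(table)

# Wrap the expected counts in a DataFrame so the row/column labels are preserved
expected_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print(expected_df.round(2))

# Check whether every expected count falls below the usual threshold of 5
all_below_5 = bool((expected_df < 5).all().all())
print(all_below_5)
```

Here every expected count is below 5, so the chi-square approximation should not be relied on for this toy data.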






Requirements & Assumptions

  • Counts, not proportions - Chi-square tests require frequency counts, not percentages.
  • Observations must be independent - Each person/item is counted once.
  • Expected frequency rule - At least 80% of expected counts should be ≥ 5, and none should be below 1.

If not, use Fisher’s Exact Test (SciPy supports it for 2×2 tables):

oddsratio, p = stats.fisher_exact(table)
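
The expected-count rule can be checked programmatically before choosing a test. The helper below is a hypothetical convenience function (its name and default thresholds are illustrative, not part of SciPy). It uses `scipy.stats.contingency.expected_freq` to compute the expected counts, and it also requires every expected count to be at least 1, a common companion condition to the 80% rule.

```python
import numpy as np
from scipy.stats.contingency import expected_freq


def expected_counts_ok(table, min_count=5, min_share=0.8):
    """Hypothetical helper: True if the expected-count rule of thumb holds."""
    expected = expected_freq(np.asarray(table))
    # Share of cells whose expected count reaches the threshold
    share = np.mean(expected >= min_count)
    # Assumed extra condition: no expected count below 1
    return bool(share >= min_share and expected.min() >= 1)


large_table = np.array([[30, 10], [20, 40]])
small_table = np.array([[3, 1], [1, 3]])

print(expected_counts_ok(large_table))  # expected counts are 20, 20, 30, 30
print(expected_counts_ok(small_table))  # expected counts are all 2
```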






When to Use Fisher’s Exact Test

Use instead of chi-square when:

  • Sample size is small (a common rule of thumb is fewer than 40 observations)
  • Any cell has an expected count below 5

Example

table = np.array([
    [3, 1],
    [1, 3]
])

oddsratio, p = stats.fisher_exact(table)
print(p)
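
For comparison, the sketch below runs both tests on this small table and also shows the `alternative` parameter of `fisher_exact`, which requests a one-sided test. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

table = np.array([
    [3, 1],
    [1, 3]
])

# Two-sided test (the default)
oddsratio, p_two_sided = stats.fisher_exact(table)

# One-sided: is the odds ratio greater than 1?
_, p_greater = stats.fisher_exact(table, alternative="greater")

# Chi-square on the same table, for comparison only (expected counts are all 2)
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print("Odds ratio:", oddsratio)
print("Fisher two-sided p:", p_two_sided)
print("Fisher one-sided p:", p_greater)
print("Chi-square p:", p_chi2)
```

With only eight observations, the two-sided Fisher p-value is 34/70 (about 0.49), so there is no evidence of association in this table despite the odds ratio of 9.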





Post-hoc Testing for Chi-Square

The chi-square test only tells you whether an association exists somewhere in the table.


It does not tell you:

  • Which categories differ
  • Where the difference occurs

To dig deeper:

  • Examine standardised residuals
  • Perform pairwise chi-square tests with Bonferroni correction
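
Pairwise testing can be sketched as follows. The 3×2 table and group labels are hypothetical data invented for illustration. Each pair of rows is tested as its own 2×2 table, and the Bonferroni correction multiplies every p-value by the number of comparisons.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical 3x2 table: three groups (rows) by outcome (bought / not bought)
groups = ["A", "B", "C"]
table = np.array([
    [30, 10],
    [20, 40],
    [25, 25]
])

pairs = list(combinations(range(len(groups)), 2))
n_tests = len(pairs)

results = []
for i, j in pairs:
    # Chi-square test on the 2x2 sub-table for this pair of groups
    chi2, p, dof, _ = stats.chi2_contingency(table[[i, j], :])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1
    p_adj = min(p * n_tests, 1.0)
    results.append((groups[i], groups[j], p, p_adj))

for a, b, p_raw, p_adj in results:
    print(f"{a} vs {b}: p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

Bonferroni is conservative; it is used here because it is the method named above, not because it is the only option.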

Standardised Residuals

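# Recompute expected counts so they match the table being examined
chi2, p, dof, expected = stats.chi2_contingency(table)
residuals = (table - expected) / np.sqrt(expected)
print(residuals)

These are Pearson residuals (some software labels them standardised residuals). Cells with large |residual| (> ~2) contribute most to the chi-square statistic and show where the observed counts depart from independence.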
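
The residuals above divide only by the square root of the expected count. A related quantity, the adjusted standardised residual, also accounts for the row and column proportions and is approximately standard normal under independence, which makes the ±2 cutoff easier to justify. A minimal sketch, assuming the 2×2 purchase table from the manual example:

```python
import numpy as np
from scipy import stats

table = np.array([
    [30, 10],
    [20, 40]
])

chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)

n = table.sum()
row_prop = table.sum(axis=1, keepdims=True) / n   # shape (2, 1)
col_prop = table.sum(axis=0, keepdims=True) / n   # shape (1, 2)

# Pearson residuals: (O - E) / sqrt(E)
pearson = (table - expected) / np.sqrt(expected)

# Adjusted residuals: additionally divide by sqrt((1 - row share) * (1 - column share))
adjusted = pearson / np.sqrt((1 - row_prop) * (1 - col_prop))

print(pearson)
print(adjusted)
```

In a 2×2 table all four adjusted residuals share the same magnitude, equal to the square root of the uncorrected chi-square statistic, so they become more informative for larger tables.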






Effect Sizes for Chi-Square Tests

Cramér’s V (common effect size)

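def cramers_v(table):
    table = np.asarray(table)      # also accepts a pandas crosstab
    # Use the uncorrected statistic; Yates' correction (default for 2×2) would shrink V
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    n = table.sum()                # total number of observations
    k = min(table.shape) - 1       # smaller table dimension minus one
    return np.sqrt(chi2 / (n * k))

print(cramers_v(table))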
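
Recent SciPy versions also provide a ready-made function, `scipy.stats.contingency.association`, which supports Cramér's V through `method="cramer"`. The sketch below compares it with a manual calculation on the 2×2 purchase table. The manual version uses the uncorrected chi-square statistic, which is an assumption about which convention you want.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import association

table = np.array([
    [30, 10],
    [20, 40]
])

# Built-in Cramér's V (no continuity correction by default)
v_builtin = association(table, method="cramer")

# Manual calculation from the uncorrected chi-square statistic
chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
v_manual = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))

print(v_builtin, v_manual)
```

Both give about 0.41 for this table.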


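Interpretation (Cohen’s guidelines, which apply when min(rows, columns) − 1 = 1, e.g. any 2×k table; larger tables use smaller thresholds):

Cramér’s V   Strength
0.10         Small
0.30         Medium
0.50         Large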






Choosing Between Chi-Square and Other Tests

Data Type                                  Goal                              Test
One categorical variable                   Compare to expected frequencies   Chi-Square Goodness of Fit
Two categorical variables                  Test association                  Chi-Square Independence
Two categorical variables, small samples   Test association                  Fisher’s Exact Test
Numerical groups                           Means                             t-tests / ANOVA
Ordinal groups                             Medians                           Kruskal–Wallis / Mann–Whitney


© 2025 SimpleSteps.guide