Normal (Gaussian) Distribution

Maths: Statistics for machine learning


Published Oct 22 2025, updated Oct 23 2025



Tags: Machine Learning, Maths, NumPy, Pandas, Python, Statistics

The Normal Distribution — also called the Gaussian Distribution — is a continuous probability distribution that describes data that cluster around a central mean.

Its graph is the classic “bell-shaped curve”, which shows up in natural phenomena, measured data, and the errors of many models.


In simple terms:

“Most observations are close to the mean, and the probability of extreme values decreases symmetrically on both sides.”




Probability Density Function (PDF)

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

Where:

  • x = the value at which the density is evaluated
  • μ = mean (centre of the distribution)
  • σ = standard deviation (spread/width of the curve)
  • π ≈ 3.14159
  • e ≈ 2.71828 (Euler’s number)

The total area under the curve equals 1.
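To make the formula concrete, here is a minimal sketch in plain Python (standard library only) that evaluates f(x) directly and checks it against `statistics.NormalDist.pdf`. The helper names `normal_pdf` and `area_under_curve`, and the choice of σ = 10 cm for a height-like example, are illustrative assumptions. A midpoint sum over μ ± 10σ also confirms numerically that the area under the curve is 1.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate the Normal PDF f(x) for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(mu: float, sigma: float, n: int = 100_000) -> float:
    """Approximate the integral of the PDF over mu ± 10 sigma with a midpoint sum."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    dx = (hi - lo) / n
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(n)) * dx


# Hypothetical example: heights with mean 170 cm and an assumed sd of 10 cm
mu, sigma = 170.0, 10.0
print(normal_pdf(170.0, mu, sigma))      # peak height of the curve, at the mean
print(NormalDist(mu, sigma).pdf(170.0))  # same value from the standard library
print(area_under_curve(mu, sigma))       # very close to 1.0
```

The tails beyond ±10σ carry a negligible amount of probability, which is why a finite interval is enough for the numerical check.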


68–95–99.7 Rule (Empirical Rule)

For a normal distribution:

  • ~68% of values lie within ±1σ (one standard deviation) of the mean
  • ~95% within ±2σ (two standard deviations) of the mean
  • ~99.7% within ±3σ (three standard deviations) of the mean
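The 68–95–99.7 figures follow from the Normal CDF. The sketch below, using only Python's standard library, computes the exact probability of landing within ±kσ of the mean and compares it with a simulation. Because standardising with z = (x − μ)/σ maps every Normal distribution onto N(0, 1), the result depends only on k. The function names and the random seed are illustrative choices.

```python
import random
from statistics import NormalDist


def within_k_sigma_exact(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for any Normal, via the standard Normal CDF."""
    z = NormalDist()  # standard Normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)


def within_k_sigma_simulated(k: float, n: int = 200_000, seed: int = 0) -> float:
    """Fraction of n simulated N(0, 1) draws that land within k standard deviations."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) < k)
    return hits / n


for k in (1, 2, 3):
    exact = within_k_sigma_exact(k)
    simulated = within_k_sigma_simulated(k)
    print(f"±{k}σ: exact {exact:.4f}, simulated {simulated:.4f}")
```

The exact column gives 0.6827, 0.9545 and 0.9973, which is where the rounded 68%, 95% and 99.7% come from; the simulated fractions should land within a few thousandths of them.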

Examples

  • Human height (mean 170 cm) - Most people are near 170 cm
  • IQ scores (mean 100, standard deviation 15) - About 68% of people score between 85 and 115
  • Measurement errors (mean 0) - Random noise around the true value
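The IQ example can be checked with a few lines of standard-library Python. This sketch assumes IQ is modelled as a Normal distribution with mean 100 and standard deviation 15, the usual IQ scaling; `NormalDist.zscore` needs Python 3.9 or later.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15), the conventional IQ scaling.
iq = NormalDist(mu=100, sigma=15)

share_85_to_115 = iq.cdf(115) - iq.cdf(85)  # P(85 < IQ < 115), i.e. within ±1 sd
share_above_130 = 1 - iq.cdf(130)           # upper tail beyond +2 sd
z_130 = iq.zscore(130)                      # (130 - 100) / 15

print(f"Between 85 and 115: {share_85_to_115:.4f}")
print(f"Above 130:          {share_above_130:.4f}")
print(f"z-score of 130:     {z_130:.1f}")
```

The first value is about 0.6827, matching the 68% quoted above, and the tail above 130 (two standard deviations above the mean) is about 2.3%.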

Normal Distribution

A bell-shaped curve centred at μ=0 with shaded regions for:

  • 68% within ±1σ
  • 95% within ±2σ
  • 99.7% within ±3σ

The total area under the curve equals 1.
The red dashed line marks the mean, which for a Normal distribution is also the median and the mode.






In Machine Learning

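  • Model errors / residuals - Assumed to be normally distributed in classical linear regression, which underpins its confidence intervals and significance tests
  • Feature normalisation - Features are often standardised to z-scores, z = (x − μ)/σ, and some algorithms, such as Gaussian Naive Bayes and linear discriminant analysis, assume roughly normally distributed features
  • Gaussian Naive Bayes - Models each continuous feature, within each class, with a Normal PDF
  • Statistical tests (Z-test, t-test) - Assume normally distributed data, or approximately normal sample means via the Central Limit Theorem
  • Initialisation / noise models - Weights are often initialised from a Normal distribution (e.g. He or Xavier normal initialisation), and Gaussian noise is added for data augmentation or regularisation (e.g. Gaussian dropout)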
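The Gaussian Naive Bayes idea can be sketched in standard-library Python: fit a mean and standard deviation per feature and per class, then classify a point by adding log densities to the log class prior. This is a minimal illustration, not scikit-learn's `GaussianNB`; the function names, the `min_sd` floor (which avoids dividing by zero on constant features), and the toy data are our own assumptions.

```python
import math
import random
from collections import defaultdict
from statistics import fmean, pstdev


def fit_gaussian_nb(X, y, min_sd=1e-9):
    """Learn a class prior and a per-feature (mean, sd) pair for every class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        # zip(*rows) turns the class's rows into one tuple per feature column
        params = [(fmean(col), max(pstdev(col), min_sd)) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), params)
    return model


def log_normal_pdf(x, mu, sigma):
    """Log of the Normal PDF; summing logs avoids underflow from multiplying densities."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict(model, row):
    """Return the class with the highest log prior + sum of per-feature log densities."""
    def score(label):
        prior, params = model[label]
        return math.log(prior) + sum(
            log_normal_pdf(x, mu, sd) for x, (mu, sd) in zip(row, params)
        )
    return max(model, key=score)


# Made-up training data: two classes of 2-D points drawn around different centres.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
    [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)]
y = ["A"] * 50 + ["B"] * 50

model = fit_gaussian_nb(X, y)
print(predict(model, (0.2, -0.1)))  # a point near the "A" centre
print(predict(model, (4.8, 5.3)))   # a point near the "B" centre
```

The "naive" part is treating features as independent within a class, so the joint density is just the product (here, the sum of logs) of one Normal PDF per feature.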





Python code

Test whether each numerical column in a DataFrame is normally distributed with pg.normality() from the pingouin library:

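# Requires: pip install pingouin
import pingouin as pg
import numpy as np
import pandas as pd

# Example datasets (no random seed is set, so the values differ on each run)
# normal distribution: mean 50, standard deviation 5
normal_data = np.random.normal(loc=50, scale=5, size=100)
# right-skewed (exponential) distribution with scale 5
skewed_data = np.random.exponential(scale=5, size=100)

# Combine into a DataFrame
df = pd.DataFrame({'Normal': normal_data, 'Skewed': skewed_data})

# Test each column for normality (pingouin uses the Shapiro-Wilk test by default)
results = pg.normality(df, alpha=0.05)
print(results)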

The arguments we pass are:

  • data - the DataFrame (each numerical column is tested separately)
  • alpha=0.05 - the significance level


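Output:

               W          pval  normal
Normal  0.987236  4.536763e-01    True
Skewed  0.830377  2.382588e-09   False

W is the Shapiro-Wilk test statistic (values close to 1 suggest normality), pval is its p-value, and normal is True when pval > alpha. Here the exponential column is clearly rejected as non-normal.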
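The normal column is a simple decision rule: reject normality when the p-value falls below alpha. To see that rule without installing pingouin, here is a minimal standard-library sketch. Note that it swaps in a different test, Jarque-Bera, which compares the sample skewness and excess kurtosis with the value 0 that a Normal distribution has for both; it is not the Shapiro-Wilk test that pingouin runs by default. The function name `jarque_bera`, the seed and the sample size are our own choices, and because Jarque-Bera relies on a large-sample approximation, its p-values are unreliable for small samples.

```python
import math
import random


def jarque_bera(data):
    """Return (JB statistic, p-value) for the Jarque-Bera normality test."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Biased (population) central moments, as used by the JB statistic
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    # Under normality JB is asymptotically chi-squared with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


rng = random.Random(42)
samples = {
    "Normal": [rng.gauss(50, 5) for _ in range(500)],
    "Skewed": [rng.expovariate(1 / 5) for _ in range(500)],  # exponential, mean 5
}
alpha = 0.05
for name, data in samples.items():
    jb, p = jarque_bera(data)
    print(f"{name}: JB={jb:.3f}, pval={p:.3e}, normal={p > alpha}")
```

As with any significance test, a truly normal sample will still be flagged as non-normal about 5% of the time at alpha = 0.05.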

© 2025 SimpleSteps.guide