Bernoulli Distribution
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
The Bernoulli distribution models a random experiment that has only two possible outcomes:
- Success (1) — occurs with probability p
- Failure (0) — occurs with probability 1 − p
It’s the foundation of binary probability modelling — used whenever outcomes are yes/no, true/false, 1/0, or success/failure.
Probability Mass Function (PMF)
P(X = x) = p^x (1 − p)^(1 − x),  for x ∈ {0, 1}
Where:
- X = random variable (0 or 1)
- p = probability of success, i.e., P(X = 1)
The sum of probabilities equals 1:
P(X = 0) + P(X = 1) = 1
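A quick way to check this in code is SciPy's bernoulli distribution; the p = 0.7 below is just an illustrative choice:

```python
from scipy.stats import bernoulli

p = 0.7  # illustrative success probability

print(bernoulli.pmf(1, p))  # P(X = 1) = 0.7
print(bernoulli.pmf(0, p))  # P(X = 0) = 0.3
print(bernoulli.pmf(0, p) + bernoulli.pmf(1, p))  # probabilities sum to 1.0
```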
Examples:
- Coin toss - Heads or Tails
- Email classification - Spam or not spam
- Loan approval - Approved or denied
- Customer purchase - Purchases or doesn't purchase
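The coin-toss case is easy to simulate. A minimal NumPy sketch (a Bernoulli(p) draw is a binomial draw with n = 1; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded only for reproducibility

# 1,000 fair coin tosses: each draw is Bernoulli(p = 0.5)
tosses = rng.binomial(n=1, p=0.5, size=1000)

print(tosses[:10])    # first ten tosses: 0 = tails, 1 = heads
print(tosses.mean())  # sample proportion of heads, close to 0.5
```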
The PMF can be visualised as a simple bar chart over {0, 1}:
- One bar for 0 (failure), with height 1 − p (0.3 for p = 0.7)
- One bar for 1 (success), with height p (0.7 for p = 0.7)
- Probability labels that match your chosen p
The two bars together make up the probability mass function (PMF) for {0, 1}.
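A sketch of how such a chart could be produced with Matplotlib, again assuming p = 0.7:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli

p = 0.7  # change this and the bar heights follow

outcomes = np.array([0, 1])
probs = bernoulli.pmf(outcomes, p)

plt.bar(outcomes, probs, width=0.4, tick_label=["0 (failure)", "1 (success)"])
for x, prob in zip(outcomes, probs):
    plt.text(x, prob + 0.01, f"{prob:.2f}", ha="center")  # label each bar
plt.ylabel("P(X = x)")
plt.title(f"Bernoulli PMF (p = {p})")
plt.show()
```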
In Machine Learning
- Binary classification - Bernoulli models binary outcomes (e.g., spam / not spam)
- Logistic regression - Models the probability of success (1) using a Bernoulli likelihood
- Naive Bayes (BernoulliNB) - Features are binary and modelled with Bernoulli probabilities
- Neural networks - Output activations (sigmoid) approximate Bernoulli probabilities
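To make the logistic-regression point concrete, here is a minimal sketch of the Bernoulli log-likelihood; the scores and labels below are made-up numbers, not output from any real model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, -1.0, 0.5])  # hypothetical linear model outputs
y = np.array([1, 0, 1])              # observed binary labels

p = sigmoid(scores)  # each p[i] is a Bernoulli success probability

# Bernoulli log-likelihood, the quantity logistic regression maximises
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_lik)
```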