Introduction to Statistics for Machine Learning

Maths: Statistics for machine learning


Published Oct 22 2025, updated Oct 23 2025

Machine Learning, Maths, NumPy, Pandas, Python, Statistics

Definition of Statistics

Statistics is the science of collecting, organising, analysing, interpreting, and presenting data.
It provides the foundation for data-driven decision-making, which is essential in machine learning.

Types of Statistics

There are two main types of statistics:

Descriptive Statistics
Inferential Statistics

1. Descriptive Statistics

Descriptive statistics summarise and organise data so that it can be easily understood.
They describe the basic features of a dataset by providing simple summaries of the sample and the quantities measured on it.

They involve:

Measures of Central Tendency – e.g. mean, median, mode
Measures of Dispersion (Variability) – e.g. range, variance, standard deviation
Data Distribution – e.g. histograms, box plots, violin plots
Summary Statistics – e.g. minimum, maximum, quartiles, percentiles
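The measures above can be computed with Python's standard-library statistics module. This is a minimal sketch using a small hypothetical list of exam scores; the data and the describe helper are illustrative, not taken from any real dataset.

```python
import statistics

# Hypothetical exam scores for one class (illustrative data only)
scores = [55, 62, 68, 70, 70, 74, 78, 81, 85, 92]

def describe(data):
    """Return a dictionary of common descriptive statistics for a list of numbers."""
    # Quartile cut points; quantiles() uses the 'exclusive' method by default
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        # Measures of central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Measures of dispersion
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),      # sample standard deviation
        # Summary statistics
        "min": min(data),
        "q1": q1,
        "q3": q3,
        "max": max(data),
    }

for name, value in describe(scores).items():
    print(f"{name:>8}: {value:.2f}")
```

statistics.variance and statistics.stdev use the sample (n - 1) formulas; statistics.pvariance and statistics.pstdev give the population versions. In Pandas, DataFrame.describe() produces a similar one-call summary.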

Example descriptive questions:

What is the average height in a class?
What is the most commonly purchased product in a store?
What is the spread (variability) of exam scores in a class?
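The second question asks for the mode of categorical data. This minimal sketch uses a hypothetical purchase log (the product names and counts are made up) to show how statistics.mode, statistics.multimode and collections.Counter answer it. It also prints a crude text histogram of the distribution.

```python
from collections import Counter
import statistics

# Hypothetical purchase log for one store (illustrative data only)
purchases = ["milk", "bread", "milk", "eggs", "bread", "milk", "apples", "eggs", "milk"]

# The mode also works on categorical (non-numeric) data
most_common = statistics.mode(purchases)

# multimode returns every value tied for the highest count, which is safer when there are ties
all_modes = statistics.multimode(purchases)

# A frequency table shows the whole distribution, not just its peak
frequency = Counter(purchases)

print("Most commonly purchased:", most_common)
print("All modes:", all_modes)
for product, count in frequency.most_common():
    print(f"{product:>7}: {'#' * count}")  # one '#' per purchase: a crude text histogram
```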

2. Inferential Statistics

Inferential statistics involve methods for making predictions, inferences, or generalisations about a population based on a sample of data.
They allow us to test hypotheses and estimate population parameters.

They involve:

Hypothesis Testing
P-Values and Confidence Intervals
Statistical Tests, such as:
- Z-Test
- T-Test
- Chi-Square Test
- ANOVA (Analysis of Variance)
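The sketch below ties hypothesis testing, p-values and confidence intervals together. It runs a two-sided one-sample z-test on hypothetical class heights and asks whether the class mean differs from an assumed population mean of 165 cm, with an assumed known population standard deviation of 8 cm. A z-test needs that known standard deviation. When it must be estimated from a small sample, a t-test (for example scipy.stats.ttest_1samp) is the usual choice. The helper names and all numbers are illustrative.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, pop_mean, pop_std):
    """Two-sided one-sample z-test, assuming the population standard deviation is known."""
    standard_error = pop_std / math.sqrt(len(sample))
    z = (mean(sample) - pop_mean) / standard_error
    # Two-sided p-value: probability under H0 of a |Z| at least as large as the observed one
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def z_confidence_interval(sample, pop_std, confidence=0.95):
    """Confidence interval for the population mean when the population std is known."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z_crit * pop_std / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical sample of heights (cm) from one class
heights = [160, 162, 163, 165, 166, 166, 167, 168,
           168, 169, 170, 171, 172, 173, 173, 175]
# Assumed population values for "other schools" (illustrative, not real data)
POP_MEAN, POP_STD = 165, 8

z, p = one_sample_z_test(heights, POP_MEAN, POP_STD)
low, high = z_confidence_interval(heights, POP_STD)
print(f"z = {z:.2f}, p = {p:.3f}")
print(f"95% CI for the class mean: ({low:.1f}, {high:.1f}) cm")
# H0 (class mean equals the population mean) is rejected at the 5% level only if p < 0.05
print("Reject H0" if p < 0.05 else "Fail to reject H0")
```

With these numbers z = 1.5 and p is about 0.13, so at the 5% level there is not enough evidence that the class differs from the assumed population. Consistently, the 95% interval (roughly 164.1 to 171.9 cm) contains 165.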

Example inferential questions:

Are the average heights in this class similar to the average heights in other schools?
Do customers in different stores tend to buy the same products?
Does a new drug significantly improve recovery rates compared to an existing one?
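The store question can be framed as a chi-square test of independence. If product choice does not depend on the store, each cell count should be close to (row total × column total) / grand total. This minimal sketch handles a 2x2 table with hypothetical counts. Because a chi-square variable with one degree of freedom is the square of a standard normal, the standard-library NormalDist can supply the p-value. No continuity correction is applied. scipy.stats.chi2_contingency handles larger tables, but it applies Yates' correction to 2x2 tables by default, so its results would differ slightly.

```python
import math
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table  # raises ValueError if the table is not 2x2
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if store and product were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = P(|Z| >= sqrt(chi2)) for standard normal Z
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value

# Hypothetical counts: rows are stores, columns are products
observed = [[30, 70],   # Store A: bought product X, bought product Y
            [45, 55]]   # Store B: bought product X, bought product Y
chi2, p = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With these made-up counts chi2 = 4.8 and p is about 0.028. At the 5% level, that would suggest product choice differs between the two stores.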

Machine Learning Context

In machine learning, statistics helps with:

Understanding data distributions before modelling (EDA – Exploratory Data Analysis)
Feature scaling and normalisation
Evaluating model performance using statistical tests
Avoiding overfitting through sampling techniques (e.g. train/test splits and cross-validation) and by validating hypotheses on held-out data
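Feature scaling is a direct application of descriptive statistics. This minimal sketch shows two common rescalings applied to a hypothetical feature: z-score standardisation (subtract the mean, divide by the standard deviation) and min-max normalisation into [0, 1]. The function names are illustrative. scikit-learn's StandardScaler and MinMaxScaler implement the same ideas, and StandardScaler also uses the population standard deviation.

```python
import statistics

def standardise(values):
    """Z-score standardisation: rescale to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumes the feature is not constant (sigma > 0)
    return [(x - mu) / sigma for x in values]

def min_max_normalise(values):
    """Min-max normalisation: rescale linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)  # assumes at least two distinct values
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical feature values (e.g. house sizes in square metres)
sizes = [50, 75, 100, 125, 150]
print(standardise(sizes))
print(min_max_normalise(sizes))
```

In practice, the mean, standard deviation, minimum and maximum should be computed on the training data only and then reused to transform validation and test data. Computing them on the full dataset leaks information and can make results look better than they are.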