Introduction to Statistics for Machine Learning
Published Oct 22 2025, updated Oct 23 2025
Definition of Statistics
Statistics is the science of collecting, organising, analysing, interpreting, and presenting data.
It provides the foundation for data-driven decision-making, which is essential in machine learning.
Types of Statistics
There are two main types of statistics:
- Descriptive Statistics
- Inferential Statistics
1. Descriptive Statistics
Descriptive statistics summarise and organise data so that it can be easily understood.
They describe the basic features of a dataset by giving simple summaries of the sample and its measurements.
They involve:
- Measures of Central Tendency – e.g. mean, median, mode
- Measures of Dispersion (Variability) – e.g. range, variance, standard deviation
- Data Distribution – e.g. histograms, box plots, violin plots
- Summary Statistics – e.g. minimum, maximum, quartiles, percentiles
Example descriptive questions:
- What is the average height in a class?
- What is the most commonly purchased product in a store?
- What is the spread (variability) of exam scores in a class?
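The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch, assuming a small made-up list of exam scores; the data and variable names are illustrative only and are not taken from a real class.

```python
import statistics

# Hypothetical exam scores for one class (illustrative data only).
scores = [55, 62, 62, 70, 74, 78, 81, 85, 90, 93]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Summary statistics: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std_dev={std_dev:.2f}")
print(f"quartiles: Q1={q1}, Q2={q2}, Q3={q3}")
```

Here `statistics.variance` and `statistics.stdev` treat the scores as a sample; `statistics.pvariance` and `statistics.pstdev` would be the choice if the list were the whole population.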
2. Inferential Statistics
Inferential statistics involve methods for making predictions, inferences, or generalisations about a population based on a sample of data.
They allow us to test hypotheses and estimate population parameters.
They involve:
- Hypothesis Testing
- P-Values and Confidence Intervals
- Statistical Tests, such as:
  - Z-Test
  - T-Test
  - Chi-Square Test
  - ANOVA (Analysis of Variance)
Example inferential questions:
- Is the average height in this class similar to the average height in other schools?
- Do customers in different stores tend to buy the same products?
- Does a new drug significantly improve recovery rates compared to an existing one?
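As a rough illustration of the height question, the sketch below runs a two-sided one-sample z-test and builds a confidence interval using only the standard library. It assumes the population mean and standard deviation are known; the figures 168.0 and 7.0, the sample data, and the helper names `one_sample_z_test` and `confidence_interval` are all hypothetical. When the population standard deviation is unknown, a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, pop_mean, pop_sd):
    """Two-sided z-test: does the sample mean differ from pop_mean?

    Assumes the population standard deviation pop_sd is known.
    """
    standard_error = pop_sd / sqrt(len(sample))
    z = (mean(sample) - pop_mean) / standard_error
    # Probability of a result at least this extreme under the null hypothesis.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def confidence_interval(sample, pop_sd, level=0.95):
    """Confidence interval for the population mean, with known pop_sd."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_crit * pop_sd / sqrt(len(sample))
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical heights (cm) of pupils in one class.
heights_cm = [158, 162, 165, 167, 170, 171, 173, 175, 178, 181]

z, p = one_sample_z_test(heights_cm, pop_mean=168.0, pop_sd=7.0)
low, high = confidence_interval(heights_cm, pop_sd=7.0)

print(f"z = {z:.3f}, p-value = {p:.3f}")
print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```

A p-value above the chosen significance level (commonly 0.05) means the sample gives no strong evidence that the class differs from the assumed population mean.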
Machine Learning Context
In machine learning, statistics helps in:
- Understanding data distributions before modelling (EDA – Exploratory Data Analysis)
- Feature scaling and normalisation
- Evaluating model performance using statistical tests
- Detecting and limiting overfitting through resampling (e.g. train/test splits, cross-validation) and validation on held-out data
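Two of these points, feature scaling and holding data back for evaluation, can be sketched together. The example below is a minimal illustration, assuming a single made-up numeric feature; `train_test_split`, `fit_scaler`, and `standardise` are hypothetical helpers written here for illustration, not functions from any particular library.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold back a fraction as a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return the mean and population standard deviation of values."""
    return statistics.mean(values), statistics.pstdev(values)


def standardise(values, mu, sigma):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values (e.g. annual income), illustrative only.
incomes = [21_000, 25_500, 32_000, 38_000, 41_500, 47_000, 52_000, 60_000]

train, test = train_test_split(incomes)

# Fit the scaling statistics on the training data only, then reuse them
# for the test data so no information leaks from the held-out set.
mu, sigma = fit_scaler(train)
train_scaled = standardise(train, mu, sigma)
test_scaled = standardise(test, mu, sigma)

print("train (scaled):", [round(v, 2) for v in train_scaled])
print("test  (scaled):", [round(v, 2) for v in test_scaled])
```

Fitting the scaler on the training split alone is a deliberate choice: it keeps the test set a fair stand-in for unseen data when checking for overfitting.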














