Introduction to Statistics for Machine Learning


Published Oct 22 2025, updated Oct 23 2025



Machine Learning, Maths, NumPy, Pandas, Python, Statistics

Definition of Statistics

Statistics is the science of collecting, organising, analysing, interpreting, and presenting data.
It provides the foundation for data-driven decision making, which is essential in machine learning.




Types of Statistics

There are two main types of statistics:

  1. Descriptive Statistics
  2. Inferential Statistics

1. Descriptive Statistics

Descriptive statistics summarise and organise data so that it can be easily understood.
They describe the basic features of a dataset, giving simple summaries of the sample and its measurements.

They involve:

  • Measures of Central Tendency – e.g. mean, median, mode
  • Measures of Dispersion (Variability) – e.g. range, variance, standard deviation
  • Data Distribution – e.g. histograms, box plots, violin plots
  • Summary Statistics – e.g. minimum, maximum, quartiles, percentiles
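
The measures listed above can be sketched with Python's standard `statistics` module. This is only a minimal illustration: the `scores` list is made-up data, and the variable names are assumptions rather than anything defined in this article.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [55, 62, 62, 70, 71, 75, 78, 84, 90, 93]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Summary statistics: the three quartile cut points.
# method="inclusive" treats the data as spanning the full range of interest.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("min:", min(scores), "quartiles:", (q1, q2, q3), "max:", max(scores))
```

The same quantities are available in NumPy and Pandas (for example through a DataFrame's `describe()` method), which is usually more convenient for larger datasets.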

Example descriptive questions:

  • What is the average height in a class?
  • What is the most commonly purchased product in a store?
  • What is the spread (variability) of exam scores in a class?

2. Inferential Statistics

Inferential statistics involve methods for making predictions, inferences, or generalisations about a population based on a sample of data.
They allow us to test hypotheses and estimate population parameters.

They involve:

  • Hypothesis Testing
  • P-Values and Confidence Intervals
  • Statistical Tests, such as:
    • Z-Test
    • T-Test
    • Chi-Square Test
    • ANOVA (Analysis of Variance)
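
As a rough sketch of how a hypothesis test, a p-value, and a confidence interval fit together, the example below runs a one-sample Z-test with the standard library. The `heights` sample, the hypothesised `population_mean`, and the "known" `population_sd` are all assumptions invented for illustration. When the population standard deviation is unknown, a T-test is normally used instead.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of heights in cm, invented for illustration.
heights = [162, 165, 168, 170, 171, 173, 175, 176, 178, 182]
population_mean = 170  # assumed value under the null hypothesis
population_sd = 6      # assumed to be known, which is what justifies a Z-test

n = len(heights)
sample_mean = statistics.mean(heights)
standard_error = population_sd / math.sqrt(n)

# Z statistic: how many standard errors the sample mean lies from the
# hypothesised population mean.
z = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(0.975)
ci = (sample_mean - z_crit * standard_error,
      sample_mean + z_crit * standard_error)

print("z:", round(z, 3), "p-value:", round(p_value, 3))
print("95% CI:", tuple(round(bound, 2) for bound in ci))
```

With this made-up data the p-value is well above 0.05 and the interval contains 170, so the sample gives no strong evidence against the hypothesised mean.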

Example inferential questions:

  • Is the average height in this class similar to the average height in other schools?
  • Do customers in different stores tend to buy the same products?
  • Does a new drug significantly improve recovery rates compared to an existing one?
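
The second question above (whether different stores sell the same mix of products) is the kind of problem a Chi-Square test of independence addresses. The sketch below computes the statistic by hand. The `observed` counts are hypothetical. The p-value shortcut relies on the fact that, with exactly 2 degrees of freedom, the chi-square survival function is exp(-x / 2), so it does not generalise to other table sizes.

```python
import math

# Hypothetical purchase counts (rows: stores, columns: products).
observed = [
    [30, 20, 10],  # store A
    [20, 25, 15],  # store B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where the expected
# count assumes store and product are independent.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form p-value, valid only when dof == 2.
if dof != 2:
    raise ValueError("this shortcut only applies to 2 degrees of freedom")
p_value = math.exp(-chi_square / 2)

print("chi-square:", round(chi_square, 3), "dof:", dof,
      "p-value:", round(p_value, 3))
```

For tables of other sizes, a statistics library such as SciPy is the usual way to obtain the p-value.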



Machine Learning Context

In machine learning, statistics helps in:

  • Understanding data distributions before modelling (EDA – Exploratory Data Analysis)
  • Feature scaling and normalisation
  • Evaluating model performance and comparing models using statistical tests
  • Detecting and reducing overfitting through resampling (e.g. train/test splits, cross-validation) and validation on held-out data
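
To make the feature scaling point concrete, here is a minimal sketch of two common approaches, standardisation (z-scores) and min-max normalisation, using only the standard library. The `ages` feature is hypothetical data invented for illustration.

```python
import statistics

# Hypothetical feature values, invented for illustration.
ages = [18, 22, 25, 30, 35, 40, 60]

# Standardisation: subtract the mean and divide by the standard deviation,
# giving a feature with mean 0 and standard deviation 1.
mean = statistics.mean(ages)
std = statistics.pstdev(ages)  # population standard deviation
standardised = [(x - mean) / std for x in ages]

# Min-max normalisation: rescale the values into the range [0, 1].
lo, hi = min(ages), max(ages)
normalised = [(x - lo) / (hi - lo) for x in ages]

print([round(v, 2) for v in standardised])
print([round(v, 2) for v in normalised])
```

In a real pipeline the mean, standard deviation, minimum, and maximum are computed on the training data only and then reused to transform validation and test data, so that no information leaks from the held-out sets.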

© 2025 SimpleSteps.guide