Population and Sample Sets
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025

Population
A population is the entire set of individuals, items, or data points that share a common characteristic of interest in a study.
It includes all members of a defined group about which we want to draw conclusions.
Characteristics:
- Complete set: Includes all observations or elements of interest.
- Parameter: A numerical value that describes a characteristic of the population (e.g., population mean μ, population variance σ²).
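For a finite population, the two parameters named above are conventionally defined as follows. The symbols N (the population size) and x_i (the value for the i-th member) are notation introduced here for illustration and do not appear in the original text.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```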
Example populations:
- All students in a school — to calculate the average height of students.
- All stores nationwide — to identify the most purchased product.
- All consumers in a city — to understand purchasing behavior.
- All patients in a hospital — to study the effectiveness of a new drug.
Sample Data
A sample is a subset of the population selected for analysis.
Sampling allows researchers to make inferences about the population without studying every individual, which is often impractical or expensive.
Characteristics:
- Subset: Represents a portion of the population.
- Statistic: A numerical value describing the sample (e.g., sample mean x̄, sample variance s²).
- Random Sampling: Samples should be randomly selected to reduce bias and improve representativeness.
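The sample statistics named above have the same form as their population counterparts, with one common difference: the sample variance usually divides by n - 1 rather than n, which makes it an unbiased estimator of σ². Here n (the sample size) and x_i (the i-th sampled value) are notation assumed for illustration.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```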
Example samples:
- A group of 30 students from a school — to estimate the average student height.
- Four stores across the country — to predict the most purchased product.
- A group of 500 consumers from a city — to estimate city-wide purchasing trends.
- A group of 150 patients — to test a drug’s effectiveness before wider rollout.
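The difference between a parameter and a statistic can be illustrated with a minimal Python sketch that mirrors the student-height example. The heights are synthetic and the figures (1,000 students, mean 165 cm, standard deviation 10 cm, fixed seed) are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 students, synthetic data.
rng = random.Random(42)
population = [rng.gauss(165, 10) for _ in range(1000)]

# Parameters: computed from every member of the population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # divides by N

# Statistics: computed from a simple random sample of 30 students.
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # divides by n - 1

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
print(f"population variance = {sigma2:.2f}, sample variance = {s2:.2f}")
```

The sample values will generally be close to, but not equal to, the population values; that gap is the sampling error that inference has to account for.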
Types of Sampling
Several techniques exist for selecting a sample from a population. They fall into two broad families: probability sampling and non-probability sampling.
The choice depends on the research goal, data availability, and required accuracy.
1. Probability Sampling
Each member of the population has a known and non-zero chance of being selected.
This reduces selection bias and allows for statistical inference.
Common methods:
- Simple Random Sampling: Every member has an equal chance of being selected.
Example: Drawing names out of a hat.
- Systematic Sampling: Selecting every nth member after a random start.
Example: Surveying every 10th customer entering a store.
- Stratified Sampling: Dividing the population into strata (groups) based on shared characteristics, then randomly sampling within each.
Example: Dividing employees by department and randomly selecting from each department.
- Cluster Sampling: Dividing the population into clusters, randomly selecting a few clusters, and surveying all members within them.
Example: Selecting a few schools and surveying all teachers in those schools.
- Multistage Sampling: Combining several sampling methods in stages.
Example: Selecting clusters (schools), then randomly sampling individuals (students) within them.
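The probability methods above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names, the 60-employee dataset, and the department labels are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Every `step`-th member after a random start in [0, step)."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then sample randomly within each group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [member for name in picked for member in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: sample within each."""
    picked = rng.sample(sorted(clusters), n_clusters)
    chosen = []
    for name in picked:
        members = clusters[name]
        chosen.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return chosen

# Hypothetical data: 60 employees spread evenly over three departments.
rng = random.Random(0)
names = ["sales", "engineering", "support"]
employees = [(i, names[i % 3]) for i in range(60)]

# For cluster and multistage sampling, treat each department as a cluster.
departments = {}
for emp in employees:
    departments.setdefault(emp[1], []).append(emp)

srs = simple_random_sample(employees, 10, rng)
every_10th = systematic_sample(employees, 10, rng)
by_department = stratified_sample(employees, lambda e: e[1], 4, rng)
two_departments = cluster_sample(departments, 2, rng)
two_stage = multistage_sample(departments, 2, 5, rng)
```

Note the structural difference between the last three: stratified sampling draws from every group, cluster sampling keeps everyone from a few groups, and multistage sampling draws a few members from a few groups.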
2. Non-Probability Sampling
The probability of selecting any given member is unknown, and some members may have no chance of being selected at all.
These methods are easier and cheaper but may introduce bias, limiting generalisability.
Common methods:
- Convenience Sampling: Selecting individuals that are easiest to reach.
Example: Surveying shoppers in a store.
- Judgmental (Purposive) Sampling: Selecting participants based on the researcher’s judgment or expertise.
Example: Choosing experts in a field for a study.
- Snowball Sampling: Existing participants recruit new ones from their networks.
Example: Asking participants to refer friends or colleagues.
- Quota Sampling: Ensuring certain characteristics are represented by setting quotas (e.g., age, gender), but not selecting participants randomly.
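Quota sampling is the easiest of these to express in code. The sketch below is one possible reading of it: people are accepted in arrival order until each group's quota is full. The function name, the shopper list, and the age bands are hypothetical.

```python
from collections import Counter

def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until each group's quota is full.

    No random selection happens: who gets in depends on who shows up first.
    """
    counts = Counter()
    chosen = []
    for person in arrivals:
        group = group_of(person)
        if counts[group] < quotas.get(group, 0):
            chosen.append(person)
            counts[group] += 1
        # Stop as soon as every quota has been met.
        if all(counts[g] >= q for g, q in quotas.items()):
            break
    return chosen

# Hypothetical shoppers in the order they walk past the interviewer.
shoppers = [
    ("Ana", "18-34"), ("Ben", "35-54"), ("Cho", "18-34"), ("Dev", "55+"),
    ("Eli", "18-34"), ("Fay", "35-54"), ("Gus", "55+"), ("Hana", "35-54"),
]
quotas = {"18-34": 2, "35-54": 2, "55+": 1}

picked = quota_sample(shoppers, lambda s: s[1], quotas)
print([name for name, _ in picked])  # ['Ana', 'Ben', 'Cho', 'Dev', 'Fay']
```

The quotas guarantee the group mix, but within each group the earliest arrivals always win, which is where the bias mentioned above can enter.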














