Measures of Dispersion
Maths: Statistics for machine learning
Published Oct 22 2025, updated Oct 23 2025
Measures of Dispersion describe how spread out or varied the data is in a dataset.
While measures of central tendency (like the mean, median, and mode) tell us the centre of the data, measures of dispersion tell us how far the data values move away from that centre.
In other words:
Central tendency shows where the data is centred.
Dispersion shows how much the data is scattered.
Why It’s Important
- Helps us understand data variability — whether data points are close together or widely spread.
- Two datasets can have the same mean but very different spreads.
- In machine learning, it helps detect outliers, measure data stability, and decide scaling or normalisation methods.
The Main Measures of Dispersion
1. Range
- Definition: The difference between the maximum and minimum values in a dataset.
- Formula: Range = Maximum − Minimum
- Example:
- Data: 10, 12, 15, 18, 20
- Range = 20 − 10 = 10
- Notes:
- Very simple to calculate.
- Highly affected by outliers (extremely high or low values).
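Using the example data above, the range is a one-liner in NumPy (a minimal sketch; `np.ptp`, short for "peak to peak", is a built-in shortcut for max minus min):

```python
import numpy as np

data = np.array([10, 12, 15, 18, 20])

print(data.max() - data.min())  # 20 - 10 = 10
print(np.ptp(data))             # "peak to peak": same result in one call
```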
2. Interquartile Range (IQR)
- Definition: The range between the 25th percentile (Q1) and the 75th percentile (Q3).
It shows the spread of the middle 50% of the data.
- Formula: IQR = Q3 − Q1
- Example:
- Data (ordered): 5, 7, 8, 10, 12, 14, 16, 18
- Q1 = 7.5, Q3 = 15 → IQR = 15 − 7.5 = 7.5
- Notes:
- Largely unaffected by outliers, since it uses only the middle 50% of the data.
- Often used in box plots to show data spread.
- Helps identify outliers (values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR).
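A sketch of the same calculation, using the median-of-halves method from the worked example (note that `np.percentile` uses linear interpolation by default and can give slightly different quartiles):

```python
import numpy as np

data = np.sort(np.array([5, 7, 8, 10, 12, 14, 16, 18]))
half = len(data) // 2

q1 = np.median(data[:half])   # median of lower half -> 7.5
q3 = np.median(data[half:])   # median of upper half -> 15.0
iqr = q3 - q1                 # 7.5

# The 1.5 * IQR fences used by box plots to flag outliers:
lower_fence = q1 - 1.5 * iqr  # -3.75; anything below is a suspected outlier
upper_fence = q3 + 1.5 * iqr  # 26.25; anything above is a suspected outlier
outliers = data[(data < lower_fence) | (data > upper_fence)]  # empty here
```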
3. Variance
- Definition: The average of the squared differences between each data point and the mean.
It tells us how far data points deviate from the mean on average.
- Formula: σ² = Σ(xᵢ − μ)² / n
- Example:
- Data: 2, 4, 6
- Mean = 4
- Variance = ((2−4)² + (4−4)² + (6−4)²) / 3 = (4 + 0 + 4)/3 = 2.67
- Notes:
- The result is in squared units, so not directly comparable to the original data.
- Useful for understanding data spread mathematically.
- The formula above is the population variance (divide by n); the sample variance divides by n − 1 instead (see Bessel's correction below).
4. Standard Deviation
- Definition: The square root of variance.
It shows the average amount that data values deviate from the mean, using the same units as the data.
- Formula: σ = √(Σ(xᵢ − μ)² / n) = √variance
- Example:
- Using the previous data → √2.67 = 1.63
- Notes:
- Most commonly used measure of dispersion.
- Larger standard deviation → more spread out data.
- Smaller standard deviation → values closer to the mean.
5. Coefficient of Variation (CV)
- Definition: The ratio of the standard deviation to the mean, expressed as a percentage.
It allows you to compare variability between datasets with different units or scales.
- Formula: CV = (SD / Mean) × 100%
- Example:
- If mean = 50, SD = 5 → CV = (5 / 50) × 100% = 10%
- Notes:
- Useful for comparing relative variability.
- Commonly used in finance and risk analysis.
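A minimal sketch of the CV (the helper name `cv` is mine, and it uses the population SD). Because the scale cancels out, two datasets that differ only by a constant factor get the same CV:

```python
import numpy as np

def cv(values):
    """Coefficient of variation as a percentage (population SD / mean)."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean() * 100

# The article's example, from summary statistics: mean = 50, SD = 5
print(5 / 50 * 100)  # 10.0 (%)

# From raw data: two datasets on different scales, same relative spread
print(cv([48, 50, 52]))        # small CV: values hug the mean
print(cv([4800, 5000, 5200]))  # identical CV: the scale cancels out
```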
Summary
| Measure | Definition | Formula | Sensitive to Outliers? |
| --- | --- | --- | --- |
| Range | Difference between max and min | Max − Min | Yes |
| IQR | Middle 50% of data | Q3 − Q1 | No |
| Variance | Average of squared deviations | Σ(x−x̄)² / n | Yes |
| Standard Deviation | Average deviation from mean | √Variance | Yes |
| Coefficient of Variation | SD relative to mean | (SD / Mean) × 100 | Yes |
In Machine Learning
- High dispersion → data varies widely, may indicate outliers or noisy data.
- Low dispersion → consistent data, easier for models to learn patterns.
- Used in:
- Feature scaling and normalisation
- Outlier detection
- Feature selection (variance thresholding)
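Variance thresholding can be sketched in plain NumPy (scikit-learn's `VarianceThreshold` does the same job; the matrix and the 0.01 threshold here are arbitrary illustrations):

```python
import numpy as np

# Hypothetical feature matrix: column 1 is constant, so it carries no signal
X = np.array([
    [1.0, 7.0, 0.2],
    [2.0, 7.0, 0.9],
    [3.0, 7.0, 0.4],
    [4.0, 7.0, 0.7],
])

variances = X.var(axis=0)  # per-feature population variance
keep = variances > 0.01    # 0.01 is an arbitrary threshold choice
X_reduced = X[:, keep]     # the zero-variance feature is dropped
print(X_reduced.shape)     # (4, 2)
```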
Bessel’s correction for sample variance
When calculating variance for a population, we divide by n, the total number of items in the population.
When you have data for the entire population, you already know every value.
So when calculating variance, you can use the true mean (μ) and just find how far each value is from that mean.
Because you’re using the actual mean of the population, you don’t need to “adjust” for anything.
Your variance represents the true spread of all values — no bias to correct.
When you only have a sample (a small subset of the population), you don’t know the true mean (μ).
You only have the sample mean (x̄) — an estimate of μ.
That small difference causes a subtle bias:
- The sample mean tends to be closer to the sample data points than the true population mean would be.
- This makes the average squared distances (the variance) a little too small.
So to correct that bias, we divide by (n − 1) instead of n.
This correction is called Bessel’s correction.
In summary:
- Using n−1 gives a better estimate of the true population variance when you only have sample data.
- If you divided by n, your sample variance would consistently underestimate how variable the population really is.
Example:
Say the true population values are 2, 4, 6, 8:
- Population mean (μ) = 5
- Population variance (divide by n = 4):
- ((2−5)² + (4−5)² + (6−5)² + (8−5)²) / 4 = (9 + 1 + 1 + 9) / 4 = 5
Now take a sample of three of those values, say 2, 4, 6:
- Sample mean (x̄) = 4
- If we divide by n = 3: variance = (4 + 0 + 4) / 3 ≈ 2.67
- If we divide by n − 1 = 2: variance = (4 + 0 + 4) / 2 = 4.0
The n−1 version (4.0) is closer to the true population variance (5).
That’s why we use n−1 — it gives a better, unbiased estimate.
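The bias is easy to see in a quick simulation (a sketch; the seed, population size, and sample size are all arbitrary). Averaged over many small samples, dividing by n lands consistently below the true variance, while dividing by n − 1 lands close to it:

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=0.0, scale=1.0, size=100_000)
true_var = population.var()  # population variance (divide by n)

# Draw many small samples and average each variance estimate
samples = rng.choice(population, size=(20_000, 5))
divide_by_n = samples.var(axis=1).mean()                  # biased low
divide_by_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # near true_var
```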
Calculating in Python
NumPy example:
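A minimal sketch using the variance example data from earlier; note that `np.var` and `np.std` default to the population formulas (ddof=0):

```python
import numpy as np

data = np.array([2, 4, 6])

print(np.mean(data))  # 4.0
print(np.var(data))   # population variance (ddof=0): 8/3 ≈ 2.67
print(np.std(data))   # population standard deviation: ≈ 1.63
```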
Pandas example:
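The same data in a pandas Series (a sketch; pandas defaults to the sample formulas, ddof=1):

```python
import pandas as pd

s = pd.Series([2, 4, 6])

print(s.var())        # sample variance (ddof=1): 8/2 = 4.0
print(s.std())        # sample standard deviation: 2.0
print(s.var(ddof=0))  # population variance, matching NumPy: ≈ 2.67
```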
Population and Sample Variance:
NumPy defaults to population variance (divide by n), while Pandas defaults to sample variance (divide by n − 1). Both accept a `ddof` argument to switch between the two.
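The differing defaults side by side (a sketch; `ddof` is the "delta degrees of freedom" subtracted from n in the denominator):

```python
import numpy as np
import pandas as pd

data = [2, 4, 6]

# NumPy: ddof=0 by default (population); pass ddof=1 for sample variance
print(np.var(data))          # 8/3 ≈ 2.67
print(np.var(data, ddof=1))  # 8/2 = 4.0

# Pandas: ddof=1 by default (sample); pass ddof=0 for population variance
print(pd.Series(data).var())        # 4.0
print(pd.Series(data).var(ddof=0))  # ≈ 2.67
```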














