Unsupervised Learning
Scikit-learn Basics
Unlike supervised learning, unsupervised learning deals with unlabeled data: data where we don't have known outcomes or target labels.
The goal is to discover hidden patterns, structures, or relationships within the data itself.
Examples include:
- Grouping customers by purchasing behaviour (clustering)
- Reducing high-dimensional data for visualisation (PCA)
- Detecting anomalies (outlier detection)
Scikit-learn provides a variety of unsupervised algorithms that follow the same familiar interface:
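A minimal sketch of that shared interface, using random placeholder data purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Placeholder data: 100 samples, 4 features, no labels
X = np.random.rand(100, 4)

# Clustering: fit on X only, then get cluster assignments
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Dimensionality reduction: fit on X only, then transform
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
```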
Even though these models don’t use labels (y), they still “learn” from the structure of X.
Key Types of Unsupervised Learning
- Clustering: Groups data points into clusters based on similarity. Examples: KMeans, DBSCAN, Agglomerative Clustering.
- Dimensionality Reduction: Compresses high-dimensional data into fewer features while preserving structure. Examples: PCA (Principal Component Analysis), t-SNE.
- Anomaly Detection: Identifies data points that differ significantly from the majority. Examples: Isolation Forest, One-Class SVM.
Clustering
Clustering attempts to find natural groupings in data.
Each algorithm has a different way of defining what a “cluster” means:
- KMeans: Divides data into k spherical clusters based on distance to centroids.
- DBSCAN: Groups dense regions together and marks sparse points as outliers.
- Hierarchical / Agglomerative: Builds a tree (dendrogram) of clusters.
KMeans Clustering
KMeans is the most widely used clustering algorithm. It partitions the data into k clusters by minimising the within-cluster variance.
Algorithm summary:
1. Choose k cluster centres (centroids).
2. Assign each point to the nearest centroid.
3. Recompute centroids.
4. Repeat until assignments stabilise.
Example:
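A minimal sketch, using synthetic make_blobs data as a stand-in dataset:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 natural groups (assumed for illustration)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit KMeans with k=3; n_init runs several initialisations for stability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])               # cluster assignment for the first 10 points
print(kmeans.cluster_centers_)   # learned centroids
```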

Notes:
- You must specify the number of clusters (n_clusters) in advance.
- Initialisation can affect results; use n_init to improve stability.
- The Elbow Method helps estimate the optimal number of clusters.
The Elbow Method
The Elbow Method evaluates clustering quality across different k values using inertia (sum of squared distances to centroids).
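A sketch of how this might look in code, again assuming synthetic make_blobs data:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Compute inertia for a range of k values
k_values = range(1, 11)
inertias = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Plot inertia against k and look for the "elbow"
plt.plot(k_values, inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.show()
```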

The “elbow” point — where inertia stops decreasing sharply — indicates a good value for k.
Silhouette Score
The silhouette score is a metric used to evaluate how well data has been clustered. It tells you how similar a point is to its own cluster compared to other clusters.
For each point i, the silhouette value is:
s(i) = (b(i) − a(i)) / max(a(i), b(i))
Where:
- a(i) = average distance from point i to all other points in its cluster
- b(i) = minimum average distance from point i to points in any other cluster
Thus:
- s(i) ≈ +1 → good clustering (well separated)
- s(i) ≈ 0 → borderline/overlapping clusters
- s(i) < 0 → bad clustering (misclassified point)
The overall silhouette score is the mean of all s(i) values.
Why Use It With K-Means?
K-means requires choosing k, the number of clusters.
Silhouette score helps determine the best k:
- Higher silhouette score = better clustering structure.
Common process: compute silhouette scores for k = 2…10 and pick the best.
How To Compute Silhouette Score in Python:
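A minimal sketch using silhouette_score from sklearn.metrics, with synthetic data assumed:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Mean silhouette value over all points: closer to 1 is better
score = silhouette_score(X, labels)
print(f"Silhouette score: {score:.3f}")
```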
How To Select the Best 'k' Using Silhouette Analysis:
You can plot the scores:
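A sketch of the loop-and-plot approach, assuming the same kind of synthetic data:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

k_values = range(2, 11)   # silhouette needs at least 2 clusters
scores = []
for k in k_values:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores.append(silhouette_score(X, labels))

# Plot silhouette score against k and pick the k with the highest score
plt.plot(k_values, scores, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Mean silhouette score")
plt.show()

best_k = k_values[scores.index(max(scores))]
print("Best k:", best_k)
```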
How To Interpret Silhouette Scores
| Silhouette Score | Interpretation |
| --- | --- |
| 0.71 – 1.00 | Excellent clustering |
| 0.51 – 0.70 | Good clustering |
| 0.26 – 0.50 | Fair clustering |
| ≤ 0.25 | Poor clustering / possible overlap |
DBSCAN (Density-Based Clustering)
DBSCAN groups together points that are close to each other in dense regions and labels sparse points as outliers.
Advantages:
- Doesn’t require specifying k
- Detects arbitrarily shaped clusters
- Handles outliers naturally
Example:
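A minimal sketch, using the make_moons dataset (an assumption) since its crescent shapes suit a density-based method:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: a shape KMeans struggles with
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# eps = neighbourhood radius, min_samples = points needed for a dense region
db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X_scaled)

print(set(labels))   # cluster ids; -1 marks outliers
```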

Notes:
- eps: neighbourhood radius; smaller values create more clusters.
- min_samples: minimum points required to form a dense region.
- Returns -1 for outlier points.
Can you use Silhouette Score?
Yes, as long as DBSCAN assigns cluster labels.
Important notes:
- DBSCAN labels noise points as -1
- You must remove noise points (-1) before computing silhouette score (because they don’t belong to any cluster)
Example:
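A sketch of the idea, assuming a DBSCAN result on make_moons data:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Keep only points that were assigned to a cluster (label != -1)
mask = labels != -1
if len(set(labels[mask])) >= 2:          # silhouette needs at least 2 clusters
    score = silhouette_score(X[mask], labels[mask])
    print(f"Silhouette (noise excluded): {score:.3f}")
else:
    print("Not enough clusters to compute a silhouette score.")
```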
Dimensionality Reduction
Dimensionality reduction simplifies datasets with many features while preserving important structure or variance.
This is crucial for:
- Visualisation
- Noise reduction
- Improving efficiency
- Handling multicollinearity
Two common methods:
- PCA (Principal Component Analysis) – linear projection maximising variance.
- t-SNE (t-distributed Stochastic Neighbour Embedding) – nonlinear visualisation for complex manifolds.
Principal Component Analysis (PCA)
PCA transforms features into new orthogonal components (linear combinations) ordered by variance.
It’s unsupervised but often used as a preprocessing step before supervised tasks.
Why PCA Is Useful
PCA helps when you have:
- Many features
- Redundant or highly correlated variables
- Noise obscuring patterns
- Algorithms that struggle in high dimensions, e.g. clustering, regression, visualisation
By using PCA, you:
- Simplify the dataset
- Remove collinearity
- Reduce noise
- Improve model efficiency
- Often improve clustering or classification accuracy
- Make it easier to visualise structure, e.g. 2D or 3D plots
How PCA Works
PCA does three main things:
1. Identifies directions of maximum variance
- It finds the axes along which the data spreads out the most.
2. Creates new features (principal components)
- Each component is a weighted combination of the original variables.
- Component 1 → captures the largest amount of variance
- Component 2 → the second largest, orthogonal to Component 1
- And so on…
3. Reduces dimensionality
- Instead of keeping all components, you keep only the first few that explain most of the variation.
For example:
- 50 features → reduce to 5 components but still capture 90–95% of the information.
You can then specify the number of components inside a pipeline, for example:
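A sketch of what such a pipeline could look like (using KMeans as the downstream step is an assumption here):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Scale, reduce to 5 components, then cluster on the reduced data
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=42)),
])
```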
Example:
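A minimal sketch using the Iris dataset (an assumption) reduced to two components:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                      # 4 original features
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```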

Output interpretation:
- The axes (PC1, PC2) are directions of maximum variance.
- Clusters often emerge naturally even without using labels.
Notes:
- PCA requires scaled data (apply StandardScaler first).
- You can check how much variance each component explains (see the snippet below).
- PCA is linear; nonlinear structures may need t-SNE or UMAP.
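A short sketch of checking the explained variance, assuming a PCA fitted on scaled Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_scaled)

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())   # total variance retained
```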
t-SNE for Visualisation
t-SNE (t-distributed Stochastic Neighbour Embedding) is useful for visualising complex, high-dimensional data in 2D or 3D.
Example:
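A minimal sketch using the digits dataset (an assumption):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional digit images reduced to 2D for visualisation
X = load_digits().data
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)

plt.scatter(X_embedded[:, 0], X_embedded[:, 1], s=5)
plt.title("t-SNE projection of the digits dataset")
plt.show()
```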

Notes:
- t-SNE focuses on preserving local structure: similar points stay close.
- It’s non-deterministic; small random changes can alter the output.
- Best used for visualisation, not as input to other models.
Evaluating Unsupervised Models
Evaluating unsupervised models can be tricky because there are no ground-truth labels.
Intrinsic Metrics
These measure internal cohesion and separation between clusters:
- Inertia (for KMeans) - Sum of squared distances of samples to their assigned centroids.
- Silhouette Score - Measures how similar a sample is to its own cluster vs others; closer to 1 = well-separated clusters.
Extrinsic Metrics
When true labels are available (e.g. for benchmarking):
- Adjusted Rand Index (ARI) - Compares clustering results to true labels.
- Normalised Mutual Information (NMI) - Measures shared information between predicted and actual clusters.
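A short sketch of both metrics on hypothetical label vectors, purely for illustration:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Hypothetical true labels vs cluster assignments
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]   # same grouping, different label names

print(adjusted_rand_score(y_true, y_pred))           # 1.0: identical partitions
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0: identical partitions
```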
Anomaly Detection (Overview)
Anomaly detection identifies data points that deviate from the overall pattern. It’s used in fraud detection, quality control, and cybersecurity.
Scikit-learn provides:
- IsolationForest
- OneClassSVM
- LocalOutlierFactor
Example:
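A minimal sketch using IsolationForest on synthetic data (the contamination value is an assumption):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_normal = rng.normal(size=(200, 2))                      # typical points
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))    # scattered anomalies
X_all = np.vstack([X_normal, X_outliers])

# contamination = expected proportion of anomalies (assumed here)
iso = IsolationForest(contamination=0.05, random_state=42)
preds = iso.fit_predict(X_all)                            # 1 = normal, -1 = anomaly

print("Number of flagged anomalies:", (preds == -1).sum())
```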
Anomalies are automatically flagged based on sparse density or isolation.
Comparing and Choosing Unsupervised Methods
| Task | Recommended Method | Notes |
| --- | --- | --- |
| Find groups in numeric data | KMeans | Simple, fast, needs k |
| Find irregular shapes / outliers | DBSCAN | Density-based, no k required |
| Reduce high-dimensional data | PCA | Linear projection, scalable |
| Visualise complex data | t-SNE | Nonlinear, good for 2D visualisation |
| Detect anomalies | IsolationForest | Works well on high-dimensional data |
Best Practices
- Always scale your features before clustering or PCA; unscaled features can dominate results.
- Try multiple clustering algorithms; different models capture different patterns.
- Use visualisation tools (PCA, t-SNE) to inspect structure and separability.
- Don’t overinterpret clusters: unsupervised models find structure, not meaning.
- Evaluate with multiple metrics if you have ground-truth labels.