Hyperparameter Tuning
Scikit-learn Basics
3 min read
Published Nov 17 2025, updated Nov 19 2025
Every machine learning model in Scikit-learn has hyperparameters: configuration settings chosen before training.
They control the model’s structure or learning behavior (e.g., tree depth, regularisation strength, number of neighbours).
Unlike learned parameters (like weights or coefficients), hyperparameters are not learned from data.
Choosing them well can dramatically improve performance — and choosing them poorly can cause overfitting or underfitting.
Scikit-learn provides systematic, automated ways to search for optimal hyperparameter values:
- Grid Search (exhaustive search across predefined values)
- Randomised Search (sampling from parameter distributions)
- Bayesian Optimisation (via external libraries like Optuna — optional)
Parameters vs Hyperparameters
| Type | Example | Set When | Learned By |
| --- | --- | --- | --- |
| Model parameter | Regression coefficients (`coef_`) | During training | The model |
| Hyperparameter | `max_depth`, `C`, `n_neighbors` | Before training | You (via tuning) |
Example:
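For instance, a minimal sketch assuming a `DecisionTreeClassifier` (the specific estimator is an illustrative choice):

```python
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters are passed to the constructor, before any training happens
model = DecisionTreeClassifier(max_depth=5, min_samples_split=10)
```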
Here, max_depth and min_samples_split are hyperparameters that control model complexity.
Why Tune Hyperparameters?
Proper tuning ensures that:
- The model generalises better to unseen data
- Performance is optimised without overfitting
- You find the right bias-variance balance
Example:
A RandomForest with too few trees might underfit; too many could overfit or waste resources.
Hyperparameter tuning helps find the sweet spot.
Manual Search (Baseline Approach)
You can start by manually trying a few configurations and comparing validation scores:
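A minimal sketch, assuming a `RandomForestClassifier` and the built-in breast cancer dataset (both are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load data and hold out a validation split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Manually try a few configurations and compare validation accuracy
for max_depth in [3, 5, None]:
    model = RandomForestClassifier(max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    print(f"max_depth={max_depth}: validation accuracy = {model.score(X_val, y_val):.3f}")
```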
This simple process can guide your initial hyperparameter ranges for grid or randomised search.
Grid Search (Exhaustive Search)
Grid Search evaluates every combination of parameter values across a specified grid. It’s thorough but can be slow on large grids.
Example: GridSearchCV
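A sketch of `GridSearchCV` with a `RandomForestClassifier` (the estimator and grid values are illustrative; `X_train` and `y_train` come from the manual-search sketch above):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid of candidate hyperparameter values
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

# Evaluate every combination with 5-fold cross-validation
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best cross-validation score:", grid_search.best_score_)
```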
Notes:
- `cv=5` → 5-fold cross-validation.
- `scoring='accuracy'` can be replaced with any metric (`f1`, `roc_auc`, `r2`, etc.).
- `n_jobs=-1` uses all CPU cores for parallel computation.
After finding the best model:
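For example, continuing from the `grid_search` object in the sketch above:

```python
# The best estimator is refitted on the full training data by default
best_model = grid_search.best_estimator_

# Evaluate it on the held-out validation data
print("Validation accuracy:", best_model.score(X_val, y_val))
```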
Randomised Search (Efficient Sampling)
When the parameter space is large, RandomizedSearchCV samples random combinations instead of trying all possible ones.
This often finds good results faster and works well for large models (e.g., gradient boosting).
Example:
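A sketch using `RandomizedSearchCV` with a `GradientBoostingClassifier` (the estimator and ranges are illustrative; `X_train` and `y_train` come from the sketches above):

```python
from scipy.stats import randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative distributions and lists to sample from
param_distributions = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 8),
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

# Sample 20 random combinations instead of trying every one
random_search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Best cross-validation score:", random_search.best_score_)
```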
Notes:
- `n_iter` controls how many random samples to test.
- You can pass distributions (`scipy.stats.randint`, `uniform`) for continuous ranges.
- Randomised search is ideal when grid search is too expensive.
Nested Cross-Validation (Advanced)
When hyperparameter tuning is part of model selection, nested cross-validation provides an unbiased estimate of generalisation.
It runs an inner loop for tuning and an outer loop for evaluation:
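A minimal sketch, assuming an `SVC` and the `X`, `y` arrays loaded above (both are illustrative choices):

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Inner loop: hyperparameter tuning via grid search (illustrative grid)
inner_search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]},
    cv=3,
)

# Outer loop: unbiased estimate of generalisation performance
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print("Nested CV accuracy: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))
```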
Why nested CV?
It avoids data leakage between tuning and validation by ensuring the outer test folds never influence hyperparameter choices.
Custom Scoring Functions
You can customise evaluation metrics for tuning by using make_scorer.
Example (optimise F1-score for binary classification):
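A sketch assuming a `LogisticRegression` and the training data from the earlier sketches (the estimator and grid are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Wrap f1_score so the search optimises F1 instead of accuracy
f1_scorer = make_scorer(f1_score)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative values
    scoring=f1_scorer,
    cv=5,
)
grid.fit(X_train, y_train)
print("Best C:", grid.best_params_["C"])
```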
Scikit-learn also provides built-in scoring names:
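One way to list them (assuming scikit-learn 1.0 or newer, which exposes `get_scorer_names`):

```python
from sklearn.metrics import get_scorer_names

# Print every scoring string accepted by the `scoring` parameter
print(sorted(get_scorer_names()))
```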
You’ll see metrics like 'accuracy', 'roc_auc', 'neg_mean_squared_error', etc.
Practical Tips for Efficient Tuning
- Start simple - Begin with a few important hyperparameters; expand only if needed.
- Use Randomised Search for large grids - Often finds comparable results much faster.
- Parallelise - Set `n_jobs=-1` to use all available cores.
- Use cross-validation - Ensures your tuning isn't biased by a lucky train/test split.
- Cache results (optional) - Use `joblib` to store models during search, especially for large datasets.
- Monitor runtime - Some models (e.g., SVMs, gradient boosting) scale poorly with very large parameter grids.
Example: Full Tuning Workflow
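A sketch of an end-to-end workflow (the dataset, estimator, and search ranges are illustrative): load the data, hold out a test set, tune on the training data with cross-validation, then evaluate once on the test set.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# 1. Load data and hold out a final test set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Tune with randomised search + cross-validation on the training data only
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(100, 500),   # illustrative ranges
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": randint(2, 11),
    },
    n_iter=25,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)

# 3. Evaluate the best model once on the untouched test set
print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```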
When and How to Stop Tuning
- If tuning multiple algorithms, use a coarse search first to identify strong candidates.
- Once the best model type is known, refine with tighter parameter ranges.
- Avoid re-tuning endlessly; use validation performance as your stopping point.
- Keep test data fully unseen until final confirmation.