Tuning Hyperparameters
Keras Basics
Published Nov 17 2025
Hyperparameters control how a neural network learns. They are not learned from the data; you must choose them explicitly.
The most important hyperparameters are:
- Learning rate
- Batch size
- Network depth & width
- Activation functions
- Optimisers
- Dropout rates
- Regularisation strengths
This section shows you practical ways to tune them using Keras tools and general guidelines.
Learning Rate (Most Important Hyperparameter)
Learning rate (LR) controls how big the weight updates are.
If LR is too high:
- Model diverges
- Loss jumps around
- Accuracy collapses
If LR is too low:
- Training is slow
- Model gets stuck in poor local minima
Common Learning Rates
| Task | Recommended LR |
| --- | --- |
| Simple dense/MLP models | 1e-3 |
| CNNs | 1e-3 → 1e-4 |
| Transfer learning | 1e-4 → 1e-5 |
| Fine-tuning pre-trained layers | 1e-5 → 1e-6 |
| RNN/LSTM/GRU | 1e-3 |
| GANs | 1e-4 |
How to Set LR in Keras
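A minimal sketch, assuming `model` is an already-built Keras model and the loss shown is just a placeholder for your task:

```python
from tensorflow import keras

# The learning rate is set on the optimiser passed to model.compile()
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",  # placeholder loss for your task
    metrics=["accuracy"],
)
```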
Learning Rate Finder (Fast & Effective)
Increase learning rate gradually and plot loss.
Keras implementation:
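A rough epoch-level sketch using a `LearningRateScheduler` callback that sweeps the LR exponentially from 1e-6 to 1; `model`, `x_train` and `y_train` are placeholders for your own model and data:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

start_lr, end_lr, epochs = 1e-6, 1.0, 8

# Exponentially increase the LR each epoch: start_lr at epoch 0, end_lr at the last epoch
lr_sweep = keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: start_lr * (end_lr / start_lr) ** (epoch / (epochs - 1))
)

model.compile(optimizer=keras.optimizers.Adam(), loss="sparse_categorical_crossentropy")
history = model.fit(x_train, y_train, epochs=epochs, callbacks=[lr_sweep])

# Plot loss against the learning rate used in each epoch
lrs = start_lr * (end_lr / start_lr) ** (np.arange(epochs) / (epochs - 1))
plt.semilogx(lrs, history.history["loss"])
plt.xlabel("learning rate")
plt.ylabel("training loss")
plt.show()
```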
Train for ~5–8 epochs and note where the loss drops fastest; a learning rate around (or slightly below) that point is usually a good starting choice.
Batch Size
Batch size controls how many samples are processed before updating weights.
Common batch sizes:
- 32 → good default
- 64 → faster, more stable
- 128–256 → good on GPU
- 8–16 → useful for small datasets or noisy gradients
Effects of Batch Size
Large batch sizes:
- Faster training per epoch
- Smoother gradients
- Can require lower LR
- May generalise slightly worse
Small batch sizes:
- More noisy gradients
- Longer training
- May escape local minima better
- Often generalise better
Modify Batch Size in Keras
Just change the `batch_size` argument in `model.fit()`:
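For example, assuming `x_train`/`y_train` NumPy arrays:

```python
model.fit(x_train, y_train, epochs=10, batch_size=64)  # was batch_size=32
```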
Or for TF Datasets:
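With `tf.data`, batching is applied to the dataset itself rather than passed to `fit()`:

```python
import tensorflow as tf

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)

model.fit(train_ds, epochs=10)
```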
Model Depth & Width
Two major architectural hyperparameters:
- Depth → number of layers
- Width → number of units (neurons) per layer
General guidelines:
- MLP on tabular data - 2–4 Dense layers
- CNN on simple images - 4–8 CNN layers
- CNN on complex images - 10–50+ layers
- LSTM/GRU models - 1–3 layers
Width:
- 32–128 for small models
- 128–512 for medium models
- 512–2048 for advanced models
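As an illustration, a medium-sized MLP with a depth of 3 hidden layers and a width of 256 units; the 20 input features and 10 output classes are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                # placeholder: 20 input features
    layers.Dense(256, activation="relu"),    # depth = 3 hidden layers,
    layers.Dense(256, activation="relu"),    # width = 256 units each
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # placeholder: 10 classes
])
```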
Activation Functions
Standard choices:
- ReLU → best default for hidden layers
- Leaky ReLU → use if ReLU units die (get stuck at zero output)
- sigmoid → binary classification output
- softmax → multi-class classification output
- tanh → sometimes for RNNs
Set activations like:
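For example (layer sizes are illustrative):

```python
from tensorflow.keras import layers

hidden = layers.Dense(128, activation="relu")             # default for hidden layers
out_binary = layers.Dense(1, activation="sigmoid")        # binary classification output
out_multiclass = layers.Dense(10, activation="softmax")   # multi-class output

# Leaky ReLU is usually applied as its own layer after a linear Dense layer
leaky_hidden = [layers.Dense(128), layers.LeakyReLU()]
```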
Optimisers
Optimiser choice strongly affects performance.
Recommended:
- Adam - Default for most tasks
- RMSprop - RNNs, LSTMs
- SGD + momentum - Large CNNs, fine-tuning
- AdamW - Better regularisation, often improved stability
Example:
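A sketch of the options; the learning rates are typical starting points, and `AdamW` needs a reasonably recent TensorFlow/Keras release:

```python
from tensorflow import keras

adam = keras.optimizers.Adam(learning_rate=1e-3)
rmsprop = keras.optimizers.RMSprop(learning_rate=1e-3)
sgd = keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)
adamw = keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4)

model.compile(optimizer=adam, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```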
Dropout Rate
Too little → overfitting
Too much → underfitting
Guidelines:
- Dense layers - 0.3–0.6
- CNNs - 0.1–0.4
- RNNs (recurrent_dropout) - 0.1–0.3
Example:
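A sketch with illustrative sizes; the input shape and class count are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),                     # dense layers: 0.3–0.6
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(10, activation="softmax"),
])

# For recurrent layers, dropout is set on the layer itself
lstm = layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2)
```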
L2 Regularisation
Use L2 when the model is large or dataset is small.
Typical values:
- 1e-4
- 1e-3
Example:
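A sketch using `kernel_regularizer`; the layer width is illustrative:

```python
from tensorflow.keras import layers, regularizers

dense = layers.Dense(
    256,
    activation="relu",
    kernel_regularizer=regularizers.l2(1e-4),  # typical values: 1e-4 or 1e-3
)
```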
Using KerasTuner (Automated Hyperparameter Search)
Install:
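The package is published on PyPI as keras-tuner:

```bash
pip install keras-tuner
```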
Example tuner:
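A small sketch that searches over layer width, dropout rate and learning rate; the 20-feature input and 10-class output are placeholders for your own data:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential([
        keras.Input(shape=(20,)),  # placeholder input size
        layers.Dense(
            hp.Int("units", min_value=64, max_value=512, step=64),
            activation="relu",
        ),
        layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
        layers.Dense(10, activation="softmax"),  # placeholder class count
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
        ),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,
    directory="tuner_logs",
    project_name="mlp_tuning",
)
```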
Run search:
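Continuing from the tuner above, with `x_train`/`y_train` as placeholders:

```python
from tensorflow import keras

tuner.search(
    x_train, y_train,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(patience=3)],
)

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.get_best_models(num_models=1)[0]
print(best_hp.values)
```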
Practical Hyperparameter Tuning Strategy
A simple yet powerful workflow:
- Step 1 — Start with a reasonable architecture - Don't start tuning while the model is obviously too small or too large.
- Step 2 — Tune learning rate first - Use LR Finder or try {1e-2, 1e-3, 1e-4}.
- Step 3 — Tune batch size - Try {32, 64, 128}.
- Step 4 — Tune model capacity - Increase width/depth until you overfit.
- Step 5 — Tune regularisation (dropout, L2) - Add only if needed.
- Step 6 — Try a different optimiser - Adam → AdamW → SGD momentum.
- Step 7 — Use callbacks - EarlyStopping + ReduceLROnPlateau is often enough (see the sketch below).
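A minimal callbacks setup for Step 7, again assuming `model`, `x_train` and `y_train` are your own model and data:

```python
from tensorflow import keras

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
]

model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=callbacks)
```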
Example: Systematic Tuning
Initial settings:
- LR = 1e-3
- batch = 32
- units = 128
- no dropout
- Observed overfitting (validation loss increases) → add dropout 0.3
- Observed unstable training → lower LR to 1e-4
- Observed slow convergence → increase batch size to 64
- Observed a plateau → add ReduceLROnPlateau
This iterative workflow is how most practitioners tune models.