Tuning Hyperparameters
Keras Basics
Published Nov 17 2025
Hyperparameters control how a neural network learns. They are not learned from the data; you must choose them explicitly.
The most important hyperparameters are:
- Learning rate
- Batch size
- Network depth & width
- Activation functions
- Optimisers
- Dropout rates
- Regularisation strengths
This section shows you practical ways to tune them using Keras tools and general guidelines.
Learning Rate (Most Important Hyperparameter)
Learning rate (LR) controls how big the weight updates are.
If LR is too high:
- Model diverges
- Loss jumps around
- Accuracy collapses
If LR is too low:
- Training is slow
- Model gets stuck in poor local minima
Common Learning Rates
| Task | Recommended LR |
| --- | --- |
| Simple dense/MLP models | 1e-3 |
| CNNs | 1e-3 → 1e-4 |
| Transfer learning | 1e-4 → 1e-5 |
| Fine-tuning pre-trained layers | 1e-5 → 1e-6 |
| RNN/LSTM/GRU | 1e-3 |
| GANs | 1e-4 |
How to Set LR in Keras
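A minimal sketch, assuming `model` is an already-built Keras model and the loss shown is just a placeholder for your task:

```python
from tensorflow import keras

# The learning rate is set on the optimiser passed to model.compile()
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",  # placeholder loss for your task
    metrics=["accuracy"],
)
```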
Learning Rate Finder (Fast & Effective)
Increase learning rate gradually and plot loss.
Keras implementation:
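A rough epoch-level sketch using a `LearningRateScheduler` callback that sweeps the LR exponentially from 1e-6 to 1; `model`, `x_train` and `y_train` are placeholders for your own model and data:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

start_lr, end_lr, epochs = 1e-6, 1.0, 8

# Exponentially increase the LR each epoch: start_lr at epoch 0, end_lr at the last epoch
lr_sweep = keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: start_lr * (end_lr / start_lr) ** (epoch / (epochs - 1))
)

model.compile(optimizer=keras.optimizers.Adam(), loss="sparse_categorical_crossentropy")
history = model.fit(x_train, y_train, epochs=epochs, callbacks=[lr_sweep])

# Plot loss against the learning rate used in each epoch
lrs = start_lr * (end_lr / start_lr) ** (np.arange(epochs) / (epochs - 1))
plt.semilogx(lrs, history.history["loss"])
plt.xlabel("learning rate")
plt.ylabel("training loss")
plt.show()
```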
Train for ~5–8 epochs and note where the loss drops fastest; a learning rate around (or slightly below) that point is usually a good starting choice.
Batch Size
Batch size controls how many samples are processed before updating weights.
Common batch sizes:
- 32 → good default
- 64 → faster, more stable
- 128–256 → good on GPU
- 8–16 → useful for small datasets or noisy gradients
Effects of Batch Size
Large batch sizes:
- Faster training per epoch
- Smoother gradients
- Can require lower LR
- May generalise slightly worse
Small batch sizes:
- More noisy gradients
- Longer training
- May escape local minima better
- Often generalise better
Modify Batch Size in Keras
Just change the `batch_size` argument in `model.fit()`:
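For example, assuming `x_train`/`y_train` NumPy arrays:

```python
model.fit(x_train, y_train, epochs=10, batch_size=64)  # was batch_size=32
```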
Or for TF Datasets:
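With `tf.data`, batching is applied to the dataset itself rather than passed to `fit()`:

```python
import tensorflow as tf

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)

model.fit(train_ds, epochs=10)
```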
Model Depth & Width
Two major architectural hyperparameters:
- Depth → number of layers
- Width → number of units (neurons) per layer
General guidelines:
- MLP on tabular data - 2–4 Dense layers
- CNN on simple images - 4–8 CNN layers
- CNN on complex images - 10–50+ layers
- LSTM/GRU models - 1–3 layers
Width:
- 32–128 for small models
- 128–512 for medium models
- 512–2048 for advanced models
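As an illustration, a medium-sized MLP with a depth of 3 hidden layers and a width of 256 units; the 20 input features and 10 output classes are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                # placeholder: 20 input features
    layers.Dense(256, activation="relu"),    # depth = 3 hidden layers,
    layers.Dense(256, activation="relu"),    # width = 256 units each
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # placeholder: 10 classes
])
```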
Activation Functions
Standard choices:
- ReLU → best default for hidden layers
- Leaky ReLU → use if ReLU units die (get stuck at zero output)
- sigmoid → binary classification output
- softmax → multi-class classification output
- tanh → sometimes for RNNs
Set activations like:
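For example (layer sizes are illustrative):

```python
from tensorflow.keras import layers

hidden = layers.Dense(128, activation="relu")             # default for hidden layers
out_binary = layers.Dense(1, activation="sigmoid")        # binary classification output
out_multiclass = layers.Dense(10, activation="softmax")   # multi-class output

# Leaky ReLU is usually applied as its own layer after a linear Dense layer
leaky_hidden = [layers.Dense(128), layers.LeakyReLU()]
```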
Optimisers
Optimiser choice strongly affects performance.
Recommended:
- Adam - Default for most tasks
- RMSprop - RNNs, LSTMs
- SGD + momentum - Large CNNs, fine-tuning
- AdamW - Better regularisation, often improved stability
Example:
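A sketch of the options; the learning rates are typical starting points, and `AdamW` needs a reasonably recent TensorFlow/Keras release:

```python
from tensorflow import keras

adam = keras.optimizers.Adam(learning_rate=1e-3)
rmsprop = keras.optimizers.RMSprop(learning_rate=1e-3)
sgd = keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)
adamw = keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4)

model.compile(optimizer=adam, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```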
Dropout Rate
Too little → overfitting
Too much → underfitting
Guidelines:
- Dense layers - 0.3–0.6
- CNNs - 0.1–0.4
- RNNs (recurrent_dropout) - 0.1–0.3
Example:
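A sketch with illustrative sizes; the input shape and class count are placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),                     # dense layers: 0.3–0.6
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(10, activation="softmax"),
])

# For recurrent layers, dropout is set on the layer itself
lstm = layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2)
```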
L2 Regularisation
Use L2 when the model is large or dataset is small.
Typical values:
- 1e-4
- 1e-3
Example:
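A sketch using `kernel_regularizer`; the layer width is illustrative:

```python
from tensorflow.keras import layers, regularizers

dense = layers.Dense(
    256,
    activation="relu",
    kernel_regularizer=regularizers.l2(1e-4),  # typical values: 1e-4 or 1e-3
)
```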
Using KerasTuner (Automated Hyperparameter Search)
Install:
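The package is published on PyPI as keras-tuner:

```bash
pip install keras-tuner
```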
Example tuner:
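A small sketch that searches over layer width, dropout rate and learning rate; the 20-feature input and 10-class output are placeholders for your own data:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential([
        keras.Input(shape=(20,)),  # placeholder input size
        layers.Dense(
            hp.Int("units", min_value=64, max_value=512, step=64),
            activation="relu",
        ),
        layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
        layers.Dense(10, activation="softmax"),  # placeholder class count
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
        ),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,
    directory="tuner_logs",
    project_name="mlp_tuning",
)
```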
Run search:
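Continuing from the tuner above, with `x_train`/`y_train` as placeholders:

```python
from tensorflow import keras

tuner.search(
    x_train, y_train,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(patience=3)],
)

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.get_best_models(num_models=1)[0]
print(best_hp.values)
```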
Practical Hyperparameter Tuning Strategy
A simple yet powerful workflow:
- Step 1 — Start with a reasonable architecture - Don't start tuning while the model is obviously too small or too large.
- Step 2 — Tune learning rate first - Use LR Finder or try {1e-2, 1e-3, 1e-4}.
- Step 3 — Tune batch size - Try {32, 64, 128}.
- Step 4 — Tune model capacity - Increase width/depth until you overfit.
- Step 5 — Tune regularisation (dropout, L2) - Add only if needed.
- Step 6 — Try a different optimiser - Adam → AdamW → SGD momentum.
- Step 7 — Use callbacks - EarlyStopping + ReduceLROnPlateau is often enough (see the sketch below).
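A minimal callbacks setup for Step 7, again assuming `model`, `x_train` and `y_train` are your own model and data:

```python
from tensorflow import keras

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
]

model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=callbacks)
```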
Example: Systematic Tuning
Initial settings:
- LR = 1e-3
- batch = 32
- units = 128
- no dropout
- Observed overfitting (validation loss increases) → add dropout 0.3
- Observed unstable training → lower LR to 1e-4
- Observed slow convergence → increase batch size to 64
- Observed a plateau → add ReduceLROnPlateau
This iterative workflow is how most practitioners tune models.