Tuning Hyperparameters

Keras Basics

3 min read

Published Nov 17 2025



Keras, Neural Networks, Python, TensorFlow

Hyperparameters control how a neural network learns. They are not learned from data; you must choose them explicitly.


The most important hyperparameters are:

  1. Learning rate
  2. Batch size
  3. Network depth & width
  4. Activation functions
  5. Optimisers
  6. Dropout rates
  7. Regularisation strengths

This section shows you practical ways to tune them using Keras tools and general guidelines.






Learning Rate (Most Important Hyperparameter)

Learning rate (LR) controls how big the weight updates are.


If LR is too high:

  • Model diverges
  • Loss jumps around
  • Accuracy collapses

If LR is too low:

  • Training is slow
  • Model gets stuck in poor local minima

Common Learning Rates

Recommended starting points by task:

  • Simple dense/MLP models - 1e-3
  • CNNs - 1e-3 → 1e-4
  • Transfer learning - 1e-4 → 1e-5
  • Fine-tuning pre-trained layers - 1e-5 → 1e-6
  • RNN/LSTM/GRU - 1e-3
  • GANs - 1e-4



How to Set LR in Keras

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss=..., metrics=...)



Learning Rate Finder (Fast & Effective)

Increase the learning rate gradually over a short run and plot the loss.

Keras implementation:

tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-6 * 10**epoch)

Train for ~5–8 epochs and note where the loss drops fastest, just before it starts to diverge.
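A minimal end-to-end sketch of this workflow, assuming an already compiled model and a batched train_ds (both names are placeholders for your own objects):

import matplotlib.pyplot as plt
import tensorflow as tf

# Multiply the learning rate by 10 each epoch: 1e-6, 1e-5, ..., 1e-1
lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-6 * 10 ** epoch)

history = model.fit(train_ds, epochs=6, callbacks=[lr_schedule])

# Plot loss against the learning rate used in each epoch and pick a value
# just below the point where the loss stops improving or blows up
lrs = [1e-6 * 10 ** epoch for epoch in range(6)]
plt.plot(lrs, history.history["loss"])
plt.xscale("log")
plt.xlabel("learning rate")
plt.ylabel("training loss")
plt.show()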






Batch Size

Batch size controls how many samples are processed before updating weights.


Common batch sizes:

  • 32 → good default
  • 64 → faster, more stable
  • 128–256 → good on GPU
  • 8–16 → useful for small datasets or noisy gradients


Effects of Batch Size

Large batch sizes:

  • Faster training per epoch
  • Smoother gradients
  • Often work best with a proportionally higher LR
  • May generalise slightly worse

Small batch sizes:

  • More noisy gradients
  • Longer training
  • May escape local minima better
  • Often generalise better


Modify Batch Size in Keras

When training on NumPy arrays (x_train and y_train here), set the batch size directly in fit:

model.fit(x_train, y_train, batch_size=64, epochs=10)

When training on a tf.data Dataset, batch the dataset itself instead (fit does not accept a batch_size argument for dataset inputs):

train_ds = train_ds.batch(64)
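For reference, a typical way to build such a dataset from in-memory arrays (x_train and y_train are placeholder names), with the batch size set once in the pipeline:

import tensorflow as tf

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))  # x_train/y_train: NumPy arrays
    .shuffle(10_000)             # shuffle before batching
    .batch(64)                   # the batch-size hyperparameter lives here
    .prefetch(tf.data.AUTOTUNE)  # overlap input preparation with training
)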






Model Depth & Width

Two major architectural hyperparameters:

  • Depth → number of layers
  • Width → number of neurons per layer

General guidelines:

  • MLP on tabular data - 2–4 Dense layers
  • CNN on simple images - 4–8 CNN layers
  • CNN on complex images - 10–50+ layers
  • LSTM/GRU models - 1–3 layers

Width:

  • 32–128 for small models
  • 128–512 for medium models
  • 512–2048 for advanced models
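To make this concrete, a sketch of a small MLP where depth is the number of stacked Dense layers and width is their units argument (the 20-feature input shape is just an example):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(20,)),             # example input: 20 features
    layers.Dense(128, activation="relu"),  # width = 128
    layers.Dense(128, activation="relu"),  # depth = 3 hidden layers
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"), # binary output
])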





Activation Functions

Standard choices:

  • ReLU → best default for hidden layers
  • Leaky ReLU → use if ReLU units die (stop firing and pass zero gradient)
  • sigmoid → binary classification output
  • softmax → multi-class classification output
  • tanh → sometimes for RNNs

Set activations like:

layers.Dense(128, activation="relu")
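Leaky ReLU is typically added as its own layer after a linear Dense layer, and the output activation follows the task; a short sketch:

from tensorflow.keras import layers

hidden = layers.Dense(128, activation="relu")               # standard hidden layer
leaky = [layers.Dense(128), layers.LeakyReLU()]             # leaky ReLU as a separate layer
binary_output = layers.Dense(1, activation="sigmoid")       # binary classification
multiclass_output = layers.Dense(10, activation="softmax")  # e.g. 10 classes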





Optimisers

Optimiser choice strongly affects performance.

Recommended:

  • Adam - Default for most tasks
  • RMSprop - RNNs, LSTMs
  • SGD + momentum - Large CNNs, fine-tuning
  • AdamW - Better regularisation, often improved stability

Example:

optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-5)
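For comparison, the SGD-with-momentum option mentioned above; the values are common starting points rather than recommendations:

import tensorflow as tf

# SGD with Nesterov momentum, often used for large CNNs and fine-tuning
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9, nesterov=True)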





Dropout Rate

Too little → overfitting
Too much → underfitting


Guidelines:

  • Dense layers - 0.3–0.6
  • CNNs - 0.1–0.4
  • RNNs (recurrent_dropout) - 0.1–0.3

Example:

layers.Dropout(0.5)
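For the RNN case in the list above, dropout is set through the layer's own arguments rather than a separate Dropout layer; a brief sketch:

from tensorflow.keras import layers

dense_dropout = layers.Dropout(0.5)                         # between Dense layers
conv_dropout = layers.Dropout(0.2)                          # lighter dropout for CNN blocks
lstm = layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2)  # input + recurrent dropout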





L2 Regularisation

Use L2 when the model is large or the dataset is small.


Typical values:

  • 1e-4
  • 1e-3

Example:

layers.Dense(128, kernel_regularizer=tf.keras.regularizers.l2(1e-4))





Using KerasTuner (Automated Hyperparameter Search)

Install:

pip install keras-tuner

Example tuner:

from tensorflow import keras
from tensorflow.keras import layers
from keras_tuner import HyperModel, RandomSearch

class MyModel(HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        
        # Tune width
        model.add(layers.Dense(
            hp.Int('units', 32, 256, step=32),
            activation="relu"
        ))
        
        # Tune dropout
        model.add(layers.Dropout(hp.Float('dropout', 0.1, 0.5)))
        
        model.add(layers.Dense(1, activation="sigmoid"))
        
        # Tune learning rate
        lr = hp.Choice("lr", [1e-2, 1e-3, 1e-4])
        model.compile(optimizer=keras.optimizers.Adam(lr),
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

tuner = RandomSearch(
    MyModel(),
    objective="val_accuracy",
    max_trials=10
)

Run search:

tuner.search(train_ds, validation_data=val_ds, epochs=5)
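When the search finishes, pull the best configuration back out of the tuner; a small follow-up sketch reusing the tuner and val_ds from above:

# Best hyperparameter values found during the search
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # dict of the chosen units / dropout / lr

# Best trained model from the search, evaluated on the validation set
best_model = tuner.get_best_models(num_models=1)[0]
best_model.evaluate(val_ds)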





Practical Hyperparameter Tuning Strategy

A simple yet powerful workflow:

  • Step 1 — Start with a reasonable architecture - Don’t start tuning while the model is obviously too small or too large.
  • Step 2 — Tune learning rate first - Use LR Finder or try {1e-2, 1e-3, 1e-4}.
  • Step 3 — Tune batch size - Try {32, 64, 128}.
  • Step 4 — Tune model capacity - Increase width/depth until you overfit.
  • Step 5 — Tune regularisation (dropout, L2) - Add only if needed.
  • Step 6 — Try a different optimiser - Adam → AdamW → SGD momentum.
  • Step 7 — Use callbacks - EarlyStopping + ReduceLROnPlateau is often enough.
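A typical Step 7 setup, assuming a compiled model plus train_ds and val_ds as in the earlier examples (the patience values are just reasonable defaults):

import tensorflow as tf

callbacks = [
    # Stop once val_loss has not improved for 5 epochs and keep the best weights
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Halve the learning rate when val_loss plateaus for 3 epochs
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=3, min_lr=1e-6),
]

model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)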





Example: Systematic Tuning

Initial settings:

  • LR = 1e-3
  • batch = 32
  • units = 128
  • no dropout

Observed: overfitting → val loss increases

  • Add dropout 0.3

Observed: unstable training

  • Lower LR to 1e-4

Observed: slow convergence

  • Increase batch to 64

Observed: plateau

  • Add ReduceLROnPlateau

This iterative workflow is how most practitioners tune models.

