Recurrent & Sequence Models (RNN, LSTM, GRU)
Keras Basics
Recurrent neural networks (RNNs) are designed for sequential data, where order matters.
Examples:
- Text (sentences, documents)
- Time series (stock prices, weather)
- Event logs
- DNA sequences
- Audio
In this chapter you’ll learn how to use:
- SimpleRNN
- LSTM (Long Short-Term Memory)
- GRU (Gated Recurrent Unit)
We’ll use the IMDB sentiment analysis dataset again, but this time with RNN layers instead of the simple embedding + pooling network used previously.
Load the IMDB Dataset
We use the top 10,000 most frequent words:
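For example (using tf.keras; the variable names below are reused throughout the chapter):

```python
from tensorflow.keras.datasets import imdb

# Keep only the 10,000 most frequent words; rarer words are dropped
num_words = 10000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)

print(len(x_train), "training reviews")
print(len(x_test), "test reviews")
```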
Data:
- x_train[i] is a list of integer word indices
- Length varies per review
Pad Sequences
RNNs require fixed-length sequences, so we pad them:
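A typical choice is to cut every review to the same length (200 here is an illustrative value, not a tuned one):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 200  # reviews longer than this are truncated, shorter ones are zero-padded

x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

print(x_train.shape)  # (25000, 200)
```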
Build an LSTM Model
LSTM networks are the most popular and performant RNN type for text.
LSTM Model Architecture:
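A minimal sketch of the architecture (the embedding size and the 64-unit LSTM are illustrative choices):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=num_words, output_dim=32),  # word index -> 32-dim dense vector
    LSTM(64),                                       # processes the sequence step by step
    Dense(1, activation="sigmoid"),                 # probability of positive sentiment
])
```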
Explanation:
- Embedding layer → converts word indices into dense vectors
- LSTM (64 units) → processes the sequence in order
- Sigmoid → binary classification output
Compile the LSTM Model
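This is binary classification, so binary cross-entropy is the natural loss:

```python
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```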
Train the LSTM Model
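Something like the following (the epoch count and batch size are typical starting values, not tuned):

```python
history = model.fit(
    x_train, y_train,
    epochs=3,
    batch_size=128,
    validation_split=0.2,  # hold out 20% of the training data for validation
)
```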
LSTMs are slower to train than CNNs or MLPs because the sequence must be processed step by step.
Typical accuracy: 88–92%.
Evaluate
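Evaluate on the held-out test set:

```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```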
Predicting Sentiment
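The model outputs one probability per review; values above 0.5 are treated as positive. A quick sketch on the first few test reviews:

```python
probs = model.predict(x_test[:5])
for p in probs:
    label = "positive" if p[0] > 0.5 else "negative"
    print(f"{p[0]:.3f} -> {label}")
```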
Using GRU Instead of LSTM (Faster, Similar Accuracy)
GRUs are a simplified LSTM variant:
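Swap the LSTM layer for a GRU; everything else stays the same (gru_model is just an illustrative name):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

gru_model = Sequential([
    Embedding(input_dim=num_words, output_dim=32),
    GRU(64),                         # gated like an LSTM, but with fewer parameters
    Dense(1, activation="sigmoid"),
])

gru_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```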
Train it exactly the same way as the LSTM model:
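```python
gru_model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.2)
```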
GRUs typically:
- Train faster
- Use fewer parameters
- Achieve similar accuracy
Using SimpleRNN (Not Recommended for Long Sequences)
SimpleRNN is the most basic recurrent layer. It is shown here for completeness, but it is not practical for long text.
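For reference only (same pattern as above, with an illustrative 32-unit layer):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

rnn_model = Sequential([
    Embedding(input_dim=num_words, output_dim=32),
    SimpleRNN(32),                   # no gating, so gradients vanish on long sequences
    Dense(1, activation="sigmoid"),
])
```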
Training is identical, but accuracy will be much lower, especially on long sequences.
Bidirectional LSTM (Higher Accuracy)
The model reads the sequence forward + backward.
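Wrap the LSTM in a Bidirectional layer:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Bidirectional, Dense

bi_model = Sequential([
    Embedding(input_dim=num_words, output_dim=32),
    Bidirectional(LSTM(64)),         # one LSTM reads left-to-right, another right-to-left
    Dense(1, activation="sigmoid"),
])
```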
This often reaches 92–94% accuracy.
LSTM/GRU Dropout
RNN layers support two types of dropout: dropout (applied to the layer inputs) and recurrent_dropout (applied to the recurrent connections between timesteps).
This helps prevent overfitting.
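For example (0.2 is an illustrative rate):

```python
from tensorflow.keras.layers import LSTM

# dropout: randomly drops input features at each timestep
# recurrent_dropout: randomly drops recurrent (hidden-state) connections
layer = LSTM(64, dropout=0.2, recurrent_dropout=0.2)
```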
Masking and Variable-Length Sequences
Keras can automatically skip padded values:
Example with masking:
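Setting mask_zero=True on the Embedding layer tells the downstream RNN to ignore timesteps whose input index is 0, which is the value pad_sequences uses for padding:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

masked_model = Sequential([
    Embedding(input_dim=num_words, output_dim=32, mask_zero=True),  # index 0 = padding, skipped
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
```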
Time Series with RNNs
RNNs also work for forecasting. Given a series split into sliding windows of past values, the model learns to predict the next value:
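A minimal sketch using a synthetic sine wave (the window length, layer size, and epoch count are illustrative):

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Synthetic series: a noisy sine wave with 1,000 points
series = np.sin(np.arange(0, 100, 0.1)) + np.random.normal(scale=0.1, size=1000)

# Sliding windows: each sample is `window` past values, the target is the next value
window = 20
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape (samples, timesteps, features) = (980, 20, 1)

model = Sequential([
    LSTM(32, input_shape=(window, 1)),
    Dense(1),                         # regression output: the next value in the series
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```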