Keras for Classification (Binary & Multi-Class)
Keras Basics
Published Nov 17 2025
Classification forms the backbone of many deep learning applications:
- Sentiment analysis
- Image recognition
- Spam filtering
- Disease classification
- Quality control
- Product categorisation
We’ll use two standard real-world datasets:
- IMDB Reviews → Binary classification
- Fashion-MNIST → Multi-class classification
Binary Classification (IMDB Sentiment)
IMDB Dataset Overview
Dataset contains:
- 25,000 movie reviews (train)
- 25,000 movie reviews (test)
- Labels: 0 = negative, 1 = positive
- Reviews are pre-tokenised as integer sequences
Load data:
We limit the vocabulary to the 10,000 most common words.
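A minimal loading sketch, assuming a recent TensorFlow install (the dataset downloads automatically on first use):

```python
from tensorflow import keras

# Keep only the 10,000 most frequent words; rarer words are dropped
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)

print(len(x_train), len(x_test))  # 25000 25000
```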
Preprocess the IMDB Data
Reviews are variable-length sequences.
We pad them so all sequences have equal length.
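A sketch of the padding step; the length of 256 and `padding="post"` are choices for this example, not requirements:

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)

# Pad (or truncate) every review to 256 tokens so batches have a uniform shape
maxlen = 256
x_train = keras.utils.pad_sequences(x_train, maxlen=maxlen, padding="post")
x_test = keras.utils.pad_sequences(x_test, maxlen=maxlen, padding="post")

print(x_train.shape)  # (25000, 256)
```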
Build a Binary Classification Model
We’ll use:
- An Embedding layer to convert integers → dense word vectors
- A GlobalAveragePooling1D to reduce the sequence
- A Dense output with sigmoid activation
Why this architecture?
- Embedding: learns word meanings
- Pooling: simple sequence reduction
- Sigmoid output: squashes the score to a probability between 0 and 1, exactly what binary classification needs
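The architecture above can be sketched as follows; the embedding size of 16 and the hidden Dense layer are illustrative choices:

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000  # must match num_words used when loading IMDB
maxlen = 256        # must match the padding length

model = keras.Sequential([
    keras.Input(shape=(maxlen,)),
    layers.Embedding(vocab_size, 16),       # integer ids -> 16-dim word vectors
    layers.GlobalAveragePooling1D(),        # average over the sequence dimension
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability that the review is positive
])
model.summary()
```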
Compile the Binary Model
Binary classification uses:
- Loss: binary_crossentropy
- Activation: sigmoid
- Metric: accuracy
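Putting those choices together (the model definition is repeated here so the snippet stands alone):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(256,)),
    layers.Embedding(10000, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Sigmoid output pairs with binary_crossentropy; accuracy uses a 0.5 threshold
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```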
Train the Model
Training is fast because the model is lightweight.
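A self-contained training sketch; 3 epochs and a batch size of 512 keep the run quick, and 20% of the training data is held out for validation:

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), _ = keras.datasets.imdb.load_data(num_words=10000)
x_train = keras.utils.pad_sequences(x_train, maxlen=256, padding="post")

model = keras.Sequential([
    keras.Input(shape=(256,)),
    layers.Embedding(10000, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# history.history records loss/accuracy per epoch for train and validation
history = model.fit(x_train, y_train,
                    epochs=3, batch_size=512,
                    validation_split=0.2, verbose=2)
```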
Evaluate the Model
Typical accuracy: ~85–88%
More advanced models (LSTM/Conv1D) reach 90–92%, which you’ll learn in a later section.
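Evaluation on the held-out test set can be sketched like this (training is repeated so the snippet runs on its own; with only 3 epochs the accuracy will sit near the lower end of the range above):

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)
x_train = keras.utils.pad_sequences(x_train, maxlen=256, padding="post")
x_test = keras.utils.pad_sequences(x_test, maxlen=256, padding="post")

model = keras.Sequential([
    keras.Input(shape=(256,)),
    layers.Embedding(10000, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=512, verbose=2)

# evaluate returns [loss, accuracy] in the order they were compiled
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.3f}")
```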
Predictions
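For predictions, the sigmoid output is a probability, so thresholding at 0.5 turns it into a class label. A self-contained sketch (a brief training run is included so the predictions are meaningful):

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)
x_train = keras.utils.pad_sequences(x_train, maxlen=256, padding="post")
x_test = keras.utils.pad_sequences(x_test, maxlen=256, padding="post")

model = keras.Sequential([
    keras.Input(shape=(256,)),
    layers.Embedding(10000, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=512, verbose=2)

probs = model.predict(x_test[:5])            # shape (5, 1), values in [0, 1]
labels = (probs > 0.5).astype("int32").ravel()  # 0 = negative, 1 = positive
print(probs.ravel(), labels)
```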
Multi-Class Classification (Fashion-MNIST)
Fashion-MNIST Overview
This dataset is like MNIST but with clothing items:
- T-shirts
- Trousers
- Pullovers
- Dresses
- Coats
- Sandals
- Shirts
- Sneakers
- Bags
- Ankle boots
Load dataset:
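Like IMDB, Fashion-MNIST ships with Keras and downloads on first use:

```python
from tensorflow import keras

# 60,000 training and 10,000 test images, each 28x28 grayscale
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```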
Preprocess the Data
Normalise pixel values:
Flatten for dense layers:
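Both preprocessing steps together:

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Scale pixel values from [0, 255] down to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Flatten each 28x28 image into a 784-dim vector for the Dense layers
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)

print(x_train.shape)  # (60000, 784)
```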
Build a Multi-Class Classifier
Multi-class classifiers end with a softmax output layer, one unit per class; since our labels are plain integers (0–9), we'll pair it with a sparse categorical loss.
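A minimal sketch; the single hidden layer of 128 units is an illustrative choice:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),               # flattened 28x28 image
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one probability per clothing class
])
model.summary()
```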
Compile the Multi-Class Model
Key points:
- softmax output
- sparse_categorical_crossentropy for integer labels (0–9)
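In code (the model is repeated so the snippet stands alone):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# sparse_categorical_crossentropy accepts integer labels directly,
# so there is no need to one-hot encode y_train
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```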
Train the Model
Expect around 88–92% accuracy without CNNs. (CNNs, covered in a later section, will improve this significantly.)
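A self-contained training sketch; 3 epochs keeps the run short, so accuracy will land near the lower end of that range (more epochs push it higher):

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), _ = keras.datasets.fashion_mnist.load_data()
x_train = (x_train.astype("float32") / 255.0).reshape(-1, 784)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hold out 10% of the training images for validation
history = model.fit(x_train, y_train,
                    epochs=3, batch_size=128,
                    validation_split=0.1, verbose=2)
```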
Evaluate
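Evaluation mirrors the binary case (training is repeated here so the snippet runs on its own):

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = (x_train.astype("float32") / 255.0).reshape(-1, 784)
x_test = (x_test.astype("float32") / 255.0).reshape(-1, 784)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=2)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.3f}")
```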
Making Predictions
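With a softmax output, each prediction is a 10-way probability distribution, and `argmax` picks the most likely class. A self-contained sketch (the class-name list follows the standard Fashion-MNIST label order):

```python
from tensorflow import keras
from tensorflow.keras import layers

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = (x_train.astype("float32") / 255.0).reshape(-1, 784)
x_test = (x_test.astype("float32") / 255.0).reshape(-1, 784)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=2)

probs = model.predict(x_test[:5])   # shape (5, 10); each row sums to ~1
preds = probs.argmax(axis=1)        # most likely class index per image
print([class_names[i] for i in preds])
```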
Confusion Matrix
We can use scikit-learn’s confusion matrix:
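A sketch assuming scikit-learn is installed; the model is retrained briefly so the snippet stands alone:

```python
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.metrics import confusion_matrix

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = (x_train.astype("float32") / 255.0).reshape(-1, 784)
x_test = (x_test.astype("float32") / 255.0).reshape(-1, 784)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=2)

y_pred = model.predict(x_test).argmax(axis=1)

# Rows are true classes, columns are predicted classes;
# large off-diagonal cells reveal commonly confused pairs (e.g. shirt vs coat)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```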
Key Differences Between Binary and Multi-Class Models
| Binary Classification | Multi-Class Classification |
| --- | --- |
| Output: 1 unit | Output: N units (one per class) |
| Activation: sigmoid | Activation: softmax |
| Loss: binary_crossentropy | Loss: sparse_categorical_crossentropy |
| Prediction: probability > 0.5 threshold | Prediction: argmax(probabilities) |