Image Analysis (Computer Vision Basics)
Machine Learning Fundamentals with Python
5 min read
Published Nov 16 2025
Image analysis (or computer vision) focuses on teaching computers to understand and interpret images.
Examples include:
- Detecting faces in photos
- Classifying handwritten digits (MNIST)
- Sorting medical images into healthy/unhealthy categories
- Grouping similar images automatically
Machine learning treats each image as a set of numeric values — pixel intensities — that can be analysed, visualised, and used as model inputs.
Example Using the MNIST Dataset
MNIST (Modified National Institute of Standards and Technology) is a dataset of 70,000 greyscale images of handwritten digits — each 28×28 pixels.
Each image is labeled with the digit it represents (0–9).
Loading the Dataset
You can load it directly from TensorFlow — no downloading or folders needed.
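A minimal loading sketch using the Keras built-in dataset helper (it downloads and caches the data on first use):

```python
from tensorflow.keras.datasets import mnist

# Downloads the dataset on first use, then caches it locally
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```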
Output:
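```
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
```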
Each image is a 28×28-pixel greyscale image stored as a 2D NumPy array.
Visualising Sample Images
Let’s look at some random samples.
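Here's one way to build a 10×10 grid of samples (a sketch, assuming `x_train` and `y_train` from the loading step above):

```python
import numpy as np
import matplotlib.pyplot as plt

num_examples = 10
plt.figure(figsize=(10, 10))
for digit in range(10):
    # All training indices whose label matches this digit
    indices = np.where(y_train == digit)[0]
    # Pick 10 unique random examples of it
    chosen = np.random.choice(indices, num_examples, replace=False)
    for col, idx in enumerate(chosen):
        # Row = digit, column = sample; subplot positions are 1-based
        plt.subplot(10, 10, digit * num_examples + col + 1)
        plt.imshow(x_train[idx], cmap='gray')
        plt.axis('off')
plt.show()
```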

Explanation:
- `np.where(y_train == digit)[0]` → finds all image indices for a given digit.
- `np.random.choice(..., 10, replace=False)` → randomly picks 10 unique examples for that digit.
- `plt.subplot(10, 10, ...)` → arranges images into a 10×10 grid.
- We multiply `digit * num_examples` to correctly position each group of images by row.
- Each row corresponds to one digit (0 through 9).
- Each column shows a random handwriting variation of that digit.
- It gives you an instant sense of:
- How consistent (or variable) each digit looks
- Which digits may be hard for a model to distinguish (e.g., 3 vs 8, 4 vs 9)
Checking Pixel Values
Each pixel value represents brightness — 0 (black) to 255 (white).
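A quick check, plus the normalisation we'll rely on later (using the arrays loaded above):

```python
print("Min:", x_train.min(), "Max:", x_train.max())

# Scale pixel values into the 0–1 range
x_train_norm = x_train / 255.0
x_test_norm = x_test / 255.0

print("Normalised min:", x_train_norm.min(), "max:", x_train_norm.max())
```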
Outputs:
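```
Min: 0 Max: 255
Normalised min: 0.0 max: 1.0
```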
Explanation:
- Normalising pixel values (0–1 range) helps models train faster and more reliably.
- This step is essential for deep learning models later.
Visualising the Distribution of Labels
It’s always good to check how balanced your dataset is.
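One way to plot the label counts (a sketch using NumPy and Matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

digits, counts = np.unique(y_train, return_counts=True)

plt.bar(digits, counts)
plt.xticks(digits)
plt.xlabel("Digit")
plt.ylabel("Number of training images")
plt.show()
```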

Observation:
Each digit appears roughly the same number of times — a well-balanced dataset.
Computing Average Images per Label
Let’s calculate the average image for each digit (0–9).
This gives you a sense of what the “typical” example of each class looks like.
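A sketch using a boolean mask per digit:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for digit in range(10):
    # Mean over every training image with this label
    avg_img = x_train[y_train == digit].mean(axis=0)
    axes[digit].imshow(avg_img, cmap='gray')
    axes[digit].set_title(str(digit))
    axes[digit].axis('off')
plt.show()
```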

Explanation:
- Each averaged image shows a “blurred” outline of the most common way that digit is written.
- For example, the average ‘1’ will be a faint vertical line, while the average ‘0’ will be a ring.
Comparing Two Average Images
Let’s see how two digits differ visually — e.g., comparing 3 vs 8.
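A sketch that plots both averages and their absolute difference side by side:

```python
import numpy as np
import matplotlib.pyplot as plt

avg_3 = x_train[y_train == 3].mean(axis=0)
avg_8 = x_train[y_train == 8].mean(axis=0)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, img, title in zip(
    axes,
    [avg_3, avg_8, np.abs(avg_8 - avg_3)],
    ["Average 3", "Average 8", "Absolute difference"],
):
    # 'hot' colormap: dark = similar, red/yellow = largest differences
    ax.imshow(img, cmap='hot')
    ax.set_title(title)
    ax.axis('off')
plt.show()
```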

Explanation:
- Bright (red/yellow) areas in the difference plot show where the digits differ most.
- For example, 8 has extra loops that 3 doesn’t — this shows up clearly.
Reshaping for Model Input
When using this data in models later:
- Traditional ML models (like KNN or SVM) need each image flattened into a 1D vector.
- Neural networks (CNNs) keep the 2D structure (28×28, or 28×28×1 with an explicit greyscale channel).
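Both reshapes in one sketch (names carried over from the loading step):

```python
# Flatten each 28×28 image into a 784-value vector for classic ML models
x_train_flat = x_train.reshape(len(x_train), -1)
x_test_flat = x_test.reshape(len(x_test), -1)

# Keep the 2D structure but add a channel dimension for CNNs
x_train_cnn = x_train.reshape(-1, 28, 28, 1)

print(x_train_flat.shape, x_train_cnn.shape)
```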
Output:
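```
(60000, 784) (60000, 28, 28, 1)
```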
Quick Classifier Example
Just for demonstration, we will train a simple model (no deep learning) to see how well it can classify digits using flattened pixel data.
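The exact model isn't important here; as one hedged example, scikit-learn's LogisticRegression on the flattened, normalised pixels does the job (a 10,000-image training subset keeps it quick):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train on normalised, flattened pixels; a subset keeps the demo fast
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train_flat[:10000] / 255.0, y_train[:10000])

y_pred = clf.predict(x_test_flat / 255.0)
print("Test accuracy:", round(accuracy_score(y_test, y_pred), 2))
```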
Output:
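Roughly the following (logistic regression typically lands around 90–92% on raw MNIST pixels; the exact figure varies by run):

```
Test accuracy: 0.91
```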
That’s quite good for such a simple model with no feature engineering; deep neural networks would push this to 98–99% accuracy or higher.
Understanding Greyscale vs. RGB vs. RGBA Images
When working with image data in machine learning, you’ll often encounter different colour formats:
- Greyscale
- RGB
- RGBA
Each format stores pixel information differently — and this directly affects the shape of the NumPy array that represents the image.
Greyscale Images
A greyscale image contains only shades of grey — from black to white.
Each pixel has one value, usually between 0 and 255:
- 0 = black
- 255 = white
Because there’s only one value per pixel, a greyscale image is stored as a 2D array of shape (height, width):
- Height: number of rows (pixels vertically)
- Width: number of columns (pixels horizontally)
So for the greyscale digit examples above, each image has shape (28, 28) — just 2 dimensions.
RGB Images
An RGB image has three colour channels:
- R = Red intensity
- G = Green intensity
- B = Blue intensity
Each pixel stores 3 values, one for each colour. These are combined to form the final colour seen on screen.
So an RGB image is stored as a 3D array:
- Height: pixel rows
- Width: pixel columns
- Channels: colour components (3 = R, G, B)
You can get a greyscale image showing a single channel’s colour intensity by just selecting that slice of the array:
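For example (`img` here stands for any RGB image already loaded as a NumPy array):

```python
import matplotlib.pyplot as plt

# `img` is an assumed RGB image array of shape (height, width, 3)
red = img[:, :, 0]     # 2D array of red intensities
green = img[:, :, 1]
blue = img[:, :, 2]

plt.imshow(red, cmap='gray')   # view one channel as a greyscale image
plt.show()
```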
RGBA Images
RGBA adds a fourth channel:
- A = Alpha, representing transparency (opacity).
- 0 = fully transparent
- 255 = fully opaque
So RGBA images are also 3D arrays, but with 4 channels.
Each pixel is now [R, G, B, A]. Often in machine learning the fourth alpha channel is dropped before training.
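Dropping it is a one-line slice (`rgba` is an assumed array of shape (height, width, 4)):

```python
# Keep R, G, B; discard the alpha channel
rgb = rgba[:, :, :3]
```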
How It Affects Deep Learning Models
When feeding image data into a neural network, the input shape must match the number of channels.
If you feed greyscale images into a network expecting RGB, it will error out because (28, 28, 1) and (28, 28, 3) are fundamentally different array shapes.
You can fix that by adding an explicit channel dimension:
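One common sketch with NumPy:

```python
import numpy as np

# (60000, 28, 28) -> (60000, 28, 28, 1)
x_train_cnn = np.expand_dims(x_train, axis=-1)
x_test_cnn = np.expand_dims(x_test, axis=-1)

print(x_train_cnn.shape)   # (60000, 28, 28, 1)
```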
This makes it compatible with CNNs that expect a 4D input: (num_images, height, width, channels).
Converting Between Formats
You can convert between greyscale and RGB using Pillow:
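A sketch ("photo.png" is just a placeholder path):

```python
from PIL import Image

img = Image.open("photo.png")   # placeholder path; any image file works
grey = img.convert("L")         # any mode -> greyscale (mode "L")
rgb = grey.convert("RGB")       # greyscale -> RGB (grey value copied into all three channels)
```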
Or with NumPy:
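A sketch using an MNIST image from earlier as the greyscale source:

```python
import numpy as np

grey = x_train[0]                      # one 28×28 greyscale image

# Greyscale -> RGB: stack the single channel three times
rgb = np.stack([grey] * 3, axis=-1)    # shape (28, 28, 3)

# RGB -> greyscale: weighted sum of the channels (standard luma weights)
back = rgb @ np.array([0.299, 0.587, 0.114])

print(rgb.shape, back.shape)           # (28, 28, 3) (28, 28)
```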