Image Analysis (Computer Vision Basics)

Machine Learning Fundamentals with Python


Published Nov 16 2025




Image analysis (or computer vision) focuses on teaching computers to understand and interpret images.


Examples include:

  • Detecting faces in photos
  • Classifying handwritten digits (MNIST)
  • Sorting medical images into healthy/unhealthy categories
  • Grouping similar images automatically

Machine learning treats each image as a set of numeric values — pixel intensities — that can be analysed, visualised, and used as model inputs.
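
To make this concrete, here is a minimal sketch of a tiny greyscale image built by hand as a NumPy array. The 5×5 size and the name tiny_image are illustrative assumptions and have nothing to do with MNIST itself.

```python
import numpy as np

# A hypothetical 5x5 greyscale "image": 0 is black, 255 is white.
# The bright pixels form a vertical bar, like a crude digit 1.
tiny_image = np.zeros((5, 5), dtype=np.uint8)
tiny_image[:, 2] = 255

print(tiny_image.shape)   # (rows, columns)
print(tiny_image)
print("Mean brightness:", tiny_image.mean())
```

Because the image is just an array, ordinary numeric operations such as the mean, indexing, and slicing apply to it directly.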



Example using MNIST Dataset

MNIST (Modified National Institute of Standards and Technology) is a dataset of 70,000 greyscale images of handwritten digits — each 28×28 pixels.

Each image is labeled with the digit it represents (0–9).



Loading the Dataset

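You can load it directly from TensorFlow. The data is downloaded and cached automatically the first time you call load_data(), so there are no files or folders to manage by hand.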

from tensorflow.keras.datasets import mnist

# Load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print("Training set:", x_train.shape, "Labels:", y_train.shape)
print("Test set:", x_test.shape, "Labels:", y_test.shape)

Output:

Training set: (60000, 28, 28) Labels: (60000,)
Test set: (10000, 28, 28) Labels: (10000,)

Each image is stored as a 2D NumPy array of 28×28 greyscale pixel values.



Visualising Sample Images

Let’s look at some random samples.

import matplotlib.pyplot as plt
import numpy as np

# Set up the grid: 10 rows (digits 0–9) × 10 columns (examples)
num_classes = 10
num_examples = 10

plt.figure(figsize=(10, 10))

for digit in range(num_classes):
    # Get all indices for this digit
    indices = np.where(y_train == digit)[0]
    
    # Randomly choose 10 examples for this digit
    selected = np.random.choice(indices, num_examples, replace=False)
    
    for i, idx in enumerate(selected):
        plt.subplot(num_classes, num_examples, digit * num_examples + i + 1)
        plt.imshow(x_train[idx], cmap='gray')
        plt.axis("off")

plt.suptitle("MNIST: 10 Random Examples per Digit", fontsize=16)
plt.tight_layout()
plt.show()


Explanation:

  • np.where(y_train == digit)[0] → finds all image indices for a given digit.
  • np.random.choice(..., 10, replace=False) → randomly picks 10 unique examples for that digit.
  • plt.subplot(10, 10, ...) → arranges images into a 10×10 grid.
  • We multiply digit * num_examples to correctly position each group of images by row.
  • Each row corresponds to one digit (0 through 9).
  • Each column shows a random handwriting variation of that digit.
  • It gives you an instant sense of:
    • How consistent (or variable) each digit looks
    • Which digits may be hard for a model to distinguish (e.g., 3 vs 8, 4 vs 9)
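
The grid arithmetic above can be checked without plotting anything. The sketch below uses a hypothetical helper, subplot_position, that is not part of Matplotlib; it only reproduces the index expression passed to plt.subplot, which numbers cells from 1 in row-major order.

```python
num_classes = 10
num_examples = 10

def subplot_position(digit, i, num_examples=num_examples):
    """Return the 1-based subplot index for example i of a digit (row-major order)."""
    return digit * num_examples + i + 1

print(subplot_position(0, 0))   # first cell, top-left
print(subplot_position(3, 0))   # first cell of the row for digit 3
print(subplot_position(9, 9))   # last cell, bottom-right
```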


Checking Pixel Values

Each pixel value represents brightness — 0 (black) to 255 (white).

print("Pixel value range:", x_train.min(), "to", x_train.max())

# Normalise pixel values for better model training later
x_train_norm = x_train / 255.0
x_test_norm = x_test / 255.0

print("After normalisation:", x_train_norm.min(), "to", x_train_norm.max())

Outputs:

Pixel value range: 0 to 255
After normalisation: 0.0 to 1.0

Explanation:

  • Normalising pixel values (0–1 range) helps models train faster and more reliably.
  • This step matters most for the deep learning models used later, because gradient-based training is sensitive to the scale of its inputs.
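
One side effect worth knowing about is the change of data type. The sketch below uses random stand-in data (not real MNIST images) to show that dividing a uint8 array by 255.0 produces float64, and that casting to float32 first, a common choice for neural network inputs, halves the memory used.

```python
import numpy as np

# Stand-in for a batch of raw uint8 images (random values, not real MNIST data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by 255.0 promotes to float64; casting to float32 first halves memory use
norm64 = fake_images / 255.0
norm32 = fake_images.astype(np.float32) / 255.0

print(norm64.dtype, norm64.nbytes)
print(norm32.dtype, norm32.nbytes)
```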


Visualising the Distribution of Labels

It’s always good to check how balanced your dataset is.

import seaborn as sns

sns.countplot(x=y_train)
plt.title("Distribution of Digits in Training Set")
plt.show()


Observation:
Each digit appears roughly the same number of times — a well-balanced dataset.
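
If you want numbers rather than a plot, np.bincount gives the per-class counts directly. This is a minimal sketch on a made-up label array; with the real data you would pass y_train instead of the hypothetical labels below.

```python
import numpy as np

# Hypothetical label array standing in for y_train
labels = np.array([0, 1, 1, 2, 2, 2, 9])

counts = np.bincount(labels, minlength=10)
for digit, count in enumerate(counts):
    print(digit, count)

# Ratio of the most common to the least common class that actually appears
present = counts[counts > 0]
print("Imbalance ratio:", present.max() / present.min())
```

A ratio close to 1 indicates a balanced dataset; the toy labels here are deliberately unbalanced.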



Computing Average Images per Label

Let’s calculate the average image for each digit (0–9).
This gives you a sense of what the “typical” example of each class looks like.

avg_images = []
for digit in range(10):
    avg_image = np.mean(x_train[y_train == digit], axis=0)
    avg_images.append(avg_image)

# Visualise all average digits
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(avg_images[i], cmap='gray')
    plt.title(f"Avg: {i}")
    plt.axis("off")
plt.tight_layout()
plt.show()


Explanation:

  • Each averaged image shows a “blurred” outline of the most common way that digit is written.
  • For example, the average ‘1’ will be a faint vertical line, while the average ‘0’ will be a ring.
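
The key step in the averaging code is the boolean mask x_train[y_train == digit] followed by a mean over axis 0. Here is a small sketch of the same pattern on three hypothetical 2×2 images, small enough to verify by hand.

```python
import numpy as np

# Three hypothetical 2x2 "images" with labels; two of them belong to class 0
images = np.array([
    [[0, 10], [20, 30]],
    [[10, 30], [40, 50]],
    [[255, 255], [255, 255]],
], dtype=np.uint8)
labels = np.array([0, 0, 1])

mask = labels == 0                 # boolean array: [True, True, False]
class_0 = images[mask]             # shape (2, 2, 2): only the class-0 images
avg_0 = class_0.mean(axis=0)       # average over images, keep the pixel grid

print(avg_0)
```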


Comparing Two Average Images

Let’s see how two digits differ visually — e.g., comparing 3 vs 8.

digit_a, digit_b = 3, 8

avg_a = avg_images[digit_a]
avg_b = avg_images[digit_b]

# Compute pixel-wise absolute difference
diff = np.abs(avg_a - avg_b)

plt.figure(figsize=(9,3))
plt.subplot(1,3,1)
plt.imshow(avg_a, cmap='gray')
plt.title(f"Average {digit_a}")
plt.axis("off")

plt.subplot(1,3,2)
plt.imshow(avg_b, cmap='gray')
plt.title(f"Average {digit_b}")
plt.axis("off")

plt.subplot(1,3,3)
plt.imshow(diff, cmap='hot')
plt.title(f"Difference ({digit_a} vs {digit_b})")
plt.axis("off")
plt.tight_layout()
plt.show()


Explanation:

  • Bright (red/yellow) areas in the difference plot show where the digits differ most.
  • For example, 8 closes its loops on the left-hand side where 3 stays open, so that region stands out in the difference plot.
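
The difference image can also be queried numerically. The sketch below finds the pixel where two average images differ most; the 3×3 arrays are hypothetical stand-ins for avg_a and avg_b.

```python
import numpy as np

# Hypothetical 3x3 average images standing in for avg_a and avg_b
avg_a = np.array([[0.0, 0.2, 0.0],
                  [0.0, 0.9, 0.0],
                  [0.0, 0.2, 0.0]])
avg_b = np.array([[0.0, 0.2, 0.0],
                  [0.8, 0.9, 0.0],
                  [0.0, 0.2, 0.0]])

diff = np.abs(avg_a - avg_b)

# argmax works on the flattened array; unravel_index turns it back into (row, col)
row, col = np.unravel_index(np.argmax(diff), diff.shape)
print("Largest difference at row", row, "col", col, "value", diff[row, col])
```

Note that this works cleanly because the averages are floats. Subtracting raw uint8 images would wrap around below zero, so convert those to a signed or float type first.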


Reshaping for Model Input

When using this data in models later:

  • Traditional ML models (like KNN or SVM) need each image flattened into a 1D vector.
  • Convolutional neural networks (CNNs) keep the 2D structure (28×28, or 28×28×1 with an explicit greyscale channel).

# Flatten for non-CNN models
X_train_flat = x_train_norm.reshape(len(x_train_norm), -1)
X_test_flat = x_test_norm.reshape(len(x_test_norm), -1)

print("Flattened shape:", X_train_flat.shape)

Output:

Flattened shape: (60000, 784)
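
Flattening loses no information; it only changes how the same values are indexed. The following sketch, on a hypothetical two-image batch, shows where a given pixel ends up in the flat vector and that the reshape can be undone.

```python
import numpy as np

# Two hypothetical 28x28 images filled with a running counter
batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)

flat = batch.reshape(len(batch), -1)      # -1 lets NumPy infer 28 * 28 = 784
restored = flat.reshape(-1, 28, 28)       # back to the image grid

print(flat.shape)
# Pixel (row r, column c) lands at position r * 28 + c in the flat vector
print(batch[0, 3, 5] == flat[0, 3 * 28 + 5])
print(np.array_equal(batch, restored))
```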



Quick Classifier Example

Just for demonstration, we will train a simple model (no deep learning) to see how well it can classify digits using flattened pixel data.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train on a small subset for speed
subset = 10000
model = LogisticRegression(max_iter=1000)
model.fit(X_train_flat[:subset], y_train[:subset])

# Test on a small subset
y_pred = model.predict(X_test_flat[:2000])
print("Accuracy:", accuracy_score(y_test[:2000], y_pred))

Output:

Accuracy: 0.87

That’s quite good for such a simple model with no feature engineering. However, deep neural networks can push this to 98–99% accuracy or higher.
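
A single accuracy figure hides which digits get mixed up. One way to look closer is a confusion matrix, sketched here in plain NumPy on made-up labels; with the real model you would use y_test[:2000] and y_pred in place of the hypothetical y_true and y_hat.

```python
import numpy as np

# Hypothetical true labels and predictions, standing in for y_test and y_pred
y_true = np.array([3, 3, 8, 8, 8, 4, 9, 9])
y_hat = np.array([3, 8, 8, 8, 3, 4, 4, 9])

# Rows are true digits, columns are predicted digits
confusion = np.zeros((10, 10), dtype=int)
np.add.at(confusion, (y_true, y_hat), 1)

print("Accuracy:", np.trace(confusion) / confusion.sum())
print("True 3 predicted as 8:", confusion[3, 8])
print("True 9 predicted as 4:", confusion[9, 4])
```

The diagonal holds the correct predictions, so off-diagonal cells point at the digit pairs the model confuses.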






Understanding Greyscale vs. RGB vs. RGBA Images

When working with image data in machine learning, you’ll often encounter different colour formats:

  • Greyscale
  • RGB
  • RGBA

Each format stores pixel information differently — and this directly affects the shape of the NumPy array that represents the image.



Greyscale Images

A greyscale image contains only shades of grey, from black to white.
Each pixel has one value, usually between 0 and 255:

  • 0 = black
  • 255 = white

Because there’s only one value per pixel, a grayscale image is stored as a 2D array:

  • Height - Number of rows (pixels vertically)
  • Width - Number of columns (pixels horizontally)

The MNIST digits above are greyscale, so each image has shape (28, 28): just 2 dimensions.



RGB Images

An RGB image has three colour channels:

  • R = Red intensity
  • G = Green intensity
  • B = Blue intensity

Each pixel stores 3 values, one for each colour. These are combined to form the final colour seen on screen.

So an RGB image is stored as a 3D array:

  • Height - Pixel rows
  • Width - Pixel columns
  • Channels - Colour components (3 = R, G, B)

You can get a greyscale image showing a single channel’s colour intensity by selecting that index along the last axis:

# Channel order is R, G, B, so green is index 1 and blue is index 2
red_channel = arr[:, :, 0]
green_channel = arr[:, :, 1]
blue_channel = arr[:, :, 2]
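
To see the channel order in action, here is a small self-contained sketch. The 1×3 array arr is a hypothetical image with one pure red, one pure green, and one pure blue pixel, so each channel is bright in exactly one position.

```python
import numpy as np

# A hypothetical 1x3 RGB image: one pure red, one pure green, one pure blue pixel
arr = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(arr.shape)            # (height, width, channels)

red_channel = arr[:, :, 0]
green_channel = arr[:, :, 1]
blue_channel = arr[:, :, 2]

print(red_channel)          # bright only where the red pixel is
print(green_channel)        # bright only where the green pixel is
print(blue_channel)         # bright only where the blue pixel is
```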


RGBA Images

RGBA adds a fourth channel:

  • A = Alpha, representing transparency (opacity).
    • 0 = fully transparent
    • 255 = fully opaque

So RGBA images are also 3D arrays, but with 4 channels.

Each pixel is now [R, G, B, A]. Often in machine learning the fourth alpha channel is dropped before training.
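
Dropping the alpha channel is a one-line slice. This is a minimal sketch on a hypothetical 2×2 RGBA array; it simply discards transparency rather than blending the image onto a background, which is an assumption that suits fully opaque images.

```python
import numpy as np

# A hypothetical 2x2 RGBA image: mid-grey pixels, fully opaque
rgba = np.full((2, 2, 4), 128, dtype=np.uint8)
rgba[:, :, 3] = 255

rgb = rgba[:, :, :3]        # keep R, G, B; discard alpha
alpha = rgba[:, :, 3]

print(rgba.shape, "->", rgb.shape)
print("Fully opaque:", bool((alpha == 255).all()))
```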



How It Affects Deep Learning Models

When feeding image data into a neural network, the input shape must match the number of channels.

If you feed greyscale images into a network expecting RGB, it will error out because (28, 28, 1) and (28, 28, 3) are fundamentally different array shapes.

You can fix that by:

# Add an extra channel dimension for grayscale
x_train = x_train.reshape(-1, 28, 28, 1)

This makes it compatible with CNNs that expect a 4D input: (num_images, height, width, channels).
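
There are several equivalent ways to add the channel axis. The sketch below compares two of them on a stand-in batch of zeros (not real MNIST data) and shows how to remove the axis again.

```python
import numpy as np

# Stand-in for a batch of greyscale images (zeros, not real MNIST data)
batch = np.zeros((5, 28, 28), dtype=np.float32)

with_channel = batch[..., np.newaxis]            # same as np.expand_dims(batch, -1)
also_with_channel = batch.reshape(-1, 28, 28, 1)

print(with_channel.shape)
print(with_channel.shape == also_with_channel.shape)

# Removing the channel axis again gives back the original layout
print(np.squeeze(with_channel, axis=-1).shape)
```

The np.newaxis form has the advantage of not hard-coding the image size.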



Converting Between Formats

You can convert between grayscale and RGB using Pillow:

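from PIL import Image

# Open an image file (placeholder path; replace with your own file)
img = Image.open("example.png")
# Convert to greyscale
img_gray = img.convert("L")
# Convert back to RGB (3 identical channels; the original colour is not restored)
img_rgb = img_gray.convert("RGB")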

Or with NumPy:

# Expand greyscale (H, W) → (H, W, 3) by copying the channel
rgb_from_gray = np.repeat(x_train[0][:, :, np.newaxis], 3, axis=2)

print(rgb_from_gray.shape)
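
Going the other way, from RGB to greyscale, is usually done as a weighted sum of the three channels. The sketch below uses the ITU-R 601 luma weights (0.299, 0.587, 0.114), which are the weights Pillow documents for its "L" conversion; the rounding step here is my own choice, so results may differ from Pillow by one intensity level. The 1×3 input image is hypothetical.

```python
import numpy as np

# Hypothetical 1x3 RGB image: pure red, pure green, pure blue
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# ITU-R 601 luma weights for R, G, B
weights = np.array([0.299, 0.587, 0.114])

gray = rgb @ weights                          # (1, 3, 3) @ (3,) -> (1, 3)
gray_uint8 = np.round(gray).astype(np.uint8)

print(gray.shape)
print(gray_uint8)
```

Green gets the largest weight because the eye is most sensitive to it, which is why a pure green pixel comes out brighter than a pure red or blue one.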
 

© 2025 SimpleSteps.guide