Image Analysis (Computer Vision Basics)
Machine Learning Fundamentals with Python
5 min read
Published Nov 16 2025
Image analysis (or computer vision) focuses on teaching computers to understand and interpret images.
Examples include:
- Detecting faces in photos
- Classifying handwritten digits (MNIST)
- Sorting medical images into healthy/unhealthy categories
- Grouping similar images automatically
Machine learning treats each image as a set of numeric values — pixel intensities — that can be analysed, visualised, and used as model inputs.
Example Using the MNIST Dataset
MNIST (Modified National Institute of Standards and Technology) is a dataset of 70,000 greyscale images of handwritten digits — each 28×28 pixels.
Each image is labeled with the digit it represents (0–9).
Loading the Dataset
You can load it directly from TensorFlow — no downloading or folders needed.
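A minimal loading sketch using the Keras built-in dataset helper (it downloads and caches the data on first use):

```python
from tensorflow.keras.datasets import mnist

# Downloads the dataset on first use, then caches it locally
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```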
Output:
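```
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
```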
Each image is a 28×28-pixel greyscale image stored as a 2D NumPy array.
Visualising Sample Images
Let’s look at some random samples.
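Here's one way to build a 10×10 grid of samples (a sketch, assuming `x_train` and `y_train` from the loading step above):

```python
import numpy as np
import matplotlib.pyplot as plt

num_examples = 10
plt.figure(figsize=(10, 10))
for digit in range(10):
    # All training indices whose label matches this digit
    indices = np.where(y_train == digit)[0]
    # Pick 10 unique random examples of it
    chosen = np.random.choice(indices, num_examples, replace=False)
    for col, idx in enumerate(chosen):
        # Row = digit, column = sample; subplot positions are 1-based
        plt.subplot(10, 10, digit * num_examples + col + 1)
        plt.imshow(x_train[idx], cmap='gray')
        plt.axis('off')
plt.show()
```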

Explanation:
- `np.where(y_train == digit)[0]` → finds all image indices for a given digit.
- `np.random.choice(..., 10, replace=False)` → randomly picks 10 unique examples for that digit.
- `plt.subplot(10, 10, ...)` → arranges images into a 10×10 grid.
- We multiply `digit * num_examples` to correctly position each group of images by row.
- Each row corresponds to one digit (0 through 9).
- Each column shows a random handwriting variation of that digit.
- It gives you an instant sense of:
- How consistent (or variable) each digit looks
- Which digits may be hard for a model to distinguish (e.g., 3 vs 8, 4 vs 9)
Checking Pixel Values
Each pixel value represents brightness — 0 (black) to 255 (white).
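A quick check, plus the normalisation we'll rely on later (using the arrays loaded above):

```python
print("Min:", x_train.min(), "Max:", x_train.max())

# Scale pixel values into the 0–1 range
x_train_norm = x_train / 255.0
x_test_norm = x_test / 255.0

print("Normalised min:", x_train_norm.min(), "max:", x_train_norm.max())
```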
Outputs:
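```
Min: 0 Max: 255
Normalised min: 0.0 max: 1.0
```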
Explanation:
- Normalising pixel values (0–1 range) helps models train faster and more reliably.
- This step is essential for deep learning models later.
Visualising the Distribution of Labels
It’s always good to check how balanced your dataset is.
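One way to plot the label counts (a sketch using NumPy and Matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

digits, counts = np.unique(y_train, return_counts=True)

plt.bar(digits, counts)
plt.xticks(digits)
plt.xlabel("Digit")
plt.ylabel("Number of training images")
plt.show()
```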

Observation:
Each digit appears roughly the same number of times — a well-balanced dataset.
Computing Average Images per Label
Let’s calculate the average image for each digit (0–9).
This gives you a sense of what the “typical” example of each class looks like.
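A sketch using a boolean mask per digit:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for digit in range(10):
    # Mean over every training image with this label
    avg_img = x_train[y_train == digit].mean(axis=0)
    axes[digit].imshow(avg_img, cmap='gray')
    axes[digit].set_title(str(digit))
    axes[digit].axis('off')
plt.show()
```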

Explanation:
- Each averaged image shows a “blurred” outline of the most common way that digit is written.
- For example, the average ‘1’ will be a faint vertical line, while the average ‘0’ will be a ring.
Comparing Two Average Images
Let’s see how two digits differ visually — e.g., comparing 3 vs 8.
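A sketch that plots both averages and their absolute difference side by side:

```python
import numpy as np
import matplotlib.pyplot as plt

avg_3 = x_train[y_train == 3].mean(axis=0)
avg_8 = x_train[y_train == 8].mean(axis=0)

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, img, title in zip(
    axes,
    [avg_3, avg_8, np.abs(avg_8 - avg_3)],
    ["Average 3", "Average 8", "Absolute difference"],
):
    # 'hot' colormap: dark = similar, red/yellow = largest differences
    ax.imshow(img, cmap='hot')
    ax.set_title(title)
    ax.axis('off')
plt.show()
```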

Explanation:
- Bright (red/yellow) areas in the difference plot show where the digits differ most.
- For example, 8 has extra loops that 3 doesn’t — this shows up clearly.
Reshaping for Model Input
When using this data in models later:
- Traditional ML models (like KNN or SVM) need each image flattened into a 1D vector.
- Neural networks (CNNs) keep the 2D structure (28×28, or 28×28×1 with an explicit greyscale channel).
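Both reshapes in one sketch (names carried over from the loading step):

```python
# Flatten each 28×28 image into a 784-value vector for classic ML models
x_train_flat = x_train.reshape(len(x_train), -1)
x_test_flat = x_test.reshape(len(x_test), -1)

# Keep the 2D structure but add a channel dimension for CNNs
x_train_cnn = x_train.reshape(-1, 28, 28, 1)

print(x_train_flat.shape, x_train_cnn.shape)
```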
Output:
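```
(60000, 784) (60000, 28, 28, 1)
```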
Quick Classifier Example
Just for demonstration, we will train a simple model (no deep learning) to see how well it can classify digits using flattened pixel data.
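The exact model isn't important here; as one hedged example, scikit-learn's LogisticRegression on the flattened, normalised pixels does the job (a 10,000-image training subset keeps it quick):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train on normalised, flattened pixels; a subset keeps the demo fast
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train_flat[:10000] / 255.0, y_train[:10000])

y_pred = clf.predict(x_test_flat / 255.0)
print("Test accuracy:", round(accuracy_score(y_test, y_pred), 2))
```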
Output:
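Roughly the following (logistic regression typically lands around 90–92% on raw MNIST pixels; the exact figure varies by run):

```
Test accuracy: 0.91
```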
That’s quite good for such a simple model with no feature engineering; deep neural networks would push this to 98–99% accuracy or higher.
Understanding Greyscale vs. RGB vs. RGBA Images
When working with image data in machine learning, you’ll often encounter different colour formats:
- Greyscale
- RGB
- RGBA
Each format stores pixel information differently — and this directly affects the shape of the NumPy array that represents the image.
Greyscale Images
A greyscale image contains only shades of grey — from black to white.
Each pixel has one value, usually between 0 and 255:
- 0 = black
- 255 = white
Because there’s only one value per pixel, a greyscale image is stored as a 2D array of shape (height, width):
- Height: number of rows (pixels vertically)
- Width: number of columns (pixels horizontally)
So for the greyscale digit examples above, each image has shape (28, 28) — just 2 dimensions.
RGB Images
An RGB image has three colour channels:
- R = Red intensity
- G = Green intensity
- B = Blue intensity
Each pixel stores 3 values, one for each colour. These are combined to form the final colour seen on screen.
So an RGB image is stored as a 3D array:
- Height: pixel rows
- Width: pixel columns
- Channels: colour components (3 = R, G, B)
You can get a greyscale image showing a single channel’s colour intensity by just selecting that slice of the array:
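For example (`img` here stands for any RGB image already loaded as a NumPy array):

```python
import matplotlib.pyplot as plt

# `img` is an assumed RGB image array of shape (height, width, 3)
red = img[:, :, 0]     # 2D array of red intensities
green = img[:, :, 1]
blue = img[:, :, 2]

plt.imshow(red, cmap='gray')   # view one channel as a greyscale image
plt.show()
```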
RGBA Images
RGBA adds a fourth channel:
- A = Alpha, representing transparency (opacity).
- 0 = fully transparent
- 255 = fully opaque
So RGBA images are also 3D arrays, but with 4 channels.
Each pixel is now [R, G, B, A]. Often in machine learning the fourth alpha channel is dropped before training.
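Dropping it is a one-line slice (`rgba` is an assumed array of shape (height, width, 4)):

```python
# Keep R, G, B; discard the alpha channel
rgb = rgba[:, :, :3]
```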
How It Affects Deep Learning Models
When feeding image data into a neural network, the input shape must match the number of channels.
If you feed greyscale images into a network expecting RGB, it will error out because (28, 28, 1) and (28, 28, 3) are fundamentally different array shapes.
You can fix that by adding an explicit channel dimension:
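One common sketch with NumPy:

```python
import numpy as np

# (60000, 28, 28) -> (60000, 28, 28, 1)
x_train_cnn = np.expand_dims(x_train, axis=-1)
x_test_cnn = np.expand_dims(x_test, axis=-1)

print(x_train_cnn.shape)   # (60000, 28, 28, 1)
```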
This makes it compatible with CNNs that expect a 4D input: (num_images, height, width, channels).
Converting Between Formats
You can convert between greyscale and RGB using Pillow:
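A sketch ("photo.png" is just a placeholder path):

```python
from PIL import Image

img = Image.open("photo.png")   # placeholder path; any image file works
grey = img.convert("L")         # any mode -> greyscale (mode "L")
rgb = grey.convert("RGB")       # greyscale -> RGB (grey value copied into all three channels)
```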
Or with NumPy:
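A sketch using an MNIST image from earlier as the greyscale source:

```python
import numpy as np

grey = x_train[0]                      # one 28×28 greyscale image

# Greyscale -> RGB: stack the single channel three times
rgb = np.stack([grey] * 3, axis=-1)    # shape (28, 28, 3)

# RGB -> greyscale: weighted sum of the channels (standard luma weights)
back = rgb @ np.array([0.299, 0.587, 0.114])

print(rgb.shape, back.shape)           # (28, 28, 3) (28, 28)
```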