Categorical Data

Pandas Basics


Published Sep 29 2025, updated Sep 30 2025


Pandas, Python

What is Categorical Data?

  • A pandas data type (the category dtype) for variables that take a limited, usually fixed, set of possible values.
  • Examples:
    • Gender: "Male", "Female"
    • Size: "S", "M", "L", "XL"
    • Days of week: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"




Why Use Categorical Data?

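  • Memory efficiency: each distinct value is stored once, and the column itself holds small integer codes instead of repeated strings.
  • Performance: groupby, sorting, and equality comparisons are often faster, because they operate on the integer codes.
  • Explicit ordering: you can specify a logical order (Small < Medium < Large) instead of relying on alphabetical order.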
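
The memory claim is easy to check for yourself. The following is a minimal sketch, assuming a made-up column of 100,000 values drawn from four labels; the exact byte counts depend on your pandas version, so none are shown here.

```python
import pandas as pd

# Hypothetical data: 100,000 rows, but only four distinct labels.
labels = pd.Series(["S", "M", "L", "XL"] * 25_000)
as_category = labels.astype("category")

# deep=True counts the string contents, not just references to them.
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"plain strings: {plain_bytes:,} bytes")
print(f"category:      {category_bytes:,} bytes")

# Each row stores only a small integer code pointing at a category.
codes_dtype = str(as_category.cat.codes.dtype)
print(codes_dtype)  # int8
```

The saving shrinks as the number of distinct values approaches the number of rows, so the category dtype pays off mainly for columns with many repeats.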





Creating Categorical Data

From a Series:

import pandas as pd

sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = sizes.astype("category")
print(cat_sizes)

Output:

0    S
1    M
2    L
3    S
4    M
dtype: category
Categories (3, object): ['L', 'M', 'S']


Categories and Order:

sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = pd.Categorical(sizes, categories=["S", "M", "L"], ordered=True)
print(cat_sizes)

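Now the order "S" < "M" < "L" is respected in comparisons, sorting, and min/max. Note that pd.Categorical returns a Categorical array rather than a Series.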
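
To see the order take effect on a Series, one option is to build the dtype with pd.CategoricalDtype and pass it at construction time. This is a small sketch of that approach; the variable names are illustrative.

```python
import pandas as pd

# An ordered dtype: S < M < L, regardless of alphabetical order.
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
sizes = pd.Series(["S", "M", "L", "S", "M"], dtype=size_dtype)

# Sorting follows the category order, not the string order.
sorted_sizes = sizes.sort_values().tolist()
print(sorted_sizes)  # ['S', 'S', 'M', 'M', 'L']

# min and max are defined because the categories are ordered.
smallest, largest = sizes.min(), sizes.max()
print(smallest, largest)  # S L
```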






Comparisons

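cat_sizes = pd.Categorical(["S", "M", "L"], categories=["S", "M", "L"], ordered=True)

# Element-wise comparison against a category: [ True  True False]
print(cat_sizes < "L")

# Comparing the underlying integer codes: True (S < L)
print(cat_sizes.codes[0] < cat_sizes.codes[2])

The categorical must be ordered for the direct < comparison to work. Comparing codes runs either way, but it only reflects the intended order when the categories are listed in that order.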
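
The ordering requirement can be checked directly. In this sketch, an unordered categorical rejects a less-than comparison with a TypeError, and as_ordered() returns an ordered copy that accepts it.

```python
import pandas as pd

# Same categories as above, but without ordered=True.
unordered = pd.Categorical(["S", "M", "L"], categories=["S", "M", "L"])

# Unordered categoricals only support == and !=.
try:
    unordered < "L"
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# as_ordered() keeps the category list and marks it as ordered.
ordered = unordered.as_ordered()
result = (ordered < "L").tolist()
print(result)  # [True, True, False]
```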






Common Operations

Check Categories:

cat_sizes.categories

Index(['S', 'M', 'L'], dtype='object')



Rename Categories:

cat_sizes = cat_sizes.rename_categories(["Small", "Medium", "Large"])


Reorder Categories:

cat_sizes = cat_sizes.reorder_categories(["Large", "Medium", "Small"], ordered=True)


Add or Remove Categories:

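# "XL" becomes a valid category, even though no value uses it yet
cat_sizes = cat_sizes.add_categories(["XL"])
# Values that were "Small" become NaN once the category is removed
cat_sizes = cat_sizes.remove_categories(["Small"])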
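
Adding and removing categories affect existing values differently, which is worth checking before applying them to real data. Here is a short sketch on a fresh Series, using the .cat accessor instead of the Categorical object above.

```python
import pandas as pd

sizes = pd.Series(["Small", "Medium", "Large"], dtype="category")

# Adding a category leaves the values untouched.
with_xl = sizes.cat.add_categories(["XL"])
categories_after_add = list(with_xl.cat.categories)
print(categories_after_add)  # ['Large', 'Medium', 'Small', 'XL']

# Removing a category turns every value that used it into NaN.
without_small = sizes.cat.remove_categories(["Small"])
missing_mask = without_small.isna().tolist()
print(missing_mask)  # [True, False, False]
```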





Integration with DataFrames

df = pd.DataFrame({
    "Size": ["S", "M", "L", "S", "M"]
})
df["Size"] = df["Size"].astype("category")


.cat methods:

  • .cat.categories → list the categories
  • .cat.codes → underlying integer codes (-1 marks a missing value)
  • .cat.set_categories([...]) → replace the category list (values not in the new list become NaN)
  • .cat.remove_unused_categories() → drop categories that no value uses

Example:

df["Size"].cat.codes

Output:

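0    2
1    1
2    0
3    2
4    1
dtype: int8

The codes index into the categories, which astype("category") sorts alphabetically as ['L', 'M', 'S'], so "S" maps to 2 and "L" to 0.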
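
The other two accessor methods from the list can be tried on the same DataFrame. This sketch assumes you want the logical order S, M, L plus a spare "XL" category that no row uses.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["S", "M", "L", "S", "M"]})
df["Size"] = df["Size"].astype("category")

# set_categories replaces the category list; here it also fixes the order.
df["Size"] = df["Size"].cat.set_categories(["S", "M", "L", "XL"], ordered=True)
codes = df["Size"].cat.codes.tolist()
print(codes)  # [0, 1, 2, 0, 1]

# "XL" never appears in the data, so remove_unused_categories drops it.
trimmed = df["Size"].cat.remove_unused_categories()
remaining = list(trimmed.cat.categories)
print(remaining)  # ['S', 'M', 'L']
```

With the categories set explicitly, the codes follow the order you gave rather than alphabetical order.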


© 2025 SimpleSteps.guide