Categorical Data
Pandas Basics
Published Sep 29 2025, updated Oct 24 2025
What is Categorical Data?
- A Pandas data type for values drawn from a limited, fixed set of categories.
- Examples:
  - Gender: "Male", "Female"
  - Size: "S", "M", "L", "XL"
  - Days of week: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
Why Use Categorical Data?
- Memory efficiency: each value is stored as a small integer code that points into the list of categories, not as a full string.
- Performance: comparisons and groupby operations are often faster, particularly when there are few distinct values.
- Explicit ordering: you can specify an order (Small < Medium < Large).
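To see the memory claim in practice, the sketch below stores the same values once as plain strings and once as a categorical. The data and the repeat count are made up for illustration, and exact byte counts depend on the pandas version and string backend, so only the relative sizes matter.

```python
import pandas as pd

# Hypothetical data: four distinct labels repeated many times
labels = pd.Series(["S", "M", "L", "XL"] * 25_000)
as_category = labels.astype("category")

# deep=True also counts the bytes held by the string values themselves
string_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"strings:  {string_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The saving shrinks as the number of distinct values approaches the number of rows, since every distinct value still has to be stored once.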
Creating Categorical Data
From a Series:
import pandas as pd
sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = sizes.astype("category")
print(cat_sizes)
Output:
0    S
1    M
2    L
3    S
4    M
dtype: category
Categories (3, object): ['L', 'M', 'S']
Categories and Order:
sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = pd.Categorical(sizes, categories=["S", "M", "L"], ordered=True)
print(cat_sizes)
Now "S" < "M" < "L" is respected in comparisons.
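One place the declared order shows up is sorting. The sketch below is an illustration rather than part of the original guide: it uses `CategoricalDtype` from `pandas.api.types` to attach the order to a Series, and the variable names are arbitrary.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Declare the order once and reuse it as a dtype
size_dtype = CategoricalDtype(categories=["S", "M", "L"], ordered=True)
sizes = pd.Series(["S", "M", "L", "S", "M"]).astype(size_dtype)

# Sorting follows the declared order (S, M, L), not alphabetical order
sorted_sizes = sizes.sort_values()
print(sorted_sizes.tolist())

# min() and max() are defined because the categorical is ordered
largest = sizes.max()
print(largest)
```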
Comparisons
cat_sizes = pd.Categorical(["S", "M", "L"], categories=["S","M","L"], ordered=True)
# True (S < L)
print(cat_sizes.codes[0] < cat_sizes.codes[2])
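The categorical must be created with ordered=True for order comparisons such as cat_sizes < "L" to work; on an unordered categorical they raise a TypeError. Comparing codes runs either way because codes are plain integers, but the result is only meaningful when the categories are listed in the intended order.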
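As a minimal sketch of that difference, the example below compares an ordered categorical against a label directly, then repeats the comparison on an unordered copy. The variable names are illustrative only.

```python
import pandas as pd

ordered = pd.Categorical(["S", "M", "L"], categories=["S", "M", "L"], ordered=True)

# Element-wise comparison against a category label
below_large = ordered < "L"
print(below_large.tolist())

# The same comparison on an unordered copy is rejected
unordered = ordered.as_unordered()
try:
    unordered < "L"
    raised = False
except TypeError:
    raised = True
print(raised)
```

Equality checks (`==`, `!=`) remain available on unordered categoricals; only order comparisons are restricted.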
Common Operations
Check Categories:
cat_sizes.categories
Output:
Index(['S', 'M', 'L'], dtype='object')
Rename Categories:
cat_sizes = cat_sizes.rename_categories(["Small", "Medium", "Large"])
Reorder Categories:
cat_sizes = cat_sizes.reorder_categories(["Large", "Medium", "Small"], ordered=True)
Add or Remove Categories:
cat_sizes = cat_sizes.add_categories(["XL"])
cat_sizes = cat_sizes.remove_categories(["Small"])
Integration with DataFrames
df = pd.DataFrame({
"Size": ["S", "M", "L", "S", "M"]
})
df["Size"] = df["Size"].astype("category")
.cat methods:
- .cat.categories → list categories
- .cat.codes → underlying integer codes
- .cat.set_categories([...]) → change categories
- .cat.remove_unused_categories() → drop unused ones
Example:
df["Size"].cat.codes
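Output:
0    2
1    1
2    0
3    2
4    1
dtype: int8
Because astype("category") sorts the categories alphabetically (['L', 'M', 'S']), "L" gets code 0, "M" gets 1, and "S" gets 2.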
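If the codes should follow size order instead, the categories can be set explicitly. The following is a sketch of how the `.cat` accessor methods listed above might be combined; the extra "XL" category is an assumption added only to show an unused category being dropped.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["S", "M", "L", "S", "M"]})
df["Size"] = df["Size"].astype("category")

# Replace the alphabetical categories with an explicit, ordered list.
# "XL" does not occur in the data, so it becomes an unused category.
df["Size"] = df["Size"].cat.set_categories(["S", "M", "L", "XL"], ordered=True)

codes = df["Size"].cat.codes.tolist()
print(codes)

# Drop categories that no row uses
cleaned = df["Size"].cat.remove_unused_categories()
print(list(cleaned.cat.categories))
```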














