Categorical Data
Pandas Basics
Published Sep 29 2025, updated Sep 30 2025
What is Categorical Data?
- A pandas data type for variables that take a limited, usually fixed, set of possible values (the categories).
- Examples:
  - Gender: "Male", "Female"
  - Size: "S", "M", "L", "XL"
  - Days of week: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
Why Use Categorical Data?
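- Memory efficiency: each distinct value is stored once, and the rows hold small integer codes instead of full strings. The saving is largest when a few values repeat many times.
- Performance: comparisons and groupby operations are often faster, because they work on the integer codes.
- Explicit ordering: you can specify an order (Small < Medium < Large).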
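The memory claim can be checked with a small sketch. The data here is made up for illustration (three labels repeated 100,000 times), and the exact byte counts depend on the pandas version and string backend, so the code prints them rather than assuming specific numbers.

```python
import pandas as pd

# Hypothetical data: 300,000 rows drawn from only three distinct labels.
sizes = pd.Series(["Small", "Medium", "Large"] * 100_000)

# deep=True counts the memory held by the string values themselves.
obj_bytes = sizes.memory_usage(deep=True)
cat_bytes = sizes.astype("category").memory_usage(deep=True)

print(f"original dtype: {obj_bytes:,} bytes")
print(f"category dtype: {cat_bytes:,} bytes")
```

With many rows and few distinct values, the categorical version should come out considerably smaller. For a column where nearly every value is unique, the saving shrinks or disappears.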
Creating Categorical Data
From a Series:
import pandas as pd
sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = sizes.astype("category")
print(cat_sizes)
Output:
0 S
1 M
2 L
3 S
4 M
dtype: category
Categories (3, object): ['L', 'M', 'S']
Categories and Order:
sizes = pd.Series(["S", "M", "L", "S", "M"])
cat_sizes = pd.Categorical(sizes, categories=["S", "M", "L"], ordered=True)
print(cat_sizes)
Now "S" < "M" < "L" is respected in comparisons.
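One place the declared order shows up is sorting. The sketch below uses `pd.CategoricalDtype`, a pandas API the guide itself does not use, to keep the result as a Series. The variable names are illustrative only.

```python
import pandas as pd

sizes = pd.Series(["S", "M", "L", "S", "M"])

# Assumption for this sketch: build the ordered dtype once and reuse it.
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
ordered_sizes = sizes.astype(size_dtype)

# Sorting follows the declared category order, not alphabetical order.
sorted_sizes = ordered_sizes.sort_values()
print(sorted_sizes.tolist())                      # ['S', 'S', 'M', 'M', 'L']

# min() and max() are also defined for ordered categoricals.
print(ordered_sizes.min(), ordered_sizes.max())   # S L
```

An alphabetical sort of the same strings would put "L" first, so the ordered dtype is what makes the result match the intended size order.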
Comparisons
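cat_sizes = pd.Categorical(["S", "M", "L"], categories=["S", "M", "L"], ordered=True)
# Element-wise comparison against a category: True, True, False
print(cat_sizes < "L")
# The integer codes follow the category order: True (S < L)
print(cat_sizes.codes[0] < cat_sizes.codes[2])
The categorical must be ordered for operators such as < and > to work; on an unordered categorical they raise a TypeError. The integer codes can always be compared, but they only reflect a meaningful order when the categories are listed in that order.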
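To see why the ordered flag matters, here is a minimal sketch of what happens without it. The `raised` flag exists only so the outcome can be inspected; `as_ordered()` is a pandas method not covered elsewhere in this guide.

```python
import pandas as pd

# Same categories as above, but without ordered=True.
unordered = pd.Categorical(["S", "M", "L"], categories=["S", "M", "L"])

try:
    unordered < "L"
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Equality checks do not need an order.
print(unordered == "S")

# as_ordered() returns an ordered copy with the same categories.
ordered = unordered.as_ordered()
print(ordered < "L")
```

Equality works either way, while `<` and `>` only become available once the categorical is ordered.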
Common Operations
Check Categories:
cat_sizes.categories
Index(['S', 'M', 'L'], dtype='object')
Change Categories:
cat_sizes = cat_sizes.rename_categories(["Small", "Medium", "Large"])
Reorder Categories:
cat_sizes = cat_sizes.reorder_categories(["Large", "Medium", "Small"], ordered=True)
Add or Remove Categories:
cat_sizes = cat_sizes.add_categories(["XL"])
cat_sizes = cat_sizes.remove_categories(["Small"])
Integration with DataFrames
df = pd.DataFrame({
"Size": ["S", "M", "L", "S", "M"]
})
df["Size"] = df["Size"].astype("category")
.cat methods:
- .cat.categories → list categories
- .cat.codes → underlying integer codes
- .cat.set_categories([...]) → change categories
- .cat.remove_unused_categories() → drop unused ones
Example:
df["Size"].cat.codes
Output:
0 2
1 1
2 0
3 2
4 1
dtype: int8
The categories were inferred and sorted alphabetically (['L', 'M', 'S']), so "L" maps to 0, "M" to 1, and "S" to 2.
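The remaining `.cat` methods can be sketched on the same DataFrame. The extra "XL" category and the `trimmed` variable are illustrative choices, not part of the guide's data.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["S", "M", "L", "S", "M"]})
df["Size"] = df["Size"].astype("category")

# set_categories replaces the category list; values missing from the
# new list would become NaN. Here every existing value is kept.
df["Size"] = df["Size"].cat.set_categories(["S", "M", "L", "XL"], ordered=True)
print(df["Size"].cat.codes.tolist())   # [0, 1, 2, 0, 1]

# "XL" is declared but never used, so it can be dropped again.
trimmed = df["Size"].cat.remove_unused_categories()
print(list(trimmed.cat.categories))    # ['S', 'M', 'L']
```

Setting the categories explicitly also fixes the code assignment, so "S" is 0, "M" is 1, and "L" is 2 regardless of alphabetical order.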