Categorical Variable Encoding

Feature-engine, a Python library for feature engineering

Published Oct 3 2025



Feature Engineering, Feature-engine, Machine Learning, Pandas, Python, scikit-learn, Transformers

One Hot Encoder

This technique replaces the categorical variable with a set of binary variables (each taking the value 0 or 1), where each new binary variable corresponds to one label of the categorical variable.

from feature_engine.encoding import OneHotEncoder
output = OneHotEncoder(variables=['Colour'])

If Colour has 3 possible values (red, blue and white), the encoder creates three new binary variables: Colour_red, Colour_blue and Colour_white. Each one takes the value 0 or 1 to indicate which colour category an observation belongs to.
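To make the idea concrete, here is a minimal plain-Python sketch of what one hot encoding produces for a Colour column. It is only an illustration of the technique, not Feature-engine's implementation, and the sample values are made up.

```python
# Plain-Python illustration of one hot encoding (not Feature-engine's code).
colours = ["red", "blue", "white", "blue", "red"]  # made-up sample column

# Distinct categories, in order of first appearance: ['red', 'blue', 'white']
categories = list(dict.fromkeys(colours))

# One binary column per category, named <variable>_<category>
one_hot = {
    f"Colour_{category}": [int(value == category) for value in colours]
    for category in categories
}

for name, column in one_hot.items():
    print(name, column)
```

Every row has exactly one 1 across the three columns, which is what lets the binary variables stand in for the original category.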


Three binary variables are more than you need. One of them is a redundant feature, because 2 binary variables are enough to define the colour category:

  • Colour_red = 1 and Colour_blue = 0, meaning red
  • Colour_red = 0 and Colour_blue = 1, meaning blue
  • Colour_red = 0 and Colour_blue = 0, meaning white

You can add the parameter drop_last=True to remove the redundant binary variable while still recording the colour category:

output = OneHotEncoder(variables=['Colour'], drop_last=True)

This removes the redundant third variable, Colour_white.
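The sketch below shows, in plain Python, why nothing is lost when the last binary variable is dropped: the original category can still be recovered from the remaining two. It assumes the "last" category is the last one to appear in the data, which is a choice made for this illustration; the sample values are made up.

```python
# Plain-Python illustration of dropping the last one hot column.
colours = ["red", "blue", "white", "blue", "red"]  # made-up sample column

categories = list(dict.fromkeys(colours))  # ['red', 'blue', 'white']
kept = categories[:-1]                     # ['red', 'blue']
dropped = categories[-1]                   # 'white'

# Each row holds the values of Colour_red and Colour_blue only
encoded_rows = [[int(value == category) for category in kept] for value in colours]


def decode(row):
    """Recover the category from the reduced set of binary values."""
    for category, flag in zip(kept, row):
        if flag == 1:
            return category
    return dropped  # all zeros can only mean the dropped category


decoded = [decode(row) for row in encoded_rows]
print(decoded == colours)
```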






Ordinal Encoder

This replaces categories with ordinal numbers such as 0, 1, 2, 3. The encoding method can be set to ordered or arbitrary. When set to ordered, the categories are numbered in ascending order of the target mean value per category. When set to arbitrary, the categories are numbered arbitrarily. The ordered method only works when your machine learning task has a target, as in regression or classification. It will fail if used for a task without a target, such as clustering.

from feature_engine.encoding import OrdinalEncoder
output = OrdinalEncoder(encoding_method='arbitrary', variables=['region', 'sex'])

This will assign a number to each category. Once the encoder has been fitted, output.encoder_dict_ shows which number has been assigned to which category, e.g. {'region': {'N': 0, 'S': 1, 'E': 2, 'W': 3}, 'sex': {'female': 0, 'male': 1}}
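For the ordered method, the numbering comes from the target mean per category. Here is a rough plain-Python sketch of that calculation, using made-up region and target values. It illustrates the idea only and is not Feature-engine's own code.

```python
from collections import defaultdict

# Made-up data: one categorical variable and a numeric target
regions = ["N", "S", "E", "W", "N", "S", "E", "W"]
target = [10, 40, 20, 30, 14, 44, 22, 34]

# Target mean per category
totals = defaultdict(float)
counts = defaultdict(int)
for region, y in zip(regions, target):
    totals[region] += y
    counts[region] += 1
target_mean = {region: totals[region] / counts[region] for region in totals}

# Number the categories in ascending order of their target mean
ordered = sorted(target_mean, key=target_mean.get)
mapping = {category: number for number, category in enumerate(ordered)}

encoded = [mapping[region] for region in regions]
print(target_mean)
print(mapping)
```

In this sample N has the lowest target mean and S the highest, so N becomes 0 and S becomes 3. Without a target there is nothing to average, which is why the ordered method cannot be used for clustering.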






Rare Label Encoder

This encoder groups infrequent categories into a new category called 'Rare' (or another name that you define).


There are some parameters to consider:

  • tol: The tolerance, which is the minimum proportion of observations a category must have to be kept on its own. e.g. 0.1 means any category that appears in fewer than 10% of the observations is grouped into the Rare bucket.
  • n_categories: The minimum number of distinct categories a variable must have before the encoder starts grouping anything into Rare. This parameter is useful when we have big datasets and do not have time to examine all categorical variables individually. This way, we ensure that variables with low cardinality are not reduced any further.
  • max_n_categories: Optionally, the maximum number of distinct categories to keep. Only the most frequent categories are retained and the lower-frequency ones all get grouped as Rare.

from feature_engine.encoding import RareLabelEncoder

encoder = RareLabelEncoder(
    tol=0.1,
    n_categories=2,
    variables=['region', 'country'],
    replace_with='Other',
)

Because replace_with='Other' is set, the infrequent categories are labelled Other instead of Rare.
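To see how the three parameters interact, here is a small plain-Python sketch of the grouping logic as described above. The function group_rare is a hypothetical helper written for this illustration, not part of Feature-engine, and the country values are made up.

```python
from collections import Counter


def group_rare(values, tol=0.1, n_categories=2, max_n_categories=None,
               replace_with="Other"):
    """Hypothetical helper: group infrequent categories under one label."""
    counts = Counter(values)

    # Too few distinct categories: leave the variable untouched
    if len(counts) <= n_categories:
        return list(values)

    total = len(values)
    # Categories at or above the tolerance, most frequent first
    frequent = [cat for cat, n in counts.most_common() if n / total >= tol]

    # Optionally keep only the top max_n_categories of those
    if max_n_categories is not None:
        frequent = frequent[:max_n_categories]

    keep = set(frequent)
    return [value if value in keep else replace_with for value in values]


# Made-up column: UK 50%, FR 30%, DE 15%, ES 5%
countries = ["UK"] * 10 + ["FR"] * 6 + ["DE"] * 3 + ["ES"] * 1

print(Counter(group_rare(countries)))
print(Counter(group_rare(countries, max_n_categories=2)))
```

With tol=0.1 only ES falls below the threshold and is replaced. Adding max_n_categories=2 keeps UK and FR and moves DE into Other as well.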


© 2025 SimpleSteps.guide