Handle Outliers

Feature-engine, a Python library for feature engineering


Published Oct 3 2025




These techniques aim to cap outliers based on a calculation or an arbitrary value. In addition, you may drop the outliers from the dataset.


Winsorizer

The Winsorizer caps a continuous variable's maximum and/or minimum values at limits that it calculates from the data, using the method you select.
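To make the idea of a calculated cap concrete, here is a minimal sketch of the inter-quartile range (IQR) rule in plain Python, without Feature-engine. The helper names `iqr_caps` and `cap` and the sample values are made up for illustration, and the quantile interpolation Feature-engine applies internally may differ from the one used here.

```python
import statistics


def iqr_caps(values, fold=1.5):
    """Return (lower_cap, upper_cap) from the inter-quartile range rule."""
    # method="inclusive" interpolates linearly between the observed data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr


def cap(values, lower, upper):
    """Clip every value into the [lower, upper] range."""
    return [min(max(v, lower), upper) for v in values]


# Made-up sample; 60.0 is the outlier
bmi = [20.0, 22.0, 24.0, 26.0, 28.0, 60.0]

lower, upper = iqr_caps(bmi)
capped = cap(bmi, lower, upper)

print(lower, upper)  # 15.0 35.0
print(capped)        # [20.0, 22.0, 24.0, 26.0, 28.0, 35.0]
```

The outlier is not removed. It is pulled back to the upper cap, so the number of rows stays the same.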


Parameters:

  • capping_method : the method used to decide what counts as an outlier. Possible values are 'gaussian', 'iqr', 'mad' or 'quantiles'.
  • tail : whether to look for outliers on the right tail, the left tail or both tails of the distribution. Possible values are 'left', 'right' or 'both'.
  • fold : the multiplication factor. What it multiplies depends on the capping_method (the standard deviation for 'gaussian', the inter-quartile range for 'iqr', the median absolute deviation for 'mad'). Use 'auto' to apply the default factor for the selected capping_method.

from feature_engine.outliers import Winsorizer
wiqr = Winsorizer(capping_method='iqr', fold=1.5, tail='both', variables=['bmi', 'charges'])

After fitting, you can view the left and right tail caps in wiqr.left_tail_caps_ and wiqr.right_tail_caps_ respectively.






Arbitrary Outlier Capper

It caps a variable's maximum or minimum values at an arbitrary value indicated by the user.


Parameters:

  • max_capping_dict : a dictionary mapping column names to the maximum value allowed in that column.
  • min_capping_dict : a dictionary mapping column names to the minimum value allowed in that column.

from feature_engine.outliers import ArbitraryOutlierCapper
aoc = ArbitraryOutlierCapper(max_capping_dict={'charges': 20000, 'bmi': 40})





Outlier Trimmer

It removes observations with outliers from the data.


Parameters:

  • capping_method : the method used to decide what counts as an outlier. Possible values are 'gaussian', 'iqr', 'mad' or 'quantiles'.
  • tail : whether to look for outliers on the right tail, the left tail or both tails of the distribution. Possible values are 'left', 'right' or 'both'.
  • fold : the multiplication factor. What it multiplies depends on the capping_method (the standard deviation for 'gaussian', the inter-quartile range for 'iqr', the median absolute deviation for 'mad'). Use 'auto' to apply the default factor for the selected capping_method.

from feature_engine.outliers import OutlierTrimmer
ot = OutlierTrimmer(capping_method='iqr', fold=1.5, tail='both', variables=['bmi', 'charges'])

© 2025 SimpleSteps.guide