Handle Outliers
Feature-engine, a Python library for feature engineering
Published Oct 3 2025
These techniques cap outliers at values that are either calculated from the data or chosen arbitrarily by the user. Alternatively, you can remove the outliers from the dataset.
Winsorizer
It caps the outliers of continuous variables at maximum and/or minimum values that it calculates from the data using one of several methods.
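Parameters:
capping_method: the method used to determine what counts as an outlier. Possible values are 'gaussian', 'iqr', 'mad', or 'quantiles'.
tail: whether to look for outliers in the right tail, the left tail, or both tails of the distribution. Possible values are 'left', 'right', or 'both'.
fold: the multiplication factor. What it multiplies depends on capping_method: the standard deviation for 'gaussian', the inter-quartile range for 'iqr', and the median absolute deviation for 'mad'. For 'quantiles', fold is the quantile itself (for example, 0.05 caps at the 5th and 95th percentiles). Set it to 'auto' to use the default factor for the selected capping_method.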
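To make the role of fold concrete, the boundaries for three of the capping methods can be written as below. These are the standard textbook forms of each rule, offered as a rough guide; the exact formulas Feature-engine uses (including those for 'mad') are best checked in its documentation. Here \mu and \sigma are the variable's mean and standard deviation, Q_1 and Q_3 its first and third quartiles, and q_p its p-th quantile. With tail='left' only the lower boundary applies, and with tail='right' only the upper one.

```latex
\begin{aligned}
\text{gaussian:}\quad  & \mu - \text{fold}\cdot\sigma, \qquad \mu + \text{fold}\cdot\sigma \\
\text{iqr:}\quad       & Q_1 - \text{fold}\cdot(Q_3 - Q_1), \qquad Q_3 + \text{fold}\cdot(Q_3 - Q_1) \\
\text{quantiles:}\quad & q_{\text{fold}}, \qquad q_{1-\text{fold}}
\end{aligned}
```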
from feature_engine.outliers import Winsorizer
wiqr = Winsorizer(capping_method='iqr', fold=1.5, tail='both', variables=['bmi', 'charges'])
After fitting, you can view the caps for the left and right tails in wiqr.left_tail_caps_ and wiqr.right_tail_caps_, respectively.
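Conceptually, Winsorizer learns one pair of caps per variable when it is fitted and clips values to those caps when it transforms data. The pandas sketch below mimics that idea for capping_method='iqr' with tail='both'. The helper names fit_iqr_caps and apply_caps are hypothetical, and the code does not reproduce Feature-engine's internals (for example, its exact quantile computation or the other capping methods).

```python
import pandas as pd


def fit_iqr_caps(df, variables, fold=1.5):
    """Learn lower and upper caps per variable with the IQR rule.

    Hypothetical helper: a from-scratch sketch of the idea behind
    capping_method='iqr', not Feature-engine's implementation.
    """
    left_caps, right_caps = {}, {}
    for var in variables:
        q1 = df[var].quantile(0.25)
        q3 = df[var].quantile(0.75)
        iqr = q3 - q1
        left_caps[var] = q1 - fold * iqr
        right_caps[var] = q3 + fold * iqr
    return left_caps, right_caps


def apply_caps(df, left_caps, right_caps):
    """Replace values beyond the learned caps with the caps themselves."""
    out = df.copy()
    for var in left_caps:
        out[var] = out[var].clip(lower=left_caps[var], upper=right_caps[var])
    return out


# Made-up toy data with one extreme bmi value.
train = pd.DataFrame({"bmi": [22.0, 25.0, 27.0, 30.0, 60.0]})
left, right = fit_iqr_caps(train, ["bmi"])
capped = apply_caps(train, left, right)
print(capped["bmi"].tolist())  # [22.0, 25.0, 27.0, 30.0, 37.5]
```

Because the caps are stored separately from the data, the same left and right dictionaries can be applied to a test set with apply_caps, which mirrors why Winsorizer is fitted on training data only.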
Arbitrary Outlier Capper
It caps the maximum and/or minimum values of a variable at arbitrary values specified by the user.
Parameters:
max_capping_dict: a dictionary mapping column names to their maximum (upper cap) values.
min_capping_dict: a dictionary mapping column names to their minimum (lower cap) values.
from feature_engine.outliers import ArbitraryOutlierCapper
aoc = ArbitraryOutlierCapper(max_capping_dict={'charges': 20000, 'bmi': 40})
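Since the caps come straight from the user, the underlying operation is a per-column clip. The sketch below shows that idea in plain pandas. The function cap_arbitrary and its max_caps and min_caps arguments are hypothetical stand-ins for max_capping_dict and min_capping_dict, not Feature-engine's implementation, and the data values are made up.

```python
import pandas as pd


def cap_arbitrary(df, max_caps=None, min_caps=None):
    """Clip each listed column at user-supplied upper and/or lower values.

    Hypothetical helper sketching the idea behind ArbitraryOutlierCapper;
    columns not named in either dictionary are left unchanged.
    """
    out = df.copy()
    for col, cap in (max_caps or {}).items():
        out[col] = out[col].clip(upper=cap)
    for col, cap in (min_caps or {}).items():
        out[col] = out[col].clip(lower=cap)
    return out


# Made-up toy data.
df = pd.DataFrame({
    "bmi": [18.5, 33.0, 53.1],
    "charges": [1725.6, 16884.9, 63770.4],
})
capped = cap_arbitrary(df, max_caps={"charges": 20000, "bmi": 40})
print(capped)
```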
Outlier Trimmer
Instead of capping outliers, it removes the observations that contain them from the dataset.
Parameters:
capping_method: the method used to determine what counts as an outlier. Possible values are 'gaussian', 'iqr', 'mad', or 'quantiles'.
tail: whether to look for outliers in the right tail, the left tail, or both tails of the distribution. Possible values are 'left', 'right', or 'both'.
fold: the multiplication factor; as with Winsorizer, what it multiplies depends on capping_method. Set it to 'auto' to use the default factor for the selected capping_method.
from feature_engine.outliers import OutlierTrimmer
ot = OutlierTrimmer(capping_method='iqr', fold=1.5, tail='both', variables=['bmi', 'charges'])
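Trimming uses the same boundaries as capping but drops whole rows instead of clipping values. The pandas sketch below illustrates this for the IQR rule on both tails: a row is kept only if every listed variable falls inside its boundaries. The function trim_iqr is hypothetical and is not Feature-engine's implementation; the data values are made up.

```python
import pandas as pd


def trim_iqr(df, variables, fold=1.5):
    """Drop rows where any listed variable lies outside its IQR boundaries.

    Hypothetical helper sketching the idea behind OutlierTrimmer with
    capping_method='iqr' and tail='both'.
    """
    keep = pd.Series(True, index=df.index)
    for var in variables:
        q1 = df[var].quantile(0.25)
        q3 = df[var].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive, so values exactly on a boundary are kept.
        keep &= df[var].between(q1 - fold * iqr, q3 + fold * iqr)
    return df.loc[keep]


# Made-up toy data: only the last row has an extreme bmi value.
df = pd.DataFrame({
    "bmi": [22.0, 25.0, 27.0, 30.0, 60.0],
    "charges": [2000.0, 5000.0, 9000.0, 12000.0, 11000.0],
})
trimmed = trim_iqr(df, ["bmi", "charges"])
print(trimmed)
```

Note that trimming changes the number of rows, so any target vector kept outside the DataFrame has to be filtered with the same index.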