Handling Missing Data

Feature-engine, a Python library for feature engineering


Published Oct 3 2025




Mean Median Imputer

It replaces missing data with the mean or median value of the variable. It works only with numerical variables.

from feature_engine.imputation import MeanMedianImputer
imputer = MeanMedianImputer(imputation_method='median', variables=['Col1', 'Col5'])

This configures the imputer to compute the median of Col1 and Col5 and use it to fill their missing values. After the fit() method has run, you can view the learnt parameters by calling imputer.imputer_dict_, which would display something like {'Col1': 3.0, 'Col5': 2.1}






Arbitrary Number

It replaces missing data in numerical variables with an arbitrary number determined by the user.

from feature_engine.imputation import ArbitraryNumberImputer
imputer = ArbitraryNumberImputer(arbitrary_number=200, variables=['Col9'])

This sets all missing values in Col9 to a hard-coded 200. After the fit() method has run, you can view the parameters by calling imputer.imputer_dict_, which would display {'Col9': 200}






Categorical Imputer

It replaces missing data in categorical variables with an arbitrary value (typically the label 'Missing') or with the most frequent category.


Arbitrary value example:

from feature_engine.imputation import CategoricalImputer
imputer = CategoricalImputer(imputation_method='missing', fill_value='Unknown', variables=['Department', 'Grade'])

This fills any missing departments or grades with the value 'Unknown'.


Most frequent example:

from feature_engine.imputation import CategoricalImputer
imputer = CategoricalImputer(imputation_method='frequent', variables=['Country'])

This fills any missing countries with whichever country is most frequent in the populated data.






Drop Missing Data

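It deletes rows with missing values, similar to the pandas DataFrame.dropna() method. It can handle numerical and categorical variables.

from feature_engine.imputation import DropMissingData
imputer = DropMissingData()

With the default settings, this drops every row that has a missing value in any column that contained missing data during fit(). Columns can be specified with the variables parameter, as in the examples above, or a threshold can be set to control which rows are dropped.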


© 2025 SimpleSteps.guide