Drop Features & Smart Correlated Features
Feature-engine, a Python library for feature engineering
Published Oct 3 2025
Feature Engineering, Feature-engine, Machine Learning, Pandas, Python, scikit-learn, Transformers
Drop Features
DropFeatures drops a list of variables indicated by the user. Sometimes we create new variables by combining other variables in the dataset. For example, we obtain the variable age by subtracting date_of_birth from date_of_application. Once we have the new variable, we no longer need the date variables in the dataset.
from feature_engine.selection import DropFeatures
dropper = DropFeatures(features_to_drop=['Col1', 'Col'])  # a transformer, not a DataFrame
Smart Correlated Features
This transformer finds groups of features that correlate with each other, then keeps one feature from each correlated group and drops the rest, based on the specified selection method. It retains the feature with:
- the highest variance
- the highest cardinality
- the least missing data
- the best performing model (based on a single feature)
- the strongest correlation with the target variable
Correlation is calculated with pandas.DataFrame.corr().
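To make the correlation step concrete, here is a minimal pandas-only sketch that computes a correlation matrix and lists the feature pairs above a threshold. The synthetic data, the 0.8 threshold, and the use of absolute values are assumptions for illustration; this is not Feature-engine's internal code.

```python
import numpy as np
import pandas as pd

# Synthetic data: b is almost a scaled copy of a, c is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

# Pairwise correlation matrix; absolute values treat strong negative
# correlation the same as strong positive correlation (an assumption here).
corr = df.corr(method="pearson").abs()

threshold = 0.8  # hypothetical cut-off

# Collect each unordered pair of features whose correlation exceeds the threshold.
pairs = [
    (f1, f2)
    for i, f1 in enumerate(corr.columns)
    for f2 in corr.columns[i + 1:]
    if corr.loc[f1, f2] > threshold
]
print(pairs)
```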
Parameters:
- method: can take 'pearson', 'spearman', 'kendall' or a callable. It refers to the correlation method used to identify the correlated features.
- threshold: the correlation threshold above which a feature is deemed correlated with another one and removed from the dataset.
- selection_method: takes the values 'missing_values', 'cardinality', 'variance', 'model_performance' and 'corr_with_target'.
  - missing_values: keeps the feature from the correlated group with the fewest missing observations.
  - cardinality: keeps the feature from the correlated group with the highest cardinality.
  - variance: keeps the feature from the correlated group with the highest variance.
  - model_performance: trains a machine learning model on each feature in a correlated group, one feature at a time, and retains the feature whose model performs best.
  - corr_with_target: keeps the feature from the correlated group that has the highest (absolute) correlation with the target variable. The same correlation method defined in the method parameter is used to calculate the correlation between the features and the target.
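The three data-driven selection criteria can be approximated with plain pandas, which may help when checking why a particular feature was kept. The sketch below applies them to a made-up correlated group of two features; the data and variable names are hypothetical, and this only mirrors the idea rather than the library's own implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical correlated group: a has one missing value, b has repeated values.
group = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [10.0, 10.0, 30.0, 30.0, 50.0],
})

# 'variance': keep the feature with the highest variance.
keep_by_variance = group.var().idxmax()

# 'cardinality': keep the feature with the most distinct values.
keep_by_cardinality = group.nunique().idxmax()

# 'missing_values': keep the feature with the fewest missing observations.
keep_by_missing = group.isnull().sum().idxmin()

print(keep_by_variance, keep_by_cardinality, keep_by_missing)  # b a b
```

Different criteria can pick different features from the same group, so the choice of selection_method is worth matching to what matters most in the dataset at hand.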
from feature_engine.selection import SmartCorrelatedSelection
scs = SmartCorrelatedSelection(method="pearson", threshold=0.6, selection_method="variance")