Drop Features & Smart Correlated Features

Feature-engine, a Python library for feature engineering


Published Oct 3 2025



Feature Engineering, Feature-engine, Machine Learning, Pandas, Python, scikit-learn, Transformers

Drop Features

DropFeatures drops a list of variables specified by the user. We often create new variables by combining other variables in the dataset. For example, we obtain the variable age by subtracting date_of_birth from date_of_application. Once we have the new variable, we no longer need the date variables in the dataset.


from feature_engine.selection import DropFeatures
dropper = DropFeatures(features_to_drop=['Col1', 'Col'])  # a transformer, not a DataFrame





Smart Correlated Features

This transformer finds groups of features that correlate with each other, then keeps one feature from each correlated group and drops the rest, based on a specified selection method. Depending on that method, it retains the feature with:

  • the highest variance
  • the highest cardinality
  • the least missing data
  • the best performing model (based on a single feature)
  • the strongest correlation with the target variable

Correlation is calculated with pandas.DataFrame.corr().
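The first step, finding which features correlate, can be sketched with pandas alone. This is only an illustration of the idea, not Feature-engine's actual implementation; the column names, values and the 0.6 cut-off are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: two near-duplicate height columns and one unrelated column.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40, 38, 43, 39, 42],
})

# Absolute pairwise Pearson correlations between all columns.
corr = df.corr(method="pearson").abs()

cutoff = 0.6  # illustrative cut-off

# Collect each pair of distinct columns whose correlation exceeds the cut-off.
pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > cutoff
]

print(pairs)
```

Only the two height columns end up paired here. The selection criteria listed above then decide which member of such a group is kept.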


Parameters:

  • method : can take 'pearson', 'spearman', 'kendall' or a callable. It sets the correlation method used to identify the correlated features.
  • threshold : the correlation threshold above which two features are deemed correlated. One feature from each correlated group is retained and the others are removed from the dataset.
  • selection_method : takes the values 'missing_values', 'cardinality', 'variance', 'model_performance', and 'corr_with_target'.
    • missing_values : keeps the feature from the correlated group with the least missing observations.
    • cardinality : keeps the feature from the correlated group with the highest cardinality.
    • variance : keeps the feature from the correlated group with the highest variance.
    • model_performance : trains a machine learning model on each feature of a correlated group individually and retains the feature whose single-feature model performs best.
    • corr_with_target : keeps the feature from the correlated group that has the highest (absolute) correlation with the target variable. The same correlation method defined in the method parameter is used to calculate the correlation between the features and the target.

from feature_engine.selection import SmartCorrelatedSelection
scs = SmartCorrelatedSelection(method="pearson", threshold=0.6, selection_method="variance")

© 2025 SimpleSteps.guide