Create your own transformer
Feature-engine, a Python library for feature engineering
Published Oct 3 2025
Tags: Feature Engineering · Feature-engine · Machine Learning · Pandas · Python · scikit-learn · Transformers
Sometimes a transformer isn't available for the task you want to apply to your data. In that case, it is straightforward to implement your own by inheriting from scikit-learn's base classes.
Example where a fit() method is required to learn a parameter:
from sklearn.base import BaseEstimator, TransformerMixin

# Define three methods for the class: __init__, fit and transform.
# fit_transform() is inherited automatically from TransformerMixin.

# Define the transformer, and inherit the base classes
class MyCustomTransformerForMaxImputation(BaseEstimator, TransformerMixin):

    # Here, you define the parameters to pass when you initialise the class
    def __init__(self, variables):
        # Make sure the variables are stored as a list, even if only one was passed
        if not isinstance(variables, list):
            self.variables = [variables]
        else:
            self.variables = variables

    # Carry out the learning from the data here, in this case, the max value
    def fit(self, X, y=None):
        # We want to keep the max value of each variable in a dictionary
        self.imputer_dict_ = {}
        # Loop over each variable, calculate the max and save it in the dictionary
        for feature in self.variables:
            self.imputer_dict_[feature] = X[feature].max()
        return self

    # Transform the variables based on what was learned in fit()
    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified in place
        X = X.copy()
        # Fill the missing values in each feature with the max learned for that feature
        for feature in self.variables:
            X[feature] = X[feature].fillna(self.imputer_dict_[feature])
        return X
This example stores the maximum value of each column during fit(), and transform() then fills any missing values with those learned maxima.
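As a quick sketch of how the transformer above might be used, the snippet below fits it on a small toy DataFrame (the column names and values are invented for illustration) and checks the learned dictionary. The class is reproduced here in compact form so the snippet runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# The transformer from the example above, reproduced so this snippet is self-contained
class MyCustomTransformerForMaxImputation(BaseEstimator, TransformerMixin):
    def __init__(self, variables):
        # Store the variables as a list, even if only one was passed
        self.variables = variables if isinstance(variables, list) else [variables]

    def fit(self, X, y=None):
        # Learn and store the max of each variable
        self.imputer_dict_ = {feature: X[feature].max() for feature in self.variables}
        return self

    def transform(self, X):
        # Fill missing values with the learned maxima, without mutating the input
        X = X.copy()
        for feature in self.variables:
            X[feature] = X[feature].fillna(self.imputer_dict_[feature])
        return X

# Toy data with missing values (illustrative only)
df = pd.DataFrame({"age": [20.0, 35.0, np.nan], "fare": [7.25, np.nan, 80.0]})

imputer = MyCustomTransformerForMaxImputation(variables=["age", "fare"])
df_t = imputer.fit_transform(df)

print(imputer.imputer_dict_)  # {'age': 35.0, 'fare': 80.0}
print(df_t)
```

Because fit() returns self, the inherited fit_transform() works out of the box, and the transformer can be dropped straight into a scikit-learn Pipeline.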
Example where a fit() method is not required to learn a parameter:
from sklearn.base import BaseEstimator, TransformerMixin

class ConvertTitleCase(BaseEstimator, TransformerMixin):

    def __init__(self, variables):
        # Make sure the variables are stored as a list, even if only one was passed
        if not isinstance(variables, list):
            self.variables = [variables]
        else:
            self.variables = variables

    # The fit method is just there to ensure compatibility with sklearn pipelines
    def fit(self, X, y=None):
        return self

    # The transform method, where the actual transformation takes place
    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified in place
        X = X.copy()
        for feature in self.variables:
            if X[feature].dtype == 'object':
                # .str.title() handles missing values, unlike applying x.title() directly
                X[feature] = X[feature].str.title()
            else:
                print(f"Warning: {feature} data type should be object to use ConvertTitleCase()")
        return X
This example still implements the fit() method so it is compatible with scikit-learn, but it simply returns self without learning anything. The transform() method converts text to title case, capitalising the first character of each word.
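A minimal usage sketch of this second transformer (the example column and values are invented for illustration). Again, the class is reproduced so the snippet runs on its own:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# The stateless transformer from the example above, reproduced for self-containment
class ConvertTitleCase(BaseEstimator, TransformerMixin):
    def __init__(self, variables):
        # Store the variables as a list, even if only one was passed
        self.variables = variables if isinstance(variables, list) else [variables]

    def fit(self, X, y=None):
        # Nothing to learn; present only for pipeline compatibility
        return self

    def transform(self, X):
        # Title-case each requested text column, without mutating the input
        X = X.copy()
        for feature in self.variables:
            if X[feature].dtype == 'object':
                X[feature] = X[feature].str.title()
            else:
                print(f"Warning: {feature} data type should be object to use ConvertTitleCase()")
        return X

# Toy data (illustrative only); a single column name is accepted as well as a list
df = pd.DataFrame({"name": ["alice smith", "BOB JONES"]})
df_t = ConvertTitleCase(variables="name").fit_transform(df)

print(df_t["name"].tolist())  # ['Alice Smith', 'Bob Jones']
```

Since there is no learned state, fit_transform() here is effectively just transform(), but keeping both methods means the class slots into cross-validation and Pipeline workflows without special handling.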