Create your own transformer
Feature-engine, a Python library for feature engineering
Published Oct 3 2025
Tags: Feature Engineering · Feature-engine · Machine Learning · Pandas · Python · scikit-learn · Transformers
Sometimes a transformer isn't available for the task you want to apply to your data. In that case, it is straightforward to implement your own by inheriting from scikit-learn's base classes.
Example where a fit() method is required to learn a parameter:
from sklearn.base import BaseEstimator, TransformerMixin

# Define three methods for the class: __init__, fit and transform.
# fit_transform() is inherited automatically from TransformerMixin.

# Define the transformer, and inherit the base classes
class MyCustomTransformerForMaxImputation(BaseEstimator, TransformerMixin):

    # Here, you define the parameters to pass when you initialise the class
    def __init__(self, variables):
        # Make sure the variables are stored as a list, even if only one was passed
        if not isinstance(variables, list):
            self.variables = [variables]
        else:
            self.variables = variables

    # Carry out the learning from the data here, in this case, the max value
    def fit(self, X, y=None):
        # We want to keep the max value of each variable in a dictionary
        self.imputer_dict_ = {}
        # Loop over each variable, calculate the max and save it in the dictionary
        for feature in self.variables:
            self.imputer_dict_[feature] = X[feature].max()
        return self

    # Transform the variables based on what was learned in fit()
    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified in place
        X = X.copy()
        # Fill the missing values in each feature with the max learned for that feature
        for feature in self.variables:
            X[feature] = X[feature].fillna(self.imputer_dict_[feature])
        return X
This example stores the maximum value of each column during fit(), and transform() then fills any missing values with those learned maxima.
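As a quick sketch of how the transformer above might be used, the snippet below fits it on a small toy DataFrame (the column names and values are invented for illustration) and checks the learned dictionary. The class is reproduced here in compact form so the snippet runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# The transformer from the example above, reproduced so this snippet is self-contained
class MyCustomTransformerForMaxImputation(BaseEstimator, TransformerMixin):
    def __init__(self, variables):
        # Store the variables as a list, even if only one was passed
        self.variables = variables if isinstance(variables, list) else [variables]

    def fit(self, X, y=None):
        # Learn and store the max of each variable
        self.imputer_dict_ = {feature: X[feature].max() for feature in self.variables}
        return self

    def transform(self, X):
        # Fill missing values with the learned maxima, without mutating the input
        X = X.copy()
        for feature in self.variables:
            X[feature] = X[feature].fillna(self.imputer_dict_[feature])
        return X

# Toy data with missing values (illustrative only)
df = pd.DataFrame({"age": [20.0, 35.0, np.nan], "fare": [7.25, np.nan, 80.0]})

imputer = MyCustomTransformerForMaxImputation(variables=["age", "fare"])
df_t = imputer.fit_transform(df)

print(imputer.imputer_dict_)  # {'age': 35.0, 'fare': 80.0}
print(df_t)
```

Because fit() returns self, the inherited fit_transform() works out of the box, and the transformer can be dropped straight into a scikit-learn Pipeline.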
Example where a fit() method is not required to learn a parameter:
from sklearn.base import BaseEstimator, TransformerMixin

class ConvertTitleCase(BaseEstimator, TransformerMixin):

    def __init__(self, variables):
        # Make sure the variables are stored as a list, even if only one was passed
        if not isinstance(variables, list):
            self.variables = [variables]
        else:
            self.variables = variables

    # The fit method is just there to ensure compatibility with sklearn pipelines
    def fit(self, X, y=None):
        return self

    # The transform method, where the actual transformation takes place
    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified in place
        X = X.copy()
        for feature in self.variables:
            if X[feature].dtype == 'object':
                # .str.title() handles missing values, unlike applying x.title() directly
                X[feature] = X[feature].str.title()
            else:
                print(f"Warning: {feature} data type should be object to use ConvertTitleCase()")
        return X
This example still implements the fit() method so it is compatible with scikit-learn, but it simply returns self without learning anything. The transform() method converts text to title case, capitalising the first character of each word.
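A minimal usage sketch of this second transformer (the example column and values are invented for illustration). Again, the class is reproduced so the snippet runs on its own:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# The stateless transformer from the example above, reproduced for self-containment
class ConvertTitleCase(BaseEstimator, TransformerMixin):
    def __init__(self, variables):
        # Store the variables as a list, even if only one was passed
        self.variables = variables if isinstance(variables, list) else [variables]

    def fit(self, X, y=None):
        # Nothing to learn; present only for pipeline compatibility
        return self

    def transform(self, X):
        # Title-case each requested text column, without mutating the input
        X = X.copy()
        for feature in self.variables:
            if X[feature].dtype == 'object':
                X[feature] = X[feature].str.title()
            else:
                print(f"Warning: {feature} data type should be object to use ConvertTitleCase()")
        return X

# Toy data (illustrative only); a single column name is accepted as well as a list
df = pd.DataFrame({"name": ["alice smith", "BOB JONES"]})
df_t = ConvertTitleCase(variables="name").fit_transform(df)

print(df_t["name"].tolist())  # ['Alice Smith', 'Bob Jones']
```

Since there is no learned state, fit_transform() here is effectively just transform(), but keeping both methods means the class slots into cross-validation and Pipeline workflows without special handling.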