GroupBy
Pandas Basics
Published Sep 29 2025, updated Oct 24 2025
What is groupby?
- A method to split data into groups based on some key(s).
- Perform aggregations, transformations, or custom functions on each group.
- Combine results into a new DataFrame or Series.
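The three steps above can be spelled out by hand to see what groupby does in one call. This is only an illustrative sketch, not how pandas implements it internally; the names `pieces`, `means`, and `manual` are made up for the example, and the data mirrors the sample used in the Aggregation section.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Split: collect the rows for each distinct key.
pieces = {key: df[df["Department"] == key] for key in df["Department"].unique()}

# Apply: reduce each piece to a single value.
means = {key: piece["Salary"].mean() for key, piece in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(means, name="Salary").sort_index()

# groupby performs the same three steps in a single expression.
grouped = df.groupby("Department")["Salary"].mean()

print(manual.to_dict() == grouped.to_dict())
```

The manual version scans the frame once per key, which is one reason the built-in groupby is usually preferable.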
Basic Syntax
df.groupby(by)[column].operation()
- by → column(s) or index level(s) to group on.
- operation → an aggregation (sum, mean, count, etc.), a transformation, or apply.
Aggregation
Common built-ins:
- .mean() → average per group
- .sum() → sum per group
- .count() → count of non-NA entries per group
- .size() → group sizes (including NAs)
- .max() / .min() → extrema per group
- .median(), .std(), .var()
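The difference between .count() and .size() only shows up when a group contains missing values. A small sketch, assuming a variant of the sample data (`df_na` is a hypothetical frame with one salary replaced by NaN):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing salary.
df_na = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, np.nan, 70000, 80000, 45000],
})

g = df_na.groupby("Department")["Salary"]

counts = g.count()  # non-NA salaries per department
sizes = g.size()    # rows per department, NaN salaries included

print(counts["Sales"], sizes["Sales"])
```

Here Sales has two rows but only one non-missing salary, so the two methods disagree for that group.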
Example:
import pandas as pd

# Sample data: one row per employee.
data = {
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
}
df = pd.DataFrame(data)

# Average salary per department.
df.groupby("Department")["Salary"].mean()
Output:
Department
HR       45000.0
IT       75000.0
Sales    55000.0
Name: Salary, dtype: float64
Multiple Aggregations
Use .agg() for flexibility:
df.groupby("Department")["Salary"].agg(["mean", "sum", "max"])
Or give the result columns custom names (named aggregation):
df.groupby("Department")["Salary"].agg(
    avg_salary="mean", total_salary="sum"
)
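To make the shape of the two .agg() results concrete, here is a runnable sketch on the sample data. The last call, which aggregates at the DataFrame level with (column, function) tuples, is an addition not shown in the guide; the names `multi`, `named`, `named_df`, and `headcount` are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# A list of function names gives one result column per function.
multi = df.groupby("Department")["Salary"].agg(["mean", "sum", "max"])

# Keyword arguments rename the result columns.
named = df.groupby("Department")["Salary"].agg(
    avg_salary="mean", total_salary="sum"
)

# On a DataFrame groupby, each keyword takes a (column, function) tuple.
named_df = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    headcount=("Salary", "count"),
)

print(multi)
print(named)
print(named_df)
```

Each result is a DataFrame indexed by Department, so the values can be looked up with `.loc[department, column]`.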
Grouping by Multiple Columns
df.groupby(["Department", "Salary"]).size()
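The result is a Series of group sizes indexed by a MultiIndex, with one level per grouping column (here, Department and Salary).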
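A MultiIndex result is often easier to work with once it is flattened or pivoted. The sketch below assumes an extra categorical column, `Level`, which is hypothetical and not part of the guide's sample data; it is added only so that the second grouping key is meaningful.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Level": ["Junior", "Senior", "Junior", "Senior", "Junior"],  # hypothetical column
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Grouping by two columns yields a Series with a two-level MultiIndex.
by_two = df2.groupby(["Department", "Level"])["Salary"].mean()

# Individual groups are addressed with a tuple of keys.
it_senior = by_two.loc[("IT", "Senior")]

# reset_index() turns the index levels back into ordinary columns.
flat = by_two.reset_index()

# unstack() pivots one level into columns; missing combinations become NaN.
wide = by_two.unstack("Level")

print(flat)
print(wide)
```

Which form is more convenient depends on the next step: `flat` suits merging and exporting, while `wide` suits side-by-side comparison.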
Transformations
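Unlike an aggregation, which reduces each group to a single value, a transformation returns a result with the same length and index as the input, so it can be assigned straight back to the original DataFrame.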
Example: normalise salaries within departments:
df["Salary_normalized"] = df.groupby("Department")["Salary"].transform(lambda x: x / x.mean())
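The following sketch shows the alignment property directly: the transformed result has one value per original row rather than one per group. It uses the string form `transform("mean")` instead of a lambda, which is a stylistic substitution that should give the same normalised values on this data; `dept_mean` is a name chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# Each row receives the mean of its own department.
dept_mean = df.groupby("Department")["Salary"].transform("mean")

# Same index as df, so the division lines up row by row.
df["Salary_normalized"] = df["Salary"] / dept_mean

print(df)
```

Values above 1 mark salaries above the department average; the single HR row comes out as exactly 1.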
Filtering Groups
Use .filter() to drop groups based on a condition:
df.groupby("Department").filter(lambda x: x["Salary"].mean() > 60000)
This keeps only the rows of departments whose average salary is above 60,000; with the sample data, only the IT rows remain.
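As a check on what .filter() returns, the sketch below runs it on the sample data and compares it with a boolean mask built from transform. The mask variant is an alternative added here for comparison, not something the guide prescribes; `kept`, `mask`, and `kept_mask` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

# filter() keeps or drops whole groups and returns the original rows.
kept = df.groupby("Department").filter(lambda g: g["Salary"].mean() > 60000)

# Alternative: broadcast the group mean to every row, then mask.
mask = df.groupby("Department")["Salary"].transform("mean") > 60000
kept_mask = df[mask]

print(kept)
```

Both approaches keep the original index labels, so the surviving rows can still be matched back to the source frame.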
Iterating Over Groups
for name, group in df.groupby("Department"):
    print(name)
    print(group)
Each iteration yields a (name, group) pair:
- name → the group key (the value of the column you grouped by).
- group → a DataFrame containing only the rows that belong to that group.
If you group by multiple columns, name becomes a tuple of keys.
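The tuple keys can be seen by iterating over a two-column grouping. This sketch again assumes a hypothetical `Level` column that is not in the guide's sample data, and it also shows get_group(), a related method for pulling out a single group without looping.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT", "HR"],
    "Level": ["Junior", "Senior", "Junior", "Senior", "Junior"],  # hypothetical column
    "Salary": [50000, 60000, 70000, 80000, 45000],
})

keys = []
for name, group in df2.groupby(["Department", "Level"]):
    # name is a (Department, Level) tuple; group holds the matching rows.
    keys.append(name)
    print(name, len(group))

# Fetch one group directly by its key.
it_rows = df2.groupby("Department").get_group("IT")
print(it_rows)
```

Groups are visited in sorted key order by default, so ("HR", "Junior") comes first here.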














