Data Exploration

Pandas Basics


Published Sep 29 2025, updated Sep 30 2025



When you load a dataset into a Pandas DataFrame, the first step is usually to explore and understand the data. This involves checking data types, missing values, summary statistics, unique values, and correlations. Some habits that help:

  • Combine methods: Use .info() + .describe() + .value_counts() to get a quick holistic view.
  • Visual inspection: Use .head() and .tail() frequently to catch formatting or entry errors.
  • Investigate anomalies: Outliers or unexpected categories are easier to spot with .value_counts() and .unique().
  • Correlations early: .corr() helps identify potential predictive relationships or redundant features.
  • Missing data: Always check .isna().sum().
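
As a rough illustration of the checklist above, the sketch below runs the combined checks on a small, made-up DataFrame. The data is hypothetical; only the column names follow the examples used later in this article.

```python
import pandas as pd

# Hypothetical sample data (not from a real dataset).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Alice", "Bob", "Cara", "Dan", "Eve"],
    "Age": [25, 30, None, 40, 45],  # one missing value on purpose
    "City": ["Chicago", "Houston", "Chicago", "New York", "Los Angeles"],
})

df.info()                              # structure, non-null counts, dtypes
print(df.describe())                   # summary statistics for numeric columns
city_counts = df["City"].value_counts()
print(city_counts)                     # frequency of each category
missing = df.isna().sum()
print(missing)                         # missing values per column
```

Running these four calls in sequence is usually enough to decide which columns need cleaning before any further analysis.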




.info()

Shows a concise summary of the DataFrame, including:

  • Number of rows and columns
  • Column names
  • Non-null counts
  • Data types of each column
df.info()

Example Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   ID      100 non-null    int64
 1   Name    100 non-null    object
 2   Age     95 non-null     float64
 3   City    100 non-null    object

This makes it quick to spot missing values (here Age has only 95 non-null values out of 100 rows) and to check the data type of each column.






.head() and .tail()

  • .head(n) shows the first n rows (default 5).
  • .tail(n) shows the last n rows (default 5).
df.head(5)
df.tail(5)
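
A minimal sketch of both methods on a hypothetical ten-row DataFrame. It also shows that a negative n returns everything except the last n rows (for .head) or the first n rows (for .tail).

```python
import pandas as pd

# Hypothetical data: ten rows with a simple ID and score.
df = pd.DataFrame({"ID": range(1, 11), "Score": range(10, 110, 10)})

first_three = df.head(3)        # rows with ID 1, 2, 3
last_two = df.tail(2)           # rows with ID 9, 10
all_but_last_two = df.head(-2)  # negative n: drop the last 2 rows

print(first_three)
print(last_two)
print(len(all_but_last_two))
```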





.shape

Returns a tuple of (number of rows, number of columns). It is an attribute, not a method, so it takes no parentheses.

df.shape
# Example output: (100, 4)
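
Because .shape is a plain tuple, it can be unpacked directly. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical three-row DataFrame.
df = pd.DataFrame({"ID": [1, 2, 3], "Age": [25, 30, 35]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```

len(df) gives the same row count when only the number of rows is needed.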





.dtypes

Shows the data type of each column.

df.dtypes
# Example output:
# ID        int64
# Name     object
# Age     float64
# City     object
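
One detail worth knowing when reading this output: an integer column that contains missing values is stored as float64 by default, which is why Age shows up as float64 above. The sketch below illustrates this with two hypothetical Series.

```python
import pandas as pd

# Hypothetical age columns, one complete and one with a gap.
ages_complete = pd.Series([25, 30, 35])
ages_missing = pd.Series([25, None, 35])  # None becomes NaN

print(ages_complete.dtype)  # int64
print(ages_missing.dtype)   # float64, because NaN is a float value
```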





.columns

Returns the column names as an Index object. Use df.columns.tolist() if you need a plain Python list.

df.columns
# Output: Index(['ID', 'Name', 'Age', 'City'], dtype='object')
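
A short sketch of two common uses of .columns, converting it to a list and checking whether a column exists. The empty DataFrame here is only a stand-in for real data.

```python
import pandas as pd

# Hypothetical empty DataFrame with the column names used in this article.
df = pd.DataFrame(columns=["ID", "Name", "Age", "City"])

column_names = df.columns.tolist()  # plain Python list of names
has_age = "Age" in df.columns       # membership test on the Index
has_salary = "Salary" in df.columns

print(column_names)
print(has_age, has_salary)
```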





.unique()

Returns all unique values in a column, in order of first appearance.

df['City'].unique()
# Output: array(['New York', 'Los Angeles', 'Chicago', 'Houston'], dtype=object)





.nunique()

Counts the number of unique values. Missing values are excluded by default.

df['City'].nunique()
# Output: 4
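
The two methods treat missing values differently, which can make their results look inconsistent at first. A minimal sketch, using a hypothetical Series with one missing entry:

```python
import pandas as pd

# Hypothetical city column with one missing value.
cities = pd.Series(["Chicago", "Houston", None, "Chicago"])

unique_values = cities.unique()                # includes the missing value
n_unique = cities.nunique()                    # excludes missing values
n_unique_with_na = cities.nunique(dropna=False)  # counts missing as a value

print(unique_values)
print(n_unique, n_unique_with_na)
```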





.value_counts()

Counts the frequency of each unique value, sorted from most to least frequent.

df['City'].value_counts()
# Output:
# New York       30
# Los Angeles    25
# Chicago        25
# Houston        20
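
Two optional parameters are often useful during exploration: normalize=True returns proportions instead of counts, and dropna=False includes missing values in the tally. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical city column with one missing value.
cities = pd.Series(["Chicago", "Houston", None, "Chicago"])

counts = cities.value_counts()                    # missing values excluded
shares = cities.value_counts(normalize=True)      # proportions of non-missing
with_missing = cities.value_counts(dropna=False)  # missing values included

print(counts)
print(shares)
print(with_missing)
```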





.describe()

Provides summary statistics for numeric columns by default:

  • Count, mean, standard deviation
  • Minimum and maximum values
  • Quartiles (25%, 50%, 75%)
df.describe()

Example output:

            ID       Age        Salary
count  5.00000   5.00000      5.000000
mean   3.00000  35.00000  71000.000000
std    1.58114   7.90569  15968.719423
min    1.00000  25.00000  50000.000000
25%    2.00000  30.00000  60000.000000
50%    3.00000  35.00000  75000.000000
75%    4.00000  40.00000  80000.000000
max    5.00000  45.00000  90000.000000

For categorical (string) columns, pass include="object":

df.describe(include="object")

Example output:

         Name     City
count       5        5
unique      5        4
top     Alice  Chicago
freq        1        2

Summary of categorical (string) columns:

  • count = number of entries
  • unique = number of distinct values
  • top = most frequent value
  • freq = frequency of top value
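
Since .describe() returns an ordinary DataFrame, individual statistics can be pulled out with .loc. The sketch below uses a hypothetical five-row dataset and also shows the percentiles parameter for requesting other cut points.

```python
import pandas as pd

# Hypothetical five-row dataset with numeric columns only.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 75000, 80000, 90000],
})

summary = df.describe()
mean_salary = summary.loc["mean", "Salary"]  # row label, then column label
median_age = summary.loc["50%", "Age"]       # the 50% row is the median

# Ask for the 10th and 90th percentiles as well.
wide = df.describe(percentiles=[0.1, 0.9])

print(mean_salary, median_age)
print(wide.index.tolist())
```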




.corr()

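Provides a pairwise correlation matrix (Pearson by default) for numeric columns. In pandas 2.0 and later, pass numeric_only=True if the DataFrame also contains non-numeric columns such as Name or City; otherwise .corr() raises an error.

df.corr(numeric_only=True)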


Example output:

              ID       Age    Salary
ID      1.000000  1.000000  0.990148
Age     1.000000  1.000000  0.990148
Salary  0.990148  0.990148  1.000000

Interpretation:

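  • ID and Age are perfectly correlated here because Age rises by a fixed amount with each ID in this small sample. ID is only a row identifier, so this is an artifact of the sample rather than a meaningful relationship.
  • Salary also has a strong positive correlation with both (≈ 0.99).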
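
A minimal sketch of computing a correlation matrix on a DataFrame that mixes text and numeric columns. The data is hypothetical, and the numeric_only parameter assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical dataset with one text column and three numeric columns.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Alice", "Bob", "Cara", "Dan", "Eve"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 75000, 80000, 90000],
})

# numeric_only=True skips the Name column instead of raising an error.
corr = df.corr(numeric_only=True)
print(corr)

# Look up a single pair by row and column label.
age_salary = corr.loc["Age", "Salary"]
print(round(age_salary, 2))
```

Correlations very close to 1 between two features, as between ID and Age here, are a hint that one of them may be redundant.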




.isna() or .isnull()

  • .isna() (alias: .isnull()) returns a Boolean mask that marks missing values.
  • Chaining .sum() counts the missing values per column.
df.isna().sum()

Example output:

ID      0
Name    0
Age     5
City    0
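
Counts are easier to judge as a share of the total, and it often helps to look at the affected rows themselves. One possible sketch, on hypothetical data:

```python
import pandas as pd

# Hypothetical dataset with gaps in Age and City.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [25, None, 35, None, 45],
    "City": ["Chicago", None, "Houston", "Chicago", "New York"],
})

missing_counts = df.isna().sum()       # missing values per column
missing_pct = df.isna().mean() * 100   # mean of a Boolean mask = fraction True
rows_with_missing = df[df.isna().any(axis=1)]  # rows with at least one gap

print(missing_counts)
print(missing_pct)
print(rows_with_missing)
```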





.sample(n)

Randomly selects n rows.

df.sample(5)

Useful for quick inspection without printing the entire dataset.
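
By default each call returns a different set of rows. When the same sample is needed again, for example in a shared notebook, a fixed random_state makes the draw repeatable. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical ten-row DataFrame.
df = pd.DataFrame({"ID": range(1, 11)})

# The same random_state yields the same rows on every call.
first_draw = df.sample(3, random_state=42)
second_draw = df.sample(3, random_state=42)
print(first_draw.equals(second_draw))

# frac samples a fraction of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=0)
print(len(half))
```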


© 2025 SimpleSteps.guide