Importing and Exporting Data
Pandas Basics
Published Sep 29 2025, updated Sep 30 2025
Data can be imported into DataFrame objects from various external sources, such as CSV files, JSON files, and tables scraped from HTML pages. There are also functions for exporting data to some of these file types. Other import/export options, such as Excel and XML, are not covered in this guide.
CSV Files
Pandas provides the pd.read_csv() function for importing data from a CSV file. The function has many optional parameters, making it quite flexible:
- Read files using a local path or a remote URL.
- For large files, use chunksize to read the file in chunks.
- Automatically infer column data types, or override them with dtype.
- Read files with or without header rows.
- Set column headers explicitly.
- Change the delimiter character to something other than a comma (,).
- Select only specific columns and specify how many rows to import.
- Specify a column to use as an index.
Full details of the available options can be found in the pandas documentation for read_csv.
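Three of the options listed above (the delimiter, dtype, and chunksize) can be sketched together as follows. The data and column names are made up for illustration, and an in-memory buffer stands in for a large file.

```python
from io import StringIO

import pandas as pd

# Made-up, semicolon-delimited data; the column names are illustrative only.
raw = "id;code;amount\n1;007;10.5\n2;042;20.0\n3;100;30.5\n"

total = 0.0
codes = []

# sep=";" changes the delimiter, dtype keeps "code" as text so leading
# zeros survive, and chunksize=2 yields DataFrames of at most two rows.
for chunk in pd.read_csv(StringIO(raw), sep=";", dtype={"code": str}, chunksize=2):
    total += chunk["amount"].sum()
    codes.extend(chunk["code"])

print(total, codes)
```

Processing chunk by chunk keeps only a few rows in memory at a time, which is the main reason to reach for chunksize.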
Pandas has the df.to_csv() method for writing DataFrame objects out to CSV files. Full details of the options can be found in the pandas documentation for DataFrame.to_csv.
Import examples
Local CSV read:
data.csv
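A minimal sketch of a local read might look like this. The file contents and column names are assumptions made up for illustration, and the file is written to a temporary directory so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical contents standing in for data.csv.
csv_text = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Carol,41\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    path.write_text(csv_text)

    # read_csv accepts a string path or a pathlib.Path.
    df = pd.read_csv(path)

print(df)
```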
Remote CSV read:
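As a hedged sketch, a URL is passed to pd.read_csv() in the same position as a local path. To keep the example runnable offline, a file:// URL to a temporary file is used here as a stand-in for a real https:// address; the data is made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    path.write_text("id,name\n1,Alice\n2,Bob\n")

    # Offline stand-in: a file:// URL. An https:// URL to a CSV file
    # would be passed to read_csv in exactly the same way.
    url = path.as_uri()
    remote_df = pd.read_csv(url)

print(remote_df)
```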
CSV without headers:
data_no_header.csv
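One way to read a file that has no header row is sketched below, using made-up data in an in-memory buffer in place of data_no_header.csv. The column names passed to names are assumptions for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up rows with no header line.
raw = "1,Alice,30\n2,Bob,25\n"

# header=None treats the first line as data; columns are labelled 0, 1, 2.
no_header = pd.read_csv(StringIO(raw), header=None)

# names supplies explicit column labels for the same data.
named = pd.read_csv(StringIO(raw), header=None, names=["id", "name", "age"])

print(no_header.columns.tolist())
print(named.columns.tolist())
```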
Read only selected columns:
data-extra.csv
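Selecting columns and limiting rows could look like the following sketch. The data is invented and held in memory rather than in data-extra.csv.

```python
from io import StringIO

import pandas as pd

# Made-up data with more columns than we want to keep.
raw = (
    "id,name,age,city,notes\n"
    "1,Alice,30,Oslo,x\n"
    "2,Bob,25,Lima,y\n"
    "3,Carol,41,Pune,z\n"
)

# usecols keeps only the named columns; nrows stops after two data rows.
subset = pd.read_csv(StringIO(raw), usecols=["id", "name"], nrows=2)

print(subset)
```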
Use a column as index:
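A small sketch of index_col, assuming a made-up file whose id column should become the index:

```python
from io import StringIO

import pandas as pd

# Made-up data; "id" is the column promoted to the index.
raw = "id,name,age\n1,Alice,30\n2,Bob,25\n"

indexed = pd.read_csv(StringIO(raw), index_col="id")

# Rows can now be looked up by their id value.
print(indexed.loc[2, "name"])
```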
Parse dates:
sales.csv
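Date parsing might be sketched like this. The rows are invented stand-ins for sales.csv, and the column names are assumptions.

```python
from io import StringIO

import pandas as pd

# Made-up sales rows with ISO-formatted dates.
raw = "date,product,amount\n2025-01-05,Widget,120.5\n2025-01-06,Gadget,80.0\n"

# parse_dates converts the listed columns to datetime64 instead of text.
sales = pd.read_csv(StringIO(raw), parse_dates=["date"])

print(sales.dtypes)
print(sales["date"].dt.month.tolist())
```

Once the column is a datetime, the .dt accessor and date-based filtering become available.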
Handle missing values:
data2.cs
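A hedged sketch of handling missing values follows. The data is made up, and "missing" is an assumed custom marker; empty fields and strings such as NA are treated as missing by default.

```python
from io import StringIO

import pandas as pd

# Made-up data: an empty name, a default NA marker, and a custom marker.
raw = "id,name,score\n1,Alice,88\n2,,NA\n3,Carol,missing\n"

# na_values adds "missing" to the strings that are read as NaN.
df = pd.read_csv(StringIO(raw), na_values=["missing"])

# Replace missing scores with 0 while leaving other columns untouched.
filled = df.fillna({"score": 0})

print(df.isna().sum())
print(filled)
```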
Export examples
Basic export:
Exports DataFrame with index and headers.
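A basic export could be sketched as below. The DataFrame contents and the output file name are assumptions, and the file is written to a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.csv"

    # With default settings, the index and the header row are both written.
    df.to_csv(out_path)
    written = out_path.read_text()

print(written)
```

The first column of the output holds the index, which is why the header line starts with an empty field.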
Exclude index:
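Leaving out the index might look like this sketch, where an in-memory buffer stands in for a file path and the data is made up:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

buffer = StringIO()
# index=False omits the row labels from the output.
df.to_csv(buffer, index=False)

no_index_lines = buffer.getvalue().splitlines()
print(no_index_lines)  # ['name,age', 'Alice,30', 'Bob,25']
```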
Change delimiter:
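As a minimal sketch with made-up data, the sep parameter sets the output delimiter. A tab is used here, and a buffer stands in for a file path.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

buffer = StringIO()
# sep="\t" writes tab-separated values instead of comma-separated ones.
df.to_csv(buffer, sep="\t", index=False)

tsv_lines = buffer.getvalue().splitlines()
print(tsv_lines)
```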
Select only certain columns:
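Exporting a subset of columns can be sketched with the columns parameter. The data and column names below are invented for illustration.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "age": [30, 25], "city": ["Oslo", "Lima"]}
)

buffer = StringIO()
# Only the listed columns are written, in the order given.
df.to_csv(buffer, columns=["name", "city"], index=False)

selected_lines = buffer.getvalue().splitlines()
print(selected_lines)
```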
Export to string instead of file:
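A short sketch with made-up data: when no path or buffer is passed, to_csv() returns the CSV text as a string.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# No target given, so the CSV text is returned instead of written.
csv_text = df.to_csv(index=False)

print(csv_text)
```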
Output:
JSON Files
The function pd.read_json() loads JSON data into a Pandas DataFrame (or a Series when typ="series" is passed). df.to_json() is used to export a DataFrame to a JSON string or file.
Import examples
Records (list of dicts):
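Reading a records-style document might look like the sketch below. The JSON text is made up and wrapped in StringIO so it is treated as a file-like object.

```python
from io import StringIO

import pandas as pd

# Made-up JSON: a list with one object per row.
records_json = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'

records_df = pd.read_json(StringIO(records_json), orient="records")

print(records_df)
```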
Index-based:
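An index-oriented document maps each row label to an object of column values. A sketch with invented labels and data:

```python
from io import StringIO

import pandas as pd

# Made-up JSON keyed by row label.
index_json = (
    '{"row1": {"name": "Alice", "age": 30}, '
    '"row2": {"name": "Bob", "age": 25}}'
)

# orient="index" turns the outer keys into the DataFrame index.
index_df = pd.read_json(StringIO(index_json), orient="index")

print(index_df)
```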
Columns-based:
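A column-oriented document maps each column name to an object of row-label and value pairs. The sketch below uses made-up data.

```python
from io import StringIO

import pandas as pd

# Made-up JSON keyed by column name, then by row label.
columns_json = (
    '{"name": {"0": "Alice", "1": "Bob"}, '
    '"age": {"0": 30, "1": 25}}'
)

# orient="columns" turns the outer keys into column labels.
columns_df = pd.read_json(StringIO(columns_json), orient="columns")

print(columns_df)
```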
Line-delimited JSON (NDJSON):
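Line-delimited JSON holds one object per line, and lines=True tells read_json() to parse it that way. A sketch with invented rows:

```python
from io import StringIO

import pandas as pd

# Made-up NDJSON: each line is a complete JSON object.
ndjson_text = '{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n'

ndjson_df = pd.read_json(StringIO(ndjson_text), lines=True)

print(ndjson_df)
```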
Export examples
Export to JSON string:
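With no path given, to_json() returns the JSON text. A minimal sketch using made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# No target given, so the JSON text is returned as a string.
json_text = df.to_json(orient="records")

print(json_text)
# [{"name":"Alice","age":30},{"name":"Bob","age":25}]
```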
Export to JSON file:
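Writing to a file could be sketched as follows. The file name and data are assumptions, and a temporary directory keeps the example self-contained.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.json"

    # indent=2 pretty-prints the file; omit it for compact output.
    df.to_json(out_path, orient="records", indent=2)

    # Read the file back to confirm what was written.
    saved = json.loads(out_path.read_text())

print(saved)
```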
Different orient options for the following data:
orient="records"
orient="index"
orient="columns"
orient="split"
orient="values"
orient="table"
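The original sample data is not reproduced here, so the sketch below assumes a small made-up DataFrame and prints the output of each orient option for comparison.

```python
import pandas as pd

# Assumed sample data; the guide's own data may differ.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

orients = ["records", "index", "columns", "split", "values", "table"]

# Serialize the same DataFrame once per orient option.
outputs = {orient: df.to_json(orient=orient) for orient in orients}

for orient, text in outputs.items():
    print(f"{orient}: {text}")
```

Roughly, records drops the index, index and columns nest by row or column label, split separates labels from data, values keeps only the cell values, and table adds a schema describing the columns.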
HTML Files
Pandas has a function, pd.read_html(), that reads HTML tables from a webpage or local HTML file and converts them into Pandas DataFrames. It returns a list of DataFrames, one for each HTML table that is found.
Examples:
Read tables from a URL:
Read table from a local HTML file:
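A hedged sketch of reading a local file is shown below. The page content, place names, and numbers are invented, and read_tables_from_file is a hypothetical helper that writes the page to a temporary file first. Note that pd.read_html() needs an optional HTML parser such as lxml to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up page with a single table; names and figures are placeholders.
HTML_PAGE = """<html><body>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Northville</td><td>1000</td></tr>
  <tr><td>Southport</td><td>2500</td></tr>
</table>
</body></html>"""


def read_tables_from_file(html_text):
    """Write html_text to a temporary .html file and parse its tables."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "page.html"
        path.write_text(html_text)
        # read_html takes a path (or URL) and returns a list of DataFrames.
        return pd.read_html(str(path))
```

Calling read_tables_from_file(HTML_PAGE) should return a one-element list whose DataFrame has the columns City and Population.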
Select table by matching text:
Only tables containing the text 'Population' will be returned. The match value can be a plain string or a regex.
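Filtering by text could be sketched like this. The page holds two invented tables, find_population_tables is a hypothetical helper, and an optional HTML parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables; only the first mentions "Population".
HTML_PAGE = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Northville</td><td>1000</td></tr>
</table>
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""


def find_population_tables(html_text):
    """Return only the tables whose text matches 'Population'."""
    # StringIO wraps the literal HTML so it is read as a file-like object.
    return pd.read_html(StringIO(html_text), match="Population")
```

Calling find_population_tables(HTML_PAGE) should return a list containing only the City and Population table.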
Using attrs to filter tables:
Only tables with an id attribute of mytable will be returned.
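A sketch of attribute filtering follows. The page content is made up, find_table_by_id is a hypothetical helper, and an optional HTML parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Made-up page: two tables distinguished only by their id attribute.
HTML_PAGE = """
<table id="mytable">
  <tr><th>name</th><th>age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
</table>
<table id="other">
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""


def find_table_by_id(html_text, table_id):
    """Return the tables whose id attribute equals table_id."""
    # attrs is a dict of HTML attributes the <table> tag must carry.
    return pd.read_html(StringIO(html_text), attrs={"id": table_id})
```

Calling find_table_by_id(HTML_PAGE, "mytable") should return a one-element list holding the name and age table.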