Importing and Exporting Data

Pandas Basics

2 min read

Published Sep 29 2025, updated Sep 30 2025


20
0
0
0

PandasPython

Data can be imported in to DataFrame objects from various external sources, such as CSV files, JSON files and scraped from HTML page table records. There are also functions for exporting data to some file types. There are also various other import/export options, such as Excel, XML etc. that aren't included in this guide.




CSV Files

Pandas has pd.read_csv() function to handle importing data from a CSV file. The function has may optional parameters, making it quite flexible:

  • Read files using a local path, or a remote url.
  • For large files, you can use chunksize to read in chunks.
  • Automatically infer column data types, or you can override with dtype.
  • Read files with or without header rows.
  • Explicitly set column headers manually.
  • Change the delimiter character used to something other than a ,.
  • Only select specific columns and specify how many rows to import.
  • Specify a column to use as an index.
  • Full details of the options available can be found here.

Pandas has the df.to_csv() function to write DataFrame objects out to CSV files. Full details of options can be found here.


Import examples

Local CSV read:

data.csv

ID,Name,Age,City
1,Alice,25,New York
2,Bob,30,Los Angeles
3,Charlie,35,Chicago
4,David,40,Houston

df = pd.read_csv('data.csv')


Remote CSV read:

df = pd.read_csv('https://raw.githubusercontent.com/simplesteps-guide/pandas-basics/refs/heads/main/datafiles/data.csv')


CSV without headers:

data_no_header.csv

Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
David,40,Houston

df = pd.read_csv('data_no_header.csv', header=None, names=['Name', 'Age', 'City'])


Read only selected columns:

data-extra.csv

ID,Name,Age,City,Country,Salary
1,Alice,25,New York,USA,50000
2,Bob,30,Los Angeles,USA,60000
3,Charlie,35,Chicago,USA,55000
4,David,40,Houston,USA,65000

df = pd.read_csv('data-extra.csv', usecols=['Name','Age'])


Use a column as index:

df = pd.read_csv('data-extra.csv', index_col='ID')


Parse dates:

sales.csv

OrderID,OrderDate,Customer,Amount
1001,2025-09-01,Alice,250
1002,2025-09-02,Bob,300
1003,2025-09-03,Charlie,150
1004,2025-09-04,David,400

df = pd.read_csv('sales.csv', parse_dates=['OrderDate'])


Handle missing values:

data2.cs

ID,Name,Age,City
1,Alice,25,New York
2,Bob,,Los Angeles
3,,35,Chicago
4,David,40,

df = pd.read_csv('data2.csv', na_values=['', 'NA'])


Export examples

Basic export:

df.to_csv("output.csv")

Exports DataFrame with index and headers.



Exclude index:

df.to_csv("output_no_index.csv", index=False)


Change delimiter:

df.to_csv("output_semicolon.csv", sep=';')


Select only certain columns:

df.to_csv("output_subset.csv", columns=['Name', 'Age'], index=False)


Export to string instead of file:

csv_string = df.to_csv(index=False)
print(csv_string)

Output:

Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
David,40,Houston






JSON Files

The function pd.read_json() loads JSON data into a Pandas DataFrame (or Series, depending on structure). df..to_json() is used to export a DataFrame to a json file.


Import examples

Records (list of dicts):

[
  {"id": 1, "name": "Alice", "age": 25},
  {"id": 2, "name": "Bob", "age": 30}
]

df = pd.read_json("data.json", orient="records")


Index-based:

{
  "1": {"name": "Alice", "age": 25},
  "2": {"name": "Bob", "age": 30}
}

df = pd.read_json("data2.json", orient="index")


Columns-based:

{
  "name": {"0": "Alice", "1": "Bob"},
  "age": {"0": 25, "1": 30}
}

df = pd.read_json("data3.json", orient="columns")


Line-delimited JSON (NDJSON):

{"id":1,"name":"Alice","age":25}
{"id":2,"name":"Bob","age":30}

df = pd.read_json("data4.json", lines=True)



Export examples

Export to JSON string:

json_str = df.to_json(orient="records", indent=2)
print(json_str)

[
  {
    "id": 1,
    "name": "Alice",
    "age": 25
  },
  {
    "id": 2,
    "name": "Bob",
    "age": 30
  }
]


Export to JSON file:

df.to_json("output.json", orient="records", indent=2)


Different orient options for the following data:

import pandas as pd
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [25, 30]
})

orient="records"

[{"id":1,"name":"Alice","age":25},{"id":2,"name":"Bob","age":30}]

orient="index"

{"0":{"id":1,"name":"Alice","age":25},"1":{"id":2,"name":"Bob","age":30}}

orient="columns"

{"id":{"0":1,"1":2},"name":{"0":"Alice","1":"Bob"},"age":{"0":25,"1":30}}


orient="split"

{"columns":["id","name","age"],
 "index":[0,1],
 "data":[[1,"Alice",25],[2,"Bob",30]]}

orient="values"

[[1,"Alice",25],[2,"Bob",30]]

orient="table"

{"schema":{"fields":[{"name":"id"},{"name":"name"},{"name":"age"}],
           "primaryKey":["id"],
           "pandas_version":"1.4.0"},
 "data":[{"id":1,"name":"Alice","age":25},{"id":2,"name":"Bob","age":30}]}






HTML Files

Pandas has a function pd.read_html() that reads HTML tables from a webpage or local HTML file and converts them into Pandas DataFrames.

It returns a list of DataFrames, one for each HTML table that is found.


Examples:

Read tables from a URL:

dfs = pd.read_html('https://simplesteps.guide/guides/technology/machine-learning-ai/pandas-basics/appendix-sample-html-tables-for-import-section')
# Number of tables found
print(len(dfs))
# First table
df = dfs[0]


Read table from a local HTML file:

dfs = pd.read_html('local_file.html')
df = dfs[0]


Select table by matching text:

dfs = pd.read_html('local_file.html', match='Population')

Only tables containing the text 'Population' will be returned. Can can be a plain string or regex.



Using attrs to filter tables:

dfs = pd.read_html('example.html', attrs={"id": "mytable"})
df = dfs[0]

Only tables with id attribute of mytable will be returned.


Products from our shop

Docker Cheat Sheet - Print at Home Designs

Docker Cheat Sheet - Print at Home Designs

Docker Cheat Sheet Mouse Mat

Docker Cheat Sheet Mouse Mat

Docker Cheat Sheet Travel Mug

Docker Cheat Sheet Travel Mug

Docker Cheat Sheet Mug

Docker Cheat Sheet Mug

Vim Cheat Sheet - Print at Home Designs

Vim Cheat Sheet - Print at Home Designs

Vim Cheat Sheet Mouse Mat

Vim Cheat Sheet Mouse Mat

Vim Cheat Sheet Travel Mug

Vim Cheat Sheet Travel Mug

Vim Cheat Sheet Mug

Vim Cheat Sheet Mug

SimpleSteps.guide branded Travel Mug

SimpleSteps.guide branded Travel Mug

Developer Excuse Javascript - Travel Mug

Developer Excuse Javascript - Travel Mug

Developer Excuse Javascript Embroidered T-Shirt - Dark

Developer Excuse Javascript Embroidered T-Shirt - Dark

Developer Excuse Javascript Embroidered T-Shirt - Light

Developer Excuse Javascript Embroidered T-Shirt - Light

Developer Excuse Javascript Mug - White

Developer Excuse Javascript Mug - White

Developer Excuse Javascript Mug - Black

Developer Excuse Javascript Mug - Black

SimpleSteps.guide branded stainless steel water bottle

SimpleSteps.guide branded stainless steel water bottle

Developer Excuse Javascript Hoodie - Light

Developer Excuse Javascript Hoodie - Light

Developer Excuse Javascript Hoodie - Dark

Developer Excuse Javascript Hoodie - Dark

© 2025 SimpleSteps.guide
AboutFAQPoliciesContact