
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas - IO Tools
The Pandas library offers powerful I/O tools (API) for data import and export, enabling seamless handling of various file formats like CSV, Excel, JSON, and many more. This API includes top-level reader functions like, pd.read_csv(), read_clipboard() and corresponding writer methods like, to_csv(), to_clipboard() for easy data handling.
In this tutorial, we will learn about the overview of the Pandas I/O tools and learn how to use them effectively.
Overview of Pandas IO Tools
The Pandas I/O API supports a wide variety of data formats. Here is a summary of supported formats and their corresponding reader and writer functions −
Among these, the most frequently used functions for handling text files are read_csv() and read_table(). Both convert flat files into DataFrame objects.
Example: Reading CSV Data
This example shows reading the CSV data using the pandas read_csv() function. In this example we are using the StringIO to load the CSV string into a Pandas DataFrame object.
import pandas as pd # Import StringIO to load a file-like object for reading CSV from io import StringIO # Create string representing CSV data data = """S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900""" # Use StringIO to convert the string data into a file-like object obj = StringIO(data) # read CSV into a Pandas DataFrame df = pd.read_csv(obj) print(df)
Its output is as follows −
S.No | Name | Age | City | Salary | |
---|---|---|---|---|---|
0 | 1 | Tom | 28 | Toronto | 20000 |
1 | 2 | Lee | 32 | HongKong | 3000 |
2 | 3 | Steven | 43 | Bay Area | 8300 |
3 | 4 | Ram | 38 | Hyderabad | 3900 |
Customizing Parsing Options
Pandas allows several customization options when parsing data. You can modify how the data is parsed using parameters like −
Index_col
dtype
names
skiprows
Below we will discuss about the common parsing options for customization.
Customizing the index
You can customize the row labels or index of the Pandas object by using index_col parameter. Setting index_col=False forces Pandas to not use the first column as the index, which can be helpful when handling malformed files with extra delimiters.
Example
This example uses the index_col parameter to customize the row labels while reading the CSV data.
import pandas as pd # Import StringIO to load a file-like object for reading CSV from io import StringIO # Create string representing CSV data data = """S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900""" # Use StringIO to convert the string data into a file-like object obj = StringIO(data) # read CSV into a Pandas DataFrame df = pd.read_csv(obj, index_col=['S.No']) # Display the DataFrame print(df)
Its output is as follows −
S.No | Name | Age | City | Salary |
---|---|---|---|---|
1 | Tom | 28 | Toronto | 20000 |
2 | Lee | 32 | HongKong | 3000 |
3 | Steven | 43 | Bay Area | 8300 |
4 | Ram | 38 | Hyderabad | 3900 |
Converters
Pandas also provides the ability to specify the data type for columns using the dtype parameter. You can convert columns to specific types like {'Col_1': np.float64, 'Col_2': np.int32, 'Col3': 'Int64'}.
Example
This example customizes the data type of a JSON data while parsing the data using the read_json() method with the dtype parameter.
import pandas as pd from io import StringIO import numpy as np # Create a string representing JSON data data = """[ {"Name": "Braund", "Gender": "Male", "Age": 30}, {"Name": "Cumings", "Gender": "Female", "Age": 25}, {"Name": "Heikkinen", "Gender": "Female", "Age": 35} ]""" # Use StringIO to convert the JSON-formatted string data into a file-like object obj = StringIO(data) # Read JSON into a Pandas DataFrame df = pd.read_json(obj, dtype={'Age': np.float64}) # Display the DataFrame print(df.dtypes)
Its output is as follows −
Name object Gender object Age float64 dtype: object
By default, the dtype of the 'Age' column is int, but the result shows it as float because we have explicitly casted the type.
Thus, the data looks like float −
Name | Gender | Age | |
---|---|---|---|
0 | Braund | Male | 30.0 |
1 | Cumings | Female | 25.0 |
2 | Heikkinen | Female | 35.0 |
Customizing the Header Names
When reading data files, Pandas assumes the first row as the header. However, you can customize this using the names Parameter to provide custom column names.
Example
This example reads the XML data into a Pandas DataFrame object by customizing the header names using the names parameter of the read_xml() method.
import pandas as pd from io import StringIO # Create a String representing XML data xml = """<?xml version="1.0" encoding="UTF-8"?> <bookstore> <book category="cooking"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="children"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="web"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book> </bookstore>""" # Parse the XML data with custom column names df = pd.read_xml(StringIO(xml), names=['a', 'b', 'c','d','e']) # Display the Output DataFrame print('Output DataFrame from XML:') print(df)
Its output is as follows −
Output DataFrame from XML:
a | b | c | d | e | |
---|---|---|---|---|---|
0 | cooking | Everyday Italian | Giada De Laurentiis | 2005 | 30.00 |
1 | children | Harry Potter | J K. Rowling | 2005 | 29.99 |
2 | web | Learning XML | Erik T. Ray | 2003 | 39.95 |
Example: Reading with custom column names and header row
If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.
import pandas as pd # Import StringIO to load a file-like object for reading CSV from io import StringIO # Create string representing CSV data data = """S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900""" # Use StringIO to convert the string data into a file-like object obj = StringIO(data) # read CSV into a Pandas DataFrame df = pd.read_csv(obj, names=['a', 'b', 'c','d','e'], header=0) # Display the DataFrame print(df)
Its output is as follows −
a | b | c | d | e | |
---|---|---|---|---|---|
0 | S.No | Name | Age | City | Salary |
1 | 1 | Tom | 28 | Toronto | 20000 |
2 | 2 | Lee | 32 | HongKong | 3000 |
3 | 3 | Steven | 43 | Bay Area | 8300 |
4 | 4 | Ram | 38 | Hyderabad | 3900 |
Skipping Rows
The skiprows parameter allows you to skip a specific number of rows or line numbers when reading a file. It can also accept a callable function to decide which rows to skip based on conditions.
Example
This example shows skipping the rows of a input data while parsing.
import pandas as pd # Import StringIO to load a file-like object for reading CSV from io import StringIO # Create string representing CSV data data = """S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900""" # Use StringIO to convert the string data into a file-like object obj = StringIO(data) # read CSV into a Pandas DataFrame df = pd.read_csv(obj, skiprows=2) # Display the DataFrame print(df)
Its output is as follows −
S.No | Name | Age | City | Salary | |
---|---|---|---|---|---|
2 | Lee | 32 | HongKong | 3000 | |
0 | 3 | Steven | 43 | Bay Area | 8300 |
1 | 4 | Ram | 38 | Hyderabad | 3900 |