Python Pandas - Reading Data from an Excel File



The Pandas library provides powerful tool for data manipulation and analysis. Among its many features, it offers the ability to read and write data to Excel files easily. Excel files are widely used to store and organize data in tabular format, making them an excellent choice for analysis. Excel files can have multiple sheets, each containing rows and columns of data. Excel files usually come with extensions like .xls or .xlsx.

Pandas provides easy-to-use methods for working with Excel data directly in Python. One of such method is the read_excel() method, which reads Excel files and loads the data into a Pandas DataFrame.

In this tutorial, we will learn how to read data from Excel files using the pandas.read_excel() method, covering different scenarios like loading a single sheet, specific sheets, and multiple sheets.

Supported Excel File Formats in Pandas

Pandas uses different backends to read various Excel file formats −

  • The Excel 2007+ (.xlsx) files can be read using the openpyxl Python module.

  • The Excel 2003 (.xls) files can be read using the xlrd module.

  • The Binary Excel (.xlsb) files can be read using the pyxlsb module.

  • Additionally, all formats can be read using the Calamine engine.

Note: Please make sure that you have installed the required packages (xlrd and openpyxl) in your system. If these packages are not installed, use the following commands to install them −
pip install openpyxl
or
pip install xlrd

Reading an Excel File in Pandas

The pandas.read_excel() method reads Excel files and loads the data into a Pandas DataFrame. This method supports multiple Excel file formats like, .xls, .xlsx, .xlsm, and more from a local filesystem or a URL.

Example

Here is a simple example of reading a local system Excel file into a DataFrame using the pandas.read_excel() method by specifying the file path.

import pandas as pd # Read an Excel file df = pd.read_excel('data.xlsx') # Print the DataFrame print('Output DataFrame:') print(df)

Following is the output of the above code −

Output DataFrame:
Sr.no Name Gender Age
0 1 Braund female 38
1 2 Cumings male 22
2 3 Heikkin female 35
3 4 Futrelle female 26

Reading a Specific Sheet from an Excel file

The Excel files may contain multiple sheets with different names. To read a specific sheet into a Pandas DataFrame, you can specify the sheet name or index to the sheet_name parameter of the pandas.read_excel() method.

Example

The following example demonstrates how to read a specific sheet from an Excel file into a Pandas DataFrame using pandas.read_excel() method. Here we will specify the sheet name to the sheet_name parameter to read that specific sheet.

import pandas as pd # Read a specific sheet df = pd.read_excel('data.xlsx', sheet_name="Sheet_2") # Print the DataFrame print('Output DataFrame:') print(df)

Following is the output of the above code −

Output DataFrame:
Name Value
0 string1 1
1 string2 3
2 Comment 5

Reading Multiple Sheets as a Dictionary of DataFrames

If an Excel file contains multiple sheets and you need to read few of them into the Pandas DataFrame, you can pass a list of sheet names or indices to the sheet_name parameter of the pandas.read_excel() method.

Example

This example uses the pandas.read_excel() method to read the multiple sheets in an Excel file to a dictionary of DataFrames.

import pandas as pd # Read multiple sheets df = pd.read_excel('data.xlsx', sheet_name=[0, 1]) # Print the DataFrame print('Output Dict of DataFrames:') print(df)

Following is the output of the above code −

Output Dict of DataFrames:
{0:    Sr.no      Name  Gender  Age
0      1    Braund  female   38
1      2   Cumings    male   22
2      3   Heikkin  female   35
3      4  Futrelle  female   26, 1:       Name  Value
0  string1      1
1  string2      3
2  Comment      5}

Reading MultiIndex Data from Excel

You can read MultiIndexed data from an Excel file using the pandas.read_excel() method, which is useful for working with hierarchical data. By specifying the lists of columns for index_col and lists of rows for header parameters to handle MultiIndex indices and columns.

Example

This example uses the to_excel() method and pandas.read_excel() method create an excel sheet with MultiIndexed data and read it back to Pandas DataFrame respectively.

Open Compiler
import pandas as pd # Create a MultiIndex object index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]) # Create a DataFrame data = [[1, 2], [3, 4], [5, 6], [7, 8]] df = pd.DataFrame(data, index=index, columns=['X', 'Y']) df.to_excel("multiindex_data.xlsx") # Read MultiIndex rows and columns df = pd.read_excel("multiindex_data.xlsx", index_col=[0, 1]) print('Output DataFrame from Excel File:') print(df)

Following is the output of the above code −

Output DataFrame from Excel File:
X Y
A one 1 2
two 3 4
B one 5 6
two 7 8
Advertisements