
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas - Reading Data from an Excel File
The Pandas library provides powerful tool for data manipulation and analysis. Among its many features, it offers the ability to read and write data to Excel files easily. Excel files are widely used to store and organize data in tabular format, making them an excellent choice for analysis. Excel files can have multiple sheets, each containing rows and columns of data. Excel files usually come with extensions like .xls or .xlsx.
Pandas provides easy-to-use methods for working with Excel data directly in Python. One of such method is the read_excel() method, which reads Excel files and loads the data into a Pandas DataFrame.
In this tutorial, we will learn how to read data from Excel files using the pandas.read_excel() method, covering different scenarios like loading a single sheet, specific sheets, and multiple sheets.
Supported Excel File Formats in Pandas
Pandas uses different backends to read various Excel file formats −
The Excel 2007+ (.xlsx) files can be read using the openpyxl Python module.
The Excel 2003 (.xls) files can be read using the xlrd module.
The Binary Excel (.xlsb) files can be read using the pyxlsb module.
Additionally, all formats can be read using the Calamine engine.
Note: Please make sure that you have installed the required packages (xlrd and openpyxl) in your system. If these packages are not installed, use the following commands to install them −orpip install openpyxlpip install xlrd
Reading an Excel File in Pandas
The pandas.read_excel() method reads Excel files and loads the data into a Pandas DataFrame. This method supports multiple Excel file formats like, .xls, .xlsx, .xlsm, and more from a local filesystem or a URL.
Example
Here is a simple example of reading a local system Excel file into a DataFrame using the pandas.read_excel() method by specifying the file path.
import pandas as pd # Read an Excel file df = pd.read_excel('data.xlsx') # Print the DataFrame print('Output DataFrame:') print(df)
Following is the output of the above code −
Output DataFrame:
Sr.no | Name | Gender | Age | |
---|---|---|---|---|
0 | 1 | Braund | female | 38 |
1 | 2 | Cumings | male | 22 |
2 | 3 | Heikkin | female | 35 |
3 | 4 | Futrelle | female | 26 |
Reading a Specific Sheet from an Excel file
The Excel files may contain multiple sheets with different names. To read a specific sheet into a Pandas DataFrame, you can specify the sheet name or index to the sheet_name parameter of the pandas.read_excel() method.
Example
The following example demonstrates how to read a specific sheet from an Excel file into a Pandas DataFrame using pandas.read_excel() method. Here we will specify the sheet name to the sheet_name parameter to read that specific sheet.
import pandas as pd # Read a specific sheet df = pd.read_excel('data.xlsx', sheet_name="Sheet_2") # Print the DataFrame print('Output DataFrame:') print(df)
Following is the output of the above code −
Output DataFrame:
Name | Value | |
---|---|---|
0 | string1 | 1 |
1 | string2 | 3 |
2 | Comment | 5 |
Reading Multiple Sheets as a Dictionary of DataFrames
If an Excel file contains multiple sheets and you need to read few of them into the Pandas DataFrame, you can pass a list of sheet names or indices to the sheet_name parameter of the pandas.read_excel() method.
Example
This example uses the pandas.read_excel() method to read the multiple sheets in an Excel file to a dictionary of DataFrames.
import pandas as pd # Read multiple sheets df = pd.read_excel('data.xlsx', sheet_name=[0, 1]) # Print the DataFrame print('Output Dict of DataFrames:') print(df)
Following is the output of the above code −
Output Dict of DataFrames: {0: Sr.no Name Gender Age 0 1 Braund female 38 1 2 Cumings male 22 2 3 Heikkin female 35 3 4 Futrelle female 26, 1: Name Value 0 string1 1 1 string2 3 2 Comment 5}
Reading MultiIndex Data from Excel
You can read MultiIndexed data from an Excel file using the pandas.read_excel() method, which is useful for working with hierarchical data. By specifying the lists of columns for index_col and lists of rows for header parameters to handle MultiIndex indices and columns.
Example
This example uses the to_excel() method and pandas.read_excel() method create an excel sheet with MultiIndexed data and read it back to Pandas DataFrame respectively.
import pandas as pd # Create a MultiIndex object index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]) # Create a DataFrame data = [[1, 2], [3, 4], [5, 6], [7, 8]] df = pd.DataFrame(data, index=index, columns=['X', 'Y']) df.to_excel("multiindex_data.xlsx") # Read MultiIndex rows and columns df = pd.read_excel("multiindex_data.xlsx", index_col=[0, 1]) print('Output DataFrame from Excel File:') print(df)
Following is the output of the above code −
Output DataFrame from Excel File:
X | Y | ||
---|---|---|---|
A | one | 1 | 2 |
two | 3 | 4 | |
B | one | 5 | 6 |
two | 7 | 8 |