
Python Pandas - Reindexing
Reindexing is a powerful and fundamental operation in Pandas that allows you to align your data with a new set of labels. Whether you're working with rows or columns, reindexing gives you control over how your data aligns with the labels you specify.
This operation is especially useful when working with time series data, aligning datasets from different sources, or simply reorganizing data to match a particular structure.
What is Reindexing?
Reindexing in Pandas refers to the process of conforming your data to match a new set of labels along a specified axis (rows or columns). This process can accomplish several tasks −
Reordering: Reorder the existing data to match a new set of labels.
Inserting Missing Values: If a label in the new set does not exist in the original data, Pandas will insert a missing value (NaN) for that label.
Filling Missing Data: You can specify how to fill in missing values that result from reindexing, using various filling methods.
The reindex() method is the primary tool for performing reindexing in Pandas. It allows you to modify the row and column labels of Pandas data structures.
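The three tasks listed above can be sketched on a small Series. The labels and values here are made up for illustration, and the fill_value argument is only one of several ways to fill the gaps that reindexing creates.

```python
import pandas as pd

# Illustrative data; labels and values are assumptions for this sketch.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Reordering: every requested label already exists, so values are only rearranged.
reordered = s.reindex(["c", "a", "b"])

# Inserting missing values: "z" is not in the original index, so it becomes NaN.
with_missing = s.reindex(["a", "z"])

# Filling: fill_value supplies a constant for labels that would otherwise be NaN.
filled = s.reindex(["a", "z"], fill_value=0)

print(reordered)
print(with_missing)
print(filled)
```

In each case reindex() returns a new Series and leaves s unchanged.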
Key Methods Used in Reindexing
reindex(): This method is used to align an existing data structure with a new index (or columns). It can reorder and/or insert missing labels.
reindex_like(): This method allows you to reindex one DataFrame or Series to match another. It's useful when you want to ensure two data structures are aligned similarly.
Filling Methods: When reindexing introduces NaN values, you can fill them using methods like ffill, bfill, and nearest.
Example: Reindexing a Pandas Series
The following example demonstrates reindexing a Pandas Series object using the reindex() method. In this case, the "f" label was not present in the original Series, so it appears as NaN in the output reindexed Series.
    import pandas as pd
    import numpy as np

    s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
    print("Original Series:\n", s)

    s_reindexed = s.reindex(["e", "b", "f", "d"])
    print("\nOutput Reindexed Series:\n", s_reindexed)
On executing the above code you will get the following output −
    Original Series:
    a    0.148874
    b    0.592275
    c   -0.903546
    d    1.031230
    e   -0.254599
    dtype: float64

    Output Reindexed Series:
    e   -0.254599
    b    0.592275
    f         NaN
    d    1.031230
    dtype: float64
Example: Reindexing a DataFrame
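Consider the following example of reindexing a DataFrame using the reindex() method. With a DataFrame, you can reindex both the rows (index) and the columns. Here, row label 5 and column 'B' do not exist in the original DataFrame, so they are filled with missing values (NaN, or NaT for the datetime column).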
    import pandas as pd
    import numpy as np

    N = 5

    df = pd.DataFrame({
        'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
        'x': np.linspace(0, stop=N-1, num=N),
        'y': np.random.rand(N),
        'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'D': np.random.normal(100, 10, size=(N)).tolist()
    })
    print("Original DataFrame:\n", df)

    # Reindex the DataFrame
    df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
    print("\nOutput Reindexed DataFrame:\n", df_reindexed)
Its output is as follows −
    Original DataFrame:
               A    x         y       C           D
    0 2016-01-01  0.0  0.513990  Medium  118.143385
    1 2016-01-02  1.0  0.751248     Low   91.041201
    2 2016-01-03  2.0  0.332970  Medium  100.644345
    3 2016-01-04  3.0  0.723816    High  108.810386
    4 2016-01-05  4.0  0.376326    High  101.346443

    Output Reindexed DataFrame:
               A       C    B
    0 2016-01-01  Medium  NaN
    2 2016-01-03  Medium  NaN
    5        NaT     NaN  NaN
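One side effect of the missing values inserted by reindex() is worth checking in practice: an integer column cannot hold NaN, so it is converted to float. The following is a minimal sketch with a made-up one-column DataFrame (the column name count is an assumption), which also shows one way to list the labels that reindexing introduced.

```python
import pandas as pd

# Illustrative frame; the column name and values are assumptions.
df = pd.DataFrame({"count": [1, 2, 3]}, index=[0, 1, 2])

# Row label 5 does not exist in df, so its row is filled with NaN.
expanded = df.reindex([0, 2, 5])

# Inserting NaN forces the integer column to a floating-point dtype.
print(df["count"].dtype)        # int64
print(expanded["count"].dtype)  # float64

# Labels present in the result but absent from the original index.
new_labels = expanded.index.difference(df.index)
print(list(new_labels))
```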
Reindex to Align with Other Objects
Sometimes, you may need to reindex one DataFrame to align it with another. The reindex_like() method allows you to do this seamlessly.
Example
The following example demonstrates how to reindex a DataFrame (df1) to match another DataFrame (df2) using the reindex_like() method.
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(7, 3), columns=['col1', 'col2', 'col3'])

    df1 = df1.reindex_like(df2)
    print(df1)
Its output is as follows −
           col1      col2      col3
    0 -2.467652 -1.211687 -0.391761
    1 -0.287396  0.522350  0.562512
    2 -0.255409 -0.483250  1.866258
    3 -1.150467 -0.646493 -0.222462
    4  0.152768 -2.056643  1.877233
    5 -1.155997  1.528719 -1.343719
    6 -1.015606 -1.245936 -0.295275
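Note: reindex_like() returns a new DataFrame; df1 changes here only because the result is assigned back to it. Since df2 has seven rows, only rows 0 through 6 of df1 are kept. The column names must match: any column of df2 that does not exist in df1 is filled entirely with NaN.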
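The effect of mismatched column names can be seen in a small sketch. The DataFrames left and template and the column name extra are made up for illustration.

```python
import pandas as pd

# Illustrative frames; names and values are assumptions for this sketch.
left = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})
template = pd.DataFrame({"col1": [0.0, 0.0], "extra": [0.0, 0.0]})

# Take both the row labels and the column labels from template.
aligned = left.reindex_like(template)
print(aligned)

# "extra" does not exist in left, so the whole column is NaN;
# "col2" is not in template, so it is dropped from the result.
print(aligned["extra"].isna().all())

# The same result written with reindex(), passing both axes explicitly.
same = left.reindex(index=template.index, columns=template.columns)
print(aligned.equals(same))
```

The original left DataFrame is not modified; it still has three rows and both of its columns.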
Filling While Reindexing
The reindex() method accepts an optional method parameter that fills the gaps created by new labels. It applies only when the original index is monotonically increasing or decreasing. The available methods include −
pad/ffill: Fill values forward.
bfill/backfill: Fill values backward.
nearest: Fill from the nearest index values.
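The three filling methods can be compared side by side on a Series with a sorted integer index. This is a minimal sketch with made-up labels and values; the target labels are chosen so that nearest never has to break a tie.

```python
import pandas as pd

# Illustrative data with a monotonically increasing index.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 10, 20])
target = [0, 4, 10, 16, 20, 25]

# ffill: each new label takes the value of the closest original label before it.
forward = s.reindex(target, method="ffill")

# bfill: each new label takes the value of the closest original label after it.
# Label 25 has no later original label, so it stays NaN.
backward = s.reindex(target, method="bfill")

# nearest: each new label takes the value of the closest original label
# in either direction.
nearest = s.reindex(target, method="nearest")

print(pd.DataFrame({"ffill": forward, "bfill": backward, "nearest": nearest}))
```

These methods choose values by comparing index labels, so they only fill gaps created by the new labels; NaN values already stored in the original data are left as they are.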
Example
The following example demonstrates how the ffill method works. Reindexing df2 like df1 first produces NaN rows for the labels that df2 lacks; passing method='ffill' fills those rows with the last available values.
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

    # Padding NaNs
    print(df2.reindex_like(df1))

    # Now fill the NaNs with preceding values
    print("Data Frame with Forward Fill:")
    print(df2.reindex_like(df1, method='ffill'))
Its output is as follows −
           col1      col2      col3
    0  1.311620 -0.707176  0.599863
    1 -0.423455 -0.700265  1.133371
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN
    Data Frame with Forward Fill:
           col1      col2      col3
    0  1.311620 -0.707176  0.599863
    1 -0.423455 -0.700265  1.133371
    2 -0.423455 -0.700265  1.133371
    3 -0.423455 -0.700265  1.133371
    4 -0.423455 -0.700265  1.133371
    5 -0.423455 -0.700265  1.133371
Note: Rows 2 through 5 are padded with the values from row 1, the last row present in df2.
Limits on Filling While Reindexing
The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive missing rows to fill; rows beyond that count remain NaN.
Example
Let us consider the following example to understand specifying limits on filling −
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

    # Padding NaNs
    print(df2.reindex_like(df1))

    # Now fill the NaNs with preceding values
    print("Data Frame with Forward Fill limiting to 1:")
    print(df2.reindex_like(df1, method='ffill', limit=1))
Its output is as follows −
           col1      col2      col3
    0  0.247784  2.128727  0.702576
    1 -0.055713 -0.021732 -0.174577
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN
    Data Frame with Forward Fill limiting to 1:
           col1      col2      col3
    0  0.247784  2.128727  0.702576
    1 -0.055713 -0.021732 -0.174577
    2 -0.055713 -0.021732 -0.174577
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5       NaN       NaN       NaN
Note: The forward fill (ffill) is limited to only one row.
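Because the examples above use random numbers, their values differ on every run. The following sketch repeats the limit behavior with fixed, made-up values on a Series, using reindex() directly, so the result can be checked by hand.

```python
import pandas as pd

# Illustrative data: two known values at labels 0 and 1.
s = pd.Series([1.0, 2.0], index=[0, 1])

# Without a limit, forward fill carries the last value through every new label.
unlimited = s.reindex(range(6), method="ffill")

# With limit=1, only the first new label after the data is filled.
limited = s.reindex(range(6), method="ffill", limit=1)

print(unlimited.tolist())  # every new label receives 2.0
print(limited.tolist())    # label 2 receives 2.0; labels 3 to 5 stay NaN
```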