
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas - Pivoting
Pivoting in Python Pandas is a powerful data transformation technique that reshapes data for easier analysis and visualization. It changes the data representation from a "long" format to a "wide" format, making it simpler to perform aggregations and comparisons.
This technique is particularly useful when dealing with time series data or datasets with multiple columns. Pandas provides two primary methods for pivoting −
pivot(): Reshapes data according to specified column or index values.
pivot_table(): It is a more flexible method that allows you to create a spreadsheet-style pivot table as a DataFrame.
In this tutorial, we will learn about the pivoting in Pandas using these methods with examples to demonstrate their applications in data manipulation.
Pivoting with pivot()
The Pandas df.pivot() method is used to reshape data when there are unique values for the specified index and column pairs. It is straightforward and useful when your data is well-structured without duplicate entries for the index/column combination.
Example
Here is a basic example demonstrating pivoting a Pandas DataFrame with the Pandas df.pivot() method.
import pandas as pd # Create a DataFrame df = pd.DataFrame({"Col1": range(12),"Col2": ["A", "A", "A", "B", "B","B", "C", "C", "C", "D", "D", "D"], "date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"] * 4)}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame pivoted = df.pivot(index="date", columns="Col2", values="Col1") # Display the output print('Pivoted DataFrame:\n', pivoted)
Following is the output of the above code −
Original DataFrame:
Col1 | Col2 | date | |
---|---|---|---|
0 | 0 | A | 2024-01-03 |
1 | 1 | A | 2024-01-04 |
2 | 2 | A | 2024-01-05 |
3 | 3 | B | 2024-01-03 |
4 | 4 | B | 2024-01-04 |
5 | 5 | B | 2024-01-05 |
6 | 6 | C | 2024-01-03 |
7 | 7 | C | 2024-01-04 |
8 | 8 | C | 2024-01-05 |
9 | 9 | D | 2024-01-03 |
10 | 10 | D | 2024-01-04 |
11 | 11 | D | 2024-01-05 |
Col2 | A | B | C | D |
---|---|---|---|---|
date | ||||
2024-01-03 | 0 | 3 | 6 | 9 |
2024-01-04 | 1 | 4 | 7 | 10 |
2024-01-05 | 2 | 5 | 8 | 11 |
Note: The pivot() method requires that the index and columns specified have unique values. If your data contains duplicates, you should use the pivot_table() method instead.
Pivoting with pivot_table()
The pivot() method is a straightforward way to reshape data, while pivot_table() offers flexibility for aggregation, making it suitable for more complex data manipulation tasks. This is particularly useful for summarizing data when dealing with duplicates and requires aggregation of data.
Example
This example demonstrates pivoting a DataFrame using the df.pivot_table() method.
import numpy as np import pandas as pd import datetime # Create a DataFrame df = pd.DataFrame({"A": [1, 1, 2, 3] * 6, "B": ["A", "B", "C"] * 8, "C": ["x", "x", "x", "y", "y", "y"] * 4, "D": np.random.randn(24), "E": np.random.randn(24), "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame pivot_table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"]) # Display the output print('Pivoted DataFrame:\n', pivot_table)
Following is the output of the above code −
Original DataFrame:
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
0 | 1 | A | x | -0.530395 | 2.819976 | 2013-01-01 |
1 | 1 | B | x | -0.593347 | 0.639877 | 2013-02-01 |
2 | 2 | C | x | 0.817982 | -0.270499 | 2013-03-01 |
3 | 3 | A | y | -1.438048 | 1.707790 | 2013-04-01 |
4 | 1 | B | y | 0.207012 | -0.387364 | 2013-05-01 |
5 | 1 | C | y | 0.462550 | 1.145260 | 2013-06-01 |
6 | 2 | A | x | 3.032849 | -0.027888 | 2013-07-01 |
7 | 3 | B | x | -0.972964 | -0.546819 | 2013-08-01 |
8 | 1 | C | x | 0.613289 | -0.041706 | 2013-09-01 |
9 | 1 | A | y | -1.072169 | -0.061843 | 2013-10-01 |
10 | 2 | B | y | 1.305400 | 0.463981 | 2013-11-01 |
11 | 3 | C | y | -1.265300 | -2.020158 | 2013-12-01 |
12 | 1 | A | x | 0.165842 | -0.154173 | 2013-01-15 |
13 | 1 | B | x | -1.038789 | -0.705948 | 2013-02-15 |
14 | 2 | C | x | 0.690639 | 0.843855 | 2013-03-15 |
15 | 3 | A | y | -0.592202 | 0.718281 | 2013-04-15 |
16 | 1 | B | y | 0.048085 | -1.111917 | 2013-05-15 |
17 | 1 | C | y | 1.111408 | -0.125214 | 2013-06-15 |
18 | 2 | A | x | 0.086012 | 0.276006 | 2013-07-15 |
19 | 3 | B | x | -1.043673 | -0.777043 | 2013-08-15 |
20 | 1 | C | x | -0.471818 | 0.736921 | 2013-09-15 |
21 | 1 | A | y | -0.138187 | -0.849134 | 2013-10-15 |
22 | 2 | B | y | -2.286761 | -2.859258 | 2013-11-15 |
23 | 3 | C | y | 0.392779 | -0.104737 | 2013-12-15 |
Pivoting with Aggregation
The Pandas pivot_table() method can be used to specify an aggregation function. By default it calculates the mean, but you can also use functions like sum, count, or even custom functions for applying aggregation to the pivoting.
Example
This example demonstrates how to apply aggregation function with pivoting a DataFrame using the df.pivot_table() method.
import numpy as np import datetime import pandas as pd # Create a DataFrame df = pd.DataFrame({"A": [1, 1, 2, 3] * 6, "B": ["A", "B", "C"] * 8, "C": ["x", "x", "x", "y", "y", "y"] * 4, "D": np.random.randn(24), "E": np.random.randn(24), "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame with a aggregate function pivot_table = pd.pivot_table(df, values=["D", "E"], index=["B"], columns=["A", "C"], aggfunc="sum") # Display the output print('Pivoted DataFrame:\n', pivot_table)
Following is the output of the above code −
Original DataFrame:
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
0 | 1 | A | x | 0.921728 | 0.807799 | 2013-01-01 |
1 | 1 | B | x | 0.565152 | -0.369947 | 2013-02-01 |
2 | 2 | C | x | 1.260114 | 0.352844 | 2013-03-01 |
3 | 3 | A | y | -1.369645 | 1.504198 | 2013-04-01 |
4 | 1 | B | y | 0.882293 | -1.177686 | 2013-05-01 |
5 | 1 | C | y | 0.560940 | 1.126121 | 2013-06-01 |
6 | 2 | A | x | 1.496309 | 0.131623 | 2013-07-01 |
7 | 3 | B | x | 0.557194 | 1.545635 | 2013-08-01 |
8 | 1 | C | x | 0.518436 | -0.581113 | 2013-09-01 |
9 | 1 | A | y | -0.607266 | -1.032699 | 2013-10-01 |
10 | 2 | B | y | -0.665019 | -0.628637 | 2013-11-01 |
11 | 3 | C | y | 2.146093 | -0.175748 | 2013-12-01 |
12 | 1 | A | x | 0.203505 | 1.402464 | 2013-01-15 |
13 | 1 | B | x | 0.602630 | 1.028011 | 2013-02-15 |
14 | 2 | C | x | -0.137840 | 0.099252 | 2013-03-15 |
15 | 3 | A | y | 1.517678 | 1.027651 | 2013-04-15 |
16 | 1 | B | y | -0.949576 | -0.268656 | 2013-05-15 |
17 | 1 | C | y | -0.328505 | -0.092841 | 2013-06-15 |
18 | 2 | A | x | 0.089315 | 0.771653 | 2013-07-15 |
19 | 3 | B | x | 1.889772 | -0.348258 | 2013-08-15 |
20 | 1 | C | x | 1.081143 | -0.006387 | 2013-09-15 |
21 | 1 | A | y | 0.223791 | 0.944354 | 2013-10-15 |
22 | 2 | B | y | 0.111047 | -1.634985 | 2013-11-15 |
23 | 3 | C | y | -1.475421 | 0.586825 | 2013-12-15 |