Python Pandas - Pivoting



Pivoting in Python Pandas is a powerful data transformation technique that reshapes data for easier analysis and visualization. It changes the data representation from a "long" format to a "wide" format, making it simpler to perform aggregations and comparisons.

This technique is particularly useful when dealing with time series data or datasets with multiple columns. Pandas provides two primary methods for pivoting −

  • pivot(): Reshapes data according to specified column or index values.

  • pivot_table(): It is a more flexible method that allows you to create a spreadsheet-style pivot table as a DataFrame.

In this tutorial, we will learn about the pivoting in Pandas using these methods with examples to demonstrate their applications in data manipulation.

Pivoting with pivot()

The Pandas df.pivot() method is used to reshape data when there are unique values for the specified index and column pairs. It is straightforward and useful when your data is well-structured without duplicate entries for the index/column combination.

Example

Here is a basic example demonstrating pivoting a Pandas DataFrame with the Pandas df.pivot() method.

Open Compiler
import pandas as pd # Create a DataFrame df = pd.DataFrame({"Col1": range(12),"Col2": ["A", "A", "A", "B", "B","B", "C", "C", "C", "D", "D", "D"], "date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"] * 4)}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame pivoted = df.pivot(index="date", columns="Col2", values="Col1") # Display the output print('Pivoted DataFrame:\n', pivoted)

Following is the output of the above code −

Original DataFrame:
Col1 Col2 date
0 0 A 2024-01-03
1 1 A 2024-01-04
2 2 A 2024-01-05
3 3 B 2024-01-03
4 4 B 2024-01-04
5 5 B 2024-01-05
6 6 C 2024-01-03
7 7 C 2024-01-04
8 8 C 2024-01-05
9 9 D 2024-01-03
10 10 D 2024-01-04
11 11 D 2024-01-05
Pivoted DataFrame:
Col2 A B C D
date
2024-01-03 0 3 6 9
2024-01-04 1 4 7 10
2024-01-05 2 5 8 11

Note: The pivot() method requires that the index and columns specified have unique values. If your data contains duplicates, you should use the pivot_table() method instead.

Pivoting with pivot_table()

The pivot() method is a straightforward way to reshape data, while pivot_table() offers flexibility for aggregation, making it suitable for more complex data manipulation tasks. This is particularly useful for summarizing data when dealing with duplicates and requires aggregation of data.

Example

This example demonstrates pivoting a DataFrame using the df.pivot_table() method.

Open Compiler
import numpy as np import pandas as pd import datetime # Create a DataFrame df = pd.DataFrame({"A": [1, 1, 2, 3] * 6, "B": ["A", "B", "C"] * 8, "C": ["x", "x", "x", "y", "y", "y"] * 4, "D": np.random.randn(24), "E": np.random.randn(24), "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame pivot_table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"]) # Display the output print('Pivoted DataFrame:\n', pivot_table)

Following is the output of the above code −

Original DataFrame:
A B C D E F
0 1 A x -0.530395 2.819976 2013-01-01
1 1 B x -0.593347 0.639877 2013-02-01
2 2 C x 0.817982 -0.270499 2013-03-01
3 3 A y -1.438048 1.707790 2013-04-01
4 1 B y 0.207012 -0.387364 2013-05-01
5 1 C y 0.462550 1.145260 2013-06-01
6 2 A x 3.032849 -0.027888 2013-07-01
7 3 B x -0.972964 -0.546819 2013-08-01
8 1 C x 0.613289 -0.041706 2013-09-01
9 1 A y -1.072169 -0.061843 2013-10-01
10 2 B y 1.305400 0.463981 2013-11-01
11 3 C y -1.265300 -2.020158 2013-12-01
12 1 A x 0.165842 -0.154173 2013-01-15
13 1 B x -1.038789 -0.705948 2013-02-15
14 2 C x 0.690639 0.843855 2013-03-15
15 3 A y -0.592202 0.718281 2013-04-15
16 1 B y 0.048085 -1.111917 2013-05-15
17 1 C y 1.111408 -0.125214 2013-06-15
18 2 A x 0.086012 0.276006 2013-07-15
19 3 B x -1.043673 -0.777043 2013-08-15
20 1 C x -0.471818 0.736921 2013-09-15
21 1 A y -0.138187 -0.849134 2013-10-15
22 2 B y -2.286761 -2.859258 2013-11-15
23 3 C y 0.392779 -0.104737 2013-12-15
Pivoted DataFrame: C x y A B 1 A -0.182276 -0.605178 B -0.816068 0.127548 C 0.070736 0.786979 2 A 1.559430 NaN B NaN -0.490681 C 0.754310 NaN 3 A NaN -1.015125 B -1.008318 NaN C NaN -0.436260

Pivoting with Aggregation

The Pandas pivot_table() method can be used to specify an aggregation function. By default it calculates the mean, but you can also use functions like sum, count, or even custom functions for applying aggregation to the pivoting.

Example

This example demonstrates how to apply aggregation function with pivoting a DataFrame using the df.pivot_table() method.

Open Compiler
import numpy as np import datetime import pandas as pd # Create a DataFrame df = pd.DataFrame({"A": [1, 1, 2, 3] * 6, "B": ["A", "B", "C"] * 8, "C": ["x", "x", "x", "y", "y", "y"] * 4, "D": np.random.randn(24), "E": np.random.randn(24), "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]}) # Display the Input DataFrame print('Original DataFrame:\n', df) # Pivot the DataFrame with a aggregate function pivot_table = pd.pivot_table(df, values=["D", "E"], index=["B"], columns=["A", "C"], aggfunc="sum") # Display the output print('Pivoted DataFrame:\n', pivot_table)

Following is the output of the above code −

Original DataFrame:
A B C D E F
0 1 A x 0.921728 0.807799 2013-01-01
1 1 B x 0.565152 -0.369947 2013-02-01
2 2 C x 1.260114 0.352844 2013-03-01
3 3 A y -1.369645 1.504198 2013-04-01
4 1 B y 0.882293 -1.177686 2013-05-01
5 1 C y 0.560940 1.126121 2013-06-01
6 2 A x 1.496309 0.131623 2013-07-01
7 3 B x 0.557194 1.545635 2013-08-01
8 1 C x 0.518436 -0.581113 2013-09-01
9 1 A y -0.607266 -1.032699 2013-10-01
10 2 B y -0.665019 -0.628637 2013-11-01
11 3 C y 2.146093 -0.175748 2013-12-01
12 1 A x 0.203505 1.402464 2013-01-15
13 1 B x 0.602630 1.028011 2013-02-15
14 2 C x -0.137840 0.099252 2013-03-15
15 3 A y 1.517678 1.027651 2013-04-15
16 1 B y -0.949576 -0.268656 2013-05-15
17 1 C y -0.328505 -0.092841 2013-06-15
18 2 A x 0.089315 0.771653 2013-07-15
19 3 B x 1.889772 -0.348258 2013-08-15
20 1 C x 1.081143 -0.006387 2013-09-15
21 1 A y 0.223791 0.944354 2013-10-15
22 2 B y 0.111047 -1.634985 2013-11-15
23 3 C y -1.475421 0.586825 2013-12-15
Pivoted DataFrame: D ... E A 1 2 ... 2 3 C x y x ... y x y B ... A 1.125233 -0.383475 1.585623 ... NaN NaN 2.531849 B 1.167782 -0.067283 NaN ... -2.263622 1.197377 NaN C 1.599579 0.232435 1.122273 ... NaN NaN 0.411078 [3 rows x 12 columns]
Advertisements