Creating Pivot Table with Multiple Columns using Python Pandas
PythonPandas make data manipulation, representation and analysis easier. Pandas Pivot Tables are used to create spreadsheet-style pivot tables as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
Pivot Table with Multiple Columns using Pandas
A pivot table in pandas is a way of summarizing and aggregating data in a DataFrame, especially when you have multiple dimensions or categorical variables. It allows you to reshape and transform your data, making it easier to analyze and gain insights. In a pivot table, you can specify which columns of the original DataFrame should become the new index, which columns should become new columns, and which columns should be used for aggregating data.
Syntax:
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=_NoDefault.no_default, sort=True)
Parameters:
data
: The DataFrame to be used for creating the pivot table.values
: Column(s) to aggregate. This can be a list, a single column name, or a scalar. It specifies the values that will be aggregated in the resulting pivot table.index
: Column, Grouper, array, or list of the previous. The column or columns whose unique values will become the index of the pivot table.columns
: Column, Grouper, array, or list of the previous. The column or columns whose unique values will become the columns of the pivot table.aggfunc
: Function, list of functions, or a dictionary. Specifies how to aggregate the values. Common options include 'mean', 'sum', 'count', 'min', 'max', etc. It can be a single function, a list of functions, or a dictionary where keys are column names, and values are aggregation functions.fill_value
: Scalar, defaultNone
. The value to use for filling missing values in the resulting pivot table.margins
: Bool, defaultFalse
. IfTrue
, it adds row/column margins (subtotals) to the pivot table.dropna
: Bool, defaultTrue
. IfTrue
, it excludes NA/null values from the result.margins_name
: Str, default'All'
. Name to be used for the row/column that will contain the totals whenmargins=True
.observed
: Bool, defaultFalse
(Deprecated since version 2.2.0). This parameter is deprecated and no longer used.sort
: Bool, defaultTrue
. Sort the result DataFrame by the names of the index and column levels ifTrue
.
Creating Pivot Table with Multiple Columns using Pandas
Pivot Table for Students Report
Let's create a DataFrame (df
) with columns Name, Subject, Score, and Grade. Subsequently, a pivot table is generated using the pivot_table
method, and the 'Name' column is designated as the index. The aggregation functions for score and grade columns are defined using the aggfunc
parameter. For 'Score', the mean value is computed and 'Grade', the 'first' value encountered is chosen as the representative grade.
import pandas as pd
data = {
'Name': ['Shravan', 'Jeetu', 'Ram', 'Jeetu', 'Shravan', 'Ram'],
'Subject': ['DSA', 'DSA', 'DSA', 'DBMS', 'DBMS', 'DBMS'],
'Score': [85, 90, 75, 80, 88, 82],
'Grade': ['A', 'A', 'B', 'B', 'A', 'B']
}
df = pd.DataFrame(data)
# Creating a pivot table with multiple columns
pivot_table = df.pivot_table(index='Name',
columns='Subject',
values=['Score', 'Grade'],
aggfunc={'Score': 'mean', 'Grade': 'first'})
print(pivot_table)
Output:
Grade Score
Subject DBMS DSA DBMS DSA
Name
Jeetu B A 80 90
Ram B B 82 75
Shravan A A 88 85
Pivot Table for Shop Items Report
A structured DataFrame is created to capture information related to dates Date, different categories Category, and corresponding numerical values Value1 and Value2. Setting margins=True
adds subtotals along both the rows and columns, creating a more detailed summary of the data.
import pandas as pd
data = {
'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],
'Category': ['A', 'B', 'A', 'B'],
'Value1': [10, 15, 20, 25],
'Value2': [30, 35, 40, 45]
}
df = pd.DataFrame(data)
# Create a pivot table with multiple columns with additional parameters
pivot_table = df.pivot_table(
index='Date',
columns='Category',
values=['Value1', 'Value2'],
aggfunc='sum',
margins=True,
margins_name='Total',
sort=True
)
print(pivot_table)