How to count duplicates in a Pandas DataFrame?
Last Updated : 28 Jul, 2020
Let us see how to count duplicates in a Pandas DataFrame. Our task is to count the number of duplicate entries, first in a single column and then across multiple columns.
Under a single column : We will use the pivot_table() function to count the duplicates in a single column. The column in which the duplicates are to be found is passed as the value of the index parameter, and aggfunc is set to 'size'. The result is a Series holding the number of rows for each distinct value, so any count greater than 1 marks a duplicated value.
# importing the module
import pandas as pd

# creating the DataFrame
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut']})

# counting the duplicates
dups = df.pivot_table(index=['Course'], aggfunc='size')

# displaying the duplicate Series
print(dups)
Output : a Series indexed by Course, with a count of 2 for BBA, 2 for BCA and 1 for MBA.
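The pivot_table() result lists every value, including those that appear only once. As a minimal sketch of one way to keep only the values that are actually repeated, the example below swaps in value_counts() and duplicated() instead of pivot_table(); the variable names counts, repeated and extra_rows are illustrative and not part of the original example.

```python
# importing the module
import pandas as pd

# creating the same DataFrame as in the example above
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut']})

# number of occurrences of each value in the 'Course' column
counts = df['Course'].value_counts()

# keeping only the values that occur more than once
repeated = counts[counts > 1]

# number of rows that repeat an earlier 'Course' value
extra_rows = int(df['Course'].duplicated().sum())

print(repeated)
print(extra_rows)
```

Here repeated keeps only BCA and BBA, while extra_rows counts the rows that would be dropped by drop_duplicates() on that column.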
Across multiple columns : We will use the pivot_table() function to count the duplicates across multiple columns. The columns in which the duplicates are to be found are passed as a list to the index parameter, and aggfunc is set to 'size'. Each distinct combination of values in those columns gets its own count.
# importing the module
import pandas as pd

# creating the DataFrame
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut']})

# counting the duplicates
dups = df.pivot_table(index=['Course', 'Location'], aggfunc='size')

# displaying the duplicate Series
print(dups)
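Output : a Series indexed by (Course, Location), with a count of 2 for (BBA, Meerut) and 1 each for (BCA, Agra), (BCA, Saharanpur) and (MBA, Saharanpur).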
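The same count across several columns can also be obtained without pivot_table(). The sketch below is an alternative, not the method used above: it assumes groupby().size() for the per-combination counts and duplicated(subset=...) for the number of repeated rows. The names pair_counts, repeated_pairs and extra_rows are illustrative.

```python
# importing the module
import pandas as pd

# creating the same DataFrame as in the example above
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut']})

# number of rows for each (Course, Location) combination
pair_counts = df.groupby(['Course', 'Location']).size()

# keeping only the combinations that occur more than once
repeated_pairs = pair_counts[pair_counts > 1]

# number of rows that repeat an earlier (Course, Location) combination
extra_rows = int(df.duplicated(subset=['Course', 'Location']).sum())

print(repeated_pairs)
print(extra_rows)
```

Only the (BBA, Meerut) combination appears twice in this data, so repeated_pairs holds a single entry and extra_rows is 1.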