How to count duplicates in Pandas Dataframe?

Last Updated : 28 Jul, 2020

Let us see how to count duplicates in a Pandas DataFrame. Our task is to count how many times each entry occurs, first in a single column and then across multiple columns.

In a single column : We will use the pivot_table() function to count the duplicates in a single column. The column to check is passed as the value of the index parameter, and aggfunc is set to ‘size’. The result is a Series holding the number of rows for each distinct value, so any value with a count greater than 1 is duplicated.

# import pandas
import pandas as pd

# create the DataFrame
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut'],
})

# count how many times each Course value occurs
dups = df.pivot_table(index=['Course'], aggfunc='size')

# display the resulting Series of counts
print(dups)

Output : a Series indexed by Course that gives the number of rows for each value: BBA 2, BCA 2, MBA 1.
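As a cross-check, the same per-value counts can be obtained without pivot_table(). The following is a minimal sketch, assuming value_counts() and duplicated() are acceptable substitutes here; the variable names counts, repeated and n_extra are illustrative and do not come from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut'],
})

# occurrences of each Course value (illustrative alternative to pivot_table)
counts = df['Course'].value_counts()

# keep only the values that occur more than once
repeated = counts[counts > 1]

# number of rows that repeat a Course value already seen in an earlier row
n_extra = int(df['Course'].duplicated().sum())

print(repeated)
print(n_extra)
```

Filtering on counts > 1 separates values that are actually duplicated from values that occur only once, which the raw size counts do not do on their own.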

Across multiple columns : We will again use the pivot_table() function, this time to count the duplicates across multiple columns. The columns to check are passed to the index parameter as a list, and aggfunc is again ‘size’. Each distinct combination of values in those columns gets its own count.

# import pandas
import pandas as pd

# create the DataFrame
df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut'],
})

# count how many times each (Course, Location) combination occurs
dups = df.pivot_table(index=['Course', 'Location'], aggfunc='size')

# display the resulting Series of counts
print(dups)

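Output : a Series with one entry per (Course, Location) pair: (BBA, Meerut) 2, (BCA, Agra) 1, (BCA, Saharanpur) 1, (MBA, Saharanpur) 1.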
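The multi-column count can also be sketched with groupby(), and the repeated rows themselves can be listed with duplicated(). This is one possible approach under the assumption that the same two columns are of interest; the names cols, pair_counts and repeated_rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mukul', 'Rohan', 'Mayank', 'Sundar', 'Aakash'],
    'Course': ['BCA', 'BBA', 'BCA', 'MBA', 'BBA'],
    'Location': ['Saharanpur', 'Meerut', 'Agra', 'Saharanpur', 'Meerut'],
})

# columns that together define a duplicate (illustrative choice)
cols = ['Course', 'Location']

# number of rows for each (Course, Location) combination
pair_counts = df.groupby(cols).size()

# every row whose (Course, Location) combination occurs more than once
repeated_rows = df[df.duplicated(subset=cols, keep=False)]

print(pair_counts)
print(repeated_rows)
```

Passing keep=False marks all members of a repeated group, so repeated_rows shows which records share a combination rather than only the later copies.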

mukulsomukesh
