
Python – Removing duplicate dicts in list

Last Updated : 30 Dec, 2024

In Python, we often work with lists of dictionaries, and some of those dictionaries may be duplicates. Removing the duplicates helps us clean our data. In this article, we will explore several methods to remove duplicate dictionaries from a list.

Using Set and frozenset

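A frozenset is an immutable, hashable set. Converting each dictionary's items to a frozenset lets us store them in a set, which discards duplicates regardless of key order. This approach has two limitations: every value must be hashable, so dictionaries that contain lists or nested dictionaries raise a TypeError, and the original order of the list is not preserved.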

a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"name": "Alice", "age": 25}]


# Convert each dictionary to a frozenset of its items; the set drops duplicates
b = list({frozenset(d.items()) for d in a})

# Convert back to dictionaries
b = [dict(f) for f in b]

print(b)

Output
[{'name': 'Bob', 'age': 30}, {'age': 25, 'name': 'Alice'}]
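If the original order matters, the same frozenset key can be stored in a dictionary instead of a set, since dictionaries keep insertion order in Python 3.7 and later. The following is a minimal sketch of that variant; the function name `dedupe_frozenset` is a hypothetical helper, not part of the original example.

```python
def dedupe_frozenset(dicts):
    """Return the unique dictionaries, keeping the first occurrence of each."""
    unique = {}
    for d in dicts:
        # frozenset ignores key order but requires every value to be hashable
        unique.setdefault(frozenset(d.items()), d)
    return list(unique.values())


a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"age": 25, "name": "Alice"}]
b = dedupe_frozenset(a)
print(b)  # [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]

# A list value is unhashable, so building the frozenset key fails
try:
    dedupe_frozenset([{"tags": ["x"]}])
except TypeError as exc:
    print("TypeError:", exc)
```

The third dictionary lists its keys in a different order but is still recognized as a duplicate of the first.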

Other methods that we can use to remove duplicate dictionaries from a list in Python are:

Using List Comprehension with Helper Set

To keep the first occurrence of each dictionary in its original position, we can combine a list comprehension with a helper set that tracks the dictionaries we've already seen. This needs no external libraries, but every value must still be hashable.

a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"name": "Alice", "age": 25}]

# Set to keep track of dictionaries we've seen
seen = set()

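# Keep a dictionary only the first time its items are seen.
# frozenset makes the check independent of key order;
# set.add() returns None, so "not seen.add(...)" is always True.
b = [d for d in a if frozenset(d.items()) not in seen and not seen.add(frozenset(d.items()))]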

print(b)

Output
[{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
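The comprehension packs the membership test and the update into one expression, which can be hard to read. The same idea can be sketched as an explicit loop. The function name `dedupe_keep_order` is hypothetical, and sorting the items assumes the keys are mutually comparable (for example, all strings).

```python
def dedupe_keep_order(dicts):
    """Return the unique dictionaries in their original order."""
    seen = set()
    result = []
    for d in dicts:
        # Sorting the items makes the key independent of key order
        key = tuple(sorted(d.items()))
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"age": 25, "name": "Alice"}]
b = dedupe_keep_order(a)
print(b)  # [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
```

Computing the key once per dictionary also avoids building it twice, as the one-line comprehension does.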

Using Loops and Conditional Checks

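The most basic way to remove duplicates from a list of dictionaries is with a loop and a conditional check. Each dictionary is compared (using ==) against the ones already added to the new list. Because nothing is hashed, this method also works when the values are lists or nested dictionaries, and it preserves the original order. The trade-off is speed: every membership check scans the result list, so the running time grows quadratically with the number of dictionaries.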

a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"name": "Alice", "age": 25}]

# List to store unique dictionaries
b = []

# Loop through each dictionary in the list
for d in a:
    if d not in b:
        b.append(d)

print(b)

Output
[{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
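Because the loop only compares dictionaries with == and never hashes them, it should also handle values that the set-based methods cannot. A small sketch with list values, using a hypothetical "tags" field for illustration:

```python
# The "tags" values are lists, which cannot be placed in a set or frozenset
a = [
    {"name": "Alice", "tags": ["admin", "dev"]},
    {"name": "Bob", "tags": []},
    {"name": "Alice", "tags": ["admin", "dev"]},
]

b = []
for d in a:
    # "not in" compares dictionaries by equality, so no hashing is required
    if d not in b:
        b.append(d)

print(b)  # [{'name': 'Alice', 'tags': ['admin', 'dev']}, {'name': 'Bob', 'tags': []}]
```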

Using JSON Serialization

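In some cases, dictionaries contain non-hashable values such as lists or nested dictionaries, which rules out the set-based methods. One workaround is to serialize each dictionary to a JSON string with sorted keys and then remove the duplicate strings. This works only for JSON-serializable data, and the round trip can change types: tuples come back as lists and non-string keys come back as strings. As with any set, the order of the result is not guaranteed.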

import json

a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"name": "Alice", "age": 25}]

# Serialize dictionaries to JSON strings to remove duplicates
b = list({json.dumps(d, sort_keys=True) for d in a})

# Convert the JSON strings back to dictionaries
b = [json.loads(d) for d in b]

print(b)

Output
[{'age': 30, 'name': 'Bob'}, {'age': 25, 'name': 'Alice'}]
  • This method uses json.dumps() with sort_keys=True so that dictionaries with the same content produce the same string, and removes duplicates based on that string. Serializing and parsing every dictionary adds overhead, so it may not be the most efficient choice for large datasets with simple, hashable data.
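The JSON string can also serve purely as a lookup key, which keeps the original dictionaries (and their order) instead of parsing the strings back. This is a minimal sketch under that assumption; `dedupe_json` is a hypothetical helper name, and the data must still be JSON-serializable.

```python
import json


def dedupe_json(dicts):
    """Return the unique dictionaries, keeping the first occurrence of each."""
    unique = {}
    for d in dicts:
        # sort_keys=True gives equal dictionaries the same string
        key = json.dumps(d, sort_keys=True)
        unique.setdefault(key, d)
    return list(unique.values())


a = [
    {"name": "Alice", "tags": ["admin"]},
    {"name": "Bob", "tags": []},
    {"tags": ["admin"], "name": "Alice"},
]
b = dedupe_json(a)
print(b)  # [{'name': 'Alice', 'tags': ['admin']}, {'name': 'Bob', 'tags': []}]
```

Since the original objects are returned, the values are not altered by a trip through json.loads().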

Using Pandas

When working with large datasets, the pandas library offers drop_duplicates(), which removes duplicate rows from a DataFrame and keeps the first occurrence of each by default. We convert the list of dictionaries to a DataFrame, drop the duplicate rows, and then convert the DataFrame back into a list of dictionaries.

import pandas as pd

a = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}, {"name": "Alice", "age": 25}]

# Convert the list of dictionaries to a pandas DataFrame
df = pd.DataFrame(a)

# Drop duplicate rows
df = df.drop_duplicates()

# Convert the DataFrame back to a list of dictionaries
b = df.to_dict(orient='records')

print(b)

Output
[{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]

