How to Deduplicate a List in Python
A Python list often contains duplicate values. Removing them, so that every remaining value in the list is unique, is an everyday task.
Several methods are available: some preserve the original order of the elements, and others do not.
Deduplicate a Python List Without Preserving Order
If preserving the original order is not a requirement, we can deduplicate a list using the built-in set data structure. A set can, by design, contain only unique elements. When we construct a set from our initial list, all duplicate elements are ignored. We can then convert the set back into a list to get a list of unique elements.
Unfortunately, the order of the elements changes, because the set data structure is implemented using a hash table, which does not remember the order in which elements were inserted.
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique_set = set(names)
>>> unique_list = list(unique_set)
>>> unique_list
['Stacy', 'Sarah', 'Jim', 'Bob']
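Because the order of a set is arbitrary (and, for strings, can even differ between interpreter runs, since string hashing is randomized by default), it can help to sort the result when a reproducible output is needed. A minimal sketch, assuming alphabetical order is acceptable for your use case:

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

# set() drops the duplicates; sorted() returns a new list in a
# deterministic (here alphabetical) order.
unique_sorted = sorted(set(names))
print(unique_sorted)  # ['Bob', 'Jim', 'Sarah', 'Stacy']
```

This still does not restore the original order; it only makes the result predictable.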
If you use the NumPy package for scientific computing in Python, you can also use the numpy.unique() function.
>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> numpy.unique(names).tolist()
['Bob', 'Jim', 'Sarah', 'Stacy']
Note that numpy.unique() returns the unique elements in sorted order, so it doesn’t preserve the original element order either.
The order-preserving NumPy way is more involved, and you can find it below.
Deduplicate a Python List While Preserving Order
A simple solution that preserves the initial order is to use two nested loops.
The outer loop traverses all elements of the original list.
The inner loop, hidden here inside the not in operator, checks whether we have already seen an element with the same value.
If we haven’t, we append it to the unique list, which in the end contains the unique elements in their original order.
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = []
>>> for name in names: # 1st loop
...     if name not in unique: # 2nd loop
...         unique.append(name)
...
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
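The membership check against the unique list scans that list on every iteration, so this approach takes quadratic time in the worst case. For long lists of hashable values, a common alternative keeps an auxiliary set for the membership test. The sketch below is one way to do it; the helper name dedupe_keep_order is made up for illustration.

```python
def dedupe_keep_order(items):
    """Return the unique items in first-seen order (items must be hashable)."""
    seen = set()   # values already encountered; O(1) average lookup
    unique = []    # result, in order of first appearance
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
print(dedupe_keep_order(names))  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```

If the list holds unhashable values such as nested lists, adding them to the set raises TypeError, and the plain list-based loop remains the fallback.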
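Another way to deduplicate a list while preserving the original order is to use the collections.OrderedDict data structure. OrderedDict is a special kind of dictionary in Python that remembers the order in which keys were inserted. OrderedDict.fromkeys() builds such a dictionary with the list elements as keys, so each duplicate maps to a key that already exists and only the first occurrence keeps its position.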
>>> from collections import OrderedDict
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> unique = list(OrderedDict.fromkeys(names))
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']
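Since Python 3.7, the built-in dict is also guaranteed to preserve insertion order, so on current interpreters the same idea works without an import. A short sketch, assuming Python 3.7 or newer:

```python
names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']

# dict.fromkeys() creates one key per distinct value, keeping the position
# of the first occurrence; list() then collects the keys in that order.
unique = list(dict.fromkeys(names))
print(unique)  # ['Bob', 'Stacy', 'Sarah', 'Jim']
```

As with OrderedDict, the elements must be hashable for this to work.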
If you use the pandas data analysis library, pandas.unique() may be helpful as well. This function is order-preserving.
>>> import pandas
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> pandas.unique(names).tolist()
['Bob', 'Stacy', 'Sarah', 'Jim']
The NumPy way to deduplicate a list while preserving the order is a little more complicated. You have to record the index of the first occurrence of each distinct element and then rebuild the unique list from the original one using those indexes, sorted in ascending order.
>>> import numpy
>>> names = ['Bob', 'Stacy', 'Sarah', 'Jim', 'Stacy', 'Jim']
>>> _, indexes = numpy.unique(names, return_index=True)
>>> unique = [names[i] for i in numpy.sort(indexes)]
>>> unique
['Bob', 'Stacy', 'Sarah', 'Jim']