I need to find the most efficient way to remove duplicates from a list in Python.
I am doing it this way:
```python
mj2 = []
for i in mj:
    if i not in mj2:
        mj2.append(i)
```
where `mj` is a list like `[2, 4, 4, 4, 4, 4, 9, 9]` and the output `mj2` is of the form `[2, 4, 9]`.
Is there a more efficient way that doesn't involve loops? I have to process large lists.
The simplest is to use `set()`:
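A minimal sketch of that approach, reusing `mj` from the question:

```python
mj = [2, 4, 4, 4, 4, 4, 9, 9]

# set() discards the duplicates; list() turns the result back into a list
mj2 = list(set(mj))
print(mj2)  # [2, 4, 9] (element order is not guaranteed)
```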
If you want to keep the order (a `set` is an unordered collection of elements), you can apply `sorted()` at the end:
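For example:

```python
mj = [2, 4, 4, 4, 4, 4, 9, 9]

# sorted() already returns a list, so no extra conversion is needed
mj2 = sorted(set(mj))
print(mj2)  # [2, 4, 9]
```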
Another option, if your list is originally ordered and you want to maintain that order, is to use the `OrderedDict` class and leverage it to preserve the order:
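For example:

```python
from collections import OrderedDict

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# fromkeys() keeps only the first occurrence of each element,
# in insertion order
mj2 = list(OrderedDict.fromkeys(mj))
print(mj2)  # [2, 4, 9]
```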
`OrderedDict` is an implementation of dictionaries that "remembers" the order in which its elements were inserted. You can therefore use the `fromkeys` dictionary method to take the elements of `mj` as the keys of the dictionary; since the elements of `mj` are already ordered, the order is preserved. You can test the performance with the following line of code:
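A sketch using the standard `timeit` module (the statement being timed and the repetition count are illustrative choices):

```python
from timeit import timeit

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# time the order-preserving variant; number=1000000 is an arbitrary count
print(timeit('sorted(set(mj))', globals=globals(), number=1000000))
```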
although applying `sorted()` consumes some extra resources. If you don't have problems with the order, you can use the following:
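For example:

```python
from timeit import timeit

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# dropping sorted() skips the extra O(n log n) pass
print(timeit('list(set(mj))', globals=globals(), number=1000000))
```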
If the original list is very large and already ordered, it's much more efficient to use `itertools.groupby`,
which creates an iterator without creating new lists:
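A sketch of that idea: `groupby()` yields one `(key, group)` pair per run of equal elements, so on a sorted list the keys are exactly the distinct values:

```python
from itertools import groupby

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# taking the keys lazily gives the distinct values without
# building any intermediate container
unique = (k for k, _ in groupby(mj))
print(list(unique))  # [2, 4, 9]
```

Since `groupby` is lazy, it is possible to get the first elements without processing the whole list, for example with `itertools.islice`:

```python
from itertools import groupby, islice

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# islice() stops after two keys; the rest of mj is never consumed
first_two = [k for k, _ in islice(groupby(mj), 2)]
print(first_two)  # [2, 4]
```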
For the groups themselves you can also do it like this:
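For instance, materializing each run of equal elements as its own list (`groups` is a hypothetical name):

```python
from itertools import groupby

mj = [2, 4, 4, 4, 4, 4, 9, 9]

# each group is an iterator over one run of equal elements;
# it must be converted to a list before the next group is consumed
groups = [list(g) for _, g in groupby(mj)]
print(groups)  # [[2], [4, 4, 4, 4, 4], [9, 9]]
```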