I have the following method that goes through a certain input file with an amount of data that increases exponentially and returns a list with the coordinates of a city.
The code block that represents the above is the following:
def recorrido(puntos):
for i in range(0, len(puntos)):
for j in range(i + 1, len(puntos)):
objeto.add_recorrido(i, j, peso = distancia(puntos[i], puntos[j]))
return objeto
The problem with this block is that going through for
a list with a very large amount of data will take twice as long to go through it as doing it with a set()
whose efficiency is much higher. So I'm looking for a way to pass the above method, specifically the loops, for
to a set()
.
On the other hand, the next block of code to be optimized is composed of an empty list in which the first element of a tuple that is contained in a list is stored and the repeated elements, if they have already been entered in the empty list, are not included again.
C = [(0, 4), (4, 0), (0, 1), (1, 2), (2, 3), (3, 2), (2, 1), (1, 0)]
solution = []
for i, j in C:
if i not in solution:
solution.append(i)
return solution
Therefore, I want to take this block composed by for
to another much more efficient functionality, so that, in this way, I avoid manual iteration of the for
. I tried to implement it by means of a set()
of the form:
miListaSinRepeticion = list(set(listaoriginal))
sorted(miListaSinRepeticion)
Clarifications
You have a little trouble with the idea of efficiency in relation to
set()
.First, a
set()
is not an alternative to a loop , sinceset()
a is a data structure, while a loop is a control structure. Theset()
would be an alternative to a list, not a loop .It is true though that sometimes having a
set()
instead of a list saves you a loop, since you can check whether or not a given element belongs to the set using the conditionelemento in conjunto
. In a list, to see if an element belongs to the list, you would have to make a loop that iterates through it, comparing the elements one by one. It's true that Python supports the operatorin
for lists as well, so you could doelemento in lista
, which seems to be equivalent to the set case, but it's not the same. So that python can answerTrue
orFalse
to the questionelemento in lista
, it has to iterate through all the elements of the list until it finds the requested one. However, in the case of a set, because of the way sets are implemented internally, you can answer that question without iterating.In short, testing if an element belongs to a set is much more efficient than doing the same thing in a list.
However, if your algorithm needs to go through all the elements of your collection (be it a list or a set) one by one, because it has to do something with each one of them, using a
set()
ni will free you from doing the loop that iterates, nor will it be faster than iterating through the list, if they have the same elements.Sets are often used to remove duplicates in a list. When you do
sin_repeticiones = set(lista)
, the result is a set that has the same elements as that list, but with the repetitions removed. That statement is basically equivalent to the following loop (and has the same efficiency):Do the equivalent with lists, i.e. this:
it would be slower due to the condition
elemento not in sin_repeticiones
, since to verify if the element was already in the list or not, all the elements that were there must be looked at (which is not necessary in a set).The main disadvantage of sets is that they are unordered . There is no "first element" in a set. When you try to iterate through the elements of the set (or convert them back to a list) they may come out in any order.
In this sense the second block of code has an advantage: it maintains the order of the original list. That is, the list
sin_repeticiones
will have the elements of the original list, in the same order in which they appeared, but skipping the repeated ones.In short, if the order of the elements is important, a set would not be the appropriate data type. And if most of the operations you do on your collection involve visiting its elements one at a time, the set isn't going to give you any speed gains either. Where sets are really useful is when you often need to look at whether a piece of data was already in your collection or not, or when you have to perform operations between sets, such as getting their union, intersection, etc.
Your case
Having said all of the above, it's not entirely clear what you need to optimize in the code. In your loop:
since you need to compute the distance between each pair of points (i,j), nothing will save you from iterating through the elements of the list
puntos
, and converting it as a set will not bring any improvement (and only problems, since there will be no element i -th nor j-th in the set).However that loop can be optimized if:
puntos
shorter, for example by removing duplicates in it (for which you can use any of the approaches seen above). This is probably not possible in your case.distancia()
so that it takes less time (because this function, being called many times, is your bottleneck). You don't show the code for this function so it's not possible to know if it could be optimized. Probably not.distancia()
. This trick is that the function "remembers" if it has been called more times with the same arguments. If, for example, you calldistancia((0,0), (1,2))
, the function will compute the necessary value and return it to you, but at the same time it will "remember" that result. If you later call the function again with the same parameters,distancia((0,0), (1,2))
this time it will not calculate anything but will immediately return the "remembered" value .Option 3 seems interesting as well as complex to implement. You actually only need to add one line to your code to implement it, thanks to the
functools
.Just do a
from functools import lru_cache
and then put a decorator in front of your function:That decorator makes the function remember previous calls and return results that it has already calculated.