I have a list [1,2,3,4,5,6,7,8,9]
and I would like to get all the lists that arise as a result of grouping its elements in pairs and a solitary one, regardless of the order of the pairs. That is, I expect something like:
[[1,2],[3,4],[5,6],[7,8],[9]],
[[1,2],[3,4],[5,6],[7,9],[8]],
[[1,2],[3,4],[5,6],[8,9],[7]],
[[1,2],[3,4],[5,7],[6,8],[9]],
[[1,2],[3,4],[5,7],[6,9],[8]],
...
As @abulafia indicates, the total number of lists in the case of 9 elements would be:
C(9,2) * C(7,2) * C(5,2) * C(3,2)
where C(n,m) are the combinations of n elements taken from m to m. With 22680 possibilities as a result, quite far from the 36 that it indicated.
A greeting and thanks in advance
There are many more combinations than 36.
In fact, I get 22680.( Update All this is incorrect. The definitive answer is under "Update 2". However, I leave the original answer so that you can see the process of how it has been arriving at the good answer)It has cost me a lot to find an algorithm that generates them all, and I have had to resort to recursion. Still the algorithm is quite difficult to understand. For now, I present the code without explanations and an example of the result of its execution for a list of only 5 numbers, which already produces 30 cases. Explanations later.
Code
Execution example:
Discussion of the result
As you can see, the 30 cases that appear are not the 10 that you would expect if you applied the formula of "combinations of 5 elements taken 2 by 2". And it is that the combinations of 5 elements of 2 in 2 are indeed 10, these 10:
But those 10 are not the whole story. They are only the 10 possible ways to start each of the (30) cases of the result. Once we have chosen one of those combinations, for example (1,2), there is still the rest of the list (3,4,5) from which we can again take combinations of 2 in 2 (another 3 cases: (3 .4), (3.5) and (4.5)). In this example in which the list has 5 elements, that's where the thing ends, since once one of those second cases has been chosen, there is only one possible element left to choose from.
That's where the 30 cases come from. We have 10 possible ways to start and for each of them another 3 possible ways to continue.
In the case of having 9 numbers the formula basically becomes:
Being C(n,m) the number of combinations of n elements taken from m to m. If you do the numbers it comes out:
which are what the code I put above actually produces.
Code explanation
The code uses recursion, which basically translates to:
The code does the following:
In the event that the received list has two elements or less, it simply returns a list of lists that have the pair of elements inside (or a single element). For example, if it receives a list with [2,3] as a parameter, it would return
[[[2,3]]]
. If it receives a list with 9 as a parameter, it would return[[[9]]]
.If you look, that answer is correct. It contains all the ways to group by pairs the list that it has received as a parameter.
Otherwise we must build a list with all possible cases. For it:
itertools.combinations()
) all the possible ways to start our case (which are all the combinations of the given list, taking its elements two by two)It only remains to write the "magic" function that knows how to solve the problem for a sublist. But, thanks to the wonders of recursion, it turns out that we already have that function, it's the one we just wrote!
Although it seems incredible (it always seems so to me about recursion), it works. The function just calls itself.
Update
The above code generates cases that only differ in the order of the joins. For example, among the outputs above are:
To avoid this you have to modify the code slightly. Instead of making each pair a list, I'll make it a tuple. The difference in our case is that tuples can be put into sets, and lists cannot. Thus, each "case" can be reduced to a set and in them the order does not matter, so that two cases that only differ in order will give rise to the same set. I am adding those sets to the result if they were not already there.
One problem with sets is that they do not have an internal order when it comes to displaying them, so the final results could show cases like
[(1,), (3,4), (2,5)]
, that is, the "lone" element would no longer necessarily appear at the end.To avoid that "aesthetic" detail, I keep two lists. One of normal results (same as that of the initial solution) and another of sets. I use this second one only to avoid putting repeated cases in the first one.
This is the new code:
And now when executing for
[1,2,3,4,5]
only 15 cases come out:For the 9 digits the thing has been reduced to 2235 cases. But I couldn't tell you what the general formula is where this number comes from (another challenge! Damn!)
Update 2
The previous code has a bug (it surprised me that the final number of combinations for the 9 figures was 2235, which does not factor well since it is 3x5x149, a strange trio of primes that did not look good at all).
The bug becomes apparent if we generate the list of combinations for the case
[4,5,6,7,8]
, instead of[1,2,3,4,5]
. Obviously the same number of combinations should come out (15), but instead 27 come out. Examining the results I find that "duplicate" cases appear that shouldn't be:The problem here is that the pair
(8,6)
is considered different from the pair(6,8)
, so the set of different combinations is in fact considered to be two different valid combinations.The case did not appear when we used it as input
[1,2,3,4,5]
because coincidentally for that case all the generated tuples were in increasing order. That is, in each generated tuple(x,y)
it was true thatx<y
. Therefore, the version(y,x)
of that same tuple never appeared.This behavior can be considered an accident. Actually we have no guarantees of the order in which the tuples will come out, because when we call the function recursively we are no longer passing a list, but a set (
set(elementos)-set(pareja)
).itertools.combinations()
will iterate over the elements of that set to generate pairs, but a set gives no guarantees about the order in which it will return its elements, so it could have returned both as the first(3,4)
pair(4,3)
.That accidental behavior of the list
[1,2,3,4,5]
by which the tuples were generated randomly in order, also appears for other lists. But instead it disappears in the list[4,5,6,7,8]
where unordered tuples begin to be seen and therefore are not recognized as repetitions.Bug fix
For the tuple to
(x,y)
be considered the same as the tuple(y,x)
, it is best to stop using tuples to represent combinations, and use sets as well.Apparently, therefore, it would suffice to change all the occurrences of the word
tuple()
toset()
. However it is not so easy.Issues:
set()
not. Not all python data types can be members of a set. Only the ones that are hashable (and in particular immutable). This was why I had changed lists to tuples in the first place, so I could put them into sets. If I change it back toset()
I can no longer put it in a set, which I need to recognize repeated cases.set()
usefrozenset()
to represent each pair. This is a special type of set that cannot be added to or removed from. That is, an immutable set, which is therefore hashable and can be part of other sets.The only problem with
frozenset()
is that it makes the output of the program very dirty, because now then a possible combination would be shown like this when printed:instead of like this:
Luckily it's easy to define how you want your own class to print. In the following code I define my class
MySet
, which inherits fromfrozenset()
, but redefines the method__repr__()
so that the output on the screen is more compact and readable. I use this class where before I usedtuple()
.Execution example:
Now works correctly for the case where the input is
[4,5,6,7,8]
, producing 15 combinations instead of 27.And for the case where the input is
[1,2,3,4,5,6,7,8,9]
the number of combinations generated is only 945 (instead of 2235).Bonuses
I have found the formula for the total number of combinations to come out. Just divide the originally calculated number (22680) by the factorial of 4 (24) and you get 945. This is because in each case there are 4 pairs, whose order does not matter and therefore we divide by the number of permutations of those 4 elements.
The general formula would be to divide by the factorial of the number of pairs that are formed for each case. Namely:
(It would be necessary to implement the function
C(n,m)
andfactorial(n)
)I add another answer because extending the previous one again seems to me that, in addition to making it too long, it would make it more confusing.
Support repeated elements in the input
In a comment to the previous answer the user (@Bugzilla) mentions the possibility that the input list can have repeated elements. This complicates the situation quite a bit since, while repeated elements are allowed on input, repeated combinations should not be allowed on output, but the exact semantics of what this might mean is not entirely clear.
For example, let's assume the input list is Are
[1,1,2,3]
those two1
that appear in the input to be considered equal? Or are they two1
"different" in some sense. If they are different, for example let's distinguish them with a subscript, the input sequence would be[1₁, 1₂, 2, 3]
and the possible combinations would be:But since in reality the two ones are represented the same, that is,
1
, the output would be:in which the last two combinations appear to be the same.
If the above output were valid (there are "apparently" repeated but "really" not repeated combinations because some of their elements are "really" different even though they have the same representation), it could be achieved by keeping the data in a list and using to get the pairs another list with the indices of that data. Once the index pairs are obtained, they are used to access the "true" data in the first list.
However, if the previous output were not admissible because the last two cases were considered identical, and therefore the only valid combinations would be:
then it can still be done, but with another approach.
another approach
The approach is not to use sets to hold each pair, but to return to the solution that used tuples, and also not to use sets to recursively descend to "combine" the rest of the list. In particular, I am referring to the line:
The idea of that line is that, once we have extracted a pair from the original list, we recursively call the function so that it gives us all the recombinations of what is left of the list. Since the considered pair does not have to correspond to the first two elements of the list
elementos
, we cannot do something likeelementos[2:]
obtaining "the rest". So I resorted to set arithmetic, turning the original list into a set and subtracting the subset with the extracted pair.Naturally that approach worked correctly only if there were no repeated elements in the list. Otherwise, casts to set will make repeated elements disappear.
But we can still build the sublist simply by making a copy of the original and removing from the copy (with
.remove()
) the elements that are in the considered pair. The code is a bit longer, but not as complex as I had feared.We can still use sets to detect repeated cases, since the elements of that set are no longer the numbers of the original list, but the tuples that we have been removing.
Por otro lado, ya que no hacemos conversiones a conjuntos de la lista de entrada, tenemos garantizado el orden en que se recorrerán los elementos, de modo que si en un momento aparece la tupla
(x,y)
sabemos que no saldrá ya nunca la tupla(y,x)
, por lo que ese problema (que nos obligó a usar conjuntos en vez de tuplas) también desaparece.Solución
En resumen, creo que esta implementación hace por fin lo que se espera, incluso en el caso de que haya elementos repetidos en la lista de entrada:
Ejemplos de ejecución: