I would like to know whether it is possible to loop over the groups of a pandas dataframe and, if so, how. I have a text file, which I import with pandas:
import pandas as pd

# Raw string, so the backslash in the Windows path is taken literally
file = r"E:\Test.txt"
df = pd.read_csv(file, sep=';', dtype=str)
df
This gives me the following information:
Fecha ID Alum Par Nombre Codigo Descrip Cantidad Tiempo
0 09/03/2021 A935 809 00 CARMEN 660192 Hard Disk 1 2
1 09/03/2021 A935 809 00 CARMEN 660412 Floppy 25 1.5
2 09/03/2021 A935 809 00 CARMEN 660475 SSD 1 3
3 09/03/2021 A217 800 00 CONCEPCION 661070 DVD 1 15
4 09/03/2021 A217 800 00 CONCEPCION 662734 CD 3 36
5 09/03/2021 A218 801 00 ELVIRA 660192 Hard Disk 1 2
6 09/03/2021 A232 909 16 LORENZO 660343 Ata Disk 1 2
7 09/03/2021 A232 909 16 LORENZO 660475 SSD 1 3
The first column is the index.
Since I am interested in grouping it by ID, I apply the following code:
gb = df.groupby('ID')
for k, gp in gb:
    print('key=' + str(k))
    print(gp)
This gives me the groups I am interested in and lets me see how many rows each 'ID' has. The result is the following:
key=A217
Fecha ID Alum Par Nombre Codigo Descrip Cantidad Tiempo
3 09/03/2021 A217 800 00 CONCEPCION 661070 DVD 1 15
4 09/03/2021 A217 800 00 CONCEPCION 662734 CD 3 36
key=A218
Fecha ID Alum Par Nombre Codigo Descrip Cantidad Tiempo
5 09/03/2021 A218 801 00 ELVIRA 660192 Hard Disk 1 2
key=A232
Fecha ID Alum Par Nombre Codigo Descrip Cantidad Tiempo
6 09/03/2021 A232 909 16 LORENZO 660343 Ata Disk 1 2
7 09/03/2021 A232 909 16 LORENZO 660475 SSD 1 3
key=A935
Fecha ID Alum Par Nombre Codigo Descrip Cantidad Tiempo
0 09/03/2021 A935 809 00 CARMEN 660192 Hard Disk 1 2
1 09/03/2021 A935 809 00 CARMEN 660412 Floppy 25 1.5
2 09/03/2021 A935 809 00 CARMEN 660475 SSD 1 3
My question is how I can iterate within the groups so that this information is easier to read for each group (ID):
Resumen
El 'Alum', 'Nombre', tiene 'Cantidad' de 'Descrip'.
Fin
There should be one such line per row of the group. Is there a way to iterate within each group in pandas? Any suggestion is welcome. Thank you very much in advance.
It is not entirely clear what you are asking. It seems you want one output line for each row of each group, but if so, it is not clear why you group the rows first.
In any case, here is how it can be done. First we write a function that receives a row (either from a group or from the original dataframe) and returns a string with the "summary" you request.
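A minimal sketch of such a function, following the template from the question. The name resumen and the sample row are only illustrative assumptions; the column names are the ones in the question's dataframe.

```python
import pandas as pd

def resumen(fila):
    """Build the one-line summary for a single row (a Series or any mapping)."""
    return (f"El {fila['Alum']}, {fila['Nombre']}, "
            f"tiene {fila['Cantidad']} de {fila['Descrip']}.")

# Illustrative row, using values from row 4 of the question's data
fila = pd.Series({"Alum": "800", "Nombre": "CONCEPCION",
                  "Cantidad": "3", "Descrip": "CD"})
print(resumen(fila))  # El 800, CONCEPCION, tiene 3 de CD.
```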
Now we have several ways to apply this function:
1) Iterating through rows of df
With the help of .iterrows() on the complete dataframe, we call the function on each row and print the string it returns, one line per row.
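This first option could look roughly like the sketch below. The dataframe is a trimmed copy of the question's data (only the columns the summary needs), and resumen is the hypothetical row function described above, repeated here so the snippet runs on its own.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns used in the summary
df = pd.DataFrame({
    "ID": ["A935", "A935", "A935", "A217", "A217", "A218", "A232", "A232"],
    "Alum": ["809", "809", "809", "800", "800", "801", "909", "909"],
    "Nombre": ["CARMEN", "CARMEN", "CARMEN", "CONCEPCION", "CONCEPCION",
               "ELVIRA", "LORENZO", "LORENZO"],
    "Descrip": ["Hard Disk", "Floppy", "SSD", "DVD", "CD",
                "Hard Disk", "Ata Disk", "SSD"],
    "Cantidad": ["1", "25", "1", "1", "3", "1", "1", "1"],
})

def resumen(fila):
    """Hypothetical helper: one-line summary for a single row."""
    return (f"El {fila['Alum']}, {fila['Nombre']}, "
            f"tiene {fila['Cantidad']} de {fila['Descrip']}.")

# iterrows() yields (index label, row as a Series) pairs, in dataframe order
lineas = []
for indice, fila in df.iterrows():
    lineas.append(resumen(fila))

print("\n".join(lineas))
```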
2) Iterating through rows in each group
That is, iterating over groups and then over rows of the group:
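A sketch of the nested loop, under the same assumptions as before: a trimmed copy of the question's data and a hypothetical resumen helper. The 'key=' header line simply mirrors the print in the question.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns used in the summary
df = pd.DataFrame({
    "ID": ["A935", "A935", "A935", "A217", "A217", "A218", "A232", "A232"],
    "Alum": ["809", "809", "809", "800", "800", "801", "909", "909"],
    "Nombre": ["CARMEN", "CARMEN", "CARMEN", "CONCEPCION", "CONCEPCION",
               "ELVIRA", "LORENZO", "LORENZO"],
    "Descrip": ["Hard Disk", "Floppy", "SSD", "DVD", "CD",
                "Hard Disk", "Ata Disk", "SSD"],
    "Cantidad": ["1", "25", "1", "1", "3", "1", "1", "1"],
})

def resumen(fila):
    """Hypothetical helper: one-line summary for a single row."""
    return (f"El {fila['Alum']}, {fila['Nombre']}, "
            f"tiene {fila['Cantidad']} de {fila['Descrip']}.")

lineas = []
# Outer loop: one (key, sub-dataframe) pair per ID, sorted by key by default
for clave, grupo in df.groupby("ID"):
    lineas.append(f"key={clave}")
    # Inner loop: the rows of this group only
    for _, fila in grupo.iterrows():
        lineas.append(resumen(fila))

print("\n".join(lineas))
```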
The result is exactly the same, although perhaps with the groups in a different order, which is why I don't see the point of grouping first.
3) Using .apply()
Instead of iterating, we use df.apply() with axis=1 so that it applies our function to each row. The result will be a Series (a single column) whose values are the "summary" strings.
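A sketch of the .apply() variant, again on a trimmed copy of the question's data and with the hypothetical resumen helper. Passing axis=1 is what makes pandas hand the function one row at a time instead of one column at a time.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns used in the summary
df = pd.DataFrame({
    "ID": ["A935", "A935", "A935", "A217", "A217", "A218", "A232", "A232"],
    "Alum": ["809", "809", "809", "800", "800", "801", "909", "909"],
    "Nombre": ["CARMEN", "CARMEN", "CARMEN", "CONCEPCION", "CONCEPCION",
               "ELVIRA", "LORENZO", "LORENZO"],
    "Descrip": ["Hard Disk", "Floppy", "SSD", "DVD", "CD",
                "Hard Disk", "Ata Disk", "SSD"],
    "Cantidad": ["1", "25", "1", "1", "3", "1", "1", "1"],
})

def resumen(fila):
    """Hypothetical helper: one-line summary for a single row."""
    return (f"El {fila['Alum']}, {fila['Nombre']}, "
            f"tiene {fila['Cantidad']} de {fila['Descrip']}.")

# axis=1 applies the function row by row; the result keeps the original index
resumenes = df.apply(resumen, axis=1)

print(resumenes.to_string())
```

If a dataframe is preferred over a Series, resumenes.to_frame("Resumen") would wrap it in a single named column.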
Extension
Another approach, in which creating the groups beforehand would make more sense, is to summarize each group in a single line that lists everything belonging to that ID.
In this case we need a function that receives the 'Cantidad' and 'Descrip' columns of a group, each of which may hold several elements, and builds from them a text string such as "1 DVD and 3 CD". If there is only one element, it should return a string like "1 DVD"; if there are two, it should join them with "and"; and if there are more than two, it should separate the first ones with commas and put "and" before the last one.
This function would do that:
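One possible sketch of that joining function. The name juntar and the conjuncion parameter are assumptions made for illustration; pass conjuncion="y" to get Spanish output.

```python
def juntar(cantidades, descripciones, conjuncion="and"):
    """Join quantity/description pairs into '1 A, 2 B and 3 C'."""
    partes = [f"{c} {d}" for c, d in zip(cantidades, descripciones)]
    if len(partes) <= 1:
        # Zero or one element: nothing to join
        return "".join(partes)
    # Commas between all but the last element, the conjunction before the last
    return ", ".join(partes[:-1]) + f" {conjuncion} " + partes[-1]

print(juntar(["1"], ["DVD"]))              # 1 DVD
print(juntar(["1", "3"], ["DVD", "CD"]))   # 1 DVD and 3 CD
print(juntar(["1", "25", "1"], ["Hard Disk", "Floppy", "SSD"]))
# 1 Hard Disk, 25 Floppy and 1 SSD
```

It accepts any two iterables of equal length, so pandas columns can be passed directly.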
Now we can make use of it while iterating through the groups:
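A sketch of the per-group loop, on a trimmed copy of the question's data. It assumes that 'Alum' and 'Nombre' are the same in every row of a given ID, so they are read from the first row of each group. The juntar helper is the hypothetical joining function, repeated so the snippet is self-contained.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns used in the summary
df = pd.DataFrame({
    "ID": ["A935", "A935", "A935", "A217", "A217", "A218", "A232", "A232"],
    "Alum": ["809", "809", "809", "800", "800", "801", "909", "909"],
    "Nombre": ["CARMEN", "CARMEN", "CARMEN", "CONCEPCION", "CONCEPCION",
               "ELVIRA", "LORENZO", "LORENZO"],
    "Descrip": ["Hard Disk", "Floppy", "SSD", "DVD", "CD",
                "Hard Disk", "Ata Disk", "SSD"],
    "Cantidad": ["1", "25", "1", "1", "3", "1", "1", "1"],
})

def juntar(cantidades, descripciones, conjuncion="and"):
    """Hypothetical helper: join pairs into '1 A, 2 B and 3 C'."""
    partes = [f"{c} {d}" for c, d in zip(cantidades, descripciones)]
    if len(partes) <= 1:
        return "".join(partes)
    return ", ".join(partes[:-1]) + f" {conjuncion} " + partes[-1]

lineas = []
for clave, grupo in df.groupby("ID"):
    # Assumption: Alum and Nombre do not vary within an ID
    primera = grupo.iloc[0]
    articulos = juntar(grupo["Cantidad"], grupo["Descrip"])
    lineas.append(f"El {primera['Alum']}, {primera['Nombre']}, tiene {articulos}.")

print("\n".join(lineas))
```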
The output then has a single line per ID, listing everything that belongs to it.