Question

Luis Desde Canada

Asked: 2022-04-17 11:54:02 +0800 CST

How to iterate over groups of a dataframe

I would like to know whether it is possible to loop over the groups of a Pandas dataframe, and how to do it. I have a text file, which I import with Pandas:

import pandas as pd

# Raw string so the backslash in the Windows path is taken literally
file = r"E:\Test.txt"
df = pd.read_csv(file, sep=';', dtype='str')
df

This gives me the following information:

        Fecha    ID  Alum Par      Nombre  Codigo    Descrip Cantidad Tiempo
0  09/03/2021  A935   809  00      CARMEN  660192  Hard Disk        1      2
1  09/03/2021  A935   809  00      CARMEN  660412     Floppy       25    1.5
2  09/03/2021  A935   809  00      CARMEN  660475        SSD        1      3
3  09/03/2021  A217   800  00  CONCEPCION  661070        DVD        1     15
4  09/03/2021  A217   800  00  CONCEPCION  662734         CD        3     36
5  09/03/2021  A218   801  00      ELVIRA  660192  Hard Disk        1      2
6  09/03/2021  A232   909  16     LORENZO  660343   Ata Disk        1      2
7  09/03/2021  A232   909  16     LORENZO  660475        SSD        1      3

The first column is the index.

Since I am interested in grouping it by ID, I apply the following code:

# Group by a single column name so that each key is a plain value such as 'A217'
gb = df.groupby('ID')
for k, gp in gb:
    print('key=' + str(k))
    print(gp)

This gives me the groups I am interested in and lets me see how many rows each 'ID' has. The result is the following:

key=A217
        Fecha    ID  Alum Par    Nombre  Codigo Descrip Cantidad Tiempo
3  09/03/2021  A217   800  00  CONCEPCION  661070     DVD        1     15
4  09/03/2021  A217   800  00  CONCEPCION  662734      CD        3     36
key=A218
        Fecha    ID  Alum Par  Nombre  Codigo    Descrip Cantidad Tiempo
5  09/03/2021  A218   801  00  ELVIRA  660192  Hard Disk        1      2
key=A232
        Fecha    ID  Alum Par   Nombre  Codigo   Descrip Cantidad Tiempo
6  09/03/2021  A232   909  16  LORENZO  660343  Ata Disk        1      2
7  09/03/2021  A232   909  16  LORENZO  660475       SSD        1      3
key=A935
        Fecha    ID  Alum Par  Nombre  Codigo    Descrip Cantidad Tiempo
0  09/03/2021  A935   809  00  CARMEN  660192  Hard Disk        1      2
1  09/03/2021  A935   809  00  CARMEN  660412     Floppy       25    1.5
2  09/03/2021  A935   809  00  CARMEN  660475        SSD        1      3

My question is how I can iterate within the groups so that, for each group (ID), this information is presented in a form that is easier to read:

   Resumen
   El 'Alum', 'Nombre', tiene 'Cantidad' de 'Descrip'.
   Fin

There should be one output line per row of the group. Is there a way to iterate within each group in Pandas? Any suggestion is welcome. Thank you very much in advance.

1 Answer

abulafia · Answer 1 · 2022-04-17T12:33:43+08:00

It is not entirely clear what you are asking. It seems that you want one output line for each row of each group, but if so, it is not clear why you group the rows first...

In any case, this is how it could be done. We write a function that receives a row (either from a group or from the original dataframe) and returns a string with the "summary" you request:

def resume_fila(fila):
  return "El {}, {} tiene {} {}".format(
      fila.Alum, fila.Nombre, fila.Cantidad, fila.Descrip
  )

Now we have several ways to apply this function:

1) Iterating through rows of `df`

With the help of `.iterrows()` on the complete dataframe:

for idx, fila in df.iterrows():
  print(resume_fila(fila))

Output:

El 809, CARMEN tiene 1 Hard Disk
El 809, CARMEN tiene 25 Floppy
El 809, CARMEN tiene 1 SSD
El 800, CONCEPCION tiene 1 DVD
El 800, CONCEPCION tiene 3 CD
El 801, ELVIRA tiene 1 Hard Disk
El 909, LORENZO tiene 1 Ata Disk
El 909, LORENZO tiene 1 SSD
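
To try the loop above without access to the original file, the sample rows can be loaded from an in-memory string. This is only a sketch: the `SAMPLE` text is retyped from the table in the question, and `io.StringIO` stands in for `E:\Test.txt`.

```python
import io
import pandas as pd

# Sample rows retyped from the question; the real data lives in a text file.
SAMPLE = """Fecha;ID;Alum;Par;Nombre;Codigo;Descrip;Cantidad;Tiempo
09/03/2021;A935;809;00;CARMEN;660192;Hard Disk;1;2
09/03/2021;A935;809;00;CARMEN;660412;Floppy;25;1.5
09/03/2021;A935;809;00;CARMEN;660475;SSD;1;3
09/03/2021;A217;800;00;CONCEPCION;661070;DVD;1;15
09/03/2021;A217;800;00;CONCEPCION;662734;CD;3;36
09/03/2021;A218;801;00;ELVIRA;660192;Hard Disk;1;2
09/03/2021;A232;909;16;LORENZO;660343;Ata Disk;1;2
09/03/2021;A232;909;16;LORENZO;660475;SSD;1;3
"""

# Read every column as text, as in the question.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype=str)


def resume_fila(fila):
    return "El {}, {} tiene {} {}".format(
        fila.Alum, fila.Nombre, fila.Cantidad, fila.Descrip
    )


# iterrows() yields (index, row) pairs; the index is not needed here.
lineas = [resume_fila(fila) for _, fila in df.iterrows()]
print("\n".join(lineas))
```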

2) Iterating through rows in each group

That is, iterating over groups and then over rows of the group:

for k, gp in gb:
  for idx, fila in gp.iterrows():
    print(resume_fila(fila))

The result is exactly the same, except that the lines come out ordered by group key (A217, A218, A232, A935) instead of in the original row order. That is why I don't understand why you group first.
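
One case where grouping does pay off is if each group should be wrapped in the "Resumen ... Fin" block shown in the question. The sketch below assumes that reading of the question; `resumen_grupo` is a hypothetical helper, and the small dataframe is a shortened copy of the sample data.

```python
import pandas as pd

# Shortened copy of the sample data (all values as strings).
df = pd.DataFrame({
    "ID": ["A935", "A935", "A217"],
    "Alum": ["809", "809", "800"],
    "Nombre": ["CARMEN", "CARMEN", "CONCEPCION"],
    "Descrip": ["Hard Disk", "Floppy", "DVD"],
    "Cantidad": ["1", "25", "1"],
})


def resumen_grupo(gp):
    """Build one 'Resumen ... Fin' block (layout assumed from the question)."""
    lineas = ["Resumen"]
    for _, fila in gp.iterrows():
        lineas.append("El {}, {} tiene {} de {}.".format(
            fila.Alum, fila.Nombre, fila.Cantidad, fila.Descrip))
    lineas.append("Fin")
    return "\n".join(lineas)


# groupby sorts the keys by default, so A217 comes before A935.
bloques = {k: resumen_grupo(gp) for k, gp in df.groupby("ID")}
for k, bloque in bloques.items():
    print("key=" + str(k))
    print(bloque)
```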

3) Using `.apply()`

Instead of iterating, we use `df.apply()` so that it applies our function to each row. The result is a Series (a single column) whose values are the "summary" strings:

df.apply(resume_fila, axis=1)

Result:

0    El 809, CARMEN tiene 1 Hard Disk
1      El 809, CARMEN tiene 25 Floppy
2          El 809, CARMEN tiene 1 SSD
3      El 800, CONCEPCION tiene 1 DVD
4       El 800, CONCEPCION tiene 3 CD
5    El 801, ELVIRA tiene 1 Hard Disk
6    El 909, LORENZO tiene 1 Ata Disk
7         El 909, LORENZO tiene 1 SSD
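
If the summaries should stay with the data, the result of `.apply()` can be stored as a new column. The sketch below uses a shortened copy of the sample data, and the column name `Resumen` is an assumption. It also shows a column-wise alternative using plain string concatenation, which only works because every column was read as text.

```python
import pandas as pd

# Three sample rows from the question, all stored as strings.
df = pd.DataFrame({
    "Alum": ["809", "809", "800"],
    "Nombre": ["CARMEN", "CARMEN", "CONCEPCION"],
    "Descrip": ["Hard Disk", "Floppy", "DVD"],
    "Cantidad": ["1", "25", "1"],
})


def resume_fila(fila):
    return "El {}, {} tiene {} {}".format(
        fila.Alum, fila.Nombre, fila.Cantidad, fila.Descrip
    )


# Row-wise apply, stored in a new column ("Resumen" is a hypothetical name).
df["Resumen"] = df.apply(resume_fila, axis=1)

# Column-wise alternative: string concatenation without a Python-level loop.
vectorizado = (
    "El " + df["Alum"] + ", " + df["Nombre"]
    + " tiene " + df["Cantidad"] + " " + df["Descrip"]
)

# Both approaches should produce the same strings.
print(df["Resumen"].tolist() == vectorizado.tolist())
```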

Extension

Another approach, in which creating the groups beforehand would make more sense, is to summarize each group in a single line, for example:

El 800, CONCEPCION tiene 1 DVD y 3 CD

In this case we need a function that receives the `Cantidad` column and the `Descrip` column, each of which may hold several elements, and builds from them a text string such as "1 DVD y 3 CD". If there is only one element, it should return a string like "1 DVD"; if there are two, it should separate them with "y"; and if there are more than two, it should separate the first ones with commas and put "y" before the last one.

This function would do that:

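def enumerar_items(cant, desc):
  # Pair each quantity with its description, e.g. [("1", "DVD"), ("3", "CD")]
  items = list(zip(cant, desc))
  # All items except the last one are separated by commas
  items_str = ", ".join("{} {}".format(c, d) for c, d in items[:-1])
  # The last item is preceded by " y " when there is more than one item
  if len(items) > 1:
    items_str += " y "
  items_str += "{} {}".format(*items[-1])
  return items_str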
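
The three cases described above (one item, two items, more than two) can be checked with plain lists, without a dataframe. The variant below is a sketch, not the function from the answer: it formats each item first and adds a guard for an empty input, which the original version does not handle (it would raise an `IndexError`). Returning an empty string in that case is an assumption.

```python
def enumerar_items(cant, desc):
    # Format every (quantity, description) pair up front.
    items = ["{} {}".format(c, d) for c, d in zip(cant, desc)]
    if not items:
        # Assumed behaviour for an empty input; not part of the original answer.
        return ""
    if len(items) == 1:
        return items[0]
    # Commas between the first items, " y " before the last one.
    return ", ".join(items[:-1]) + " y " + items[-1]


print(enumerar_items(["1"], ["Hard Disk"]))
print(enumerar_items(["1", "3"], ["DVD", "CD"]))
print(enumerar_items(["1", "25", "1"], ["Hard Disk", "Floppy", "SSD"]))
```

Because the function only iterates over its arguments, it accepts lists as well as the `gp.Cantidad` and `gp.Descrip` columns of a group.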

Now we can make use of it while iterating through the groups:

for k, gp in gb:
  alum = gp.Alum.iloc[0]
  nombre = gp.Nombre.iloc[0]
  items = enumerar_items(gp.Cantidad, gp.Descrip)
  print("El {}, {} tiene {}".format(alum, nombre, items))

And the output would be:

El 800, CONCEPCION tiene 1 DVD y 3 CD
El 801, ELVIRA tiene 1 Hard Disk
El 909, LORENZO tiene 1 Ata Disk y 1 SSD
El 809, CARMEN tiene 1 Hard Disk, 25 Floppy y 1 SSD
