Question

Asked: 2020-03-27 12:12:43 +0800 CST

How to loop through a dataframe to calculate frequencies of values in multiple columns?

I have a DataFrame with an identifier column and four feature columns for each identifier. Here is an example:

    Código   C1  C2  C3  C4
    333      ab  aa  cc
    222      cc
    111      mm  nn  xx  ff
    111      xx
    222      nn  mm  zz

I need to go through the DataFrame and, for each identifier, find how many records it has (Apariciones), how many feature values it has in total, counting repeats (Características), and how many distinct feature values it has (Características diferentes). For my example, the result would be:

    Código  Apariciones  Características  Características diferentes
    111     2            5                4
    222     2            4                4
    333     1            3                3

I have tried to do the following:

First, I get the list of codes without duplicates:

    codigos = df['codigo']
    codigos = codigos.drop_duplicates()

Then I tried a for loop to get the number of records per code:

    for i in codigos.values:
        datosindividuales = df[df['codigo'] == i]
        apariciones = len(datosindividuales)

I don't know how to continue from here to get the frequency of the features. I tried groupby, but it doesn't give me what I need. I am new to programming, so I would appreciate any help.

1 Answer

Patricio Moracho · Answer 1 · 2020-03-27T19:28:30+08:00

First we load your example dataframe:

    from io import StringIO
    import pandas as pd

    TESTDATA = StringIO("""Codigo;C1;C2;C3;C4
    333;ab;aa;cc;
    222;cc;;;
    111;mm;nn;xx;ff
    111;xx;;;
    222;nn;mm;zz;
    """)

    df = pd.read_csv(TESTDATA, sep=";")

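The first step is to collect the C1 to C4 values of each row into a list and then group by Codigo. Summing the lists concatenates them, and counting the rows gives the number of occurrences of each Codigo: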

    df['CN'] = df[['C1', 'C2', 'C3', 'C4']].values.tolist()
    df = df.groupby('Codigo').agg({'CN': 'sum', 'Codigo': 'count'})
    # Remove the NaN values from each list
    df['CN'] = df['CN'].apply(lambda x: [e for e in x if str(e) != "nan"])
    # Rename the column that holds the row count
    df = df.rename(columns={'Codigo': 'Apar'})
    print(df)

                              CN  Apar
    Codigo
    111     [mm, nn, xx, ff, xx]     2
    222         [cc, nn, mm, zz]     2
    333             [ab, aa, cc]     1
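
As a side note, the two ideas used in this step can be tried in isolation. The sketch below is only an illustration: the `rows` list is hand-written to mimic what `.values.tolist()` yields for Codigo 111, and `pd.notna` is a suggested alternative to the `str(e) != "nan"` test, not part of the code above.

```python
import math

import pandas as pd

# Two rows for the same Codigo, written by hand to mimic .values.tolist()
# (empty cells arrive as float NaN).
rows = [["mm", "nn", "xx", "ff"], ["xx", math.nan, math.nan, math.nan]]

# Adding lists concatenates them, which is what the 'sum' aggregation relies on.
merged = sum(rows, [])

# pd.notna is an alternative to comparing str(e) with "nan";
# unlike the string test, it would keep a feature literally named "nan".
cleaned = [e for e in merged if pd.notna(e)]
print(cleaned)
```

The cleaned list keeps the repeated `xx`, which is why the total count and the distinct count differ for that Codigo.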

Now we simply have to count the total elements and the different elements of each Codigo:

    df['Car.'] = df['CN'].apply(len)
    df['Car.Dif'] = df['CN'].apply(lambda x: len(set(x)))
    # Cosmetic: drop the working column and reset the index
    df = df.drop(columns='CN')
    df = df.reset_index()

    print(df)

       Codigo  Apar  Car.  Car.Dif
    0     111     2     5        4
    1     222     2     4        4
    2     333     1     3        3
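
If you prefer to avoid building intermediate lists, a possible alternative is to reshape the data to long format and let pandas do the counting. This is only a sketch of a different technique (`melt` plus `groupby`), not the method above; the names `raw`, `long_df` and `valor` are made up for the example.

```python
from io import StringIO

import pandas as pd

TESTDATA = StringIO("""Codigo;C1;C2;C3;C4
333;ab;aa;cc;
222;cc;;;
111;mm;nn;xx;ff
111;xx;;;
222;nn;mm;zz;
""")
raw = pd.read_csv(TESTDATA, sep=";")

# Number of rows per Codigo.
apar = raw.groupby("Codigo").size().rename("Apar")

# Long format: one row per (Codigo, feature value); empty cells are dropped.
long_df = raw.melt(id_vars="Codigo", value_name="valor").dropna(subset=["valor"])

# Total and distinct feature values per Codigo.
car = long_df.groupby("Codigo")["valor"].agg(["count", "nunique"])
car = car.rename(columns={"count": "Car.", "nunique": "Car.Dif"})

result = pd.concat([apar, car], axis=1).reset_index()
print(result)
```

One caveat with this variant: a Codigo whose feature cells are all empty disappears from `long_df`, so its counts would come out as NaN after the `concat` and would need a `fillna(0)`.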

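Since the question asks about looping, here is one way the `for` loop from the question could be completed. It is a rough sketch rather than the recommended route (the vectorized versions are usually faster), and `raw`, `feature_cols` and `records` are names invented for the example.

```python
from io import StringIO

import pandas as pd

TESTDATA = StringIO("""Codigo;C1;C2;C3;C4
333;ab;aa;cc;
222;cc;;;
111;mm;nn;xx;ff
111;xx;;;
222;nn;mm;zz;
""")
raw = pd.read_csv(TESTDATA, sep=";")

feature_cols = ["C1", "C2", "C3", "C4"]
records = []
for codigo in sorted(raw["Codigo"].unique()):
    # Rows that belong to this identifier, as in the question's loop.
    datos = raw[raw["Codigo"] == codigo]
    # Flatten the feature cells into one Series and drop the empty ones.
    valores = pd.Series(datos[feature_cols].to_numpy().ravel()).dropna()
    records.append({
        "Codigo": codigo,
        "Apar": len(datos),              # number of rows
        "Car.": len(valores),            # feature values, repeats included
        "Car.Dif": valores.nunique(),    # distinct feature values
    })

result = pd.DataFrame(records)
print(result)
```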