I have a DataFrame with an identifier column and four feature columns for each identifier. Here is an example:
Código C1 C2 C3 C4
333 ab aa cc
222 cc
111 mm nn xx ff
111 xx
222 nn mm zz
I need to go through the DataFrame and, for each identifier, find how many rows it appears in and count the features that belong to it: in one column counting every feature value, repeated or not, and in another counting only the distinct values. In other words, for my example the answer would be:
Código  Apariciones  Características  Características diferentes
111     2            5                4
222     2            4                4
333     1            3                3
I have tried the following.
First, I get the list of codes without duplicates:
codigos = df['codigo'].drop_duplicates()
Then I tried a 'for' loop to record the number of features per code, like this:
for i in codigos.values:
    datosindividuales = df[df['codigo'] == i]
    apariciones = len(datosindividuales)
I don't know how to continue from here to get the frequency of the features. I tried groupby, but it doesn't give me what I need. I am new to programming, so I would appreciate any help.
First we load your example DataFrame.
The first thing we are going to do is make a list with the values of each column for each Codigo.
Now we simply have to count the total elements and the different elements of each Codigo.
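One possible way to put those three steps into code is sketched below. It is only an illustration under some assumptions: the empty cells of the example are placed at the end of each row (the question does not say which columns are empty), the identifier column is named 'codigo' as in the question's code, and the names feature_cols, largo, valor, listas and resultado are made up for this sketch.

```python
import pandas as pd

# Step 1: load the example. Assumption: blank cells sit at the end of
# each row; None marks a missing feature.
df = pd.DataFrame(
    {
        "codigo": [333, 222, 111, 111, 222],
        "C1": ["ab", "cc", "mm", "xx", "nn"],
        "C2": ["aa", None, "nn", None, "mm"],
        "C3": ["cc", None, "xx", None, "zz"],
        "C4": [None, None, "ff", None, None],
    }
)
feature_cols = ["C1", "C2", "C3", "C4"]

# Step 2: reshape to one row per (codigo, feature value), drop the
# empty cells, and collect the values of each codigo into a list.
largo = df.melt(id_vars="codigo", value_vars=feature_cols, value_name="valor")
largo = largo.dropna(subset=["valor"])
listas = largo.groupby("codigo")["valor"].agg(list)

# Step 3: count rows, total values and distinct values per codigo.
resultado = (
    pd.DataFrame(
        {
            "Apariciones": df.groupby("codigo").size(),
            "Características": listas.apply(len),
            "Características diferentes": listas.apply(lambda v: len(set(v))),
        }
    )
    .rename_axis("codigo")
    .reset_index()
)

print(resultado)
```

With the example data this gives 2, 5 and 4 for code 111, 2, 4 and 4 for code 222, and 1, 3 and 3 for code 333, which matches the table in the question. Note that a code whose rows are all empty would drop out of listas and show missing values in the two feature columns.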