I have the following DataFrame df:
codigo
1 901452
2 904443
3 071111
4 360102
5 891201
6 893420
What I need is to determine the frequency of occurrence in the column codigo, taking into account only the first two digits of each number. In this case the answer would be:
07 1
36 1
89 2
90 2
I used the following line: df.codigo.value_counts()
but it counts the complete values. I don't know how to adjust it so that it takes into account only the first two digits.
In addition, I first tried to apply a boolean filter that checks whether each value starts with a given pair of digits, and then do the count. But when I ran the first part for each possible prefix, using df['codigo'].astype(str).str.startswith('89'), every result came out False.
I'd appreciate any help you can give me.
You can create a new column (codigo_short in the example) with the first two characters and calculate the frequency based on that. This gives you the dataframe df_grouped, which has one row per two-character prefix together with its count. Based on the example you gave, that is 07 with 1, 36 with 1, 89 with 2 and 90 with 2.
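A minimal sketch of this approach follows; it is an illustration, not necessarily the exact code originally posted. It assumes the codes are always six digits wide, so that a column read as integers (which would lose the leading zero of 071111) can be padded back with zfill(6). The strip() call guards against stray whitespace, which is one possible but unconfirmed reason for startswith returning False everywhere. The column name frequency is a hypothetical choice.

```python
import pandas as pd

# Sample data from the question, kept as strings so the leading zero survives.
df = pd.DataFrame(
    {"codigo": ["901452", "904443", "071111", "360102", "891201", "893420"]},
    index=range(1, 7),
)

# Normalise to clean six-character strings (the width of 6 is an assumption).
codigo_str = df["codigo"].astype(str).str.strip().str.zfill(6)

# New column holding only the first two characters of each code.
df["codigo_short"] = codigo_str.str[:2]

# Count the rows per prefix; "frequency" is an illustrative column name.
df_grouped = (
    df.groupby("codigo_short")
      .size()
      .reset_index(name="frequency")
)
print(df_grouped)

# Alternative without a helper column: value_counts on the sliced strings.
counts = codigo_str.str[:2].value_counts().sort_index()
print(counts)
```

If the helper column is not needed afterwards, the value_counts variant is shorter and returns a Series indexed by prefix instead of a DataFrame.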