A recent question, which didn't specify which language or technology the asker wanted to solve it with, prompted me to think about how I would do it with Pandas.
The question can be rephrased in the context of Pandas as follows.
Given a dataframe such as the one I build here:
import pandas as pd
datos = [["jose", "manzana"],
         ["andres", "pera"],
         ["luis", "pera"],
         ["jose", "manzana"]]
df = pd.DataFrame(datos, columns=["Nombre", "Fruta"])
print(df)
Which looks like this:
   Nombre    Fruta
0    jose  manzana
1  andres     pera
2    luis     pera
3    jose  manzana
I want to obtain another one in which the index is the names (without repetitions), the columns are the fruits, and the cells are the number of times each name appears with each fruit. Namely:
Fruta   manzana  pera
Nombre
andres        0     1
jose          2     0
luis          0     1
My attempt
This looks like a job for pivot_table(), but the only way I could think of to solve it was to add an extra column full of 1s to the original dataframe:
>>> print(df.assign(Cuantas=1))
   Nombre    Fruta  Cuantas
0    jose  manzana        1
1  andres     pera        1
2    luis     pera        1
3    jose  manzana        1
That column then serves as the value to sum in pivot_table(). So my solution looks like this:
sol = df.assign(Cuantas=1).pivot_table(
    index="Nombre",
    columns="Fruta",
    values="Cuantas",
    aggfunc="sum",
    fill_value=0)
And indeed this gives the desired result, but it seems a bit cumbersome to me.
The question is: is there an easier way?
What you are looking for seems to be what is called a contingency table. pandas has that functionality in crosstab(); you can do something like this:
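A minimal sketch of such a call, assuming the dataframe built in the question. The variable name `tabla` is only illustrative, and passing the two columns positionally is one plausible way to call it, not necessarily the exact form originally posted:

```python
import pandas as pd

# Same data as in the question
datos = [["jose", "manzana"],
         ["andres", "pera"],
         ["luis", "pera"],
         ["jose", "manzana"]]
df = pd.DataFrame(datos, columns=["Nombre", "Fruta"])

# The first argument supplies the row labels, the second the column labels.
# With no `values`/`aggfunc`, each cell counts how often the pair occurs,
# and pairs that never occur are filled with 0.
tabla = pd.crosstab(df["Nombre"], df["Fruta"])
print(tabla)
```

No helper column is needed, because counting co-occurrences is what crosstab() does by default.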