The following code groups rows and sums their occurrences. Assuming you only have df2, what is the best way to recover df?
>>> import pandas as pd
>>> d = {
...     "var_1": ['a','a','a','c','c','c','c'],
...     "var_2": ['b','b','b','d','d','d','d'],
...     "total": [1,1,1,1,1,1,1],
...     "days": ['4','4','4','2','2','2','2'],
... }
>>> # Create a DataFrame from the dictionary
>>> df = pd.DataFrame(d)
>>> df
  var_1 var_2  total days
0     a     b      1    4
1     a     b      1    4
2     a     b      1    4
3     c     d      1    2
4     c     d      1    2
5     c     d      1    2
6     c     d      1    2
>>> df2 = df.groupby(['var_1','var_2','days'])['total'].sum().reset_index()
>>> df2
  var_1 var_2 days  total
0     a     b    4      3
1     c     d    2      4
Step 1: Create a helper column in which each row holds a list with as many elements as the number of times that row must be repeated (its total value).
Step 2: Use explode to split each list into rows.
Step 3: Drop the helper column.
Step 4: Set the total column to 1
Step 5: Reset the index.
One way to do it:
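A minimal sketch of Steps 1 to 5, assuming only df2 is available. The helper column name `rep` and the result name `restored` are hypothetical, chosen here for illustration:

```python
import pandas as pd

# Aggregated frame from the question (the only input assumed available).
df2 = pd.DataFrame({
    "var_1": ["a", "c"],
    "var_2": ["b", "d"],
    "days": ["4", "2"],
    "total": [3, 4],
})

restored = df2.copy()

# Step 1: helper column "rep" (hypothetical name) holding a list of length `total`.
restored["rep"] = restored["total"].map(lambda n: [1] * int(n))

# Step 2: explode the lists so that each element becomes its own row.
restored = restored.explode("rep")

# Step 3: drop the helper column.
restored = restored.drop(columns="rep")

# Step 4: each restored row represents a single occurrence.
restored["total"] = 1

# Step 5: rebuild a clean 0..n-1 index; the column selection only
# restores the column order of the original df.
restored = restored.reset_index(drop=True)[["var_1", "var_2", "total", "days"]]

print(restored)
```

This only works when total holds non-negative integer counts, since each count is used as a list length.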