I have the following data in a pd.DataFrame:
Index ID Date Days Name
1 1 5-1-20 10 Josh
2 1 9-1-20 10 Josh
3 1 19-1-20 6 Josh
4 2 1-1-20 10 Mike
5 3 1-4-20 10 George
6 4 1-2-20 10 Rose
7 4 11-5-20 5 Rose
8 5 1-9-20 10 Mark
9 6 1-4-21 10 Joe
10 7 1-1-21 10 Jill
I want each ID to appear only once in the DataFrame, expanding the Date and Days columns as needed. In this case that means 3 Date columns and 3 Days columns, since ID 1, the most repeated one, appears three times. The desired result would be the following:
Index  ID  Date 1  Date 2   Date 3   Days1  Days2  Days3  Name
1      1   5-1-20  9-1-20   19-1-20  10     10     6      Josh
2      2   1-1-20                    10                   Mike
3      3   1-4-20                    10                   George
4      4   1-2-20  11-5-20           10     5             Rose
5      5   1-9-20                    10                   Mark
6      6   1-4-21                    10                   Joe
7      7   1-1-21                    10                   Jill
The first thing you should do is group your dataframe by the identifiers you want, in this case ID and Name, using list as the aggregation function for the columns you want to expand. This generates a dataframe with one row per ID, in which each Date and Days cell holds the list of values for that ID.
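The grouping step could look roughly like the sketch below. The sample data is copied from the question; the variable names df and grouped are only illustrative, and as_index=False is one possible way to keep ID and Name as ordinary columns.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 3, 4, 4, 5, 6, 7],
    "Date": ["5-1-20", "9-1-20", "19-1-20", "1-1-20", "1-4-20",
             "1-2-20", "11-5-20", "1-9-20", "1-4-21", "1-1-21"],
    "Days": [10, 10, 6, 10, 10, 10, 5, 10, 10, 10],
    "Name": ["Josh", "Josh", "Josh", "Mike", "George",
             "Rose", "Rose", "Mark", "Joe", "Jill"],
})

# One row per (ID, Name); Date and Days are collected into lists.
grouped = (
    df.groupby(["ID", "Name"], as_index=False)
      .agg({"Date": list, "Days": list})
)
print(grouped)
```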
Now, to expand those lists into columns, we can write a small function that works out the number of resulting columns and builds a dataframe from the lists in that column. For this we can use the tolist() method and then add the new columns to our existing dataframe.
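A minimal sketch of such a helper follows. The name expand_list_column and the numbered naming scheme (Days1, Days2, ...) are assumptions made for illustration, and the two-row demo frame is invented only to show the padding behaviour.

```python
import pandas as pd

def expand_list_column(frame, column):
    """Replace a column of lists with numbered columns (column1, column2, ...)."""
    # Building a DataFrame from a list of lists pads shorter rows with missing values.
    expanded = pd.DataFrame(frame[column].tolist(), index=frame.index)
    # The longest list determines how many columns result.
    n_cols = expanded.shape[1]
    expanded.columns = [f"{column}{i}" for i in range(1, n_cols + 1)]
    # Drop the list column and attach the expanded columns in its place.
    return frame.drop(columns=column).join(expanded)

# Tiny illustrative input: one row with three values, one row with a single value.
demo = pd.DataFrame({"ID": [1, 2], "Days": [[10, 10, 6], [10]]})
wide = expand_list_column(demo, "Days")
print(wide)
```

Because the new frame reuses the original index, the join lines the expanded values up with the right rows even if the index is not 0, 1, 2, ...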
Now we just need to call our function once for each column we want to expand, here Date and Days.
The whole script would be simply:
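Putting the steps together, the script might look like the following sketch. The helper name expand_list_column, the generated column names, the final column reordering and the use of fillna("") to remove the NaN values are all assumptions chosen for illustration, not necessarily the original code.

```python
import pandas as pd

def expand_list_column(frame, column):
    """Replace a column of lists with numbered columns (column1, column2, ...)."""
    expanded = pd.DataFrame(frame[column].tolist(), index=frame.index)
    expanded.columns = [f"{column}{i}" for i in range(1, expanded.shape[1] + 1)]
    return frame.drop(columns=column).join(expanded)

# Sample data taken from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 3, 4, 4, 5, 6, 7],
    "Date": ["5-1-20", "9-1-20", "19-1-20", "1-1-20", "1-4-20",
             "1-2-20", "11-5-20", "1-9-20", "1-4-21", "1-1-21"],
    "Days": [10, 10, 6, 10, 10, 10, 5, 10, 10, 10],
    "Name": ["Josh", "Josh", "Josh", "Mike", "George",
             "Rose", "Rose", "Mark", "Joe", "Jill"],
})

# Step 1: one row per (ID, Name), with Date and Days collected into lists.
result = df.groupby(["ID", "Name"], as_index=False).agg({"Date": list, "Days": list})

# Step 2: expand each list column into numbered columns.
for column in ["Date", "Days"]:
    result = expand_list_column(result, column)

# Step 3 (optional): move Name to the end and blank out the missing values.
result = result[[c for c in result.columns if c != "Name"] + ["Name"]]
result = result.fillna("")
print(result)
```

Note that Days2 and Days3 hold floats (for example 10.0), because the missing values force a float dtype before they are blanked out.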
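The result (after removing the NaN values) has the layout of the desired table above: one row per ID, with up to three Date columns and three Days columns.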