I have a dataframe of coronavirus data that I grouped by country with the pandas groupby function to get the cases for each country. The grouped dataframe has the following structure:
             5/22/20  5/23/20  5/24/20  ...
Country
Afghanistan     9219    10001    10585  ...
Albania          981      989      998  ...
Algeria         7918     8113     8306  ...
Andorra          762      762      762  ...
Angola            60       61       69  ...
....
The column headers are the dates and each row holds the case counts for one country (the original dataframe is already organized this way; I have not changed anything other than grouping it).
What I would like to obtain is a dictionary with the following format:
{
"Afghanistan": {"fecha": ["5/22/20", "5/23/20", "5/24/20", ...], "casos": [9219, 10001, 10585, ...]},
"Albania": {"fecha": ["5/22/20", "5/23/20", "5/24/20", ...], "casos": [981, 989, 998, ...]},
...
}
I have tried to build the dictionary with the pandas to_dict() function, but the output is not what I wanted. I understand that for the "fecha" (date) and "casos" (cases) fields to appear in the dictionary, I would first have to create them in the dataframe and then apply to_dict() to it. However, since the dates are the column headers and the cases are the values in each row, I'm not sure how to create these two new fields.
Below is the output I obtained with to_dict that comes closest to the dictionary I'm looking for:
# In the grouped dataframe I sum by group (the groups are the countries) and convert the result to a dictionary keyed by the index:
grouped.sum().to_dict('index')
Out:
{'Afghanistan': {'1/22/20': 0,
'1/23/20': 0,
'1/24/20': 0,
'1/25/20': 0,
'1/26/20': 0,
'1/27/20': 0,
'1/28/20': 0,
'1/29/20': 0,
'1/30/20': 0,
'1/31/20': 0,
'2/1/20': 0,
'2/2/20': 0,
'2/3/20': 0,
'2/4/20': 0,
'2/5/20': 0,
'2/6/20': 0,
'2/7/20': 0,
....
This dictionary is the closest to what I am looking for, but since I have not created the "fecha" and "casos" fields, it does not have the shape I want, and I do not know how to create these fields given the structure of the dataframe.
Update:
I have managed to reshape the data into two columns called "fecha" and "casos" using the pandas melt function, applied to the original dataframe:
melt_cov = covid19.melt(id_vars=["Country/Region"],
                        var_name="fecha",
                        value_name="casos")
melt_cov
Out:
      Country/Region    fecha  casos
0        Afghanistan  1/22/20      0
1            Albania  1/22/20      0
2            Algeria  1/22/20      0
3            Andorra  1/22/20      0
4             Angola  1/22/20      0
...              ...      ...    ...
35889        Vietnam  5/31/20    328
However, when I group by country and try to build the dictionary, I still don't get the one I want:
melt_cov.groupby(["Country/Region", "fecha"]).sum().to_dict('index')
Out:
{('Afghanistan', '1/22/20'): {'casos': 0},
('Afghanistan', '1/23/20'): {'casos': 0},
('Afghanistan', '1/24/20'): {'casos': 0},
('Afghanistan', '1/25/20'): {'casos': 0},
('Afghanistan', '1/26/20'): {'casos': 0},
....
melt_cov.groupby(["Country/Region", "fecha"]).sum().to_dict('list')
Out:
{'casos': [0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
...
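For the melted frame, grouping on both "Country/Region" and "fecha" leaves one row per (country, date) pair, which is why neither output above has one entry per country. A minimal sketch of one possible alternative is to group by country alone and collect each remaining column into a list. The miniature `melt_cov` below is a hypothetical sample built from the values shown earlier, and `por_pais` and `resultado` are illustrative names:

```python
import pandas as pd

# Hypothetical miniature of melt_cov: two countries, three dates
melt_cov = pd.DataFrame(
    {
        "Country/Region": ["Afghanistan", "Albania"] * 3,
        "fecha": ["5/22/20", "5/22/20", "5/23/20",
                  "5/23/20", "5/24/20", "5/24/20"],
        "casos": [9219, 981, 10001, 989, 10585, 998],
    }
)

# Group by country only and gather each remaining column into a list,
# giving one row per country with a list of dates and a list of cases
por_pais = melt_cov.groupby("Country/Region")[["fecha", "casos"]].agg(list)

# Each index label (country) becomes a key mapping to {"fecha": [...], "casos": [...]}
resultado = por_pais.to_dict("index")
```

If a country has several rows for the same date, the counts would need to be summed first. Passing sort=False to that earlier groupby should keep the dates in their original order, since the date labels are strings and would otherwise be sorted alphabetically.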
I am not an expert with the pandas module, but I will try to give you a solution, even if it may not be the most optimal one. I'm going to simulate part of the dataframe:
The content of the dataframe (df):
Now:
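A minimal sketch of how these steps might look, assuming a small sample in the same wide layout as the grouped dataframe from the question (countries in the index, dates as column headers). The sample values are copied from the question; the names `fechas` and `resultado` are illustrative, and this is only one possible approach:

```python
import pandas as pd

# Simulated part of the grouped dataframe (values taken from the question)
df = pd.DataFrame(
    {
        "5/22/20": [9219, 981, 7918],
        "5/23/20": [10001, 989, 8113],
        "5/24/20": [10585, 998, 8306],
    },
    index=pd.Index(["Afghanistan", "Albania", "Algeria"], name="Country"),
)

# The column labels are the dates, shared by every country
fechas = df.columns.tolist()

# Build one entry per row: the index label is the country,
# and the row values are that country's case counts
resultado = {
    pais: {"fecha": list(fechas), "casos": fila.tolist()}
    for pais, fila in df.iterrows()
}

print(resultado["Albania"])
```

Each country gets its own copy of the date list (`list(fechas)`), so modifying one entry later does not affect the others.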