
Question

Asked: 2022-04-10 04:29:30 +0800 CST

Create a dictionary based on a dataframe in Python


I have a dataframe of coronavirus data that I grouped by country with the pandas groupby function to get the cases for each country. The grouped dataframe has the following structure:

             5/22/20     5/23/20    5/24/20  ...

Country
Afghanistan     9219       10001    10585  ...
Albania          981         989      998  ...
Algeria         7918        8113     8306  ...
Andorra          762         762      762  ...
Angola            60          61       69  ...
....

You can see that the column headers are the dates and each row holds the case counts for one country (the original dataframe is already organized this way; the only change I made was the grouping).

So, what I would like to achieve is a dictionary that has the following format:

{
"Afghanistan": {"fecha": ["5/22/20", "5/23/20", "5/24/20", ...], "casos": [9219, 10001, 10585, ...]},
"Albania": {"fecha": ["5/22/20", "5/23/20", "5/24/20", ...], "casos": [981, 989, 998, ...]},
...
}

I have tried to build the dictionary with the pandas function to_dict(), but the output is not what I want. I understand that for the "fecha" (date) and "casos" (cases) fields to appear in the dictionary, I would first have to create them in the dataframe and then apply to_dict() to it. But since the dates are the column headers and the cases are the values in each row, I'm not sure how to create these two new fields.

Here is the output I got from to_dict, which is the closest I have come to the dictionary I am looking for:

# On the grouped dataframe I sum by group (the countries) and convert the result to a dictionary keyed by the index:

grouped.sum().to_dict('index')

Out:
{'Afghanistan': {'1/22/20': 0,
  '1/23/20': 0,
  '1/24/20': 0,
  '1/25/20': 0,
  '1/26/20': 0,
  '1/27/20': 0,
  '1/28/20': 0,
  '1/29/20': 0,
  '1/30/20': 0,
  '1/31/20': 0,
  '2/1/20': 0,
  '2/2/20': 0,
  '2/3/20': 0,
  '2/4/20': 0,
  '2/5/20': 0,
  '2/6/20': 0,
  '2/7/20': 0,
   ....

This dictionary is the closest to what I am looking for, but because I have not created the "fecha" and "casos" fields it does not have the shape I want, and I do not know how to create those fields given the structure of the dataframe.

Update:

I have managed to reshape the data into two columns called "fecha" and "casos" with the pandas melt function, which I applied to the original dataframe:

melt_cov = covid19.melt(id_vars=["Country/Region"], 
        var_name="fecha", 
        value_name="casos")

melt_cov

Out:
      Country/Region    fecha   casos
0        Afghanistan    1/22/20 0
1            Albania    1/22/20 0
2            Algeria    1/22/20 0
3            Andorra    1/22/20 0
4             Angola    1/22/20 0
... ... ... ... ...
35889         Vietnam   5/31/20 328

However, when I group by country and try to build the dictionary, I still don't get the one I want:

melt_cov.groupby(["Country/Region", "fecha"]).sum().to_dict('index')

Out:
{('Afghanistan', '1/22/20'): {'casos': 0},
 ('Afghanistan', '1/23/20'): {'casos': 0},
 ('Afghanistan', '1/24/20'): {'casos': 0},
 ('Afghanistan', '1/25/20'): {'casos': 0},
 ('Afghanistan', '1/26/20'): {'casos': 0},
....


melt_cov.groupby(["Country/Region", "fecha"]).sum().to_dict('list')

Out:
{'casos': [0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
...
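
For reference, a minimal sketch of one way the melted frame could be turned into the requested layout. The small melt_cov built here is hypothetical sample data (two rows share a country and date to mimic per-province rows), and it assumes the rows appear in chronological order, as they do after melt:

```python
import pandas as pd

# Hypothetical miniature of the melted frame; Angola has two rows for
# 5/22/20 to mimic several provinces reported for the same day.
melt_cov = pd.DataFrame({
    "Country/Region": ["Albania", "Angola", "Angola", "Albania", "Angola"],
    "fecha": ["5/22/20", "5/22/20", "5/22/20", "5/23/20", "5/23/20"],
    "casos": [981, 50, 10, 989, 61],
})

# 1) Sum duplicates per (country, date). sort=False keeps the dates in the
#    order they appear instead of sorting the date strings alphabetically.
totals = (
    melt_cov
    .groupby(["Country/Region", "fecha"], sort=False, as_index=False)["casos"]
    .sum()
)

# 2) Collect each country's dates and totals into lists, then export the
#    frame as a dictionary keyed by its index (the country).
by_country = totals.groupby("Country/Region")[["fecha", "casos"]].agg(list)
result = by_country.to_dict("index")
print(result)
```

The point of the second groupby is that to_dict('index') turns each row into one inner dictionary, so the row for a country must already hold a list in "fecha" and a list in "casos".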

1 Answer


Adrián · Answer 1 · 2022-04-10T08:34:58+08:00

I am not an expert with the pandas module, but I will try to give you a solution, even if it may not be the most efficient one:

I'm going to try to simulate part of the dataframe:

import pandas as pd

# Set up the data (like the example you have)
data = [
    [9219, 10001, 10585],
    [981, 989, 998],
]
df = pd.DataFrame(
    data,
    # The index values, which would be the countries:
    index=['Algeria', 'Andorra'],
    # The columns, which would be the dates:
    columns=['5/22/20', '5/23/20', '5/24/20'],
)
# The name of the dataframe's index:
df.index.name = 'Countries'

The content of the dataframe (df):

           5/22/20  5/23/20  5/24/20
Countries                           
Algeria       9219    10001    10585
Andorra        981      989      998

Now:

I don't quite see why you want to put the dates in a list and repeat it for every country: with 100 countries you will repeat the same dates 100 times. Still, you can get them as a list like this:

dates = df.columns.values.tolist()
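
If repeating the dates is a concern, one possible alternative is to store them only once. The layout below (a single "fecha" list plus one "casos" list per country) is a hypothetical structure, not the one requested in the question, shown only as a sketch:

```python
import pandas as pd

# Same simulated dataframe as in this answer.
df = pd.DataFrame(
    [[9219, 10001, 10585], [981, 989, 998]],
    index=["Algeria", "Andorra"],
    columns=["5/22/20", "5/23/20", "5/24/20"],
)

# Hypothetical layout: the dates appear once, the cases once per country.
compact = {
    "fecha": df.columns.tolist(),
    "casos": {country: row.tolist() for country, row in df.iterrows()},
}
print(compact)
```

In this layout the i-th value of each "casos" list corresponds to the i-th entry of "fecha".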

The rest of the code, explained in the comments:

# Create the result as a dictionary
result = dict()

# For each column (date) of the dataframe
for column in df:
    # Iterate over the (country, value) pairs of the column; items() replaces iteritems(), removed in pandas 2.0
    for country, row_value in df[column].items():
        # If the country key is not in the dictionary yet, create it
        if country not in result:
            result[country] = {}
        # If the "casos" key does not exist yet, create it
        if "casos" not in result[country]:
            result[country]["casos"] = []
        # If the "fecha" key does not exist yet, create it
        if "fecha" not in result[country]:
            result[country]["fecha"] = dates
        # Append this column's value to the country's list of cases
        result[country]["casos"].append(row_value)

Full code:

import pandas as pd

# Set up the data (like the example you have)
data = [
    [9219, 10001, 10585],
    [981, 989, 998],
]
df = pd.DataFrame(
    data,
    # The index values, which would be the countries:
    index=['Algeria', 'Andorra'],
    # The columns, which would be the dates:
    columns=['5/22/20', '5/23/20', '5/24/20'],
)
# The name of the dataframe's index:
df.index.name = 'Countries'

dates = df.columns.values.tolist()

# Create the result as a dictionary
result = dict()

# For each column (date) of the dataframe
for column in df:
    # Iterate over the (country, value) pairs of the column; items() replaces iteritems(), removed in pandas 2.0
    for country, row_value in df[column].items():
        # If the country key is not in the dictionary yet, create it
        if country not in result:
            result[country] = {}
        # If the "casos" key does not exist yet, create it
        if "casos" not in result[country]:
            result[country]["casos"] = []
        # If the "fecha" key does not exist yet, create it
        if "fecha" not in result[country]:
            result[country]["fecha"] = dates
        # Append this column's value to the country's list of cases
        result[country]["casos"].append(row_value)

print(result)

Result:

{'Algeria': {'casos': [9219, 10001, 10585], 'fecha': ['5/22/20', '5/23/20', '5/24/20']}, 'Andorra': {'casos': [981, 989, 998], 'fecha': ['5/22/20', '5/23/20', '5/24/20']}}
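
As a possible shorter variant, the same dictionary can probably be built with a single comprehension over the rows. This is only a sketch on the simulated dataframe; it assumes all columns hold integers, since iterrows() converts each row to one common dtype:

```python
import pandas as pd

# Same simulated dataframe as above.
df = pd.DataFrame(
    [[9219, 10001, 10585], [981, 989, 998]],
    index=["Algeria", "Andorra"],
    columns=["5/22/20", "5/23/20", "5/24/20"],
)

# One entry per row: the column labels become "fecha", the row values "casos".
# df.columns.tolist() builds a fresh list each time, so countries do not
# share a single list object.
result = {
    country: {"fecha": df.columns.tolist(), "casos": row.tolist()}
    for country, row in df.iterrows()
}
print(result)
```

Because each country gets its own "fecha" list here, modifying the dates of one country later does not affect the others.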
