

Asked: 2021-12-10 06:48:04 +0800 CST

Group and sum data from a CSV with Bash or AWK


I have a CSV file with the following structure:

country,year,sex,age,suicides_no,population,suicides/100k pop,country-year,HDI for year, gdp_for_year ($) ,gdp_per_capita ($),generation
Albania,1987,male,15-24 years,21,312900,6.71,Albania1987,,2156624900,796,Generation X
Albania,1987,male,35-54 years,16,308000,5.19,Albania1987,,2156624900,796,Silent
Albania,1987,female,15-24 years,14,289700,4.83,Albania1987,,2156624900,796,Generation X
Albania,1987,male,75+ years,1,21800,4.59,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,male,25-34 years,9,274300,3.28,Albania1987,,2156624900,796,Boomers
Albania,1987,female,75+ years,1,35600,2.81,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,female,35-54 years,6,278800,2.15,Albania1987,,2156624900,796,Silent
Albania,1987,female,25-34 years,4,257200,1.56,Albania1987,,2156624900,796,Boomers
Albania,1987,male,55-74 years,1,137500,0.73,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,female,5-14 years,0,311000,0,Albania1987,,2156624900,796,Generation X
Albania,1987,female,55-74 years,0,144600,0,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,male,5-14 years,0,338200,0,Albania1987,,2156624900,796,Generation X
Albania,1988,female,75+ years,2,36400,5.49,Albania1988,,2126000000,769,G.I. Generation
Albania,1988,male,15-24 years,17,319200,5.33,Albania1988,,2126000000,769,Generation X

I am trying to write a for loop that gets the number of suicides by country, year and sex. So far, the loop I have written is the following:

#!/bin/bash

# First, save the countries, years and sexes into separate CSV files.

tail -n +2 suicidios_final.csv | cut -d "," -f1 | sort | uniq > country.csv
tail -n +2 suicidios_final.csv | cut -d "," -f2 | sort | uniq > year.csv
tail -n +2 suicidios_final.csv | cut -d "," -f3 | sort | uniq > sex.csv

# Create arrays from the previous variables with the mapfile command:

mapfile -t countries < country.csv
mapfile -t years < year.csv
mapfile -t sex < sex.csv

# Finally, iterate with a for loop over the countries, a nested for loop over the years and a third one over the sex.
# We also use a different color for each variable so they are easy to tell apart:

for i in "${countries[@]}"; do
 echo -e "\e[36m== $i ==\e[0m"
 for j in "${years[@]}"; do
  echo -e "     \e[33m$j\e[0m"
  for k in "${sex[@]}"; do
   echo -e "     \e[31m$k\e[0m"  
   tail -n +2 suicidios_final.csv | grep -F "$i" | grep -F "$j" | grep -F "$k" > bucle.csv
   suicidios=$(cat bucle.csv | cut -d "," -f5 | paste -s -d "+" | bc)
   echo -e "      \e[34mNúmero de suicidios: $suicidios\e[0m"
  done
 done
done

However, when I run the script, the output is not what I expect: the loop sums the suicides for the "female" category of the sex variable correctly, but for the "male" category it adds up the rows regardless of whether they are "male" or "female":

./script.sh

== Albania ==
     1985
     female
      Número de suicidios: 15
     male
      Número de suicidios: 50
     1986
     female
      Número de suicidios: 
     male
      Número de suicidios: 
     1987
     female
      Número de suicidios: 25
     male
      Número de suicidios: 73
     1988
     female
      Número de suicidios: 22
     male
      Número de suicidios: 63
     1989
     female
      Número de suicidios: 15
     male
      Número de suicidios: 68
     1990
     female
      Número de suicidios: 
     male
      Número de suicidios: 
     1991
     female
      Número de suicidios: 
     male
      Número de suicidios: 
     1992
     female
      Número de suicidios: 14
     male
      Número de suicidios: 47
     1993
     female
      Número de suicidios: 27
     male
      Número de suicidios: 73
     1994
     female
      Número de suicidios: 15
     male
      Número de suicidios: 50
     1995
     female
      Número de suicidios: 34
     male
      Número de suicidios: 88
     1996
     female
      Número de suicidios: 39
     male
      Número de suicidios: 89

             ......

In fact, the very first result is already wrong, because I have no data for Albania in 1985. However, I have checked the results for several other countries and years and I do not see this kind of error there, so I suspect it may come from the data file itself. In any case, the error I do not understand is the one with the sex variable: for the "female" category the number of suicides is summed correctly, but for "male" I get the sum of both the "male" and the "female" cases. I know my question is a bit difficult, because it is a triple nested for loop and the error is not easy to spot at first glance, but if someone knows what could be going on and can tell me, I would really appreciate it.
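The symptom described above is consistent with how grep -F works: it matches a fixed string anywhere in the line, so grep -F "male" also selects every "female" row (and, likewise, grep -F "$j" may match a year that happens to appear inside another column, which could explain the 1985 result). The following is a minimal sketch, using a few rows copied from the question (first five columns only), that contrasts the substring match with an exact per-field comparison in awk. The function names substring_matches and sum_exact are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sample rows copied from the question (only the first five columns kept).
rows='Albania,1987,male,15-24 years,21
Albania,1987,male,35-54 years,16
Albania,1987,female,15-24 years,14
Albania,1987,female,75+ years,1'

# Hypothetical helper: counts lines containing the fixed string $1 anywhere.
# "male" is a substring of "female", so it matches the female rows too.
substring_matches() {
  grep -cF -- "$1" <<< "$rows"
}

# Hypothetical helper: sums column 5 (suicides_no) only for rows whose
# country, year and sex fields are exactly equal to $1, $2 and $3.
# Prints 0 instead of an empty string when nothing matches.
sum_exact() {
  awk -F , -v country="$1" -v year="$2" -v sex="$3" '
    $1 == country && $2 == year && $3 == sex { total += $5 }
    END { print total + 0 }
  ' <<< "$rows"
}

substring_matches "male"             # 4: male and female rows
substring_matches "female"           # 2: female rows only
sum_exact "Albania" "1987" "male"    # 37
sum_exact "Albania" "1987" "female"  # 15
```

Comparing whole fields with == avoids the substring problem for every column at once, not just for the sex field.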

1 Answer


Cuauhtli · Answer 1 · 2021-12-10T09:25:52+08:00

I approach your problem in two different ways: the first using GNU Datamash, and the second with awk.

The example file I used is the following one, which I named suicidios_final.csv:

country,year,sex,age,suicides_no,population,suicides/100k pop,country-year,HDI for year, gdp_for_year ($) ,gdp_per_capita ($),generation
Albania,1987,male,15-24 years,21,312900,6.71,Albania1987,,2156624900,796,Generation X
Albania,1988,male,35-54 years,16,308000,5.19,Albania1987,,2156624900,796,Silent
Albania,1988,female,15-24 years,14,289700,4.83,Albania1987,,2156624900,796,Generation X
Albania,1987,male,75+ years,1,21800,4.59,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,male,25-34 years,9,274300,3.28,Albania1987,,2156624900,796,Boomers
Albania,1987,female,75+ years,1,35600,2.81,Albania1987,,2156624900,796,G.I. Generation
Albania,1989,female,35-54 years,6,278800,2.15,Albania1987,,2156624900,796,Silent
Albania,1987,female,25-34 years,4,257200,1.56,Albania1987,,2156624900,796,Boomers
Albania,1987,male,55-74 years,1,137500,0.73,Albania1987,,2156624900,796,G.I. Generation
México,1987,female,5-14 years,0,311000,0,México1987,,2156624900,796,Generation X
México,1989,female,55-74 years,0,144600,0,México1987,,2156624900,796,G.I. Generation
México,1989,male,5-14 years,0,338200,0,México1987,,2156624900,796,Generation X
México,1988,female,75+ years,2,36400,5.49,México1988,,2126000000,769,G.I. Generation
México,1988,male,15-24 years,17,319200,5.33,México1988,,2126000000,769,Generation X
México,1988,male,15-24 years,17,319200,5.33,México1988,,2126000000,769,Generation X
Colombia,1988,male,15-24 years,17,319200,5.33,Colombia1988,,2126000000,769,Generation X
Colombia,1987,female,5-14 years,0,311000,0,Colombia1987,,2156624900,796,Generation X
Colombia,1967,female,55-74 years,0,144600,0,Colombia1987,,2156624900,796,G.I. Generation
Colombia,1957,male,5-14 years,0,338200,0,Colombia1987,,2156624900,796,Generation X
Colombia,1988,female,75+ years,2,36400,5.49,Colombia1988,,2126000000,769,G.I. Generation
Colombia,1988,male,15-24 years,17,319200,5.33,Colombia1988,,2126000000,769,Generation X
Colombia,1988,male,15-24 years,17,319200,5.33,Colombia1988,,2126000000,769,Generation X
Colombia,1988,male,15-24 years,17,319200,5.33,Colombia1988,,2126000000,769,Generation X

With `datamash`

In a single line:

$ datamash --sort -t , -H -g 1,2,3 sum 5 < suicidios_final.csv | column -t -s ,

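Here I ask datamash to group by fields 1, 2 and 3 and sum field 5, using the "," character as the field separator (-t ,). The -H option treats the first line as a header, and --sort sorts the input before grouping, since datamash only groups consecutive lines.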

Resulting in:

GroupBy(country)  GroupBy(year)  GroupBy(sex)  sum(suicides_no)
Albania           1987           female        5
Albania           1987           male          32
Albania           1988           female        14
Albania           1988           male          16
Albania           1989           female        6
Colombia          1957           male          0
Colombia          1967           female        0
Colombia          1987           female        0
Colombia          1988           female        2
Colombia          1988           male          68
México            1987           female        0
México            1988           female        2
México            1988           male          34
México            1989           female        0
México            1989           male          0

If you don't have datamash, you can install it with sudo apt install datamash (on Debian or Ubuntu).

With `awk`

Here I simply reused a function from the official GNU awk documentation that displays ("walks" through) the contents of a multidimensional array.

Then I used the relevant fields as keys of the array. This way awk does all the work: since an array cannot have repeated keys, the grouping happens automatically because of how array keys work, and the sum only requires the += operator on the fifth field.
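The same grouping idea can be sketched with a single composite key (country, year and sex joined with commas) instead of arrays of arrays. Arrays of arrays and isarray() are GNU awk extensions, so a flat key like this one should also work in other awk implementations such as mawk. This is a minimal sketch under that assumption; group_sum is a hypothetical function name, and because the for (k in total) order is unspecified, the output is piped through sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group the CSV in $1 by country, year and sex and sum
# column 5 (suicides_no), printing "country,year,sex,sum" lines.
group_sum() {
  awk -F , '
    # Skip the header line; join fields 1-3 into one key. The fields were
    # split on ",", so they cannot contain a comma and the key is unambiguous.
    NR > 1 { total[$1 "," $2 "," $3] += $5 }
    END { for (k in total) print k "," total[k] }
  ' "$1" | LC_ALL=C sort   # byte-wise sort for a locale-independent order
}
```

Usage: group_sum suicidios_final.csv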

In a file called main.awk, we put the following content:

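#!/usr/bin/awk -f
# Requires GNU awk (gawk): arrays of arrays and isarray() are gawk
# extensions. If "awk" on your system is another implementation (e.g. mawk),
# run this file with gawk instead.

# Fields used:
# 1. Country
# 2. Year
# 3. Sex
# 4. Age
# 5. Suicides No

# Recursively print every leaf of a (possibly nested) array.
# "i" is declared as an extra parameter so that it is local to the function.
function walk_array(arr, name, i) {
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
}
# Skip the first line (the header)
NR!=1{
    reporte[$1][$2][$3]+=$5
}
END {
    walk_array(reporte,"Reporte: ")
}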

And in the terminal we run:

$ awk -f main.awk -F , suicidios_final.csv

This produces:

Reporte: [México][1987][female] = 0
Reporte: [México][1988][male] = 34
Reporte: [México][1988][female] = 2
Reporte: [México][1989][male] = 0
Reporte: [México][1989][female] = 0
Reporte: [Albania][1987][female] = 5
Reporte: [Albania][1987][male] = 32
Reporte: [Albania][1988][female] = 14
Reporte: [Albania][1988][male] = 16
Reporte: [Albania][1989][female] = 6
Reporte: [Colombia][1957][male] = 0
Reporte: [Colombia][1967][female] = 0
Reporte: [Colombia][1987][female] = 0
Reporte: [Colombia][1988][female] = 2
Reporte: [Colombia][1988][male] = 68
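The lines above come out in no particular order because awk's for (i in arr) loop visits indices in an unspecified order. In gawk, assigning PROCINFO["sorted_in"] = "@ind_str_asc" before the loops makes them visit indices in ascending string order. The sketch below assumes gawk is installed and uses plain nested loops instead of walk_array; sorted_report is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same report as main.awk, but sorted by country, year
# and sex. Requires GNU awk (gawk): arrays of arrays and PROCINFO["sorted_in"]
# are gawk extensions.
sorted_report() {
  gawk -F , '
    NR != 1 { reporte[$1][$2][$3] += $5 }   # skip the header line
    END {
      # Every for-in loop below visits indices in ascending string order.
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (c in reporte)
        for (y in reporte[c])
          for (s in reporte[c][y])
            printf("Reporte: [%s][%s][%s] = %s\n", c, y, s, reporte[c][y][s])
    }
  ' "$1"
}
```

Usage: sorted_report suicidios_final.csv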

Note:

Chaining together many GNU/Linux tools can be tempting when you are starting out, but it is very inefficient (and not very elegant): each program runs as a separate process that opens and closes its own file descriptors, and the pipeline may create temporary files that you later have to delete. In the question's script, for example, tail, three grep processes, cut, paste and bc are launched again for every country, year and sex combination. Keep in mind that Bash should not be used casually as if it were a general-purpose programming language; it is rather a great tool for orchestrating programs.

For this reason, it is better to use utilities dedicated to the task at hand, or more powerful languages such as sed, awk, Python or Perl.
