In a personal.csv file I have the following information:
Center_code Name City_code Men Women
800 SCHOOL NUMBER ONE 8000 28 31
801 SCHOOL NUMBER TWO 8010 33 9
802 INSTITUTE GALCERAN PINE 8020 16 22
803 EASD PINE 8030 43 17
804 SCHOOL NUMBER THREE 8040 14 5
805 INSTITUTE CAN CLOS 8050 6 18
806 ESCRBC CAT 8060 5 6
807 SCHOOL NUMBER FOUR 8070 9 8
808 EASD TOWER 8080 5 11
... ........... .... ... ...
I have the following awk command to calculate the sum of the values of a column of a csv file:
awk '{SUM+=$4}END{print SUM}' file2
With the following command I calculate the average of the column:
awk 'BEGIN{s=0;}{s=s+$4;}END{print s/NR;}' file2
And with this last command I calculate the standard deviation of the same column:
awk '{sum+=$4; sumsq+=$4*$4} END {print sqrt(sumsq/NR - (sum/NR)^2)}' file2
I need to implement the above three expressions in a single awk script (a file that starts with the shebang #!/usr/bin/awk -f, not a command-line one-liner), saving each result in a separate variable.
The output should be of the following type (values are made up):
MEN
Total men = 134(24%) --- Average men = 2,978 --- Standard deviation men = 2,266
WOMEN
Total women = 421(75%) --- Average women = 9,356 --- Standard deviation women = 7,874
The calculations must use only those rows of personal.csv whose school type matches a value passed to the script. For example, if we pass EASD as the school type, it should do the calculations only with the following rows:
803 EASD PINE 8030 43 17
808 EASD TOWER 8080 5 11
How can I do it? Thanks in advance.
Putting the commands together consists of simply grouping them:
In a single line:
Note that it is not necessary to set the initial value of the sums: awk treats a variable that has not been assigned yet as 0 the first time it is used in arithmetic.
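A minimal sketch of such a grouping, assuming (as in the question's commands) that the value of interest is in column 4 of a whitespace-separated file. The sample rows and the variable names `stats`, `oneline`, and `mean` are made up for illustration.

```shell
data=$(mktemp)
# Hypothetical sample: two rows, value of interest in column 4.
printf '%s\n' 'a b c 43' 'd e f 5' > "$data"

# sum and sumsq are never initialized: awk treats them as 0 on first use.
stats=$(awk '
    { sum += $4; sumsq += $4 * $4 }          # one pass over the rows
    END {
        mean = sum / NR
        sd   = sqrt(sumsq / NR - mean ^ 2)   # population standard deviation
        print sum, mean, sd
    }
' "$data")
echo "$stats"

# The same program written on a single line:
oneline=$(awk '{sum+=$4; sumsq+=$4*$4} END {mean=sum/NR; print sum, mean, sqrt(sumsq/NR - mean^2)}' "$data")
echo "$oneline"
rm -f "$data"
```

Both forms read the file once and derive the total, the average, and the standard deviation from the same two accumulators, so the three original commands do not need three passes.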
To turn it into a script, paste the program into a file, save it with an .awk extension, and run it with
awk -f archivo.awk fichero
where archivo.awk is the script and fichero is the input file. To "tune" the output you can use printf, for example
printf "el resultado es: %f\n", sum
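As a small illustration of that kind of tuning, the sketch below formats three values with printf. The numbers are hypothetical placeholders, and `LC_ALL=C` is set only so that the decimal separator is predictable in this example.

```shell
formatted=$(LC_ALL=C awk 'BEGIN {
    sum = 48; mean = 24; sd = 19     # hypothetical values, not real results
    # %d prints an integer, %.3f a number with three decimals
    printf "Total = %d --- Average = %.3f --- Standard deviation = %.3f\n", sum, mean, sd
}')
echo "$formatted"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends with `\n`.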

My proposal focuses on the presentation of both the code and the data. Although fedorqui's answer addresses the asker's problem of combining multiple awk statements, what I focus on is the presentation. These examples assume that the input file is, literally, a csv with its first line as the header. The code is:
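A sketch of what such a script could look like, not necessarily the code originally posted. It assumes a comma-separated file with a header line, that the school type is the first word of the name column, and that the type is passed in a variable named `type` (a hypothetical name). The sample file contains only a few of the question's rows.

```shell
workdir=$(mktemp -d)

# Sample data: a comma-separated copy of some rows from the question (assumed layout).
cat > "$workdir/personal.csv" <<'EOF'
Center_code,Name,City_code,Men,Women
800,SCHOOL NUMBER ONE,8000,28,31
803,EASD PINE,8030,43,17
805,INSTITUTE CAN CLOS,8050,6,18
808,EASD TOWER,8080,5,11
EOF

cat > "$workdir/calculos.awk" <<'EOF'
#!/usr/bin/awk -f
# "type" is a hypothetical variable name, set with: -v type=EASD
BEGIN { FS = "," }
NR == 1 { next }                         # skip the header line
index($2, type) == 1 {                   # keep rows whose name starts with the type
    n++
    men   += $4; men_sq   += $4 * $4
    women += $5; women_sq += $5 * $5
}
END {
    if (n == 0) { print "No rows match type: " type; exit 1 }
    total = men + women
    report("men", men, men_sq)
    report("women", women, women_sq)
}
# Prints one block; avg, var and pct are local variables.
function report(label, sum, sumsq,    avg, var, pct) {
    avg = sum / n
    var = sumsq / n - avg * avg          # population variance
    if (var < 0) var = 0                 # guard against rounding error
    pct = (total > 0) ? 100 * sum / total : 0
    print toupper(label)
    printf "Total %s = %d(%.0f%%) --- Average %s = %.3f --- Standard deviation %s = %.3f\n", label, sum, pct, label, avg, label, sqrt(var)
}
EOF

report=$(LC_ALL=C awk -f "$workdir/calculos.awk" -v type=EASD "$workdir/personal.csv")
echo "$report"
```

With the two EASD rows above, this prints:

```
MEN
Total men = 48(63%) --- Average men = 24.000 --- Standard deviation men = 19.000
WOMEN
Total women = 28(37%) --- Average women = 14.000 --- Standard deviation women = 3.000
```

Each result is kept in its own variable (`avg`, `var`, `pct`) before printing, and the percentages are each group's share of the combined total.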
The way to run the code (let's say we named it calculos.awk) is:
Or (per the comments):
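For reference, one common way to run an awk script file and pass it a value is sketched below. The script here is a tiny stand-in for calculos.awk that only echoes the value it receives, and the variable name `type` is an assumption, not necessarily the one used in the original answer.

```shell
workdir=$(mktemp -d)
# Stand-in for calculos.awk (hypothetical): prints the value it was given.
printf '%s\n' '#!/usr/bin/awk -f' 'BEGIN { print "type=" type }' > "$workdir/calculos.awk"

# -f names the program file, -v assigns a variable before the program starts.
via_f=$(awk -f "$workdir/calculos.awk" -v type=EASD)
echo "$via_f"
```

With the real script, the data file goes last, as in awk -f calculos.awk -v type=EASD personal.csv. Alternatively, after chmod +x calculos.awk, the shebang line lets you call ./calculos.awk -v type=EASD personal.csv directly, assuming awk is installed at /usr/bin/awk.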
And the output is very similar to the desired one:

There is another option: datamash, a tool that I have only known about for a year, can give you much of what you want. For example:
Resulting in something like: