I have a CSV file with suicide rates over a range of years. I am writing an awk script that, given the name of a country and a year, returns the lines that match both parameters. However, I am having trouble getting it to work.
The file has the following structure:
country,year,sex,age,suicides_no,population,suicides/100k pop,country-year,HDI for year, gdp_for_year ($) ,gdp_per_capita ($),generation
Albania,1987,male,15-24 years,21,312900,6.71,Albania1987,,2156624900,796,Generation X
Albania,1987,male,35-54 years,16,308000,5.19,Albania1987,,2156624900,796,Silent
Albania,1987,female,15-24 years,14,289700,4.83,Albania1987,,2156624900,796,Generation X
Albania,1987,male,75+ years,1,21800,4.59,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,male,25-34 years,9,274300,3.28,Albania1987,,2156624900,796,Boomers
Albania,1987,female,75+ years,1,35600,2.81,Albania1987,,2156624900,796,G.I. Generation
Albania,1987,female,35-54 years,6,278800,2.15,Albania1987,,2156624900,796,Silent
Albania,1987,female,25-34 years,4,257200,1.56,Albania1987,,2156624900,796,Boomers
Albania,1987,male,55-74 years,1,137500,0.73,Albania1987,,2156624900,796,G.I. Generation
Those are the first few lines of the file. The awk script I have written is:
#!/usr/bin/awk -f
BEGIN{
FS=","
RS="\n"
}
$0 ~ country
$0 ~ year
However, when I run the script, for example as ./script.awk country=Albania year=2001 fichero, the output contains all the records for Albania regardless of the year, plus the records of every other country for 2001, when what I want is only the Albanian records for 2001.
Does anyone know where in the script I'm making a mistake?
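You have a CSV, so FS must be the comma. The record separator is the line break, so you don't need to touch RS, since that is already its default value.
In your script each pattern sits on its own line, so awk treats them as two independent rules and prints a record whenever either one matches. That is why you get every Albanian record plus every 2001 record. The solution is a single rule that requires both conditions at once, comparing the first field with the country and the second field with the year.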
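A minimal sketch of such a one-liner, assuming the variables are passed with -v and the field separator with -F. The file name sample.csv is hypothetical, and its rows are made-up placeholders that keep only the first four columns of the real layout:

```shell
tmp=$(mktemp -d)

# Hypothetical sample data: placeholder rows, first four columns only.
cat > "$tmp/sample.csv" <<'EOF'
country,year,sex,age
Albania,1987,male,15-24 years
Albania,1987,female,15-24 years
Albania,2001,male,15-24 years
Austria,1987,male,15-24 years
EOF

# One rule, both conditions joined with &&; no action means "print the record".
result=$(awk -F, -v country="Albania" -v year="1987" \
    '$1 == country && $2 == year' "$tmp/sample.csv")

printf '%s\n' "$result"
```

Only the two Albania/1987 rows are printed, because a record has to satisfy both comparisons within the same rule.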
If we move it to a script file, it has the same shape, with the field separator set in a BEGIN block.
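As a script file, a sketch along those lines might look like the following. The name filter.awk and the sample rows are assumptions for illustration; the variables are passed as country=... year=... operands before the file name, the same way the question invokes its script:

```shell
tmp=$(mktemp -d)

# Hypothetical script file; the heredoc is quoted so $1 and $2 reach awk untouched.
cat > "$tmp/filter.awk" <<'EOF'
#!/usr/bin/awk -f
# CSV input: fields are separated by commas. RS keeps its default (newline).
BEGIN { FS = "," }
# Print only the records where both fields match the given variables.
$1 == country && $2 == year
EOF

# Made-up placeholder rows, first four columns only.
cat > "$tmp/sample.csv" <<'EOF'
country,year,sex,age
Albania,1987,male,15-24 years
Albania,2001,male,15-24 years
Albania,2001,female,15-24 years
Austria,2001,male,15-24 years
EOF

# Assignments given as operands are applied before the file that follows them is read.
result=$(awk -f "$tmp/filter.awk" country=Albania year=2001 "$tmp/sample.csv")

printf '%s\n' "$result"
```

Variables assigned this way are not yet set inside BEGIN, which is fine here because only the main rule uses them.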
Note that in your case writing $0 ~ country year works because awk concatenates the two variables and checks whether the line contains the result anywhere. It matches here only because the file happens to have a country-year column (Albania1987, for example) that holds exactly that concatenation. This is somewhat weak: if you passed the year "19", every record of the 19XX type for that country would match. So if you want an exact filter, it is better to check field by field with country == $1 and year == $2.
OK, it was as simple as putting the country and year variables on the same line, as in $0 ~ country year.
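The one-line rule, and the caveat about partial years raised in the answer, can be checked with a short sketch. It uses the header and three records copied from the question; the file name sample.csv and the variable names are assumptions, and the parentheses around the concatenation are added only for readability:

```shell
tmp=$(mktemp -d)

# Header plus three records taken from the question.
cat > "$tmp/sample.csv" <<'EOF'
country,year,sex,age,suicides_no,population,suicides/100k pop,country-year,HDI for year, gdp_for_year ($) ,gdp_per_capita ($),generation
Albania,1987,male,15-24 years,21,312900,6.71,Albania1987,,2156624900,796,Generation X
Albania,1987,male,35-54 years,16,308000,5.19,Albania1987,,2156624900,796,Silent
Albania,1987,female,15-24 years,14,289700,4.83,Albania1987,,2156624900,796,Generation X
EOF

# Regex match on the concatenation "Albania1987": it hits the country-year column.
loose_full=$(awk -v country=Albania -v year=1987 \
    '$0 ~ (country year) { n++ } END { print n + 0 }' "$tmp/sample.csv")

# Same rule with the partial year "19": "Albania19" still matches every record.
loose_partial=$(awk -v country=Albania -v year=19 \
    '$0 ~ (country year) { n++ } END { print n + 0 }' "$tmp/sample.csv")

# Field-by-field comparison rejects the partial year.
exact_partial=$(awk -F, -v country=Albania -v year=19 \
    '$1 == country && $2 == year { n++ } END { print n + 0 }' "$tmp/sample.csv")

echo "$loose_full $loose_partial $exact_partial"
```

The regex version counts the same three records for 1987 and for 19, while the field comparison counts none for 19, which is the difference between a substring match and an exact filter.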
Sorry for the inconvenience!!