I have a CSV file and I would like to extract the data that meet two conditions:
- in one column, the date is in October, which appears as "/10/";
- in another column, the value starts with "08".
So far I can only get the October data with:
cat fichero.csv | cut -d "," -f2,3 | grep /10/
What I can't figure out is how to add, on the same command line, another regular expression that selects the values in the second column that start with 08.
I know that to extract data starting with a certain pattern I could use grep ^08, but I don't know how to chain it to the grep used above.
Here is part of the output I get so far:
$ cat datosCovid.csv | cut -d "," -f2,3 | grep /10/
01/10/2020,8040138
01/10/2020,43007051
02/10/2020,8000271
02/10/2020,8000347
01/10/2020,8000384
01/10/2020,8000499
02/10/2020,8000578
02/10/2020,8000918
02/10/2020,8001029
02/10/2020,8001030
01/10/2020,8001133
01/10/2020,8001157
02/10/2020,8001297
01/10/2020,8001731
01/10/2020,8001777
01/10/2020,8001807
01/10/2020,8001923
02/10/2020,8002071
01/10/2020,8002186
01/10/2020,8002198
01/10/2020,8002216
01/10/2020,8002368
01/10/2020,8002587
02/10/2020,8002666
02/10/2020,8003695
02/10/2020,8003816
01/10/2020,8003907
01/10/2020,8004031
01/10/2020,8004225
01/10/2020,8004717
01/10/2020,8004869
It is a large file, so I show only part of it. As you can see, I have isolated the month of October. Now I would like to keep only the codes after the date that start with 08.
I understand that you have a file with fields separated by commas. In short, a CSV.
You want to display the 2nd and 3rd columns of the lines where:
- the 2nd column contains /10/, and
- the 3rd column starts with 08.
All of this can be expressed in Awk like this:
awk 'BEGIN{FS=OFS=","} $2 ~ /\/10\// && $3 ~ /^08/ {print $2,$3}' fichero.csv
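To try the Awk program without the real file, here is a self-contained sketch that runs it on a few made-up rows. The first column and all the values are hypothetical; only the positions of the date (2nd column) and the code (3rd column) follow the question.

```shell
# Hypothetical sample rows: column 1 is made up, column 2 is the date,
# column 3 is the code.
sample='a,01/10/2020,08040138
b,01/10/2020,43007051
c,02/09/2020,08000271
d,02/10/2020,08000347'

# FS="," splits each input line on commas; OFS="," joins the printed fields.
# A line is printed only if field 2 contains /10/ and field 3 starts with 08.
result=$(printf '%s\n' "$sample" |
  awk 'BEGIN{FS=OFS=","} $2 ~ /\/10\// && $3 ~ /^08/ {print $2,$3}')

printf '%s\n' "$result"
```

Only rows "a" and "d" pass: row "b" has a code starting with 43, and row "c" is dated September.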
Breaking the program down:
- BEGIN{FS=OFS=","} defines the field separator (FS) and the separator used for output by print (OFS).
- conditions {print $2,$3}: if the conditions are met, print the 2nd and 3rd columns.
- $2 ~ /\/10\// && $3 ~ /^08/: these are the conditions, which check regular expressions: the 2nd field contains /10/ and the 3rd begins with 08.
As I explain in the comments, the Awk version is powerful and useful, but maybe a bit tricky if you're just starting out with text processing. Therefore, I also recommend a version that uses grep twice in a row:
The first extracts the lines of the form "NN/10/", that is, those in which the column has /10/ after one or two digits.
The second extracts, from among those lines, the ones with "08" at the beginning.
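The exact two-grep command is not shown here, so this is only a sketch of one possible reading of that description. It assumes the two columns are cut first (as in the question), that the first grep anchors one or two digits plus /10/ at the start of the line, and that the second grep keeps lines whose code, right after the only comma, begins with 08. The sample rows are hypothetical.

```shell
# Hypothetical rows: column 2 is the date, column 3 is the code.
sample='a,01/10/2020,08040138
b,1/10/2020,08000271
c,02/09/2020,08000347
d,02/10/2020,43007051'

# Keep columns 2 and 3, then:
#  1st grep: one or two digits at the start of the line, followed by /10/
#  2nd grep: the code (after the single comma) starts with 08
result=$(printf '%s\n' "$sample" |
  cut -d "," -f2,3 |
  grep -E '^[0-9]{1,2}/10/' |
  grep -E ',08')

printf '%s\n' "$result"
```

Row "c" is dropped by the first grep (month 09) and row "d" by the second (code 43...).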
And how about chaining a second grep?
It would basically extend the command you had already prepared:
cat fichero.csv | cut -d "," -f2,3 | grep /10/ | grep ",08"
Since you have previously output only two columns, there will be only one comma, and therefore you can filter on ",08".
The danger of filtering by an expression as simple as ",08" is that the same pattern may appear elsewhere (for example, if you have several columns), but in this case there should be no problem because there is only one comma.
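A small self-contained sketch of that pitfall, using hypothetical rows where row "c" has an extra 4th column starting with 08. After cut leaves a single comma, ",08" can only match the code; on the full line it also matches the extra column.

```shell
# Hypothetical rows: column 2 is the date, column 3 the code,
# column 4 an extra field that happens to start with 08 in row "c".
sample='a,01/10/2020,08040138,x
b,02/10/2020,43007051,x
c,02/10/2020,43000271,0812'

# The question's pipeline extended with a second grep on ",08".
with_cut=$(printf '%s\n' "$sample" | cut -d "," -f2,3 | grep /10/ | grep ",08")

# Same filters without cut: ",08" now also matches column 4 of row "c".
without_cut=$(printf '%s\n' "$sample" | grep /10/ | grep ",08")

printf 'with cut:\n%s\n' "$with_cut"
printf 'without cut:\n%s\n' "$without_cut"
```

With cut only row "a" survives; without it, row "c" slips through even though its code starts with 43.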