I have a CSV file with two columns: one with the names of MEPs and one with their IDs. The identifiers have between 3 and 6 digits, and I would like to print on screen the parliamentarians whose identifier has exactly 4 digits, together with that identifier. To make this clearer, here are the first rows of the CSV, printed with the following command:
cat europarlamentarios.csv | cut -d "," -f1,2 | head -10
Uma AALTONEN,23752
Damien ABAD,96850
Claudette ABELA BALDACCHINO,118860
Jean-Pierre ABELIN,1829
Victor ABENS,1802
William ABITBOL,4361
Carlos ABOIM INGLEZ,1680
Gérard d'ABOVILLE,2202
Lars ADAKTUSSON,124990
Gordon J. ADAM,1427
As you can see, the second value is the identifier, and the identifiers have different numbers of digits. I would like to print the names of the MEPs together with their identifiers only when the identifier has exactly 4 digits. I tried the cut command, but could not work out how to apply it here, since the lines all have different lengths. I also thought it might be necessary to use a control structure such as if/else, but that is still beyond me since I am just starting to program, and I suspect there must be a way to get the result without it.
If anyone has an idea of how to do it and can lend me a hand, I would appreciate it!
The solution is to use egrep to find the ones with that pattern
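A minimal sketch of such a command, assuming the ID is always the last field on the line and consists of digits only. Here grep -E is the equivalent spelling of egrep; the function name four_digit_ids and the scratch file sample.csv are made up for illustration, and the rows are taken from the question.

```shell
# Keep only the lines whose last comma-separated field is exactly four digits.
four_digit_ids() {
    grep -E ',[0-9]{4}$' "$@"
}

# A few rows from the question, written to a scratch file (the name is arbitrary).
printf '%s\n' \
    'Uma AALTONEN,23752' \
    'Jean-Pierre ABELIN,1829' \
    'Lars ADAKTUSSON,124990' \
    'Gordon J. ADAM,1427' > sample.csv

four_digit_ids sample.csv
# Jean-Pierre ABELIN,1829
# Gordon J. ADAM,1427
```

On the real data the same pattern would be applied directly, for example with grep -E ',[0-9]{4}$' europarlamentarios.csv.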
It searches for lines where a comma is followed by exactly 4 characters in the second field. In a regular expression, the values in braces give the minimum and maximum number of repetitions of the preceding element; a single number stands for both, that is, an exact count.
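Note the '$' character at the end, which anchors the match to the end of each line. Keep this in mind for future cases where something else follows the field you are checking (say, PEPITO,1234,OtraCosa), because the ID is then no longer at the end of the line.
For the ten sample rows shown in the question, the output is:
Jean-Pierre ABELIN,1829
Victor ABENS,1802
William ABITBOL,4361
Carlos ABOIM INGLEZ,1680
Gérard d'ABOVILLE,2202
Gordon J. ADAM,1427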
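The caveat about trailing fields can be illustrated with a small sketch. The three-column rows, the file name three_cols.csv and the function name second_field_four_digits are all hypothetical; the idea is simply to anchor on "comma or end of line" after the ID instead of on the end of the line alone.

```shell
# Hypothetical three-column rows: the ID is no longer the last field.
printf '%s\n' \
    'PEPITO,1234,OtraCosa' \
    'JUANITO,56789,OtraCosa' > three_cols.csv

# The end-anchored pattern finds nothing, because no line ends in a comma
# followed by four digits. grep -c prints the number of matching lines.
grep -E -c ',[0-9]{4}$' three_cols.csv || true   # prints 0

# Match a four-digit second field that is followed by a comma or the line end.
second_field_four_digits() {
    grep -E '^[^,]*,[0-9]{4}(,|$)' "$@"
}

second_field_four_digits three_cols.csv
# PEPITO,1234,OtraCosa
```

The leading '^[^,]*,' pins the match to the second field, so a four-digit value in a later column would not be picked up by mistake.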
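Using only bash, you could do something like this:
Comments:
- We read the file line by line, splitting each line at the comma (IFS=,).
- "${#col2}" expands to the length of the second column.
- If that length is 4 (if [ "${#col2}" -eq 4 ]), we print the line.

I propose a one-command answer using the grep extension that allows Perl-like regular expressions, which, on the ten sample rows in the question, prints only the six rows with a four-digit ID.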
In this case, the regular expression '.*?,\d{4}$' means:
- '.*?': any characters, matched non-greedily
- ',': followed by a comma
- '\d{4}': immediately followed by four digits
- '$': then the end of the line
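The last two answers can be sketched side by side. This is only an illustration under some assumptions: the function names filter_with_bash and filter_with_grep and the file sample_rows.csv are made up, the variable name col1 is a guess (only col2 appears in the comments above), and the -P option requires a grep built with Perl-compatible regular expression support, as GNU grep usually is.

```shell
# A few rows from the question, written to a scratch file (the name is arbitrary).
printf '%s\n' \
    'Damien ABAD,96850' \
    'Victor ABENS,1802' \
    'William ABITBOL,4361' \
    'Lars ADAKTUSSON,124990' > sample_rows.csv

# Shell-only variant: split each line at the comma and test the length of the ID.
# Note that this checks the length only, not that the characters are digits.
filter_with_bash() {
    local col1 col2
    while IFS=, read -r col1 col2; do
        if [ "${#col2}" -eq 4 ]; then
            printf '%s,%s\n' "$col1" "$col2"
        fi
    done < "$1"
}

# grep variant: -P enables Perl-compatible regular expressions, so \d and
# the non-greedy .*? from the explanation above are available.
filter_with_grep() {
    grep -P '.*?,\d{4}$' "$1"
}

filter_with_bash sample_rows.csv
filter_with_grep sample_rows.csv
# Each call prints:
# Victor ABENS,1802
# William ABITBOL,4361
```

The two variants agree on this data. They would differ on a row whose second field has four non-digit characters, which the length test accepts and '\d{4}' rejects.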