I have an XML file with many records. Here is the first entry:
<meps>
<mep>
<fullName>Magdalena ADAMOWICZ</fullName>
<country>Poland</country>
<politicalGroup>
Group of the European People's Party (Christian Democrats)
</politicalGroup>
<id>197490</id>
<nationalPoliticalGroup>Independent</nationalPoliticalGroup>
</mep>
<mep>
I would like to use sed to extract the text between the "politicalGroup" tags for every record, that is, "Group of the European People's Party (Christian Democrats)" in the example above.
In the post "Using sed to extract text between 2 tags" I saw the following command for extracting text between two tags, which I adapted to my file:
sed -n 's:.*<politicalGroup>\(.*\)</politicalGroup>.*:\1:p' fichero
With this command I managed to print the desired field, but only for one record, whereas I would like to extract it from every record in the file. Can the command be modified to extract all the text found between the two tags?
Thank you very much.
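A minimal sketch of one possible cause, assuming the file is laid out like the sample above, with the politicalGroup text on its own line between the tags: sed processes its input one line at a time, so a pattern that needs the opening and closing tag on the same line never matches such a record. The temporary sample file and the "Example ..." values below are hypothetical. Joining the lines first is only one workaround, and a real XML parser would be more robust.

```shell
# Hypothetical sample file; the real input would be "fichero".
sample=$(mktemp)
cat > "$sample" <<'EOF'
<meps>
<mep>
<fullName>Magdalena ADAMOWICZ</fullName>
<politicalGroup>
Group of the European People's Party (Christian Democrats)
</politicalGroup>
<nationalPoliticalGroup>Independent</nationalPoliticalGroup>
</mep>
<mep>
<fullName>Example MEMBER</fullName>
<politicalGroup>Example Group</politicalGroup>
<nationalPoliticalGroup>Example Party</nationalPoliticalGroup>
</mep>
</meps>
EOF

# The original command: it only finds the element whose two tags share a line.
one_line=$(sed -n 's:.*<politicalGroup>\(.*\)</politicalGroup>.*:\1:p' "$sample")

# Join all lines with spaces, start a new line at every "<", then keep the
# lines that begin with "politicalGroup>" and trim the surrounding spaces.
all=$(tr '\n' ' ' < "$sample" | tr '<' '\n' |
  sed -n -e 's/ *$//' -e 's/^politicalGroup> *//p')

printf 'original command:\n%s\n' "$one_line"
printf 'after joining lines:\n%s\n' "$all"
rm -f "$sample"
```

On this sample the original command prints only the one-line record, while the second pipeline prints both values.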
After investigating a little more, I found a different pipeline that worked for me:
cat fichero | tr "<" "\n" | sed -n '/politicalGroup>./p' | sed 's/politicalGroup>//' | sort | uniq -c | sort -nr
This prints all the records, and the sort and uniq commands let me count how many times each value is repeated in the file.
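As a rough illustration of the counting step at the end of the pipeline above, here is a small sketch that uses made-up group names (not real data from the file) in place of the extracted values. The variable names are only for illustration.

```shell
# Hypothetical values, one per line, standing in for the extracted
# politicalGroup text.
groups='Group A
Group B
Group A
Group C
Group A
Group B'

# sort places equal lines next to each other, uniq -c collapses each run
# of equal lines and prefixes it with its count, and sort -nr orders the
# result by that count, highest first.
counts=$(printf '%s\n' "$groups" | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"
```

The first sort matters because uniq only merges adjacent duplicates; without it, repeated values that are scattered through the file would be counted separately.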