I have an .sdf file describing molecules. Here is an excerpt from it to give some context:
$$$$
> <FORMULA>
C7H11N3O2
> <MOLECULAR_WEIGHT>
169.1811
> <EXACT_MASS>
169.085126611
> <JCHEM_ACCEPTOR_COUNT>
4
> <JCHEM_AVERAGE_POLARIZABILITY>
17.110928254345183
$$$$
> <FORMULA>
C3H10N2
> <MOLECULAR_WEIGHT>
74.1249
> <EXACT_MASS>
74.08439833
> <JCHEM_ACCEPTOR_COUNT>
2
> <JCHEM_AVERAGE_POLARIZABILITY>
9.059383875573541
> <JCHEM_BIOAVAILABILITY>
1
> <JCHEM_DONOR_COUNT>
2
> <JCHEM_FORMAL_CHARGE>
0
> <JCHEM_GHOSE_FILTER>
0
> <JCHEM_IUPAC>
propane-1,3-diamine
$$$$
Well, I would like to obtain only what is in the FORMULA field, that is,
C3H10N2
C7H11N3O2
I've tried several things and so far the closest thing has been
awk '/FORMULA/, /MOLECULAR/' fichero | grep -v FORMULA | grep -v MOLECULAR | grep ^[A-Z]
This prints what lies between the FORMULA and MOLECULAR fields. However, looking at the terminal output, I see some lines I do not want, because the FORMULA field is not always followed by the MOLECULAR field. Is there another way to use awk to get the desired output?
It's quicker with sed: look for the pattern and print the line that follows it.
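A minimal sketch of such a sed command, assuming the value always sits on the line directly after the `> <FORMULA>` header. The temporary sample file and the `^> <FORMULA>` anchor are assumptions made for illustration; on the real data you would pass `fichero` instead.

```shell
# Hypothetical sample, cut down from the data in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
$$$$
> <FORMULA>
C7H11N3O2
> <MOLECULAR_WEIGHT>
169.1811
$$$$
> <FORMULA>
C3H10N2
> <EXACT_MASS>
74.08439833
$$$$
EOF

# -n turns off automatic printing; on a header match, n loads the next
# line into the pattern space and p prints it.
sed_out=$(sed -n '/^> <FORMULA>/{n;p;}' "$sample")
printf '%s\n' "$sed_out"
rm -f "$sample"
```

The formulas come out in file order, one per line.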
With awk, you can set a flag when the pattern matches and use it to print the next line.
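The flag idea can be sketched as follows. The variable name `found`, the temporary sample file, and the `^> <FORMULA>` anchor are assumptions chosen for illustration, not necessarily what the original answer used.

```shell
# Hypothetical sample, cut down from the data in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
$$$$
> <FORMULA>
C7H11N3O2
> <EXACT_MASS>
169.085126611
$$$$
> <FORMULA>
C3H10N2
> <MOLECULAR_WEIGHT>
74.1249
$$$$
EOF

# Rule 1 runs first: if the previous line was the header, print this line
# and clear the flag. Rule 2 then sets the flag when the header is seen.
awk_out=$(awk 'found { print; found = 0 } /^> <FORMULA>/ { found = 1 }' "$sample")
printf '%s\n' "$awk_out"
rm -f "$sample"
```

Because the flag only lives for one line, it does not matter which field follows FORMULA, which is what broke the range pattern in the question.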
For more and better approaches, see "Printing with sed or awk a line following a matching pattern", answered by the great Ed Morton.
The solution:
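One way to write it, as a sketch consistent with the explanation that follows. The temporary sample file, the `^> <FORMULA>` anchor, and the check on the return value of `getline` are assumptions added for illustration; on the real data you would pass `fichero` instead.

```shell
# Hypothetical sample, cut down from the data in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
$$$$
> <FORMULA>
C7H11N3O2
> <MOLECULAR_WEIGHT>
169.1811
$$$$
> <FORMULA>
C3H10N2
> <JCHEM_ACCEPTOR_COUNT>
2
$$$$
EOF

# On a header match, getline replaces $0 with the next line and print
# outputs it. The "> 0" check skips printing if the header is the last line.
getline_out=$(awk '/^> <FORMULA>/ { if ((getline) > 0) print }' "$sample")
printf '%s\n' "$getline_out"
rm -f "$sample"
```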
The explanation:
Assuming that the formula is always on the line following the one in which the <FORMULA> field appears, just search for that field and, when the match occurs, execute getline, which reads the next line. Then print prints it.