I need to extract data from natural-language vehicle sales phrases and build a list of dictionaries with two keys each, like this:
[
{'vehiculo': 'Car', 'Cantidad': 1},
{'vehiculo': 'Motorbike', 'Cantidad': 1}
]
I have almost everything done except what should be the easiest part: extracting the labels from the output of the NLTK RegexpParser grammar.
This is what I have so far, using the input phrase "I sold a car and a motorbike":
1.- Segment the text into sentences and get:
['\nI sold a car and a motorbike']
2.- Tokenize:
['I', 'sold', 'a', 'car', 'and', 'a', 'motorbike']
3.- POS tagger (part-of-speech tagging):
[('I', 'PRP'), ('sold', 'VBD'), ('a', 'DT'), ('car', 'NN'), ('and', 'CC'), ('a', 'DT'), ('motorbike', 'NN')]
4.- RegexpParser with the following grammar:
grammar = r'''
Vehiculo: {<CD>*<NN>+}
          {<JJ>*<NN>+}
          {<CD>*<NN><IN>*<NN>+}
Cantidad: {<JJ>}
          {<CD>}
          {<DT>}
'''
And I get:
Parsed Sentence = (S
  I/PRP
  sold/VBD
  (Cantidad a/DT)
  (Vehiculo car/NN)
  and/CC
  (Cantidad a/DT)
  (Vehiculo motorbike/NN))
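As far as I understand, the value returned by RegexpParser.parse is an nltk.Tree rather than a string, so each chunk should be reachable as a subtree with a label and a list of (word, tag) leaves. The sketch below only rebuilds the tree from the printed form above so that it runs on its own; the names printed, tree and chunks are illustrative, not part of my actual code.

```python
from nltk import Tree
from nltk.tag import str2tuple

# The printed parse from step 4, copied as text.
printed = """(S
  I/PRP
  sold/VBD
  (Cantidad a/DT)
  (Vehiculo car/NN)
  and/CC
  (Cantidad a/DT)
  (Vehiculo motorbike/NN))"""

# Rebuild a Tree whose leaves are (word, tag) tuples,
# the same leaf shape RegexpParser.parse produces from tagged input.
tree = Tree.fromstring(printed, read_leaf=str2tuple)

# Collect every chunk below the root as (label, leaves).
chunks = [
    (sub.label(), sub.leaves())
    for sub in tree.subtrees(lambda t: t.label() != "S")
]

for label, leaves in chunks:
    print(label, leaves)
```

In the real pipeline the Tree.fromstring step would not be needed, since the parser already returns the tree object.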
My question is: how can I build dictionaries like the following from the labels and words in the parsed sentence above, using some NLTK call rather than searching the printed string by hand?
[
{'vehiculo': 'Car', 'Cantidad': 1},
{'vehiculo': 'Motorbike', 'Cantidad': 1}
]
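One direction that might work, as a rough sketch and not a confirmed solution: walk the top-level children of the tree, remember the most recent Cantidad chunk, and emit a dictionary each time a Vehiculo chunk appears. The helper names to_number and extract_sales, the small WORD_TO_NUMBER lookup, and the default quantity of 1 are all assumptions made for illustration; the tree is built by hand here only to keep the example self-contained.

```python
from nltk import Tree

# Hypothetical lookup for quantity words; extend as needed.
WORD_TO_NUMBER = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}


def to_number(word):
    """Return an int for a quantity token, or None if it is not recognised."""
    word = word.lower()
    if word.isdigit():
        return int(word)
    return WORD_TO_NUMBER.get(word)


def extract_sales(tree):
    """Pair each Vehiculo chunk with the closest preceding quantity."""
    results = []
    pending = None  # quantity seen since the last vehicle
    for node in tree:
        if not isinstance(node, Tree):
            continue  # plain (word, tag) leaf outside any chunk
        if node.label() == "Cantidad":
            for word, _tag in node.leaves():
                number = to_number(word)
                if number is not None:
                    pending = number
        elif node.label() == "Vehiculo":
            names = []
            for word, tag in node.leaves():
                if tag == "CD":
                    # The rule {<CD>*<NN>+} can absorb a leading number.
                    number = to_number(word)
                    if number is not None:
                        pending = number
                else:
                    names.append(word)
            results.append({
                "vehiculo": " ".join(names).capitalize(),
                # Assumption: a vehicle with no quantity counts as 1.
                "Cantidad": pending if pending is not None else 1,
            })
            pending = None
    return results


# Same shape as the parsed sentence shown above, built by hand.
parsed = Tree("S", [
    ("I", "PRP"),
    ("sold", "VBD"),
    Tree("Cantidad", [("a", "DT")]),
    Tree("Vehiculo", [("car", "NN")]),
    ("and", "CC"),
    Tree("Cantidad", [("a", "DT")]),
    Tree("Vehiculo", [("motorbike", "NN")]),
])

sales = extract_sales(parsed)
print(sales)
```

Iterating over direct children rather than all subtrees keeps the left-to-right order of the sentence, which is what lets a quantity be matched with the vehicle that follows it.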
Thank you and regards,