I need to extract data from natural-language phrases about vehicle sales and obtain an array of two-element dictionaries, like this:
[
{vehiculo:'Car', Cantidad: 1},
{vehiculo:'Motorbike', Cantidad: 1}
]
I have almost everything done except the easiest part, which is extracting the labels defined in the RegexpParser grammar.
This is what I have so far, with the input phrase "I sold a car and a motorbike":
1.- Segment the phrase and get:
['\nI sold a car and a motorbike']
2.- Tokenized:
['I', 'sold', 'a', 'car', 'and', 'a', 'motorbike']
3.- POS tagger morphological analysis:
[('I', 'PRP'), ('sold', 'VBD'), ('a', 'DT'), ('car', 'NN'), ('and', 'CC'), ('a', 'DT'), ('motorbike', 'NN')]
4.- RegexpParser with the following grammar:
grammar = r'''
Vehiculo: {<CD>*<NN>+}
{<JJ>*<NN>+}
{<CD>*<NN><IN>*<NN>+}
Cantidad: {<JJ>}
{<CD>}
{<DT>}
'''
And I get:
Parsed Sentence = (S
I/PRP
sold/VBD
(Cantidad a/DT)
(Vehiculo car/NN)
and/CC
(Cantidad a/DT)
(Vehiculo motorbike/NN))
My question is how I can obtain dictionaries like the following by extracting the labels and data from the parsed result above with some command, without having to search the text of the string manually:
[
{vehiculo:'Car', Cantidad: 1},
{vehiculo:'Motorbike', Cantidad: 1}
]
Thank you and regards,
The result of the RegexpParser is a Tree, and as such it has methods to loop through it, flatten it, and perform many other operations on it. Without knowing exactly what structure all your example sentences can have, or whether a sentence can contain quantities other than "a", etc., it is impossible to give a general solution. In any case, here is an approach that works for this case, which you can adapt to your needs.
So that the code is reproducible for everyone, it should start with all the necessary imports and the previous steps of the analysis. If you just print the result of the parser, what you get is its representation as a string, that is, the parenthesized tree shown in the question.
But resultado is actually of type Tree. That allows us, at a minimum, to iterate through its elements and act accordingly. For example, if the element is a "leaf" (a terminal node), it will be a tuple whose element [0] is the word and whose element [1] is its part of speech. If, on the other hand, it is not a leaf, it will be an intermediate node with new branches (as occurs in the Cantidad and Vehiculo cases). In this case, the node has a .label() method that will give us the value "Cantidad" or "Vehiculo", and its own sub-nodes, which will already be leaves.
With this information we can set up a loop that walks the tree and collects the data.
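A minimal sketch of such a loop follows. It assumes the POS-tagged tokens from step 3 of the question are fed to the parser directly (so no tokenizer or tagger models are needed), and it reuses the names resultado and data mentioned in this answer; the exact key names and the capitalization of the vehicle are taken from the output requested in the question.

```python
from nltk import RegexpParser, Tree

# POS-tagged tokens copied from step 3 of the question (assumption:
# we skip segmentation, tokenizing and tagging to keep this self-contained).
tagged = [('I', 'PRP'), ('sold', 'VBD'), ('a', 'DT'), ('car', 'NN'),
          ('and', 'CC'), ('a', 'DT'), ('motorbike', 'NN')]

# Grammar from the question.
grammar = r'''
Vehiculo: {<CD>*<NN>+}
          {<JJ>*<NN>+}
          {<CD>*<NN><IN>*<NN>+}
Cantidad: {<JJ>}
          {<CD>}
          {<DT>}
'''

# The parser returns an nltk Tree, not a string.
resultado = RegexpParser(grammar).parse(tagged)

data = []
for node in resultado:
    # Leaves are plain (word, tag) tuples; chunks are Tree instances.
    if isinstance(node, Tree) and node.label() == 'Vehiculo':
        # Join the words of the chunk (a chunk may hold several nouns).
        words = ' '.join(word for word, tag in node.leaves())
        # Every vehicle found is counted with quantity 1.
        data.append({'vehiculo': words.capitalize(), 'Cantidad': 1})

print(data)
# [{'vehiculo': 'Car', 'Cantidad': 1}, {'vehiculo': 'Motorbike', 'Cantidad': 1}]
```

The Cantidad chunks are deliberately ignored here, which is why the quantity is always 1.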
At the end, data will contain one dictionary per vehicle, in the form requested in the question.
Naturally, as I said, this is not very general. Every time a vehicle appears I count it and set the quantity to 1. I don't know whether you need to support phrases like "I sold two cars and three motorbikes", which would make things quite a bit more complicated, or whether you can have structures with deeper levels of nesting. In any case, you now have some clues about where to aim. See also the Tree documentation.
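As a starting point for the quantity case, here is a hedged sketch that remembers the most recent Cantidad chunk and applies it to the next Vehiculo chunk. Everything in it is an assumption for illustration: the NUMBER_WORDS table and the helper names to_quantity and extract_sales are hypothetical, and the tree is built by hand, because the grammar in the question would probably need adjusting (for example for plural tags such as NNS) before it produces a tree of this shape.

```python
from nltk import Tree

# Hypothetical word-to-number table; extend it as needed.
NUMBER_WORDS = {'a': 1, 'an': 1, 'one': 1, 'two': 2, 'three': 3}

def to_quantity(word):
    """Turn a quantity word or a digit string into an int (default 1)."""
    word = word.lower()
    if word.isdecimal():
        return int(word)
    return NUMBER_WORDS.get(word, 1)

def extract_sales(tree):
    """Pair each Vehiculo chunk with the Cantidad chunk seen just before it."""
    data = []
    pending = 1  # quantity to apply to the next Vehiculo chunk
    for node in tree:
        if not isinstance(node, Tree):
            continue  # plain (word, tag) leaf, nothing to collect
        if node.label() == 'Cantidad':
            pending = to_quantity(node.leaves()[0][0])
        elif node.label() == 'Vehiculo':
            name = ' '.join(word for word, tag in node.leaves())
            data.append({'vehiculo': name.capitalize(), 'Cantidad': pending})
            pending = 1  # reset so a quantity is never reused
    return data

# Hand-built tree standing in for a parse of
# "I sold two cars and three motorbikes" (assumed shape, not parser output).
example = Tree('S', [
    ('I', 'PRP'), ('sold', 'VBD'),
    Tree('Cantidad', [('two', 'CD')]), Tree('Vehiculo', [('cars', 'NNS')]),
    ('and', 'CC'),
    Tree('Cantidad', [('three', 'CD')]), Tree('Vehiculo', [('motorbikes', 'NNS')]),
])

print(extract_sales(example))
# [{'vehiculo': 'Cars', 'Cantidad': 2}, {'vehiculo': 'Motorbikes', 'Cantidad': 3}]
```

This only handles a flat tree in which the quantity comes right before the vehicle; deeper nesting would need a recursive walk instead of a single loop.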
In the end I solved it this way. I suppose there are simpler ways to do it, with better results, but my knowledge at the time was what it was. I hope it helps you: