I have a database built from individual words extracted from comments. The following example shows what I have:
Comment | word 1 | word 2 | word 3 |
---|---|---|---|
i like r | I | like | R |
I like programming | I | like | program |
programming is fun | Program | it is | funny |
The thing is, I have a dictionary with each of the unique words found in this array of words:
Words_dictionary |
---|
funny |
it is |
like |
I |
program |
R |
And here comes the challenge... How can I create a column containing a binary code that reflects which words of the dictionary appear in each comment?
Better with an example. Based on the dictionary and array created above, the resulting variable would look like this:
Comment | word 1 | word 2 | word 3 | Code |
---|---|---|---|---|
i like r | I | like | R | 001101 |
I like programming | I | like | program | 001110 |
programming is fun | Program | it is | funny | 110010 |
As you can see, each word in the dictionary is assigned a 0 or a 1 depending on whether it appears in the comment, and its position in the dictionary determines its position in the code. The first digit of the binary code corresponds to the first word of the dictionary (in this case 'funny'). Since this word does not appear in the first comment, that position is assigned a 0.
A proof of concept:
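A minimal sketch of the idea for a single comment, assuming the dictionary is held in a character vector (called `diccionario` here; `cadena`, `palabras`, `posiciones` and `codigo` are illustrative names) and that matching should ignore case, since the example treats 'Program' and 'program' as the same word:

```r
# Dictionary of unique words; its order defines the digit positions
diccionario <- c("funny", "it is", "like", "I", "program", "R")

# One comment, already reduced to its extracted words (assumed input)
cadena <- "I like R"

# Split the string into individual words
palabras <- strsplit(cadena, " ", fixed = TRUE)[[1]]

# For each dictionary word, match() gives its position in `palabras`,
# or NA when the word is absent (tolower() makes this case-insensitive)
posiciones <- match(tolower(diccionario), tolower(palabras))

# NA becomes 0, anything else becomes 1; then collapse into one string
codigo <- paste(as.integer(!is.na(posiciones)), collapse = "")
codigo
# "001101"
```

Splitting on spaces is enough for this comment, but it would break a multi-word dictionary entry such as 'it is', so in that case the already extracted words should be passed in directly instead of being split again.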
Explanation: using `diccionario()`, `strsplit` and `match()`, and starting from the dictionary, we locate the position of each word within the string; the words we do not find come back as `NA`. Each `NA` becomes 0 and each non-`NA` value (`!NA`) becomes 1.
You can implement this to process your column like so:
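One possible way to do it, sketched under the assumption that the extracted words sit in character columns of a data frame. The names `df`, `word1`, `word2`, `word3`, `codificar` and `code` are hypothetical and should be adapted to the real data:

```r
# Dictionary of unique words; its order defines the digit positions
diccionario <- c("funny", "it is", "like", "I", "program", "R")

# Example data mirroring the table in the question (assumed layout)
df <- data.frame(
  comment = c("i like r", "I like programming", "programming is fun"),
  word1   = c("I", "I", "Program"),
  word2   = c("like", "like", "it is"),
  word3   = c("R", "program", "funny"),
  stringsAsFactors = FALSE
)

# Build the binary code for one set of words:
# 1 if the dictionary word is present, 0 otherwise (case-insensitive)
codificar <- function(palabras, diccionario) {
  presente <- !is.na(match(tolower(diccionario), tolower(palabras)))
  paste(as.integer(presente), collapse = "")
}

# Apply the function row by row over the word columns
columnas <- c("word1", "word2", "word3")
df$code <- vapply(
  seq_len(nrow(df)),
  function(i) codificar(unlist(df[i, columnas], use.names = FALSE), diccionario),
  character(1)
)

df$code
# "001101" "001110" "110010"
```

Because the words are read from their own columns rather than split from the comment text, a multi-word entry such as 'it is' is matched as a single unit.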