I have a database built from individual words extracted from comments. The following example shows what I have:
Comment | word 1 | word 2 | word 3 |
---|---|---|---|
i like r | I | like | R |
I like programming | I | like | program |
programming is fun | Program | it is | funny |
I also have a dictionary containing each of the unique words found in these word columns:
Words_dictionary |
---|
funny |
it is |
like |
I |
program |
R |
And here comes the challenge: how can I create a column containing a binary code that records which dictionary words appear in each comment?
An example makes this clearer. Based on the dictionary and table above, the resulting column would look like this:
Comment | word 1 | word 2 | word 3 | Code |
---|---|---|---|---|
i like r | I | like | R | 001101 |
I like programming | I | like | program | 001110 |
programming is fun | Program | it is | funny | 110010 |
As you can see, each position in the code corresponds to one word in the dictionary and is set to 1 if that word appears in the comment and 0 otherwise. The first digit corresponds to the first word of the dictionary (in this case 'funny'). Since this word does not appear in the first comment, that position is assigned a 0.
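To make the rule concrete, here is a minimal sketch of the logic I am after. It is written in plain Python purely for illustration; the names `rows`, `binary_code`, and `codes` are hypothetical. It also assumes matching should ignore case, since the table contains 'Program' while the dictionary contains 'program'.

```python
# Dictionary words, in the order given above.
dictionary = ["funny", "it is", "like", "I", "program", "R"]

# Extracted words per comment (the "word 1".."word 3" columns).
rows = {
    "i like r": ["I", "like", "R"],
    "I like programming": ["I", "like", "program"],
    "programming is fun": ["Program", "it is", "funny"],
}


def binary_code(words, dictionary):
    """Return one '0'/'1' character per dictionary entry, in dictionary order."""
    # Assumption: matching is case-insensitive ('Program' matches 'program').
    present = {w.casefold() for w in words}
    return "".join("1" if d.casefold() in present else "0" for d in dictionary)


# Build the Code column: one binary string per comment.
codes = {comment: binary_code(words, dictionary) for comment, words in rows.items()}

for comment, code in codes.items():
    print(f"{comment} | {code}")
```

The codes are kept as strings rather than numbers so that leading zeros are preserved.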