I have a matrix whose columns hold different phrases split into individual words. Example:
Phrase | Word_1 | word_2 | Word_3 |
---|---|---|---|
sample text | Text | of | example |
example sentence | Phrase | of | example |
separate word | Word | separated | NA |
Thanks in advance | Thank you | of | beforehand |
I need to extract a vector that works as a dictionary containing the unique values of these separate words. Applied to the example, the result I need is the following:
Dictionary |
---|
beforehand |
of |
example |
phrase |
thank you |
word |
separated |
text |
Note that the code must omit NAs, since not all phrases have the same length.
Thank you very much for your help.
Regards, J
An alternative is to use the `dplyr`, `stringr`, and `tidyr` libraries.

I saw that the desired output contains only lowercase words, so you can use `stringr::str_to_lower` to standardize the words.
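A minimal sketch of how such a pipeline could look, assuming the table from the question is stored in a data frame called `df` (a hypothetical name) with the columns `Phrase`, `Word_1`, `word_2`, and `Word_3`. The key column name `column` and the final alphabetical sort are also assumptions. Note that `gather` still works but is superseded by `pivot_longer` in current versions of tidyr.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data from the question; `df` is a hypothetical name
df <- data.frame(
  Phrase = c("sample text", "example sentence", "separate word", "Thanks in advance"),
  Word_1 = c("Text", "Phrase", "Word", "Thank you"),
  word_2 = c("of", "of", "separated", "of"),
  Word_3 = c("example", "example", NA, "beforehand"),
  stringsAsFactors = FALSE
)

df %>%
  # Stack every word column into one long column called Dictionary;
  # "column" (assumed name) records which original column each word came from
  gather(key = "column", value = "Dictionary", -Phrase) %>%
  # Drop the NAs left by shorter phrases
  filter(!is.na(Dictionary)) %>%
  # Standardize case so "Text" and "text" count as the same word
  mutate(Dictionary = str_to_lower(Dictionary)) %>%
  # Keep one row per unique word
  distinct(Dictionary) %>%
  # Sort alphabetically (assumption: the desired dictionary is sorted)
  arrange(Dictionary)
```

If a plain vector is needed rather than a one-column data frame, adding `%>% pull(Dictionary)` at the end of the pipeline should return it.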
If you want the data frame to be saved as a separate object, you can assign the result to a new variable, for example `nuevo_dataframe`.

As for the `key` parameter of the `gather` function, it is the name of the column that records the names of the columns you are gathering. You cannot see that column in the previous code, because the data frame keeps being transformed until the desired result is obtained. If you want to see it, you can run only the `gather` part of the code.
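Both points can be sketched as follows, again assuming a data frame named `df` that holds the question's table and a key column named `column` (both hypothetical names). The first pipeline stores the final dictionary in `nuevo_dataframe`; the second stops right after `gather`, so the key column is still visible.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data from the question; `df` is a hypothetical name
df <- data.frame(
  Phrase = c("sample text", "example sentence", "separate word", "Thanks in advance"),
  Word_1 = c("Text", "Phrase", "Word", "Thank you"),
  word_2 = c("of", "of", "separated", "of"),
  Word_3 = c("example", "example", NA, "beforehand"),
  stringsAsFactors = FALSE
)

# Save the final dictionary as its own object
nuevo_dataframe <- df %>%
  gather(key = "column", value = "Dictionary", -Phrase) %>%
  filter(!is.na(Dictionary)) %>%
  mutate(Dictionary = str_to_lower(Dictionary)) %>%
  distinct(Dictionary) %>%
  arrange(Dictionary)

# Intermediate step only: the key column ("column") holds the name of the
# original column (Word_1, word_2, Word_3) that each word came from
intermedio <- df %>%
  gather(key = "column", value = "Dictionary", -Phrase)

print(intermedio)
```

The intermediate data frame has one row per phrase and word column, so NAs are still present at this stage; they are only removed by the later `filter` step.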