From a vcf file:
Amaranthus_sp <- fread("variants_Amaranthus_sp.vcf")
I am trying to select the rows whose ALT value contains a ",":
"#CHROM" "POS" "ID" "REF" "ALT" "QUAL"
M3V_32_Cucumis_melo 1 . T <*> 0
M3V_32_Cucumis_melo 2 . C T,<*> 0
M3V_32_Cucumis_melo 3 . A <*> 0
M3V_32_Cucumis_melo 4 . G C,<*> 0
M3V_32_Cucumis_melo 5 . C <*> 0
M3V_32_Cucumis_melo 6 . T A,<*> 0
I want to keep only the rows that have a , in ALT, to get something like this:
"#CHROM" "POS" "ID" "REF" "ALT" "QUAL"
M3V_32_Cucumis_melo 2 . C T,<*> 0
M3V_32_Cucumis_melo 4 . G C,<*> 0
M3V_32_Cucumis_melo 6 . T A,<*> 0
To do this I run the following commands:
Amaranthus_sp<-Amaranthus_sp[which(Amaranthus_sp$ALT == ","),];dim(Amaranthus_sp)
and I get an empty table.
Instead when I run this command:
Amaranthus_sp<-Amaranthus_sp[which(Amaranthus_sp$ALT == "<*>"),];dim(Amaranthus_sp)
it selects the rows that do not have a , in that column.
Also the type of my column is:
class(Amaranthus_sp$ALT)
[1] "character"
I don't know what my mistake is.
Thanks in advance.
The grepl function does what you need.
Its first argument is the pattern to search for and its second is the character vector to search in. For each element it returns TRUE if the pattern is found and FALSE otherwise.
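A minimal sketch of that filter, assuming a small data.frame that mimics the columns in the question (the name vcf and the toy values are stand-ins, not the real file). Passing fixed = TRUE is optional here; it makes grepl treat "," as a literal string rather than a regular expression.

```r
# Toy stand-in for the VCF columns shown in the question (not the real file).
vcf <- data.frame(
  CHROM = "M3V_32_Cucumis_melo",
  POS   = 1:6,
  REF   = c("T", "C", "A", "G", "C", "T"),
  ALT   = c("<*>", "T,<*>", "<*>", "C,<*>", "<*>", "A,<*>"),
  stringsAsFactors = FALSE
)

# grepl(pattern, x): one TRUE/FALSE per element of x,
# TRUE where the element contains the pattern.
has_comma <- grepl(",", vcf$ALT, fixed = TRUE)

# Keep only the rows whose ALT contains a comma.
multi_alt <- vcf[has_comma, ]
multi_alt$POS   # 2 4 6
```

The same call should work on the object returned by fread, since the filter only needs the ALT column as a character vector.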
There are surely other solutions; for example, stringr::str_detect from the tidyverse.
What is your mistake?
You are using the == operator incorrectly: it returns TRUE only when the two values are exactly the same. When you use Amaranthus_sp$ALT == "<*>" it returns what you expect, because those values in ALT are exactly "<*>". However, when you use Amaranthus_sp$ALT == ",", no value in ALT is exactly ","; they have the form C,<*>.
You could also skip which. When you filter a data.frame with df[v, ], v can be a numeric vector giving the row numbers you want to keep. But v can also be a logical vector whose length equals the number of rows in df; in that case the filter returns the rows where v is TRUE. Since Amaranthus_sp$ALT == "<*>" returns a logical vector with one element per row of Amaranthus_sp, we are in the second case and can omit which.
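Both points can be checked on a toy example. This is only a sketch: df and its values are made up to resemble the ALT column in the question, and it uses a base data.frame rather than the object fread returns.

```r
# Toy data resembling the ALT column from the question (hypothetical values).
df <- data.frame(
  POS = 1:6,
  ALT = c("<*>", "T,<*>", "<*>", "C,<*>", "<*>", "A,<*>"),
  stringsAsFactors = FALSE
)

# == compares whole strings, so no element is exactly ",".
eq_comma <- df$ALT == ","      # all FALSE
# An exact match against "<*>" succeeds for rows 1, 3 and 5.
eq_star <- df$ALT == "<*>"

# which() turns the logical vector into row positions.
idx <- which(eq_star)          # 1 3 5

# Indexing by position and indexing by the logical vector
# select the same rows.
by_position <- df[idx, ]
by_logical  <- df[eq_star, ]
same_rows   <- identical(by_position, by_logical)
```

One difference to keep in mind with a base data.frame: which drops NA entries, whereas a logical index that contains NA produces rows filled with NA.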