Suppose we have a character vector of full surnames like the following:
nom <- c("Perez Conchito", "Juanin Juanharry", "Von Bola")
I have written a small function that extracts each part of each string and then gathers them into a data.frame with two columns.
extract_apellidos <- function(x) {
  first_split <- strsplit(x, " ")
  split_un <- unlist(first_split)
  primer_ap <- split_un[c(TRUE, FALSE)]   # recycled logical index: gives us the first surname
  segundo_ap <- split_un[c(FALSE, TRUE)]  # gives us the second surname
  data.frame(Primer_ap = primer_ap, Segundo_ap = segundo_ap)
}
extract_apellidos(nom)
  Primer_ap Segundo_ap
1     Perez   Conchito
2    Juanin  Juanharry
3       Von       Bola
As you can see, it works correctly. However, I would like to know whether it can be optimized using regular expressions, since I suspect that would reduce the number of steps. I thank you in advance for any guidance on this.
Alejandro, at least with base R, whatever you do with regular expressions you are going to end up with a list, that is, in the same place your strsplit() already leaves you.
For example, with something like the following (a minimal base-R sketch; the exact pattern here, with one capturing group per surname, is illustrative):
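pat <- "^(\\S+)\\s+(\\S+)$"               # group 1 = first surname, group 2 = second surname
res <- regmatches(nom, regexec(pat, nom)) # regexec() finds the match and groups; regmatches() extracts them
res
[[1]]
[1] "Perez Conchito" "Perez"          "Conchito"

[[2]]
[1] "Juanin Juanharry" "Juanin"           "Juanharry"

[[3]]
[1] "Von Bola" "Von"      "Bola"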
We have obtained a list of 3-element vectors: the complete string that matched, plus the first and second words. You gain practically nothing over your routine (check it, by the way: you have an error, eto_todos does not exist, I understand it should be split_un). On the other hand, using regular expressions just to split words on a space is unnecessarily complex.
Another matter is if you use stringr, since you can take advantage of str_match() and its capturing groups, and its output is already a matrix; with this you can shorten the code a bit:
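A minimal sketch of that idea (the function name extract_apellidos2 and the pattern are illustrative, not your original code):

library(stringr)

extract_apellidos2 <- function(x) {
  m <- str_match(x, "^(\\S+)\\s+(\\S+)$")  # matrix: column 1 = full match, columns 2-3 = capture groups
  data.frame(Primer_ap = m[, 2], Segundo_ap = m[, 3])
}

extract_apellidos2(nom)  # same data.frame as before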
But it also has an extra advantage: by following the tidyverse methodology, the return value is consistent with the input object, so that if it does not find a pattern, it will still return a row (of NAs) for that case:
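For example (a hypothetical single-word name added just to show the behavior):

extract_apellidos2(c(nom, "Cher"))  # "Cher" has no second surname
  Primer_ap Segundo_ap
1     Perez   Conchito
2    Juanin  Juanharry
3       Von       Bola
4      <NA>       <NA>

Note that the strsplit() version would instead shift the alternating c(TRUE, FALSE) index out of step as soon as one name contributes an odd number of words.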