Suppose we have a character vector of full surnames like the following:
nom <- c("Perez Conchito", "Juanin Juanharry", "Von Bola")
I have written a small function that extracts each part of each string and then gathers them into a data.frame with two columns.
extract_apellidos <- function(x) {
  first_split <- strsplit(x, " ")
  split_un <- unlist(first_split)
  primer_ap <- split_un[c(TRUE, FALSE)]   # recycled logical index: gives us the first surname
  segundo_ap <- split_un[c(FALSE, TRUE)]  # gives us the second surname
  data.frame(Primer_ap = primer_ap, Segundo_ap = segundo_ap)
}
extract_apellidos(nom)
  Primer_ap Segundo_ap
1     Perez   Conchito
2    Juanin  Juanharry
3       Von       Bola
As you can see, it works correctly. However, I would like to know whether it can be optimized using regular expressions, since I suspect that would reduce the number of steps. I thank you in advance for any guidance on this.
Alejandro, at least with base R, whatever you do with regular expressions you are going to end up with a list, that is, in the same place your strsplit() already leaves you.
For example, with something like the following (a minimal base-R sketch; the exact pattern here, with one capturing group per surname, is illustrative):
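pat <- "^(\\S+)\\s+(\\S+)$"               # group 1 = first surname, group 2 = second surname
res <- regmatches(nom, regexec(pat, nom)) # regexec() finds the match and groups; regmatches() extracts them
res
[[1]]
[1] "Perez Conchito" "Perez"          "Conchito"

[[2]]
[1] "Juanin Juanharry" "Juanin"           "Juanharry"

[[3]]
[1] "Von Bola" "Von"      "Bola"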
We have obtained a list of 3-element vectors: the complete string that matched, plus the first and second words. You gain practically nothing over your routine (check it, by the way: you have an error, eto_todos does not exist, I understand it should be split_un). On the other hand, using regular expressions just to split words on a space is unnecessarily complex.
Another matter is if you use stringr, since you can take advantage of str_match() and its capturing groups, and its output is already a matrix; with this you can shorten the code a bit:
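A minimal sketch of that idea (the function name extract_apellidos2 and the pattern are illustrative, not your original code):

library(stringr)

extract_apellidos2 <- function(x) {
  m <- str_match(x, "^(\\S+)\\s+(\\S+)$")  # matrix: column 1 = full match, columns 2-3 = capture groups
  data.frame(Primer_ap = m[, 2], Segundo_ap = m[, 3])
}

extract_apellidos2(nom)  # same data.frame as before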
But it also has an extra advantage: by following the tidyverse methodology, the return value is consistent with the input object, so that if it does not find a pattern, it will still return a row (of NAs) for that case:
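For example (a hypothetical single-word name added just to show the behavior):

extract_apellidos2(c(nom, "Cher"))  # "Cher" has no second surname
  Primer_ap Segundo_ap
1     Perez   Conchito
2    Juanin  Juanharry
3       Von       Bola
4      <NA>       <NA>

Note that the strsplit() version would instead shift the alternating c(TRUE, FALSE) index out of step as soon as one name contributes an odd number of words.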