
Question

Asked: 2020-03-04 09:32:23 +0800 CST

Group capture without capture or group capture with capture


Today, while answering a question on this site, I stumbled on a very interesting solution: I accidentally deleted part of my regex, and it still worked even though that didn't make sense to me.

Without further ado:

const regex = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/

const strings = [
        'AAAA_BBBB_CCCC_1_15_17'
        ,'AAAA_BBBB_1'
        ,'AAAA_BBBB_15_17'
        ,'AAAA_BBBB_CCCC_1_2'
    ]

strings.forEach(string => {
  const [fullMatch, ...groups] = string.match(regex)
  console.log(groups)
})

As you can see, I captured a non-capturing group using ((?:_\d+)+), and on regex101 it works in every flavor I have tried so far:

pcre (php)
javascript
python
golang

Note: since not everyone reads all the available information, the important point is that I am getting the behavior of

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+(?:_\d+)*)/

using

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/

which is strange, because if I leave out the outer capturing group, the captured group is only the last repetition that matched:

const regex = /(_\d+)+/g;
const str = `_1_2_3_4_5_6_7`;
let m;

while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }

  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}

I wish someone would explain to me why using a double capture worked, and what the implications (positive or negative) are of capturing a non-capturing group the way I did.

2 Answers


Mariano · Answer 1 · 2020-03-06T16:16:35+08:00

I captured a non-capturing group using ((?:_\d+)+), and on regex101 it works in every flavor I have tried so far

And it will work for you in practically any regex flavor.

All except POSIX BREs, POSIX EREs, and Oracle, to be exact, since they don't support non-capturing groups: (?:…).

I am getting the behavior of

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+(?:_\d+)*)/

using

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/

In fact, using the first form would be a mistake, since you are needlessly repeating (?:_\d+)* at the end. It will never consume anything, because the preceding construct, (?:_\d+)+, is greedy and has already consumed every repetition available, leaving nothing for the last one.

This can be verified with an example by adding one more group around the final (?:_\d+)*.

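// Declare everything in a single const statement so the destructured
// variables are not created as implicit globals.
const texto = '_123_456_789_0',
      regex = /((?:_\d+)+((?:_\d+)*))/,
      [match, grupo1, grupo2] = regex.exec(texto);

console.log(`Group 1: "${grupo1}"`);
// The backslash is doubled so that "\d" is printed literally.
console.log(`The last '(?:_\\d+)*' matched: "${grupo2}"`);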

I wish someone would explain to me why using a double capture worked

You are not using a double capture. In ((?:_\d+)+), only the outer group captures; (?:…) is precisely a non-capturing group.

A structure like ((?:_\d+)+) is perfectly normal and frequently used. Think of it this way: it's the same as (\d+), except that what is repeated in ((?:_\d+)+) isn't just a digit but an underscore followed by digits.
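As a minimal sketch of that difference (the input string and variable names are only illustrative), compare where the quantifier sits relative to the capturing group:

```javascript
// Illustrative input; the variable names are hypothetical.
const sample = '_1_2_3';

// Quantifier outside the capturing group: the group is overwritten on each
// repetition, so only the last repetition is kept.
const lastOnly = sample.match(/(_\d+)+/)[1];

// Quantifier inside the capturing group: the non-capturing group repeats,
// and the outer group captures the whole run.
const wholeRun = sample.match(/((?:_\d+)+)/)[1];

console.log(lastOnly); // "_3"
console.log(wholeRun); // "_1_2_3"
```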

Nesting groups (with or without capturing) is just as valid as, and pretty much the same as, using nested loops in your code... Simple as that.

what are the implications (positive or negative) of capturing a non-capturing group the way I did.

None. Neither positive nor negative. You wouldn't have achieved the same result without nesting a non-capturing group inside a capturing one like that... Again, it's a completely normal structure.

In fact, as a general rule of thumb, you should always use non-capturing groups (?:…) when you don't need the text that was matched. A non-capturing group does not use unnecessary memory (neither for storing the captured text nor for tracking its start and end positions).

If you're interested in more detail, a non-capturing group is slightly slower to compile but more efficient to run. The difference is negligible, though, and non-capturing groups are usually preferred to save memory (it is best seen as a matter of good practice).
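The visible side of this is the number of entries in the match result. The following sketch uses an illustrative input and hypothetical variable names, and it only shows what is returned; it does not measure memory or speed:

```javascript
// Illustrative input; the variable names are hypothetical.
const input = 'AAAA_BBBB_1_15';

// Inner group captures too: an extra entry appears in the result.
const withCapture = input.match(/((_\d+)+)$/);

// Inner group is non-capturing: only the outer group is reported.
const withoutCapture = input.match(/((?:_\d+)+)$/);

console.log(withCapture.slice(1));    // ["_1_15", "_15"]
console.log(withoutCapture.slice(1)); // ["_1_15"]
```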

By the way, one more correction. Using a structure like:

([a-zA-Z]+_?[a-zA-Z]+?)

is a mistake. You are placing two constructs that match the same thing one after the other. Since the _ is optional, the regex can reduce to [a-zA-Z]+[a-zA-Z]+?, and such a construct is the perfect recipe for catastrophic backtracking.

This issue won't cause a problem in the cases you're testing, but with a slightly more complicated regex, longer texts, and input that doesn't match, it could freeze the browser without ever returning a result.

Let's look at a test, not so drastic, but obvious enough:

const regex = /^([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)$/,
      N = 1000,
      texto = 'X_'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
            + '_1_2_ERROR';

// Your regex
let a, b, resultado;

a = performance.now()
for (let i = 0; i < N; i++) {
    resultado = regex.exec(texto);
}
b = performance.now();

console.log('"([a-zA-Z]+_?[a-zA-Z]+?)" took:', (b - a), 'ms to return:', resultado);


// With a nested non-capturing group
const regexConGrupo = /^([a-zA-Z]+)_([a-zA-Z]+(?:_[a-zA-Z]+)?)((?:_\d+)+)$/;
a = performance.now()
for (let i = 0; i < N; i++) {
    resultado = regexConGrupo.exec(texto);
}
b = performance.now();

console.log('"([a-zA-Z]+(?:_[a-zA-Z]+)?)" took:', (b - a), 'ms to return:', resultado);

And if this were part of a more complicated regex, it could cause you serious problems.

Also, by using ([a-zA-Z]+_?[a-zA-Z]+?), you're requiring that part to be at least 2 characters long, so it wouldn't match something like A_B_1.
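A quick check of that last point, as a sketch: the two patterns below are the ones from the timing test without the ^ and $ anchors, and the names original and revised are only illustrative.

```javascript
// Pattern from the question, and the variant with a nested non-capturing group.
const original = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/;
const revised  = /([a-zA-Z]+)_([a-zA-Z]+(?:_[a-zA-Z]+)?)((?:_\d+)+)/;

// The second part has a single letter, so the original pattern fails.
console.log(original.exec('A_B_1'));         // null
console.log(revised.exec('A_B_1').slice(1)); // ["A", "B", "_1"]

// The revised pattern still handles the optional middle section.
console.log(revised.exec('AAAA_BBBB_CCCC_1_2').slice(1)); // ["AAAA", "BBBB_CCCC", "_1_2"]
```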

Klaimmore · Answer 2 · 2020-03-04T10:47:23+08:00

The truth is that it has no implications. A non-capturing group simply groups an expression for convenience, without its match being returned as a group. That does not mean it cannot be part of another group.

Consider the following example:

"cababaabc".match(/c(a|b)*c/).slice(1) // => ["b"]

I don't get the whole run of a's and b's, but a group I may not be interested in: the last a or last b matched by the expression a|b.

If I use a non-capturing group:

"cababaabc".match(/c(?:a|b)*c/).slice(1) // => []

I don't get any group.

But if I am interested in the complete string between the two c's, I have to add a group that completely encloses the expression of interest, including the *:

"cababaabc".match(/c((?:a|b)*)c/).slice(1) // => ["ababaab"]

This gets the full run of a's and b's.

EDIT:

If what you are interested in is comparing your 2 expressions:

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+(?:_\d+)*)/

and

/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/

Let me tell you that they are completely equivalent:

The last group in both:

((?:_\d+)+(?:_\d+)*)
((?:_\d+)+)

Is the same as:

((?:A)+(?:A)*)
((?:A)+)

With A = _\d+. In the first one:

(?:A)+(?:A)* is equivalent to A+A*, which is undoubtedly the same as A+.
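A small sanity check of that equivalence, as a sketch: it only compares the two forms on a handful of illustrative inputs (the names longForm, shortForm and inputs are hypothetical), so it illustrates the claim rather than proving it.

```javascript
// The two final groups being compared.
const longForm  = /((?:_\d+)+(?:_\d+)*)/;
const shortForm = /((?:_\d+)+)/;

// Illustrative inputs, including one that should not match at all.
const inputs = ['_1', '_1_15_17', 'AAAA_BBBB_1_2', 'no digits here'];

const sameResults = inputs.every(s => {
  const a = longForm.exec(s);
  const b = shortForm.exec(s);
  // Both must fail together, or capture exactly the same text.
  return a === null ? b === null : b !== null && a[1] === b[1];
});

console.log(sameResults); // true
```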

Notice that you are not even capturing the same non-capturing group, but a different expression:

In ((?:A)+), the + quantifier makes it a different expression. Even if it were the same expression, nothing prevents capturing the same group:

((A)) is as valid as ((?:A))
