I have several strings with the following structure:
AAAA_BBBB_CCCC_1_15_17
AAAA_BBBB_1
AAAA_BBBB_15_17
I am trying to find a regex that captures the following groups:
GROUP1: AAAA GROUP2: BBBB_CCCC GROUP3: 1_15_17
GROUP1: AAAA GROUP2: BBBB GROUP3: 1
GROUP1: AAAA GROUP2: BBBB GROUP3: 15_17
I have tried the following:
([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)_(\d+_?)+
I'm having trouble with the third group: it seems that each new match overwrites the previous one because of the + quantifier on the group (\d+_?)+.
Example
const regex = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)_(\d+_?)+/
const string = 'AAAA_BBBB_CCCC_1_2'
const [fullMatch, ...groups] = string.match(regex)
console.log(groups)
In this example I expect 1_2 for the last group, but I only get 2.
How can I capture this last full group?
A capture is always overwritten. This is how regular expressions work: repeating a group always overwrites the capture with the last match. For example, something like
(\d)+
will always capture only the last digit of the integer.
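To see the overwriting in action, here is a minimal sketch. The pattern (\d)+ and the input '2024' are illustrative assumptions, not taken from the question:

```javascript
// A quantifier applied to a capturing group re-runs the group once per
// iteration, and each iteration replaces the previous capture.
const repeated = '2024'.match(/(\d)+/);
console.log(repeated[0]); // full match: '2024'
console.log(repeated[1]); // group 1 holds only the last iteration: '4'

// Moving the quantifier inside the group captures the whole run instead.
const whole = '2024'.match(/(\d+)/);
console.log(whole[1]); // '2024'
```

The full match is unaffected in both cases; only the content of the numbered group differs.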
Beware of nested quantifiers. There is also a problem in your regex that isn't obvious now but may cause trouble later. When you use:
[a-zA-Z]+_?[a-zA-Z]+?
you are consecutively repeating two constructs that match the same thing. Since the _ is optional, the regex can reduce to [a-zA-Z]+[a-zA-Z]+?, and such a construct is the perfect recipe for catastrophic backtracking! Also, that same construct requires the second group to be at least 2 characters long, so it will never match a text like A_B_1.
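The minimum-length side effect is easy to check with the regex from the question. This is a small sketch; the second input, 'AA_BB_1', is a hypothetical string added for contrast:

```javascript
// Regex copied from the question.
const originalRegex = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)_(\d+_?)+/;

// The second group needs at least two letters, so a one-letter part fails.
const shortInput = 'A_B_1'.match(originalRegex);
console.log(shortInput); // null

// With two letters the match succeeds, but only because the two letter
// classes split 'BB' between them.
const longerInput = 'AA_BB_1'.match(originalRegex);
console.log(longerInput[2]); // 'BB'
```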
Solution: repeat within the group. To avoid this, enclose the entire optional part in a non-capturing group quantified with ?, so that the underscore and the letters after it are either both present or both absent. Apply the same logic to the digit part, this time repeating the non-capturing group with *.
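One way to write this down, as a sketch rather than the answer's exact pattern: the regex below is an assumption built from the description above, with an optional non-capturing (?:_[a-zA-Z]+)? inside group 2 and a repeated non-capturing (?:_\d+)* inside group 3. The name fixedRegex and the sample list are illustrative.

```javascript
// Group 2: letters, optionally followed by one more _letters block.
// Group 3: digits, followed by zero or more _digits blocks.
const fixedRegex = /([a-zA-Z]+)_([a-zA-Z]+(?:_[a-zA-Z]+)?)_(\d+(?:_\d+)*)/;

const samples = [
  'AAAA_BBBB_CCCC_1_15_17',
  'AAAA_BBBB_1',
  'AAAA_BBBB_15_17',
  'A_B_1',
];

// Collect the three captured groups for each sample (null if no match).
const results = samples.map((text) => {
  const match = text.match(fixedRegex);
  return match ? match.slice(1) : null;
});

console.log(results);
// [ [ 'AAAA', 'BBBB_CCCC', '1_15_17' ],
//   [ 'AAAA', 'BBBB', '1' ],
//   [ 'AAAA', 'BBBB', '15_17' ],
//   [ 'A', 'B', '1' ] ]
```

Because the quantifiers now sit inside groups 2 and 3 instead of on them, each group captures its whole run, and the one-letter case A_B_1 matches as well.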
Similar to how you captured the second group, the third can be captured with the help of non-capturing groups.
The state machine would look something like this:
When you put the symbols ?: at the start of a group, you are telling the engine not to capture that group, or better said, to match it but not to register it among the captured groups.
It is important to note that although it seems that the first half of group 3 would suffice, this is not the case: if there is more than one repetition, the group will only capture the last repetition that matched and not the entire run of characters that we want.
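The difference can be observed directly on the digit part. The patterns below are illustrative assumptions for comparison, not the answer's own regex:

```javascript
const digits = '1_15_17';

// Quantifier on the capturing group: only the last repetition survives.
const lastOnly = digits.match(/(\d+_?)+/);
console.log(lastOnly[1]); // '17'

// Non-capturing inner group: one registered group holding the whole run.
const nonCapturing = digits.match(/(\d+(?:_\d+)*)/);
console.log(nonCapturing.length); // 2 (full match + group 1)
console.log(nonCapturing[1]);     // '1_15_17'

// Capturing inner group: same outer capture, plus an extra group that
// again holds only its last repetition.
const capturingInner = digits.match(/(\d+(_\d+)*)/);
console.log(capturingInner.length); // 3
console.log(capturingInner[2]);     // '_17'
```

Using ?: keeps the group numbering limited to the three groups the question asks for.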