I'm learning regular expressions, and an example from Regex Golf (see level 6, "Abba") caught my attention.
The proposed problem is to match only the words that do not contain a run of letters following the pattern XYYX.
For example, these words must be left out (the XYYX part is shown in parentheses):
- abba (abba)
- anallagmatic (alla)
- bassarisk (assa)
- chorioallantois (alla)
- coccomyces (occo)
One of the suggested answers was
^(?!.*(.)(.)\2\1)
Here is my interpretation of the expression:
- Start of the line: ^
- If not followed by 0 or more characters: .*
- Group the next character in group 1: (.)
- Group the next character in group 2: (.)
- Group 2: \2
- Group 1: \1
The problem is that, even though I'm starting to write my first regular expressions and have already done some exercises, this particular problem leaves me with a lot of questions.
Can anyone give a more intuitive interpretation?
Here's another explanation:
- ^ : start of the string
- (?! : look ahead to check that the following cannot be matched
- .* : any character (except \n), as many as possible
- (.) : any character (except \n), captured as the 1st group
- (.) : any character (except \n), captured as the 2nd group
- \2 : the same text as the 2nd group
- \1 : the same text as the 1st group
- ) : end of the lookahead
What this regular expression does is check, from the beginning of the string, that the text is not followed by the pattern .*(.)(.)\2\1.
For example, for the string anallagmatic, the pattern .*(.)(.)\2\1 holds in the prefix analla (of anallagmatic). However, since you don't want to find such a pattern in the string, the entire regular expression, ^(?!.*(.)(.)\2\1), does not hold.
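The token-by-token reading above can be tried out with a short Python sketch. It assumes Python's standard `re` module; the names `ABBA_FREE`, `INNER`, `keeps` and `abba_part` are made up for this illustration, and the word "regex" is added to the list only as an example of a word without an XYYX run.

```python
import re

# The same expression, spelled out with one token per line (re.VERBOSE
# ignores the whitespace and the comments inside the pattern).
ABBA_FREE = re.compile(r"""
    ^        # start of the string
    (?!      # negative lookahead: fail if what follows can be matched
      .*     # any characters except newline, as many as possible
      (.)    # one character, captured as group 1
      (.)    # one character, captured as group 2
      \2     # the same text as group 2
      \1     # the same text as group 1
    )        # end of the lookahead
""", re.VERBOSE)

# The part inside the lookahead, without the leading .*, to show what it finds.
INNER = re.compile(r"(.)(.)\2\1")


def keeps(word):
    """Return True if the whole expression matches, i.e. the word has no XYYX run."""
    return ABBA_FREE.match(word) is not None


def abba_part(word):
    """Return the first XYYX run in the word, or None if there is none."""
    hit = INNER.search(word)
    return hit.group() if hit else None


for word in ["abba", "anallagmatic", "bassarisk", "chorioallantois", "coccomyces", "regex"]:
    print(word, "| XYYX part:", abba_part(word), "| kept:", keeps(word))
```

The five words from the question are rejected because the lookahead finds an XYYX run somewhere after the start, while a word without such a run is kept. Note that the expression itself consumes no characters: when it matches, the match is the empty string at position 0.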
Depending on how the regular expression engine is implemented [1], it searches the string for the pattern guided either by the regular expression (regex-directed) or by the text (text-directed), and reports whether a match was found.
For example, the Regex Debugger at https://regex101.com/ shows the steps taken by the engine until it "failed" (no match found) with the text anallagmatic.
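To get a feel for those steps without the debugger, here is a rough Python sketch of the order in which a backtracking (regex-directed) engine might try the lookahead's contents. This is a hand-written simplification, not the debugger's actual trace: the function name `lookahead_attempts` is hypothetical, it assumes the text contains no newline, and a real engine counts and orders its steps in more detail.

```python
def lookahead_attempts(text):
    # Rough imitation of trying .*(.)(.)\2\1 from position 0:
    # the greedy .* first takes the whole text, then gives back
    # one character at a time until the rest of the pattern fits.
    attempts = []
    for consumed in range(len(text), -1, -1):
        window = text[consumed:consumed + 4]  # candidate for (.)(.)\2\1
        ok = (
            len(window) == 4
            and window[0] == window[3]  # \1 repeats group 1
            and window[1] == window[2]  # \2 repeats group 2
        )
        attempts.append((consumed, window, ok))
        if ok:
            break  # the lookahead's contents matched, so (?!...) fails
    return attempts


for consumed, window, ok in lookahead_attempts("anallagmatic"):
    print(f".* takes {consumed:2d} chars, next 4 = {window!r:6} ->", "fits" if ok else "no")
```

For anallagmatic the last attempt is the one where .* has taken "an" and the next four characters are "alla". At that point the contents of the lookahead match, the negative lookahead fails, and the whole expression reports no match.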
Notes: