I have to validate an input that contains a contract code.
The formats can be:
AAAXXXX
or AAAXXXX/XX-XX
or AAAXXXX/VXXXX
or AXXXXXX
or AXXXXXX/VXXXX
(A is an alphabetic character, X is a digit, and the remaining characters /, - and V are literals.)
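To make the format rules concrete, here is a minimal sketch that builds a regex directly from the five templates, following the conventions just stated (A = letter, X = digit, everything else literal). The helper name templateToSource and the constants FORMATS and contractRe are hypothetical, introduced only for this illustration, and the i flag is assumed so that the literal V also accepts v.

```javascript
// Hypothetical helper: turns a template such as "AAAXXXX/XX-XX" into regex
// source text. A = one letter, X = one digit, any other character = literal.
function templateToSource(template) {
  return [...template]
    .map((ch) => {
      if (ch === "A") return "[A-Za-z]";
      if (ch === "X") return "\\d";
      // Escape regex metacharacters; "/", "-" and "V" need no escaping
      // outside a character class, so they pass through unchanged.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
}

// The five formats from the question.
const FORMATS = [
  "AAAXXXX",
  "AAAXXXX/XX-XX",
  "AAAXXXX/VXXXX",
  "AXXXXXX",
  "AXXXXXX/VXXXX",
];

// One alternative per format, anchored at both ends of the input.
const contractRe = new RegExp(
  "^(?:" + FORMATS.map(templateToSource).join("|") + ")$",
  "i" // assumed case-insensitive, like the /i flag in the question
);

console.log(contractRe.test("ABC1234/12-34")); // true
console.log(contractRe.test("C123456/30-02")); // false
```

Because every A position becomes [A-Za-z] rather than \w, a digit can never fill a letter slot, which is exactly the distinction the rules above require.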
The regex that I have proposed is this:
/^((\w{3}\d{4}(\/\d{2}-\d{2})?)|(((\w\d{6})|(\w{3}\d{4}))(\/v\d{4})?))$/i
I have some test cases online. It works for all of them except the last one, which it accepts when it shouldn't:
C123456/30-02
What would the resulting regex look like so that it satisfies all cases?
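A quick way to reproduce the false positive is to run the question's pattern, copied verbatim, and inspect the capture groups. This is only a diagnostic sketch; the variable names are illustrative.

```javascript
// Pattern from the question, copied verbatim.
const original =
  /^((\w{3}\d{4}(\/\d{2}-\d{2})?)|(((\w\d{6})|(\w{3}\d{4}))(\/v\d{4})?))$/i;

// \w is [A-Za-z0-9_], so it happily matches a digit.
console.log(/^\w$/.test("1")); // true

const m = original.exec("C123456/30-02");
// The first alternative wins: \w{3} takes "C12", \d{4} takes "3456",
// and group 3 takes "/30-02", so the whole string is accepted.
console.log(m !== null); // true
console.log(m[2]);       // C123456/30-02
console.log(m[3]);       // /30-02
```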
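It is a mistake to assume that \w matches only letters, when it actually matches a-z, A-Z, 0-9 and _ (equivalent to [A-Za-z0-9_]). So the input matches the first alternative that fits (AAAXXXX/XX-XX), since the \w you used for A also accepts X (a digit).
This could be the correct pattern (used with the i flag, as in your attempt, since the v in /v\d{4} is written in lowercase):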
^(([A-Za-z]{3}\d{4}(\/\d{2}-\d{2})?)|((([A-Za-z]\d{6})|([A-Za-z]{3}\d{4}))(\/v\d{4})?))$
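To check the corrected pattern against every format, a small test run like the one below can help. It assumes the i flag is kept, as in the question; the sample strings are made up for illustration.

```javascript
// Corrected pattern: [A-Za-z] replaces \w, so the letter positions no
// longer accept digits or "_". The i flag is assumed, as in the question,
// so the literal V can be written in either case.
const fixed =
  /^(([A-Za-z]{3}\d{4}(\/\d{2}-\d{2})?)|((([A-Za-z]\d{6})|([A-Za-z]{3}\d{4}))(\/v\d{4})?))$/i;

// One sample per format, plus inputs that must be rejected.
const valid = ["ABC1234", "ABC1234/12-34", "ABC1234/V1234", "A123456", "A123456/V1234"];
const invalid = ["C123456/30-02", "_BC1234", "AB12345", "A123456/12-34", "ABC1234/V123"];

for (const s of valid) console.log(s, fixed.test(s));   // true for each
for (const s of invalid) console.log(s, fixed.test(s)); // false for each
```

Note that "_BC1234" would have passed with \w, since _ is a word character; with [A-Za-z] it is rejected.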
If you don't need the text captured by each group, it's better to use non-capturing groups (?:x):
^(?:[a-z]{3}\d{4}(?:\/(?:\d{2}-\d{2}|V\d{4}))?|[a-z]\d{6}(?:\/v\d{4})?)$
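The sketch below compares the non-capturing version with the capturing one on the same inputs and shows what actually changes in the match result. It assumes the i flag, without which [a-z] would reject upper-case letters.

```javascript
// Non-capturing version. [a-z] covers upper-case letters only because of
// the i flag, which is assumed here as in the question.
const compact =
  /^(?:[a-z]{3}\d{4}(?:\/(?:\d{2}-\d{2}|V\d{4}))?|[a-z]\d{6}(?:\/v\d{4})?)$/i;

// Capturing version, for comparison.
const capturing =
  /^(([A-Za-z]{3}\d{4}(\/\d{2}-\d{2})?)|((([A-Za-z]\d{6})|([A-Za-z]{3}\d{4}))(\/v\d{4})?))$/i;

// Both accept and reject the same inputs...
const samples = ["ABC1234", "ABC1234/12-34", "ABC1234/V1234", "A123456",
                 "A123456/V1234", "C123456/30-02", "AB12345"];
for (const s of samples) console.log(s, compact.test(s) === capturing.test(s)); // true

// ...but the non-capturing match array holds only the full match.
console.log(capturing.exec("ABC1234").length); // 9 (full match + 8 groups)
console.log(compact.exec("ABC1234").length);   // 1

// Without the i flag, [a-z] rejects upper-case codes.
console.log(new RegExp(compact.source).test("ABC1234")); // false
```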
To make the regular expression easier to debug and maintain, I keep the groups exactly as you defined them in your rules. This will not hurt performance, since the code only runs when the user requests it (in the form's submit event).
The logic of your attempt is perfect; you just have to take the following into account:
- The error occurs because \w matches [A-Za-z\d_], so it also matches digits.
- Some groups are unnecessary. For example, ^((A)|(B))$ is exactly the same as ^(A|B)$, since | (which works like an OR) has the lowest precedence of all regex operators, so the outer parentheses alone are enough to delimit the alternatives.
- Instead of using capturing groups (plain parentheses), which store the matched text in memory, I always recommend using a non-capturing group: (?:subpattern)
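The precedence and grouping points can be checked directly. This sketch uses placeholder letters A and B, as in the example above, and shows that the three forms accept the same strings and differ only in what they capture.

```javascript
// Three ways to say "the whole string is A or B".
const nested = /^((A)|(B))$/;
const flat = /^(A|B)$/;
const nonCapturing = /^(?:A|B)$/;

// They accept exactly the same strings.
for (const s of ["A", "B", "AB", ""]) {
  console.log(JSON.stringify(s), nested.test(s), flat.test(s), nonCapturing.test(s));
}

// They differ only in how many groups end up in the match array.
console.log(nested.exec("A").length);       // 4 (full match + 3 groups)
console.log(flat.exec("A").length);         // 2
console.log(nonCapturing.exec("A").length); // 1

// Because | has the lowest precedence, dropping the outer group changes the
// meaning: ^A|B$ is (^A)|(B$), so it accepts any string starting with A.
console.log(/^A|B$/.test("AX")); // true
console.log(flat.test("AX"));    // false
```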
Regular expression
As you can see, although I could have grouped the AAAXXXX, AXXXXXX, AAAXXXX/VXXXX and AXXXXXX/VXXXX options on one side and AAAXXXX/XX-XX on the other, I expand them from left to right instead. The regex engine always attempts a match from left to right, so presenting the options in that order, although it may produce a longer pattern, is usually more efficient. For example, the first letter is matched only once, without backtracking into other alternatives when an attempt does not match.
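As an illustration of this left-to-right factoring, here is a hypothetical pattern (not necessarily the exact regex from this answer) that matches the shared first letter once and then lets the next character choose a single branch. The i flag is assumed.

```javascript
// Hypothetical left-factored pattern. The first letter is shared by all five
// formats, so it is matched once; the next character (digit or letter)
// means only one branch can get past its first step.
const factored = new RegExp(
  "^[a-z]" +                                          // first letter, all formats
    "(?:" +
    "\\d{6}(?:/v\\d{4})?" +                           // AXXXXXX, AXXXXXX/VXXXX
    "|" +
    "[a-z]{2}\\d{4}(?:/(?:\\d{2}-\\d{2}|v\\d{4}))?" + // AAAXXXX, AAAXXXX/XX-XX, AAAXXXX/VXXXX
    ")$",
  "i" // assumed, as in the question
);

console.log(factored.test("A123456/V1234")); // true
console.log(factored.test("C123456/30-02")); // false
```

The branch order no longer matters for correctness here, because a digit and a letter can never both start the remainder of the same input.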
https://regex101.com/r/GChkQv/3/tests