
Question

PaperBirdMaster

Asked: 2020-03-07 03:30:35 +0800 CST

Regular expression with all optional components How to avoid empty catches?


I need to process a string of comma-separated values, each containing a triplet of values, and translate each triplet at run time into a different type depending on its content. The input data looks like this:

"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."

Each substring should be processed as follows:

1x2y3z would be processed as Vector3 with x = 1, y = 2 and z = 3.
80r160g255b would be processed as Color with r = 80, g = 160 and b = 255.
48h30m50s would be processed as Time with h = 48, m = 30 and s = 50.

The problem is that each component is optional (although the components always appear in the same order), so the following strings are also valid Vector3, Color and Time values:

1x3z would be processed as Vector3 with x = 1, y = 0 and z = 3.
255b would be processed as Color with r = 0, g = 0 and b = 255.
1h would be processed as Time with h = 1, m = 0 and s = 0.

What have I tried so far?

All components as optional.

((?:\d+A)?(?:\d+B)?(?:\d+C)?)

The characters A, B and C are replaced by the appropriate letter in each case. This expression works fine, except that it returns twice as many results as expected (the matched value, plus an empty string right after it), for example:

1h1m1s gives two matches:
1. "1h1m1s".
2. "".
11x50z gives two matches:
1. "11x50z".
2. "".
11111h gives two matches:
1. "11111h".
2. "".
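A minimal C++ sketch that reproduces this with std::sregex_iterator, assuming h, m and s in place of A, B and C (allMatches and allOptional are hypothetical names, not part of the question):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// First attempt, with A, B and C replaced by h, m and s (assumed units).
// Every component is optional, so the empty string also matches, and the
// iterator reports it once more at the end of the input.
const std::regex allOptional(R"(((?:\d+h)?(?:\d+m)?(?:\d+s)?))");
```

For "1h1m1s" the iterator yields "1h1m1s" followed by "": after a non-empty match it searches again from the end position, where the all-optional pattern matches the empty string.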

I can't say I didn't expect it... after all, when every component is absent, the empty string matches the expression. So, to fix this issue, I tried the following:

Quantifier from 1 to 3 elements.

((?:\d+[ABC]){1,3})

But this expression also accepts components in the wrong order, or even repeated ones:

1s1m1h: a match, but it should not match (wrong order).
11z50z: a match, but it should not match (repeated components).
1r1r1b: a match, but it should not match (repeated components).
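The behavior can be checked with std::regex_match, which requires the whole value to match (a simplification of running the pattern over the full list). A sketch assuming h, m and s as the units; quantifierAccepts is a hypothetical name:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Second attempt with [ABC] replaced by [hms]: one to three "number + unit"
// pairs, with nothing enforcing their order or forbidding repetitions.
bool quantifierAccepts(const std::string& value) {
    static const std::regex oneToThree(R"((?:\d+[hms]){1,3})");
    // regex_match succeeds only if the entire value matches the pattern.
    return std::regex_match(value, oneToThree);
}
```

The quantifier does rule out the empty string, but "1s1m1h" (wrong order) and "1h1h1h" (repeated unit) are both accepted.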

So I made another attempt, modifying my first one:

Match from start `^` to end `$` of the string.

^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$

It works better than the first version, but it still matches empty strings, with the added disadvantage that I must first split the string at each comma (,) and apply the expression to each substring.
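A sketch of this third attempt, assuming h, m and s as the units and std::getline to split at commas (anchoredPerValue is a hypothetical name). An empty value, such as the one between two consecutive commas, is still accepted:

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Third attempt: split the input at commas, then test each substring against
// the all-optional pattern anchored with ^ and $. Returns one flag per value.
std::vector<bool> anchoredPerValue(const std::string& text) {
    static const std::regex anchored(R"(^((?:\d+h)?(?:\d+m)?(?:\d+s)?)$)");
    std::vector<bool> accepted;
    std::istringstream in(text);
    std::string value;
    while (std::getline(in, value, ','))  // yields "" for ",,"
        accepted.push_back(std::regex_search(value, anchored));
    return accepted;
}
```

For "1h1m1s,,1s1m1h" this yields true, true, false: the wrong order is now rejected, but the empty middle value still passes.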

Using Lookahead

My attempt using lookahead:

\b(?=[^,])(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b

Against the following string:

"48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"

The results are very good: it detects the valid values without adding false positives. Unfortunately, every time it reaches a value that doesn't match the expression, it adds an empty string just before that value (it finds "" before "1s1m1h", "1h1h1h", "adfank" and "12322134445688"). So I made one last attempt, modifying the lookahead condition:

\b(?=(?:\d+[ABC]){1,3})(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b

This expression removes the empty strings found before values that do not match (?:\d+[ABC]){1,3} (the empty strings before "adfank" and "12322134445688"), but the empty strings before "1s1m1h" and "1h1h1h" are still captured.

So my question is: is there a regular expression that matches value triplets in a given order, without repetitions, with all components optional but at least one present, and that doesn't match empty strings?

The regex library I'm using is <regex> from C++11.

2 Answers


Mariano · Answer 1 · 2020-03-07T05:53:19+08:00

Let's start from the expression in which each of the three magnitudes is optional:

(?:\d+A)?(?:\d+B)?(?:\d+C)?

1. Anchor to the beginning of a value

To ensure that a value begins either at the start of the text or right after a comma, we add both alternatives at the beginning:

(?:^|,) (?:\d+A)?(?:\d+B)?(?:\d+C)?
^^^^^^^
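Anchoring alone does not remove empty values yet: a comma followed by something that is not a valid value still matches on its own. A sketch to check this, keeping the A, B, C placeholders used in this answer's code (allMatches and step1 are hypothetical names):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// Step 1 only: a match must start at the beginning of the text or at a comma,
// but all three components are still optional.
const std::regex step1(R"((?:^|,)(?:\d+A)?(?:\d+B)?(?:\d+C)?)");
```

On "1A,adfank" this yields "1A" and a lone ",", which is the empty value that step 2 targets.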

2. Avoid empty matches

As you showed in your last attempt, a positive assertion (positive lookahead) can be used to guarantee that the value is not empty, that is, that there is some character before the next comma, without consuming that character as part of the overall match. We just need to verify that there is at least one digit (\d).

(?:^|,) (?=\d) (?:\d+A)?(?:\d+B)?(?:\d+C)?
        ^^^^^^
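A sketch showing what the lookahead changes, again with the A, B, C placeholders (allMatches and step2 are hypothetical names). A value starting with a letter no longer produces a lone comma, but the pattern can still stop partway through a value, or match only the comma when the value consists solely of digits:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// Steps 1 and 2: anchor at start or comma, then require (without consuming)
// a digit before trying the optional components.
const std::regex step2(R"((?:^|,)(?=\d)(?:\d+A)?(?:\d+B)?(?:\d+C)?)");
```

With "1A,1C1B1A" the second value yields the partial match ",1C", and with "1A,123" it yields a bare ",".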

3. Only match if it matches the entire pattern

Now, as you mentioned in your last comment, such a pattern can pass the lookahead and then match only part of a value, or nothing after the comma (for example, when the value consists only of digits). To prevent that, we require the match to be followed by a comma or the end of the string. Again we use an assertion, so that the next comma is not consumed and remains available for the next match.

(?:^|,)(?=\d)(?:\d+A)?(?:\d+B)?(?:\d+C)? (?=,|$)
                                         ^^^^^^^

Demo on regex101.com
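A sketch applying the step 3 pattern (still without capturing groups) to the same test string used in this answer's code (allMatches and step3 are hypothetical names). Only the valid values are reported, each with its leading comma except the first:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// Steps 1 to 3: start or comma, at least one digit, the ordered optional
// components, and finally a comma or the end of the text (not consumed).
const std::regex step3(R"((?:^|,)(?=\d)(?:\d+A)?(?:\d+B)?(?:\d+C)?(?=,|$))");

const std::string sample =
    "48A30B50C,1A,1A1B1C,11111A,1C1B1A,1A1A1A,1C,1B,1443C,adfank,12322134445688,48A";
```

Because the closing lookahead fails on partial values such as "1C1B1A", the regex engine backtracks and rejects them entirely instead of returning a fragment.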

4. Capture numbers and units separately

For convenience, we use capturing groups (in parentheses) to capture each number and unit separately.

(?:^|,)(?=\d)(?:(\d+)(A))?(?:(\d+)(B))?(?:(\d+)(C))?(?=,|$)
                ^   ^^ ^     ^   ^^ ^     ^   ^^ ^

Demo on regex101.com

Code

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    string texto("48A30B50C,1A,1A1B1C,11111A,1C1B1A,1A1A1A,1C,1B,1443C,adfank,12322134445688,48A");
    regex patron(R"/((?:^|,)(?=\d)(?:(\d+)(A))?(?:(\d+)(B))?(?:(\d+)(C))?(?=,|$))/");

    // Iterate over each match
    sregex_iterator next(texto.begin(), texto.end(), patron);
    sregex_iterator end;
    while (next != end) {
        smatch match = *next;

        // Whole match (includes the leading comma, if any)
        cout << "Value: " << match.str() << endl;

        // Iterate over the capture groups two at a time (number, letter)
        for (size_t grupo = 1; grupo < match.size(); grupo += 2) {
            string numero = match[grupo];
            string letra  = match[grupo + 1];

            // Did the group participate, or is it empty because it is optional?
            if (!letra.empty()) {
                cout << "\tNumber: " << numero << endl
                     << "\tLetter: " << letra  << endl;
            }
        }
        next++;
    }
}

Result:

Value: 48A30B50C
    Number: 48
    Letter: A
    Number: 30
    Letter: B
    Number: 50
    Letter: C
Value: ,1A
    Number: 1
    Letter: A
Value: ,1A1B1C
    Number: 1
    Letter: A
    Number: 1
    Letter: B
    Number: 1
    Letter: C
Value: ,11111A
    Number: 11111
    Letter: A
Value: ,1C
    Number: 1
    Letter: C
Value: ,1B
    Number: 1
    Letter: B
Value: ,1443C
    Number: 1443
    Letter: C
Value: ,48A
    Number: 48
    Letter: A

Demo on ideone.com
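The captured numbers can then be turned into the typed values from the question, with missing components defaulting to 0. A sketch assuming the units x, y and z (Vector3); the Vector3 struct and parseVectors are hypothetical stand-ins, not code from the question, and Color or Time would only need different letters:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Vector3 type.
struct Vector3 { long x, y, z; };

// Applies this answer's pattern (A, B, C replaced by x, y, z) to a
// comma-separated list and converts each match into a Vector3.
std::vector<Vector3> parseVectors(const std::string& text) {
    static const std::regex pattern(
        R"((?:^|,)(?=\d)(?:(\d+)(x))?(?:(\d+)(y))?(?:(\d+)(z))?(?=,|$))");
    std::vector<Vector3> result;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Number groups are 1, 3 and 5; a group that did not participate has
        // matched == false and becomes 0. std::stol throws on out-of-range input.
        auto value = [&m](int group) {
            return m[group].matched ? std::stol(m[group].str()) : 0L;
        };
        result.push_back({value(1), value(3), value(5)});
    }
    return result;
}
```

For "1x2y3z,1x3z,3z1x,foo" this yields two vectors, (1, 2, 3) and (1, 0, 3); the out-of-order and non-numeric values are skipped.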

eferion · Answer 2 · 2020-03-07T04:28:53+08:00

Let's take one of the three possible groups, since the solution can later be extended to the others:

1x2y3z

Which can be schematized as:

\d+x\d+y\d+z

Now, each of these three parts is optional, although to avoid false positives we must assume that at least one is always present. That is, the group must contain at least \d+x, \d+y or \d+z. This assumption has certain implications:

If the group begins with \d+x, it may be followed by \d+y and \d+z
If the group begins with \d+y, it may be followed by \d+z, but never by \d+x
If the group begins with \d+z, it can be followed by neither \d+x nor \d+y

Translated into a regular expression, this looks like:

(\d+x(?:\d+y)?(?:\d+z)?|\d+y(?:\d+z)?|\d+z)

This solution never returns empty strings, since it always requires at least one element.
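A sketch that runs this alternation with std::sregex_iterator over a comma-separated list (allMatches and ordered are hypothetical names). No empty strings appear among the results:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// x may be followed by y and/or z, y only by z, and z by nothing;
// every alternative needs at least one "number + unit" pair.
const std::regex ordered(R"((\d+x(?:\d+y)?(?:\d+z)?|\d+y(?:\d+z)?|\d+z))");
```

Because the pattern is not tied to commas, an out-of-order value such as 3z1x is reported as two separate matches, 3z and 1x; applying it with std::regex_match to each comma-separated value rejects such a value instead.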

Following a chat conversation with @Mariano, another option is:

(\d+[xyz][^,]*)

This is only appropriate if you can guarantee that the received data is well formed, since it would also accept, for example, 1x2345abracadabra; on the other hand, it should be faster.
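A sketch showing how permissive this pattern is (allMatches and permissive are hypothetical names): everything after the first "number + unit" pair, up to the next comma, becomes part of the match.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// One digit run and one unit, then anything that is not a comma.
const std::regex permissive(R"((\d+[xyz][^,]*))");
```

On "1x2345abracadabra,2y" the first match is the whole invalid value "1x2345abracadabra", followed by "2y".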

The expression above could be refined to be slightly less forgiving:

((?:\d+[xyz]){1,3})
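A sketch comparing it with the previous pattern on the same kind of input (allMatches and lessForgiving are hypothetical names). Matching now stops at the first character that does not form a "number + unit" pair, so the trailing garbage is no longer swallowed, although the leading 1x is still reported:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match reported by std::sregex_iterator.
std::vector<std::string> allMatches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// One to three "number + unit" pairs; order and repetition are not checked.
const std::regex lessForgiving(R"(((?:\d+[xyz]){1,3}))");
```

On "1x2345abracadabra,1x2y3z" this yields "1x" and "1x2y3z".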
