
Question

Chofoteddy

Asked: 2020-12-09 12:44:45 +0800 CST

How to negate (not select) with regular expressions? in PHP or JavaScript


Added Details

I found the following two ways to negate on the internet:

?!
[^\w]

But I can't find documentation, in Spanish or English, that describes how they work. I find them too advanced to work out what each one means and how to use them properly to get the expected result. The current answer fixes the problem but doesn't explain how they are used.

Problem Statement

I want to select all those words that are not in quotes within a text. I know how to do the opposite.

Example:

Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".

With the following regular expression you can capture the words that are in quotes:

/"([\w\s]+)"/gim

The result

[
    1 => 'dolor sit amet',
    2 => 'sit amet'
]

What I look for

[
    1 => 'Lorem ipsum ',
    2 => ', consectetur adipiscing elit, maecenas est felis ',
    3 => '.',
]

Another example would be, from the following list:

Hola
Hol@
hol.
hol*
holo

Print / select those that do not use alphanumeric characters (I know how to do the opposite of that). In general: take everything that is not an alphanumeric character, take all the words that do not contain an "l", take everything that does not start with the letter "z", etc.

Working example: http://www.regextester.com/15

I would expect to do something like this for everything that doesn't start with "a":

/!^a.*/

But obviously that doesn't work. I look forward to your feedback.

Clarification

I would also like to understand the solution proposed and not just a copy-paste to solve the problem.

Note: The regular expression I quote here to obtain the text works for me in both PHP and JavaScript (the languages I am using to solve the problem). I have seen that regular expressions vary slightly between languages, but between these two the differences are not substantial. I would therefore like the proposed solution to work in at least one of the two.

Support sources

4 Answers


Federico Piazza · Answer 1 · 2020-06-01T14:42:27+08:00

I add this answer as information related to regular expressions. It is my first answer on SO in Spanish and it is not a translation, so if it is not correct I can delete it or correct it.

Regarding what you commented in your question:

(?!  ) -> known as a negative lookahead
[^\w]  -> known as a character class (negated, in this case)

These are two different concepts. On the one hand you have what is known as a lookaround, and on the other hand a character class. They work like this:

Lookarounds

Lookarounds can be understood as ways of checking whether a pattern is (or is not) preceded or followed by another pattern. For example, the expression hola(?!chau) will match the word hola as long as it is not immediately followed by chau.


Namely:

hola, ¿qué tal?   <-- OK
hola SO           <-- OK
holachau          <-- Fails

Your question is about how to negate, but I also wanted to mention that lookarounds are divided into:

Lookahead (look ahead):
- Positive: written as hola(?=chau); matches hola only if chau comes right after it
- Negative: written as hola(?!chau); matches hola only if chau does NOT come right after it
Lookbehind (look behind):
- Positive: written as (?<=chau)hola; matches hola only if chau comes right before it
- Negative: written as (?<!chau)hola; matches hola only if chau does NOT come right before it
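As a rough illustration of the four variants in JavaScript, here is a minimal sketch. The sample strings and the helper `matching` are made up for this example, and the two lookbehind lines assume an engine that implements ES2018 lookbehind.

```javascript
// Hypothetical sample strings, modeled on the hola/chau examples.
const samples = ['hola, ¿qué tal?', 'hola SO', 'holachau', 'chauhola'];

// Hypothetical helper: returns the samples in which the regex finds a match.
const matching = (regex) => samples.filter((s) => regex.test(s));

const notFollowed = matching(/hola(?!chau)/);   // negative lookahead
const followed = matching(/hola(?=chau)/);      // positive lookahead
const preceded = matching(/(?<=chau)hola/);     // positive lookbehind
const notPreceded = matching(/(?<!chau)hola/);  // negative lookbehind

// A lookaround only checks the context; it is not part of the matched text.
const matchedText = 'holachau'.match(/hola(?=chau)/)[0];

console.log({ notFollowed, followed, preceded, notPreceded, matchedText });
```

Note that the lookaround never consumes characters: `matchedText` holds only `hola`, even though the pattern required `chau` to follow.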

It is important to mention that lookbehinds are not supported by JavaScript in all browsers (see the compatibility table).

You can find more information about lookarounds at:
http://www.regular-expressions.info/lookaround.html

Character classes

On the other hand, there are character classes, which in Spanish would be understood as a set of characters (or class of characters). They are written using square brackets: [...].

In other words, [aeiou] matches a single character that is one of the unaccented vowels.


Likewise, a class can be negated, as you mentioned, by putting ^ at the beginning. So [^aeiou] matches a single character that is not an unaccented vowel.


Here is more information about the character classes:
http://www.regular-expressions.info/charclass.html
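To make the difference between a class and its negation concrete, here is a small JavaScript sketch. The sample word and the variable names are assumptions chosen for illustration only.

```javascript
// Hypothetical sample word; \u00e9 is the accented "é" in "murciélago".
const word = 'murci\u00e9lago';

// [aeiou] matches one unaccented vowel at a time.
const vowels = word.match(/[aeiou]/g);

// [^aeiou] matches any single character that is NOT an unaccented vowel,
// which includes the consonants and the accented "é".
const others = word.match(/[^aeiou]/g);

// [^\w] from the question matches any character that is not an ASCII
// letter, a digit, or an underscore.
const symbols = 'hol*'.match(/[^\w]/g);

console.log(vowels, others, symbols);
```

A negated class still has to match exactly one character; it never matches "nothing".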

Verbs

Now that you have a bit of context: if you want to use regular expressions to match all the words that are not in quotes, PCRE (Perl Compatible Regular Expressions, supported by PHP, R, Delphi and others) has verbs that are very useful in your case.

The best known are (*SKIP) and (*FAIL), which are usually used together, like this:

".*?"(*SKIP)(*FAIL)|(\w+)

Practical example

These types of patterns are often called a discard technique, and they always use the same form of patterns separated by OR:

discard pattern 1|discard pattern 2|discard pattern N|(KEEP THIS)


Thus, the expression ".*?"(*SKIP)(*FAIL)|(\w+) discards every match of whatever comes before (*SKIP)(*FAIL) (here, ".*?") and captures the last pattern (the one in parentheses; parentheses are used to capture content).

Broken down, the regular expression ".*?"(*SKIP)(*FAIL)|(\w+) reads:

".*?"     Matches what I DO want to discard; to tell the engine
          to discard it, I add (*SKIP)(*FAIL)
|(\w+)    or (if the pattern was not discarded) match the words and capture them

Therefore, in the link above, when that expression is applied to the text:

Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".

The following content is captured:

MATCH 1
1.  [0-5]   `Lorem`
MATCH 2
1.  [6-11]  `ipsum`
MATCH 3
1.  [30-41] `consectetur`
MATCH 4
1.  [42-52] `adipiscing`
MATCH 5
1.  [53-57] `elit`
MATCH 6
1.  [59-67] `maecenas`
MATCH 7
1.  [68-71] `est`
MATCH 8
1.  [72-77] `felis`
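(*SKIP) and (*FAIL) are PCRE verbs and do not exist in JavaScript. As a rough approximation, the same words can be collected there by keeping the alternation, dropping the verbs, and checking whether group 1 captured anything. This is only a sketch; the variable names are made up for the example.

```javascript
const text = 'Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".';

// Left alternative: quoted text, matched but not captured (the discard).
// Right alternative: a word, captured in group 1 (the part to keep).
const pattern = /".*?"|(\w+)/g;

const words = [];
for (const match of text.matchAll(pattern)) {
  // Group 1 is undefined whenever the quoted alternative matched.
  if (match[1] !== undefined) {
    words.push(match[1]);
  }
}

console.log(words);
```

The quoted parts are still matched, which is what stops \w+ from finding the words inside them; they are simply ignored afterwards.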

In conclusion: regular expressions are, in my opinion, spectacular, but only if you know how to use them. Personally, I can't live without them, but as with everything: to drive a nail you need a hammer, not a screwdriver. Regexes are great for pattern matching, but if you need logic then this is definitely not the tool to use.

JuanK · Answer 2 · 2020-12-09T14:55:22+08:00

In these cases it's best to take the easy way out (Regexp is hell). If you already know how to find what you don't want, using

/"([\w\s]+)"/gim

then the easiest thing is to use preg_split to remove everything that matches that expression

preg_split("/\"([\w\s]+)\"/", $input_line);

Running this on your example string returns three blocks, which are the blocks not contained in quotes

[
    1 => 'Lorem ipsum ',
    2 => ', consectetur adipiscing elit, maecenas est felis ',
    3 => '.',
]

If you also want the parts that get removed, first call preg_match() and then do a plain split of the string with explode; there is no need for preg_split.

Of course you can use preg_split, but it would waste processing cycles.
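Since the question also mentions JavaScript, here is a minimal sketch of the same split idea using String.prototype.split. The capture group is dropped from the pattern on purpose, because in JavaScript split() adds captured groups to the resulting array. Variable names are assumptions for the example.

```javascript
const text = 'Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".';

// Split on every quoted run of word characters and whitespace.
// No capture group here, so only the text between the quoted parts is returned.
const outsideQuotes = text.split(/"[\w\s]+"/);

console.log(outsideQuotes);
```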

For the other case it is a bit easier

anything with non-alphanumeric characters

Hola
Hol@
hol.
hol*
holo

Print / select those that do not use alphanumeric characters

Simply use a negated range like this expression which marks all non-alphanumeric characters

[^a-zA-Z0-9]+

With this expression you can get the inputs that match by using preg_grep

preg_grep("/[^a-zA-Z0-9]+/", explode("\n", $input_lines));

output

array(3
1   =>  Hol@
2   =>  hol.
3   =>  hol*
)
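A JavaScript sketch of the same filtering, assuming the input is already split into an array of lines (the array and variable names are hypothetical). Unlike preg_grep, Array.prototype.filter does not keep the original indexes.

```javascript
const lines = ['Hola', 'Hol@', 'hol.', 'hol*', 'holo'];

// Matches a single character that is not an ASCII letter or digit.
const nonAlphanumeric = /[^a-zA-Z0-9]/;

// Lines that contain at least one non-alphanumeric character.
const withSymbols = lines.filter((line) => nonAlphanumeric.test(line));

// The complement: lines made up only of alphanumeric characters.
const onlyAlphanumeric = lines.filter((line) => !nonAlphanumeric.test(line));

console.log(withSymbols, onlyAlphanumeric);
```

Negating the result of test() in code is often simpler than trying to express the negation inside the pattern.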

everything that doesn't start with a

arroz
beso
academia
foto
juego
arte

With this expression: ^[^a]+

preg_grep("/^[^a]+/", explode("\n", $input_lines));

output

array(3
1   =>  beso
3   =>  foto
4   =>  juego
)
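For comparison, here is a JavaScript sketch of two ways to express "does not start with a": the negated class used above, and a negative lookahead, which is probably the closest working form of the /!^a.*/ attempt in the question. The word list and variable names are assumptions for the example.

```javascript
const words = ['arroz', 'beso', 'academia', 'foto', 'juego', 'arte'];

// Negated class: the first character must exist and must not be "a".
const byNegatedClass = words.filter((w) => /^[^a]/.test(w));

// Negative lookahead: at the start, the next character must not be "a".
const byLookahead = words.filter((w) => /^(?!a)/.test(w));

// The two differ on empty input: the class needs one character to match,
// while the lookahead only asserts that no "a" follows.
const emptyWithClass = /^[^a]/.test('');
const emptyWithLookahead = /^(?!a)/.test('');

console.log(byNegatedClass, byLookahead, emptyWithClass, emptyWithLookahead);
```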

Paul Vargas · Answer 3 · 2020-12-09T19:26:28+08:00


If you use the following regular expression:

"([\w ]*)"


Or more exactly something similar to the following code:

<?php
  $str = "Lorem ipsum \"dolor sit amet\", consectetur adipiscing elit, maecenas est felis \"sit amet\".";
  $array = preg_split("/\"([\\w ]*)\"/", $str, -1,
          PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
  print_r($array);
?>

You get the following output:

Array
(
    [0] => Lorem ipsum 
    [1] => dolor sit amet
    [2] => , consectetur adipiscing elit, maecenas est felis 
    [3] => sit amet
    [4] => .
)


If you take a look at the documentation for preg_split, you'll find that the PREG_SPLIT_NO_EMPTY flag removes empty strings from the output, and the PREG_SPLIT_DELIM_CAPTURE flag also returns, in the result, the text matched by the parenthesized part of the regular expression.
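In JavaScript, a roughly equivalent result can be sketched without flags: String.prototype.split includes the text of capture groups in its output by default, and empty strings can be filtered out afterwards. Variable names here are assumptions for the example.

```javascript
const text = 'Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".';

const parts = text
  .split(/"([\w ]*)"/)              // the capture group keeps the quoted text in the result
  .filter((part) => part !== '');   // drop empty strings, similar to PREG_SPLIT_NO_EMPTY

console.log(parts);
```

The quoted and unquoted pieces come back interleaved, in the order they appear in the text.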


Pollo · Answer 4 · 2021-11-08T22:12:15+08:00

Note: Federico Piazza's answer is excellent; read that one first.
I want to complement it with how this would be done in JavaScript. Federico talks about the discard technique, but introduces it with control verbs that are exclusive to PCRE (they don't exist in JavaScript).

Discard technique (also called "the best regex trick" by RexEgg). Works in JavaScript.

It is very simple. It consists of

/what you don't want|(what you do want)/

That is all!

The trick is that the regex does match what you don't want, but doesn't capture it. That subtle difference is what tells us whether it matched our exception or what we actually wanted.

The parentheses in (what you do want) create a group and, like any group, capture the text they match. That means the captured text is available separately in the result of RegExp.exec() or String.matchAll(). So it's just a matter of checking whether something was captured in group 1 or not.

Let's take the example from the question: select all the text except the parts in quotes.

/".*?"|([^"]+)/g

Code:

const regex = /".*?"|([^"]+)/g,
      texto = 'Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".';

let resultado = 
        [ ...texto.matchAll(regex) ]                  // get all the matches
            .filter(match => match[1] !== undefined)  // drop those that captured nothing in group 1
            .map(match => match[0]);                  // keep the matched text

console.log(resultado);
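The same check can also be written with RegExp.exec(), which may be handy where String.matchAll() is not available. This is only a sketch; the function name `textOutsideQuotes` is made up for the example.

```javascript
// Hypothetical helper: returns the pieces of text that are outside double quotes.
function textOutsideQuotes(text) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const regex = /".*?"|([^"]+)/g;
  const pieces = [];
  let match;
  // With the g flag, exec() resumes after the previous match and returns null at the end.
  while ((match = regex.exec(text)) !== null) {
    // Only keep matches where group 1 captured something (the "keep" alternative).
    if (match[1] !== undefined) {
      pieces.push(match[1]);
    }
  }
  return pieces;
}

const sample = 'Lorem ipsum "dolor sit amet", consectetur adipiscing elit, maecenas est felis "sit amet".';
console.log(textOutsideQuotes(sample));
```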
