Question

Asked: 2020-03-19 11:02:51 +0800 CST

Regular expression for code comment removal

I am trying to write a regular expression that removes comments of the forms // and /* */. For now I am using one taken from this site:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)

The problem appeared when the comment marker was inside a string literal (with either single or double quotes), e.g.:

var a = "//Holaaa";

So I tried combining lookbehind and lookahead to exclude both kinds of quotes, and came up with this:

(?<!\"|\')((/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*))(?!\"|\')

The problem with this is that it doesn't work for cases like the following:

var ar = "asasasas /*dsdsdsd*/ "
var ar = "asaasas //dsdsdsd"
var ar = "asasasas /*dsdsdsd*/ dsadasdsda"
var ar = "asaasas //dsdsdsd asdsadsadsad"

I tried changing (?<!\"|\') to (?<!\"|\'.*) and (?!\"|\') to (?!.*\"|\'), but that didn't work either.

What am I missing?

Note: the idea is to use it in Java, but the answer does not have to follow Java's syntax; once I know the expression, I can adapt it myself.

1 Answer

Mariano · Answer 1 · 2020-03-19T13:01:48+08:00

Problem

Honestly, the regex you took from that page is poor, both in what it misses and in efficiency. Trying to fix it with assertions (lookahead / lookbehind) is a reasonable idea, but the strategy does not work well. The full explanation is long, but in short: something like (?<!"|') only checks one character back from the current position and, even if it could be variable-length (it cannot), you would still be unable to tell whether the preceding quote opens or closes a string literal. In short: wrong strategy (one we have all fallen for).
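To make the one-character limitation concrete, here is a minimal sketch. The pattern is a simplified stand-in for the attempt in the question (only the // branch, with the lookbehind written as a character class), and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Simplified version of the question's attempt: a // comment that is
    // not immediately preceded by a quote character.
    static final Pattern NAIVE = Pattern.compile("(?<![\"'])//.*");

    static String strip(String source) {
        return NAIVE.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // Left intact, but only because the quote sits directly before the slashes.
        System.out.println(strip("var a = \"//Holaaa\";"));
        // Truncated: the character before // is a space, so the lookbehind
        // passes and the rest of the string literal is removed.
        System.out.println(strip("var ar = \"asaasas //dsdsdsd\""));
    }
}
```

The second call cuts the literal off after "asaasas ", which is the failure described in the question.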

Solution

For this type of case, where all the syntax before the position of the match is relevant, the way to get to that point is to consume each part of the text while validating each structure.

The regex should be anchored at the beginning of the text, or at the end of the previous match (\G), and match the text where a comment marker has no meaning, until a comment is found. Broadly speaking, it would replace

\G  ([anything that is not a comment]*)  comment

where all the preceding text is captured and put back by replacing with

$1
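The same idea can be tried on a deliberately tiny, hypothetical syntax before tackling the real one: comments start with # and the only other structure is a double-quoted string without escapes. This is a sketch under those assumptions, not the final expression; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

public class AnchoredStripDemo {
    // \G           start of input, or end of the previous match
    // ( ... )*+    group 1: everything that is not a comment
    //              (plain characters or whole "..." literals)
    // #.*          the comment itself, left outside group 1
    static final Pattern HASH_COMMENT =
            Pattern.compile("\\G((?:[^\"#]|\"[^\"]*\")*+)#.*");

    static String strip(String source) {
        // Putting back $1 keeps the code and drops only the comment.
        return HASH_COMMENT.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source = "a = \"x # y\" # note\nb = 1 # two";
        System.out.println(strip(source));
    }
}
```

Because every match starts exactly where the previous one ended, the # inside the string is always consumed as part of a literal and never seen as a comment start.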

Regular expression

Now, finding everything that is not a comment involves matching all characters except those with special meaning, and adding rules to match each of those exceptions (a \ that escapes a character, quoted text, etc.).

To simplify the explanation, I annotated the regex with the purpose of each structure:
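As an illustration of one such rule, the sketch below isolates a pattern for double-quoted text that tolerates backslash escapes. The class and method names are hypothetical, and the sample input is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLiteralDemo {
    // "                  opening quote
    // [^"\\]*            run of ordinary characters
    // (?:\\.[^"\\]*)*    an escaped character, then more ordinary ones
    // "                  closing quote
    static final Pattern DOUBLE_QUOTED =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Returns the first double-quoted literal found, or null if there is none.
    static String firstLiteral(String source) {
        Matcher m = DOUBLE_QUOTED.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The escaped quotes do not end the literal, and the // inside it
        // is consumed as ordinary string content.
        System.out.println(
                firstLiteral("var s = \"say \\\"hi\\\" // not a comment\"; // real"));
    }
}
```

The single-quoted rule works the same way with ' in place of ".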

\G                                  # Anchor to \A or the end of the previous match
(                                   # GROUP 1: capture everything that is not a comment in $1:
  [^"'/\\]*                         #   characters with no special meaning
  (?:                               #   special structures:
    (?: \\.                         #       a. backslash escaping a character
      | /(?![*/])                   #       b. a / that is not followed by / or *
      | "[^"\\]*(?:\\.[^"\\]*)*"    #       c. double-quoted text
      | '[^'\\]*(?:\\.[^'\\]*)*'    #       d. single-quoted text
    )                               #
    [^"'/\\]*                       #     followed by more characters with no special meaning
  )*+                               #   (special structures repeated 0 to infinity times)
)                                   # end of Group 1

(?:                                 # COMMENTS (not inside $1)
   //.*                             #   a. // to the end of the line
|  /\*[^*]*(?:\*(?!/)[^*]*)*\*/     #   b. /* up to the next */
)

Or on a single line, without comments:

\G([^"'/\\]*(?:(?:\\.|/(?![*/])|"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')[^"'/\\]*)*+)(?://.*|/\*[^*]*(?:\*(?!/)[^*]*)*\*/)

With backslashes and quotes escaped for Java:

final String regex = "\\G([^\"'/\\\\]*(?:(?:\\\\.|/(?![*/])|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')[^\"'/\\\\]*)*+)(?://.*|/\\*[^*]*(?:\\*(?!/)[^*]*)*\\*/)";

Demo:

https://regex101.com/r/wDg8LJ/1/

Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveComments {
    public static void main(String[] args) {
        // Pattern.COMMENTS (below) makes the regex engine ignore the spaces inside these strings.
        final String regex =
               "\\G"                                         // Anchor to \\A or the end of the previous match
             + "("                                           // GROUP 1: capture everything that is not a comment in $1:
             + "  [^\"'/\\\\]*"                              //   characters with no special meaning
             + "  (?:"                                       //   special structures:
             + "    (?: \\\\."                               //       a. backslash escaping a character
             + "      | /(?![*/])"                           //       b. a / that is not followed by / or *
             + "      | \"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""  //       c. double-quoted text
             + "      | '[^'\\\\]*(?:\\\\.[^'\\\\]*)*'"      //       d. single-quoted text
             + "    )"                                       //
             + "    [^\"'/\\\\]*"                            //     followed by more characters with no special meaning
             + "  )*+"                                       //   (special structures repeated 0 to infinity times)
             + ")"                                           // end of Group 1
             + "(?:"                                         // COMMENTS (not inside $1)
             + "   //.*"                                     //   a. // to the end of the line
             + "|  /\\*[^*]*(?:\\*(?!/)[^*]*)*\\*/"          //   b. /* up to the next */
             + ")";

        final String texto =
               "var ar1 = \"asasasas /*dsdsdsd*/ \"\n"
             + "var ar2 = \"asaasas //dsdsdsd\"\n"
             + "var ar3 // = \"asaasas //dsdsdsd asdsadsadsad\"\n"
             + "/*var ar4*/ = \"asasasas /*dsdsdsd*/ dsadasdsda\" /*\n"
             + "var ar5 = \"asaasas //dsdsdsd asdsadsadsad\" //comentario\n"
             + "var ar6 = \"x\" */ + \"asaasas //dsdsdsd asdsadsadsad\"\n"
             + "var ar7 = \"x\" // + \"asaasas //dsdsdsd asdsadsadsad\"\n"
             + "var ar8 = \"asaasas //dsdsdsd asdsadsadsad\" //comentario\n"
             + "var ar9 = \"asaasas //dsdsdsd asdsadsadsad\"";

        // Put back only the captured non-comment text.
        final String reempl = "$1";

        final Pattern pattern = Pattern.compile(regex, Pattern.COMMENTS);
        final Matcher matcher = pattern.matcher(texto);

        final String resultado = matcher.replaceAll(reempl);

        System.out.println("TEXTO:\n" + texto);
        System.out.println("\nRESULTADO:\n" + resultado);
    }
}

Result

TEXTO:
var ar1 = "asasasas /*dsdsdsd*/ "
var ar2 = "asaasas //dsdsdsd"
var ar3 // = "asaasas //dsdsdsd asdsadsadsad"
/*var ar4*/ = "asasasas /*dsdsdsd*/ dsadasdsda" /*
var ar5 = "asaasas //dsdsdsd asdsadsadsad" //comentario
var ar6 = "x" */ + "asaasas //dsdsdsd asdsadsadsad"
var ar7 = "x" // + "asaasas //dsdsdsd asdsadsadsad"
var ar8 = "asaasas //dsdsdsd asdsadsadsad" //comentario
var ar9 = "asaasas //dsdsdsd asdsadsadsad"

RESULTADO:
var ar1 = "asasasas /*dsdsdsd*/ "
var ar2 = "asaasas //dsdsdsd"
var ar3 
 = "asasasas /*dsdsdsd*/ dsadasdsda"  + "asaasas //dsdsdsd asdsadsadsad"
var ar7 = "x" 
var ar8 = "asaasas //dsdsdsd asdsadsadsad" 
var ar9 = "asaasas //dsdsdsd asdsadsadsad"

Demo:

http://ideone.com/NSGmCL
