
Question

Asked: 2020-01-24 07:17:24 +0800 CST

Expression works in Regex101.com but not in Python 2.7 code


Another regex question. I wrote a regex to run against the text of a PDF extracted with the tika library; that is, the PDF's text is stored in a variable as a unicode string.

'^[A-Z]\S{2,} *(?:\n+ *\S+ *)*?\n*.*?\d+ +\d+(?:[.,]\d+)?%'

With it I want to get:

Analista programador-DyD 1 49,54%

Programador-DyD 1 50,46%

TOTAL 2 100%

When printed with print(), the text looks like this:

If we display the content of the variable without print(), we get this:

That is, wherever \n appears, it is actually a line break, as can be seen in the first image, which shows the content of the variable through the print() function.
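The difference between the two views can be reproduced with a short sketch. The sample string below is a hypothetical fragment, not the actual tika output: print() renders each line break, while repr() (what the interpreter echoes for a bare variable) shows it as the two-character escape \n.

```python
# Hypothetical fragment standing in for the text extracted by tika.
text = u"Analista \nprogramador-\nDyD"

# repr() gives the escaped form that the interpreter echoes for a bare variable.
escaped = repr(text)

print(text)     # line breaks are rendered as real new lines
print(escaped)  # line breaks are shown as backslash + n
```

In both cases the string itself contains real newline characters; only the way it is displayed changes.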

When I paste this text into regex101.com, the regex captures what I want, but when I run the script it always returns an empty list (I use the findall method of the re module).

Both in this link and in the one above you can see how it matches. Note that on regex101.com I replaced the \n escapes shown by the raw variable (displayed without print(), without converting to str, just the plain unicode value) with actual line breaks, so that regex101.com doesn't treat \n as literal text.

Now the question: why does it work on the website but not when I pass the unicode text in Python?

Thank you very much for your time!!

1 Answer


abulafia · Answer 1 · 2020-01-26T02:28:16+08:00

If you look at the regex101 web page, the regular expression has certain flags activated:

Specifically, it has the "Global" and "Multiline" options active. The "Global" option is irrelevant when you use findall() (although it would matter for match()), but the "Multiline" option is essential: with it, ^ matches at the beginning of every line; without it, ^ matches only at the beginning of the string. If you deactivate it on regex101, you will see that it no longer finds anything.

In Python these flags are passed as an additional argument to findall. In this case it would be:

ll = re.findall(r, pdf, re.MULTILINE)

Now the result (over the text I copied from the regex101 page) is:

['Analista \nprogramador-\nDyD \n\n1 49,54%',
 'Programador-\nDyD \n\n1 50,46%',
 'TOTAL 2 100%',
 'Jefe \nproyecto 1 100%']
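As a self-contained sketch of the same fix, the snippet below runs the asker's pattern with and without the flag. The `pdf` string is an assumption: its header line is invented, and the remaining lines are rebuilt from the first three matches listed above, so it only stands in for the real tika output.

```python
import re

# The asker's pattern, unchanged.
pattern = r'^[A-Z]\S{2,} *(?:\n+ *\S+ *)*?\n*.*?\d+ +\d+(?:[.,]\d+)?%'

# Hypothetical stand-in for the tika output: the header line is invented,
# the rest is rebuilt from the matches listed above.
pdf = (
    u"--- page 1 ---\n"
    u"Analista \nprogramador-\nDyD \n\n1 49,54%\n"
    u"Programador-\nDyD \n\n1 50,46%\n"
    u"TOTAL 2 100%\n"
)

# Without the flag, ^ can only match at index 0, where the header sits.
without_flag = re.findall(pattern, pdf)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.findall(pattern, pdf, re.MULTILINE)

# The inline flag (?m) at the start of the pattern has the same effect.
with_inline_flag = re.findall(r'(?m)' + pattern, pdf)

print(without_flag)
print(with_flag)
```

Because the sample text does not begin with a capitalized word, the call without the flag has nothing to anchor to and returns an empty list, which mirrors the behaviour described in the question. The inline `(?m)` form may be handy when the pattern is stored somewhere the flags argument cannot be passed.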
