
Question

Asked: 2020-04-04 02:54:10 +0800 CST

Possible irregular behavior of the re module in Python


We have the following HTML tag structure (an excerpt from a larger document):

texto = '''
                <div id="breadcrumb_feature_div" data-feature-name="breadcrumb" data-template-name="breadcrumb" class="a-section a-spacing-none feature t-prnt t-full">
                    <style type="text/css">
                        #a-page .dp-breadcrumb .breadcrumb-inline-links {
                            display: inline;
                        }

                        .dp-breadcrumb {
                            31.25px;
                        }
                    </style>
                    <div class="a-section a-spacing-large">
                        <h4 class="a-spacing-small">Estás aquí</h4>
                        <div class="a-section dp-breadcrumb">
                            <div aria-live="polite" data-a-expander-collapsed-height="125" class="a-expander-collapsed-height a-row a-expander-container a-expander-partial-collapse-container" style="max-height:125px; _height:125px">
                                <div aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content">
                                    <a class="a-spacing-base a-link-normal" href=''>departamentos</a>
                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=''>Tienda Kindle</a>
                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=''>eBooks Kindle</a>
                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=''>Deportes</a>
                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=''>Ciclismo</a>
                                </div>
                                <div class="a-expander-header a-expander-partial-collapse-header"><a href='' data-action="a-expander-toggle" class="a-declarative" data-a-expander-toggle="{&quot;allowLinkDefault&quot;:true, &quot;expand_prompt&quot;:&quot;Mostrar más&quot;, &quot;collapse_prompt&quot;:&quot;Mostrar menos&quot;}"><i class="a-icon a-icon-extender-expand"></i><span class="a-expander-prompt">Mostrar más</span></a></div>
                            </div>
                        </div>
                    </div>
                </div>
                <div id="returnPolicy_feature_div" data-feature-name="returnPolicy" data-template-name="returnPolicy" class="a-section a-spacing-none feature t-prnt t-full">
                </div>
                <div class="aw-campaigns"></div>
'''

The goal is to extract the 4 tags whose class attribute is a-size-base a-link-child breadcrumb-inline-links.

For reasons that are not relevant here, the original HTML document is assumed to be broken. That rules out parsing libraries such as BeautifulSoup, so regular expressions are proposed as the solution, as follows:

>>> import re
>>> z = re.findall(r'^[^\n]+a-size-base a-link-child breadcrumb-inline-links[^\n]+$', texto)
>>> z
[]
>>>

The result is an empty list (no matches found). However, when the same pattern is tested in an online regular expression validator like this one, the expected result is obtained.

Why does the regular expression produce the expected result in the validator, but not when it is run through the re module?

1 Answer


FJSevilla · Answer 1 · 2020-04-04T03:21:11+08:00

There is no malfunction; the reason no matches are found is much simpler. You have not enabled the MULTILINE modifier (re.MULTILINE in Python), which the online validator enables by default.

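This modifier makes the anchors ^ and $ match at the beginning and end of each line, respectively, instead of only at the beginning and end of the entire string. Without it, ^ can only match at position 0, and texto starts with a newline, so [^\n]+ fails there immediately.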
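A minimal sketch of the difference, assuming a small hypothetical string named sample that, like texto, begins with a newline. The same pattern finds nothing in the default mode and finds every line once re.MULTILINE is passed:

```python
import re

# Hypothetical stand-in for `texto`: it starts with a newline, like the original.
sample = "\nfirst line\nsecond line\n"

pattern = r'^[^\n]+$'

# Default mode: ^ matches only at position 0 of the whole string. Position 0 is
# a newline, so [^\n]+ cannot match there and no other start position is tried.
default_result = re.findall(pattern, sample)

# MULTILINE mode: ^ also matches right after each newline and $ right before one,
# so every non-empty line becomes a separate match.
multiline_result = re.findall(pattern, sample, re.MULTILINE)

print(default_result)    # []
print(multiline_result)  # ['first line', 'second line']
```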

import re


patt = re.compile(
    r'^[^\n]+a-size-base a-link-child breadcrumb-inline-links[^\n]+$',
    re.MULTILINE
    )
z = patt.findall(texto)

>>> z
['                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>Tienda Kindle</a>',
 '                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>eBooks Kindle</a>',
 '                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>Deportes</a>',
 '                                    <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>Ciclismo</a>']
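The flag can also be passed directly to re.findall, either as re.MULTILINE or its short alias re.M, or written inside the pattern as the inline flag (?m), which is closer to how online validators display their modifiers. A sketch, assuming a trimmed hypothetical html string with only three of the original links:

```python
import re

# Hypothetical trimmed excerpt: one non-matching link and two matching ones.
html = (
    "\n"
    '  <a class="a-spacing-base a-link-normal" href=\'\'>departamentos</a>\n'
    '  <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>Deportes</a>\n'
    '  <a class="a-size-base a-link-child breadcrumb-inline-links" href=\'\'>Ciclismo</a>\n'
)

body = r'[^\n]+a-size-base a-link-child breadcrumb-inline-links[^\n]+'

# Three equivalent ways of enabling multiline anchors.
with_flag = re.findall('^' + body + '$', html, flags=re.MULTILINE)
with_short_flag = re.findall('^' + body + '$', html, re.M)
with_inline_flag = re.findall('(?m)^' + body + '$', html)  # inline flag goes first

print(len(with_flag))  # 2
```

Keeping the inline flag at the very start of the pattern matters: recent Python versions reject global inline flags placed anywhere else.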
