Borjinha10's Questions

Asked: 2020-01-24 07:17:24 +0800 CST

Expression works on Regex101.com but not in Python 2.7 code

Another regex question. I have written a regex to run against the text of a PDF extracted with the tika library, that is, the PDF text stored in a variable as a unicode string.

'^[A-Z]\S{2,} *(?:\n+ *\S+ *)*?\n*.*?\d+ +\d+(?:[.,]\d+)?%'

With it I want to get:

Analista programador-DyD 1 49,54%

Programador-DyD 1 50,46%

TOTAL 2 100%

The appearance of the text when doing print() is this:

If we display the content of the variable without doing print() we get this:

That is, wherever \n appears it is actually a line break, as the first image shows, where the content of the variable is displayed with the print() function.

When I paste this text into regex101.com, the regex captures what I want, but when I run the script it always returns an empty list (I use the findall method of the re module).

Both this link and the one above show how it matches. Note that on regex101.com I replaced the \n sequences shown in the raw variable (displayed without print(), without converting to str, just the plain unicode value) with real line breaks, so that regex101.com doesn't treat \n as literal text.

Now the question: why does it work on the website but not when I pass it the unicode text?
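
One difference worth checking, offered as an assumption rather than a confirmed diagnosis: regex101.com usually has the m (multiline) flag switched on by default, while Python's re functions only let ^ match at the very start of the string unless re.MULTILINE is passed. The sketch below uses a made-up sample text (the real PDF content is not shown) to illustrate the effect with the same pattern.

```python
import re

# Hypothetical sample text; the real PDF content is not available.
# The first line starts with a lowercase letter, so ^[A-Z] cannot match at position 0.
text = (u"resumen de puestos\n"
        u"Analista programador-DyD 1 49,54%\n"
        u"Programador-DyD 1 50,46%\n"
        u"TOTAL 2 100%\n")

pattern = r'^[A-Z]\S{2,} *(?:\n+ *\S+ *)*?\n*.*?\d+ +\d+(?:[.,]\d+)?%'

# Without flags, ^ is only tried at the start of the whole string.
without_flag = re.findall(pattern, text)

# With re.MULTILINE, ^ also matches right after every line break.
with_flag = re.findall(pattern, text, re.MULTILINE)
```

If the flag is the cause, adding re.MULTILINE (or prefixing the pattern with (?m)) should make the script behave like the website.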

Thank you very much for your time!!

Borjinha10

Asked: 2020-01-19 03:13:46 +0800 CST

Question about python and assigning variables to a dictionary in a loop

The question is:

Based on the attached image, I would like to know why the conditional I created inside the second loop assigns None to the dictionary variable. It looks like it should work correctly, but I can't find the cause.

I have searched the Python documentation but can't find anything. If someone can clarify this for me I would appreciate it!

Attached image:

Borjinha10

Asked: 2020-01-17 02:58:57 +0800 CST

How to use Regex in Python 2.7

Some context: I have written a couple of scripts that use regular expressions through Python's re module.

def get_document_emails(pdf_format_content):
    """ The function runs through the document and extracts all the email addresses it finds. 
        This method returns an ordered list of document emails without repeating. """
    new_list = []
    document_emails = re.findall(r'[w.\w]*@w*.[w.\w]*', pdf_format_content)
    for i in document_emails:
        if i not in new_list and not str(i).endswith('.'):
            new_list.append(i)
    return sorted(new_list)

def get_document_provider(pdf_format_content):
    """ Return the name of the provider """
    return re.match(r'(PROVEEDOR:)+(.*?\\n)', pdf_format_content)

The problem arises when executing the second function. I use the tika library to extract the text from the PDFs and then call these functions to extract the emails and the supplier data from the document.

I have tested both regular expressions on Regex101 and, on the website, they capture what I need. When I run my scripts in the console with ipython, the first one works fine but the second one doesn't. I've tried the findall(), match() and search() functions, and they all return None or [].
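
Two details of the second pattern can be checked in isolation. In a raw string, \\n means a literal backslash followed by the letter n, not a line break, and re.match() only tries the pattern at position 0 of the string, whereas re.search() scans the whole text. The sketch below uses a made-up sample (the real document is not shown), so it illustrates those two behaviours rather than diagnosing the actual PDF.

```python
import re

# Hypothetical sample; the real document text is not shown in the question.
text = u"FACTURA 001\nPROVEEDOR: ACME S.L.\nCIF: 000\n"

# r'\\n' looks for a backslash character followed by "n".
# The text contains real line breaks instead, so nothing is found.
literal_backslash_n = re.search(r'(PROVEEDOR:)+(.*?\\n)', text)

# re.match() only tries the start of the string, which is "FACTURA" here.
anchored = re.match(r'(PROVEEDOR:)+(.*?\n)', text)

# re.search() scans forward until the pattern matches somewhere.
found = re.search(r'(PROVEEDOR:)+(.*?\n)', text)
provider = found.group(2).strip() if found else None
```

The email function uses findall(), which scans the whole string, which may be why it is unaffected by where the match starts.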

The first function works on a unicode string and returns the list of email addresses from the document without problems, as unicode, but the second function does not. I have also tried encoding the text as utf-8, but that gives me an encoding error on some characters. I also tried converting it to str with the statement fpdf = fpdf.encode('utf-8').strip(), but the result is the same: an empty list or None.

I've read the documentation for the re module, searched multiple websites and tried many variations of the code, but I always get the same result.

What bothers me most is that I can't understand why it works for the emails but not for the other regex.

The image shows that the pattern works correctly.

In this other image I show the console output. I have removed the sensitive information, but I think it still shows what I want you to see. If anyone can help me fix this I would be very grateful.

Thank you all!!!

EDIT

Following the comments from other users, I corrected the regular expression to r'(PROVEEDOR:)+(.*?\n)' and tried again to see whether that solved it.

It did not. I still don't understand why the first expression works and finds results while the second one doesn't, on the same variable that contains the text as unicode.

I attach the complete code that I use:

# -*- coding: UTF-8 -*-
import re
from tika import parser


def get_pdf_content(path):
    """ Return the text content from the file given through path variable. """
    pdf = parser.from_file(path)
    return pdf['content']

def format_pdf_content(pdf_content):
    """ Format the content of the pdf: remove the carriage-return characters and return a unicode string. """
    variable = filter(lambda i: i != '\r', pdf_content)
    return "".join(variable)

def get_document_emails(pdf_format_content):
    """ The function runs through the document and extracts all the email addresses it finds. 
        This method returns an ordered list of document emails without repeating. """
    new_list = []
    document_emails = re.findall(r'[w.\w]*@w*.[w.\w]*', pdf_format_content)
    for i in document_emails:
        if i not in new_list and not str(i).endswith('.'):
            new_list.append(i)
    return sorted(new_list)

def get_document_provider(pdf_format_content):
    """ Return the name of the provider """
    return re.match(r'(PROVEEDOR:)+(.*?\r)', pdf_format_content)
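
One thing the complete code makes visible: format_pdf_content removes every carriage return, so a pattern that ends in \r has nothing left to match in the formatted text. The sketch below shows that with a made-up sample; strip_carriage_returns is a hypothetical stand-in for format_pdf_content, and the sample text is an assumption.

```python
import re


def strip_carriage_returns(text):
    """Hypothetical stand-in for format_pdf_content: drop every carriage return."""
    return text.replace(u'\r', u'')


# Hypothetical sample with Windows-style line endings.
raw = u"PROVEEDOR: ACME S.L.\r\nCIF: 000\r\n"
clean = strip_carriage_returns(raw)

# No carriage return is left in the cleaned text, so this cannot match.
with_cr = re.search(r'PROVEEDOR:(.*?)\r', clean)

# Matching up to the line feed works on the cleaned text.
with_lf = re.search(r'PROVEEDOR:\s*(.*?)\n', clean)
provider = with_lf.group(1) if with_lf else None
```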

Thank you very much again!!

Borjinha10

Asked: 2020-05-07 08:22:21 +0800 CST

Tomcat 9.0 server only returns the first service

Hello, good afternoon. I am asking this question because I have searched the internet extensively without finding a solution to this problem.

The issue is this: I start the server and everything goes smoothly. I have several services implemented (PUT and GET), and when I call them through Postman they all return a 200 response, as long as it is the first request after starting the server. That is, the second and subsequent requests hang indefinitely, even though the server accepts the connection.

Of course, I have put breakpoints in the services and started the server in debug mode to see if I could get some information, but to no avail.

I attach images of the server console, of the persistence.xml configuration (I use JPA with Hibernate as the persistence provider), and of the pom.xml. I believe the JPA annotations are fine, since the IDE reports no errors and in the first service call I can add, delete and query records in the database. I should add that I use Jersey as the serializer.

Thank you very much in advance, I hope you can help me. :)

EDIT:
I have managed to isolate the failures: they are in the GET services, while the PUT services work fine. There seems to be a problem with the database connection, because while the server is hung, if I try to query the database records from the mysql console, the query does not return until I forcibly stop the server (pressing the "terminate" button). I add the stack trace of the error below. Thank you very much again.

Information from the server console, persistence.xml and pom.xml:

INFO: HHH10001501: Connection obtained from JdbcConnectionAccess [org.hibernate.engine.jdbc.env.internal.JdbcEnvironmentInitiator$ConnectionProviderJdbcConnectionAccess@5eb3a986] for (non-JTA) DDL execution was not in auto-commit mode; the Connection 'local transaction' will be committed and the Connection will be set into auto-commit mode.
May 16, 2018 10:06:00 AM org.apache.catalina.core.StandardServer await INFO: A shutdown command was received through the shutdown port. Stopping the Server instance.
May 16, 2018 10:06:00 AM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["http-nio-8080"]
May 16, 2018 10:06:00 AM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["ajp-nio-8009"]
May 16, 2018 10:06:00 AM org.apache.catalina.1 instance(s) to retrieve reserved space
May 16, 2018 10:06:01 AM org.apache.catalina.core.StandardWrapper unload INFO: Waiting for 1 instance(s) to retrieve reserved space
May 16, 2018 10:06:02 AM org.apache.catalina.core.StandardWrapper unload INFO: Waiting for 1 instance(s) to retrieve reserved space
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesJdbc WARNING: Web application [ILoan] registered JDBC driver [com.mysql.jdbc.Driver] but it failed to unregister while the web app was stopped. To prevent a memory leak, the JDBC driver has been forcibly unregistered.
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads WARNING: The web application [ILoan] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:[ java.net.SocketInputStream.
The web application [ILoan] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
    java.lang.Object.wait(Native Method)
    java.lang.ref.ReferenceQueue.remove(Unknown Source)
    com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads WARNING: The web application [ILoan] appears to have started a thread named [pool-2-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads WARNING: The web application [ILoan] appears to have started a thread named [pool-3-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads WARNING: The web application [ILoan] appears to have started a thread named [pool-4-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: ThreadLocal@39c0db85]) and a value of type [org.glassfish.jersey.internal.Errors] (value [org.glassfish.jersey.internal.Errors@42630ea9]) but could not remove it when the web application stopped. The threads will be renewed over time to try to avoid a possible memory leak.
May 16, 2018 10:06:02 AM org.apache.catalina.loader.WebappClassLoaderBase checkThreadLocalMapForLeaks FATAL: Web application [ILoan] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@23fdc732]) and a value of type [org.glassfish.jersey.process.internal.RequestScope.Instance] (value [Instance{id=37621d2c-f027-4731-b0e6-c0ec9732fb4a, referenceCounter=2, store size=3}]) but could not remove it when the web application stopped. The threads will be renewed over time to try to avoid a possible memory leak.
May 16, 2018 10:06:02 AM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["http-nio-8080"]
May 16, 2018 10:06:07 AM org.apache.tomcat.util.net.AbstractEndpoint shutdownExecutor WARNING: The executor associated with thread pool [http-nio-8080] has not fully shutdown. Some application threads may still be running.
May 16, 2018 10:06:07 AM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["ajp-nio-8009"]
May 16, 2018 10:06:07 AM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["http-nio-8080"]
May 16, 2018 10:06:07 AM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["ajp-nio-8009"]
