What is a promise in Javascript?

Question

Asked: 2020-05-09 15:16:02 +0800 CST 2020-05-09 15:16:02 +0800 CST 2020-05-09 15:16:02 +0800 CST

How can I execute asynchronous tasks in Python?

772

When users use my web application, I would like to run tasks that take a long time in the background.

The threading library does not actually execute code in parallel because of the GIL . How can I execute tasks in another thread?

2 Answers

Voted

g3rv4 · Answer 1 · 2020-05-09T15:16:02+08:00

An efficient and scalable way to run tasks asynchronously is to use a queue library like celery . With this library, you define "workers" that are processes (not threads) that are the ones that execute the heavy tasks. An interesting aspect of this solution is that there can be many workers (even on different servers) executing the tasks.

The architecture of the solution is as follows:

Consumer: the application that users use, if it is web it can be a django or flask application
Producer: the "worker" who is the one who does the heavy lifting
Broker: is the mechanism used by the consumer to store the "pending jobs" and is what "wakes up" the producer when there is work to be done
Backend: it is the mechanism that the producer uses to store the results of the task (if the consumer does not need to know the result, it can not be configured)

The following is a simple application that you use redisas a Broker. consumidor.pysend messages toproductor.py

I assume that it redisis working and that it virtualenvis installed. With these commands you can install all dependencies:

virtualenv env
source env/bin/activate
pip install celery[redis]

Both the consumer and the producer need the settings, so I'm going to store them in (I'm using my local config.pydatabase ):1redis

config = {
    'broker': 'redis://localhost:6379/1'
}

This is the content of productor.py. It only exposes a task ( ejecutar_tarea) that takes 10 seconds before printing the result. This is to see how this delay affects the consumer:

from config import config
from celery import Celery
from time import sleep

app = Celery('tasks', broker=config['broker'])

# pasar la configuracion como json
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_TASK_SERIALIZER = 'json'


@app.task
def ejecutar_tarea(mensaje):
    # esto demoooora
    sleep(10)

    print 'Mensaje recibido: %s' % mensaje

And this is the code for consumidor.py. All it does is receive a message from the console and send it to the producer:

from productor import ejecutar_tarea

while True:
    mensaje = raw_input('Mensaje: ')
    ejecutar_tarea.delay(mensaje)

The best way to test this is in the console, open dos and activate the virtual environment.

The producer runs like this:

celery -A productor worker -l info

here I am starting it with log level: infoto have details of what is happening. The various options that can be used when starting a worker are detailed in the celery documentation .

The consumer is executed by running:

python consumidor.py

When writing a message to the consumer, you can see that the producer receives it and after 10 seconds prints it. The interesting thing is that the consumer does not need to wait those 10 seconds, but is instantly available to process another. If the producer receives many messages, then they are queued.

All in all, this strategy is easy to set up and works very well. When more force is needed, it is relatively easy to add new workers to help with the load.

ChemaCortes · Answer 2 · 2020-05-11T08:51:39+08:00

Despite the comment I made before, I'm going to try to give an example using Futures :

The "futures" are an abstraction for the concurrent execution of code that are equally useful for the execution of threads ( Threads ) and for the execution of processes ( Process ). To do this, it uses an execution manager that is in charge of execution and delivering the results (the "promises" ) to whoever requests them.

For example, we want to calculate the factorial of a sequence of numbers. With

with executor as e:
    futures = {e.submit(fact, i):i for i in range(0,4000)}

we would launch the parallel execution of the calculation of the factorial of 4000 numbers. He executoris the one who manages threads ( ThreadPoolExecutor) or processes ( ProcessPoolExecutor), we can even let him decide the number of "workers" that he should use depending on the number of cores of our CPU.

The full example:

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, wait

def fact(n):
    res = 1
    for i in range(2, n+1):
        res *= i
    return res

if __name__ == "__main__":

    t0 = time.time()

    with ThreadPoolExecutor(max_workers=50) as e:
        fs = {e.submit(fact, i):i for i in range(4000)}
        wait(fs)

    t1 = time.time()

    with ProcessPoolExecutor() as e:
        fs = {e.submit(fact, i):i for i in range(4000)}
        wait(fs)

    t2 = time.time()

    print(f"Ejecución con hilos: {t1-t0:.2f}s")
    print(f"Ejecución con procesos: {t2-t1:.2f}s")

In general, the times in windows with processes are quite bad, and can be worse than with threads in some cases.

One way to work "asynchronously" would be to add more tasks to the executor as you go, something like this:

import time
from collections import deque
from itertools import islice
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed

def fact(n):
    res = 1
    for i in range(2, n+1):
        res *= i
    return res

if __name__ == "__main__":

    with ProcessPoolExecutor() as e:
        # metemos 1000 procesos
        fs1 = { e.submit(fact, i):i for i in range(1000) }
        print("Lanzados 1000 futuros")

        # obtenemos los primeros 10 resultados
        res10 = { fs1[f]:f.result() for f in islice(as_completed(fs1), 10)}

        # metemos otros 100000 más
        fs2 = { e.submit(fact, i):i for i in range(1000, 100000) }
        print("Lanzados 100000 futuros más")

        # obtenemos 10 resultados de la segunda tanda
        res20 = { fs2[f]:f.result() for f in islice(as_completed(fs2), 1000, 1010)}

        # obtenemos los últimos 10 resultados de la tanda anterior
        res30 = { fs1[f]:f.result() for f in deque(as_completed(fs1), 10)}

        # Espera para comprobar cómo sube el consumo de CPU
        print("Espera de 3 segundos")
        time.sleep(3)

        # Cancelamos todos los procesos que no hayan acabado de la segunda tanda
        for f in fs2:
            f.cancel()
        print("Cierre del ejecutor de procesos")

    print()
    for (n,res) in res10.items():
        print(f"fact({n:4d}) = {str(res)[:80]}...")

    print()
    for (n,res) in res20.items():
        print(f"fact({n:4d}) = {str(res)[:80]}...")

    print()
    for (n,res) in res30.items():
        print(f"fact({n:4d}) = {str(res)[:80]}...")

Edit : I have simplified the code to make it somewhat better understood.

It should be noted that it res20stores results only from the second batch of futures futures2. At that moment we could have canceled the execution of the pending futures of the first round if they were not going to be needed for anything else.

PS: if it doesn't work for you, make sure you use python 3.6

How can I execute asynchronous tasks in Python?

HTML button that sends you to another page

Why do I get the error "Call to undefined function mysql_connect()"?

How to create an HTML button that works as a link?

How to separate a String in Java. How to use split()

Filter by dates in sql server

How to limit the number of decimal places in a double?

For each in JavaScript?

Position footer ALWAYS glued to the footer

Definitive Guide to Type Conversion in Java

How to properly compare Strings (and objects) in Java?