When users interact with my web application, I would like to run long-running tasks in the background. The threading library does not actually execute code in parallel because of the GIL. How can I execute tasks in another thread?
An efficient and scalable way to run tasks asynchronously is to use a task queue library like Celery. With this library you define "workers": processes (not threads) that are the ones that execute the heavy tasks. An interesting aspect of this solution is that there can be many workers (even on different servers) executing the tasks.
The architecture of the solution is as follows:
The following is a simple application that uses redis as a broker: consumidor.py sends messages to productor.py.

I assume that redis is running, that virtualenv is installed, and that the dependencies have been installed in the virtual environment. Both the consumer and the producer need the settings, so I'm going to store them in config.py (I'm using database 1 of my local redis):
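The original install commands and settings file were not preserved. The dependencies would be installed with something like `pip install celery redis` (assumed), and config.py could look like this minimal sketch, using database 1 of a local redis as described above:

```python
# config.py -- shared Celery settings (a sketch; adjust host/port to your setup)
# Broker and result backend: database 1 of a local redis
BROKER_URL = "redis://localhost:6379/1"
CELERY_RESULT_BACKEND = "redis://localhost:6379/1"
```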
This is the content of productor.py. It only exposes one task (ejecutar_tarea) that waits 10 seconds before printing the result; this is to see how that delay affects the consumer:
And this is the code for consumidor.py; all it does is read a message from the console and send it to the producer. The best way to test this is from the console: open two terminals and activate the virtual environment in each.
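The consumer's code was also lost; a sketch of consumidor.py as described, assuming the productor module above is importable:

```python
# consumidor.py -- reads messages from the console and queues them
from productor import ejecutar_tarea

if __name__ == "__main__":
    while True:
        mensaje = input("Mensaje: ")
        # .delay() queues the task in redis; a worker picks it up,
        # so this loop never waits the 10 seconds itself
        ejecutar_tarea.delay(mensaje)
```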
The producer is started from one of the terminals; here I am starting it with log level info to get details of what is happening. The various options that can be used when starting a worker are detailed in the Celery documentation. The consumer is executed from the other terminal by running:
When you type a message into the consumer, you can see that the producer receives it and prints it after 10 seconds. The interesting thing is that the consumer does not need to wait those 10 seconds; it is instantly available to process another message. If the producer receives many messages, they are queued.
All in all, this strategy is easy to set up and works very well. When more capacity is needed, it is relatively easy to add new workers to help with the load.
Despite the comment I made before, I'm going to try to give an example using futures:
"Futures" are an abstraction for the concurrent execution of code that is equally useful for running threads (Thread) and processes (Process). To do this, it uses an executor that is in charge of the execution and of delivering the results (the "promises") to whoever requests them.
For example, we want to calculate the factorial of a sequence of numbers, launching in parallel the calculation of the factorial of 4000 numbers. The executor is the one that manages the threads (ThreadPoolExecutor) or the processes (ProcessPoolExecutor); we can even let it decide the number of "workers" to use depending on the number of cores of our CPU.

The full example:
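The full example did not survive extraction; the following is a minimal sketch of the idea, using ThreadPoolExecutor and a much smaller range than the 4000 numbers mentioned, to keep it short:

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed

def factorial(n):
    return math.factorial(n)

# The executor manages the pool of workers; with no arguments it chooses
# the number of workers based on the CPU. ProcessPoolExecutor is a drop-in
# replacement if you want processes instead of threads.
with ThreadPoolExecutor() as executor:
    # Launch the parallel calculation of the factorials
    futures = {executor.submit(factorial, n): n for n in range(1, 21)}
    # as_completed yields each future as soon as its result is ready
    resultados = {futures[f]: f.result() for f in as_completed(futures)}

print(resultados[5])  # 120
```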
In general, the times on Windows with processes are quite bad, and can be worse than with threads in some cases.
One way to work "asynchronously" would be to add more tasks to the executor as you go, something like this:
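The simplified code itself was not preserved; here is a sketch of that idea of adding tasks as you go. The names futures2 and res20 come from the answer; everything else is an assumption:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def factorial(n):
    return math.factorial(n)

executor = ThreadPoolExecutor()

# First batch of futures: launched, but we do not wait for them
futures = [executor.submit(factorial, n) for n in range(1000)]

# While the first batch is still running we can keep adding tasks
futures2 = [executor.submit(factorial, n) for n in range(20)]

# res20 collects the results of the second batch only
res20 = [f.result() for f in futures2]

executor.shutdown()
print(len(res20))  # 20
```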
Edit : I have simplified the code to make it somewhat better understood.
It should be noted that res20 stores results only from the second batch of futures (futures2). At that moment we could have cancelled the execution of the pending futures of the first round, if they were not going to be needed for anything else.

PS: if it doesn't work for you, make sure you are using Python 3.6.