I have a question about the subprocess library:
Why, when capturing the output of the terminal, either with Popen() or with subprocess.run(), do we have to set the stdout argument to subprocess.PIPE to capture the output correctly?
Apparently subprocess.PIPE is a pipe, but what is a pipe according to the subprocess library or according to Python? And why exactly do you have to use pipes to capture the output of the terminal?
I haven't found much information on exactly what pipes are.
Thanks in advance for your answers!
Starting at the beginning, the subprocess.Popen constructor has three arguments closely related to your question: stdin, stdout, and stderr. As their names indicate, they let us specify what to do with the standard input, standard output, and standard error of the child process.
Valid values are:
None: there is no redirection. The streams are inherited directly from the parent process. This is the default value.
An existing file descriptor.
An existing file object: we can redirect directly to a file opened with open, for example.
subprocess.DEVNULL: the special system file os.devnull is used. The purpose is generally to hide and ignore the output.
subprocess.PIPE: indicates that a new pipe should be created for the child process.
Now comes the question: what is a pipe?
Pipes are not specific to Python. They are a very old invention whose conceptual father is Douglas McIlroy, and they were implemented in the early days of the UNIX system back in the 70s. He realized that, on the console, much of the time the output of one process was simply passed to the input of another in a chained fashion. To this day, pipes are a fundamental part of UNIX, and of Linux by extension, and they have been implemented in other operating systems.
A pipe is just a very simple way to redirect the standard output of one program to the standard input of another.
In the Linux console this is done using the | (pipe) symbol. For example, a line that pipes ps into sort gets the processes running on the system (the ps command) and redirects their output to the input of sort for ordering.
Imagine a pipe of water (the data) with several pumps interspersed along it (the processes). As in a physical pipe, the first byte that enters is the first that leaves at the other end, which is known as FIFO (First In, First Out). Also like a physical pipe, pipes are technically unidirectional: you can't pass water in both directions at once (some operating systems implemented bidirectional pipes, but that's another matter).
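The chaining described above can also be set up from Python by connecting the stdout of one Popen to the stdin of another. This is only a sketch: pipe_through, PRODUCER, and CONSUMER are names made up for illustration, and the two child commands are small Python one-liners standing in for ps and sort so that the example does not depend on those tools being installed.

```python
import subprocess
import sys


def pipe_through(producer_cmd, consumer_cmd):
    """Run the equivalent of `producer_cmd | consumer_cmd` and return the output."""
    # The producer writes its standard output into a new pipe.
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    # The consumer reads its standard input from the read end of that pipe.
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # The parent does not read from the first pipe itself, so it closes its copy.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Hypothetical stand-ins for `ps` and `sort`.
PRODUCER = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
CONSUMER = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
```

Calling pipe_through(["ps"], ["sort"]) on a system that has both commands would mirror the shell line more closely.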
There are two types of pipes:
Anonymous
They are created by a parent process, and both endpoints (file descriptors) are in memory. Typically, the parent creates the child process and passes the pipe(s) to it, connecting the two processes through them so that they can serve as a communication channel between processes.
After process creation, one of them will normally close the writable end of its copy of the pipe and the other process will close the complementary (readable) end. At that point, each process has only one end of the pipe, one being the writer and the other the reader. Obviously, we can create a pair of pipes to make the communication bidirectional and safe.
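As a rough illustration of the two endpoints, os.pipe returns a pair of file descriptors, one readable and one writable. The sketch below keeps both ends in a single process purely to stay short (in real use they would live in a parent and a child), and roundtrip is a name invented for this example. It assumes the data is small enough to fit in the pipe's buffer, since nothing reads until the writing is finished.

```python
import os


def roundtrip(chunks):
    """Write byte chunks into an anonymous pipe and read them back in order."""
    read_fd, write_fd = os.pipe()  # (readable end, writable end)
    try:
        for chunk in chunks:
            os.write(write_fd, chunk)
    finally:
        # Closing the writable end is what lets the reader see end-of-file.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()
```

The bytes come out in the same order they went in, which is the FIFO behaviour described earlier.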
There are various ways to implement pipes. On *nix, an anonymous pipe is nothing more than a block of memory, or buffer, in the kernel that processes read from and write to. Naturally, it has a limited size (64 KiB on Linux, if I remember correctly). If the output of one process is very large and the process at the other end does not consume it at an adequate rate, the buffer fills up. This can cause a deadlock, for example if Popen.wait is used and the process writing to the pipe generates a lot of data, because the writing process can sit forever waiting for the kernel to let it write to a full buffer.
Named
They are usually called FIFOs, although any pipe, as mentioned, is FIFO. A named pipe is opened like a file, in read or write mode, using the normal file system calls to read or write.
Unlike anonymous pipes, they have a name, which occupies an entry in the file system and is used to access them. In addition, they can interconnect two unrelated processes, that is, the two do not have to be a parent and a child process. Also unlike anonymous pipes, they exist until they are explicitly deleted, like any file, which is essentially what they are, with some nuances.
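A named pipe can be tried out from Python with os.mkfifo, which is only available on POSIX systems. In this sketch a thread stands in for the second, unrelated process to keep things self-contained; fifo_roundtrip and the file name demo_fifo are made up for the example.

```python
import os
import tempfile
import threading


def fifo_roundtrip(message):
    """Send bytes through a named pipe (FIFO) and return what the reader got."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_fifo")  # hypothetical name
        os.mkfifo(path)  # creates the entry in the file system

        def writer():
            # Opening for writing blocks until some reader opens the FIFO.
            with open(path, "wb") as fifo:
                fifo.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        # Opened with the ordinary file API, just like a regular file.
        with open(path, "rb") as fifo:
            data = fifo.read()
        thread.join()
        os.remove(path)  # a FIFO persists until it is explicitly deleted
        return data
```

Any other process that knows the path could open the same FIFO, which is what makes it usable between unrelated processes.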
In short, going back to Python and subprocess, what pipes allow is communication between the parent process and its child processes. In this way, you can get the standard output or standard error of the child process in your parent process and simultaneously send information to the child process via stdin. Obviously, there are more ways for processes to communicate, including queues (which are normally implemented on top of pipes, but allow several processes to consume and write, for example).
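To tie this back to the original question, here is a minimal sketch of capturing a child's output through pipes. The capture function and the CHILD command are invented for illustration; the child is a Python one-liner that deliberately writes more than 64 KiB to stdout. Popen.communicate is used rather than Popen.wait because it keeps reading from the pipes while the child runs, so the child is never left blocked on a full buffer.

```python
import subprocess
import sys

# Hypothetical child: writes 200,000 characters to stdout and a note to stderr.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('x' * 200_000); sys.stderr.write('done')",
]


def capture(cmd):
    """Run cmd and return (return code, captured stdout, captured stderr)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,  # new pipe for the child's standard output
        stderr=subprocess.PIPE,  # new pipe for the child's standard error
        text=True,
    )
    # communicate() drains both pipes until end-of-file, then waits for exit.
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For simple cases, subprocess.run(cmd, capture_output=True, text=True) sets up the same two pipes and reads them for you.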