I have a function that is called by several different functions. It is called either directly or, depending on the routine, through a `Process`. This function also uses a global variable defined at the beginning of the script. The problem is that when the function is called via a `Process`, the global variable is no longer defined. Here is a [very simplified] example:
    #!/usr/bin/python3.5
    from multiprocessing import Process


    def funcion1():
        print(mi_variable)


    def funcionProceso():
        try:
            evaluacion = Process(target=funcion1)
            evaluacion.start()
            evaluacion.join()
        except Exception as e:
            e = str(e)
            print(e)


    if __name__ == '__main__':
        mi_variable = 'Hola'
        ## Calling the function directly obviously prints mi_variable correctly
        funcion1()
        ## Calling the function through funcionProceso
        #  raises the error: NameError: name 'mi_variable' is not defined
        funcionProceso()
In `funcion1`, I could check whether the variable is defined and define it if it is not, but my question remains, since I have no idea why the behavior shown above happens:
    def funcion1():
        global mi_variable
        try:
            mi_variable
        except NameError:
            mi_variable = 'Hola'
            print(mi_variable)
        else:
            print(mi_variable)
Rounding out Trauma's answer a bit.
There are different ways to create a new process, and which one is used depends largely on the operating system. The two main and best-known ways (in very broad strokes) are:
- `fork`: a copy of the parent process is created. The child process starts out with exactly the same resources as the parent: every data structure, open file, connection, etc. that existed in the parent is still there and can be used in the child immediately. From that point the two processes diverge (hence the name "fork") and each continues its work on its own. Process creation is much faster and lighter on resources.
- `spawn`: a new process is created from scratch, with its own Python interpreter, and all the modules are loaded again.

The `fork` method is typical of POSIX systems and is therefore not available in some cases, such as Windows, which lacks the `fork()` call.

With `spawn`, a new interpreter is launched and it executes the module again (imports everything again, instantiates global objects from scratch, etc.). It is almost as if we ran the module in a new terminal; I say "almost" because, logically, there are some special cases. The child process does not execute the `if __name__ == '__main__'` section, which is where you define your variable, simply because the child's interpreter does not run the module as the main module. Since the conditional is not executed, the variable does not exist when the function is called.

In fact, the `if __name__ == '__main__'` conditional is used to protect the entry point, so that things that should not run in the child process are not executed there; most obviously, so that `Process` is not called again from the child process. Using this conditional is always a good idea, but it is especially relevant on Windows or when we force the use of `spawn`.

It follows that if you declare the variable in the global scope, but outside the conditional, you will not have the problem described.
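A minimal sketch of that last point, assuming the same functions as in the question (with the `try`/`except` dropped for brevity): the only change is that `mi_variable` is assigned at module level, so a spawned child that re-imports the module defines it again.

```python
#!/usr/bin/python3
from multiprocessing import Process

# Defined at module level, outside the __main__ guard, so it is
# re-created whenever a spawned child imports this module again.
mi_variable = 'Hola'


def funcion1():
    print(mi_variable)


def funcionProceso():
    evaluacion = Process(target=funcion1)
    evaluacion.start()
    evaluacion.join()


if __name__ == '__main__':
    funcion1()        # direct call
    funcionProceso()  # call through a child process
```

Note that the child still gets its own copy of the variable; it just happens to be initialized with the same value.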
On POSIX systems, process creation can use `fork`, as mentioned, and the entire global state of the parent is preserved from the start. In this case the variable does exist, because it was created by the parent process, which you ran as the main module, and was then "copied" into the child by the fork, so the error you describe does not appear.
The consequences and benefits of each approach to creating a new process go further than this, and that is a long discussion.
If someone wants to reproduce the error on POSIX, just change the process start method, which is `fork` by default on Linux with the Python 3.5 used in the question, to `spawn`.
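One way this could look, as a sketch that reuses the function names from the question: `multiprocessing.set_start_method('spawn')` is called once inside the guard, before any process is created.

```python
#!/usr/bin/python3
import multiprocessing


def funcion1():
    print(mi_variable)


def funcionProceso():
    evaluacion = multiprocessing.Process(target=funcion1)
    evaluacion.start()
    evaluacion.join()


if __name__ == '__main__':
    # Should be called at most once, before any process is started.
    multiprocessing.set_start_method('spawn')
    mi_variable = 'Hola'
    funcion1()        # works: the variable exists in the parent
    funcionProceso()  # the spawned child never ran the guard, so it
                      # raises NameError when it reaches print()
```

The traceback appears in the child's output; the parent itself keeps running, because the exception is raised in the other process.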
In any case, be careful with global variables, especially when you combine them with multiprocessing or multithreading. Keep in mind that the variable is "copied" in every case; it is never shared between processes. If the child process modifies it, the parent will not find out, and vice versa, regardless of whether `fork` or `spawn` is used. As Trauma comments in his answer, you can never assume that the variable has the same value in both processes, not even at the start. For example, if `spawn` is used and your variable is defined by `mi_variable = random.randint(1, 100)`, it will most likely have a different value in each process. It is generally more appropriate to pass variables as arguments, as long as the object is picklable, but they will still be different variables.

If you really need to share the variable, you must use adequate and safe mechanisms for that, such as queues, shared memory (`multiprocessing.Value`, `multiprocessing.Array`), a `Manager`, etc.

From the Python documentation:
Broadly speaking, the way Python implements process duplication varies from platform to platform; even on the same platform, there may be more than one way to duplicate the program.
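Some of these methods are more restrictive than others: specifically, the `spawn` and `forkserver` methods come with a warning about global variables, namely that the value a child process sees for a global variable (if any) may not be the same as the value in the parent process at the time `Process.start` was called.

The effect you're seeing is due to that: on your platform, the method used to clone your process causes that effect.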
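To see which method a given platform uses, a small check along these lines may help. It only relies on `multiprocessing.get_start_method()` and `multiprocessing.get_all_start_methods()`; the output depends on the operating system and Python version, so none is shown here.

```python
import multiprocessing

if __name__ == '__main__':
    # Default start method for this platform and Python version.
    print(multiprocessing.get_start_method())
    # Every start method that is available on this platform.
    print(multiprocessing.get_all_start_methods())
```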
From this we can also deduce one thing: the effect does not necessarily occur on other platforms. On yours it evidently does; on a different one it may not.
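As an illustration of the advice given earlier in this thread (pass values as arguments, and use a shared object only when the value really must be shared), here is a hedged sketch. The names `modificar`, `datos` and `contador` are hypothetical and are not part of the original script.

```python
from multiprocessing import Process, Value


def modificar(datos, contador):
    # 'datos' is the child's own copy of the list: appending to it
    # here does not change the list held by the parent process.
    datos.append(99)
    # 'contador' lives in shared memory, so this update is visible
    # to the parent once the child has finished.
    with contador.get_lock():
        contador.value += 1


if __name__ == '__main__':
    datos = [1, 2, 3]
    contador = Value('i', 10)  # shared C int, initial value 10

    p = Process(target=modificar, args=(datos, contador))
    p.start()
    p.join()

    print(datos)           # [1, 2, 3]: the parent's list is untouched
    print(contador.value)  # 11: the shared value was updated by the child
```

The same holds under both `fork` and `spawn`: the list is copied either way, and only the `Value` is actually shared.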