-The question is simple: how does Python store extremely large numbers in memory, in other words...
def factorial(n):
facto = 1
while n > 0:
facto = facto*n
n-=1
return facto
print(factorial(100000000000000000000000000000000000000000000))
-How does this work? It is assumed that a number of such magnitude would never fit into a single memory space...
Clarification: I propose the example in Python, since it is the only language that I know that is capable of storing numbers that are too large, I have never managed to do something like that with C for example, in fact in C, using the data type more large (long double
) it is not capable of storing a result of such magnitude. In the event that someone can make some code inC
which a number like this is stored in memory, it would be pertinent to establish it as an answer
In C, the integers have a size fixed by the compiler and chosen by the programmer, through the reserved words
short
(16 bits),long
(32 bits) orlong long
(64 bits).Obviously a fixed size imposes a limit, which is 2 raised to the number of bits (for integers
unsigned
) or raised to the number of bits minus 1 (for signed integers).Thus, for example, in a data type
unsigned long int
(32 bits), the largest data that can fit is 2 32 . If you add 1 to that data, you get zero as a result (and in the ALU the carry bit would be activated, but that fact would be ignored, and you would definitely have an erroneous result).In Python instead the implementation is more intelligent. By default, integers start with 32 bits, but when any operation with integers is going to be done, the interpreter determines in advance if the result will fit in 32 bits or not (for example, if two 32-bit integers are going to be multiplied, it is likely that the result does not fit, as it may require up to 64 bits).
When the interpreter detects that the result will not fit in 32 bits, it "increases" it. In reality, what happens is that the integer is implemented through an array of
unsigned long int
, and each of these elements stores a single "cipher" of the integer to be stored, but expressed in base 2 30 .For example, if the number is less than 2 30 , it would fit in a single "digit" (that is, a single element of the array), but if it is greater, that array is resized to add more "digits".
For example
Consider the result
factorial(20)
that is:This number is greater than 2 30 (which is
1073741824
), so you have to pass it to the base 1073741824 to find its digits. The conversion consists of dividing by 1073741824 to get the "tens", and the rest of that division will be the "ones". In this case it outputs:2432902008176640000 // 1073741824
= 22658165622432902008176640000 % 1073741824
= 118332914As you can see, the units already come out less than 2 30 (it cannot be otherwise because it is the rest of the division), but the "tens" still come out higher than 2 30 , so you have to divide again to get "hundreds". ":
2265816562 // 1073741824
= 22265816562 % 1073741824
= 118332914 (coincidentally in this case it is equal to the "units")So the number in question occupies three "digits" in base 2 30 , and can therefore be stored in an array of 3 positions (where each position is a
unsigned long int
C type).This is how Python manages to store numbers of any size. As the number grows, it dynamically allocates memory for more elements in that array, and thus for more "ciphers".
Naturally, operating with this type of number is not as immediate as operating with native C types. When in C you do
a + b
, beinga
andb
variables of typelong int
, basically the operation is reduced to transferring these variables to two CPU registers and using the ALU of the CPU to add those registers, which basically resolves itself in a single assembly operation. In Python, on the other hand, a whole logic is necessary that allows operating with the elements of the array explained above (the "numbers"), propagating the carry from one "number" to another, etc.However, if the data handled by the python program is less than 2 30 , since that fits into a single "figure", the operation would also be resolved in a few CPU cycles.
In Python 2.7 there was a difference between int and long int . For the case of int , 32 bits are used, while the case of long int is not restricted by the number of bits, but can be expanded throughout the memory (until it reaches its limit).
As of Python 3 , there is only one data type, which is int , and they include the two types explained.
Here you can find some more information: Doing Math in Python
You can also consult this article which explains in detail how such numbers can be stored in memory: How python implements super long integers?
I will try to answer your question technically (in Spanish) since the last point you mention refers to memory space , I assume you mean Hardware since you do not specify it:
- how Python does to store extremely large numbers:
A. Python 3 only has one data type which is int and they include normal and long types.
- How does this work? It is assumed that a number of such magnitude would never fit into a single memory space
A. Technically Python (Language) does not Manage Memory (Hardware) to be used to store data; this language gives the instruction to the PWM Virtual Machine or execution environment, which is written in C so that it communicates with the Hardware (CPU and Memory), so that in machine language ( Bytecode in this case) the type data and memory.
Also the Virtual Machine Manages its own Virtual RAM space so to speak and can be even larger than the Physical RAM.
Integers: in this case, if a memory space is not enough, which usually happens, the system will create a memory index (array) that will store the number divided by spaces, and this is dynamic. you can have an integer stored in 128b and another integer that occupies 256b in memory.
An example would be the following:
Before saving the data, the virtual machine in machine language must validate if there is enough space for the hardware or Virtual RAM to save the information; avoiding a memory overflow and a system crash. python doesn't request this; this is executed internally between the virtual machine, the machine language and the hardware, the virtual machine will cause an exception or type of error if it detects that it does not have the capacity (I don't know how rare it happens).
and it's called Bytecode
So as far as I understand CPython doesn't do JIT out of the box, but you can get a simple JIT for CPython with numba.
Also, there is at least one implementation of the Python language with JIT out of the box: Pypy or Shedskin or pyrex.
(A "JIT" converts byte code to machine code to improve performance)
I clarify: I am not an expert in Python, but it is important that we know the theory of how our programming languages work and part or all of their "background" and not just writing the code.
Related links:
https://leanpub.com/insidethepythonvirtualmachine/read
https://www.geeksforgeeks.org/internal-working-of-python/
https://stackoverflow.com/a/61892575/4717133
https://towardsdatascience.com/ understanding-python-bytecode-e7edaae8734d
https://caiocozza-art.medium.com/a-quick-overview-of-the-python-virtual-machine-pt-1-315e74c036f4
https://doc.pypy.org/ in/latest/interpreter.html