With this question I would like to understand the difference between mutable and immutable objects in Python, beyond the fact that some can be modified and others cannot. I ask for two reasons:
1-I am curious about the fact that some collections in Python only accept immutable objects as elements (dictionary keys, set members, etc.). In the case of dictionaries I can understand it: it prevents the programmer from accidentally changing a key and then not knowing why the value can no longer be found. But what about sets? Why don't they accept mutable objects?
2-One part of the documentation, right at the beginning of the explanation of dictionaries, says the following:
"The keys of a dictionary can be of almost any type. Values that are not hashable, such as values that contain lists, dictionaries, or other mutable types (that are compared by value, not by reference), cannot be used as keys."
This suggests that mutable objects are compared by value and not by reference, and that immutable objects do exactly the opposite. That seems to make sense when we look at how Python behaves when assigning values. When dealing with strings, Python checks whether the value is already in the internal string table, as the following example shows:
>>> a = 'foo'
>>> b = 'foo'
>>> a is b
True
However, if we try it with any other data type, mutable or immutable, it doesn't work. (Note that, in fact, when the strings are very long, the interpreter does not look for an existing string.)
>>> a = 1
>>> b = 1
>>> a is b
True (see modification)
>>> a = set([1])
>>> b = set([1])
>>> a is b
False
It seems that what happened with the strings is down to chance (or does Python only look for an existing string when the one being created is short?), since it only happens some of the time with strings and integers (see modification). The point is this: if immutable objects are compared by reference, why isn't there a more extensive system for sharing references? In other words, why isn't the existence of an equal object checked more often, precisely so that comparison by reference would be more accurate?
Besides, the idea that immutable objects are compared by reference and not by value is not entirely clear either:
>>> a = (1,2)
>>> b = (1,2)
>>> a == b
True
>>> a is b
False
>>>
>>> a = frozenset([1,2])
>>> b = frozenset([1,2])
>>> a == b
True
>>> a is b
False
So, finally, the question is: what is the difference in how Python handles mutable and immutable objects, apart from the fact that some can be modified and others cannot?
Modification: in the example with the integers, a is b is True because a small value such as 10 is "pre-created" (like all the numbers from -5 to 256). Therefore, when we write a = 10, a points to the 10 that already exists, and the same happens with b, which is why a is b holds (same reference).
I think the main issue lies in how the categories of objects are defined and how they differ. I am going to focus mainly on sequences, because they are the kind of object where the most differences can be found.
Although these definitions do not answer your questions directly, I believe they give a more complete picture, help to delimit each category, and show what advantages each one may have.
Sequences can be classified according to several criteria. One of them is the type of items they can hold, which gives two kinds of sequences: container sequences and flat sequences.
The only thing I want to highlight from this classification is that flat sequences can hold only one type of item (the str type, for example, holds only characters), while a container sequence can hold nested structures, strings, integers and so on in the same object. Conceptually, a container sequence achieves this with a structure similar to an array of pointers in C: a list is an array whose first slot points to the first object, and so on. This is the main source of the performance difference between the two kinds of sequences.
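As a rough sketch of the distinction just described (the names mixed, floats and try_append_text are made up for illustration), a list accepts items of any type because it only stores references, while array.array from the standard library stores the raw values and rejects anything of another type:

```python
from array import array

# A container sequence stores references, so its items may be of any type.
mixed = [1, "two", 3.0, [4, 5]]

# A flat sequence stores the values themselves, all of one primitive type.
floats = array("d", [1.0, 2.0, 3.0])  # "d" means C double


def try_append_text(flat_seq):
    """Return the error message raised when a str is appended, or None."""
    try:
        flat_seq.append("four")
    except TypeError as exc:
        return str(exc)
    return None


# The list keeps a reference to the very same inner list, not a copy of it.
inner = [4, 5]
holder = [inner]
same_object = holder[0] is inner

append_error = try_append_text(floats)  # a message, because the append fails
```

Here same_object is True, and append_error holds the TypeError message, since the array cannot store a string.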
Because of the way they work, flat sequences have two main advantages:
Apart from these differences, structures are also classified according to their mutability.
The main difference, besides whether they can be modified or not, lies in their internal workings. For example, list and tuple differ in performance, and this shows up in several ways:
One such case: for an immutable object such as a float f, float(f) returns a reference to the same object instead of a copy.
From the above, I want to emphasize that immutability is not limited to sequences; numeric types are immutable too. However, the comparisons and advantages involved are very specific to each case. It is really not true that one category is strictly compared by reference and the other by value.
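A small sketch of this behaviour follows. Note that returning the same object is an implementation detail of CPython for exact float and tuple instances, not a language guarantee; the variable names are illustrative only.

```python
f = 1.5
t = (1, 2, 3)
l = [1, 2, 3]

# Immutable objects: CPython can hand back the very same object,
# because nobody can modify it afterwards.
float_is_same = float(f) is f
tuple_is_same = tuple(t) is t

# Mutable object: list() always builds a new list, so that changes
# to the copy do not affect the original.
list_is_copy = list(l) is not l
```

On CPython all three names end up True, which shows one practical benefit of immutability: no copy is needed.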
With that in mind, I will now clarify some of the points you mentioned.
It is not entirely true that dictionaries and sets accept only immutable objects. In fact, any instance of a user-defined class can be used as a key, and it does not have to be immutable. The requirement for an object to be used as a key is that it be hashable, which means that it has:
1-A __hash__ method that returns the hash code representing the object; this value must not change during the object's lifetime. For an instance of a user-defined class, this method by default returns a value derived from the id function, that is, from the memory address of the object.
2-An __eq__ method that returns a boolean indicating whether the object is equal to another one. Any pair of objects for which __eq__ returns True must also return the same hash code from __hash__. For an instance of a user-defined class, the default __eq__ compares by identity, so an object is equal only to itself.
Both methods are needed mainly because of how dictionaries work and how collisions between hash codes are handled.
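To make the two requirements concrete, here is a minimal sketch. The classes Point and Plain and the helper is_hashable are hypothetical names invented for this example.

```python
class Point:
    """Hypothetical class: equal points must produce equal hashes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class Plain:
    """Defines neither method, so the identity-based defaults apply."""


def is_hashable(obj):
    """Return True if hash(obj) works, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


labels = {Point(1, 2): "a"}
found = labels[Point(1, 2)]  # a different but equal object finds the entry

p = Plain()
by_identity = {p: "plain"}   # works: instances are hashable by default
list_ok = is_hashable([1, 2])        # lists define no usable __hash__
nested_ok = is_hashable((1, [2]))    # a tuple containing a list fails too
```

Point instances are technically mutable (their attributes can be reassigned), which is why changing x or y while the object is a key would make the entry unreachable. That risk is the reason the built-in mutable types simply refuse to be hashed.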
Last but not least...
The answer is no. Regardless of whether an object is mutable or immutable, there are two ways to compare it:
1-__eq__, called by the == operator, which generally checks whether two objects are equal according to the values they contain.
2-is, which compares two objects by reference (memory address).
In short, a variable is not a value or an object; it is a label or link to an object. Several labels can point to the same object, and when no labels point to an object, or the only remaining labels belong to objects that merely point to each other, the interpreter removes those objects through the garbage collector.
So both kinds of objects can be compared either by value or by reference (memory address). The reason a is b returned True for the integers in your example is an optimization made by Python. Such optimizations depend on the implementation and version. In CPython, for example, the integers from -5 to 256 are cached in memory, so any variable (label) that refers to one of them points to the same object, which is why the comparison with is is true. A similar optimization occurs with strings, but in both cases you should not assume that these optimizations are always performed, since they may vary between versions.
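The following sketch contrasts the two comparisons and shows the integer cache. It assumes CPython; the integers are built at run time with int() so that the compiler cannot merge equal literals, and all variable names are illustrative.

```python
a = [1, 2]
b = [1, 2]
alias = a

equal_values = a == b    # same contents
same_object = a is b     # two separate list objects
alias_same = alias is a  # two labels, one object

# Integers created at run time.
small_1, small_2 = int("10"), int("10")
big_1, big_2 = int("10" * 20), int("10" * 20)

small_shared = small_1 is small_2  # True on CPython thanks to the cache
big_equal = big_1 == big_2         # True on every implementation
big_shared = big_1 is big_2        # implementation detail: do not rely on it
```

The practical rule is to use == whenever the question is "do these have the same value?" and to keep is for identity checks such as x is None.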
Let's go through this quickly. First, you need to know that is checks whether two variables point to the same object, not whether they contain the same value.
Every time you assign a value to a variable, Python creates the object (mutable or immutable), or reuses an existing one, and makes the variable point to it.
Immutable objects, as the name says, cannot change their value. If we have a variable b = 10 and then add 1 to it, Python does not turn the object 10 into 11. Instead it evaluates 10 + 1, obtains another object with the value 11, and makes b point to it. This does not happen with mutable objects, which are modified in place. A very common example is trying to replace one position of a string: because strings are immutable, this raises an error.
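A minimal sketch of both situations, with illustrative variable names: rebinding an integer, and the error raised when assigning to a position of a string.

```python
b = 10
old = b          # keep a second label on the object 10
b += 1           # b is rebound to the object 11; the object 10 is untouched
rebound = b is not old

s = "foo"
try:
    s[0] = "b"   # strings do not support item assignment
    error = None
except TypeError as exc:
    error = type(exc).__name__

replaced = "b" + s[1:]   # build a new string instead; s itself is unchanged
```

After running this, old is still 10, b is 11, error is "TypeError", and replaced is "boo" while s remains "foo".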
With mutable data, such as lists, you can do that.
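For comparison, a short sketch of the same operation on a list (the names letters and alias are illustrative). The object is changed in place, so every label that points to it sees the change:

```python
letters = ["f", "o", "o"]
alias = letters           # a second label for the same list
id_before = id(letters)

letters[0] = "b"          # item assignment works because lists are mutable
letters.append("!")

same_object = id(letters) == id_before  # no new list was created
```

Both letters and alias now show ["b", "o", "o", "!"], and same_object is True.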
Now let's see what happens with the immutable data in your example, where a and b are created separately with the same value.
This is entirely correct, and I illustrate it with the following diagram.
Two objects are created, each with its own value, and the two variables do not point to the same object. So is returns False, while the comparison operator == returns True. We can make both variables point to the same object with the corresponding assignment (for example, b = a).
With strings something different happens, because here the internal string table comes into play. It works like a list or container that stores the strings already created: if you create another variable whose value is already in it, you get back a reference to the existing string. Let's look at the diagram.
If we create the variables a, b and c, where a = "foo", b = "bar" and c = "foo", all these strings are stored in the internal string table. If a string with the assigned value already exists there, the new variable is simply made to point to it. In the end the variables a and c point to the same object, and for that reason a is c returns True.
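Whether Python interns a string automatically is an implementation detail, but the table can be used explicitly through sys.intern, which makes the behaviour reproducible. In this sketch the strings are built at run time so that no automatic sharing of literals takes place; the variable names are illustrative.

```python
import sys

# Build two equal strings at run time.
a = "".join(["f", "oo"])
c = "".join(["fo", "o"])

equal_values = a == c         # same characters

a_interned = sys.intern(a)    # put "foo" in the intern table, or find it there
c_interned = sys.intern(c)    # finds the entry that already exists

shared = a_interned is c_interned  # both names label one object
```

Here equal_values and shared are both True. For short literals that look like identifiers, CPython usually performs this step on its own, which is what the "foo" example in the question shows.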