As the question says, How does the HashSet work internally?
Seeing this code:
HashSet<String> a = new HashSet<>();
a.add("hola");
a.add("hola");
a.add("holaa");
a.add("hola");
Another question, how does the collection check that the element added to the collection is not duplicated?
As you ask only about
HashSet<T>
, well it's simple: this class delegates almost all of its functionality to aHashMap<T, Object>
, where the object you insert into itHashSet
will be the key of the internal map, and will be registered with the value of a default object inside the map. All the logic of verifying that the element is not repeated is handled by the internal map used in the set instance.Since a
Set
single serves as a wrapper for aMap
, since Java SE 6 you can create aSet
from any instance ofMap
viaCollections#newSetFromMap
.For a more background answer: how do you handle element equalities when performing
HashMap#put(llave, valor)
? You must first understand the structure that handlesHashMap
. It first defines a nodeNode<K,V>
to store the key/value pairs andHashMap
maintains an array of these nodes. These nodes implement a simple linked list . Since Java 8, the implementation has been changed so that as the map grows over time, as it contains many more values, then the nodes change their structure from a linked list to a red-black tree , which provides better performance.Here is an explanation of the internal algorithm used within
HashMap#put
:Object#hashCode
.HashMap
will not use the value of this hash directly, on this value it will apply another formula to reduce the value as much as possible.arreglo[hashInterno] == null
).If the box is not empty due to a hash collision: two or more objects produced the same hash value (does not mean that the two objects have the same result of
Object#hashCode
), the current node of the box is returned. We will call the nodeN
:4.1. If the key in
N
is equal to the new key viallaveActual.equals(nuevaLlave)
, the current value stored in is replacedN
with the new value.4.2. If the key in
N
is not equal to the new key to insert andN
it has the tree structure, then the tree nodes will be navigated to add the new key/value pair. The same premise is followed: if it finds a nodeNx
in the tree whose key is equal to the new via keyllaveActual.equals(nuevaLlave)
, the value in is replacedNx
. If at the end of the search a node with the same key is not found, a new node will be added to the tree.4.3. If the key in
N
is not equal to the new key and has the linked list structure, it will navigate through all the nodes in the linked list looking to see if any keys in the nodes are equal to the new key. If no such node is found, a new node will be added to the end of the list with the new key/value pair. After the insert, if the linked list has a value greater than an internal threshold (in HotSpot,TREEIFY_THRESHOLD
that is equal to 8) then the linked list will become a red-black tree.Checks if the node array should increase in size using an internally configured threshold. If so, it increases it.
As per the previous explanation,
HashMap
it will navigate through the necessary nodes (does not mean that it will navigate through all the elements) to check if a node with key equal to the new via key existsObject#equals
. If it does not exist and all searches have been exhausted, it will add a new node with the key/value information.It has certain characteristics...
Objects are not ordered
This type does not support duplicate values
Instead of put, use add
The hashset is used as a set, it is not of type (key, value)
It is often used when you want a unique list of objects.
I hope this was helpful!