One of MongoDB's properties is that it is "schema-less", which means that I can have a document like
{nombre : 'diego'}
in the same collection where there is another document like
{nombre : 'diego2', edad : '35'}
I understand that indexes in MongoDB are implemented using a B-tree, just like in other databases, where the nodes of the tree point to documents in the collection. Thanks to indexes, a query does not need to scan the whole collection when it performs a find or sort operation that can use a given index.
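To make the "no full scan" idea concrete, here is a toy model in plain JavaScript (runnable with Node.js, no MongoDB needed). It assumes a sorted array of [key, docId] pairs as a stand-in for the B-tree; the function names and sample documents are hypothetical, and this is not how MongoDB actually stores its indexes.

```javascript
// Toy model of a single-field ascending index. A sorted array of
// [key, docId] pairs stands in for the B-tree (an illustrative assumption,
// not MongoDB's real storage format).
function buildIndex(docs, field) {
  return docs
    .map((doc, id) => [doc[field], id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Equality lookup: binary-search for the first entry whose key is >= value,
// then walk forward while keys match. Only O(log n + matches) entries are
// touched instead of every document in the collection.
function findEqual(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

// Hypothetical sample collection in which every document has `edad`.
const docs = [
  { nombre: "diego", edad: 35 },
  { nombre: "ana", edad: 28 },
  { nombre: "luis", edad: 35 },
];
const idx = buildIndex(docs, "edad");
console.log(findEqual(idx, 35)); // ids 0 and 2
```

Note that this toy only works because every sample document has `edad`; what happens when a document lacks the field is exactly the question below.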
My question is: what happens to documents in the collection that do not have a field used in the index? For example, what happens if I try to index by age (edad) in a collection that contains the two documents above?
Is the document that lacks the field "lost" to the index tree? Or does running
db.coleccion.createIndex({'edad' : 1})
tell me that I can't create the index?
Well, it depends on how you create the index. If you create it the normal way, as in your question:
db.coleccion.createIndex({'edad' : 1})
then documents that do not contain the field are also indexed, which may have an impact on performance.
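A minimal sketch of what "also indexed" means, again as a Node.js toy model rather than MongoDB's real code: a regular (non-sparse) index gets one entry per document, and a document without the field gets the key null. The helper names are hypothetical, and the comparator is simplified (null first, all other keys assumed to be of one type).

```javascript
// Hypothetical helper: the key a regular (non-sparse) index stores for a
// document. A missing field is treated as null, as MongoDB does.
function indexKey(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
}

// Simplified comparison: null sorts before every other key. Non-null keys
// are assumed to share one comparable type (an assumption of this sketch).
function compareKeys(a, b) {
  if (a === null || b === null) return a === b ? 0 : a === null ? -1 : 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Toy non-sparse index: exactly one [key, docId] entry per document.
function buildIndex(docs, field) {
  return docs
    .map((doc, id) => [indexKey(doc, field), id])
    .sort((x, y) => compareKeys(x[0], y[0]));
}

// The two documents from the question.
const coleccion = [{ nombre: "diego" }, { nombre: "diego2", edad: "35" }];
const entries = buildIndex(coleccion, "edad");
console.log(entries); // two entries: [null, 0] and ["35", 1]
```

So the document without edad is not lost, and the index creation does not fail: the document simply sits in the tree under null.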
If you want to index a field that will not be present in every document of the collection, it is better to use a sparse index. But be aware of the following behavior: if using a sparse index would give an incomplete result for a query or sort operation, MongoDB will not use that index unless you explicitly ask for it with
cursor.hint()
For example:
db.coleccion.find().sort({'edad' : 1}).hint({'edad' : 1})
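Why a sparse index can give an incomplete result can be seen in a small Node.js toy model (hypothetical names, not MongoDB's implementation): if a sort on edad were answered by walking a sparse index, the document without edad would never be returned. That is the case in which MongoDB refuses to use the sparse index unless you hint it.

```javascript
// Toy sparse index: only documents that actually contain the field get an
// entry (a simplified model of {sparse: true}, not MongoDB's real code).
function buildSparseIndex(docs, field) {
  return docs
    .map((doc, id) => [doc, id])
    .filter(([doc]) => Object.prototype.hasOwnProperty.call(doc, field))
    .map(([doc, id]) => [doc[field], id])
    .sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));
}

// Answering find().sort({field: 1}) by walking the sparse index returns only
// the indexed documents, so the result can be smaller than the collection.
function sortViaIndex(docs, index) {
  return index.map(([, id]) => docs[id]);
}

// The two documents from the question.
const coleccion = [{ nombre: "diego" }, { nombre: "diego2", edad: "35" }];
const viaSparse = sortViaIndex(coleccion, buildSparseIndex(coleccion, "edad"));
console.log(viaSparse.length, "of", coleccion.length); // 1 of 2
```

With the hint, you are accepting exactly this trade-off: only the documents that have edad come back.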
By using
{sparse: true}
when creating the index, you make sure that only the documents that contain the indexed field are indexed.
In MongoDB, fields that do not exist in a document evaluate to null, and in a regular (non-sparse) index there is also an entry for them, under the value null, in the B-tree that implements the index. For example, if we insert into a collection some documents that have the field x and some that do not, and we do
db.coleccion.createIndex({ x : 1})
then by doing a
db.coleccion.find().sort({x : 1})
we get all the documents back in order (I omit the _ids for clarity and convenience), including the ones that do not have x.
In other words, even if a document does not have a field, a value for that field (null) is evaluated when an index is created on it.
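Where those documents end up in the ordering follows from MongoDB's BSON comparison order, which places null (and therefore a missing field) before numbers and strings. A small Node.js sketch of that ascending sort, with hypothetical documents and a comparator simplified to "null first, then numbers":

```javascript
// Hypothetical helper: the value a sort on `field` sees for a document;
// a missing field is compared as null.
function sortKey(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : null;
}

// Simplified ascending comparison: null before everything else, then numeric
// order (assumes all non-null keys are numbers).
function compareKeys(a, b) {
  if (a === null || b === null) return a === b ? 0 : a === null ? -1 : 1;
  return a - b;
}

// Returns a sorted copy, leaving the input array untouched.
function sortByField(docs, field) {
  return [...docs].sort((d1, d2) => compareKeys(sortKey(d1, field), sortKey(d2, field)));
}

// Hypothetical documents; _ids omitted as in the answer.
const docs = [{ x: 2 }, { y: 1 }, { x: 1 }];
console.log(sortByField(docs, "x")); // { y: 1 } first, then { x: 1 }, then { x: 2 }
```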
Taking things a bit further: if we create an index with the {unique : true} option, we won't be able to have more than one document in the collection that lacks the indexed field. The index does not allow duplicate values, and two documents without the field (each evaluated to null) would repeat the value null. The exception is if we also use the sparse option: then only the documents that have the field are indexed, and we could have more than one document without the field despite having a unique index.
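The unique-index case can be sketched the same way. This Node.js toy (hypothetical names; the third document is made up for the example) rejects a duplicate key, and shows that two missing fields collide on null unless the index is sparse:

```javascript
// Toy unique index (not MongoDB's implementation): rejects a second document
// whose key is already present. Missing fields count as null unless the
// index is sparse, in which case such documents get no entry at all.
function buildUniqueIndex(docs, field, { sparse = false } = {}) {
  const seen = new Map(); // key -> id of the document that owns it
  for (const [id, doc] of docs.entries()) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip documents without the field
    const key = hasField ? doc[field] : null;
    if (seen.has(key)) {
      throw new Error(
        `duplicate key ${JSON.stringify(key)} (documents ${seen.get(key)} and ${id})`
      );
    }
    seen.set(key, id);
  }
  return seen;
}

// The question's two documents plus a hypothetical third one without edad.
const coleccion = [
  { nombre: "diego" },
  { nombre: "diego2", edad: "35" },
  { nombre: "diego3" },
];

try {
  buildUniqueIndex(coleccion, "edad");
} catch (err) {
  console.log(err.message); // duplicate key null (documents 0 and 2)
}
console.log(buildUniqueIndex(coleccion, "edad", { sparse: true }).size); // 1
```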