In general, we are used to collection-type objects (lists, arrays, matrices, recordsets, or whatever a given language calls them) being "indexed" starting at position 0. In R, however, by a design decision, every object (in fact, R has no scalar objects) is "indexed" starting at 1.
> vector <- c(1,2,3)
> vector[1]
[1] 1
> vector[0]
numeric(0)
Historically speaking, what motivated this decision? Does it have any particular advantage over "indexing" starting at 0?
This is a topic where the difference between "machine-oriented languages" and "human-oriented languages" becomes very clear. The distinction is somewhat arbitrary, of course, because every language exists to satisfy the human need to communicate with machines; still, the decision to start indexes at 0 or at 1 clearly has to do with it. Let's take it step by step:
Why start from 0?
Undoubtedly, this is directly related to how contiguous areas of a machine's physical memory are accessed. Given an object of n elements, each of the same size, every element is reached by taking the initial memory address and adding an offset: the element's position multiplied by the length of each element in memory. To reach the first element, the offset has no choice but to be 0. Suppose we have an array of fixed-size 8-byte elements that we know starts at absolute position 689000. Accessing each element with a 0-based index is highly transparent: the element at index i lives at 689000 + i × 8, so index 0 is at 689000, index 1 at 689008, index 2 at 689016, and so on.
This is how most current languages work. Without a doubt the one that popularized it was C, but there were certainly earlier languages that used this approach (BCPL, for example), and those that came after reused it; we can see a more complete list here.
And so...? Why start from 1?
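Before getting to the reasons, it may help to put both conventions side by side. What follows is only a sketch in R that reuses the figures from the memory example (a start address of 689000 and 8-byte elements); the names `base_address`, `element_size`, `address_zero_based` and `address_one_based` are made up for illustration and do not correspond to anything R does internally.

```r
base_address <- 689000  # absolute start of the array, taken from the example
element_size <- 8       # bytes per element, taken from the example

# 0-based: the index is directly the number of elements to skip.
address_zero_based <- function(i) base_address + i * element_size

# 1-based: the index has to be shifted by one before it becomes an offset.
address_one_based <- function(i) base_address + (i - 1) * element_size

zero_based <- address_zero_based(0:2)  # first three elements, 0-based
one_based  <- address_one_based(1:3)   # the same three elements, 1-based

print(zero_based)
print(one_based)
```

Both calls yield the same three addresses (689000, 689008, 689016); the only difference is the extra subtraction that a 1-based scheme has to perform somewhere on each access.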
Certainly, in many branches of mathematics, to speak of the first element of a matrix or vector is to speak of element 1; in fact, in real life the first element of anything is number 1. Hence this abstraction, although it adds the computational cost of having to compute index - 1, is much more natural, particularly for humans who are not necessarily programmers.
Returning to the case of R: the language descends from an older one called S which, in turn, was strongly influenced by the "father" of all languages, FORTRAN. FORTRAN, a math-oriented language just as R is statistics-oriented, uses 1 by default as the base of every data collection (although arbitrary indices can be used as well). The advantage is that arrays, vectors, matrices, and so on are much more transparent to understand and handle for people in general, and particularly for those who come from mathematical disciplines: no translation is needed when transferring formulas into the language itself.
Curiosities of R
Surprisingly, it is possible to switch element access from 1-based to 0-based globally: we can override the select-by-index function.
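A minimal sketch of such an override, not necessarily the exact code the answer had in mind: it masks the built-in `[` in the global environment with a wrapper that adds 1 to the index. It assumes that only plain positive numeric indices on vectors are used afterwards; negative, logical, character and multi-dimensional indexing would all misbehave or fail under this wrapper, and code inside packages keeps using the original `[`, so "globally" really means "for code typed at the top level".

```r
v <- c(10, 20, 30)

# Mask the built-in `[` in the current (global) environment.
# Assumption: from here on, only single-bracket numeric indexing is used.
`[` <- function(x, i) .Primitive("[")(x, i + 1)

first  <- v[0]  # index 0 now selects the first element
second <- v[1]  # index 1 now selects the second element

# Remove the mask so that the usual 1-based behaviour is restored.
rm("[")

restored <- v[1]  # back to normal: the first element again

print(c(first, second, restored))
```

This is a curiosity rather than something to rely on: anything else in the session that expects standard indexing will break for as long as the mask is in place.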