Why Arrays Start at Zero

Have you ever wondered why arrays in programming almost always seem to start at zero? It’s more than just a quirk; it’s a convention deeply rooted in computer history, architecture, and memory management. Let’s peel back the layers of the onion and look at the reasoning behind how this convention began.
What is an Array?
An array, as defined by Wikipedia, is:
In computer science, an array is a data structure consisting of a collection of elements (values or variables), of same memory size, each identified by at least one array index or key.
For those familiar with programming, this definition likely matches your understanding. However, I’d like to highlight a crucial point to revisit later: the importance of each array member having the same memory size.
Manual Memory Management
A deep understanding of manual memory management is not required here, but I want to take a quick 10,000-foot view because it’s relevant to an implicit aspect of this definition. In low-level languages, every scalar data type, such as an integer, requires a fixed amount of memory to store. For example, an integer typically uses 4 bytes of memory while a single character only uses 1. The exact sizes can vary between systems and architectures.
So if each data type has its own memory size, then requiring every member of an array to have the same memory size effectively adds another constraint: every member must be the same data type. This is important to remember for later.
Arrays in High-Level Languages
I’m using PHP as an example here, but much of this applies to other high-level languages such as JavaScript and Python.
$arr = ["Jody", "Johnny", "Jenny", "Jessie"];
This checks out. It’s a list of values of the same data type, so they can all be stored in the same size in memory. But wait a second, this also works.
$arr = [1, "Jody", false];
Now I’m afraid we have some peas in our porridge. And PHP also supports an “associative array” that looks like this.
$arr = [
    "name" => "Jody",
    "age" => 32,
];
Now we are not only mixing data types, but this looks a lot like an object or a map, doesn’t it? This is because what PHP calls an array is actually an ordered map under the hood.
Arrays in Low-Level Languages
To contrast, let’s look at the idea of an array in a lower-level language like C.
int ages[] = {32, 36, 41, 42, 45};
The syntax is almost identical, but the behavior is quite different.
In C, we explicitly declare that our array will be of type int, and we are also implicitly defining that it has a size of five by giving it that many items rather than an explicit size. This is the most important detail.
The C compiler must know the size of an array when it is declared so that it can allocate the correct amount of memory. Each array item is stored one after the other in a contiguous block of memory. In most expressions, ages in the above code is actually treated as a pointer to the memory address of the first item.
Pointers and Pointer Arithmetic
For the sake of this example, picture a single row in an Excel spreadsheet where each cell has a row and column label like A1. A pointer works in a very similar manner: the pointer is not your actual data, but an identifier that “points” you to the memory address where your data lives.
This is the core reason why arrays are indexed from zero. We often think of arrays in terms of the Nth item and have internalized subtracting one to get its index (N-1), but in reality the index is the offset from the address that the pointer references.
To revisit Excel, imagine that each item in your array requires four cells to store. Because the compiler knows this, it can tell you that your data starts at A1. Move four cells to the right and you land on E1, which is where the second item in our array begins.
Conclusion
So when you use arrayName[0], rather than the 0th or 1st item in the array, this is actually the value directly at the memory address, with no offset. An index, or offset, of one would be the next item in the block.
Most high-level languages abstract this away from you and give you lots of perks, like mixed data types or dynamic resizing, behind the scenes. While you may not be working with a true array directly, the convention is still followed in most languages.