Arrays

Up to this point, we have used Python lists as our main data structure. Lists may seem like a simple collection of data, but there's actually a simpler sequential data structure available for us to study: the array.

An array is a fixed-length data sequence that contains elements of a single type. The elements of an array are stored contiguously in memory (i.e., in adjacent memory locations). A sample array of four-byte integers (and some sample memory addresses) is shown below:

A visualization of an array as as sequence of ten boxes with integers inside, representing ten locations in memory. Each memory location has a memory address label.

Aside. Although Python doesn't have a built-in array type like some other languages (e.g., Java and C++), there is an array library:
import array
arr = array.array('i', [1, 2, 3]) # creates an array of three integers
print(arr)
However, we will not be using the array package in this class.

Accessing items in an array

A useful property of arrays is that they allow efficient random access of elements. In other words, we can access any element i of an array in O(1) time. Typically, programming languages use notation like arr[i] to get element i of an array named arr.

Note the requirement that the array only contains elements of a single type. This is actually what makes it possible for accessing arr[i] to be an O(1) operation. If we know the memory address of the start of the array and also know the size of each element, then we can use arithmetic to quickly compute the location of element i in memory:

memory_address(arr[i]) = memory_address(arr) + (i * array_item_size)

For example, each element in the above array is a four-byte integer, so we can quickly compute the memory address of the element at index 6:

memory_address(arr[6]) = 1040 + (6 * 4)
memory_address(arr[6]) = 1064

If the elements were of different types, they might be different sizes (e.g., eight-byte floating point numbers, 1 byte ASCII characters, etc.), and therefore we wouldn't be able to quickly compute the memory address of arr[i].

Adding items to an array

The main disadvantage of arrays is that they are fixed-length, so they cannot be easily resized. If you need to store more elements than the capacity of an array would allow, you would need to allocate a new, larger array and copy over all of the elements from the old array.

Python lists vs. arrays

You might at this point be thinking that arrays sound very similar to Python lists. They are indeed similar, but not exactly the same. Arrays are a more general concept, and are used across different programming languages. Python lists are -- of course -- specific to Python, and are actually a data structure that is implemented using arrays!

There are a couple of differences between arrays and Python lists:

Python lists can contain different data types
Python lists are extendable -- with a caveat!

Lists can contain different data types

In Python, you can create lists that have different types of data:

lst = ['hello', 1, False, 'world', 5.0]

But if lists use arrays under the hood, and arrays can only hold a single type of data, how is this possible? The trick is that in Python, all data types are objects. Therefore, Python lists are really arrays of references to objects:

An array of references to Python objects of various types, including a string, an integer, a boolean, and a float.

Each reference to an object is the same size, so we can still use simple arithmetic to compute the memory address of location i in a Python list.

Lists are "extendable"

Python lists can seemingly grow indefinitely. If we start with a collection of data in a list, we can later add elements to the list using append():

lst = [1, 2, 3]
lst.append(4)
lst.append(5)
...

lst.append(99)

But again -- how can this be true? If Python lists are implemented using arrays, and arrays have a fixed length, then how can we grow lists indefinitely?

The answer is that occasionally, append() needs to resize the underlying array in order to add another element. For example, if we create a Python list with three elements:

lst = [1, 2, 3]

Under the hood, Python might actually create an array that has enough capacity for six elements:

lst = [1, 2, 3]    # stored in an array [1, 2, 3, , , ]

The program would then be able to make three calls to append() before the underlying array is full. But on the fourth call to append(), the underlying array would need to be resized (e.g., by doubling its length) in order to accomodate the new element:

lst.append(4)     # array now [1, 2, 3, 4, , ]
lst.append(5)     # array now [1, 2, 3, 4, 5, ]
lst.append(6)     # array now [1, 2, 3, 4, 5, 6]
lst.append(7)     # array now [1, 2, 3, 4, 5, 6, 7, , , , , ]