Open Addressing II

The final probing technique for resolving collisions that we'll consider is double hashing, which actually uses two different hash functions.

Double hashing

In double hashing, we use two hash functions instead of one. When there is a collision when using our first hash function, h1(x), we'll add in increasingly larger factors of our second function, h2(x). The probe sequence is as follows:

h1(x), h1(x) + 1*h2(x), h1(x) + 2*h2(x), h1(x) + 3*h2(x), ...

Because we are repeatedly adding in multiples of h2(x), we must pick an h2 that can never output 0 -- otherwise, the probe sequence would not change.

Watch the video below to see how double hashing uses two different hash functions to find an empty space in the table:

With double hashing, we get the best of both worlds: a probing scheme that spreads the keys throughout the table like quadratic probing does, and is also guaranteed to find an open position (if one exists, and the table size is a prime number) like linear probing does.

Analysis of open addressing

In many cases, looking up a key in an open addressing hash table is O(1), especially if there are no collisions. However, when there are collisions, we need to utilize one of the probing algorithms, which can inflate the time complexity.

Worst case analysis

In the worst case, the table is full and the key we are looking for (or inserting) is not yet in the hash table. In that case, the probing algorithms will lead us to probe all $$m$$ slots in the table to ensure that the key is not present, which is O(m). This is because in the worst case, we hash a key to a slot in the hash table, but as we probe through the table, we continue to find full slots until we have checked every slot.

Average case analysis

In the average case, when hashing a key to a slot in the table, we might encounter some small number of collisions during the probing algorithm. However, as long as the load factor is not too high (say, $$\alpha < 1/2$$), we will be able to find the key (or be sure it's not present) within a constant number of probes. Therefore, similar to separate chaining, under decent conditions (a good hash function and a table that's not too full), we can expect performance that approaches O(1).