Open Addressing II

The final probing technique for resolving collisions that we'll consider is double hashing, which actually uses two different hash functions.

Double hashing

In double hashing, we use two hash functions instead of one. If the slot given by our first hash function, h1(x), is already occupied, we add increasingly larger multiples of our second hash function, h2(x), wrapping around modulo the table size as usual. The probe sequence is as follows:

h1(x), h1(x) + 1*h2(x), h1(x) + 2*h2(x), h1(x) + 3*h2(x), ...

Because we are repeatedly adding in multiples of h2(x), we must pick an h2 that can never output 0 (or, more generally, any multiple of the table size); otherwise, every probe would land on the same slot.
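The probe sequence above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name double_hash_probes and its parameters are hypothetical, and it assumes the two hash values have already been computed for the key and that positions wrap around modulo the table size m.

```python
def double_hash_probes(h1, h2, m, max_probes=None):
    """Return the double-hashing probe positions for one key.

    h1 and h2 are the two hash values already computed for the key and
    m is the table size. All names here are illustrative assumptions.
    """
    if h2 % m == 0:
        # A step of 0 (mod m) would probe the same slot forever.
        raise ValueError("h2 must not be 0 modulo the table size")
    if max_probes is None:
        max_probes = m
    # i-th probe: h1 + i * h2, wrapped around the table.
    return [(h1 + i * h2) % m for i in range(max_probes)]


print(double_hash_probes(h1=0, h2=3, m=5))  # [0, 3, 1, 4, 2]
```

With a table of size 5 and a step of 3, the five probes visit every slot exactly once before repeating.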

With double hashing, we get the best of both worlds: a probing scheme that spreads the keys throughout the table like quadratic probing does, and that is also guaranteed to find an open position like linear probing does (if one exists, and provided the table size is a prime number and h2(x) is never a multiple of it).
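To see why a prime table size matters, here is a small sketch that counts how many distinct slots a double-hashing probe sequence can reach. The helper slots_visited is a hypothetical name introduced for this illustration. The underlying fact is that a step size visits every slot exactly when it shares no common factor with the table size.

```python
from math import gcd


def slots_visited(h1, h2, m):
    """Set of distinct slots reached by m double-hashing probes."""
    return {(h1 + i * h2) % m for i in range(m)}


# Prime table size: every non-zero step below 7 reaches all 7 slots.
assert all(len(slots_visited(0, step, 7)) == 7 for step in range(1, 7))

# Composite table size: a step that shares a factor with m misses slots.
print(sorted(slots_visited(0, 4, 8)))  # [0, 4]
print(gcd(4, 8))                       # 4
```

In the composite case only 8 / gcd(4, 8) = 2 slots are ever probed, so an insertion could fail even though the table still has empty positions.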

Check your understanding

Consider the following open addressing hash table with 5 slots:

0: 'anaconda'
1: 'bat'
2: 'bison'
3: (empty)
4: 'elephant'

Assume that the hash function h(x) returns the alphabet index of the first character of x (a = 0, b = 1, c = 2, and so on).

What would the sequence of probed positions be when trying to look up 'bear' using linear probing?
What would the sequence of probed positions be when trying to look up 'elephant' using quadratic probing?
What would the sequence of probed positions be when trying to look up 'ant' using double hashing with h2(x) = length of x?

Answer.

h('bear') = 1, which is occupied but does not hold the key we're looking for, so we try h('bear') + 1 = 2, which is also occupied by a different key. We then find an empty position at h('bear') + 2 = 3, so we know that 'bear' is not in the table. The probed positions are 1, 2, 3.
h('elephant') = 4, which holds the key we're looking for, so the lookup is complete after probing position 4 alone. For the record, if it had held a different key, we would have resumed the search at (h('elephant') + 1²) % 5 = 5 % 5 = 0, which is also occupied by a different key, so we would have needed to continue searching.
h('ant') = 0, which is occupied but does not hold the key we're looking for, so we try h('ant') + 1*h2('ant') = 0 + 3 = 3, which is empty. Therefore, we know that 'ant' is not in the table. The probed positions are 0, 3.
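The three lookups can also be replayed in code. The sketch below is only an illustration under stated assumptions: the table layout (five slots, with slot 3 empty) is the one implied by the answers, h is taken to be the alphabet index of the first letter reduced modulo the table size, and the function name probe_positions is hypothetical.

```python
TABLE = ['anaconda', 'bat', 'bison', None, 'elephant']  # None marks an empty slot
M = len(TABLE)


def h(key):
    # Assumed concrete form of h: alphabet index of the first letter (a = 0, b = 1, ...).
    return (ord(key[0]) - ord('a')) % M


def h2(key):
    # Second hash function from the exercise: the length of the key.
    return len(key)


def probe_positions(key, strategy):
    """Positions examined while looking up key, stopping at the key or an empty slot."""
    visited = []
    for i in range(M):
        if strategy == 'linear':
            pos = (h(key) + i) % M
        elif strategy == 'quadratic':
            pos = (h(key) + i * i) % M
        elif strategy == 'double':
            pos = (h(key) + i * h2(key)) % M
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        visited.append(pos)
        if TABLE[pos] is None or TABLE[pos] == key:
            break  # found the key, or an empty slot proves it is absent
    return visited


print(probe_positions('bear', 'linear'))         # [1, 2, 3]
print(probe_positions('elephant', 'quadratic'))  # [4]
print(probe_positions('ant', 'double'))          # [0, 3]
```

Each printed list matches the sequence of probed positions worked out by hand in the answer.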

Analysis of open addressing

In many cases, looking up a key in an open addressing hash table is O(1), especially if there are no collisions. However, when there are collisions, we need to fall back on one of the probing algorithms, which can increase the running time.

Worst case analysis

In the worst case, the table is full and the key we are looking for (or inserting) is not yet in the hash table. In that case, the probing algorithms will lead us to probe all $$m$$ slots in the table to ensure that the key is not present, which is O(m). This is because in the worst case, we hash a key to a slot in the hash table, but as we probe through the table, we continue to find full slots until we have checked every slot.
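As a rough illustration of this worst case, the sketch below counts how many slots linear probing examines before it can conclude that a key is absent. The function name probes_for_missing_key is hypothetical, and the code assumes that None marks an empty slot and that the key being sought is not in the table.

```python
def probes_for_missing_key(table, start):
    """Count slots examined by linear probing before giving up on an absent key."""
    m = len(table)
    probes = 0
    for i in range(m):
        probes += 1
        if table[(start + i) % m] is None:
            break  # an empty slot proves the key is absent
    return probes


full_table = ['a', 'b', 'c', 'd', 'e']
print(probes_for_missing_key(full_table, start=2))  # 5
```

With no empty slot to stop the search early, all m = 5 slots are examined.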

Average case analysis

In the average case, when hashing a key to a slot in the table, we might encounter a small number of collisions while probing. However, as long as the load factor is not too high (say, $$\alpha < 1/2$$), we can expect to find the key (or be sure it's not present) within a constant number of probes on average. Therefore, similar to separate chaining, under decent conditions (a good hash function and a table that's not too full), we can expect performance that approaches O(1).
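To make "a constant number of probes" concrete, here is the standard textbook bound. It relies on the idealized uniform hashing assumption (each key's probe sequence is equally likely to be any permutation of the slots), which the text above does not state and which double hashing only approximates. Under that assumption, with load factor $$\alpha < 1$$, the expected number of probes in an unsuccessful lookup satisfies

$$E[\text{probes}] \le \frac{1}{1 - \alpha}$$

For $$\alpha = 1/2$$ this gives at most 2 expected probes, while for $$\alpha = 0.9$$ the bound grows to 10, which is why keeping the table from getting too full matters.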
