Hash Table

Set Interface

Sets maintain a collection of items in an intrinsic order based on a unique key, each item x has key x.key. Sets are generattion of set and dictionary.

Usages	Operations	Descriptions
Container	`create(X)`	Given an iterable `X`, create set from items in `X`.
	`size()`	The number of items.
Static	`get(k)`	Return items with key `k`.
	`set_at(i, x)`	Replace the i-th item with `x`.
Dynamic	`insert(x)`	Add item `x` (Replace if `x.key` exists).
	`delete(k)`	Remove and return the i-th item.
Order	`iterator()`	Return items one-by-one in key order.
	`find_min()`	Return item with smallest key.
	`find_max()`	Return item with largest key.

A dynamic set that supports the dictionary operations: insert(x), delete(k) and get(k), and there are some to fulfill this:

Direct access address: it's effificent to get(k), however, it takes lots of space.
Hashing: We compute the array index from key.
Chaining: It's a way to handle collisions for hashing.

We use some symbols for the whole topic:

U: Universe, represents all the possible keys.

T: Table, the array we represent the dynamic set.

Slot: T[i], the position of the array.

K: Keys, the actual keys in the universe.

Direct Access Table

It works well when the universe U of key is small, we use array, key as index, store the value corresponds to key index.

class DirectAccessAddressHashTable<T> {
    private val table = ArrayOf<T>(UNIVERSE_SIZE)

    fun get(key: Int): T? = table[key]
    fun insert(value: T): table[key(value)] = value
    fun delete(value: T) { table[key(value)] = null }
}

Each operations take O(1) time, but it requires O(|U|) size to save all possible keys.

Hash Tables

The disadvantage of direct access table is obvious, the actual keys might be small relative to U. To fix this, we can use hash function h(key) to reduce space to Θ(|K|) and it still takes only O(1) for searching.

The hash function h(key) calculates the hash value of key k, maps the universe U into the slots of a hash table T[i].

For this mechanism, we will face a problem: collision, that is, the two different keys may hash to the same slot.

There are two ways to resolve this:

Chaining
Open Address

Chaining

In chaining, we store all values that have the same hash value in a linked list

insert(value: T): Insert to the head of list T[h(key(value))].
get(key: Int): search key in list T[h(key)].
delete(value: T): Delete from list T[h(key(value))].

The running time of insertion is O(1) and deletion is also O(1) if the list is doubly linked list. How about searching? It needs analysis based on some factor.

We assume that hash function calculation takes O(1) time.

Analysis of Searching

It might be OK to skip this part.

Given a hash table T with m slots that stores n elements, load factor n / m (i.e. items size / table size) defines the average number of elements stored in the chain of one slot.

The worst case running time is Θ(n) when all elements storing to the same slot.

The average running time depends the list length of a slot, and the list length depends on how well the hash function distributes key among the slots, we assume that any element is equally likely to hash into any slots, independently of where any other element hash hashed to, it called simple uniform hashing, the list length will be n / m mentioned above on average, so successful or unsuccessful search takes Θ(1 + n / m) time. (Θ(1) for hash function calculation).

If the number of slots m is proportional to the number of elements n, that is, n = O(m), then n / m will be 1, and Θ(1 + n / m) will be O(1). The searching takes O(1) constant time.

Open Addressing

We store all element directly in the hash table itself and we keep looking for next empty slot to insert the key until we find it or the table is full. (called probe)

To determine which slots to probe, we extend the has function to include the probe number, i.e. h(key, probe) where 0 <= probe <= m (slots)

class OpenAddress<T> {
    private val table = ArrayOf<T>(UNIVERSE_SIZE)

    fun insert(key: Int): Int {
        var probe = 0
        do {
            val hash = h(key(value), probe)
            if (table[hash] == null) {
                table[hash] = value
                return hash
            }
            probe++
        } while (probe < table.size)
        throw OverflowException()
    }

    fun get(key: Int): Int? {
        var probe = 0
        do {
            var hash = h(key, probe)
            if (table[hash] == key) {
                return hash
            }
            probe++
        } while (table[hash] == null || probe == table.size)
        return null
    }
}

Probing Methods

Linear Probling: h(key, probe) = (h1(key) + probe) mod m.
Quadratic Probling: h(key, probe) = (h1(key) + c1 * probe + c2 * probe^2) mod m.
Double Hashing: h(key, probe) = (h1(key) + probe * h2(key)) mod m.

Sample problems

705. Design HashSet

706. Design HashMap

Hash Table API

Python

TODO

Kotlin

`HashSet`

val set = HashSet<Int>()
set.size
set.add(123)
set.contains(123)
set.isEmpty()

`HashMap`

val map = HashMap<Int, Int>()
map.size
map.keys
map.values
map.isEmpty()
map.containsKey(123)
map[123] = 999
map.remove(123)

for (key in map.keys) { ... }
for (value in map.values) { ... }
for ((key, value) in map) { ... }
for (entry in map.entries) { ... }
map.forEach { key, value -> ... }

// Sort the map by value
val sortedMap = map.toList().sortedBy { (_, value) -> value }.toMap()

References

CLRS
MIT
基本資料結構系列文章
LC Learn
~~Fundamental of Data Structure~~ // Much as same as CLRS
CTCI
Tech Interview Handbook
Google Tech Dev Guide - Map/Dictionary
Google Recuriter Recommended Problems List
Code Interview University
leetcode-master
soft-eng-interview-prep
Tech-Interview-Cheat-Sheet

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hash-table.md

hash-table.md

Hash Table

Set Interface

Direct Access Table

Hash Tables

Chaining

Analysis of Searching

Open Addressing

Probing Methods

Hash Table API

Python

Kotlin

`HashSet`

`HashMap`

References

Files

hash-table.md

Latest commit

History

hash-table.md

File metadata and controls

Hash Table

Set Interface

Direct Access Table

Hash Tables

Chaining

Analysis of Searching

Open Addressing

Probing Methods

Hash Table API

Python

Kotlin

HashSet

HashMap

References

`HashSet`

`HashMap`