Sets maintain a collection of items in an intrinsic order based on a unique key, each item x
has key x.key
. Sets are generattion of set
and dictionary
.
Usages | Operations | Descriptions |
---|---|---|
Container | create(X) |
Given an iterable X , create set from items in X . |
size() |
The number of items. | |
Static | get(k) |
Return items with key k . |
set_at(i, x) |
Replace the i-th item with x . |
|
Dynamic | insert(x) |
Add item x (Replace if x.key exists). |
delete(k) |
Remove and return the i-th item. | |
Order | iterator() |
Return items one-by-one in key order. |
find_min() |
Return item with smallest key. | |
find_max() |
Return item with largest key. |
A dynamic set that supports the dictionary operations: insert(x)
, delete(k)
and get(k)
, and there are some to fulfill this:
- Direct access address: it's effificent to
get(k)
, however, it takes lots of space. - Hashing: We compute the array index from key.
- Chaining: It's a way to handle collisions for hashing.
We use some symbols for the whole topic:
U
: Universe, represents all the possible keys.T
: Table, the array we represent the dynamic set.Slot
:T[i]
, the position of the array.K
: Keys, the actual keys in the universe.
It works well when the universe U
of key is small, we use array, key as index, store the value corresponds to key index.
class DirectAccessAddressHashTable<T> {
private val table = ArrayOf<T>(UNIVERSE_SIZE)
fun get(key: Int): T? = table[key]
fun insert(value: T): table[key(value)] = value
fun delete(value: T) { table[key(value)] = null }
}
Each operations take O(1)
time, but it requires O(|U|)
size to save all possible keys.
The disadvantage of direct access table is obvious, the actual keys might be small relative to U
. To fix this, we can use hash function h(key)
to reduce space to Θ(|K|)
and it still takes only O(1)
for searching.
The hash function h(key)
calculates the hash value of key k
, maps the universe U
into the slots of a hash table T[i]
.
For this mechanism, we will face a problem: collision, that is, the two different keys may hash to the same slot.
There are two ways to resolve this:
- Chaining
- Open Address
In chaining, we store all values that have the same hash value in a linked list
insert(value: T)
: Insert to the head of listT[h(key(value))]
.get(key: Int)
: search key in listT[h(key)]
.delete(value: T)
: Delete from listT[h(key(value))]
.
The running time of insertion is O(1)
and deletion is also O(1)
if the list is doubly linked list. How about searching? It needs analysis based on some factor.
We assume that hash function calculation takes
O(1)
time.
It might be OK to skip this part.
Given a hash table T
with m
slots that stores n
elements, load factor n / m
(i.e. items size / table size
) defines the average number of elements stored in the chain of one slot.
The worst case running time is Θ(n)
when all elements storing to the same slot.
The average running time depends the list length of a slot, and the list length depends on how well the hash function distributes key among the slots, we assume that any element is equally likely to hash into any slots, independently of where any other element hash hashed to, it called simple uniform hashing, the list length will be n / m
mentioned above on average, so successful or unsuccessful search takes Θ(1 + n / m)
time. (Θ(1)
for hash function calculation).
If the number of slots m
is proportional to the number of elements n
, that is, n
= O(m)
, then n / m
will be 1, and Θ(1 + n / m)
will be O(1)
. The searching takes O(1)
constant time.
We store all element directly in the hash table itself and we keep looking for next empty slot to insert the key until we find it or the table is full. (called probe)
To determine which slots to probe, we extend the has function to include the probe number, i.e. h(key, probe)
where 0 <= probe
<= m
(slots)
class OpenAddress<T> {
private val table = ArrayOf<T>(UNIVERSE_SIZE)
fun insert(key: Int): Int {
var probe = 0
do {
val hash = h(key(value), probe)
if (table[hash] == null) {
table[hash] = value
return hash
}
probe++
} while (probe < table.size)
throw OverflowException()
}
fun get(key: Int): Int? {
var probe = 0
do {
var hash = h(key, probe)
if (table[hash] == key) {
return hash
}
probe++
} while (table[hash] == null || probe == table.size)
return null
}
}
- Linear Probling:
h(key, probe) = (h1(key) + probe) mod m
. - Quadratic Probling:
h(key, probe) = (h1(key) + c1 * probe + c2 * probe^2) mod m
. - Double Hashing:
h(key, probe) = (h1(key) + probe * h2(key)) mod m
.
Sample problems
TODO
val set = HashSet<Int>()
set.size
set.add(123)
set.contains(123)
set.isEmpty()
val map = HashMap<Int, Int>()
map.size
map.keys
map.values
map.isEmpty()
map.containsKey(123)
map[123] = 999
map.remove(123)
for (key in map.keys) { ... }
for (value in map.values) { ... }
for ((key, value) in map) { ... }
for (entry in map.entries) { ... }
map.forEach { key, value -> ... }
// Sort the map by value
val sortedMap = map.toList().sortedBy { (_, value) -> value }.toMap()
- CLRS
- MIT
- 基本資料結構系列文章
- LC Learn
Fundamental of Data Structure// Much as same as CLRS- CTCI
- Tech Interview Handbook
- Google Tech Dev Guide - Map/Dictionary
- Google Recuriter Recommended Problems List
- Code Interview University
- leetcode-master
- soft-eng-interview-prep
- Tech-Interview-Cheat-Sheet