package caskdb
// This file provides encode/decode functions for serialisation and deserialisation.
//
// The functions are generic and do not contain any disk- or memory-specific code.
//
// Disk storage deals with bytes; you cannot store a string or an object without
// first converting it to bytes. Programming languages provide abstractions so that
// you don't have to think about this when storing things in memory (i.e. RAM).
// Consider the following example, where you store data in a hash table:
//
//     books = {}
//     books["hamlet"] = "shakespeare"
//     books["anna karenina"] = "tolstoy"
//
// In the above, the language deals with all the complexities:
//
//   - allocating space in RAM to hold the data of `books`
//   - whenever you add data to `books`, converting it to bytes and keeping it in memory
//   - whenever `books` grows, moving it elsewhere in RAM so that new items can be added
//
// Unfortunately, when it comes to disks, we have to do all of this ourselves: write
// code that allocates space, converts objects to and from bytes, and performs many
// other operations.
//
// This file has two main functions which help us with the serialisation of data:
//
//     encodeKV - takes a key value pair and encodes it into bytes
//     decodeKV - takes a bunch of bytes and decodes them into a key value pair
//
// **workshop note**
//
// For the workshop, the functions will have the following signatures:
//
//     func encodeKV(timestamp uint32, key string, value string) (int, []byte)
//     func decodeKV(data []byte) (uint32, string, string)
import "encoding/binary"
// headerSize specifies the total header size. Our key value pair, when stored on disk
// looks like this:
//
// ┌───────────┬──────────┬────────────┬─────┬───────┐
// │ timestamp │ key_size │ value_size │ key │ value │
// └───────────┴──────────┴────────────┴─────┴───────┘
//
// This is analogous to a typical database's row (or a record). The total length of
// the row is variable, depending on the contents of the key and value.
//
// The first three fields form the header:
//
// ┌───────────────┬──────────────┬────────────────┐
// │ timestamp(4B) │ key_size(4B) │ value_size(4B) │
// └───────────────┴──────────────┴────────────────┘
//
// These three fields store unsigned integers of size 4 bytes, giving our header a
// fixed length of 12 bytes. The timestamp field stores the time the record was
// inserted, in unix epoch seconds. The key size and value size fields store the
// number of bytes occupied by the key and the value. The maximum integer that fits
// in 4 bytes is 4,294,967,295 (2 ** 32 - 1), so neither a key nor a value can
// exceed roughly 4.3GB. Theoretically, a single row can be as large as ~8.6GB.
const headerSize = 12
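// A minimal, standalone sketch of the header layout described above, not part of
// the package itself. headerBytes is a hypothetical helper and the field values
// are made up; little-endian byte order is assumed, so the least significant byte
// of each field comes first.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const headerSize = 12

// headerBytes packs the three header fields as little-endian uint32 values,
// in the order timestamp, key_size, value_size.
func headerBytes(timestamp, keySize, valueSize uint32) []byte {
	header := make([]byte, headerSize)
	binary.LittleEndian.PutUint32(header[0:4], timestamp)
	binary.LittleEndian.PutUint32(header[4:8], keySize)
	binary.LittleEndian.PutUint32(header[8:12], valueSize)
	return header
}

func main() {
	// Illustrative values: a timestamp of 0x01020304, a 6-byte key, an 11-byte value.
	fmt.Printf("% x\n", headerBytes(0x01020304, 6, 11))
	// prints: 04 03 02 01 06 00 00 00 0b 00 00 00
}
```

// Whatever the key and value contain, the header stays at 12 bytes; only the
// bytes that follow it vary in length.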
// KeyEntry keeps the metadata about the KV, in particular the byte offset of
// the record in the file. Whenever we insert/update a key, we create a new
// KeyEntry object and insert that into keyDir.
type KeyEntry struct {
	// Timestamp at which we wrote the KV pair to the disk. The value
	// is the current time in seconds since the epoch.
	timestamp uint32
	// The position is the byte offset in the file where the data
	// exists
	position uint32
	// Total size in bytes of the stored data. We use this value to know
	// how many bytes we need to read from the file
	totalSize uint32
}

// NewKeyEntry returns a KeyEntry with the given timestamp, position and total size.
func NewKeyEntry(timestamp uint32, position uint32, totalSize uint32) KeyEntry {
	return KeyEntry{timestamp, position, totalSize}
}
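// A rough sketch of how KeyEntry and keyDir might fit together; it is not the
// package's actual implementation. The store type and its set/get methods are
// hypothetical, a byte slice stands in for the on-disk file, keyDir is assumed
// to be a map from key to KeyEntry, and totalSize is assumed to cover the whole
// record (header, key and value).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const headerSize = 12

// KeyEntry mirrors the struct defined above.
type KeyEntry struct {
	timestamp uint32
	position  uint32
	totalSize uint32
}

// store is a hypothetical stand-in for the database: file plays the role of the
// append-only data file, and keyDir maps each key to its latest entry.
type store struct {
	file   []byte
	keyDir map[string]KeyEntry
}

// encodeKV lays out a record as a 12-byte header followed by the key and value.
func encodeKV(timestamp uint32, key, value string) (int, []byte) {
	buf := make([]byte, headerSize, headerSize+len(key)+len(value))
	binary.LittleEndian.PutUint32(buf[0:4], timestamp)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(len(key)))
	binary.LittleEndian.PutUint32(buf[8:12], uint32(len(value)))
	buf = append(buf, key...)
	buf = append(buf, value...)
	return len(buf), buf
}

// set appends the record and remembers where it starts and how long it is.
func (s *store) set(timestamp uint32, key, value string) {
	size, record := encodeKV(timestamp, key, value)
	s.keyDir[key] = KeyEntry{timestamp, uint32(len(s.file)), uint32(size)}
	s.file = append(s.file, record...)
}

// get finds the entry in keyDir, slices out exactly that record and returns
// the value stored after the header and key.
func (s *store) get(key string) (string, bool) {
	entry, ok := s.keyDir[key]
	if !ok {
		return "", false
	}
	record := s.file[entry.position : entry.position+entry.totalSize]
	keySize := binary.LittleEndian.Uint32(record[4:8])
	return string(record[headerSize+keySize:]), true
}

func main() {
	s := &store{keyDir: map[string]KeyEntry{}}
	s.set(1, "hamlet", "shakespeare")
	s.set(2, "hamlet", "william shakespeare") // an update appends a new record
	v, _ := s.get("hamlet")
	fmt.Println(v, s.keyDir["hamlet"].position)
	// prints: william shakespeare 29
}
```

// The first record occupies 12 + 6 + 11 = 29 bytes, so the updated record starts
// at offset 29 and keyDir now points there; the old record is simply left behind.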
// encodeHeader packs the timestamp, key size and value size into a
// 12-byte little-endian header.
func encodeHeader(timestamp uint32, keySize uint32, valueSize uint32) []byte {
	header := make([]byte, headerSize)
	binary.LittleEndian.PutUint32(header[0:4], timestamp)
	binary.LittleEndian.PutUint32(header[4:8], keySize)
	binary.LittleEndian.PutUint32(header[8:12], valueSize)
	return header
}

// decodeHeader reads the timestamp, key size and value size back from the
// first 12 bytes of header.
func decodeHeader(header []byte) (uint32, uint32, uint32) {
	timestamp := binary.LittleEndian.Uint32(header[0:4])
	keySize := binary.LittleEndian.Uint32(header[4:8])
	valueSize := binary.LittleEndian.Uint32(header[8:12])
	return timestamp, keySize, valueSize
}
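// encodeKV encodes the KV pair into bytes. It returns the total size of the
// record and the record itself: the header followed by the key and the value.
func encodeKV(timestamp uint32, key string, value string) (int, []byte) {
	header := encodeHeader(timestamp, uint32(len(key)), uint32(len(value)))
	data := append([]byte(key), []byte(value)...)
	return headerSize + len(data), append(header, data...)
}

// decodeKV decodes a record produced by encodeKV into its timestamp, key and
// value. It expects data to hold the complete record and panics otherwise.
func decodeKV(data []byte) (uint32, string, string) {
	timestamp, keySize, valueSize := decodeHeader(data[0:headerSize])
	key := string(data[headerSize : headerSize+keySize])
	value := string(data[headerSize+keySize : headerSize+keySize+valueSize])
	return timestamp, key, value
}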
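
// decodeKV slices data without checking its length, so a truncated record makes
// it panic. One possible variant, sketched below as a standalone program, checks
// the lengths first. decodeKVChecked and errShortRecord are hypothetical names
// and not part of this package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const headerSize = 12

// errShortRecord is a hypothetical sentinel error for truncated input.
var errShortRecord = errors.New("record shorter than its header claims")

// decodeKVChecked behaves like decodeKV but validates lengths before slicing.
func decodeKVChecked(data []byte) (uint32, string, string, error) {
	if len(data) < headerSize {
		return 0, "", "", errShortRecord
	}
	timestamp := binary.LittleEndian.Uint32(data[0:4])
	// Widen to uint64 so that adding the two sizes cannot wrap around.
	keyEnd := uint64(headerSize) + uint64(binary.LittleEndian.Uint32(data[4:8]))
	valueEnd := keyEnd + uint64(binary.LittleEndian.Uint32(data[8:12]))
	if uint64(len(data)) < valueEnd {
		return 0, "", "", errShortRecord
	}
	return timestamp, string(data[headerSize:keyEnd]), string(data[keyEnd:valueEnd]), nil
}

func main() {
	record := []byte{
		7, 0, 0, 0, // timestamp = 7
		2, 0, 0, 0, // key_size = 2
		3, 0, 0, 0, // value_size = 3
		'h', 'i', 'y', 'o', 'u',
	}
	ts, k, v, err := decodeKVChecked(record)
	fmt.Println(ts, k, v, err) // prints: 7 hi you <nil>

	// Dropping the last three bytes leaves the value incomplete.
	_, _, _, err = decodeKVChecked(record[:14])
	fmt.Println(err) // prints: record shorter than its header claims
}
```

// Returning an error instead of panicking lets the caller decide how to treat a
// partially written or corrupted record.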