Snapshot

File-Based Snapshot vs. Object-Based Snapshot

The notion of "Snapshot" in the original paper is close to a physical file consisting of multiple chunks. However, that is not practical in real-world deployment; if you use a back-end database as a state machine, each snapshot will most likely be huge so sending and installing snapshots takes a long time. So then we will end up with one of two issues:

The database file cannot be modified while you are transferring the file itself, which blocks the commit of the state machine. OR,
A snapshot can be a separate physical clone of the state machine. In such a case, taking a snapshot becomes a super expensive operation. Moreover, each snapshot occupies disk space as big as the original state machine.

If the back-end database supports its own logical snapshot and proper isolation, we are happily willing to use it. To support such a concept, we provide a more generalized form of a snapshot, i.e., object-based snapshot.

In this library, a snapshot consists of one or more logical objects, where each object has a unique object ID that starts from 0. The definition of an object depends on how you define a snapshot; a snapshot may have a single object or many. Note that object ID does not need to be consecutive or even ordered, but object ID 0 should always exist as a starting point.

You can still support a physical file-based snapshot by using the object-based snapshot: just associate each chunk with an object.

Snapshot Transmission

Below diagram shows the overall protocol of snapshot transmission:

Leader      Follower
|           |
X           |   read_logical_snp_obj( obj_id = 0 )
|           |       => returns data D_0
X---------->|   send {0, D_0}
|           X   save_logical_snp_obj( obj_id = 0, D0 )
|           |       => returns obj_id = X_1
|<----------X   request obj_id = X_1
X           |   read_logical_snp_obj( obj_id = X1 )
|           |       => returns data D_{X_1}
X---------->|   send {X_1, D_{X_1}}
|           X   save_logical_snp_obj( obj_id = X1, D_{X_1} )
|           |       => returns obj_id = X_2
|<----------X   request obj_id = X_2
     ...
X           |   read_logical_snp_obj( obj_id = Y )
|           |       => returns data D_Y, is_last_obj = true
X---------->|   send {Y, D_Y}
|           X   save_logical_snp_obj( obj_id = Y, D_Y )
|           X   apply_snapshot()
|           |

Once a new empty node joins the Raft group, or there is a lagging node whose last log is older than the last log compaction, Raft starts to send a snapshot. It first calls read_logical_snp_obj() with obj_id = 0 in leader's side. Then your state machine needs to return corresponding data (i.e., a binary blob) D_0, and {0, D_0} pair will be sent to the follower who receives the snapshot.

Once the follower receives the object, it invokes save_logical_snp_obj() with received object ID and data. Your state machine properly stores the received data, and then changes the given obj_id value to the next object ID. That new object ID will be sent to the leader, and the leader will invoke read_logical_snp_obj() with the new ID.

Above process will be repeated until read_logical_snp_obj() sets is_last_obj = true. Once is_last_obj is set to true, follower's Raft will call save_logical_snp_obj() with the given data, and then call apply_snapshot() where your implementation needs to replace the data in the state machine with the newly received one.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

snapshot_transmission.md

snapshot_transmission.md

Snapshot

File-Based Snapshot vs. Object-Based Snapshot

Snapshot Transmission

Files

snapshot_transmission.md

Latest commit

History

snapshot_transmission.md

File metadata and controls

Snapshot

File-Based Snapshot vs. Object-Based Snapshot

Snapshot Transmission