Lifecycle Stage | Maturity | Status | Latest Revision |
---|---|---|---|
3A | Recommendation | Active | r2, 2020-03-12 |
Authors: @vyzo Editor: @yusefnapora Interest Group: @yusefnapora, @raulk, @whyrusleeping, @Stebalien, @jamesray1, @vasco-santos, @daviddias, @yiannisbot
See the lifecycle document for context about maturity level and spec status.
This is the specification for an extensible pubsub protocol over libp2p, based on randomized topic meshes and gossip. It is a general purpose pubsub protocol with moderate amplification factors and good scaling properties. The protocol is designed to be extensible by more specialized routers, which may add protocol messages and gossip in order to provide behavior optimized for specific application profiles.
Contents
- Motivations and Prior Work
- gossipsub: The gossiping mesh router
- Dependencies
- Parameters
- Router State
- Topic Membership
- Control Messages
- Message Processing
- Control Message Piggybacking
- Heartbeat
- Protobuf
The libp2p pubsub interface specification defines the RPC messages exchanged by peers, but deliberately does not define routing semantics, connection management, or other specifics of how peers interact. This is left to specific pubsub protocols, allowing a great deal of flexibility in protocol design to support different use cases.
Before introducing gossipsub itself,
let's first examine the properties of floodsub
, the simplest pubsub
implementation.
The initial implementation of the pubsub interface was floodsub
, which adopts a
very simple message propagation strategy - it simply "floods" the network by
having every peer broadcast to every other peer they know about in a given
topic.
With flooding, routing is almost trivial: for each incoming message, forward to all known peers in the topic. There is a bit of logic, as the router maintains a timed cache of previous messages, so that seen messages are not further forwarded. It also never forwards a message back to the source or the peer that forwarded the message.
The floodsub routing strategy has the following highly desirable properties:
- It is straightforward to implement.
- It minimizes latency; messages are delivered across minimum latency paths, so long as the overlay is sufficiently well connected.
- It is highly robust; there is very little maintenance logic or state to manage.
The problem however is that messages don't just follow the minimum latency paths; they follow all edges, thus creating a flood. The outbound degree of the network is unbounded, whereas we want it to be bounded in order to reduce bandwidth requirements and increase decentralization and scalability.
This unbounded outbound degree creates a problem for individual densely connected nodes, as they may have a large number of connected peers and cannot afford the bandwidth to forward all these pubsub messages. Similarly, the amplification factor is only bounded by the sum of degrees of all nodes in the overlay, which creates a scaling problem for densely connected overlays at large.
gossipsub addresses the key shortcomings of floodsub by imposing an upper bound on the outbound degree of each peer and globally controlling the amplification factor.
In order to do so, gossipsub peers form an overlay mesh, in which each peer forwards messages to a subset of its peers, rather than all known peers in a topic. The mesh is constructed by peers as they join a pubsub topic, and it is maintained over time by the exchange of control messages.
The initial construction of the mesh is random. When a peer joins a new topic,
it will examine its local state to find other peers that it
knows to be members of the topic. It will then select a subset of the topic
members, up to a maximum of D
, which is a configurable
parameter representing the desired degree of the network. These
will be added to the mesh for that topic, and the newly added peers will be
notified with a GRAFT
control message.
Upon leaving a topic, a peer will notify the members of its mesh with a PRUNE
message and remove the mesh from its local state.
Further maintenance is performed periodically as part of
the heartbeat procedure to keep the mesh size within acceptable
bounds as peers come and go.
Mesh links are bidirectional - when a peer receives a GRAFT
message informing
them that they have been added to another peer's mesh, they will in turn add the
peer to their own mesh, assuming they are still subscribed to the topic. In
steady state (after message processing), if a peer A
is
in the mesh of peer B
, then peer B
is also in the mesh of peer A
.
To allow peers to "reach beyond" their mesh view of a topic, we use gossip to propagate metadata about the message flow throughout the network. This gossip is emitted to a random subset of peers who are not in the mesh. We can think of the mesh members as "full message" peers, to whom we propagate the full content of all messages received in a topic. The remaining peers we're aware of in a topic can be considered "metadata-only" peers, to whom we emit gossip at regular intervals.
The metadata can be arbitrary, but as a baseline, we send the IHAVE
message, which includes the message ids of messages we've seen in the
last few seconds. These messages are cached, so that peers receiving the gossip
can request them using an IWANT
message.
The router can use this metadata to improve the mesh, for instance an episub router built on top of gossipsub can create epidemic broadcast trees, suitable for use cases in which a relatively small set of publishers broadcasts to a much larger audience.
Other possible uses for gossip include restarting message transmission at different points in the overlay to rectify downstream message loss, or accelerating message transmission to peers who may be at some distant in the mesh by opportunistically skipping hops.
Pubsub is designed to fit into the libp2p "ecosystem" of modular components that serve complementary purposes. As such, some key functionality is assumed to be present and is not specified as part of pubsub itself.
Before peers can exchange pubsub messages, they must first become aware of each others' existence. There are several peer discovery mechanisms available, for example: MulticastDNS for local area networks, random-walks through the libp2p-kad-dht, rendezvous protocols, and any other that conforms with the Peer Discovery interface for libp2p.
As peer discovery is broadly useful and not specific to pubsub, neither the pubsub interface spec nor this document prescribe a particular discovery mechanism. Instead, this function is assumed to be provided by the environment. A pubsub-enabled libp2p application must also be configured with a peer discovery mechanism, which will send ambient connection events to inform other libp2p subsystems (such as pubsub) of newly connected peers.
Whenever a new peer is connected, the gossipsub implementation checks to see if the peer implements floodsub and/or gossipsub, and if so, it sends it a hello packet that announces the topics that it is currently subscribing to.
This section lists the configurable parameters that control the behavior of gossipsub, along with a short description and reasonable defaults. Each parameter is introduced with full context elsewhere in this document.
Parameter | Purpose | Reasonable Default |
---|---|---|
D |
The desired outbound degree of the network | 6 |
D_low |
Lower bound for outbound degree | 4 |
D_high |
Upper bound for outbound degree | 12 |
D_lazy |
(Optional) the outbound degree for gossip emission | D |
heartbeat_interval |
Time between heartbeats | 1 second |
fanout_ttl |
Time-to-live for each topic's fanout state | 60 seconds |
mcache_len |
Number of history windows in message cache | 5 |
mcache_gossip |
Number of history windows to use when emitting gossip | 3 |
seen_ttl |
Expiry time for cache of seen message ids | 2 minutes |
Note that D_lazy
is considered optional. It is used to control the outbound
degree when emitting gossip, which may be tuned separately
than the degree for eager message propagation. By default, we simply use D
for
both.
The router keeps track of some necessary state to maintain stable topic meshes and emit useful gossip.
The state can be roughly divided into two categories: peering state, and state related to the message cache.
Peering state is how the router keeps track of the pubsub-capable peers it's aware of and the relationship with each of them.
There are three main pieces of peering state:
-
peers
is a set of ids of all known peers that support gossipsub or floodsub. Throughout this documentpeers.gossipsub
will denote peers supporting gossipsub, whilepeers.floodsub
denotes floodsub peers. -
mesh
is a map of subscribed topics to the set of peers in our overlay mesh for that topic. -
fanout
, likemesh
, is a map of topics to a set of peers, however, thefanout
map contains topics to which we are not subscribed.
In addition to the gossipsub-specific state listed above, the libp2p pubsub
framework maintains some "router-agnostic" state. This includes the set of
topics to which we are subscribed, as well as the set of topics to which each of
our peers is subscribed. Elsewhere in this document, we refer to
peers.floodsub[topic]
and peers.gossipsub[topic]
to denote floodsub or
gossipsub capable peers within a specific topic.
The message cache (or mcache
), is a data structure that stores message IDs and
their corresponding messages, segmented into "history windows." Each window
corresponds to one heartbeat interval, and the windows are shifted during the
heartbeat procedure following gossip emission.
The number of history windows to keep is determined by the mcache_len
parameter, while the number of windows to examine when sending
gossip is controlled by mcache_gossip
.
The message cache supports the following operations:
mcache.put(m)
: adds a message to the current window and the cache.mcache.get(id)
: retrieves a message from the cache by its ID, if it is still present.mcache.get_gossip_ids(topic)
: retrieves the message IDs for messages in the most recent history windows, scoped to a given topic. The number of windows to examine is controlled by themcache_gossip
parameter.mcache.shift()
: shifts the current window, discarding messages older than the history length of the cache (mcache_len
).
We also keep a seen
cache, which is a timed least-recently-used cache of
message IDs that we have observed recently. The value of "recently" is
determined by the parameter seen_ttl
, with a reasonable default
of two minutes. This value should be chosen to approximate the propagation delay
in the overlay, within a healthy margin.
The seen
cache serves two purposes. In all pubsub implementations, we can
first check the seen
cache before forwarding messages to avoid wastefully
republishing the same message multiple times. For gossipsub in particular, the
seen
cache is used when processing an IHAVE
message sent by
another peer, so that we only request messages we have not already seen before.
In the go implementation, the seen
cache is provided by the pubsub framework
and is separate from the mcache
, however other implementations may wish to
combine them into one data structure.
The pubsub interface spec defines the baseline RPC message format used by all libp2p pubsub routers. As part of the RPC message, peers can include announcements regarding the topics they wish to subscribe to or unsubscribe from. These announcements are sent to all known pubsub-capable peers, regardless of whether we currently have any topics in common.
For this document, we assume that the underlying pubsub framework is responsible for sending the RPC messages announcing subscription changes. A gossipsub implementation that does not build upon an existing libp2p pubsub framework would need to implement those control RPC messages.
In addition to the SUBSCRIBE
/ UNSUBSCRIBE
events sent by the pubsub
framework, gossipsub must do additional work to maintain the mesh for the topic
it is joining or leaving. We will refer to the two topic membership operations
below as JOIN(topic)
and LEAVE(topic)
.
When the application invokes JOIN(topic)
, the router will form a topic mesh by
selecting up to D
peers from its local peering
state first examining the fanout
map. If there are peers in
fanout[topic]
, the router will move those peers from the fanout
map to
mesh[topic]
. If the topic is not in the fanout
map, or if fanout[topic]
contains fewer than D
peers, the router will attempt to fill mesh[topic]
with peers from peers.gossipsub[topic]
which is the set of all
gossipsub-capable peers it is aware of that are members of the topic.
Regardless of whether they came from fanout
or peers.gossipsub
, the router
will inform the new members of mesh[topic]
that they have been added to the
mesh by sending them a GRAFT
control message.
The application can invoke LEAVE(topic)
to unsubscribe from a topic. The
router will inform the peers in mesh[topic]
by sending them a PRUNE
control
message, so that they can remove the link from their own topic mesh.
After sending PRUNE
messages, the router will forget mesh[topic]
and delete
it from its local state.
Control messages are exchanged to maintain topic meshes and emit gossip. This section lists the control messages in the core gossipsub protocol, although it is worth noting that extensions to gossipsub (such as episub may define further control messages for their own purposes.
For details on how gossipsub routers respond to control messages, see Message Processing.
The protobuf schema for control messages is detailed in the Protobuf section.
The GRAFT
message grafts a new link in a topic mesh. The GRAFT
informs a peer
that it has been added to the local router's mesh view for the included topic id.
The PRUNE
message prunes a mesh link from a topic mesh. PRUNE
notifies a
peer that it has been removed from the local router's mesh view for the
included topic id.
The IHAVE
message is emitted as gossip. It provides the
remote peer with a list of messages that were recently seen by the local router.
The remote peer may then request the full message content with an IWANT
message.
The IWANT
message requests the full content of one or more messages whose IDs
were announced by a remote peer in an IHAVE
message.
Upon receiving a message, the router will first process the message payload.
Payload processing will validate the message according to application-defined
rules and check the seen
cache to determine if the message
has been processed previously. It will also ensure that it was not the source of
the message; if the router receives a message that it published itself, it will
not forward it further.
If the message is valid, was not published by the router itself, and has not
been previously seen, the router will forward the message. First, it will
forward the message to every peer in peers.floodsub[topic]
for
backwards-compatibility with floodsub. Next,
it will forward the message to every peer in its local gossipsub topic mesh,
contained in mesh[topic]
.
After processing the message payload, the router will process the control messages:
-
On receiving a
GRAFT(topic)
message, the router will check to see if it is indeed subscribed to the topic identified in the message. If so, the router will add the sender tomesh[topic]
. If the router is no longer subscribed to the topic, it will respond with aPRUNE(topic)
message to inform the sender that it should remove its mesh link. -
On receiving a
PRUNE(topic)
message, the router will remove the sender frommesh[topic]
. -
On receiving an
IHAVE(ids)
message, the router will check it'sseen
cache. If theIHAVE
message contains message IDs that have not been seen, the router will request them with anIWANT
message. -
On receiving an
IWANT(ids)
message, the router will check itsmcache
and will forward any requested messages that are present in themcache
to the peer who sent theIWANT
message.
Apart from forwarding received messages, the router can of course publish messages on its own behalf, which originate at the application layer. This is very similar to forwarding received messages:
- First, the message is sent to every peer in
peers.floodsub[topic]
. - If the router is subscribed to the topic, it will send the message to all
peers in
mesh[topic]
. - If the router is not subscribed to the topic, it will examine the set of peers
in
fanout[topic]
. If this set is empty, the router will choose up toD
peers frompeers.gossipsub[topic]
and add them tofanout[topic]
. Assuming there are now some peers infanout[topic]
, the router will send the message to each.
Gossip and other control messages do not have to be transmitted in their own message. Instead, they can be coalesced and piggybacked on any other message in the regular flow, for any topic. This can lead to message rate reduction whenever there is some correlated flow between topics, which can be significant for densely connected peers.
For piggyback implementation details, consult the Go implementation.
Each peer runs a periodic stabilization process called the "heartbeat procedure"
at regular intervals. The frequency of the heartbeat is controlled by the
parameter heartbeat_interval
, with a reasonable default of 1
second.
The heartbeat serves three functions: mesh maintenance, fanout maintenance, and gossip emission.
Topic meshes are maintained with the following stabilization algorithm:
for each topic in mesh:
if |mesh[topic]| < D_low:
select D - |mesh[topic]| peers from peers.gossipsub[topic] - mesh[topic]
; i.e. not including those peers that are already in the topic mesh.
for each new peer:
add peer to mesh[topic]
emit GRAFT(topic) control message to peer
if |mesh[topic]| > D_high:
select |mesh[topic]| - D peers from mesh[topic]
for each new peer:
remove peer from mesh[topic]
emit PRUNE(topic) control message to peer
The parameters of the algorithm are:
D
: the desired outbound degree of the networkD_low
: an acceptable lower threshold forD
. If there are fewer thanD_low
peers in a given topic mesh, we attempt to add new peers.D_high
: an acceptable upper threshold forD
. If there are more thanD_high
peers in a given topic mesh, we randomly select peers for removal.
The fanout
map is maintained by keeping track of the last published time for
each topic. If we do not publish any messages to a topic within a configurable
TTL, the fanout state for that topic is discarded.
We also try to ensure that each fanout[topic]
set has at least D
members.
The fanout maintenance algorithm is:
for each topic in fanout:
if time since last published > fanout_ttl
remove topic from fanout
else if |fanout[topic]| < D
select D - |fanout[topic]| peers from peers.gossipsub[topic] - fanout[topic]
add the peers to fanout[topic]
The parameters of the algorithm are:
D
: the desired outbound degree of the network.fanout_ttl
: the time for which we keep the fanout state for each topic. If we do not publish to a topic withinfanout_ttl
, thefanout[topic]
set is discarded.
Gossip is emitted to a random selection of peers for each topic that are not already members of the topic mesh:
for each topic in mesh+fanout:
let mids be mcache.get_gossip_ids(topic)
if mids is not empty:
select D peers from peers.gossipsub[topic]
for each peer not in mesh[topic] or fanout[topic]
emit IHAVE(mids)
shift the mcache
Note that we use the same parameter D
as the target degree for both gossip and
mesh membership, however this is not normative. A separate parameter D_lazy
can be used to explicitly control the gossip propagation factor, which allows
for tuning the tradeoff between eager and lazy transmission of messages.
The gossipsub protocol extends the existing RPC
message
structure with a new field, control
. This is an instance of
ControlMessage
which may contain one or more control messages.
The four control messages are ControlIHave
for IHAVE
messages,
ControlIWant
for IWANT
messages, ControlGraft
for
GRAFT
messages and ControlPrune
for PRUNE
messages.
The protobuf is as follows:
message RPC {
// ... see definition in pubsub interface spec
optional ControlMessage control = 3;
}
message ControlMessage {
repeated ControlIHave ihave = 1;
repeated ControlIWant iwant = 2;
repeated ControlGraft graft = 3;
repeated ControlPrune prune = 4;
}
message ControlIHave {
optional string topicID = 1;
repeated string messageIDs = 2;
}
message ControlIWant {
repeated string messageIDs = 1;
}
message ControlGraft {
optional string topicID = 1;
}
message ControlPrune {
optional string topicID = 1;
}