How network addresses are encoded and used in libp2p
Lifecycle Stage | Maturity | Status | Latest Revision |
---|---|---|---|
3A | Recommendation | Active | r0, 2021-07-22 |
Authors: @yusefnapora
Interest Group: [@mxinden, @Stebalien, @raulk, @marten-seemann, @vyzo]
See the lifecycle document for context about the maturity level and spec status.
libp2p makes a distinction between a peer's identity and its location. A peer's identity is stable, verifiable, and valid for the entire lifetime of the peer (whatever that may be for a given application). Peer identities are derived from public keys as described in the peer id spec.
On a particular network, at a specific point in time, a peer may have one or more locations, which can be represented using addresses. For example, I may be reachable via the global IPv4 address 198.51.100.0 on TCP port 1234.
In a system that only supported TCP/IP or UDP over IP, we could easily write our
addresses with the familiar <ip>:<port>
notation and store them as tuples of
address and port. However, libp2p was designed to be transport agnostic, which
means that we can't assume that we'll even be using an IP-backed network at all.
To support a growing set of transport protocols without special-casing each addressing scheme, libp2p uses multiaddr to encode network addresses for all supported transport protocols, in a self-describing manner.
This document does not cover the address format itself (multiaddr), but rather how multiaddr is used in libp2p. For details on the format, see the multiaddr spec. For more information on other use cases, or to find links to multiaddr implementations in various languages, see the multiaddr repository.
multiaddrs are used throughout libp2p for encoding network addresses. When addresses need to be shared or exchanged between processes, they are encoded in the binary representation of multiaddr.
When exchanging addresses, peers send a multiaddr containing both their network
address and peer id, as described in the section on the p2p
multiaddr.
A multiaddr is a sequence of instructions that can be traversed to some destination.
For example, the /ip4/198.51.100.0/tcp/1234 multiaddr starts with ip4, which is
the lowest-level protocol that requires an address. The tcp protocol runs on
top of ip4, so it comes next.
The multiaddr above consists of two components, the /ip4/198.51.100.0 component
and the /tcp/1234 component. It's not possible to split either one further;
/ip4 alone is an invalid multiaddr, because the ip4 protocol was defined to
require a 32-bit address. Similarly, tcp requires a 16-bit port number.
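To make the component structure concrete, here is a minimal Python sketch that splits a textual multiaddr into components. It is not the API of any multiaddr library: `parse_multiaddr` and the `PROTOCOLS` table are hypothetical names, and the table is a small hand-picked subset of the real protocol table that only records whether a protocol takes a value.

```python
# Hypothetical subset of the protocol table: name -> does it take a value?
PROTOCOLS = {
    "ip4": True, "ip6": True, "tcp": True, "udp": True,
    "dns": True, "dns4": True, "dns6": True, "dnsaddr": True,
    "p2p": True, "ws": False, "wss": False, "quic": False,
    "p2p-circuit": False,
}


def parse_multiaddr(text):
    """Split a textual multiaddr into a list of (protocol, value) components."""
    if not text.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    parts = text[1:].split("/")
    components = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if name not in PROTOCOLS:
            raise ValueError(f"unknown protocol: {name!r}")
        if PROTOCOLS[name]:
            # Protocols such as ip4 and tcp are invalid without their value.
            if i + 1 >= len(parts) or parts[i + 1] == "":
                raise ValueError(f"protocol {name!r} requires a value")
            components.append((name, parts[i + 1]))
            i += 2
        else:
            components.append((name, None))
            i += 1
    return components
```

With this sketch, `parse_multiaddr("/ip4/198.51.100.0/tcp/1234")` yields two components, while `parse_multiaddr("/ip4")` raises `ValueError` because the `ip4` value is missing. The sketch does not validate the values themselves.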
Although we referred to /ip4/198.51.100.0 and /tcp/1234 as "components" of a
larger TCP/IP address, each is actually a valid multiaddr according to the
multiaddr spec. However, not every syntactically valid multiaddr is a
functional description of a process in the network. As we've seen, even a
simple TCP/IP connection requires composing two multiaddrs into one. See the
section on composing multiaddrs for information on how multiaddrs can be
combined, and the Transport multiaddrs section for the combinations that
describe valid transport addresses.
The multiaddr protocol table contains all currently defined protocols and the length of their address components.
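As a rough illustration of the binary representation, the sketch below encodes a TCP/IP multiaddr by hand. It assumes the layout defined in the multiaddr spec (an unsigned-varint protocol code followed by the address value) and the codes 4 for `ip4` and 6 for `tcp` from the protocol table. The function names are hypothetical, and real code should rely on a multiaddr library instead.

```python
import ipaddress
import struct

# Protocol codes as listed in the multiaddr protocol table (two entries only).
IP4_CODE = 4
TCP_CODE = 6


def encode_varint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_ip4_tcp(ip, port):
    """Binary form of /ip4/<ip>/tcp/<port> under the assumptions above."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("tcp port must fit in 16 bits")
    return (
        encode_varint(IP4_CODE) + ipaddress.IPv4Address(ip).packed  # 4 bytes
        + encode_varint(TCP_CODE) + struct.pack(">H", port)         # 2 bytes
    )


print(encode_ip4_tcp("198.51.100.0", 1234).hex())  # 04c63364000604d2
```

Both `ip4` and `tcp` have fixed-size values, so no length prefix appears here. Variable-size values such as peer ids are handled differently and are out of scope for this sketch.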
As shown above, protocol addresses can be composed within a multiaddr in a way that mirrors the composition of protocols within a networking stack.
The terms generally used to describe composition of multiaddrs are "encapsulation" and "decapsulation", and they essentially refer to adding and removing protocol components from a multiaddr, respectively.
A protocol is said to be "encapsulated within" another protocol when data from an "inner" protocol is wrapped by another "outer" protocol, often by re-framing the data from the inner protocol into the type of packets, frames or datagrams used by the outer protocol.
Some examples of protocol encapsulation are HTTP requests encapsulated within TCP/IP streams, or TCP segments themselves encapsulated within IP datagrams.
The multiaddr format was designed so that addresses encapsulate each other in
the same manner as the protocols that they describe. The result is an address
that begins with the "outermost" layer of the network stack and works
progressively "inward". For example, in the address /ip4/198.51.100.0/tcp/80/ws,
the outermost protocol is IPv4, which encapsulates TCP streams, which in turn
encapsulate WebSockets.
All multiaddr implementations provide a way to encapsulate two multiaddrs into
a composite. For example, /ip4/198.51.100.0 can encapsulate /tcp/42 to become
/ip4/198.51.100.0/tcp/42.
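On the textual form, encapsulation amounts to joining the two addresses. The following is a minimal sketch under that assumption; `encapsulate` is a hypothetical helper, not a library function, and it performs no validation of either address.

```python
def encapsulate(outer, inner):
    """Return the composite multiaddr with `inner` wrapped inside `outer`."""
    # Normalize slashes so exactly one separates the two addresses.
    return outer.rstrip("/") + "/" + inner.strip("/")


composite = encapsulate("/ip4/198.51.100.0", "/tcp/42")
print(composite)  # /ip4/198.51.100.0/tcp/42
```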
Decapsulation takes a composite multiaddr and removes an "inner" multiaddr from it, returning the result.
For example, if we start with /ip4/198.51.100.0/tcp/1234/ws and decapsulate /ws,
the result is /ip4/198.51.100.0/tcp/1234.
It's important to note that decapsulation returns the original multiaddr up to
the last occurrence of the decapsulated multiaddr. This may remove more than
just the decapsulated component itself if there are more protocols encapsulated
within it. Using our example above, decapsulating either /tcp/1234/ws or
/tcp/1234 from /ip4/198.51.100.0/tcp/1234/ws will result in /ip4/198.51.100.0.
This is unsurprising: simply removing the tcp component would leave
/ip4/198.51.100.0/ws, which does not describe a usable transport.
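The "last occurrence" rule can be sketched in a few lines of Python. This is an illustration only: `decapsulate` is a hypothetical helper that compares path segments of the textual form rather than parsed components, and returning the address unchanged when nothing matches is a choice made for this sketch.

```python
def decapsulate(addr, inner):
    """Drop the last occurrence of `inner` from `addr`, plus all that follows."""
    addr_parts = addr.strip("/").split("/")
    inner_parts = inner.strip("/").split("/")
    n = len(inner_parts)
    # Scan from the right so that the last occurrence wins.
    for start in range(len(addr_parts) - n, -1, -1):
        if addr_parts[start:start + n] == inner_parts:
            kept = addr_parts[:start]
            return "/" + "/".join(kept) if kept else ""
    return addr  # `inner` does not occur in `addr`


full = "/ip4/198.51.100.0/tcp/1234/ws"
print(decapsulate(full, "/ws"))        # /ip4/198.51.100.0/tcp/1234
print(decapsulate(full, "/tcp/1234"))  # /ip4/198.51.100.0
```

Because matching is done on raw segments, a value that happens to look like a protocol name could be matched by mistake; a real implementation would compare parsed components.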
libp2p defines the p2p
multiaddr protocol, whose address component is the
peer id of a libp2p peer. The text representation of a p2p
multiaddr looks like this:
/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
Where QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
is the string representation
of a peer's peer ID derived from its public key.
By itself, a p2p
address does not give you enough addressing information to
locate a peer on the network; it is not a transport address. However, like the
ws
protocol for WebSockets, a p2p
address can be encapsulated
within another multiaddr.
For example, the above p2p
address can be combined with the transport address
on which the node is listening:
/ip4/198.51.100.0/tcp/1234/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
This combination of transport address plus p2p
address is the format in which
peers exchange addresses over the wire in the identify protocol
and other core libp2p protocols.
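A receiver of such an address typically needs the transport part and the peer id separately. The sketch below shows one way to split the textual form, assuming the p2p component comes last; `split_p2p` is a hypothetical helper, and the peer id is not validated.

```python
def split_p2p(addr):
    """Split '<transport>/p2p/<peer-id>' into (transport, peer_id)."""
    transport, sep, peer_id = addr.rpartition("/p2p/")
    if not sep or not peer_id or "/" in peer_id:
        raise ValueError("address does not end with a /p2p/<peer-id> component")
    return transport, peer_id


transport, peer_id = split_p2p(
    "/ip4/198.51.100.0/tcp/1234/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N"
)
print(transport)  # /ip4/198.51.100.0/tcp/1234
print(peer_id)    # QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
```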
The p2p multiaddr protocol was originally named ipfs, and we've been eliminating
support for the ipfs string representation of this multiaddr component.
Depending on the implementation in use, it may still be printed as
/ipfs/<peer-id> instead of /p2p/<peer-id> in its string representation. Both
names resolve to the same protocol code, and they are equivalent in the binary
form.
Because multiaddr is an open and extensible format, it's not possible to
guarantee that any valid multiaddr is semantically meaningful or usable in a
particular network. For example, the /tcp/42
multiaddr, while valid, is not
useful on its own as a locator.
This section covers the types of multiaddr supported by libp2p transports. This section may go out of date as new transport modules are developed, at which point pull requests to update this document will be greatly appreciated.
Most libp2p transports use the IP protocol as a foundational layer, and as a result, most transport multiaddrs will begin with a component that represents an IPv4 or IPv6 address.
This may be an actual address, such as /ip4/198.51.100.0 or
/ip6/fe80::883:a581:fff1:833, or it could be something that resolves to an IP
address, like a domain name.
libp2p will attempt to resolve "name-based" addresses into IP addresses. The current multiaddr protocol table defines four resolvable or "name-based" protocols:
protocol | description |
---|---|
dns | Resolves DNS A and AAAA records into both IPv4 and IPv6 addresses. |
dns4 | Resolves DNS A records into IPv4 addresses. |
dns6 | Resolves DNS AAAA records into IPv6 addresses. |
dnsaddr | Resolves multiaddrs from a special TXT record. |
When the /dns
protocol is used, the lookup may result in both IPv4 and IPv6
addresses, in which case IPv6 will be preferred. To explicitly resolve to IPv4
or IPv6 addresses, use the /dns4
or /dns6
protocols, respectively.
Note that in some restricted environments, such as inside a web browser, libp2p may not have access to the resolved IP addresses at all, in which case the runtime will determine what IP version is used.
When a name-based multiaddr encapsulates another multiaddr, only the name-based
component is affected by the lookup process. For example, if example.com
resolves to 192.0.2.0, libp2p will resolve the address
/dns4/example.com/tcp/42 to /ip4/192.0.2.0/tcp/42.
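The rewrite of only the name-based component can be sketched as follows. To keep the example runnable without network access, the resolver is passed in as a function; `resolve_name` and `fake_lookup` are hypothetical names, and only the /dns4 and /dns6 cases are handled.

```python
def resolve_name(addr, lookup):
    """Replace a leading /dns4 or /dns6 component with resolved IP components.

    `lookup(name, ip_proto)` is a caller-supplied resolver returning a list of
    IP address strings for the given name and target protocol (ip4 or ip6).
    """
    parts = addr.strip("/").split("/")
    families = {"dns4": "ip4", "dns6": "ip6"}
    if len(parts) < 2 or parts[0] not in families:
        return [addr]  # nothing to resolve
    ip_proto = families[parts[0]]
    name, rest = parts[1], parts[2:]
    # Everything after the name-based component is carried over untouched.
    return ["/" + "/".join([ip_proto, ip] + rest) for ip in lookup(name, ip_proto)]


def fake_lookup(name, ip_proto):
    """Stand-in resolver with a single hard-coded record."""
    records = {("example.com", "ip4"): ["192.0.2.0"]}
    return records.get((name, ip_proto), [])


print(resolve_name("/dns4/example.com/tcp/42", fake_lookup))
# ['/ip4/192.0.2.0/tcp/42']
```

Returning a list reflects that a name may resolve to several addresses, each of which yields its own multiaddr.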
A libp2p-specific DNS-backed format, /dnsaddr, resolves addresses from a TXT
record associated with the _dnsaddr subdomain of a given domain.
Note that this is different from dnslink, which uses TXT records to reference
content-addressed objects.
For example, resolving /dnsaddr/libp2p.io will perform a TXT lookup for
_dnsaddr.libp2p.io. If the result contains entries of the form
dnsaddr=<multiaddr>, the embedded multiaddrs will be parsed and used.
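Extracting the multiaddrs from TXT record values can be sketched as below. The DNS query itself is left out; `parse_dnsaddr_txt` is a hypothetical helper that takes the record strings as input, and the sample records are made up for illustration.

```python
def parse_dnsaddr_txt(records):
    """Return the multiaddrs found in TXT values of the form dnsaddr=<multiaddr>."""
    prefix = "dnsaddr="
    addrs = []
    for record in records:
        # Tools such as dig print TXT values wrapped in double quotes.
        value = record.strip().strip('"')
        if value.startswith(prefix):
            addrs.append(value[len(prefix):])
    return addrs


sample = [
    '"dnsaddr=/ip4/192.0.2.0/tcp/4001"',
    '"v=spf1 -all"',  # unrelated TXT entries are ignored
    "dnsaddr=/dns4/example.com/tcp/443/wss",
]
print(parse_dnsaddr_txt(sample))
# ['/ip4/192.0.2.0/tcp/4001', '/dns4/example.com/tcp/443/wss']
```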
For example, asking the DNS server for the TXT records of one of the bootstrap
nodes, ams-2.bootstrap.libp2p.io, returns the following records:
> dig +short _dnsaddr.ams-2.bootstrap.libp2p.io txt
"dnsaddr=/dns4/ams-2.bootstrap.libp2p.io/tcp/443/wss/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
"dnsaddr=/ip6/2604:1380:2000:7a00::1/tcp/4001/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
"dnsaddr=/ip4/147.75.83.83/tcp/4001/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
"dnsaddr=/ip6/2604:1380:2000:7a00::1/udp/4001/quic/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
"dnsaddr=/ip4/147.75.83.83/udp/4001/quic/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
"dnsaddr=/dns6/ams-2.bootstrap.libp2p.io/tcp/443/wss/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
The dnsaddr lookup serves a similar purpose to a standard A-record DNS lookup,
but there are differences that can be important for some use cases. The most
significant is that the dnsaddr entry contains a full multiaddr, which may
include a port number or other information that an A-record lacks, and it may
even specify a non-IP transport. Also, there are cases in which the A-record
already serves a useful purpose; using dnsaddr allows a second "namespace" for
libp2p registrations.
The libp2p TCP transport is supported in all implementations and can be used wherever TCP/IP sockets are accessible.
Addresses for the TCP transport are of the form <ip-multiaddr>/tcp/<tcp-port>,
where <ip-multiaddr> is a multiaddr that resolves to an IP address, as
described in the IP and Name Resolution section.
The <tcp-port> argument must be a 16-bit unsigned integer.
WebSocket connections are encapsulated within TCP/IP sockets, and the WebSocket multiaddr format mirrors this arrangement.
A libp2p WebSocket multiaddr is of the form <tcp-multiaddr>/ws or
<tcp-multiaddr>/wss (TLS-encrypted), where <tcp-multiaddr> is a valid
multiaddr for the TCP transport, as described above.
QUIC sessions are encapsulated within UDP datagrams, and the libp2p QUIC multiaddr format mirrors this arrangement.
A libp2p QUIC multiaddr is of the form <ip-multiaddr>/udp/<udp-port>/quic,
where <ip-multiaddr> is a multiaddr that resolves to an IP address, as
described in the IP and Name Resolution section.
The <udp-port> argument must be a 16-bit unsigned integer in network byte order.
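The TCP, WebSocket and QUIC address forms can be checked with a small classifier. The sketch below is an assumption-laden illustration: `classify_transport` is a hypothetical helper, it works on the textual form only, it does not validate the IP address or domain name, and it treats any trailing components (such as /p2p/<peer-id>) as unrecognized.

```python
# Leading protocols treated here as "<ip-multiaddr>" (an assumption of this sketch).
IP_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6"}


def _valid_port(text):
    """True if `text` is a decimal 16-bit unsigned integer."""
    return text.isascii() and text.isdigit() and int(text) <= 0xFFFF


def classify_transport(addr):
    """Return 'tcp', 'ws', 'wss' or 'quic', or None if no form matches."""
    parts = addr.strip("/").split("/")
    if len(parts) < 4 or parts[0] not in IP_PROTOCOLS:
        return None
    layer4, port, rest = parts[2], parts[3], parts[4:]
    if not _valid_port(port):
        return None
    if layer4 == "tcp":
        if not rest:
            return "tcp"
        if rest in (["ws"], ["wss"]):
            return rest[0]
    if layer4 == "udp" and rest == ["quic"]:
        return "quic"
    return None


print(classify_transport("/dns4/example.com/tcp/443/wss"))  # wss
print(classify_transport("/ip4/192.0.2.0/udp/4001/quic"))   # quic
```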
The libp2p circuit relay protocol allows a libp2p peer A to communicate with another peer B via a third party C. This is useful for circumstances where A and B would be unable to communicate directly.
Once a connection to the relay is established, peers can accept incoming
connections through the relay, using a p2p-circuit
address.
Like the ws WebSocket multiaddr protocol, the p2p-circuit multiaddr does not
carry any additional address information. Instead, it is composed with two other
multiaddrs to describe a relay circuit.
A full p2p-circuit address that describes a relay circuit is of the form
<relay-multiaddr>/p2p-circuit/<destination-multiaddr>.
<relay-multiaddr>
is the full address for the peer relaying the traffic (the
"relay node").
The details of the transport connection between the relay node and the
destination peer are usually not relevant to other peers in the network, so
<destination-multiaddr>
generally only contains the p2p
address of the
destination peer.
A full example would be:
/ip4/192.0.2.0/tcp/5002/p2p/QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ/p2p-circuit/p2p/QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt
Here, the destination peer has the peer id
QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt and is reachable through a
relay node with peer id QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ
listening on TCP port 5002 at the IPv4 address 192.0.2.0.
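Splitting a circuit address into its relay and destination parts can be sketched as follows. `split_circuit` is a hypothetical helper operating on the textual form; it splits at the first p2p-circuit component and does not validate either side.

```python
def split_circuit(addr):
    """Split '<relay>/p2p-circuit<destination>' into (relay, destination)."""
    relay, sep, destination = addr.partition("/p2p-circuit")
    # The separator must be a whole component, not a prefix of a longer name.
    if not sep or (destination and not destination.startswith("/")):
        raise ValueError("not a p2p-circuit address")
    return relay, destination


relay, destination = split_circuit(
    "/ip4/192.0.2.0/tcp/5002/p2p/QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ"
    "/p2p-circuit/p2p/QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt"
)
print(relay)        # /ip4/192.0.2.0/tcp/5002/p2p/QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ
print(destination)  # /p2p/QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt
```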