diff --git a/autonat/README.md b/autonat/README.md new file mode 100644 index 000000000..63f376fba --- /dev/null +++ b/autonat/README.md @@ -0,0 +1,7 @@ +# NAT Discovery +> How we detect if we're behind a NAT. + + +Specifications: +- [autonat v1](autonat-v1.md) +- [autonat v2](autonat-v2.md) diff --git a/autonat/autonat-v1.md b/autonat/autonat-v1.md index 33763eb97..65c2eebce 100644 --- a/autonat/autonat-v1.md +++ b/autonat/autonat-v1.md @@ -1,6 +1,3 @@ -# NAT Discovery -> How we detect if we're behind a NAT. - | Lifecycle Stage | Maturity | Status | Latest Revision | |-----------------|----------------|--------|-----------------| | 3A | Recommendation | Active | r1, 2023-02-16 | diff --git a/autonat/autonat-v2-amplification-attack-prevention.plantuml b/autonat/autonat-v2-amplification-attack-prevention.plantuml new file mode 100644 index 000000000..c960d37ba --- /dev/null +++ b/autonat/autonat-v2-amplification-attack-prevention.plantuml @@ -0,0 +1,17 @@ +@startuml +participant Cli +participant Srv + +skinparam sequenceMessageAlign center +skinparam defaultFontName monospaced + + +== Amplification Attack Prevention == + +Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)} +Srv -> Cli: [dial] DialDataRequest:{addrIdx: 1, numBytes: 120k} +Cli -> Srv: [dial] {120k bytes} +Srv -> Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd} +Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)} + +@enduml diff --git a/autonat/autonat-v2-amplification-attack-prevention.svg b/autonat/autonat-v2-amplification-attack-prevention.svg new file mode 100644 index 000000000..3081117fc --- /dev/null +++ b/autonat/autonat-v2-amplification-attack-prevention.svg @@ -0,0 +1 @@ +CliCliSrvSrvAmplification Attack Prevention[dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}[dial] DialDataRequest:{addrIdx: 1, numBytes: 120k}[dial] {120k bytes}[attempt]addr2 DialAttempt:{nonce: 0xabcd}[dial] DialResponse:{status: OK, 
dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)} \ No newline at end of file diff --git a/autonat/autonat-v2.md b/autonat/autonat-v2.md new file mode 100644 index 000000000..ba1c0918a --- /dev/null +++ b/autonat/autonat-v2.md @@ -0,0 +1,299 @@ +# AutonatV2: spec + + +| Lifecycle Stage | Maturity | Status | Latest Revision | +|-----------------|--------------------------|--------|-----------------| +| 1A | Working Draft | Active | r2, 2023-04-15 | + +Authors: [@sukunrt] + +Interest Group: [@marten-seemann], [@marcopolo], [@mxinden] + +[@sukunrt]: https://github.com/sukunrt +[@marten-seemann]: https://github.com/marten-seemann +[@mxinden]: https://github.com/mxinden +[@marcopolo]: https://github.com/marcopolo + + +## Overview + +A priori, a node cannot know if it is behind a NAT / firewall or if it is +publicly reachable. Moreover, the node may be publicly reachable on some of its +addresses and not on others. Knowing reachability for its addresses is essential +for the node to be well-behaved in the network: A node doesn't need to advertise +its unreachable addresses to the rest of the network, preventing superfluous +dials from other peers. Furthermore, in case it has no publicly reachable +addresses, it might actively seek to improve its connectivity by finding a relay +server, which would allow other peers to establish a relayed connection. + +In `autonat v2`, the client sends a request with a priority-ordered list of addresses +and a nonce. On receiving this request the server dials the first address in the +list that it is capable of dialing and provides the nonce. Upon completion of +the dial, the server responds to the client with a response containing the +dial outcome. + +As the server dials _exactly_ one address from the list, `autonat v2` allows +nodes to determine reachability for individual addresses.
Using `autonat v2` +nodes can build an address pipeline where they can test individual addresses +discovered by different sources like identify, UPnP mappings, circuit addresses +etc. for reachability. Having a priority-ordered list of addresses provides the +ability to verify low priority addresses. Implementations can generate low +priority address guesses and add them to requests for high priority addresses as +a nice to have. This is especially helpful when introducing a new transport. +Initially, such a transport will not be widely supported in the network. +Requests for verifying such addresses can be reused to get information about +other addresses. + +The client can verify the server did successfully dial an address of the same +transport as it reported in the response by checking the local address of the +connection on which the nonce was received. + +Compared to `autonat v1` there are three major differences: +1. `autonat v1` allowed testing reachability for the node. `autonat v2` allows +testing reachability for an individual address. +2. `autonat v2` provides a mechanism for nodes to verify whether the peer +actually successfully dialed an address. +3. `autonat v2` provides a mechanism for nodes to dial an IP address different +from the requesting node's observed IP address without risking amplification +attacks. `autonat v1` disallowed such dials to prevent amplification attacks. + + +## AutoNAT V2 Protocol + +![Autonat V2 Interaction](autonat-v2.svg) + +A client node wishing to determine reachability of its addresses sends a +`DialRequest` message to a server on a stream with protocol ID +`/libp2p/autonat/2/dial-request`. Each `DialRequest` is sent on a new stream. + +This `DialRequest` message has a list of addresses and a fixed64 `nonce`. The +list is ordered in descending order of priority for verification. AutoNAT V2 is +primarily for testing reachability on the public Internet.
The client SHOULD NOT send any +private address as defined in [RFC +1918](https://datatracker.ietf.org/doc/html/rfc1918#section-3) in the list. The server SHOULD NOT dial any private address. + +Upon receiving this request, the server selects an address from the list to +dial. The server SHOULD use the first address it is willing to dial. The server +MUST NOT dial any address other than this one. If this selected address has an +IP address different from the requesting node's observed IP address, the server +initiates the Amplification Attack Prevention mechanism (see [Amplification +Attack Prevention](#amplification-attack-prevention)). On completion, the +server proceeds to the next step. If the selected address has the same IP +address as the client's observed IP address, the server proceeds to the next step, +skipping the Amplification Attack Prevention steps. + +The server dials the selected address, opens a stream with Protocol ID +`/libp2p/autonat/2/dial-back` and sends a `DialBack` message with the nonce +received in the request. The client on receiving this message replies with +a `DialBackResponse` message with the status set to `OK`. The client MUST +close this stream after sending the response. The dial back response provides +the server assurance that the message was delivered so that it can close the +connection. + +Upon completion of the dial back, the server sends a `DialResponse` message to +the client node on the `/libp2p/autonat/2/dial-request` stream. The response +contains `addrIdx`, the index of the address the server selected to dial, and +`DialStatus`, a dial status indicating the outcome of the dial back. The +`DialStatus` for an address is set according to [Requirements for +DialStatus](#requirements-for-dialstatus). The response also contains an +appropriate `ResponseStatus` set according to [Requirements For +ResponseStatus](#requirements-for-responsestatus).
+ +The client MUST check that the nonce received in the `DialBack` is the same as +the nonce it sent in the `DialRequest`. If the nonce is different, it MUST +discard this response. + +The server MUST close the stream after sending the response. The client MUST +close the stream after receiving the response. + + +### Requirements for DialStatus + +On receiving a `DialRequest`, the server first selects an address that it will +dial. + +If the server chooses not to dial any of the requested addresses, `ResponseStatus` +is set to `E_DIAL_REFUSED`. The fields `addrIdx` and `DialStatus` are +meaningless in this case. See [Requirements For +ResponseStatus](#requirements-for-responsestatus). + +If the server selects an address for dialing, `addrIdx` is set to the +index (zero-based) of the address in the list and the `DialStatus` is set +according to the following considerations: + +If the server was unable to connect to the client on the selected address, +`DialStatus` is set to `E_DIAL_ERROR`, indicating the selected address is not +publicly reachable. + +If the server was able to connect to the client on the selected address, but an +error occurred while sending the nonce on the `/libp2p/autonat/2/dial-back` +stream, `DialStatus` is set to `E_DIAL_BACK_ERROR`. This might happen in case of +resource-limited situations on client or server, or when either the client or +the server is misconfigured. + +If the server was able to connect to the client and successfully send a nonce on +the `/libp2p/autonat/2/dial-back` stream, `DialStatus` is set to `OK`. + +### Requirements for ResponseStatus + +The `ResponseStatus` sent by the server in the `DialResponse` message MUST be +set according to the following requirements: + +`E_REQUEST_REJECTED`: The server didn't serve the request because of rate +limiting, resource limit reached or blacklisting.
+ +`E_DIAL_REFUSED`: The server didn't dial back any address because it was +incapable of dialing or unwilling to dial any of the requested addresses. + +`E_INTERNAL_ERROR`: An error not classified within the above error codes occurred on +the server, preventing it from completing the request. + +`OK`: the server completed the request successfully. A request is considered +completed successfully when the server either completes a dial (successfully or +unsuccessfully) or rejects all addresses in the request as undialable. + +Implementations MUST discard responses with status codes they do not understand. + +### Amplification Attack Prevention + +![Interaction](autonat-v2-amplification-attack-prevention.svg) + +When a client asks a server to dial an address that is not the client's observed +IP address, the server asks the client to send some non-trivial amount of bytes +as a cost to dial a different IP address. To make amplification attacks +unattractive, servers SHOULD ask for 30k to 100k bytes. Since most handshakes +cost less than 10k bytes in bandwidth, 30kB is sufficient to make attacks +unattractive. + +On receiving a `DialRequest`, the server selects the first address it is capable +of dialing. If this selected address has an IP different from the client's +observed IP, the server sends a `DialDataRequest` message with the selected +address's index (zero-based) and `numBytes` set to a sufficiently large value on +the `/libp2p/autonat/2/dial-request` stream. + +Upon receiving a `DialDataRequest` message, the client decides whether to accept +or reject the cost of the dial. If the client rejects the cost, the client resets +the stream and the `DialRequest` is considered aborted. If the client accepts +the cost, the client starts transferring `numBytes` bytes to the server. The +client transfers these bytes wrapped in `DialDataResponse` protobufs where the +`data` field in each individual protobuf is limited to 4096 bytes in length.
+This allows implementations to use a small buffer for reading and sending the +data. Only the size of the `data` field of `DialDataResponse` protobufs is +counted towards the bytes transferred. Once the server has received at least +`numBytes` bytes, it proceeds to dial the selected address. Servers SHOULD allow +the last `DialDataResponse` message received from the client to be larger than +the minimum required amount. This allows clients to serialize their +`DialDataResponse` message once and reuse it for all requests. + + +If an attacker asks a server to dial a victim node, the only benefit the +attacker gets is forcing the server and the victim to do a cryptographic +handshake, which costs some bandwidth and compute. The attacker by itself can do +a lot of handshakes with the victim without spending any compute by using the +same key repeatedly. The only benefit of going via the server to do this attack +is not spending bandwidth required for a handshake. So the prevention mechanism +only focuses on bandwidth costs. There is a minor benefit of bypassing IP +blocklists, but that's made unattractive by the fact that servers may ask for 5x +more data than the bandwidth cost of a handshake. + +## Implementation Suggestions + +For any given address, client implementations SHOULD do the following: +- Periodically recheck reachability status. +- Query multiple servers to determine reachability. + +The suggested heuristic for implementations is to consider an address reachable +if more than 3 servers report a successful dial and to consider an address +unreachable if more than 3 servers report unsuccessful dials. Implementations +are free to use different heuristics than this one. + +Servers SHOULD NOT reuse their listening port when making a dial back. In case +the client has reused their listen port when dialing out to the server, not +reusing the listen port for attempts prevents accidental hole punches.
Clients +SHOULD only rely on the nonce and not on the peerID for verifying the dial back +as the server is free to use a separate peerID for the dial backs. + +Servers SHOULD determine whether they have IPv6 and IPv4 connectivity. IPv4-only servers SHOULD refuse requests for dialing IPv6 addresses and IPv6-only +servers SHOULD refuse requests for dialing IPv4 addresses. + + +## RPC Messages + +All RPC messages sent over a stream are prefixed with the message length in +bytes, encoded as an unsigned variable length integer as defined by the +[multiformats unsigned-varint spec][uvarint-spec]. + +All RPC messages on stream `/libp2p/autonat/2/dial-request` are of type +`Message`. A `DialRequest` message is sent as a `Message` with the `msg` field +set to `DialRequest`. `DialResponse` and `DialDataRequest` are handled +similarly. + +On stream `/libp2p/autonat/2/dial-back`, a `DialBack` message is sent +directly. + +```proto3 + +message Message { + oneof msg { + DialRequest dialRequest = 1; + DialResponse dialResponse = 2; + DialDataRequest dialDataRequest = 3; + DialDataResponse dialDataResponse = 4; + } +} + + +message DialRequest { + repeated bytes addrs = 1; + fixed64 nonce = 2; +} + + +message DialDataRequest { + uint32 addrIdx = 1; + uint64 numBytes = 2; +} + + +enum DialStatus { + UNUSED = 0; + E_DIAL_ERROR = 100; + E_DIAL_BACK_ERROR = 101; + OK = 200; +} + + +message DialResponse { + enum ResponseStatus { + E_INTERNAL_ERROR = 0; + E_REQUEST_REJECTED = 100; + E_DIAL_REFUSED = 101; + OK = 200; + } + + ResponseStatus status = 1; + uint32 addrIdx = 2; + DialStatus dialStatus = 3; +} + + +message DialDataResponse { + bytes data = 1; +} + + +message DialBack { + fixed64 nonce = 1; +} + +message DialBackResponse { + enum DialBackStatus { + OK = 0; + } + + DialBackStatus status = 1; +} +``` + +[uvarint-spec]: https://github.com/multiformats/unsigned-varint + diff --git a/autonat/autonat-v2.plantuml b/autonat/autonat-v2.plantuml new file mode 100644 index
000000000..2247bebe9 --- /dev/null +++ b/autonat/autonat-v2.plantuml @@ -0,0 +1,20 @@ +@startuml +participant Cli +participant Srv + +skinparam sequenceMessageAlign center +skinparam defaultFontName monospaced + + +== Dial Request Success== + +Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)} +Srv -> Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd} +Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)} + +== Dial Request Failure== + +Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)} +Srv ->x Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd} +Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, E_DIAL_ERROR)} +@enduml \ No newline at end of file diff --git a/autonat/autonat-v2.svg b/autonat/autonat-v2.svg new file mode 100644 index 000000000..adf10c680 --- /dev/null +++ b/autonat/autonat-v2.svg @@ -0,0 +1 @@ +CliCliSrvSrvDial Request Success[dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}[attempt]addr2 DialAttempt:{nonce: 0xabcd}[dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)}Dial Request Failure[dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}[attempt]addr2 DialAttempt:{nonce: 0xabcd}[dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, E_DIAL_ERROR)} \ No newline at end of file