Document IPNI HTTP provider specification #18

Merged
merged 4 commits into from
Aug 31, 2023
332 changes: 332 additions & 0 deletions IPNI_HTTP_PROVIDER.md
@@ -0,0 +1,332 @@
# IPNI HTTP Provider

![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

**Author(s)**:

- [Masih Derkani](https://github.com/masih)

**Maintainer(s)**:

- [Andrew Gillis](https://github.com/gammazero)
- [Ivan Schasny](https://github.com/ischasny)
- [Masih Derkani](https://github.com/masih)
- [Will Scott](https://github.com/willscott)

* * *

**Abstract**

The IPNI HTTP Provider specification formalizes the HTTP transport for the advertisement chain generated by IPNI
providers. The advertisement chain, comprising metadata and content, is first announced to the network and eventually
fetched by indexers. While the IPNI specification outlines transport options such
as [HTTP](https://github.com/ipni/specs/blob/main/IPNI.md#http)
and [DataTransfer](https://github.com/ipni/specs/blob/main/IPNI.md#libp2p), this document focuses on defining the HTTP
API specification that providers must adhere to in order to become HTTP Index Providers.

This specification defines the endpoints, request/response formats, and behavior expected from an IPNI HTTP provider. It
aims to provide a standardized and well-defined protocol for interacting with HTTP-based index providers. The API
specification enables interoperability between providers, facilitates the grouping of capabilities, and opens up
opportunities to make the transport mechanism swappable.

## Table of Contents

- [Motivation](#motivation)
- [Background](#background)
- [Common Data Types](#common-data-types)
- [Encoding](#encoding)
- [Versioning](#versioning)
- [Namespacing](#namespacing)
- [Implicit Namespace Assumption](#implicit-namespace-assumption)
- [Custom Paths in Multiaddr](#custom-paths-in-multiaddr)
- [Specification](#specification)
- [GET `/ipni/v1/ad/head`](#get-ipniv1adhead)
- [Response](#response)
- [Example](#example)
- [GET `/ipni/v1/ad/{CID}`](#get-ipniv1adcid)
- [Response](#response-1)
- [Example](#example-1)
- [Security](#security)
- [Future Considerations](#future-considerations)
- [Related Resources](#related-resources)
- [Copyright](#copyright)

## Motivation

The motivation behind formalizing the HTTP API specification for IPNI providers stems from the need for a standardized
and well-defined protocol for interacting with HTTP-based index providers. Despite the existence of the HTTP protocol
for providers from early on, there has been no formal definition of the API specific to IPNI. By establishing a clear
and consistent API specification, we aim to achieve several goals:

1. **Explicit Definition** -- A formal definition of the HTTP API for IPNI providers fills a crucial gap in the IPNI
specification. It provides a comprehensive and explicit outline of the required endpoints, request/response formats,
and behavior expected from providers. Such a definition can then be referenced as part of the capability required by a
provider that aims to offer retrieval over HTTP.

2. **Grouping of Capabilities** -- An implicit grouping of API endpoints is essential because HTTP servers running IPNI
capabilities often provide multiple functionalities. By organizing the API endpoints under a unified specification,
it becomes easier to understand and navigate the various capabilities offered by an IPNI provider while avoiding
conflict with other APIs such as the [IPFS HTTP Gateway](https://specs.ipfs.tech/http-gateways/) or Boost.

3. **Interoperability and Swappable Transport** -- A formalized API specification facilitates future opportunities for
interoperability between different IPNI providers and enables the potential for swappable transport mechanisms. By
adhering to a common API standard, providers can ensure compatibility and seamless integration with other components
of the IPNI ecosystem, while reducing engineering footprint.

Overall, the formal definition of the HTTP API for IPNI providers enhances clarity, promotes consistency, and opens up
opportunities for innovation and collaboration within the IPNI network.

## Background

### Common Data Types

The IPNI HTTP Provider API specification makes use of the following common data types:

- `CID` (Content Identifier): A [CID](https://docs.ipfs.tech/concepts/content-addressing/) is a self-describing content-addressed identifier that is commonly used in the
IPFS (InterPlanetary File System) ecosystem. It serves as a unique identifier for content, such as files or
directories.

- `Multihash`: A [Multihash](https://multiformats.io/multihash) is a self-describing hash format that supports various hash functions. It provides a
standardized representation for different hash algorithms, enabling interoperability and flexibility within the IPNI
ecosystem.

- `Multiaddr`: A [Multiaddr](https://multiformats.io/multiaddr/) is a format for representing network addresses that supports multiple transport protocols. It
provides a unified way of specifying network addresses, including information such as the network protocol, addressing
scheme, and transport options.

These common data types play a crucial role in the IPNI HTTP Provider API specification by ensuring consistency and
compatibility when working with content identification and cryptographic hash values. By utilizing these standardized
data types, the API promotes interoperability and ease of integration within the IPNI ecosystem.

### Encoding

IPNI uses the [InterPlanetary Data Model (IPLD)](https://ipld.io/docs/data-model/) to represent information associated with
advertisements, their entries and their providers. The IPNI HTTP Provider API aims to offer a human-readable encoding
for the request and response payloads. Therefore, the default encoding
of [`application/vnd.ipld.dag-json`](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) is
used. However, implementers have the flexibility to specify a different encoding by utilizing appropriate HTTP headers, as
long as it facilitates verifiability of the data exchanged. This allows for customization and optimization of the data
encoding based on specific requirements or preferences. The choice of encoding should be communicated through the `Accept`
and `Content-Type` headers of the HTTP request and response. One alternative
is [`application/vnd.ipld.dag-cbor`](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor).

As a special case, the following content types are interpreted as their IPLD equivalent:

* `application/json` is considered to be the same as `application/vnd.ipld.dag-json`.
* `application/cbor` is considered to be the same as `application/vnd.ipld.dag-cbor`.
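
The aliasing rule above can be sketched as a small normalization step. This is an illustrative Python sketch rather than part of the specification: the helper name `ipld_media_type` is hypothetical, and the sketch assumes that media type parameters such as `charset` can be ignored.

```python
# Generic media types and the IPLD media types they are treated as.
IPLD_EQUIVALENTS = {
    "application/json": "application/vnd.ipld.dag-json",
    "application/cbor": "application/vnd.ipld.dag-cbor",
}


def ipld_media_type(content_type: str) -> str:
    """Map a Content-Type or Accept value to its IPLD equivalent, if it has one."""
    # Drop parameters (e.g. "; charset=utf-8") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return IPLD_EQUIVALENTS.get(media_type, media_type)
```

For instance, `ipld_media_type("application/json; charset=utf-8")` yields `application/vnd.ipld.dag-json`, while a value that is already an IPLD media type passes through unchanged.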

The common data types, such as `CID` and `Multihash`, are encoded in their
standard [multibase format](https://github.com/multiformats/multibase) when referenced in URLs.

### Versioning

The IPNI Provider HTTP API adopts a versioning strategy that utilizes the path prefix to indicate major releases. As a
best practice, future iterations that introduce breaking changes should increment the version number in the API path.
This approach ensures compatibility and allows clients to adapt to new versions while maintaining backward compatibility
with existing implementations. By explicitly incorporating versioning in the API path, it becomes easier to manage and
track changes over time, facilitating smooth transitions and providing a clear indication of API evolution.


### Namespacing

This specification introduces the `/ipni/` URL prefix to namespace the functionality associated with IPNI HTTP providers. The presence of this namespace allows IPNI specifications to evolve without conflicting with other URL patterns that an HTTP server may support. Therefore, this specification reserves any sub-tree that appears after this prefix.

#### Implicit Namespace Assumption

Indexer nodes implicitly assume the `/ipni/` URL prefix. When receiving an announcement from providers, an indexer will attempt to fetch advertisements by joining the announced URL with the paths documented in this specification. Implementers have the freedom to include additional path segments in the announced URL if desired.

For example, if the announced URL is `https://ipni-provider.example`, it will be accessed at `https://ipni-provider.example/ipni/v1/ad/head` to fetch the head advertisement. Similarly, if the announced URL is `https://ipni-provider.example/my/prefix`, it will be accessed at `https://ipni-provider.example/my/prefix/ipni/v1/ad/head`.
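
The joining behavior in the examples above can be sketched as follows. This is a minimal Python illustration, not the indexer's actual implementation: the helper name `ipni_url` is hypothetical, and the sketch assumes that any query string or fragment on the announced URL is dropped.

```python
from urllib.parse import urlsplit, urlunsplit

HEAD_PATH = "/ipni/v1/ad/head"


def ipni_url(announced_url: str, path: str = HEAD_PATH) -> str:
    """Join an announced URL with a path defined by this specification."""
    parts = urlsplit(announced_url)
    # Keep any custom prefix from the announced URL, avoiding a doubled slash.
    joined = parts.path.rstrip("/") + path
    return urlunsplit((parts.scheme, parts.netloc, joined, "", ""))


print(ipni_url("https://ipni-provider.example"))
print(ipni_url("https://ipni-provider.example/my/prefix"))
```

The two calls print the same URLs as the examples in the paragraph above.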

#### Custom Paths in Multiaddr

Custom paths in addresses specified as Multiaddr are not currently supported. See related issues below:
* https://github.com/ipni/go-libipni/issues/42
* https://github.com/libp2p/specs/pull/550

## Specification

### GET `/ipni/v1/ad/head`

This endpoint retrieves the most recent advertisement CID published by the provider, along with additional information such as the provider's public key and the topic under which the advertisement is announced.

Implementers should include explicit Cache-Control headers to manage caching behavior. This is beneficial for the following reasons:

- **Efficient Discovery**: Indexer nodes only need to be aware of the latest head advertisement. Since advertisements are chained, previous ads will be automatically discovered. By setting appropriate cache parameters on the response, indexers can determine how often they need to contact providers to discover a new head. This approach optimizes traffic between the provider and indexer nodes, improving overall efficiency.

- **Frequency of Head Changes**: The head advertisement of a typical provider may not change frequently. By setting cache parameters, providers can indicate the appropriate caching behavior to indexers. This helps indexers decide how often they should request updates for the head advertisement. Setting optimal cache parameters can result in more efficient utilization of network resources.

To disable HTTP caching and ensure that indexers always receive the latest response, providers should set the following Cache-Control header:

```
Cache-Control: no-cache, no-store, must-revalidate
```

Alternatively, if providers want to allow accepting a cached response and revalidating it in the background, they should use an appropriate `Cache-Control` header with a `max-age` value that reflects the frequency at which the provider generates new advertisements. Additionally, the `stale-while-revalidate` directive can be used to specify a period during which a stale cached response can still be served while a revalidation request is sent in the background.

For more information on caching headers, you can refer to [RFC5861](https://datatracker.ietf.org/doc/html/rfc5861).
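
As a rough illustration of the two options above, the following Python sketch builds a `Cache-Control` value for the head endpoint. The function name `head_cache_control`, its parameters, and the numbers in the usage note are hypothetical; appropriate values depend on how often a given provider publishes advertisements.

```python
def head_cache_control(max_age_seconds: int, stale_seconds: int = 0) -> str:
    """Build a Cache-Control value for the head endpoint.

    A non-positive max_age_seconds disables caching entirely.
    """
    if max_age_seconds <= 0:
        return "no-cache, no-store, must-revalidate"
    directives = [f"max-age={max_age_seconds}"]
    if stale_seconds > 0:
        # Allow a stale response to be served while revalidating in the background.
        directives.append(f"stale-while-revalidate={stale_seconds}")
    return ", ".join(directives)
```

For a provider that publishes roughly every five minutes, `head_cache_control(300, 60)` gives `max-age=300, stale-while-revalidate=60`.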

#### Response

The response from the `/ipni/v1/ad/head` endpoint includes the following fields:

- **`head`** (required): The CID of the latest advertisement published by the provider.
- **`topic`** (optional): The topic name on which the advertisement is announced. If not specified, the default value
of `/indexer/ingest/mainnet` is assumed.
- **`pubkey`** (required): The serialized public key of the provider in protobuf format using the libp2p standard. The
public key can be marshalled using
the [`crypto.MarshalPublicKey`](https://pkg.go.dev/github.com/libp2p/go-libp2p-core/crypto#MarshalPublicKey) function.
- **`sig`** (required): The signature over the bytes of the `head` CID concatenated with the UTF-8 bytes of the `topic`
(if present). The signature is verified against the `pubkey`.

Please note that the response provides the necessary information to validate the authenticity of the advertisement CID
and verify its integrity using the provider's public key and the associated signature.
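
The signed payload described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: it assumes that "the bytes of the `head` CID" means the binary form of the CID, it only handles base32 CID strings (multibase prefix `b`), and the helper names `cid_bytes` and `signature_payload` are hypothetical. Verifying `sig` against `pubkey` requires a libp2p-compatible key library and is not shown.

```python
import base64
from typing import Optional


def cid_bytes(cid_text: str) -> bytes:
    """Decode a base32 multibase CID string (prefix "b") into its binary form."""
    if not cid_text.startswith("b"):
        raise ValueError("this sketch only handles base32 (multibase 'b') CIDs")
    body = cid_text[1:].upper()
    # Multibase base32 is unpadded; restore padding for the stdlib decoder.
    return base64.b32decode(body + "=" * (-len(body) % 8))


def signature_payload(head_cid: str, topic: Optional[str] = None) -> bytes:
    """Bytes covered by `sig`: the head CID bytes, then the topic bytes if present."""
    payload = cid_bytes(head_cid)
    if topic is not None:
        payload += topic.encode("utf-8")
    return payload
```

A verifier would pass the result of `signature_payload` together with `sig` to the verification routine of the public key decoded from `pubkey`.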

The following snippet represents the IPLD schema of the signed head advertisement:

```ipldsch
type SignedHead struct {
  head Link
  topic optional String
  pubkey Bytes
  sig Bytes
}
```

#### Example

The following represents an abbreviated example of a head advertisement response in `DAG-JSON` encoding:

```json
{
  "head": {
    "/": "baguqeerazx6qhbdzckrnuwvl3reqnrpysz3ry5qtdxzpo23losoeiywij65a"
  },
  "topic": "/indexer/ingest/devnet",
  "pubkey": {
    "/": {
      "bytes": "CAASpgIwggEiMA0G..."
    }
  },
  "sig": {
    "/": {
      "bytes": "LrZiicKdqDqkG2UFR..."
    }
  }
}
```
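
A client might read such a response along the following lines. This Python sketch is illustrative only: the helper names `dagjson_bytes` and `parse_signed_head` are hypothetical, and it relies on the DAG-JSON convention that byte values are carried as unpadded base64 strings. It applies the default topic when `topic` is absent and does not verify the signature.

```python
import base64
import json

DEFAULT_TOPIC = "/indexer/ingest/mainnet"


def dagjson_bytes(node: dict) -> bytes:
    """Decode a DAG-JSON bytes node of the form {"/": {"bytes": "<base64>"}}."""
    text = node["/"]["bytes"]
    # DAG-JSON uses unpadded base64; restore padding for the stdlib decoder.
    return base64.b64decode(text + "=" * (-len(text) % 4))


def parse_signed_head(body: str) -> dict:
    """Extract the fields of a signed head response encoded as DAG-JSON."""
    doc = json.loads(body)
    return {
        "head": doc["head"]["/"],                  # CID string of the latest advertisement
        "topic": doc.get("topic", DEFAULT_TOPIC),  # optional, defaults to mainnet topic
        "pubkey": dagjson_bytes(doc["pubkey"]),
        "sig": dagjson_bytes(doc["sig"]),
    }
```

The `pubkey` and `sig` values in the example above are abbreviated, so a real response is needed to exercise this against live data.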

### GET `/ipni/v1/ad/{CID}`

This endpoint retrieves the content associated with an advertisement CID and its entries.
All links encountered during the traversal of the head advertisement are served by this endpoint.
The response body must correspond to the `CID` specified in the URL path, as the data returned by this endpoint is immutable.

To ensure proper caching and immutability, implementers must include the following response header:

```
Cache-Control: public, max-age=29030400, immutable
```

Setting this response header instructs client-side caches and intermediate proxies to store the response for a long duration (`max-age=29030400`, that is, 336 days) and to treat it as immutable. This improves performance and ensures that the same content is served consistently for the specified `CID`.

#### Response

The response from the `/ipni/v1/ad/{CID}` endpoint returns the data associated with the advertisement CID and its embedded links, including `Entries`. See [IPNI Specification/Advertisements](IPNI.md#advertisements) for the advertisement and entries schema.

#### Example

The following represents an abbreviated advertisement encoded in `DAG-JSON`:

```json
{
  "Addresses": [
    "/dns4/content.example/tcp/443/wss"
  ],
  "ContextID": {
    "/": {
      "bytes": "YmFndXFlZXJh..."
    }
  },
  "Entries": {
    "/": "baguqeeraog3uqu7au2mctrfslzvua5lzl3wyoquqhbyoig6hnputypej2uvq"
  },
  "IsRm": false,
  "Metadata": {
    "/": {
      "bytes": "gBI"
    }
  },
  "PreviousID": {
    "/": "baguqeeranqfhpdacbe7ls44ndcgeis6yufvhk4zntydlqfaplzz6mdyuu6gq"
  },
  "Provider": "QmQzqxhK82kAmK...",
  "Signature": {
    "/": {
      "bytes": "CqsCCAASpgIwggEiMA0..."
    }
  }
}
```

The following snippet shows an example of advertisement entries encoded in `DAG-JSON`:

```json
{
  "Entries": [
    {
      "/": {
        "bytes": "EiDYa/LrNYDdHjSlKGXpHjc1IyHds9Xdi/p8d25Q5UE/bQ"
      }
    }
  ],
  "Next": {
    "/": "baguqeeraog3uqu7au2mctrfslzvua5lzl3wyoquqhbyoig6hnputypej5guq"
  }
}
```
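
Entry chunks like the one above form a linked list through `Next`, so a consumer can walk them one request at a time. The following Python sketch shows one possible traversal. The function name `iter_entries`, the `fetch` callback (standing in for a `GET /ipni/v1/ad/{CID}` request that returns the decoded DAG-JSON object), and the `max_chunks` guard are all assumptions made for illustration.

```python
import base64
from typing import Callable, Iterator


def iter_entries(first_cid: str, fetch: Callable[[str], dict],
                 max_chunks: int = 1000) -> Iterator[bytes]:
    """Yield every multihash reachable from an advertisement's `Entries` link."""
    seen = set()
    cid = first_cid
    while cid is not None:
        # Guard against cycles and unbounded chains from a misbehaving provider.
        if cid in seen or len(seen) >= max_chunks:
            raise ValueError("entries chain loops or exceeds max_chunks")
        seen.add(cid)
        chunk = fetch(cid)
        for entry in chunk.get("Entries", []):
            text = entry["/"]["bytes"]
            # DAG-JSON bytes are unpadded base64; restore padding before decoding.
            yield base64.b64decode(text + "=" * (-len(text) % 4))
        next_link = chunk.get("Next")
        cid = next_link["/"] if next_link else None
```

Because the function is a generator, chunks are fetched lazily and the caller can stop early without requesting the rest of the chain.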

### Security

The IPNI HTTP Provider API specification includes several security implications and considerations to ensure secure
communication and data integrity. Here are the key security aspects:

- **TLS (Transport Layer Security)**: It is recommended to use TLS whenever possible to establish secure connections
between clients and the IPNI HTTP provider. TLS encrypts the data during transmission, protecting it from unauthorized
access and tampering.

- **Pagination and Maximum Content Length**: Implementers should set a maximum content length for responses to mitigate potential abuse or resource exhaustion attacks. This helps ensure the API's performance and resilience against malicious actors. We suggest a maximum content length of 4MB. Note that content pagination is achieved using IPLD links. For example, advertisement entry chunks are linked together using the `Next` field, forming a linked list of multihashes. Therefore, implementers must ensure that the generated entry chunks remain below the maximum content length accepted by the indexers.

- **Signature Validation**: Implementers should validate the signatures associated with the head advertisement CID to
verify the authenticity and integrity of the data. Signature validation ensures that the data is generated by the
legitimate provider and has not been tampered with during transmission.

- **Content Verification against CID**: Implementers should verify that the content associated with an advertisement or
entry CID matches the expected CID value. This verification ensures the consistency and accuracy of the data, guarding
against any manipulation or corruption.
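
Two of the checks above, the maximum content length and content verification against the CID, can be sketched on the client side as follows. This Python sketch makes several simplifying assumptions: CIDs are base32 CIDv1 strings with a sha2-256 multihash, the body is served in the same codec that the CID was computed over, and `read(n)` on the stream returns fewer than `n` bytes only at end of stream. The names `read_limited` and `matches_cid` are hypothetical.

```python
import base64
import hashlib
from typing import BinaryIO, Tuple

MAX_CONTENT_LENGTH = 4 * 1024 * 1024  # the 4MB limit suggested above


def read_limited(stream: BinaryIO, limit: int = MAX_CONTENT_LENGTH) -> bytes:
    """Read a response body, refusing anything longer than `limit` bytes."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError("response exceeds maximum content length")
    return data


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned varint starting at `pos`; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def matches_cid(cid_text: str, body: bytes) -> bool:
    """Check `body` against a base32 CIDv1 whose multihash is sha2-256."""
    if not cid_text.startswith("b"):
        raise ValueError("only base32 (multibase 'b') CIDs are handled")
    text = cid_text[1:].upper()
    raw = base64.b32decode(text + "=" * (-len(text) % 8))
    version, pos = _read_varint(raw, 0)
    _codec, pos = _read_varint(raw, pos)
    hash_code, pos = _read_varint(raw, pos)
    length, pos = _read_varint(raw, pos)
    if version != 1 or hash_code != 0x12 or length != 32:
        raise ValueError("unsupported CID version or hash function")
    return raw[pos:pos + length] == hashlib.sha256(body).digest()
```

A production client would more likely rely on an existing CID and multihash library to cover other hash functions and CID versions.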

## Future Considerations

As the IPNI HTTP Provider API specification evolves, there are several future considerations to keep in mind:

- **Writer Privacy**: Ensuring the privacy and confidentiality of the writers' data is an important aspect to address in
future iterations. Mechanisms such as encryption and access control can be explored to safeguard sensitive information
and protect the privacy of writers.

- **Swappable Transport**: The IPNI HTTP Provider API can explore integration with
the [`go-libp2p-http`](https://github.com/libp2p/go-libp2p-http) library to leverage its capabilities and enhance the
transport layer. This integration can enable seamless communication between IPNI providers and consumers using the
libp2p networking stack, offering additional features and benefits.

- **Bulk Fetch**: The IPNI HTTP Provider API can introduce support for [`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) to allow bulk download of advertisements. Such extensions may include query parameters to further enable indexer nodes to refine and optimize the fetching process. See the [IPFS HTTP Gateway specification](https://specs.ipfs.tech/http-gateways/) for related work.

By considering these aspects in future developments, the IPNI HTTP Provider API can continue to improve its
functionality, security, and interoperability, providing a more robust and versatile solution for advertisement chain
transport.

## Related Resources

* [IPNI Specification](IPNI.md)

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).