Minimum graphs needed for top-level health reporting on the ipfs.io gateway #5

BigLep · 2024-01-26T00:54:27Z

Background

IP Shipyard has been entrusted to steward the ipfs.io gateway. Other leaders in the ecosystem should have the ability to see the health and usage of the ipfs.io gateway. This issue is about defining the minimum graphs needed to give others confidence in the maintained health of the service.

Graphs

Some general requests include:

Provide time ranges of 1+ year. Why? A longer window helps reveal small changes over time that can be missed in too short a time period.
Use weekly rather than daily granularity. Why? It makes longer time horizons easier to read. There has been discussion in a Slack thread on how to accomplish this.

Unique Clients accessing ipfs.io / dweb.link

Current source: https://probelab.io/ipfsgateways/#daily-unique-clients-accessing-ipfsio--dweblink
Snapshot: (image not preserved)

Improvements needed:

Weekly numbers (rather than daily).
Combine with weekly data further back in time from https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224.
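Weekly uniques have to be computed from raw client identifiers, not by summing daily uniques: a client seen on several days of the same week would otherwise be counted several times. Here is a minimal Python sketch of that, assuming a hypothetical record shape of `(timestamp, client_id)` pairs and bucketing by ISO week. The real log schema and the way ProbeLab identifies a "client" may differ.

```python
from collections import defaultdict
from datetime import datetime, timezone


def iso_week(ts: datetime) -> tuple:
    """Return the (ISO year, ISO week number) bucket for a timestamp.

    Using the ISO year (not the calendar year) keeps a week that spans
    New Year in a single bucket.
    """
    year, week, _ = ts.isocalendar()
    return (year, week)


def weekly_unique_clients(records):
    """Count distinct clients per ISO week.

    `records` is an iterable of (timestamp, client_id) pairs; this shape is a
    hypothetical stand-in for the real gateway log schema.
    """
    clients_per_week = defaultdict(set)
    for ts, client_id in records:
        clients_per_week[iso_week(ts)].add(client_id)
    return {week: len(ids) for week, ids in sorted(clients_per_week.items())}


# Client "a" appears on two days of week 4: summing daily uniques would
# give 1 + 2 = 3, but the weekly unique count is 2.
records = [
    (datetime(2024, 1, 22, 10, tzinfo=timezone.utc), "a"),  # Mon, ISO week 4
    (datetime(2024, 1, 23, 10, tzinfo=timezone.utc), "a"),
    (datetime(2024, 1, 23, 11, tzinfo=timezone.utc), "b"),
    (datetime(2024, 1, 29, 9, tzinfo=timezone.utc), "a"),   # Mon, ISO week 5
]
print(weekly_unique_clients(records))  # {(2024, 4): 2, (2024, 5): 1}
```

Combining this with the older spreadsheet data would only be apples-to-apples if that data was also deduplicated per week rather than summed from daily values.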

HTTP Requests to ipfs.io / dweb.link, by region

Current source: https://probelab.io/ipfsgateways/#daily-http-requests-to-ipfsio--dweblink-by-region
Snapshot: (image not preserved)

Improvements needed:

Generate weekly (rather than daily) aggregates.
Make clear which requests/responses are included here: all of them, or only “successful” (HTTP 200) ones?
Combine with weekly data further back in time from https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224.
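One way to make the graph self-describing is to make the status filter an explicit parameter of the aggregation, so the graph label can state exactly which responses it counts. A minimal sketch, assuming hypothetical `(timestamp, region, http_status)` records and made-up region names: `statuses=None` counts every request, while `statuses={200}` counts only successful ones.

```python
from collections import Counter
from datetime import datetime, timezone


def weekly_requests_by_region(requests, statuses=None):
    """Count requests per (ISO week, region).

    `requests` yields (timestamp, region, http_status) tuples, a hypothetical
    log shape. When `statuses` is a set of codes (e.g. {200}), only matching
    responses are counted; None counts every request.
    """
    counts = Counter()
    for ts, region, status in requests:
        if statuses is not None and status not in statuses:
            continue
        year, week, _ = ts.isocalendar()
        counts[((year, week), region)] += 1
    return dict(counts)


reqs = [
    (datetime(2024, 1, 22, tzinfo=timezone.utc), "EU", 200),
    (datetime(2024, 1, 23, tzinfo=timezone.utc), "EU", 410),
    (datetime(2024, 1, 24, tzinfo=timezone.utc), "NA", 200),
]
print(weekly_requests_by_region(reqs))                  # all responses
print(weekly_requests_by_region(reqs, statuses={200}))  # HTTP 200 only
```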

p95 of TTFB for “200” responses

Current source: none, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1. I'm also not sure whether that value includes only "200" responses or all responses.
Existing data: the spreadsheet at https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224 contains historical values (snapshot not preserved). That said, I don't know whether they are for "200" responses or all responses.

What's needed:

Create a new weekly plot for p95 of TTFB for “200” responses. We don't need to combine it with the existing data.
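A minimal sketch of the weekly p95 computation, restricted to HTTP 200 responses. It assumes hypothetical `(timestamp, http_status, ttfb_ms)` samples and uses the nearest-rank percentile definition. Whatever backend ends up producing this plot may use a different estimator (for example an interpolated or histogram-based one), so exact values could differ slightly.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest sample such that at least p%
    of samples are <= it. `p` is between 0 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def weekly_p95_ttfb_200(samples):
    """Weekly p95 time-to-first-byte (ms) over HTTP 200 responses only.

    `samples` yields (timestamp, http_status, ttfb_ms) tuples; this shape is
    a hypothetical stand-in for the real gateway log schema.
    """
    per_week = defaultdict(list)
    for ts, status, ttfb_ms in samples:
        # Skip non-200s: e.g. fast 410 denylist responses or slow timeouts
        # have different latency profiles and would skew the percentile.
        if status != 200:
            continue
        year, week, _ = ts.isocalendar()
        per_week[(year, week)].append(ttfb_ms)
    return {wk: percentile_nearest_rank(v, 95) for wk, v in sorted(per_week.items())}


t = datetime(2024, 1, 22, tzinfo=timezone.utc)
samples = [(t, 200, ms) for ms in range(1, 21)]  # 20 successful responses
samples += [(t, 410, 0), (t, 504, 60_000)]       # excluded from the p95
print(weekly_p95_ttfb_200(samples))  # {(2024, 4): 19}
```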

Response code distribution

For the requests in a given week, we should be able to show how the gateway is responding.

Why:

Catch deployment issues that are affecting traffic.
Prove the value of certain functionality.

Example from the last 7 days (snapshot not preserved):

The high number of 410s emphasizes the importance of “badbits”. Without it, the majority of requests would be served with content we don’t want to serve.
If this distribution were ever to change (e.g., because “badbits” was disabled), that would be bad, and we’d want to see it.

Current source: none, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1.

What's needed:

Create a new weekly plot that shows the status code distribution. Perhaps show the top 5-10 status codes and bucket the rest as "other".
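A minimal sketch of the "top N plus other" bucketing for one week of requests, assuming the input is simply the list of HTTP status codes returned that week (a hypothetical input shape). It reports fractions rather than raw counts so that weeks with different traffic volumes stay comparable, and a shift such as the share of 410s dropping would stand out directly.

```python
from collections import Counter


def status_distribution(statuses, top_n=5):
    """Return {label: fraction of requests} for one week.

    The `top_n` most frequent status codes keep their own label; all other
    codes are summed into "other". Fractions add up to 1 (up to float error).
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_n)
    dist = {str(code): n / total for code, n in top}
    other = total - sum(n for _, n in top)
    if other:
        dist["other"] = other / total
    return dist


week = [410] * 5 + [200] * 3 + [404, 500]
print(status_distribution(week, top_n=2))  # {'410': 0.5, '200': 0.3, 'other': 0.2}
```

A stacked weekly bar or area chart would then call this once per week.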

Unique CIDs requested per week

Why: Gives a sense of how much of the content addressable space is being requested through the ipfs.io gateway.

What's needed:

Weekly plot of the number of unique top-level/root CIDs requested by clients.
Weekly plot of the total number of unique CIDs fetched by the ipfs.io gateway.
- If clients only requested non-badbit content, then this number would always be at least as large as the number above.
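A minimal sketch of the two per-week counts, assuming a hypothetical event stream of `(timestamp, kind, cid)` tuples where `kind` is `"request"` for a root CID asked for by a client and `"fetch"` for any CID (root or child block) the gateway retrieved. The CID strings in the example are placeholders, not real CIDs. A root blocked by badbits shows up as requested but never fetched, which is how the fetched count can fall below the requested count.

```python
from collections import defaultdict
from datetime import datetime, timezone


def weekly_unique_cids(events):
    """Return {(iso_year, iso_week): (unique_requested_roots, unique_fetched)}.

    `events` yields (timestamp, kind, cid) tuples; kind is "request" or
    "fetch". This event shape is hypothetical.
    """
    requested = defaultdict(set)
    fetched = defaultdict(set)
    for ts, kind, cid in events:
        year, week, _ = ts.isocalendar()
        if kind == "request":
            requested[(year, week)].add(cid)
        elif kind == "fetch":
            fetched[(year, week)].add(cid)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    weeks = sorted(requested.keys() | fetched.keys())
    return {wk: (len(requested[wk]), len(fetched[wk])) for wk in weeks}


t = datetime(2024, 1, 22, tzinfo=timezone.utc)
events = [
    (t, "request", "root-A"),    # placeholder, not a real CID
    (t, "fetch", "root-A"),
    (t, "fetch", "child-A1"),    # child blocks of root-A
    (t, "fetch", "child-A2"),
    (t, "request", "root-A"),    # repeat request: still one unique root
    (t, "request", "root-bad"),  # denylisted: requested but never fetched
]
print(weekly_unique_cids(events))  # {(2024, 4): (2, 3)}
```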

BigLep · 2024-01-26T00:54:59Z

If it's helpful, I am happy to break this into multiple issues.

BigLep · 2024-01-26T01:01:36Z

Here is a ProbeLab website request for the things I think ProbeLab could execute on independently: probe-lab/website#87. That said, I would want the "Waterworks" crew to follow up, review, or make any corrections, given that they're the stewards of this community infrastructure.

BigLep mentioned this issue on Jan 26, 2024, in probe-lab/website#87: "Gateway metrics should also be weekly and go further back in time" (open, 5 tasks).
