Minimum graphs needed for top-level health reporting on the ipfs.io gateway #5

Open
9 tasks
BigLep opened this issue Jan 26, 2024 · 2 comments

BigLep commented Jan 26, 2024

Background

IP Shipyard has been entrusted to steward the ipfs.io gateway. Other leaders in the ecosystem should have the ability to see the health and usage of the ipfs.io gateway. This issue is about defining the minimum graphs needed to give others confidence in the maintained health of the service.

Graphs

Some general requests include:

  • Provide time ranges of 1+ year. Why? The longer context helps highlight small changes over time that can get missed in too short a time period.
  • Weekly rather than daily. Why? It facilitates looking at longer time horizons. There has been discussion on how to accomplish this in a Slack thread.
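To make the "weekly rather than daily" request concrete, here is a minimal sketch of rolling daily counts up into weekly buckets. It assumes the daily values are available as a mapping from date to count and that ISO weeks (Monday to Sunday) are an acceptable bucket; both are assumptions for illustration, not how the existing dashboards are built. The function name and sample numbers are made up.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_counts):
    """Sum daily counts into ISO-week buckets keyed by (iso_year, iso_week)."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += count
    return dict(sorted(totals.items()))


# Made-up sample values, not real gateway traffic.
daily = {
    date(2024, 1, 1): 100,  # Monday, ISO week 1
    date(2024, 1, 7): 50,   # Sunday, ISO week 1
    date(2024, 1, 8): 70,   # Monday, ISO week 2
}
weekly = weekly_totals(daily)
print(weekly)
```

Summing like this is fine for request counts. It would not be correct for unique clients, since a client seen on two days of the same week would be counted twice; that metric needs deduplication over the whole week.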

Unique Clients accessing ipfs.io / dweb.link

Current source: https://probelab.io/ipfsgateways/#daily-unique-clients-accessing-ipfsio--dweblink
Snapshot:
gateway-clients-overall
Improvements needed:

image

HTTP Requests to ipfs.io / dweb.link, by region

Current source: https://probelab.io/ipfsgateways/#daily-http-requests-to-ipfsio--dweblink-by-region
Snapshot:
gateway-requests-region
Improvements needed:

image

p95 of TTFB for “200” responses

Current source: none currently other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1 . I'm also not sure whether that value includes only "200" responses or all responses.
Existing data: in https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224 there is [image]. That said, I don't know if that is for "200" responses or all responses.

What's needed:

  • Create new weekly plot for p95 of TTFB for “200” responses. We don't need to combine with existing data.
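As a rough illustration of what the weekly p95 computation could look like, the sketch below filters to "200" responses, groups by ISO week, and takes a nearest-rank 95th percentile. The record shape `(day, status, ttfb_ms)`, the function names, and the choice of the nearest-rank method are all assumptions; the real gateway logs and the percentile method used in Grafana may differ.

```python
from collections import defaultdict
from datetime import date


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def weekly_p95_ttfb(records):
    """records: iterable of (day, status, ttfb_ms) tuples (hypothetical shape)."""
    by_week = defaultdict(list)
    for day, status, ttfb_ms in records:
        if status != 200:
            continue  # only "200" responses count toward this plot
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(ttfb_ms)
    return {week: p95(vals) for week, vals in sorted(by_week.items())}


# Made-up sample: twenty 200 responses (100 ms .. 2000 ms) plus one slow 410.
sample = [(date(2024, 1, 2), 200, 100 * i) for i in range(1, 21)]
sample.append((date(2024, 1, 2), 410, 9000))
result = weekly_p95_ttfb(sample)
print(result)
```

Filtering on the status code before computing the percentile keeps fast 410 rejections (or slow errors) from skewing the value.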

Response code distribution

For the requests in a given week, we should be able to show how the gateway is responding.

Why:

  1. Catch if there is a deployment issue that is affecting traffic.
  2. Prove the value of certain functionality.

Example looking at the last 7 days:
image

The high share of 410s emphasizes the importance of “Badbits”. If we didn’t have it, the majority of requests would be serving content we don’t want to serve.
If this distribution were ever to change (e.g., “badbits” was disabled), that would be bad and we’d want to see it.

Current source: none currently other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1

What's needed:

  • Create new weekly plot that shows status code distribution. Maybe use the top 5-10 status codes and bucket the rest as other.
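The "top N plus other" bucketing could be sketched roughly as follows. The input is assumed to be a flat list of status codes for one week, and the function name and sample values are hypothetical; this is not based on how the existing report is computed.

```python
from collections import Counter


def status_distribution(status_codes, top_n=5):
    """Share of requests per status code, keeping the top_n codes and
    folding everything else into an "other" bucket."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {str(code): n / total for code, n in top}
    other = total - sum(n for _, n in top)
    if other:
        shares["other"] = other / total
    return shares


# Made-up sample week, not real gateway data.
codes = [410] * 6 + [200] * 3 + [404] * 1
dist = status_distribution(codes, top_n=2)
print(dist)
```

Plotting shares rather than raw counts makes a shift in the distribution (for example, the 410 share dropping) visible even when overall traffic changes.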

Unique CIDs requested per week

Why: Gives a sense of how much of the content-addressable space is being requested through the ipfs.io gateway.

What's needed:

  • weekly plot for the number of top-level / root CIDs requested by clients
  • weekly plot for total number of CIDs fetched by ipfs.io gateway
    • If clients only fetched non-badbit content, then this number would always be larger than the number above.
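For either plot, the counting itself could look like the sketch below: deduplicate CIDs within each ISO week and report the set size. The `(day, cid)` input shape and the function name are assumptions, and the CID strings are placeholders rather than real CIDs. The same function would apply to a log of root CIDs requested by clients or to a log of all CIDs fetched by the gateway.

```python
from collections import defaultdict
from datetime import date


def weekly_unique_cids(requests):
    """requests: iterable of (day, cid) pairs (hypothetical shape).
    Returns the number of distinct CIDs seen in each ISO week."""
    seen = defaultdict(set)
    for day, cid in requests:
        iso = day.isocalendar()
        seen[(iso[0], iso[1])].add(cid)
    return {week: len(cids) for week, cids in sorted(seen.items())}


# Placeholder identifiers, not real CIDs.
log = [
    (date(2024, 1, 1), "cid-a"),
    (date(2024, 1, 3), "cid-a"),  # repeat within week 1, counted once
    (date(2024, 1, 4), "cid-b"),
    (date(2024, 1, 9), "cid-a"),  # week 2
]
unique_per_week = weekly_unique_cids(log)
print(unique_per_week)
```

Holding every CID in memory is only reasonable for small logs; at gateway scale an approximate distinct-count structure would likely be needed.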

BigLep commented Jan 26, 2024

If it's helpful, I am happy to break this into multiple issues.

BigLep commented Jan 26, 2024

Probelab website request for things that I think probelab could execute on independently: probe-lab/website#87 . That said, I would want the "Waterworks" crew to follow up, review, or make any corrections given they're the stewards of this community infrastructure.
