Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more prometheus metrics #141
Add more prometheus metrics #141
Changes from 1 commit
6fbba22
cc756c6
e63f756
c6b4d4d
a9ea374
d16c6b7
a91a3d0
6ee9eb0
05ede4a
77ba3a4
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 1 in main.go
GitHub Actions / fmt_vet_lint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern here is that someone could make a bunch of random requests to nonexistent archives and all of the 404's for unique resources would blow up the metric response size. To avoid that, we could either:
Thoughts @bdon ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer 1). for 2) we would have to recommend they put the whole go-pmtiles application behind a allowlist-proxy, right? that is too strong of a limitation imho.
for 1) I think that discovering failures about missing archives seems outside of the scope of performance metrics, I don't have a lot of experience in how people use prometheus though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case I want to catch is if people are reading from an archive, and it suddenly goes missing or server starts returning 500 errors we want to able to see which archive is no longer working. We could limit the per-archive metrics to only archives that we've gotten an OK response for the root directory, but that would miss if the server restarts and fails to read at startup... it would also mean 2 metrics you need to alert on to know if ties are down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to limit the universe of stored tileset names to ~100?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://prometheus.io/docs/practices/naming/#labels seems to recommend against unbounded cardinality... I feel like detecting a spike in 404s/500s and inspecting logs is good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, changed bucket and overall request metrics to set archive to "" on not found errors.