Instantaneous BW and IOPS reporting to Prometheus #19
Integrated instantaneous IOPS/BW collection by recording 1 second of IO and outputting it to the Prometheus collector. Removed the registration of one Prometheus MinIO collector in metrics.go because it was causing Collect to run twice; I didn't see any issues from removing it.
Management plane asked for BW to be reported in bytes/s, so the conversion to KB is removed.
Let's discuss and address my comments.
```go
atomic.AddUint64(&globalCurrBW, uint64(objInfo.Size))
atomic.AddUint64(&globalCurrIOCount, 1)
}
```
I think we need separate IO and BW counters for PUT and GET, because the GUI needs to show PUT and GET BW/IOPS separately.
Let's add counters for DEL as well; DEL doesn't need BW counters. Also, other calls in this file, such as HeadObject, should be counted as GET calls. Go through all the other related calls and categorize each one as GET or PUT.
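One way the split requested above could look, as a minimal sketch. The type names, the `classify` mapping, and the handler-name strings are all hypothetical; only the HeadObject-as-GET rule and the "no BW counter for DEL" rule come from the review. It keeps the `atomic.AddUint64` style already used in the PR.

```go
package main

import "sync/atomic"

// opType is a hypothetical classification of S3 calls for metrics.
type opType int

const (
	opGet opType = iota
	opPut
	opDel
)

// ioCounters keeps separate BW and IOPS counters for GET and PUT.
// DEL only needs an operation count, so it has no byte counter.
type ioCounters struct {
	getBytes, getOps uint64
	putBytes, putOps uint64
	delOps           uint64
}

// classify maps a handler name to an operation type. The mapping is
// illustrative; per the review, HeadObject is counted as a GET.
func classify(api string) (opType, bool) {
	switch api {
	case "GetObject", "HeadObject":
		return opGet, true
	case "PutObject":
		return opPut, true
	case "DeleteObject":
		return opDel, true
	}
	return 0, false
}

// record counts one completed request of size bytes.
func (c *ioCounters) record(op opType, size int64) {
	if size < 0 {
		size = 0 // guard against unknown sizes before converting to uint64
	}
	switch op {
	case opGet:
		atomic.AddUint64(&c.getBytes, uint64(size))
		atomic.AddUint64(&c.getOps, 1)
	case opPut:
		atomic.AddUint64(&c.putBytes, uint64(size))
		atomic.AddUint64(&c.putOps, 1)
	case opDel:
		atomic.AddUint64(&c.delOps, 1)
	}
}
```

A HeadObject call would be recorded with size 0, so it raises GET IOPS without inflating GET bandwidth.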
```go
// Reset counters once reported
atomic.StoreUint64(&globalCurrIOCount, 0)
atomic.StoreUint64(&globalCurrBW, 0)
```
Let's discuss. My issue with this design is that metric collection is triggered by this function, but metrics should be independent of Prometheus. For example, if there is a request to get metrics via ustat, the design would have to change. Instead, as I suggested earlier for the metricsReporting() thread, reset the counters every second and keep collecting as long as a global variable like metric_collection is enabled at startup. The downside of that approach is that we keep collecting metrics for the entire lifetime of MinIO. Let's discuss more.
…ad; added new counters. Previously, metrics could only be reported through Prometheus, because the code that reset the counters lived inside the Prometheus collector. Now the counters are reset inside the metrics collection thread, which runs continuously. So that a reader never picks up counters that were just reset, new variables hold the last completed second of metrics, and Prometheus reads those. Also added new counters to separate PUT, GET, and DEL BW/IOPS, and these counters are now incremented in the other object API handlers that do IO. NOTE: multipart is not supported for now.
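With five PUT/GET/DEL values instead of two, publishing them as separate atomics lets a reader see some values from one second and some from the next. A minimal sketch of one way to avoid that, assuming Go 1.19+ for `atomic.Uint64` and `atomic.Pointer`: build an immutable snapshot each second and publish it with a single pointer store. The `opSnapshot` type and all variable and function names are hypothetical.

```go
package main

import "sync/atomic"

// opSnapshot is a hypothetical record of one second of PUT/GET/DEL activity.
type opSnapshot struct {
	PutBytes, PutOps uint64
	GetBytes, GetOps uint64
	DelOps           uint64
}

// Live counters, incremented by the object API handlers.
var livePut, liveGet struct{ bytes, ops atomic.Uint64 }
var liveDelOps atomic.Uint64

// lastSecond points at the most recently completed interval.
var lastSecond atomic.Pointer[opSnapshot]

// rollOver drains the live counters into a new snapshot and publishes it
// with one pointer store, so a reader always gets all five values from the
// same published snapshot.
func rollOver() {
	s := &opSnapshot{
		PutBytes: livePut.bytes.Swap(0),
		PutOps:   livePut.ops.Swap(0),
		GetBytes: liveGet.bytes.Swap(0),
		GetOps:   liveGet.ops.Swap(0),
		DelOps:   liveDelOps.Swap(0),
	}
	lastSecond.Store(s)
}

// readLastSecond is what a reporter (Prometheus collector, ustat) would
// call; it never touches the live counters.
func readLastSecond() opSnapshot {
	if s := lastSecond.Load(); s != nil {
		return *s
	}
	return opSnapshot{} // nothing published yet
}
```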
Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0.0% Coverage. The version of Java (11.0.17) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Looks good
Description
Added instantaneous BW and IOPS metrics for Prometheus to collect when called. The current implementation uses a goroutine that blocks on a channel until the Prometheus collector is called, then signals the IO threads to start recording IO. The metrics goroutine waits 1 second, signals the IO threads to stop recording, and reports the number of IOs that happened in that second to the Prometheus collector.
Motivation and Context
Lets the management plane see bandwidth and IOPS; there is currently no implementation of this.
Regression
No
How Has This Been Tested?
Ran Elbencho and s3-benchmark, then triggered a Prometheus collection and compared the Prometheus output with the reported BW/IOPS. I mostly relied on Elbencho, since its live-output feature makes it easier to compare real-time stats.
Example Elbencho vs. Prometheus output:
Elbencho:
Prometheus BW/IOPS reported at about the same time: