
Instantaneous BW and IOPS reporting to Prometheus #19

Merged (7 commits) on Nov 17, 2023


aaronlee121
Contributor

Description

Added instantaneous bandwidth (BW) and IOPS metrics for Prometheus to collect when it scrapes. The current implementation runs a goroutine that blocks on a channel until the Prometheus collector is called, then signals the IO threads to start recording IO. The metrics goroutine waits 1 second, signals the IO threads to stop recording, and reports the number of IOs completed in that second to the Prometheus collector.

Motivation and Context

So that the management plane can see bandwidth and IOPS; there is currently no implementation of this.

Regression

No

How Has This Been Tested?

Ran Elbencho and s3-benchmark, triggered a Prometheus collection, and compared the Prometheus output to the BW/IOPS reported by the benchmark. I relied mostly on Elbencho, since its live output makes it easier to compare real-time stats.

Example Elbencho vs. Prometheus output
Elbencho:

Phase: READ  CPU:  52%  Active: 32  Elapsed: 10s
 Rank   %    DoneMiB      MiB/s       IOPS      Files    Files/s
Total   7      12414       1301       1301      12414       1301

Prometheus BW/IOPS reported at about the same time:

# HELP minio_metrics_bw Current BW of this Minio instance (KiB/s)
# TYPE minio_metrics_bw counter
minio_metrics_bw 1.32096e+06
# HELP minio_metrics_iops Current IOPS of this Minio instance
# TYPE minio_metrics_iops counter
minio_metrics_iops 1290
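The two outputs above use different units: Elbencho reports MiB/s, while the Prometheus metric is in KiB/s according to its HELP text. A small Go check of the arithmetic, using only the numbers shown: 1.32096e+06 KiB/s is exactly 1290 MiB/s, the implied average transfer is 1024 KiB (consistent with Elbencho, whose equal MiB/s and IOPS imply 1 MiB per IO), and the gap to Elbencho's 1301 MiB/s is about 0.85%. The function names are illustrative only.

```go
package main

import "fmt"

// kibToMiB converts a rate in KiB/s (the unit in the HELP text) to MiB/s.
func kibToMiB(kib float64) float64 { return kib / 1024 }

// avgIOSizeKiB gives the mean transfer size implied by a BW/IOPS pair.
func avgIOSizeKiB(bwKiB, iops float64) float64 { return bwKiB / iops }

// relDiff is the relative difference of got from want.
func relDiff(got, want float64) float64 { return (got - want) / want }

func main() {
	bw, iops := 1.32096e+06, 1290.0 // values scraped from Prometheus above
	fmt.Printf("BW: %.0f MiB/s\n", kibToMiB(bw))
	fmt.Printf("avg IO size: %.0f KiB\n", avgIOSizeKiB(bw, iops))
	fmt.Printf("vs Elbencho: %.2f%%\n", 100*relDiff(kibToMiB(bw), 1301))
}
```

Separately, both metrics are declared `# TYPE ... counter`. In Prometheus a counter is expected to only increase (apart from resets on restart); a per-second value that can go down is conventionally exposed as a gauge, which also keeps functions like `rate()` from treating every drop as a counter reset.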

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added unit tests to cover my changes.
  • I have added/updated functional tests in mint. (If yes, add mint PR # here: )
  • All new and existing tests passed.

Integrated instantaneous IOPS/BW collection by recording 1 second of IO and reporting it to the Prometheus collector.
Removed one of the Prometheus MinIO collector registrations in metrics.go because it was causing Collect to run twice; I didn't see any issues from removing it.
aaronlee121 added the labels enhancement (New feature or request) and good first issue (Good for newcomers) on Nov 9, 2023
aaronlee121 requested a review from a team as a code owner on November 9, 2023 20:20
somnathr (Contributor) left a comment


Let's discuss and address my comments.

cmd/globals.go (resolved)
cmd/metrics.go (resolved)
atomic.AddUint64(&globalCurrBW, uint64(objInfo.Size))
atomic.AddUint64(&globalCurrIOCount, 1)
}


I think we need separate IO and BW counters for PUT and GET, because the GUI needs to show PUT and GET BW/IOPS separately.
Let's add counters for DEL as well; DEL doesn't need a BW counter. Also, other calls in this file, such as HeadObject, should count as GET calls. Please go through the other related calls and categorize each one as GET or PUT.
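One way to structure the split requested above, sketched in Go under stated assumptions: an array of counters indexed by operation kind, HEAD counted as GET, and DEL tracked for IOPS only. The handler names and the `classify` mapping are hypothetical and illustrative, not the full list of handlers in the codebase.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// opKind classifies S3 API calls for metrics (hypothetical).
type opKind int

const (
	opGet opKind = iota
	opPut
	opDel
	numOps
)

type opCounters struct {
	ios   [numOps]atomic.Uint64
	bytes [numOps]atomic.Uint64 // only GET and PUT are ever incremented
}

var counters opCounters

// classify maps an illustrative handler name to its metrics category.
// HEAD counts as GET, as suggested in the review.
func classify(api string) (opKind, bool) {
	switch api {
	case "GetObject", "HeadObject":
		return opGet, true
	case "PutObject":
		return opPut, true
	case "DeleteObject":
		return opDel, true
	}
	return 0, false
}

// record bumps IOPS for the call's category, and BW for GET/PUT only.
func record(api string, size uint64) {
	kind, ok := classify(api)
	if !ok {
		return // not an object IO we track
	}
	counters.ios[kind].Add(1)
	if kind != opDel { // DEL needs IOPS only
		counters.bytes[kind].Add(size)
	}
}

func main() {
	record("PutObject", 1<<20)
	record("GetObject", 1<<20)
	record("HeadObject", 0) // no body transferred
	record("DeleteObject", 0)
	names := [numOps]string{"GET", "PUT", "DEL"}
	for k := opKind(0); k < numOps; k++ {
		fmt.Printf("%s: ios=%d bytes=%d\n", names[k],
			counters.ios[k].Load(), counters.bytes[k].Load())
	}
}
```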

cmd/xl-sets.go (resolved)
// Reset counters once reported
atomic.StoreUint64(&globalCurrIOCount, 0)
atomic.StoreUint64(&globalCurrBW, 0)


Let's discuss. My concern with this design is that metric collection is triggered by this function, but metrics should be independent of Prometheus. For example, if there is a request to get metrics via ustat, the design would have to change. Instead, as I suggested earlier, reset the counters every second from the metricsReporting() thread and keep collecting as long as a global variable such as metric_collection is enabled at startup. The downside of that approach is that we keep collecting metrics for the lifetime of MinIO. Let's discuss more.

…ad; added new counters

Previously, the metrics collection design meant metrics could only be reported through Prometheus, since the code that reset the counters lived inside the Prometheus collector.
Now the counters are reset inside the metrics collection thread, which runs continuously.
To avoid Prometheus reading counters that were just reset, new variables hold the last full second of metrics, and Prometheus reads those instead.
Also added new counters to separate PUT, GET, and DEL BW/IOPS, and these counters are now incremented in the other object API handlers that do IO.
NOTE: Multipart uploads are not supported for now.
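A sketch of the reworked design described in this commit, under stated assumptions: a ticker-driven loop rolls the live counters into a "last interval" snapshot, and any consumer (the Prometheus collector, or something like ustat) reads the snapshot without resetting anything. All names are hypothetical, including `metricsEnabled`, which stands in for the startup switch suggested in review; typed atomics need Go 1.19+.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Hypothetical names; the PR's identifiers may differ.
var (
	metricsEnabled     atomic.Bool   // stands in for a startup flag like metric_collection
	currIOs, currBytes atomic.Uint64 // live counters, bumped by IO handlers
	lastIOs, lastBytes atomic.Uint64 // totals of the most recently completed interval
)

func recordIO(size uint64) {
	if metricsEnabled.Load() {
		currIOs.Add(1)
		currBytes.Add(size)
	}
}

// rollover publishes the live counters as the latest snapshot and zeroes
// them. Swap makes read-and-reset a single atomic step per counter.
func rollover() {
	lastIOs.Store(currIOs.Swap(0))
	lastBytes.Store(currBytes.Swap(0))
}

// metricsReporting calls rollover once per interval until stop is closed.
// It does not depend on any scraper being present.
func metricsReporting(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			rollover()
		}
	}
}

// lastSecond is what a Prometheus collector, or any other consumer, reads.
// It never blocks and never resets anything. (The two loads are separate, so
// a reader racing with rollover can pair values from adjacent intervals.)
func lastSecond() (ios, bytes uint64) {
	return lastIOs.Load(), lastBytes.Load()
}

func main() {
	metricsEnabled.Store(true)
	stop := make(chan struct{})
	go metricsReporting(time.Second, stop)

	for i := 0; i < 100; i++ {
		recordIO(1 << 20)
	}
	time.Sleep(1500 * time.Millisecond) // let one interval complete
	ios, bytes := lastSecond()
	fmt.Printf("IOPS=%d BW=%d KiB/s\n", ios, bytes/1024)
	close(stop)
}
```

Because readers only load the snapshot, a scrape never blocks and no longer has to coordinate with the reset; the reported values lag by up to one interval. The trade-off the reviewer noted remains: the loop runs for the lifetime of the process whenever collection is enabled.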

sonarcloud bot commented Nov 17, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

0.0% Coverage
0.0% Duplication

Warning: The version of Java (11.0.17) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

somnathr (Contributor) left a comment


Looks good

somnathr merged commit cdbbf3f into master on Nov 17, 2023
3 checks passed
somnathr deleted the bw_iops_reporting branch on November 17, 2023 20:56
Labels
enhancement New feature or request good first issue Good for newcomers

2 participants