feat: Add model_load_time metric #397
Conversation
Can you also add me to this PR?
// Compute how long the model took to load, using the monotonic clock so
// the measurement is unaffected by wall-clock adjustments.
const uint64_t now_ns =
    std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch())
        .count();
uint64_t time_to_load_ns = now_ns - loaded.second->load_start_ns_;
// Convert the elapsed nanoseconds to a double-precision duration in
// seconds for the metric update.
std::chrono::duration<double> time_to_load =
    std::chrono::duration_cast<std::chrono::duration<double>>(
        std::chrono::nanoseconds(time_to_load_ns));
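(For reference, a minimal sketch of how the resulting seconds value could feed a per-model gauge, assuming a prometheus-cpp style gauge; load_time_gauge is a hypothetical handle, and only Gauge::Set(double) is standard prometheus-cpp API.)
// Hypothetical reporting step: publish the load time in seconds on a
// per-model gauge. How the gauge handle is obtained is assumed here.
load_time_gauge.Set(time_to_load.count());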
Why don't you time the load here and directly store the load time in the model info? You can still put the metric update logic here.
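(In other words, the suggestion is roughly the sketch below; load_time_ns_ is a hypothetical member on the model info, not necessarily the PR's actual field name.)
// Alternative sketch: measure the elapsed time at the point the load
// actually completes and stash it on the model info, so OnLoadFinal()
// only reads a precomputed value when it updates the metric.
const uint64_t now_ns =
    std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch())
        .count();
model_info->load_time_ns_ = now_ns - model_info->load_start_ns_;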
The reason I added the metric update in ModelLifeCycle::OnLoadFinal() is that this is where we set the status to READY for the model:
loaded.second->state_ = ModelReadyState::READY;
If I set the model load time there, there is a chance that ModelLifeCycle::OnLoadFinal() exits without the model being loaded, and we might end up with an incorrect metric.
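(Illustration of the guard being described: a minimal sketch that only reports the metric once the state has become READY. ReportModelLoadTime is the reporting hook mentioned in this PR, but its arguments here are assumed.)
// Sketch: update the load-time metric only after the model has actually
// reached READY, so a failed or aborted load never produces a sample.
if (loaded.second->state_ == ModelReadyState::READY) {
  const uint64_t now_ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(
          std::chrono::steady_clock::now().time_since_epoch())
          .count();
  const double load_time_s =
      static_cast<double>(now_ns - loaded.second->load_start_ns_) * 1e-9;
  ReportModelLoadTime(loaded.second, load_time_s);  // hypothetical arguments
}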
Can you build with the metrics-disabled flags and make sure it works?
LGTM. Make sure you have addressed all the comments before merging.
A minor comment on the model_info->load_start_ns_ lifecycle. LGTM overall!
What does the PR do?
Adds a new per-model metric, nv_load_time, to the metrics endpoint.
Added a load-time gauge metric per model.
Example of the new metric:
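(A hypothetical sample of what the exported gauge could look like in Prometheus exposition format; the exact metric name, help text, labels, and values should be checked against the server PR.)
# HELP nv_load_time Time taken to load the model, in seconds
# TYPE nv_load_time gauge
nv_load_time{model="example_model",version="1"} 2.31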
Checklist
<commit_type>: <Title>
Commit Type: Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/server#7697
Where should the reviewer start?
The ReportModelLoadTime function usage.
Test plan:
Tests are added in the server PR.