monitoring: start a new workstream on common metrics across vendors #7

Open
glimchb opened this issue Aug 16, 2022 · 6 comments

@glimchb
Member

glimchb commented Aug 16, 2022

  • Marvell: @prasunkapoor is taking an action item to propose the common metrics that Marvell is comfortable exposing
  • Nvidia: @AnuradhaKaruppiah is taking the same action item
  • Intel @dandaly will help find a relevant contact for us
@mestery

mestery commented Sep 21, 2022

Worth mentioning ipdk-io/openconfig-public#1, as it is a PR on virtual device statistics exposed via openconfig.

@glimchb
Member Author

glimchb commented Sep 21, 2022

Thanks @mestery for that comment. That PR is completely unaligned with what OPI is trying to produce.

@dandaly

dandaly commented Sep 21, 2022

Hi Boris, can you elaborate?

@dandaly

dandaly commented Sep 21, 2022

For monitoring, IPDK supports two sets of statistics schemas:

  1. openconfig-interfaces: The set of statistics defined here: https://github.com/ipdk-io/openconfig-public/tree/master/release/models/interfaces
  2. virtual-devices: Virtual devices (networking and storage) that use the same set of statistics as openconfig-interfaces. These devices are not openconfig interfaces, but we have made an extra effort to align their statistics with them. They are documented here: https://github.com/ipdk-io/openconfig-public/blob/master/release/models/virtual-devices/openconfig-virtual-devices.yang

This is meant to work for both software targets like KVM (combination of kernel, vhost-user and vfio-user dataplanes) and hardware targets like DPUs, IPUs & switches.
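
To make the "same set of statistics" idea concrete, here is a minimal Python sketch, not taken from IPDK or OPI. It assumes a small subset of openconfig-interfaces style counter leaves; the names `CommonCounters` and `report` are hypothetical and exist only for illustration. The point is that an interface and a virtual device can report through one counter shape.

```python
from dataclasses import dataclass, asdict


@dataclass
class CommonCounters:
    """Illustrative subset of openconfig-interfaces style counters (assumed)."""
    in_octets: int = 0
    in_unicast_pkts: int = 0
    in_discards: int = 0
    in_errors: int = 0
    out_octets: int = 0
    out_unicast_pkts: int = 0
    out_discards: int = 0
    out_errors: int = 0

    def to_leaves(self) -> dict:
        # YANG leaf names use hyphens, Python field names use underscores.
        return {name.replace("_", "-"): value for name, value in asdict(self).items()}


def report(name: str, kind: str, counters: CommonCounters) -> dict:
    """Wrap counters for any target; `kind` is a free-form label (hypothetical)."""
    return {"name": name, "kind": kind, "counters": counters.to_leaves()}


# The same counter shape serves a physical port and a virtual storage device.
port = report("eth0", "interface", CommonCounters(in_octets=1500, in_unicast_pkts=1))
vdev = report("vblk0", "virtual-device", CommonCounters(out_octets=4096))
```

Because both records share the same counter keys, a collector would not need per-device-type parsing for these statistics.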

@AnuradhaKaruppiah

Here are the statistics published via the DOCA Telemetry Service:
https://docs.nvidia.com/doca/sdk/doca-telemetry-service/index.html#description

In addition, the following ethtool counters are supported:
https://support.mellanox.com/s/article/understanding-mlx5-ethtool-counters

Boris, you talked about starting with system-level counters. I am not sure whether those are covered by these two groups. Can you please give me an example so that I can dig up that info?
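
As a rough illustration of how vendor ethtool counters could feed a common metric set, the Python sketch below parses text in the `name: value` form printed by `ethtool -S` and renames a few counters. The mapping table is an assumption made for this example, not an agreed OPI mapping, and the helper names `parse_ethtool_stats` and `to_common` are hypothetical.

```python
# Assumed, illustrative mapping from mlx5-style ethtool counter names
# to openconfig-interfaces style leaf names. Not an agreed OPI mapping.
VENDOR_TO_COMMON = {
    "rx_bytes_phy": "in-octets",
    "tx_bytes_phy": "out-octets",
    "rx_discards_phy": "in-discards",
    "rx_crc_errors_phy": "in-fcs-errors",
}


def parse_ethtool_stats(text: str) -> dict:
    """Parse 'name: value' lines; skip headers and non-numeric values."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # e.g. the "NIC statistics:" header line
        stats[name.strip()] = int(value)
    return stats


def to_common(stats: dict, mapping: dict = VENDOR_TO_COMMON) -> dict:
    """Keep only mapped counters, renamed to their common names."""
    return {common: stats[vendor] for vendor, common in mapping.items() if vendor in stats}


sample = """NIC statistics:
     rx_bytes_phy: 1000
     tx_bytes_phy: 2000
     rx_crc_errors_phy: 3
     some_vendor_only_counter: 7
"""
common = to_common(parse_ethtool_stats(sample))
```

Counters without an entry in the table are dropped here; a real collector might instead pass them through under a vendor-specific prefix.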

@glimchb
Member Author

glimchb commented Sep 27, 2022

@AnuradhaKaruppiah, there is a good starting list of system-level metrics in #5.

glimchb transferred this issue from opiproject/opi-prov-life on Jan 27, 2023

4 participants