DAOS-8331 client: Add client side metrics (#14030) #14204

Merged (3 commits) on Apr 29, 2024

Commits on Apr 24, 2024

  1. DAOS-11626 test: Adding MD on SSD metrics tests (#13661)

    Adding tests for WAL commit, replay, and checkpoint metrics.
    
    Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
    phender authored and mjmac committed Apr 24, 2024
    e8c30b5
  2. nuke WAL metrics

    Required-githooks: true
    
    Change-Id: I17ac20dd02462edcf09af1267a66e31d39eac691
    Signed-off-by: Michael MacDonald <mjmac@google.com>
    mjmac committed Apr 24, 2024
    6b702f1
  3. DAOS-8331 client: Add client side metrics (#14030)

    This commit comprises two separate patches to enable optional
    collection and export of client-side telemetry.
    
    The daos_agent configuration file includes new parameters to control
    collection and export of per-client telemetry. If the telemetry_port option
    is set, then per-client telemetry will be published in Prometheus format
    for real-time sampling of client processes. By default, the client telemetry
    will be automatically cleaned up on client exit, but may be optionally
    retained for some amount of time after client exit in order to allow for
    a final sample to be read.
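As a rough illustration of sampling the exported telemetry, the Python sketch below fetches the Prometheus text output from the agent's telemetry_port and parses it into a dict of series to values. The /metrics path, the default host, and the helper names are assumptions, not taken from this change. The parser handles only simple exposition lines (it ignores comments and optional timestamps, and does not handle label values containing '}').

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse simple Prometheus text-format lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. An optional
    trailing timestamp is ignored. Label values containing '}' are not handled.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Series with labels: name{...} value [timestamp]
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            # Series without labels: name value [timestamp]
            series, *rest = line.split()
        samples[series] = float(rest[0])
    return samples


def scrape(host="localhost", port=9192, path="/metrics", timeout=5.0):
    """Fetch and parse metrics from a telemetry endpoint (path is assumed)."""
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling scrape(port=9192) repeatedly would give real-time samples; whether a final sample is still available after a client exits depends on telemetry_retain.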
    
    Example daos_agent.yml updates:
    telemetry_port: 9192 # export on port 9192
    telemetry_enable: true # enable client telemetry for all connected clients
    telemetry_retain: 1m # retain metrics for 1 minute after client exit
    
    If telemetry_enable is false (default), client telemetry may be enabled on
    a per-process basis by setting D_CLIENT_METRICS_ENABLE=1 in the
    environment for clients that should collect telemetry.
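A minimal sketch of enabling telemetry for a single client process from a Python launcher, assuming the client is started as a subprocess. Only the D_CLIENT_METRICS_ENABLE=1 setting comes from this change; the helper names and the client command are hypothetical.

```python
import os
import subprocess


def client_metrics_env(base=None):
    """Return a copy of the environment with client metrics enabled.

    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["D_CLIENT_METRICS_ENABLE"] = "1"
    return env


def run_with_client_metrics(cmd):
    """Run one client process with per-process telemetry enabled."""
    return subprocess.run(cmd, env=client_metrics_env(), check=True)


# Usage (hypothetical client binary):
# run_with_client_metrics(["./my_daos_app", "--some-arg"])
```

Setting the variable only in the child's environment keeps telemetry scoped to the processes that need it, matching the per-process opt-in described above.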
    
    Notes from the first patch by Di:
    
    Move TLS (thread-local storage) to common, so that both client and
    server have TLS to which metrics can be attached.
    
    Add client-side object metrics, enabled by
    export D_CLIENT_METRICS_ENABLE=1. Client metrics are organized
    as "/jobid/pid/xxxxx".
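To make the "/jobid/pid/xxxxx" layout concrete, here is a small pair of hypothetical helpers that build and split such paths. They assume the job ID contains no '/', and treat everything after the PID as the (possibly nested) metric name; the metric name used below is illustrative only.

```python
from typing import NamedTuple


class ClientMetricPath(NamedTuple):
    jobid: str
    pid: int
    metric: str  # remainder of the path; may itself contain '/'


def build_path(jobid, pid, metric):
    """Compose a client metric path of the form /jobid/pid/metric."""
    return f"/{jobid}/{pid}/{metric.lstrip('/')}"


def parse_path(path):
    """Split /jobid/pid/metric into its parts (jobid must not contain '/')."""
    parts = path.strip("/").split("/", 2)
    if len(parts) != 3:
        raise ValueError(f"expected /jobid/pid/metric, got {path!r}")
    jobid, pid, metric = parts
    return ClientMetricPath(jobid, int(pid), metric)
```

Grouping by job first means all processes of one job share a common prefix, which is what makes a per-job query possible.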
    
    During each DAOS thread initialization, another shared memory
    segment (pid/xxx) is created, to which all metrics of that thread
    are attached. This segment is destroyed when the thread exits;
    however, if D_CLIENT_METRICS_RETAIN is set, the client metrics are
    retained and can be retrieved with
    daos_metrics --jobid
    Add D_CLIENT_METRICS_DUMP_PATH to dump the metrics of the current
    thread when it exits.
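The exit-time behavior described above can be modeled as a small decision function. This is a hypothetical model, not DAOS code: it treats D_CLIENT_METRICS_RETAIN as enabled whenever the variable is present (its expected value is not described here), and assumes the dump happens before the segment is retained or destroyed.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExitPolicy:
    retain: bool              # models "D_CLIENT_METRICS_RETAIN is set"
    dump_path: Optional[str]  # models D_CLIENT_METRICS_DUMP_PATH


def policy_from_env(env: Mapping[str, str]) -> ExitPolicy:
    # Assumption: presence of the variable (any value) means "retain".
    return ExitPolicy(
        retain="D_CLIENT_METRICS_RETAIN" in env,
        dump_path=env.get("D_CLIENT_METRICS_DUMP_PATH") or None,
    )


def on_thread_exit(policy: ExitPolicy) -> list:
    """Return the ordered actions applied to a thread's metrics segment."""
    actions = []
    if policy.dump_path:
        # Dump first, while the segment still exists.
        actions.append(("dump", policy.dump_path))
    actions.append(("retain",) if policy.retain else ("destroy",))
    return actions
```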
    
    Fix several telemetry issues related to conv_ptr when reopening
    the shared memory.
    
    Add the daos_metrics --jobid XXX option to retrieve all metrics
    of a job.
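For scripting, retrieval of a job's metrics could be wrapped as follows. Only the daos_metrics --jobid option comes from this change; the helper names are hypothetical, and the output is returned as raw text because its format is not described here.

```python
import shutil
import subprocess


def job_metrics_cmd(jobid, binary="daos_metrics"):
    """Build the command line that retrieves all metrics for one job."""
    return [binary, "--jobid", str(jobid)]


def fetch_job_metrics(jobid):
    """Run daos_metrics for a job and return its raw stdout."""
    binary = shutil.which("daos_metrics")
    if binary is None:
        raise FileNotFoundError("daos_metrics not found on PATH")
    result = subprocess.run(
        job_metrics_cmd(jobid, binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```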
    
    Features: telemetry
    Required-githooks: true
    Change-Id: Ib80ff89f39d259e0dce26e0ae8388318f96a3540
    Co-authored-by: Di Wang <di.wang@intel.com>
    Signed-off-by: Di Wang <di.wang@intel.com>
    Signed-off-by: Michael MacDonald <mjmac@google.com>
    mjmac and Di Wang committed Apr 24, 2024
    a647825