The nvidia-metrics repository uses the Nvidia Management Library (NVML), a C-based API for monitoring and managing Nvidia GPUs, to report GPU usage statistics such as temperature, power consumption, and memory usage. It is intended for developers working on high-performance computing, machine learning, and other GPU-intensive tasks. The gathered metrics are exposed to Prometheus and can be visualized in Grafana. The config/metrics.yaml file specifies which metrics are collected; edit it to define and adjust metrics and labels.
These instructions cover installing the prerequisites, building the project, and running the application.
Usage
  -config string
        Path to the configuration file (default "config/metrics.yaml")
  -filelog string
        Enable file logging (default "false")
  -host string
        Host to run the metrics server (default "0.0.0.0")
  -interval string
        Time interval in seconds to scrape metrics (default "5")
  -logfile string
        Log file path (default "logs/gpu-metrics.log")
  -loglevel string
        Log level (debug, info, warn, error,fatal) (default "info")
  -port string
        Port to run the metrics server (default "9500")
To use this repository, you should have the Nvidia CUDA toolkit installed on your system.
You can install the toolkit via:
sudo apt-get install nvidia-cuda-toolkit
Clone the repository to your local machine.
git clone <repo_link>
Navigate to the cloned directory.
cd nvidia-metrics
Compile the project.
make
After the project has been compiled, run the resulting binary.
./nvidiaMetrics --config config/metrics.yaml
To run the application in a container instead, pass the settings as environment variables:
docker run -e CONFIG_FILE=/path/to/config.yaml \
  -e LOG_LEVEL=debug \
  -e PORT=8080 \
  -e HOST=0.0.0.0 \
  -e INTERVAL=10 \
  your-image-name
- NVML - A C-based API for monitoring and managing Nvidia GPUs, accessed here from Go.
- CUDA - A parallel computing platform and programming model developed by Nvidia for general computing on GPUs.
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License.
Please feel free to contact the project maintainers if you encounter any issues or have any enquiries about the repository.
We hope you find this repository useful in your venture!