🐕 Batch: Resource monitoring with different input scenarios and systems #1162
Comments
Hey @Malikbadmus thank you for the issue! Could you describe the issue in a few lines? Feel free to copy what you mentioned on Slack here. :) In the Objectives section, list out a few objectives. I can think of two: resource monitoring for the Python binary running the model, and for the Docker container running the model. Please also link relevant documentation (e.g. on docker stats and psutil, or any other module you propose to use for this purpose).
Summary
We want to monitor and manage performance and the various resources (such as CPU, memory, peak memory, and possibly others) used throughout a model's runtime.
Monitoring and tracking these resources can be complex, because every system exposes these metrics differently. For example, reading Docker memory usage statistics through cgroups is Linux-specific; Windows and macOS do not use this filesystem layout for resource management. We therefore have to tailor our approach to account for containers on other systems (for example WSL2 on Windows, or Docker Desktop).
Also, cgroup-v2 only exposes a subset of memory stats: fields like `max_usage` and `failcnt` were not implemented in this version and are therefore not supported by the Docker driver.
We are also interested in the total physical memory consumed by the process, which means accounting for memory that was cached or swapped to disk during the run.
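As a rough illustration of the process-level side of this, the sketch below uses `psutil` to take a memory snapshot and to approximate peak RSS by polling. The names `memory_snapshot` and `PeakRSSSampler` are hypothetical helpers, not part of this project or of `psutil`. The `swap` field is only reported on Linux, and a polled peak can miss short spikes between samples.

```python
import threading

import psutil


def memory_snapshot(proc):
    """Return a dict of memory figures (in bytes) for a psutil.Process."""
    info = proc.memory_info()
    snap = {"rss": info.rss, "vms": info.vms}
    try:
        full = proc.memory_full_info()  # may need elevated privileges
    except psutil.AccessDenied:
        return snap
    snap["uss"] = full.uss
    # `swap` exists only on Linux; report None on other platforms.
    snap["swap"] = getattr(full, "swap", None)
    return snap


class PeakRSSSampler:
    """Poll a process's RSS in a background thread and keep the maximum seen."""

    def __init__(self, pid=None, interval=0.05):
        self.proc = psutil.Process(pid)  # pid=None means the current process
        self.interval = interval
        self.peak_rss = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _sample(self):
        try:
            self.peak_rss = max(self.peak_rss, self.proc.memory_info().rss)
            return True
        except psutil.NoSuchProcess:
            return False

    def _run(self):
        while not self._stop.is_set() and self._sample():
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self._sample()  # one last sample so very short runs record something


if __name__ == "__main__":
    with PeakRSSSampler(interval=0.01) as sampler:
        payload = bytearray(50 * 1024 * 1024)  # stand-in for a model run
    print("peak RSS (bytes):", sampler.peak_rss)
    print(memory_snapshot(psutil.Process()))
```

Passing the PID of a separately launched model process to `PeakRSSSampler` would monitor that process instead of the current one.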
Related to #1090
Objective(s)
Use `psutil`, which is platform independent, instead of `tracemalloc`. `tracemalloc` does not provide memory information at the OS level and only tracks Python's internal allocations (pymalloc). It also wraps each malloc call to track memory, which adds to the memory overhead and slows down the run.
Documentation
This blog post, referenced in the official psutil documentation, is a great read.
Also, Docker APIs for getting several resource metrics pertaining to a container can be found here.
The one pertaining to our issue can be found here: Docker APIs
`memory.peak` was added to `cgroups-v2` by the Linux maintainers here.
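To show how the container-level figures could be read on Linux, here is a minimal sketch that prefers the cgroup v2 files (`memory.current`, `memory.peak`) and falls back to the cgroup v1 names (`memory.usage_in_bytes`, `memory.max_usage_in_bytes`). The helper `cgroup_memory` and the default directory are assumptions for illustration: the actual path depends on the host and container setup (v1 typically uses a `memory/` subdirectory), and `memory.peak` is only present on kernels that include that change.

```python
from pathlib import Path

# Candidate file names inside a cgroup directory, v2 first, then v1.
CURRENT_FILES = ("memory.current", "memory.usage_in_bytes")
PEAK_FILES = ("memory.peak", "memory.max_usage_in_bytes")


def _read_first_int(cgroup_dir, names):
    """Return the integer from the first readable file in `names`, else None."""
    for name in names:
        try:
            return int((Path(cgroup_dir) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # missing, unreadable, or not a plain integer
    return None


def cgroup_memory(cgroup_dir="/sys/fs/cgroup"):
    """Read current and peak memory (bytes) for a cgroup; None if unavailable.

    The default directory is an assumption; pass the path that matches
    your system's cgroup layout.
    """
    return {
        "current": _read_first_int(cgroup_dir, CURRENT_FILES),
        "peak": _read_first_int(cgroup_dir, PEAK_FILES),
    }


if __name__ == "__main__":
    print(cgroup_memory())
```

On Windows and macOS hosts these files do not exist, so both values come back as `None` and another source (such as the Docker API) would be needed.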