A more recent version of this repository can be found at gureya/bwap.
A library for dynamic page placement across NUMA nodes.
The library migrates pages between NUMA nodes while your program runs, with the goal of speeding it up.
In short, the library works as follows:
- Interleave pages across all nodes by default (optimizing for bandwidth).
- Analyze the program's memory mapping via the /proc/self/maps file.
- Use hardware performance counters to monitor the average resource stall rate.
- Move some pages from remote (non-worker) NUMA nodes to local (worker) NUMA nodes. This reduces the available bandwidth, but places pages closer to the cores that request them, which may improve performance when memory access latency is the bottleneck.
- If the average resource stall rate drops, go back to the previous step and move more pages. A lower stall rate means the CPU spends less time idle waiting for a resource; we assume that the loss in memory bandwidth is compensated for by the lower access latency.
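The tuning steps above can be sketched as a simple greedy loop. This is a minimal illustration, not unstickymem's implementation: the function name tune_worker_ratio, the fixed step size, and the stall_rate callback (a stand-in for applying a placement and then reading the hardware performance counters) are all hypothetical.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of the greedy tuning loop.
//   start_ratio: fraction of pages on worker nodes under plain interleaving
//                (e.g. worker_nodes / total_nodes).
//   step:        additional fraction of pages moved to worker nodes per round.
//   stall_rate:  hypothetical callback that applies a placement ratio and
//                returns the measured average stall rate.
// Returns the last ratio that lowered the stall rate.
double tune_worker_ratio(double start_ratio, double step,
                         const std::function<double(double)>& stall_rate) {
  assert(step > 0.0);
  double ratio = start_ratio;
  double best = stall_rate(ratio);
  while (ratio + step <= 1.0 + 1e-9) {
    const double candidate = ratio + step;  // move more pages to worker nodes
    const double rate = stall_rate(candidate);
    if (rate >= best) {
      break;  // no drop in the stall rate: stop tuning
    }
    ratio = candidate;  // stall rate dropped: keep this ratio and continue
    best = rate;
  }
  return ratio;
}
```

For instance, on a machine with two nodes of which one is a worker node, the loop would start at a ratio of 0.5. Note that this sketch does not move back the pages shifted in the last, unsuccessful round; it only reports the last ratio that improved the stall rate.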
To build the library you need:
- cmake -- version 3.5 or newer
- A modern C++ compiler
  - We have used gcc 8 during our testing
  - gcc from version 6 compiles the program, but the binaries haven't been tested
  - clang from version 6 compiles the program, but the binaries haven't been tested
- libnuma-dev -- for the numa.h and libnuma.h headers
To build, run:
- cmake . to generate a Makefile
- make to build the library and tests
You can opt to use the library with or without modifying your program.
To use it without modifying your program, preload the library via LD_PRELOAD so that it runs alongside your program:
LD_PRELOAD=/path/to/libunstickymem.so ./myProgram
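Preloading works without recompiling because the dynamic loader maps the preloaded library into the process before the program's main() runs, and the library can execute its own start-up code at that point. One common way to do this with gcc or clang is a constructor function, sketched below. Whether unstickymem uses exactly this mechanism is an assumption, and the names g_hook_ran and on_library_load are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical flag recording that the start-up hook has run.
static bool g_hook_ran = false;

// gcc and clang run functions marked `constructor` when the object is loaded;
// for an LD_PRELOAD library this happens before the program's main().
// A real library could start its tuning procedure here.
__attribute__((constructor)) static void on_library_load() {
  assert(!g_hook_ran);  // the hook runs once per load
  g_hook_ran = true;
  std::fprintf(stderr, "library loaded: start-up hook ran\n");
}
```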
To use it by modifying your program:
- Include the library header in your program:
  #include <unstickymem/unstickymem.h>
- Call at least one of its functions (otherwise gcc won't actually link the library into your executable)
- See unstickymem/unstickymem.h for the available functions
- Compile your program and link it against the library
You can make the library available to all users on the system: make install installs the library and the required header files system-wide. Run make uninstall to undo the effects of make install.
There are a few options that can change the behavior of the library. These are specified via environment variables.
If set, the library will not tune the program. Instead, it only reports the observed stall rates while shifting memory from remote to worker nodes.
If set, the library will not start the tuning procedure automatically on startup.
Sets a fixed ratio of pages to be placed in the worker nodes, which disables the tuning procedure.
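As a worked example of what a fixed ratio implies, assume (hypothetically) that the ratio is the fraction of a program's pages placed on worker nodes, and that placement starts from plain interleaving over all n nodes, w of which are worker nodes. The worker nodes then already hold a fraction w/n of the P pages, so reaching ratio r requires moving about P * (r - w/n) pages from non-worker nodes. For example, with 1000 pages interleaved over 4 nodes of which 1 is a worker node, the worker node holds 250 pages, and a ratio of 0.5 means moving 250 more. The helper name pages_to_move is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: pages to move from non-worker to worker nodes to go
// from uniform interleaving to a target worker ratio.
//   total_pages:  pages in the region being placed
//   worker_nodes: number of worker (local) NUMA nodes
//   total_nodes:  number of NUMA nodes the pages are interleaved over
//   ratio:        target fraction of pages on worker nodes, in [0, 1]
// Pages only move towards worker nodes, so a ratio at or below the
// interleaved share yields 0.
std::size_t pages_to_move(std::size_t total_pages, unsigned worker_nodes,
                          unsigned total_nodes, double ratio) {
  assert(total_nodes > 0 && worker_nodes <= total_nodes);
  assert(ratio >= 0.0 && ratio <= 1.0);
  // Under uniform interleaving, worker nodes already hold this fraction.
  const double interleaved = static_cast<double>(worker_nodes) / total_nodes;
  if (ratio <= interleaved) return 0;
  const double extra = (ratio - interleaved) * static_cast<double>(total_pages);
  return static_cast<std::size_t>(std::llround(extra));
}
```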
- We are using the CMake build system for this library.
- src contains all source files.
- include contains all header files. Each library uses its own subfolder in order to reduce collisions when installed system-wide (following the Google C++ Style Guide).
- A few example programs are included in the test subfolder.
- The higher-level logic is found in unstickymem.cpp.
- The logic to view/parse/modify the process memory map is in MemoryMap.cpp and MemorySegment.cpp.
- The logic to deal with hardware performance counters is in PerformanceCounters.cpp.
- Utility functions to simplify page placement and migration are in PagePlacement.cpp.
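Each line of /proc/self/maps describes one mapping: an address range, permissions, offset, device, inode and an optional path (for example a file name, [heap] or [stack]). The sketch below parses that format. It is only an illustration in the spirit of MemoryMap.cpp and MemorySegment.cpp, not their actual code, and the Segment type and the parse_maps_line and read_self_maps functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one line of /proc/self/maps, e.g.
// "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon"
struct Segment {
  std::uintptr_t start = 0;
  std::uintptr_t end = 0;  // exclusive
  std::string perms;       // e.g. "rw-p"
  std::string path;        // empty for anonymous mappings
};

// Parses one maps line; returns false if the line is malformed.
bool parse_maps_line(const std::string& line, Segment& seg) {
  seg = Segment{};
  std::istringstream in(line);
  std::string range, offset, dev, inode;
  if (!(in >> range >> seg.perms >> offset >> dev >> inode)) return false;
  const std::size_t dash = range.find('-');
  if (dash == std::string::npos) return false;
  try {
    seg.start = std::stoull(range.substr(0, dash), nullptr, 16);
    seg.end = std::stoull(range.substr(dash + 1), nullptr, 16);
  } catch (const std::exception&) {
    return false;  // address range is not valid hex
  }
  // Whatever follows the inode (possibly nothing) is the path.
  in >> std::ws;
  std::getline(in, seg.path);
  return true;
}

// Reads the current process's memory map (Linux only).
std::vector<Segment> read_self_maps() {
  std::vector<Segment> segments;
  std::ifstream maps("/proc/self/maps");
  std::string line;
  while (std::getline(maps, line)) {
    Segment seg;
    if (parse_maps_line(line, seg)) {
      assert(seg.start < seg.end);  // the kernel does not report empty mappings
      segments.push_back(seg);
    }
  }
  return segments;
}
```

A placement tool could, for example, keep only writable segments (perms starting with "rw") as candidates for migration.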
If you find a bug or would like a feature added, please file a new issue!
Pull requests are welcome.
Check for issues with the help-wanted tag -- these are usually good first issues, or areas where development has stalled.