GitHub - AlexanderPuckhaber/onnxruntime: ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Forked from https://github.com/microsoft/onnxruntime

This is the comparison showing just my changes

ONNX Runtime Profiler modification to record Linux perf counters per layer

ONNX Runtime is a framework for running ML models. It can run any model in the popular ONNX format.

perf is a utility on Linux to measure software and hardware event counters. It is especially useful for measuring CPU events.

Important files:

onnxruntime/core/common/perf_profiler.h and onnxruntime/core/common/perf_profiler.cc
- Work with the perf_event_open API.
onnxruntime/core/session/inference_session.cc
- Loads the perf config JSON from the filename given in config_options
- Initializes the perf config object and saves it in the Profiler
onnxruntime/core/common/profiler.h and onnxruntime/core/common/profiler.cc
- Store the perf configuration object.
- Modified EndTimeAndRecordEvent to take in a list of (str, str) pairs to append to the JSON.
onnxruntime/core/framework/sequential_executor.cc
- This is where onnxruntime records profiler info per layer, so the perf profiler is called here.

To build from source:

You need perf installed on your Linux system.
You also need to install libpfm4.
- You may need to modify CMakeLists.txt in this repository to point to your libpfm4 install location (libpfm4 is also called pfm or perfmon). Otherwise, you will get linker errors when you compile.
  - I made changes here (necessary), and here (unsure if necessary)

Build command:

./build.sh --config RelWithDebInfo --build_wheel --parallel

You can add --skip_tests if the tests fail (they did for me).

To install the built python3 package:

With the path to the .whl file from your build:

pip3 install ../onnxruntime/build/Linux/RelWithDebInfo/dist/onnxruntime-1.12.0-cp310-cp310-linux_x86_64.whl

Adding --force-reinstall forces a reinstall, which is useful for testing if you already have the official or a previous version of onnxruntime installed. If you are in a Python virtual environment, reload it to pick up the new package.
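To confirm that Python is importing your freshly built wheel rather than an official release, you can locate the built wheel and compare it with what is installed. This is a minimal sketch. The helper names are hypothetical, and the default build directory mirrors the RelWithDebInfo path in the pip command above.

```python
import glob
import importlib.util
import os
from importlib import metadata
from typing import List, Optional, Tuple


def find_built_wheels(dist_dir: str = "build/Linux/RelWithDebInfo/dist") -> List[str]:
    """List onnxruntime wheels produced by ./build.sh --build_wheel (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(dist_dir, "onnxruntime-*.whl")))


def installed_onnxruntime() -> Optional[Tuple[str, str]]:
    """Return (version, module path) of the importable onnxruntime, or None if absent."""
    spec = importlib.util.find_spec("onnxruntime")  # locates the package without importing it
    if spec is None:
        return None
    try:
        version = metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        version = "unknown"  # e.g. installed under a different distribution name
    return version, str(spec.origin)


if __name__ == "__main__":
    print("built wheels:", find_built_wheels())
    print("installed:", installed_onnxruntime())
```

If the installed version or path does not match the wheel you just built (for example, the path points outside your virtual environment), reinstall with --force-reinstall and reload the environment.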

Usage

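import os
import onnxruntime

model_filename = "sigmoid.onnx"  # path to your ONNX model

sess_options = onnxruntime.SessionOptions()
# enable builtin profiler
sess_options.enable_profiling = True
# specify path to perf configuration json (config key added by this fork)
sess_options.add_session_config_entry("session.profiler.perf_config_file_name", os.path.abspath("perf_config.json"))
sess = onnxruntime.InferenceSession(model_filename, sess_options=sess_options)

# then run sess.run() on your model...

# end_profiling() stops profiling and returns the profile json file name
profile_file = sess.end_profiling()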

Then run sess.run() with your model. A JSON profile file should appear in your working directory. You can open it directly or load it in chrome://tracing.
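The built-in ONNX Runtime profile is a JSON array of trace events. Each event has fields such as "cat", "name", "dur" (microseconds) and "args", and per-node events use the "Node" category. The sketch below assumes, without verification, that this fork stores each perf counter in a node event's "args" under the label you chose in perf_config.json. That assumption is consistent with EndTimeAndRecordEvent appending (str, str) pairs to the JSON. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def load_profile(path: str) -> List[Dict[str, Any]]:
    """Load an ONNX Runtime profile file (a JSON array of trace events)."""
    with open(path) as f:
        return json.load(f)


def _as_number(value: Any) -> Any:
    # Counters are appended as (str, str) pairs, so values may arrive as strings.
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value
    return value


def per_layer_counters(events: Iterable[Dict[str, Any]],
                       counter_labels: Iterable[str]) -> List[Dict[str, Any]]:
    """Collect per-node rows that carry at least one of the configured counter labels."""
    labels = list(counter_labels)
    rows = []
    for ev in events:
        if ev.get("cat") != "Node":
            continue  # skip session-level and other non-node events
        args = ev.get("args", {})
        counters = {k: _as_number(args[k]) for k in labels if k in args}
        if not counters:
            continue  # e.g. fence events without perf data
        rows.append({"name": ev.get("name", ""), "dur_us": ev.get("dur", 0), **counters})
    return rows
```

For example, per_layer_counters(load_profile(profile_file), ["cycles", "instructions"]) lists each layer's duration next to its counters. An empty result is the "There are no perf counters at all" case described under Common runtime errors.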

For a tutorial of running a simple ONNX model with profiling, see: https://onnxruntime.ai/docs/api/python/auto_examples/plot_profiling.html

However, that model is too simple to reach the Sequential Executor (which is what my profiler hooks into). So use a more complex model, such as sigmoid.onnx from here: https://onnxruntime.ai/docs/api/python/auto_examples/plot_load_and_predict.html#sphx-glr-auto-examples-plot-load-and-predict-py
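A sketch of running a model once with random inputs so that the Sequential Executor records per-layer events. It reuses the session settings from the Usage section. The dtype table, the default of 1 for symbolic dimensions and the function names are assumptions of this sketch, not part of the fork. The tutorial above obtains sigmoid.onnx via onnxruntime.datasets.get_example("sigmoid.onnx").

```python
import os
from typing import Dict, List, Sequence, Union

# ORT reports each input dimension as an int, a symbolic name (str), or None.
Dim = Union[int, str, None]

# Assumed mapping from ORT input type strings to numpy dtype names.
ORT_TO_NUMPY: Dict[str, str] = {
    "tensor(float)": "float32",
    "tensor(double)": "float64",
    "tensor(int32)": "int32",
    "tensor(int64)": "int64",
}


def concrete_shape(shape: Sequence[Dim], default_dim: int = 1) -> List[int]:
    """Replace symbolic or unknown dimensions with default_dim."""
    return [d if isinstance(d, int) and d > 0 else default_dim for d in shape]


def profile_once(model_path: str, perf_config_path: str = "perf_config.json") -> str:
    """Run the model once on random inputs and return the profile file name."""
    import numpy as np  # imported lazily so concrete_shape works without numpy
    import onnxruntime

    opts = onnxruntime.SessionOptions()
    opts.enable_profiling = True
    # Config key added by this fork (same as in the Usage section).
    opts.add_session_config_entry(
        "session.profiler.perf_config_file_name", os.path.abspath(perf_config_path)
    )
    sess = onnxruntime.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )
    feeds = {}
    for inp in sess.get_inputs():
        dtype = ORT_TO_NUMPY.get(inp.type)
        if dtype is None:
            raise TypeError(f"unsupported input type {inp.type} for input {inp.name}")
        feeds[inp.name] = np.random.random(tuple(concrete_shape(inp.shape))).astype(dtype)
    sess.run(None, feeds)
    return sess.end_profiling()
```

Calling print(profile_once("sigmoid.onnx")) should print the name of the JSON profile written for that run.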

For a full example, see onnx_profiling_example.py

An example of perf_config.json could be:

{
    "perf::PERF_COUNT_HW_CPU_CYCLES": "cycles",
    "perf::PERF_COUNT_HW_INSTRUCTIONS": "instructions",
    "perf::PERF_COUNT_HW_CACHE_DTLB:READ:ACCESS": "dTLB-loads"
}

Each key is the name of a perf event which libpfm4 can look up and translate to a perf_event_attr for use by perf_event_open.

To find valid perf events for your CPU, use the check_events and showevtinfo programs in the examples folder of your libpfm4 install.

The value can be any name you want for the event. Here, I am using the event names that the perf command-line tool uses.
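Before running a session, you can sanity-check perf_config.json in Python. This sketch only checks the file's shape: labels must be non-empty, labels must be unique, and the number of hardware events should fit the PMU. It cannot tell whether libpfm4 will accept an event string. Use check_events for that. The default limit of 4 comes from the CPU described under Common runtime errors. Treating events containing "PERF_COUNT_SW_" as software events that need no PMU register is a heuristic, and the helper names are hypothetical.

```python
import json
from typing import Dict, List


def check_perf_config(cfg: object, max_hw_counters: int = 4) -> List[str]:
    """Return human-readable problems found in a parsed perf config (hypothetical helper)."""
    if not isinstance(cfg, dict):
        return ["top level must be a JSON object mapping event strings to labels"]
    problems: List[str] = []
    seen_labels: Dict[str, str] = {}
    for event, label in cfg.items():
        if not isinstance(label, str) or not label.strip():
            problems.append(f"{event!r}: label must be a non-empty string")
            continue
        if label in seen_labels:
            problems.append(f"{event!r}: label {label!r} already used by {seen_labels[label]!r}")
        else:
            seen_labels[label] = event
    # Heuristic: perf software events do not occupy PMU counter registers.
    hw_events = [e for e in cfg if "PERF_COUNT_SW_" not in e]
    if len(hw_events) > max_hw_counters:
        problems.append(
            f"{len(hw_events)} hardware events requested; the PMU may only have "
            f"{max_hw_counters} counters, so some may read 0"
        )
    return problems


def load_and_check(path: str, max_hw_counters: int = 4) -> List[str]:
    """Parse a perf config file and check it."""
    with open(path) as f:
        return check_perf_config(json.load(f), max_hw_counters)
```

An empty list from load_and_check("perf_config.json") means the file looks structurally fine.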

Common runtime errors

"Bad file descriptor": this is a problem when calling the perf_event_open API. It could be one of two things:
- The process does not have permission to open perf events: lower /proc/sys/kernel/perf_event_paranoid (e.g. sudo sysctl -w kernel.perf_event_paranoid=2). A value of 2 or lower lets an unprivileged process count user-space events of its own process; 1 or lower also allows kernel-space events.
- The perf configuration JSON has an invalid event string. Make sure the perf events in the JSON are valid and available on your computer. You can use tools like the event checker in the libpfm4 library (which this fork uses) to verify.
All counters are 0, and they shouldn't be
- This happens when you try to pass in too many perf hardware event counters. The CPU has a special Performance Monitoring Unit (PMU) which only has enough registers to record a few hardware counters at once. On my CPU this limit is 4 (3 for cache counters).
  - Solution: remove some hardware events
  - The perf user program (e.g. perf stat) performs multiplexing to support more event counters. Basically, it quickly cycles through which counters it records and reports a percentage of time that the counter was able to be measured. For more information, this is a good read.
There are no perf counters at all
- Check the exact spelling of the configuration key/value
- Make sure your model is complex enough to get to the Sequential Executor, because that's where I put the per-layer profiling. In onnxruntime python examples, sigmoid.onnx was complex enough, but mul_1.onnx wasn't.
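Two small diagnostic helpers for the errors above, as a sketch with hypothetical names. The first reads perf_event_paranoid and summarizes what an unprivileged process may count. The second shows the estimate perf stat reports for multiplexed counters. That estimate is the raw count scaled by time_enabled / time_running, which perf_event_open returns when PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING are requested. A time_running of 0 means the event was never scheduled on the PMU. As far as I know, level 3 exists only on distribution-patched kernels.

```python
from pathlib import Path
from typing import Optional

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_perf_event_paranoid(path: str = PARANOID_PATH) -> Optional[int]:
    """Return the current perf_event_paranoid level, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def unprivileged_access(level: int) -> str:
    """Summarize what an unprivileged process may count at this level."""
    if level <= 1:
        return "user-space and kernel-space events of your own process"
    if level == 2:
        return "user-space events of your own process only"
    # Mainline kernels treat values above 2 like 2; some distributions
    # (e.g. Debian, Ubuntu) patch in a level 3 that blocks unprivileged use.
    return "user-space events only, or nothing on kernels that implement level 3"


def scale_multiplexed(raw_count: int, time_enabled: int, time_running: int) -> Optional[float]:
    """Estimate a multiplexed counter's full-run value, as perf stat does."""
    if time_running <= 0:
        return None  # the event was never scheduled on the PMU
    return raw_count * time_enabled / time_running


if __name__ == "__main__":
    level = read_perf_event_paranoid()
    if level is None:
        print("could not read", PARANOID_PATH)
    else:
        print(f"perf_event_paranoid = {level}: {unprivileged_access(level)}")
```

For example, a counter that read 1000 while scheduled for half of its enabled time would be reported as roughly 2000 by perf stat.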

Why would someone want to use this?

ONNX Runtime already has a built-in profiler which records how much time it takes for each layer to execute. If you just want time info, use that.
- It also has a memory profiler for how much memory each layer uses, which can be optionally enabled with a compiler flag.
ONNX Runtime can be compiled with support for NVTX (NVIDIA Tools Extension), a library for annotating code with ranges and markers that NVIDIA profilers such as Nsight Systems can record alongside GPU activity and performance metrics. It works by adding annotation events to ONNX Runtime that these tools listen for.
Linux perf record can sample at high frequency (1000+ Hz), which is enough to capture performance counter info for functions that run longer than a few milliseconds (the ones we care about).
- perf can also be configured to start/stop recording at particular code breakpoints. This is extremely useful for profiling individual functions or segments of code in long-running programs*.
- However, with both of these approaches it can be difficult to tell the layers of the ML model apart. Convolutions and matrix multiplications from different layers can run in the same (possibly fused) functions for efficiency reasons, so it can be tricky to figure out which layer a given function belongs to.
  - I did once try lining up the perf record timestamps with timestamps from the ONNX Runtime built-in profiler. However, they used different system clocks, and I found I had to modify the ONNX Runtime profiler anyway, so I might as well add a function to record the perf counters per layer.
*In hindsight, this is the approach I should have used for this program: adding specific events to ONNX Runtime that the regular perf user program could simply listen for. Which is what I think ONNX Runtime does with NVTX integration...

Caveats

perf_event_open is called here on the current process's pid, from onnxruntime's Sequential Executor. Child processes and threads created after the counters are opened can be tracked if the events are opened with inheritance enabled. Kernel-space activity can also be counted if perf_event_paranoid is permissive enough. However, counters opened this way do not cover tasks that already exist when they are opened, such as services, or possibly onnxruntime's own thread-pool workers if those were created earlier. I haven't checked whether onnxruntime relies on those.
