Profiling rav1e

Cargo integrations

Several cargo integrations make profiling much simpler.

Flamegraph

flamegraph works on any platform that has dtrace or perf support.

$ cargo install flamegraph
$ cargo flamegraph -o flame.svg -b rav1e -- ~/sample.y4m -o /dev/null
$ $browser flame.svg

NOTE: Make sure your browser lets you use the SVG's built-in interactivity.

Instruments

cargo-instruments is macOS-only and integrates neatly with the Xcode UI.

$ cargo install cargo-instruments
$ cargo instruments --release --open --bin rav1e -- ~/sample.y4m -o /dev/null

Generic profiling

Perf

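perf is the most common Linux-specific profiler. To get call graphs you need DWARF debug symbols, so make sure the release build includes debug info (for example, debug = true in the [profile.release] section of Cargo.toml).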

$ cargo build --release
$ perf record --call-graph dwarf target/release/rav1e ~/sample.y4m -o /dev/null
$ perf report

Instrumented profiling

uftrace

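uftrace is an ELF-specific tracer. It relies on mcount instrumentation, which rustc emits through the unstable -Z instrument-mcount flag, so the build below requires a nightly toolchain.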

$ cargo rustc --release --bin rav1e -- -Z instrument-mcount
$ uftrace record --no-libcall -D 5 target/release/rav1e ~/sample.y4m -o /dev/null
$ uftrace report

tracing

We use the profiling crate to measure the timing of specific code paths. Building with --features=tracing enables it, using the tracing backend.

Codegen Inspection

Assembly

cargo-show-asm can be used to inspect the assembly generated by the Rust compiler. This helps with micro-optimizations, such as coaxing the compiler into auto-vectorizing a function.

Simple setup:

  • Install cargo-show-asm: cargo install cargo-show-asm
  • Generate the list of functions: cargo asm > fn.list
  • Search for the function you want to analyze: grep cdef_dist fn.list
    • If the function doesn't appear, it may be inlined by the compiler. You can temporarily add #[inline(never)] to the function to bypass this issue.
  • Generate ASM for your function: cargo asm rav1e::rdo::cdef_dist_wxh > out.asm (you can pass the full function name, a pattern, or the function's sequential number from the list)
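The #[inline(never)] step can be illustrated with a minimal sketch. The function sum_abs_diff below is hypothetical and not part of rav1e. It is written so that it stays a standalone symbol that cargo asm can list. Zipping the two slices avoids per-element bounds checks, which usually makes a loop like this easier for LLVM to auto-vectorize. Whether it actually vectorizes depends on the target and the optimization level, so confirm it in the generated assembly.

```rust
/// Hypothetical example function (not from rav1e): sum of absolute
/// differences between two byte slices, a typical SIMD-friendly kernel.
///
/// #[inline(never)] is added temporarily so the function keeps its own
/// symbol and shows up in `cargo asm` output instead of being inlined.
#[inline(never)]
pub fn sum_abs_diff(a: &[u8], b: &[u8]) -> u32 {
    // zip() stops at the shorter slice and needs no bounds checks,
    // which tends to help the auto-vectorizer.
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

fn main() {
    let a = [10u8, 20, 30, 40];
    let b = [12u8, 18, 30, 45];
    // |10-12| + |20-18| + |30-30| + |40-45| = 2 + 2 + 0 + 5
    println!("{}", sum_abs_diff(&a, &b));
}
```

Remove the attribute once you are done inspecting, because it also prevents inlining in the real build.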