In this demo, we look at how to:
- collect a training corpus for training a `-Oz` inliner policy
- perform that training
- use the trained policy to build a 'release' clang which embeds the policy
For the first two steps, we use a 'development' mode clang which depends on TFLite and allows swapping policies from the command line - this is necessary so the reinforcement learning training loop can iterate over and improve policies.
The 'release' mode compiler does not have this option. The policy is fixed and embedded in the compiler, and a user opts in to using it via a command line flag (`-mllvm -enable-ml-inliner=release`). The release compiler also has no runtime TensorFlow dependencies.
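For illustration, opting a single compilation into the embedded policy would look something like the following (the source and output file names are just placeholders):
# Hypothetical example: compile one file for size using the release-mode ML inlining policy.
clang -Oz -mllvm -enable-ml-inliner=release -c foo.c -o foo.o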
For corpus extraction, any project that is built with clang and uses a build system that produces a compilation database json file (like cmake's `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`) would work. The larger the number of modules, the better (for training). For this demo, we're using the Fuchsia project.
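As a sketch of what this looks like for a generic CMake-based project (the source and build directories below are hypothetical), the compilation database can be produced with:
# Ask CMake to emit compile_commands.json alongside the generated build files.
cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -S ~/some-project -B ~/some-project/build
# The database then appears at ~/some-project/build/compile_commands.json.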
We assume all repositories are installed under `$HOME`, e.g. you should have (after the next step) a `~/fuchsia`, `~/ml-compiler-opt`, etc.
Follow the instructions available at https://llvm.org/docs/GettingStarted.html. In most cases, it should be as simple as:
cd ~ && git clone https://github.com/llvm/llvm-project.git
Typical prerequisites:
sudo apt-get install cmake ninja-build lld
export LLVM_SRCDIR=~/llvm-project
export LLVM_INSTALLDIR=~/llvm-install
See instructions at https://fuchsia.dev/fuchsia-src/get-started/get_fuchsia_source. Make sure `PATH` is set up appropriately.
export IDK_DIR=~/fuchsia-idk
export SYSROOT_DIR=~/fuchsia-sysroot
export FUCHSIA_SRCDIR=~/fuchsia
We also need the Fuchsia IDK and sysroots, installed via `cipd`. This should amount to:
cipd install fuchsia/sdk/core/linux-amd64 latest -root ${IDK_DIR}
cipd install fuchsia/sysroot/linux-arm64 latest -root ${SYSROOT_DIR}/linux-arm64
cipd install fuchsia/sysroot/linux-amd64 latest -root ${SYSROOT_DIR}/linux-x64
Note: If your shell can't find the `cipd` command, it's likely your `$PATH` variable doesn't contain the path to `.jiri_root/bin`. To add this to the path, run the following command:
export PATH=$PATH:$(realpath $FUCHSIA_SRCDIR/.jiri_root/bin)
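As a quick, optional sanity check that the command is now resolvable:
command -v cipd || echo "cipd still not found; check FUCHSIA_SRCDIR and PATH"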
We need to make sure the git revision of llvm is one that works with the version of the Fuchsia tree.
To get the git hash at which we know this version of Fuchsia will build, do:
cd ${FUCHSIA_SRCDIR}
jiri package 'fuchsia/third_party/clang/${platform}'
The output will contain a `git_revision` line, for example `fa4c3f70ff0768a270b0620dc6d158ed1205ec4e`. Copy that hash and then (using the example):
cd ${LLVM_SRCDIR}
git checkout fa4c3f70ff0768a270b0620dc6d158ed1205ec4e
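If you prefer not to copy the hash by hand, it can also be extracted from the jiri output - this assumes the output embeds the revision as `git_revision:<hash>`, as shown above:
# Pull the revision out of the jiri package metadata and check it out.
cd ${FUCHSIA_SRCDIR}
CLANG_REV=$(jiri package 'fuchsia/third_party/clang/${platform}' | sed -n 's/.*git_revision:\([0-9a-f]*\).*/\1/p' | head -n1)
cd ${LLVM_SRCDIR} && git checkout ${CLANG_REV}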
This is this repository (ml-compiler-opt). We need to install all of the Python dependencies for this repository, set up some environment variables for the AOT compiler, and also build the TFLite dependency needed to compile LLVM in MLGO 'development' mode.
See also the buildbot script for reference.
cd ~/ml-compiler-opt
sudo apt-get install python3-pip
pip3 install pipenv
pipenv sync --system
TF_PIP=$(python3 -m pip show tensorflow | grep Location | cut -d ' ' -f 2)
export TENSORFLOW_AOT_PATH="${TF_PIP}/tensorflow"
export TFLITE_PATH=~/tflite
mkdir ${TFLITE_PATH}
cd ${TFLITE_PATH}
~/ml-compiler-opt/buildbot/build_tflite.sh
Build LLVM with the 'development' ML mode, and additional Fuchsia-specific settings:
cd ${LLVM_SRCDIR}
mkdir build
cd build
cmake -G Ninja \
-DLLVM_ENABLE_LTO=OFF \
-DLINUX_x86_64-unknown-linux-gnu_SYSROOT=${SYSROOT_DIR}/linux-x64 \
-DLINUX_aarch64-unknown-linux-gnu_SYSROOT=${SYSROOT_DIR}/linux-arm64 \
-DFUCHSIA_SDK=${IDK_DIR} \
-DCMAKE_INSTALL_PREFIX= \
-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=On \
-C ${LLVM_SRCDIR}/clang/cmake/caches/Fuchsia-stage2.cmake \
-C ${TFLITE_PATH}/tflite.cmake \
${LLVM_SRCDIR}/llvm
ninja toolchain-distribution
DESTDIR=${LLVM_INSTALLDIR} ninja install-toolchain-distribution-stripped
cd ${FUCHSIA_SRCDIR}
python3 scripts/clang/generate_runtimes.py --clang-prefix=$LLVM_INSTALLDIR --sdk-dir=$IDK_DIR --build-id-dir=$LLVM_INSTALLDIR/lib/.build-id > $LLVM_INSTALLDIR/lib/runtime.json
NOTE 1: The only flag specific to MLGO is `-C ${TFLITE_PATH}/tflite.cmake`.
NOTE 2: Fuchsia's `clang/cmake/caches/Fuchsia-stage2.cmake` enables the new pass manager by default, so we don't need to request it explicitly at compile time; it is, however, a requirement for the MLGO project (all our work assumes the new pass manager).
NOTE 3: The Python executable should be set explicitly with `-DPython3_EXECUTABLE=/path/to/compatible/python` if there are multiple (particularly, newer) Python executables on the system.
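For instance, if the configure step picked up an incompatible interpreter, the existing build directory can be reconfigured with the flag appended (the interpreter path below is only an example):
cd ${LLVM_SRCDIR}/build
cmake -DPython3_EXECUTABLE=/usr/bin/python3.10 .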
cd ${FUCHSIA_SRCDIR}
fx set core.x64 \
--args='clang_prefix="${LLVM_INSTALLDIR}/bin"' \
--args=clang_embed_bitcode=true \
--args='optimize="size"' \
--args='clang_ml_inliner=false'
fx build
We set `clang_ml_inliner` to false here because by default it is set to true, but that wouldn't work yet since we don't have a model to embed.
Fuchsia build conveniently generates a size report. Let's copy it for reference.
Note
The `clang_prefix` is the absolute path of `$LLVM_INSTALLDIR/bin` (replace it with yours). The `--args=clang_embed_bitcode=true` option above adds the compilation flag `-Xclang=-fembed-bitcode=all`. This can be seen in the compilation database. The effect of this is that the object files have the LLVM bitcode produced by clang, before the optimization passes, and the clang command line, captured in the `.llvmbc` and `.llvmcmd` sections, respectively. This is the mechanism by which we extract our corpus.
Naturally, the effect of this is that the object files, and the linked binaries, are larger. Fuchsia strips the final object; but, more importantly, the size report captures multiple dimensions besides the final file size - including, for example, the size of the text section.
cp out/default/elf_sizes.json /tmp/orig_sizes.json
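To spot-check that the bitcode and command-line sections described in the note above are actually present, you can inspect any object file from the Fuchsia build output (which file you pick doesn't matter):
# Grab an arbitrary object file from the build and list its .llvmbc / .llvmcmd sections.
OBJ=$(find out/default -name '*.o' | head -n1)
$LLVM_INSTALLDIR/bin/llvm-readelf --sections "$OBJ" | grep -E '\.llvm(bc|cmd)'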
cd ${FUCHSIA_SRCDIR}
fx compdb
This produces a `compile_commands.json` compilation database, akin to cmake's.
Install the corpus extraction utilities:
pip3 install mlgo-utils
and then run the `extract_ir` script to extract the corpus:
export CORPUS=$HOME/corpus
cd ~/ml-compiler-opt
extract_ir \
--cmd_filter="^-O2|-Os|-Oz$" \
--input=$FUCHSIA_SRCDIR/out/default/compile_commands.json \
--input_type=json \
--llvm_objcopy_path=$LLVM_INSTALLDIR/bin/llvm-objcopy \
--output_dir=$CORPUS
If you get an error saying the `extract_ir` script cannot be found, make sure the local binary directory that Python installs scripts to is in your `$PATH`. In most cases this is `~/.local/bin`.
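On most Linux setups this amounts to:
export PATH="$HOME/.local/bin:$PATH"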
export DEFAULT_TRACE=$HOME/default_trace
export DEFAULT_VOCAB=compiler_opt/rl/inlining/vocab
Collect traces from the default heuristic, to kick off the training process.
NOTE the double and single quotes for the `--gin_bindings` values - this is because the last value must appear, syntactically, as a Python string.
rm -rf $DEFAULT_TRACE &&
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/tools/generate_default_trace.py \
--data_path=$CORPUS \
--output_path=$DEFAULT_TRACE \
--gin_files=compiler_opt/rl/inlining/gin_configs/common.gin \
--gin_bindings=config_registry.get_configuration.implementation=@configs.InliningConfig \
--gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
--gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
--sampling_rate=0.2
Generate vocab for the generated trace. This is an optional step and should be triggered if the set of features or the distribution of features in the trace changes.
rm -rf $DEFAULT_VOCAB &&
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/tools/generate_vocab.py \
--gin_files=compiler_opt/rl/inlining/gin_configs/common.gin \
--input=$DEFAULT_TRACE \
--output_dir=$DEFAULT_VOCAB
Note
The generate_vocab.py tool optionally accepts two additional flags, `--sampling_fraction` and `--parallelism`: `sampling_fraction` downsamples the input features and `parallelism` controls the degree of parallelism. These flags can be tuned to reduce the memory footprint and improve the execution speed of the vocab generator.
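For example, a run that downsamples the features and bounds the worker count might look like this (the specific values are only illustrative):
rm -rf $DEFAULT_VOCAB &&
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/tools/generate_vocab.py \
--gin_files=compiler_opt/rl/inlining/gin_configs/common.gin \
--input=$DEFAULT_TRACE \
--output_dir=$DEFAULT_VOCAB \
--sampling_fraction=0.1 \
--parallelism=4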
export WARMSTART_OUTPUT_DIR=$HOME/warmstart
export OUTPUT_DIR=$HOME/model
Train a behavioral cloning model based on the above trace; it mimics the default inlining behavior. This is the 'warmstart' model.
rm -rf $WARMSTART_OUTPUT_DIR && \
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/rl/train_bc.py \
--root_dir=$WARMSTART_OUTPUT_DIR \
--data_path=$DEFAULT_TRACE \
--gin_files=compiler_opt/rl/inlining/gin_configs/behavioral_cloning_nn_agent.gin
Starting from the warmstart model, train the optimized model. This will take about half a day.
rm -rf $OUTPUT_DIR && \
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/rl/train_locally.py \
--root_dir=$OUTPUT_DIR \
--data_path=$CORPUS \
--gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
--gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
--gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
--gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/saved_policy\"
You may also start TensorBoard to monitor the training process with:
tensorboard --logdir=$OUTPUT_DIR
Mainly check the reward_distribution section for model performance. It includes the average reward and percentiles of the reward distribution during training. A positive reward means an improvement over the heuristic, and a negative reward means a regression.
Optionally, if you are interested in seeing how the trained policy (`$OUTPUT_DIR/saved_policy`) performs on a given corpus (take the training corpus `$CORPUS` as an example), the following command line generates a csv-format report at `$OUTPUT_PERFORMANCE_PATH` with 4 columns: module_name, identifier (`default` in the inlining case), size under the heuristic, and size under the trained policy.
export OUTPUT_PERFORMANCE_PATH=$HOME/performance_report && \
PYTHONPATH=$PYTHONPATH:. python3 \
compiler_opt/tools/generate_default_trace.py \
--data_path=$CORPUS \
--policy_path=$OUTPUT_DIR/saved_policy \
--output_performance_path=$OUTPUT_PERFORMANCE_PATH \
--gin_files=compiler_opt/rl/inlining/gin_configs/common.gin \
--gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
--gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
--sampling_rate=0.2
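As a rough way to summarize that report - assuming the comma-separated, four-column layout described above, with the heuristic and policy sizes in the third and fourth columns - the totals can be compared with a small awk one-liner:
# Sum size-under-heuristic (col 3) and size-under-policy (col 4), skipping non-numeric rows, and print the relative change.
awk -F, 'NF>=4 && $3+0==$3 {heur+=$3; pol+=$4} END {printf "heuristic: %d, policy: %d, delta: %.2f%%\n", heur, pol, 100*(pol-heur)/heur}' $OUTPUT_PERFORMANCE_PATH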
We need to build the 'release' mode of the compiler. Currently, that means overwriting the model in `llvm/lib/Analysis/models/inliner`.
cd $LLVM_SRCDIR
rm -rf llvm/lib/Analysis/models/inliner/*
cp -rf $OUTPUT_DIR/saved_policy/* llvm/lib/Analysis/models/inliner/
Setup the release build:
mkdir build-release
cd build-release
cmake -G Ninja \
-DLLVM_ENABLE_LTO=OFF \
-DLINUX_x86_64-unknown-linux-gnu_SYSROOT=${SYSROOT_DIR}/linux-x64 \
-DLINUX_aarch64-unknown-linux-gnu_SYSROOT=${SYSROOT_DIR}/linux-arm64 \
-DFUCHSIA_SDK=${IDK_DIR} \
-DCMAKE_INSTALL_PREFIX= \
-DTENSORFLOW_AOT_PATH=${TENSORFLOW_AOT_PATH} \
-C ${LLVM_SRCDIR}/clang/cmake/caches/Fuchsia-stage2.cmake \
${LLVM_SRCDIR}/llvm
export LLVM_INSTALLDIR_RELEASE=$LLVM_INSTALLDIR-release
ninja toolchain-distribution
DESTDIR=${LLVM_INSTALLDIR_RELEASE} ninja install-toolchain-distribution-stripped
cd ${FUCHSIA_SRCDIR}
python3 scripts/clang/generate_runtimes.py \
--clang-prefix=$LLVM_INSTALLDIR_RELEASE \
--sdk-dir=$IDK_DIR \
--build-id-dir=$LLVM_INSTALLDIR_RELEASE/lib/.build-id > $LLVM_INSTALLDIR_RELEASE/lib/runtime.json
NOTE 1: If you are using LLVM-at-head instead of an exact repro, there is an additional flag, `-DLLVM_INLINER_MODEL_PATH=`, that you need to set to the path to your model. If you set the flag to `download`, then the latest compatible model release from github will be downloaded.
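For example, to embed the freshly trained policy in an LLVM-at-head build (only applicable in that scenario, per the note above), the release build directory can be reconfigured and rebuilt with:
cd ${LLVM_SRCDIR}/build-release
# Point the release build at the trained policy (or use -DLLVM_INLINER_MODEL_PATH=download).
cmake -DLLVM_INLINER_MODEL_PATH=$OUTPUT_DIR/saved_policy .
ninja toolchain-distribution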
NOTE 2: The only flag specific to MLGO is `TENSORFLOW_AOT_PATH`, which replaces the `-C ${TFLITE_PATH}/tflite.cmake` used earlier.
cd ${FUCHSIA_SRCDIR}
fx set core.x64 \
--args='clang_prefix="${LLVM_INSTALLDIR_RELEASE}/bin"' \
--args='optimize="size"' \
--args=clang_ml_inliner=true
fx build
Fuchsia has a nice utility for comparing the sizes of binaries' text sections; however, it currently does so indiscriminately for all targets - including those compiled with `-O3` (which are unchanged). We can filter those out to see the `-Oz` effect:
python3 -m pip install --user tabulate
scripts/compare_elf_sizes.py \
/tmp/orig_sizes.json \
out/default/elf_sizes.json \
--field code