The split utility operates on an LLVM bitcode file, splitting all of the functions in it into separate (.bc) files.
Usage:
./split-llvm-extract <program-to-split>.bc -o <output-dir>
You can read about the LLVM splitter project in the CapableVMs documentation repository and in this Google Doc.
The minimum supported LLVM version is 13.
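One way to confirm that an installation meets this minimum is to inspect the output of llvm-config --version. The following is a minimal sketch, not part of this repository: the helper name llvm_version_ok is hypothetical, and it assumes the version string begins with the numeric major version (for example 13.0.0).

```shell
# Hypothetical helper: succeeds if the given LLVM version string has a
# major version of 13 or newer. Returns 2 if the major part is not numeric.
llvm_version_ok() {
    version="$1"
    major="${version%%.*}"          # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;    # empty or non-numeric major version
    esac
    [ "$major" -ge 13 ]
}

# Only query llvm-config if it is actually available.
LLVM_CONFIG="${LLVM_CONFIG:-llvm-config}"
if command -v "$LLVM_CONFIG" >/dev/null 2>&1; then
    if llvm_version_ok "$("$LLVM_CONFIG" --version)"; then
        echo "LLVM version OK"
    else
        echo "LLVM 13 or newer is required" >&2
    fi
fi
```

Setting LLVM_CONFIG before running the snippet points the check at a from-source build instead of the system installation.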
To build the split utility, run make split-llvm-extract. By default, this will build the splitter using the clang and LLVM libraries installed on your system.
If you built LLVM from source, set CXX, LLVM_CONFIG, LLVM_LINK and LLVM_EXTRACT, and then run:
make CXX=$CXX \
LLVM_CONFIG=$LLVM_CONFIG \
LLVM_LINK=$LLVM_LINK \
LLVM_EXTRACT=$LLVM_EXTRACT \
split-llvm-extract
For example, if you built CHERI LLVM using cheribuild, set CHERI to your cheribuild installation directory (which is ~/cheri/output/<SDK> by default) and run:
make CXX=$CHERI/bin/clang++ \
LLVM_CONFIG=$CHERI/bin/llvm-config \
LLVM_LINK=$CHERI/bin/llvm-link \
LLVM_EXTRACT=$CHERI/bin/llvm-extract \
split-llvm-extract
Steps specific to CHERI cross-compilation can be found in .buildbot.sh.
Once you have built the split utility, you will need to build the programs you wish to apply it to. Each input program needs to be compiled to a single LLVM bitcode file.
Obtaining the final "fat" LLVM bitcode file can, theoretically, be done in multiple ways. The recommended way is to use the CapableVMs fork of "Go Whole Program LLVM", gllvm. For example, to get the bitcode from a generic Makefile project:
- Install gllvm:
  go get github.com/capablevms/gllvm/cmd/...
- Build with gllvm:
  GCLANG=$HOME/go/bin/gclang
  export GET_BC=$HOME/go/bin/get-bc
  CC=$GCLANG LLVM_COMPILER_PATH=<path_to_llvm>/bin make
- Extract the bitcode from the built binary with get-bc ($GET_BC above).
Finally, the split utility can be launched with:
./split-llvm-extract path/to/binary.bc -o outdir
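The steps above, from building with gllvm through to splitting, can be sketched as a single script. This is an illustration under stated assumptions, not the project's own tooling: the function names run and split_project are hypothetical, the project is assumed to produce a single binary at the path given as the first argument, and get-bc is assumed to accept -o <file> to name the extracted bitcode file. Set DRY_RUN=1 to print the commands without executing them.

```shell
# Print a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Hypothetical wrapper around the documented steps.
#   $1: path to the binary produced by the project's Makefile (assumption)
#   $2: directory containing the LLVM tools (<path_to_llvm>/bin)
#   $3: output directory for the split parts
split_project() {
    binary="$1"
    llvm_bin="$2"
    outdir="$3"
    gclang="$HOME/go/bin/gclang"
    get_bc="$HOME/go/bin/get-bc"

    # Build the project with gclang as the C compiler.
    run env CC="$gclang" LLVM_COMPILER_PATH="$llvm_bin" make &&
    # Extract the whole-program bitcode (assumes get-bc supports -o).
    run env LLVM_COMPILER_PATH="$llvm_bin" "$get_bc" -o "$binary.bc" "$binary" &&
    # Split the bitcode into one file per function.
    run ./split-llvm-extract "$binary.bc" -o "$outdir"
}

# Example (prints the commands only):
#   DRY_RUN=1 split_project ./prog /usr/lib/llvm-13/bin outdir
```

Chaining the steps with && stops the pipeline at the first failing command, so a broken build never reaches the splitter.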
To run the split version of the binary, the parts (in outdir) must be joined together as shared libraries. A "joiner" Makefile can be found here.
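To give a rough idea of what turning the parts into shared libraries can involve, the sketch below compiles every .bc file in a directory into a shared object with clang. It is not the joiner Makefile and may differ from it substantially: the names run and build_parts and the lib<name>.so naming scheme are hypothetical, and linking the final executable against the resulting libraries is left out.

```shell
# Print a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Hypothetical helper: compile each bitcode part into a shared library.
#   $1: directory containing the split .bc files
#   $2: compiler to use (defaults to clang)
build_parts() {
    outdir="$1"
    cc="${2:-clang}"
    for bc in "$outdir"/*.bc; do
        [ -e "$bc" ] || continue              # no .bc files matched
        name="$(basename "$bc" .bc)"
        # clang accepts bitcode inputs directly; -fPIC is needed for
        # position-independent code in a shared object.
        run "$cc" -shared -fPIC "$bc" -o "$outdir/lib$name.so"
    done
}

# Example (prints the commands only):
#   DRY_RUN=1 build_parts outdir
```

For CHERI builds, the second argument would be the clang from the cheribuild SDK rather than the system compiler.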
There are 2 other utilities that reside in this repository:
- manual-split.cpp, which shows how splitting can be done using the LLVM API instead of using llvm-extract.
- find-and-split-static.cpp, which focuses on finding functions that are not public but should be. It also tries to find ways to split the bitcode files while preserving the linkage of the functions.
These splitters are not meant for general-purpose use, and are kept as a showcase of what is possible.