Note This is a draft - there's more to come!
LLVM, Clang, and other LLVM sub-projects have a lot of build documentation. But it's spread across many places and, like I think many LLVM engineers do, I've built up local notes over the years on build configs I find particularly useful. This article aims to share all that - if there's anything particularly handy here that isn't in the LLVM documentation in some form, then of course it would be worth improving the upstream docs. I provide a bit of a mix of underlying concepts that I find helpful to have noted down as well as more specific "recipes".
As I don't operate under the same constraints as the upstream documentation, I can avoid the requirement to be exhaustive about individual build options and just highlight things I tend to use (I hesitate to say "recommend" as YMMV and the problems you're trying to solve may be very different to mine). An additional motivation for writing this up is that putting something on the internet is a great way to get people telling you where you've gone wrong or missed a handy trick. Such suggestions are very welcome!
Note this is written against LLVM HEAD at the time of writing (or at least, the last time the article was re-reviewed), and my intent is to fix it up as LLVM HEAD evolves. If working with an older release, you may encounter different problems to the ones described here.
Although the different cross-build and test-run approaches have different fidelity, it's typically helpful to be able to use more than one of them: when investigating an issue you might, for example, want to rule out something qemu-related.
All cross-build examples use RISC-V, but there shouldn't be anything stopping you from applying them to a different target.
Although it might work for that, I'm not aiming to provide a "how to build LLVM for the first time" tutorial. See the upstream documentation for that. Additional sources for information on better understanding the LLVM build system would be:
* The build system itself (the CMakeLists.txt and *.cmake files!).
* The LLVM Buildbot configuration repository (zorg): the build recipes in zorg/buildbot/builders/*.py, with the builder configs using these recipes in buildbot/osuosl/master/config/builders.py.

Starting off with the build I'd typically use for iterative development, the configuration looks something like this:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir -p build/default && cd build/default
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_CCACHE_BUILD=True \
-DBUILD_SHARED_LIBS=True \
-DLLVM_USE_SPLIT_DWARF=True \
-DLLVM_TARGETS_TO_BUILD="all" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt" \
-DLLVM_BUILD_TESTS=True \
-DCOMPILER_RT_BUILD_SANITIZERS=False \
-DCOMPILER_RT_INCLUDE_TESTS=True \
-DLLVM_APPEND_VC_REV=False \
../../llvm
cmake --build .
Key things to note about this configuration:
* I don't enable LLVM_ENABLE_EXPENSIVE_CHECKS as it's not clear it's worth the extra runtime cost for the type of changes I typically make.
* LLVM_APPEND_VC_REV is disabled to reduce relinking (otherwise the current git revision is embedded in version output, triggering relinks whenever it changes).
* LLVM_OPTIMIZED_TABLEGEN can be used to build and use a release mode tblgen binary, at the cost of losing out on the debug checks that would otherwise be done. I lack the discipline to remember to rebuild with a debug tablegen whenever making changes that might impact tablegen invariants (and of course, bugs often occur when you don't predict them!) so I don't find this a good option for general day-to-day development. In my ideal world, I'd build with this option and there would be a separate build target, invoked when needed, that checks the debug build of tablegen runs successfully to produce all tablegenerated output.
* Building the unit tests (LLVM_BUILD_TESTS) is handy both because it's easy to miss needed changes in them otherwise if changing any APIs the tests use, and because I often use llvm-lit directly rather than invoking check targets via ninja (see the example just after this list), so rely on the unittests being already built and ready to run.
* LLVM_USE_SPLIT_DWARF keeps debug-info link times down, though note that -gsplit-dwarf can't currently be used with RISC-V linker relaxation.
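As an example of the kind of direct llvm-lit invocation I mean (the test paths are just an illustration; any regression test directory or file works), run from the build directory:
./bin/llvm-lit -sv ../../llvm/test/CodeGen/RISCV
./bin/llvm-lit -sv ../../llvm/test/MC/RISCV/attribute.s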
The next configuration is a release build. This is still aimed just at local usage rather than anything ambitious like distribution. Incremental build time may be longer than with the debug configuration above, but runtime performance of the produced binaries will be better (noticeably so on large test suites).
cd ../
mkdir release && cd release
cmake -G Ninja \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="all" \
-DLLVM_APPEND_VC_REV=False ../../llvm
cmake --build .
If choosing to disable assertions, it may be worth evaluating
-DLLVM_UNREACHABLE_OPTIMIZE=False
which guarantees a trap for
llvm_unreachable()
rather than leaving it to be optimized as undefined
behaviour if encountered.
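As a concrete sketch (my own illustration rather than a configuration taken from elsewhere in this article), a no-assertions release build that still traps on llvm_unreachable() might look like the following, run from a fresh build directory alongside the others:
cmake -G Ninja \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=OFF \
-DLLVM_UNREACHABLE_OPTIMIZE=False \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="all" \
-DLLVM_APPEND_VC_REV=False \
../../llvm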
Let's try to build and test everything (other than experimental backends). I provide a listing of how long the different check targets take, for reference. Note that we explicitly list projects and runtimes rather than relying on "all" for projects, because that list includes some projects that are better built using the runtimes build approach, which is preferred where possible.
We leave off bolt due to test failures, and cross-project-tests (its tests need the imp Python package, which was removed in Python 3.12). We also skip offload and the sanitizers.
mkdir -p build/release.max && cd build/release.max
cmake -G Ninja \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;lldb;mlir;polly" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libc;libcxx;libcxxabi;libunwind;openmp;pstl" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="all" \
-DLLVM_BUILD_TESTS=True \
-DLLVM_APPEND_VC_REV=False \
-DCOMPILER_RT_INCLUDE_TESTS=ON \
-DCOMPILER_RT_BUILD_SANITIZERS=False \
../../llvm
cmake --build .
This is a single-stage build, with all compiles done using clang version 18.1.8 as packaged for Arch.
For reference, these are the rough timings for testing various subprojects (as
measured with time ninja check-foo
):
[Timings listing still to be added, covering the individual check-* targets such as check-clangd, check-clang-extra, check-clang-include-cleaner, and check-all.]
Next up is a two-stage build of Clang. This is known as a bootstrap build: the first stage is built using an existing system compiler, and you then use the just-built compiler to build the second stage. When editing the CMake options, the most important part to understand is which variables are used across both builds, which only for stage 1, and which are only used for the second stage:
* A default set of variables, listed in _BOOTSTRAP_DEFAULT_PASSTHROUGH in clang/CMakeLists.txt, is passed through to the second stage automatically.
* Additional variables can be passed through by naming them in CLANG_BOOTSTRAP_PASSTHROUGH.
* Any variable prefixed with BOOTSTRAP_ will be passed through to the second stage, e.g. BOOTSTRAP_FOO will set FOO in the stage 2 build.

Enabling a bootstrap build is as simple as adding -DCLANG_ENABLE_BOOTSTRAP=On to CMake and then doing ninja stage2.
An example bootstrap build that builds X86-only in both stages is:
mkdir -p build/multistage && cd build/multistage
cmake -G Ninja -DCMAKE_BUILD_TYPE="Release" \
-DCLANG_ENABLE_BOOTSTRAP=On \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DBOOTSTRAP_LLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DCLANG_BOOTSTRAP_PASSTHROUGH="LLVM_TARGETS_TO_BUILD;LLVM_ENABLE_ASSERTIONS" \
-DLLVM_APPEND_VC_REV=False \
../../llvm
ninja stage2
Note that because the second stage is built by making a second CMake
invocation you can't examine the full set of build commands in one step with
e.g. ninja -n -v stage2
(you'll see the final commands run cmake in
tools/clang/stage2-bins to kick off the second stage). Note that
LLVM_ENABLE_LLD
isn't one of the variables that gets passed through
automatically, so you'll want to add it to CLANG_BOOTSTRAP_PASSTHROUGH
or
set BOOTSTRAP_LLVM_ENABLE_LLD
(as I do above).
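If you want to double-check which values actually reached the second stage, one quick option (a habit of mine rather than anything official) is to grep the stage 2 CMake cache once it has been generated, from the top-level build directory:
grep -E 'LLVM_ENABLE_LLD|LLVM_TARGETS_TO_BUILD|LLVM_ENABLE_ASSERTIONS' \
tools/clang/stage2-bins/CMakeCache.txt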
For our first cross-compile, we'll do a single-stage build using the system clang. You'll need an appropriate sysroot for use with this (see the end of this article for notes on one way of creating one).
We use a toolchain
file
(CMake's way of specifying properties and options for a certain compiler) as
it's easier to reuse across different invocations, and also some variables
like CMAKE_{C,CXX}_COMPILER_TARGET
may only be set from toolchain files. As
you can likely guess, CMAKE_SYSROOT
corresponds to --sysroot
,
CMAKE_{C,CXX}_COMPILER_TARGET
to --target
, CMAKE_LINKER_TYPE
to
-fuse-ld=
and the CMAKE_{C,CXX}_FLAGS_INIT
are other miscellaneous flags
to be passed during compilation. The CMAKE_FIND_ROOT_PATH_MODE_* options control whether the host or the target environment is used to find programs, libraries, includes, and packages.
mkdir -p build/cross && cd build/cross
cat - <<EOF > clang-riscv64-linux.cmake
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSROOT $HOME/rvsysroot)
set(CMAKE_C_COMPILER clang)
set(CMAKE_CXX_COMPILER clang++)
set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_C_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_CXX_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_LINKER_TYPE LLD)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
EOF
cmake -G Ninja \
-DCMAKE_TOOLCHAIN_FILE=$(pwd)/clang-riscv64-linux.cmake \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_HOST_TRIPLE="riscv64-linux-gnu" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DLLVM_TARGETS_TO_BUILD="RISCV" \
-DCLANG_DISABLE_RUN_PYTHON_TESTS=True \
-DLLVM_APPEND_VC_REV=False ../../llvm
cmake --build .
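Once the build completes, a quick sanity check is to confirm you really did produce RISC-V binaries (exact output will vary with your environment):
file bin/clang # expect something like: ELF 64-bit LSB ... UCB RISC-V ...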
There's an important gotcha here. The basic problem relates to symlinks in the sysroot:
GCC canonicalizes system headers in dependency
files, so when ninja reads them it doesn't need to do so. Clang doesn't, and
unfortunately ninja doesn't implement the canonicalisation logic at all. This
means a dependency file generated during compilation might indicate
/home/asb/rvsysroot/lib/gcc/riscv64-linux-gnu/14/../../../../include/c++/14/cstddef
.
Because /home/asb/rvsysroot/lib
is actually a symlink to usr/lib
, running
realpath
or otherwise canonicalizing the path gives you
/home/asb/rvsysroot/usr/include/c++/14/cstddef
. But ninja incorrectly
assumes the referenced file is /home/asb/rvsysroot/include/c++/14/cstddef
and because it can't find that file, marks the target that depends on it as
dirty (you can see this with ninja -d explain
) and rebuilds it. As
essentially every file depends on a system header like this, roughly the
entire build is considered dirty and re-done each time you invoke ninja
. The
issue has been open for ~7 years now, unfortunately. So, how to address this? The simplest workaround that
works is just to do ln -s usr/include include
within the sysroot. That way,
even though ninja resolves the path incorrectly, it's still able to find it.
This is inelegant, but it's the workaround I've found easiest to apply.
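Concretely, for the sysroot used in these examples:
cd ~/rvsysroot
ln -s usr/include include
After doing this, ninja -d explain should no longer show roughly every target being marked dirty due to the missing header paths.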
An alternative direction for fixing this properly would be to avoid listing system headers in the dependency files at all: the -MMD flag enables this behaviour. Unfortunately it's not very easy to control the flags CMake uses to generate dependency files. CMAKE_DEPFILE_FLAGS_{C,CXX} exist but are undocumented variables for internal CMake use, and I wasn't able to override them either on the command line or in a CMake toolchain file.

Assuming you have qemu-user for RISC-V installed and set up to run RISC-V
binaries automatically through binfmt_misc
(if you're on Arch, installing
packages qemu-user-static
and qemu-user-static-binfmt
is sufficient) it's
easy to run any cross-built binary, e.g. QEMU_LD_PREFIX=$HOME/rvsysroot ./bin/clang --version.
It's actually possible, with some provisos, to run the ninja check-all
target within this just cross-built tree, relying on qemu-user execution via
binfmt_misc. The issues to be aware of:
* Ensure qemu-user can find the target dynamic linker and libraries, e.g. export QEMU_LD_PREFIX=/path/to/your/sysroot.
* The check-clang-python target fails when it tries to run the unit tests for the Clang Python bindings, as it attempts to run these tests using the host python executable. There are two options here:
  * Use the python binary from your sysroot, relying on binfmt_misc and qemu-user. You can do this with -DPython3_EXECUTABLE="$HOME/rvsysroot/usr/bin/python3.12", but it's not a solution I'd recommend.
  * Disable these tests with -DCLANG_DISABLE_RUN_PYTHON_TESTS=True, provided you have my patch to allow this (which again, hopefully lands soon).

There may be some tests that fail in this environment where it doesn't make sense to disable testing the whole component. In such scenarios, lit's selection options are your friend. TODO LLVM_LIT_ARGS and LIT_ARGS_DEFAULT discussion.
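For example (the filter patterns here are purely illustrative), lit's --filter and --filter-out options take regular expressions matched against test names:
./bin/llvm-lit -sv --filter 'CodeGen/RISCV' ../../llvm/test
./bin/llvm-lit -sv --filter-out 'some/known-failure\.ll' ../../llvm/test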
You can essentially combine the kinds of approaches discussed above to do a multi-stage build that first builds a fresh compiler for the host, then uses that to cross-build. Doing it as below gives you plenty of direct control over each stage without having to trace through the logic of the all-in-one bootstrap builds.
mkdir -p build/stage1cross && cd build/stage1cross
cmake -G Ninja \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="RISCV" \
-DLLVM_APPEND_VC_REV=False \
../../llvm
cmake --build .
In the example below I haven't enabled ccache
for the second stage build.
You should think for your intended use case whether ccache for the second
stage (which will be dirtied any time you rebuild the stage1 compiler) is
helpful or harmful. If you rebuild the first stage each time you make a
change, ccache will be of no use.
Then to build the second stage (i.e. the cross-compiled binaries for the rv64 target):
mkdir -p build/stage2cross && cd build/stage2cross
cat - <<EOF > clang-riscv64-linux.cmake
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSROOT /home/asb/rvsysroot)
set(CMAKE_C_COMPILER $(pwd)/../stage1cross/bin/clang)
set(CMAKE_CXX_COMPILER $(pwd)/../stage1cross/bin/clang++)
set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_C_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_CXX_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_LINKER_TYPE LLD)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
EOF
cmake -G Ninja \
-DCMAKE_TOOLCHAIN_FILE=$(pwd)/clang-riscv64-linux.cmake \
-DLLVM_NATIVE_TOOL_DIR=$(pwd)/../stage1cross/bin \
-DCMAKE_BUILD_TYPE="Release" \
-DLLVM_HOST_TRIPLE="riscv64-linux-gnu" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_TARGETS_TO_BUILD="RISCV" \
-DCLANG_DISABLE_RUN_PYTHON_TESTS=True \
-DLLVM_APPEND_VC_REV=False \
../../llvm
cmake --build .
Note the use of LLVM_NATIVE_TOOL_DIR
to reuse the stage1-built LLVM tools
like llvm-tblgen
. If you don't set that option, the tools will be built for
you using the system compiler.
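You can see the split by inspecting the binaries (output will vary with your host):
file ../stage1cross/bin/llvm-tblgen # host-architecture binary, reused for the tablegen steps
file bin/clang # riscv64 binary produced by this build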
DQIB is a great resource for getting a bootable image off the shelf, but scripting the process of assembling a Debian sysroot makes it a lot easier to update and customise as you need to add new build dependencies, while retaining the ability to reproduce the sysroot as needed.
First, install the dependencies (assuming Arch Linux):
# Install needed packages on the host
sudo pacman -S --needed --noconfirm \
debian-archive-keyring \
qemu-base \
qemu-system-riscv \
guestfs-tools \
qemu-user-static \
qemu-user-static-binfmt
if ! pacman -Qi mmdebstrap > /dev/null; then
rm -rf mmdebstrap
git clone https://aur.archlinux.org/mmdebstrap.git
cd mmdebstrap
makepkg -si
cd ..
fi
if ! pacman -Qi arch-test-bin > /dev/null; then
rm -rf arch-test-bin
git clone https://aur.archlinux.org/arch-test-bin.git
cd arch-test-bin
makepkg -si
cd ..
fi
Something like the following will create both a sysroot and an image you can boot in qemu in order to build "natively", as it has all needed LLVM build dependencies. You must run it under fakeroot or similar if not running as root.
#!/bin/sh
# Arch standard PATH doesn't have /usr/sbin, which is needed for some of the
# chrooted commands below.
export PATH="$PATH:/usr/sbin"
# Initial bootstrap
rm -rf chroot kernel initrd rootfs.tar rootfs.qcow2
mkdir chroot
mmdebstrap --verbose \
--architectures=riscv64 \
--variant=required \
--include=linux-image-riscv64 \
--include=zstd \
unstable \
chroot/ \
"deb [arch=riscv64] http://deb.debian.org/debian unstable main"
# Configuration / package install requiring network access
mv chroot/etc/resolv.conf chroot/etc/resolv.conf.bak
cat - <<EOF > chroot/etc/resolv.conf
nameserver 1.1.1.1
EOF
chroot chroot/ apt-get update
chroot chroot/ apt-get install -y \
openssh-server \
adduser \
vim \
git \
wget \
cmake \
ninja-build \
python3 \
sudo \
build-essential \
net-tools \
iputils-ping \
python3-psutil \
ccache
# Install clang/lld from experimental
printf "deb https://deb.debian.org/debian experimental main\n" >> chroot/etc/apt/sources.list
chroot chroot/ apt-get update
chroot chroot/ apt-get install -t experimental -y \
clang \
lld
# Configuration not requiring network access
# 1) Network config
ln -s /dev/null chroot/etc/udev/rules.d/80-net-setup-link.rules # disable persistent network names
cat - <<EOF > chroot/etc/systemd/network/10-eth0.network
[Match]
Name=eth0
[Network]
DHCP=yes
EOF
chroot chroot/ systemctl enable systemd-networkd
# Add user, configure sudo
chroot chroot/ adduser --gecos ",,," --disabled-password asb
chroot chroot/ usermod -aG sudo asb
echo asb:asb | chroot chroot/ chpasswd
echo root:root | chroot chroot/ chpasswd
# Set hostname config properly
chroot chroot/ sed -i "/localhost/ s/$/ $HOSTNAME/" /etc/hosts
# Regenerate initramfs and final prep for boot
ln -sf /dev/null chroot/etc/systemd/system/serial-getty@hvc0.service
chroot chroot/ update-initramfs -k all -c
# Create .tar.gz and disk image, and extract kernel+initrd.
tar -c -S -f rootfs.tar -C chroot/ .
ln -L chroot/vmlinuz kernel
ln -L chroot/initrd.img initrd
virt-make-fs --format=qcow2 --size=250GiB --partition=gpt --type=xfs --label=rootfs rootfs.tar rootfs.qcow2
# Make runvm.sh script
cat - <<\EOF > runvm.sh
#!/bin/sh
# RVA20
EXTRA_OPTS="zfa=false,zba=false,zbb=false,zbc=false,zbs=false"
# RVA23
#EXTRA_OPTS="zba=true,zbb=true,zbc=false,zbs=true,zfhmin=true,v=true,vext_spec=v1.0,zkt=true,zvfhmin=true,zvbb=true,zvkt=true,zihintntl=true,zicond=true,zcb=true,zfa=true,zawrs=true,rvv_ta_all_1s=true,rvv_ma_all_1s=true"
qemu-system-riscv64 \
-machine virt \
-cpu rv64,$EXTRA_OPTS \
-smp 32 \
-m 64G \
-device virtio-blk-device,drive=hd \
-drive file=rootfs.qcow2,if=none,id=hd \
-device virtio-net-device,netdev=net \
-netdev user,id=net,hostfwd=tcp:127.0.0.1:10222-:22 \
-bios /usr/share/qemu/opensbi-riscv64-generic-fw_dynamic.bin \
-kernel kernel \
-initrd initrd \
-object rng-random,filename=/dev/urandom,id=rng \
-device virtio-rng-device,rng=rng \
-nographic \
-append "rw noquiet root=LABEL=rootfs console=ttyS0"
EOF
chmod +x runvm.sh
# Fix permissions on created files
chown $(stat -c '%U:%G' .) rootfs.qcow2 runvm.sh
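One way to consume the outputs (my assumption about a sensible workflow, rather than something the script itself does): extract rootfs.tar somewhere to act as the sysroot referenced earlier in this article, and boot the VM with the generated runvm.sh, which forwards SSH to local port 10222:
mkdir -p ~/rvsysroot
tar -xf rootfs.tar -C ~/rvsysroot
./runvm.sh
ssh -p 10222 asb@localhost # from another terminal, once the VM has booted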
TODO: cmake -LH / cmake -LAH, inspecting ninja targets.