Building, cross-building, testing, distributing LLVM and Clang and friends


Note: This is a draft - there's more to come!

LLVM, Clang, and other LLVM sub-projects have a lot of build documentation. But it's spread across many places and, as I think many LLVM engineers do, I've built up local notes over the years on build configs I find particularly useful. This article aims to share all that. If there's anything particularly handy here that isn't in the LLVM documentation in some form, then of course it would be worth improving the upstream docs. I provide a mix of underlying concepts that I find helpful to have noted down, as well as more specific "recipes".

As I don't operate under the same constraints as the upstream documentation, I can avoid the requirement to be exhaustive about individual build options and just highlight things I tend to use (I hesitate to say "recommend" as YMMV and the problems you're trying to solve may be very different to mine). An additional motivation for writing this up is that putting something up on the internet is a great way to get people telling you where you're wrong or have missed a handy trick. Such suggestions are very welcome!

Note this is written against LLVM HEAD at the time of writing (or at least, the last time the article was re-reviewed), and my intent is to fix it up as LLVM HEAD evolves. If working with an older release, you may encounter different problems to the ones described here.

Although the different cross-build and test-run approaches offer different levels of fidelity, it's typically helpful to be able to use several of them: when investigating an issue you might, for example, want to rule out something qemu-related.

All cross-build examples use RISC-V as an example, but there shouldn't be anything stopping you applying them to a different target.

Introductory documentation and where to go for further help

Although it might work for that, I'm not aiming to provide a "how to build LLVM for the first time" tutorial. See the upstream documentation for that. Additional sources for information on better understanding the LLVM build system would be:

Simple native build optimised for incremental development

Starting off with the build I'd typically use for iterative development, the configuration looks something like this:

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir -p build/default && cd build/default
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DLLVM_CCACHE_BUILD=True \
  -DBUILD_SHARED_LIBS=True \
  -DLLVM_USE_SPLIT_DWARF=True \
  -DLLVM_TARGETS_TO_BUILD="all" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
  -DLLVM_BUILD_TESTS=True \
  -DCOMPILER_RT_BUILD_SANITIZERS=False \
  -DCOMPILER_RT_INCLUDE_TESTS=True \
  -DLLVM_APPEND_VC_REV=False \
  ../../llvm
cmake --build .

Key things to note about this configuration:

Running tests

Simple native release + asserts build

This is still aimed just at local usage rather than anything ambitious like distribution. Incremental build time may be longer, but runtime performance of the produced binaries will be better (noticeably so on large test suites).

cd ../
mkdir release && cd release
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DLLVM_TARGETS_TO_BUILD="all" \
  -DLLVM_APPEND_VC_REV=False ../../llvm
cmake --build .

If choosing to disable assertions, it may be worth evaluating -DLLVM_UNREACHABLE_OPTIMIZE=False which guarantees a trap for llvm_unreachable() rather than leaving it to be optimized as undefined behaviour if encountered.
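As a sketch of what that might look like, here is a release build with assertions off but llvm_unreachable() still trapping. The build/release-noasserts directory name is just a placeholder of mine, and the snippet assumes you start from the top of the llvm-project checkout.

```shell
mkdir -p build/release-noasserts && cd build/release-noasserts
# Assertions are disabled, but llvm_unreachable() is kept as a guaranteed trap
# rather than being optimised as undefined behaviour.
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=OFF \
  -DLLVM_UNREACHABLE_OPTIMIZE=False \
  -DLLVM_TARGETS_TO_BUILD="all" \
  ../../llvm
cmake --build .
```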

Maxi native release+asserts build

Let's try to build and test everything (other than experimental backends). I provide a listing of how long the different check targets take, for reference. Note that we explicitly list projects and runtimes rather than relying on "all" for projects, because that list includes some projects that could be built using the runtimes build approach which is preferred if possible.

We leave off bolt due to test failures, and cross-project-tests because it needs the imp package, which isn't available in Python 3.12+. Offload and the sanitizers are also left out.

mkdir -p build/release.max && cd build/release.max
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;lldb;mlir;polly" \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt;libc;libcxx;libcxxabi;libunwind;openmp;pstl" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DLLVM_TARGETS_TO_BUILD="all" \
  -DLLVM_BUILD_TESTS=True \
  -DLLVM_APPEND_VC_REV=False \
  -DCOMPILER_RT_INCLUDE_TESTS=ON \
  -DCOMPILER_RT_BUILD_SANITIZERS=False \
  ../../llvm
cmake --build .

This is a single-stage build, with all compilation done by the system Clang (version 18.1.8 as packaged for Arch).

For reference, these are the rough timings for testing various subprojects (as measured with time ninja check-foo):
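To collect such timings yourself, something like the following helper may be handy. It's only a sketch: the function name, the BUILD_TOOL variable, and the per-target log file naming are my own inventions, and the targets you pass in need to exist in your build directory.

```shell
#!/bin/sh
# Run each named check target in turn, logging its output to <target>.log and
# printing one tab-separated summary line: target, result, elapsed seconds.
# BUILD_TOOL is a hypothetical override; it defaults to ninja.
time_check_targets() {
  for target in "$@"; do
    start=$(date +%s)
    # Keep going after a failure so one bad suite doesn't hide the rest.
    if ${BUILD_TOOL:-ninja} "$target" > "$target.log" 2>&1; then
      result=ok
    else
      result=FAIL
    fi
    end=$(date +%s)
    printf '%s\t%s\t%ss\n' "$target" "$result" "$((end - start))"
  done
}

# Usage, from within the build directory:
#   time_check_targets check-llvm check-clang check-lld
```

Running the targets one at a time keeps the timings independent of each other, at the cost of not overlapping the tail end of one suite with the start of the next.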

Multi-stage native bootstrap build

This is known as a bootstrap build, where the first stage is built using an existing system compiler and you then use the just-built compiler to build the second stage. When editing the CMake options, the most important part to understand is which variables are used across both builds, which only for stage 1, and which are only used for the second stage:

Enabling a bootstrap build is as simple as adding -DCLANG_ENABLE_BOOTSTRAP=On to CMake and then doing ninja stage2.

An example bootstrap build that builds X86-only in both cases is:

mkdir -p build/multistage && cd build/multistage
cmake -G Ninja -DCMAKE_BUILD_TYPE="Release" \
  -DCLANG_ENABLE_BOOTSTRAP=On \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DBOOTSTRAP_LLVM_ENABLE_LLD=True \
  -DLLVM_TARGETS_TO_BUILD="X86" \
  -DCLANG_BOOTSTRAP_PASSTHROUGH="LLVM_TARGETS_TO_BUILD;LLVM_ENABLE_ASSERTIONS" \
  -DLLVM_APPEND_VC_REV=False \
  ../../llvm
ninja stage2

Note that because the second stage is built by making a second CMake invocation you can't examine the full set of build commands in one step with e.g. ninja -n -v stage2 (you'll see the final commands run cmake in tools/clang/stage2-bins to kick off the second stage). Note that LLVM_ENABLE_LLD isn't one of the variables that gets passed through automatically, so you'll want to add it to CLANG_BOOTSTRAP_PASSTHROUGH or set BOOTSTRAP_LLVM_ENABLE_LLD (as I do above).
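If you do want to see the second stage's commands, one approach I'd expect to work is to point ninja directly at the second stage's build directory once it has been configured. This assumes the tools/clang/stage2-bins location mentioned above, and that ninja stage2 has got far enough to generate a build.ninja there.

```shell
# Run from the stage 1 build directory (build/multistage in the example above).
# Configure and build both stages first so the stage 2 build.ninja exists.
ninja stage2

# Dry-run: print the commands ninja would run for the stage 2 tree.
# After a complete build there is nothing left to do, so this prints little.
ninja -C tools/clang/stage2-bins -n -v

# List every command needed to build the stage 2 tree, up to date or not.
ninja -C tools/clang/stage2-bins -t commands
```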

Single stage cross-build

For our first cross-compile we'll aim for a single-stage cross-compile using system clang. You'll need an appropriate sysroot for use with this (see the end of this article for notes on one way of doing this).

We use a toolchain file (CMake's way of specifying properties and options for a certain compiler) as it's easier to reuse across different invocations, and also some variables like CMAKE_{C,CXX}_COMPILER_TARGET may only be set from toolchain files. As you can likely guess, CMAKE_SYSROOT corresponds to --sysroot, CMAKE_{C,CXX}_COMPILER_TARGET to --target, CMAKE_LINKER_TYPE to -fuse-ld= and the CMAKE_{C,CXX}_FLAGS_INIT are other miscellaneous flags to be passed during compilation. The CMAKE_FIND_ROOT_PATH_MODE_* options control when the host vs the target environment are used to find binaries, libraries, includes, and packages.
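To make that mapping concrete, this is roughly the direct compiler invocation that the toolchain settings correspond to. It's an illustration rather than the exact command line CMake generates, and hello.c is a placeholder file.

```shell
# --target   <- CMAKE_C_COMPILER_TARGET
# --sysroot  <- CMAKE_SYSROOT
# -march/-mabi <- CMAKE_C_FLAGS_INIT
# -fuse-ld=  <- CMAKE_LINKER_TYPE
clang \
  --target=riscv64-linux-gnu \
  --sysroot="$HOME/rvsysroot" \
  -march=rv64gc_zba_zbb_zbs -mabi=lp64d \
  -fuse-ld=lld \
  hello.c -o hello
```

Trying a one-file compile like this is also a quick way to sanity check a sysroot before kicking off a full LLVM build against it.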

mkdir -p build/cross && cd build/cross
cat - <<EOF > clang-riscv64-linux.cmake
set(CMAKE_SYSTEM_NAME Linux)

set(CMAKE_SYSROOT $HOME/rvsysroot)

set(CMAKE_C_COMPILER clang)
set(CMAKE_CXX_COMPILER clang++)

set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_C_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_CXX_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")

set(CMAKE_LINKER_TYPE LLD)

set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
EOF

cmake -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$(pwd)/clang-riscv64-linux.cmake \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_HOST_TRIPLE="riscv64-linux-gnu" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DLLVM_TARGETS_TO_BUILD="RISCV" \
  -DCLANG_DISABLE_RUN_PYTHON_TESTS=True \
  -DLLVM_APPEND_VC_REV=False ../../llvm
cmake --build .

There's an important gotcha here. The basic problem relates to symlinks in the sysroot: GCC canonicalizes system header paths in the dependency files it writes, so ninja doesn't need to do so when it reads them. Clang doesn't canonicalize them, and unfortunately ninja doesn't implement the canonicalisation logic at all. This means a dependency file generated during compilation might indicate /home/asb/rvsysroot/lib/gcc/riscv64-linux-gnu/14/../../../../include/c++/14/cstddef. Because /home/asb/rvsysroot/lib is actually a symlink to usr/lib, running realpath or otherwise canonicalizing the path gives you /home/asb/rvsysroot/usr/include/c++/14/cstddef. But ninja incorrectly assumes the referenced file is /home/asb/rvsysroot/include/c++/14/cstddef and because it can't find that file, marks the target that depends on it as dirty (you can see this with ninja -d explain) and rebuilds it. As essentially every file depends on a system header like this, roughly the entire build is considered dirty and re-done each time you invoke ninja. The issue has been open for ~7 years now, unfortunately.

So, how to address this? The simplest workaround that works is just to do ln -s usr/include include within the sysroot. That way, even though ninja resolves the path incorrectly, it's still able to find it. This is inelegant, but it's the workaround I've found easiest to apply. Alternative directions to fix this would be:
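To see the mismatch for yourself without touching a real sysroot, the following sketch builds a throwaway directory with the same layout (a lib symlink pointing at usr/lib). It then checks the path as the kernel resolves it, and the path you get by collapsing the ".." components textually, both before and after applying the workaround. The directory layout and variable names are purely illustrative.

```shell
#!/bin/sh
set -eu

# Throwaway stand-in for the sysroot, mimicking the merged-/usr layout.
sysroot=$(mktemp -d)
mkdir -p "$sysroot/usr/lib/gcc/riscv64-linux-gnu/14" "$sysroot/usr/include/c++/14"
touch "$sysroot/usr/include/c++/14/cstddef"
ln -s usr/lib "$sysroot/lib"

# Path as it would appear in a Clang-generated dependency file.
dep="$sysroot/lib/gcc/riscv64-linux-gnu/14/../../../../include/c++/14/cstddef"
# The same path after collapsing ".." textually, ignoring the symlink.
lexical="$sysroot/include/c++/14/cstddef"

# The kernel follows the lib symlink before applying "..", so this is found.
[ -e "$dep" ] && via_kernel=found || via_kernel=missing
# The textually collapsed path doesn't exist yet.
[ -e "$lexical" ] && before=found || before=missing

# Apply the workaround: add an include symlink at the top of the sysroot.
ln -s usr/include "$sysroot/include"
[ -e "$lexical" ] && after=found || after=missing

echo "symlink-aware lookup: $via_kernel"
echo "lexical lookup before workaround: $before"
echo "lexical lookup after workaround: $after"

rm -rf "$sysroot"
```

The three lines printed should read found, missing, and found respectively.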

Assuming you have qemu-user for RISC-V installed and set up to run RISC-V binaries automatically through binfmt_misc (if you're on Arch, installing the packages qemu-user-static and qemu-user-static-binfmt is sufficient), it's easy to run any cross-built binary, e.g. QEMU_LD_PREFIX=$HOME/rvsysroot ./bin/clang --version.
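If that doesn't work first time, a few checks along these lines may help narrow down where the problem is. The binfmt_misc entry name and the emulator binary name are assumptions that depend on how your distribution packages qemu-user.

```shell
# Confirm the binary really is a RISC-V executable.
file ./bin/clang

# Check that a binfmt_misc handler is registered. The entry name qemu-riscv64
# is an assumption and may differ on your system.
cat /proc/sys/fs/binfmt_misc/qemu-riscv64

# Bypass binfmt_misc and invoke the emulator explicitly. -L plays the same
# role as QEMU_LD_PREFIX. With static packaging the binary may instead be
# named qemu-riscv64-static.
qemu-riscv64 -L "$HOME/rvsysroot" ./bin/clang --version
```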

It's actually possible, with some provisos, to run the ninja check-all target within this just cross-built tree, relying on qemu-user execution via binfmt_misc. The issues to be aware of:

There may be some tests that fail in this environment where it doesn't make sense to disable testing the whole component. In such scenarios, lit's selection options are your friend. TODO LLVM_LIT_ARGS and LIT_ARGS_DEFAULT discussion.
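As a rough sketch of what I mean by lit's selection options: lit accepts --filter and --filter-out regular expressions on the command line, and reads the LIT_FILTER and LIT_FILTER_OUT environment variables, which is convenient when it's being invoked via a check target. The regular expressions and test directory below are made-up examples.

```shell
# Needed so binfmt_misc-launched binaries can find the target's dynamic linker.
export QEMU_LD_PREFIX="$HOME/rvsysroot"

# Exclude tests whose name matches a regex from a whole check target.
LIT_FILTER_OUT='some-flaky-dir/' ninja check-llvm

# Or invoke lit directly on one directory, running only matching tests.
./bin/llvm-lit -v --filter='RISCV' ../../llvm/test/MC
```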

'Manual' multi-stage cross

You can essentially combine the kinds of approaches discussed above to do a multi-stage build that first builds a fresh compiler for the host, then uses that to cross-build. Doing it as below gives you plenty of direct control over each point without having to trace the logic for the all-in-one bootstrap builds.

mkdir -p build/stage1cross && cd build/stage1cross
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_LLD=True \
  -DLLVM_TARGETS_TO_BUILD="RISCV" \
  -DLLVM_APPEND_VC_REV=False \
  ../../llvm
cmake --build .

In the example below I haven't enabled ccache for the second stage build. You should consider whether, for your intended use case, ccache for the second stage (which will be dirtied any time you rebuild the stage1 compiler) is helpful or harmful. If you rebuild the first stage each time you make a change, ccache will be of no use.

Then to build the second stage (i.e. the cross-compiled binaries for the rv64 target):

mkdir -p build/stage2cross && cd build/stage2cross
cat - <<EOF > clang-riscv64-linux.cmake
set(CMAKE_SYSTEM_NAME Linux)

set(CMAKE_SYSROOT $HOME/rvsysroot)

set(CMAKE_C_COMPILER $(pwd)/../stage1cross/bin/clang)
set(CMAKE_CXX_COMPILER $(pwd)/../stage1cross/bin/clang++)

set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)
set(CMAKE_C_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")
set(CMAKE_CXX_FLAGS_INIT "-march=rv64gc_zba_zbb_zbs -mabi=lp64d")

set(CMAKE_LINKER_TYPE LLD)

set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
EOF

cmake -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$(pwd)/clang-riscv64-linux.cmake \
  -DLLVM_NATIVE_TOOL_DIR=$(pwd)/../stage1cross/bin \
  -DCMAKE_BUILD_TYPE="Release" \
  -DLLVM_HOST_TRIPLE="riscv64-linux-gnu" \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_TARGETS_TO_BUILD="RISCV" \
  -DCLANG_DISABLE_RUN_PYTHON_TESTS=True \
  -DLLVM_APPEND_VC_REV=False \
  ../../llvm
cmake --build .

Note the use of LLVM_NATIVE_TOOL_DIR to reuse the stage1-built LLVM tools like llvm-tblgen. If you don't set that option, the tools will be built for you using the system compiler.

Appendix: Assembling your own Debian-based sysroot

DQIB is a great resource for getting a bootable image off the shelf, but scripting the process to assemble a Debian sysroot makes it a lot easier to update and customise as you need to add new build dependencies while retaining the ability to reproduce the sysroot as needed.

First, install the dependencies (assuming Arch Linux):

# Install needed packages on the host
sudo pacman -S --needed --noconfirm \
  debian-archive-keyring \
  qemu-base \
  qemu-system-riscv \
  guestfs-tools \
  qemu-user-static \
  qemu-user-static-binfmt
if ! pacman -Qi mmdebstrap > /dev/null; then
  rm -rf mmdebstrap
  git clone https://aur.archlinux.org/mmdebstrap.git
  cd mmdebstrap
  makepkg -si
  cd ..
fi
if ! pacman -Qi arch-test-bin > /dev/null; then
  rm -rf arch-test-bin
  git clone https://aur.archlinux.org/arch-test-bin.git
  cd arch-test-bin
  makepkg -si
  cd ..
fi

Something like the following will create both a sysroot and an image you can boot in qemu in order to build "natively", as it has all needed LLVM build dependencies. You must use fakeroot or similar if not running as root.

#!/bin/sh

# Arch standard PATH doesn't have /usr/sbin, which is needed for some of the
# chrooted commands below.
export PATH="$PATH:/usr/sbin"

# Initial bootstrap
rm -rf chroot kernel initrd rootfs.tar rootfs.qcow2
mkdir chroot
mmdebstrap --verbose \
  --architectures=riscv64 \
  --variant=required \
  --include=linux-image-riscv64 \
  --include=zstd \
  unstable \
  chroot/ \
  "deb [arch=riscv64] http://deb.debian.org/debian unstable main"

# Configuration / package install requiring network access
mv chroot/etc/resolv.conf chroot/etc/resolv.conf.bak
cat - <<EOF > chroot/etc/resolv.conf
nameserver 1.1.1.1
EOF
chroot chroot/ apt-get update
chroot chroot/ apt-get install -y \
  openssh-server \
  adduser \
  vim \
  git \
  wget \
  cmake \
  ninja-build \
  python3 \
  sudo \
  build-essential \
  net-tools \
  iputils-ping \
  python3-psutil \
  ccache

# Install clang/lld from experimental
printf "deb https://deb.debian.org/debian experimental main\n" >> chroot/etc/apt/sources.list
chroot chroot/ apt-get update
chroot chroot/ apt-get install -t experimental -y \
  clang \
  lld

# Configuration not requiring network access
# 1) Network config
ln -s /dev/null chroot/etc/udev/rules.d/80-net-setup-link.rules # disable persistent network names
cat - <<EOF > chroot/etc/systemd/network/10-eth0.network
[Match]
Name=eth0

[Network]
DHCP=yes
EOF
chroot chroot/ systemctl enable systemd-networkd

# Add user, configure sudo
chroot chroot/ adduser --gecos ",,," --disabled-password asb
chroot chroot/ usermod -aG sudo asb
echo asb:asb | chroot chroot/ chpasswd
echo root:root | chroot chroot/ chpasswd

# Set hostname config properly
chroot chroot/ sed -i "/localhost/ s/$/ $(uname -n)/" /etc/hosts

# Regenerate initramfs and final prep for boot
ln -sf /dev/null chroot/etc/systemd/system/serial-getty@hvc0.service
chroot chroot/ update-initramfs -k all -c

# Create tarball and disk image, and extract kernel+initrd.
tar -c -S -f rootfs.tar -C chroot/ .
ln -L chroot/vmlinuz kernel
ln -L chroot/initrd.img initrd
virt-make-fs --format=qcow2 --size=250GiB --partition=gpt --type=xfs --label=rootfs rootfs.tar rootfs.qcow2

# Make runvm.sh script
cat - <<\EOF > runvm.sh
#!/bin/sh
# RVA20
EXTRA_OPTS="zfa=false,zba=false,zbb=false,zbc=false,zbs=false"
# RVA23
#EXTRA_OPTS="zba=true,zbb=true,zbc=false,zbs=true,zfhmin=true,v=true,vext_spec=v1.0,zkt=true,zvfhmin=true,zvbb=true,zvkt=true,zihintntl=true,zicond=true,zcb=true,zfa=true,zawrs=true,rvv_ta_all_1s=true,rvv_ma_all_1s=true"

qemu-system-riscv64 \
  -machine virt \
  -cpu rv64,$EXTRA_OPTS \
  -smp 32 \
  -m 64G \
  -device virtio-blk-device,drive=hd \
  -drive file=rootfs.qcow2,if=none,id=hd \
  -device virtio-net-device,netdev=net \
  -netdev user,id=net,hostfwd=tcp:127.0.0.1:10222-:22 \
  -bios /usr/share/qemu/opensbi-riscv64-generic-fw_dynamic.bin \
  -kernel kernel \
  -initrd initrd \
  -object rng-random,filename=/dev/urandom,id=rng \
  -device virtio-rng-device,rng=rng \
  -nographic \
  -append "rw noquiet root=LABEL=rootfs console=ttyS0"
EOF
chmod +x runvm.sh

# Fix permissions on created files
chown $(stat -c '%U:%G' .) rootfs.qcow2 runvm.sh
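Once the script has finished, booting and logging in to the VM should look something like the following. The port and user name come straight from the script above (the hostfwd rule and the adduser call), so adjust them if you changed either.

```shell
# Boot the VM. The serial console is attached to the current terminal.
./runvm.sh

# From another terminal on the host: port 10222 on localhost is forwarded to
# the guest's sshd, and asb is the user created during image setup.
ssh -p 10222 asb@127.0.0.1
```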

Article changelog