LLVM 17 was released in the past few weeks, and I'm continuing the tradition of writing up some selective highlights of what's new as far as RISC-V is concerned in this release. If you want more general, regular updates on what's going on in LLVM you should of course subscribe to my newsletter.
In case you're not familiar with LLVM's release schedule, it's worth noting that there are two major LLVM releases a year (i.e. one roughly every 6 months) and these are timed releases as opposed to being cut when a pre-agreed set of feature targets has been met. We're very fortunate to benefit from an active and growing set of contributors working on RISC-V support in LLVM projects, who are responsible for the work I describe below - thank you! I coordinate biweekly sync-up calls for RISC-V LLVM contributors, so if you're working in this area please consider dropping in.
A family of extensions referred to as the RISC-V code size reduction extensions was ratified earlier this year. One aspect of this is providing ways of referring to subsets of the standard compressed 'C' (16-bit instructions) extension that don't include floating point loads/stores, as well as other variants. But the more meaningful additions are the Zcmp and Zcmt extensions, in both cases targeted at embedded rather than application cores, reusing encodings for double-precision FP store.
Zcmp provides instructions that implement common stack frame manipulation operations that would typically require a sequence of instructions, as well as instructions for moving pairs of registers. The RISCVMoveMerger pass performs the necessary peephole optimisation to produce cm.mva01s or cm.mvsa01 instructions for moving to/from registers a0-a1 and s0-s7 when possible. It iterates over generated machine instructions, looking for pairs of c.mv instructions that can be replaced. cm.push and cm.pop instructions are generated by appropriate modifications to the RISC-V function frame lowering code, while the RISCVPushPopOptimizer pass looks for opportunities to convert a cm.pop into a cm.popretz (pop registers, deallocate stack frame, and return zero) or cm.popret (pop registers, deallocate stack frame, and return).
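To give a feel for the kind of peephole described above, here is a minimal Python sketch of merging pairs of register moves into cm.mva01s or cm.mvsa01. It is not the RISCVMoveMerger implementation: the tuple representation and the names `try_merge` and `merge_moves` are made up for illustration, and it only looks at directly adjacent moves, which is a deliberate simplification.

```python
# Hypothetical, simplified model: instructions are tuples whose first element
# is the mnemonic, e.g. ("mv", dest, src). Only adjacent pairs are considered.
S_REGS = {f"s{i}" for i in range(8)}  # s0-s7, as named in the text above
A_REGS = {"a0", "a1"}


def try_merge(first, second):
    """Return a merged Zcmp move for two instructions, or None if not possible."""
    if first[0] != "mv" or second[0] != "mv":
        return None
    (_, d1, s1), (_, d2, s2) = first, second
    # a0 and a1 both written from s-registers: cm.mva01s <src for a0>, <src for a1>
    if {d1, d2} == A_REGS and s1 in S_REGS and s2 in S_REGS:
        src_a0, src_a1 = (s1, s2) if d1 == "a0" else (s2, s1)
        return ("cm.mva01s", src_a0, src_a1)
    # a0 and a1 both copied to distinct s-registers: cm.mvsa01 <dest of a0>, <dest of a1>
    if {s1, s2} == A_REGS and d1 in S_REGS and d2 in S_REGS and d1 != d2:
        dst_a0, dst_a1 = (d1, d2) if s1 == "a0" else (d2, d1)
        return ("cm.mvsa01", dst_a0, dst_a1)
    return None


def merge_moves(instrs):
    """Scan the instruction list once, replacing mergeable adjacent move pairs."""
    out, i = [], 0
    while i < len(instrs):
        merged = try_merge(instrs[i], instrs[i + 1]) if i + 1 < len(instrs) else None
        if merged is not None:
            out.append(merged)
            i += 2  # both moves were consumed by the merged instruction
        else:
            out.append(instrs[i])
            i += 1
    return out


if __name__ == "__main__":
    print(merge_moves([("mv", "a0", "s1"), ("mv", "a1", "s2"), ("ret",)]))
```

Either order of the two moves is accepted here because, in both patterns, neither move writes a register that the other one reads.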
Zcmt provides the cm.jt and cm.jalt instructions to reduce the code size needed to implement a jump table. Although support is present in the assembler, the patch to modify the linker to select these instructions is still under review so we can hope to see full support in LLVM 18.
The RISC-V code size reduction working group have estimates of the code size impact of these extensions produced using this analysis script. I'm not aware of whether a comparison has been made to the real-world results of implementing support for the extensions in LLVM, but that would certainly be interesting.
LLVM has two forms of auto-vectorization, the loop vectorizer and the SLP (superword-level parallelism) vectorizer. The loop vectorizer was enabled during the LLVM 16 development cycle, while the SLP vectorizer was enabled for this release. Beyond that, there's been a huge number of incremental improvements for vector codegen such that it isn't always easy to pick out particular highlights. But to pick a small set of changes:
vsetivli instruction that is used to modify the vtype control register.
LMUL in the RISC-V vector extension controls grouping of vector registers, for instance rather than 32 vector registers, you might want to set LMUL=4 to treat them as 8 registers that are 4 times as large. The "best" LMUL is going to vary depending on both the target microarchitecture and factors such as register pressure, but a change was made so LMUL=2 is the new default.
LMUL (register grouping) for RISC-V, however in the case of the immediate forms of vsetvl occurring in the input, LMUL can be statically determined.
If you want to find out more about RISC-V vector support in LLVM, be sure to check out my Igalia colleague Luke Lau's talk at the LLVM Dev Meeting this week (I'll update this article when slides+recording are available).
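To make the LMUL register grouping mentioned above concrete, here is a small Python sketch of the arithmetic. The function names are made up for illustration, only integer LMUL values are handled (fractional LMUL is ignored), and the VLEN of 128 bits in the usage example is just an assumed value, not something specific to any particular core.

```python
def register_groups(lmul: int, num_vregs: int = 32) -> int:
    """Number of usable vector register groups for an integer LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles integer LMUL of 1, 2, 4 or 8")
    return num_vregs // lmul


def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum number of elements held by one register group (VLEN * LMUL / SEW)."""
    return (vlen_bits * lmul) // sew_bits


if __name__ == "__main__":
    # Assumed example configuration: VLEN=128 bits, 32-bit elements.
    for lmul in (1, 2, 4, 8):
        print(f"LMUL={lmul}: {register_groups(lmul)} groups, "
              f"{vlmax(128, 32, lmul)} x 32-bit elements per group")
```

This shows the trade-off behind choosing a default: a larger LMUL processes more elements per instruction, but leaves fewer register groups available, which is where register pressure comes in.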
It wouldn't be a RISC-V article without a list of hard to interpret strings that claim to be ISA extension names (Zvfbfwma is a real extension, I promise!). In addition to the code size reduction extensions discussed above there have been lots of newly added or updated extensions in this release cycle. Do refer to the RISCVUsage documentation for something that aims to be a complete list of what is supported (occasionally there are omissions) as well as clarity on what we mean by an extension being marked as "experimental".
Here's a partial list:
It landed after the 17.x branch so isn't in this release, but in the future you'll be able to use --print-supported-extensions with Clang to have it print a table of supported ISA extensions (the same flag has now been implemented for Arm and AArch64 too).
As always, it's not possible to go into detail on every change. A selection of other changes that I'm not able to cover in more depth:
CONFIG_CFI_CLANG in the Linux tree) but the target-specific parts were previously unimplemented for RISC-V. This gap was filled for the LLVM 17 release.
memcmp, bcmp, memset, and memcpy all gained optimised RISC-V specific versions. There will of course be further updates for LLVM 18, including the work from my colleague Mikhail R Gadelha on 32-bit RISC-V support.
Apologies if I've missed your favourite new feature or improvement - the LLVM release notes will include some things I haven't had space for here. Thanks again to everyone who has been contributing to make RISC-V support in LLVM even better.
If you have a RISC-V project you think my colleagues and I at Igalia may be able to help with, then do get in touch regarding our services.