What's new for RISC-V in LLVM 17

2023Q4.

LLVM 17 was released in the past few weeks, and I'm continuing the tradition of writing up some selective highlights of what's new as far as RISC-V is concerned in this release. If you want more general, regular updates on what's going on in LLVM you should of course subscribe to my newsletter.

In case you're not familiar with LLVM's release schedule, it's worth noting that there are two major LLVM releases a year (i.e. one roughly every 6 months) and these are timed releases as opposed to being cut when a pre-agreed set of feature targets has been met. We're very fortunate to benefit from an active and growing set of contributors working on RISC-V support in LLVM projects, who are responsible for the work I describe below - thank you! I coordinate biweekly sync-up calls for RISC-V LLVM contributors, so if you're working in this area please consider dropping in.

Code size reduction extensions

A family of extensions referred to as the RISC-V code size reduction extensions was ratified earlier this year. One aspect of this is providing ways of referring to subsets of the standard compressed 'C' (16-bit instructions) extension that don't include floating point loads/stores, as well as other variants. But the more meaningful additions are the Zcmp and Zcmt extensions. Both are targeted at embedded rather than application cores, and both reuse encodings that the C extension assigns to double-precision FP stores (so neither can be combined with Zcd).

Zcmp provides instructions that implement common stack frame manipulation operations that would typically require a sequence of instructions, as well as instructions for moving pairs of registers. The RISCVMoveMerger pass performs the necessary peephole optimisation to produce cm.mva01s or cm.mvsa01 instructions for moving to/from registers a0-a1 and s0-s7 when possible. It iterates over generated machine instructions, looking for pairs of c.mv instructions that can be replaced. cm.push and cm.pop instructions are generated by appropriate modifications to the RISC-V function frame lowering code, while the RISCVPushPopOptimizer pass looks for opportunities to convert a cm.pop into a cm.popretz (pop registers, deallocate stack frame, and return zero) or cm.popret (pop registers, deallocate stack frame, and return).
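To make the two peepholes more concrete, here is a toy Python model of what they do. It is a simplified sketch, not LLVM's implementation: instructions are plain tuples with ABI register names, only directly adjacent instructions are considered (the real passes work on MachineInstrs and must check any intervening instructions for conflicting register uses), and the names `merge_moves` and `fold_pop_ret` are hypothetical.

```python
# Toy model of two Zcmp peepholes. Instructions are tuples such as
# ("mv", rd, rs), ("li", rd, imm), ("cm.pop", rlist, stack_adj) and ("ret",).
# Simplified sketch only: just adjacent instructions are considered.

# cm.mva01s / cm.mvsa01 can only name s0-s7 (x8, x9, x18-x23).
SREGS = {f"s{i}" for i in range(8)}


def _try_merge(a, b):
    """Return a cm.mva01s/cm.mvsa01 replacing moves a and b, or None."""
    if a[0] != "mv" or b[0] != "mv":
        return None
    # Both patterns read and write disjoint registers, so try both orders.
    for (_, rd1, rs1), (_, rd2, rs2) in ((a, b), (b, a)):
        # a0 <- sX; a1 <- sY  ==>  cm.mva01s sX, sY
        if rd1 == "a0" and rd2 == "a1" and rs1 in SREGS and rs2 in SREGS:
            return ("cm.mva01s", rs1, rs2)
        # sX <- a0; sY <- a1  ==>  cm.mvsa01 sX, sY (sX and sY must differ)
        if (rs1 == "a0" and rs2 == "a1" and rd1 in SREGS and rd2 in SREGS
                and rd1 != rd2):
            return ("cm.mvsa01", rd1, rd2)
    return None


def merge_moves(insts):
    """Replace adjacent mergeable mv pairs with cm.mva01s/cm.mvsa01."""
    out, i = [], 0
    while i < len(insts):
        merged = _try_merge(insts[i], insts[i + 1]) if i + 1 < len(insts) else None
        if merged:
            out.append(merged)
            i += 2
        else:
            out.append(insts[i])
            i += 1
    return out


def fold_pop_ret(insts):
    """Fold cm.pop+ret into cm.popret, or li a0,0+cm.pop+ret into cm.popretz."""
    out, i = [], 0
    while i < len(insts):
        inst = insts[i]
        if inst[0] == "cm.pop" and i + 1 < len(insts) and insts[i + 1] == ("ret",):
            # cm.pop restores ra and s-registers but not a0, so a preceding
            # "li a0, 0" can be folded into the combined pop-and-return.
            if out and out[-1] == ("li", "a0", 0):
                out.pop()
                out.append(("cm.popretz",) + inst[1:])
            else:
                out.append(("cm.popret",) + inst[1:])
            i += 2
        else:
            out.append(inst)
            i += 1
    return out
```

For example, `merge_moves([("mv", "s0", "a0"), ("mv", "s1", "a1")])` returns `[("cm.mvsa01", "s0", "s1")]`, and a `li a0, 0` / `cm.pop` / `ret` sequence folds into a single `cm.popretz`.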

Zcmt provides the cm.jt and cm.jalt instructions, which jump (cm.jt) or jump and link (cm.jalt) through an entry in a table of targets, reducing the code size needed for frequently used jumps and calls. Although support is present in the assembler, the patch to modify the linker to select these instructions is still under review, so we can hope to see full support in LLVM 18.
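The table jump itself is simple to model. The sketch below assumes a dict `table` standing in for the in-memory jump vector table addressed via the JVT CSR, and uses hypothetical helper names. It follows my reading of the spec that indices 0-31 decode as cm.jt and 32-255 as cm.jalt (which also writes the return address ra = pc + 2). `bytes_saved` is only a back-of-the-envelope estimate that ignores alignment and interactions with other linker relaxations.

```python
# Toy model of Zcmt table jumps (simplified; not a full model of the spec).
# `table` stands in for the in-memory jump vector table that the JVT CSR
# points at; here it is just a mapping from index to target address.


def table_jump(pc, index, table):
    """Return (next_pc, new_ra) for a cm.jt/cm.jalt at `pc`.

    Indices 0-31 are cm.jt (jump, no link); 32-255 are cm.jalt, which also
    writes ra = pc + 2 because the instruction is 16 bits long.
    """
    if not 0 <= index <= 255:
        raise ValueError("the index is an 8-bit field")
    target = table[index]
    if index < 32:
        return target, None
    return target, pc + 2


def bytes_saved(num_sites, site_bytes, xlen=32):
    """Rough saving from replacing `num_sites` jumps/calls of `site_bytes`
    bytes each with 2-byte table jumps sharing one XLEN/8-byte entry."""
    return num_sites * (site_bytes - 2) - xlen // 8
```

With 10 call sites that would otherwise use an 8-byte auipc+jalr pair on RV32, the estimate is 10 × 6 − 4 = 56 bytes saved. A target reached from a single 4-byte jal on RV64 comes out negative (2 − 8 = −6), which suggests why table entries are only worthwhile for frequently referenced targets.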

The RISC-V code size reduction working group have estimates of the code size impact of these extensions produced using this analysis script. I don't know whether a comparison has been made against the real-world results of implementing support for the extensions in LLVM, but that would certainly be interesting.

Vectorization

LLVM has two forms of auto-vectorization, the loop vectorizer and the SLP (superword-level parallelism) vectorizer. The loop vectorizer was enabled during the LLVM 16 development cycle, while the SLP vectorizer was enabled for this release. Beyond that, there's been a huge number of incremental improvements for vector codegen, such that it isn't always easy to pick out particular highlights. But to pick a small set of changes:

Version 0.12 of the RISC-V vector C intrinsics specification is now supported by Clang. As noted in the release notes, the hope is there will not be new incompatibilities introduced prior to v1.0.
There were lots of minor codegen improvements. One example is improvements to the RISCVInsertVSETVLI pass to avoid inserting unnecessary vsetvli/vsetivli instructions, which are used to modify the vl and vtype control registers.
It's not particularly user visible, but there was a lot of refactoring of vector pseudoinstructions used internally during instruction selection (following this thread). The added documentation will likely be helpful if you're hoping to better understand this.
You might be aware that LMUL in the RISC-V vector extension controls grouping of vector registers, for instance rather than 32 vector registers, you might want to set LMUL=4 to treat them as 8 registers that are 4 times as large. The "best" LMUL is going to vary depending on both the target microarchitecture and factors such as register pressure, but a change was made so LMUL=2 is the new default used by the auto-vectorizers.
llvm-mca (the LLVM Machine Code Analyzer) is a performance analysis tool that uses information such as LLVM scheduling models to statically estimate the performance of machine code on a specific CPU. There were at least two changes relevant to llvm-mca and RISC-V vector support: scheduling information for RVV on SiFive7 cores (which of course is used outside of llvm-mca as well), and support for vsetivli/vsetvli 'instruments'. llvm-mca has the concept of an 'instrument region', a section of assembly with an LLVM-MCA comment that can (for instance) indicate the value of a control register that would affect scheduling. This can be used to set LMUL (register grouping) for RISC-V. However, when vsetvli or vsetivli (the forms that take vtype as an immediate) occur in the input, LMUL can be determined statically.
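The LMUL arithmetic, and the idea behind avoiding redundant vsetvli instructions, can be illustrated with a toy model. This is a sketch under simplifying assumptions: it only tracks SEW and LMUL (the real RISCVInsertVSETVLI pass also has to track AVL/VL and the tail/mask policies, and runs a dataflow analysis across basic blocks), and `parse_vtype`, `vlmax`, `register_groups` and `insert_vsetvli` are hypothetical names. `parse_vtype` also shows why LMUL can be read statically from a vsetvli/vsetivli operand list.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VType:
    sew: int         # selected element width in bits (8, 16, 32 or 64)
    lmul: Fraction   # register group multiplier (1/8 up to 8)


def parse_vtype(operands):
    """Parse vtype operands as written after vsetvli/vsetivli,
    e.g. "e32, m4, ta, ma" or "e8, mf2, tu, mu"."""
    sew = lmul = None
    for tok in (t.strip() for t in operands.split(",")):
        if tok.startswith("e") and tok[1:].isdigit():
            sew = int(tok[1:])
        elif tok.startswith("mf") and tok[2:].isdigit():
            lmul = Fraction(1, int(tok[2:]))
        elif tok.startswith("m") and tok[1:].isdigit():
            lmul = Fraction(int(tok[1:]))
        # Policy operands (ta/tu/ma/mu) are ignored in this model.
    if sew is None or lmul is None:
        raise ValueError(f"incomplete vtype: {operands!r}")
    return VType(sew, lmul)


def vlmax(vlen, vt):
    """Maximum elements per instruction: LMUL * VLEN / SEW."""
    return int(vt.lmul * vlen / vt.sew)


def register_groups(vt):
    """How many register groups the 32 vector registers form at this LMUL
    (with fractional LMUL each group is still one whole register)."""
    return 32 // max(1, int(vt.lmul))


def insert_vsetvli(ops):
    """ops: list of (mnemonic, required VType). Emit a vsetvli only when the
    required vtype differs from the one currently configured."""
    out, current = [], None
    for mnemonic, vt in ops:
        if vt != current:
            out.append(("vsetvli", vt))
            current = vt
        out.append((mnemonic, vt))
    return out
```

With VLEN=128, `e32, m4` gives VLMAX = 4 × 128 / 32 = 16 elements and 8 usable register groups, matching the LMUL=4 example above. Two consecutive operations needing the same vtype share a single vsetvli.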

If you want to find out more about RISC-V vector support in LLVM, be sure to check out my Igalia colleague Luke Lau's talk at the LLVM Dev Meeting this week (I'll update this article when slides+recording are available).

Other ISA extensions

It wouldn't be a RISC-V article without a list of hard-to-interpret strings that claim to be ISA extension names (Zvfbfwma is a real extension, I promise!). In addition to the code size reduction extensions listed above, there have been lots of newly added or updated extensions in this release cycle. Do refer to the RISCVUsage documentation for something that aims to be a complete list of what is supported (occasionally there are omissions), as well as for clarity on what we mean by an extension being marked as "experimental".

Here's a partial list:

Code generation support for the Zfinx, Zdinx, Zhinx, and Zhinxmin extensions. These extensions provide support for single, double, and half precision floating point instructions respectively, but define them to operate on the general purpose register file rather than requiring an additional floating point register file. This reduces implementation cost on simple core designs.
Support for a whole range of vendor-defined extensions, e.g. XTHeadBa (address generation), XTHeadBb (basic bit manipulation), Xsfvcp (SiFive VCIX), XCVbitmanip (CORE-V bit manipulation custom instructions) and many more (see the release notes).
Experimental vector crypto extension support was updated to version 0.5.1 of the specification.
Experimental support was added for version 0.2 of the Zfa extension (providing additional floating-point instructions).
Assembler/disassembler support for an experimental family of extensions to support operations on the bfloat16 floating-point format: Zfbfmin, Zvfbfmin, and Zvfbfwma.
Assembler/disassembler support for the experimental Zacas extension (atomic compare-and-swap).
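As a reminder of what the bfloat16 format is: it keeps binary32's sign bit and 8-bit exponent but only the top 7 fraction bits, so a bfloat16 value is simply the upper half of a binary32 bit pattern. Zfbfmin provides conversions between the two (fcvt.bf16.s and fcvt.s.bf16). The following Python sketch models those conversions, assuming the default round-to-nearest-even rounding mode; the function names are hypothetical and NaN handling is simplified.

```python
import struct


def f32_bits(x):
    """Bit pattern of x rounded to binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """binary32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32_to_bf16(x):
    """Round a binary32 value to bfloat16 bits (round-to-nearest-even)."""
    b = f32_bits(x)
    if (b & 0x7FFFFFFF) > 0x7F800000:    # NaN: keep top bits, force quiet bit
        return (b >> 16) | 0x0040
    lsb = (b >> 16) & 1                   # exact ties round to the even result
    return ((b + 0x7FFF + lsb) >> 16) & 0xFFFF


def bf16_to_f32(h):
    """Widening is exact: place the 16 bits in the top half of a binary32."""
    return bits_f32((h & 0xFFFF) << 16)
```

For example, `f32_to_bf16(1.0)` is `0x3F80`, and widening back with `bf16_to_f32` is always exact; only the narrowing direction needs rounding.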

It landed after the 17.x branch so isn't in this release, but in the future you'll be able to use --print-supported-extensions with Clang to have it print a table of supported ISA extensions (the same flag has now been implemented for Arm and AArch64 too).

Other additions and improvements

As always, it's not possible to go into detail on every change. A selection of other changes that I'm not able to delve into more detail on:

Initial RISC-V support was added to LLVM's BOLT post-link optimizer and various fixes / feature additions made to JITLink, thanks to the work of my Igalia colleague Job Noorman. There's actually a lot to say about this work, but I don't need to because Job has written up an excellent blog post on it that I highly encourage you to go and read.
LLD gained support for some of the relaxations involving the global pointer.
I expect there'll be more to say about this in future releases, but there's been incremental progress on RISC-V GlobalISel in the LLVM 17 development cycle (which has continued after). You might be interested in the slides from my GlobalISel by example talk at EuroLLVM this year. Ivan Baev at SiFive is also set to speak about some of this work at the RISC-V Summit in November.
Clang supports a form of control-flow integrity called KCFI. This is used by low-level software like the Linux kernel (see CONFIG_CFI_CLANG in the Linux tree) but the target-specific parts were previously unimplemented for RISC-V. This gap was filled for the LLVM 17 release.
LLVM has its own work-in-progress libc implementation, and memcmp, bcmp, memset, and memcpy all gained optimised RISC-V-specific versions. There will of course be further updates for LLVM 18, including the work from my colleague Mikhail R Gadelha on 32-bit RISC-V support.
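The basic shape of a KCFI check is that the compiler places a 32-bit hash of each address-taken function's type immediately before the function, and every indirect call site loads that value and traps if it doesn't match the hash of the type the caller expects. Below is a toy Python model of that idea, not the real implementation: `zlib.crc32` is a stand-in for the hash Clang actually uses, the "binary" is a dict, an exception plays the role of the trap, and all function names are hypothetical.

```python
import zlib

# Toy model of KCFI-style checking. Assumptions: zlib.crc32 stands in for the
# real type hash, the "binary" is a dict from function name to
# (hash stored before the entry point, body), and an exception stands in
# for the trap a real check would execute.


class CFIViolation(Exception):
    pass


def type_hash(signature):
    """32-bit hash of a function type (stand-in, not Clang's scheme)."""
    return zlib.crc32(signature.encode()) & 0xFFFFFFFF


def emit_function(binary, name, signature, body):
    """Compiler side: place the type hash immediately before the function."""
    binary[name] = (type_hash(signature), body)


def indirect_call(binary, target, expected_signature, *args):
    """Call site: check the hash preceding `target` before calling it."""
    stored_hash, body = binary[target]
    if stored_hash != type_hash(expected_signature):
        raise CFIViolation(f"type hash mismatch when calling {target}")
    return body(*args)
```

Calling a function through a pointer of the wrong type (say, an `int(int)` function via an `int(int, int)` pointer) fails the check before any of the target's code runs.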

Apologies if I've missed your favourite new feature or improvement - the LLVM release notes will include some things I haven't had space for here. Thanks again to everyone who has been contributing to make RISC-V support in LLVM even better.

If you have a RISC-V project you think my colleagues and I at Igalia may be able to help with, then do get in touch regarding our services.

Article changelog

2023-10-10: Initial publication date.