<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Muxup</title>
<subtitle>Adventures in collaborative open source development</subtitle>
<link href="https://muxup.com/feed.xml" rel="self" type="application/atom+xml"/>
<link href="https://muxup.com"/>
<updated>2024-02-20T12:00:00Z</updated>
<id>https://muxup.com/feed.xml</id>
<entry>
<title>Clarifying instruction semantics with P-Code</title>
<published>2024-02-20T12:00:00Z</published>
<updated>2024-02-20T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2024q1/clarifying-instruction-semantics-with-p-code"/>
<id>https://muxup.com/2024q1/clarifying-instruction-semantics-with-p-code</id>
<content type="html">
&lt;p&gt;I&#x27;ve recently had a need to step through quite a bit of disassembly for
different architectures, and although some architectures have well-written ISA
manuals it can be a bit jarring switching between very different assembly
syntaxes (like &quot;source, destination&quot; for AT&amp;amp;T vs &quot;destination, source&quot; for
just about everything else) or tedious looking up different ISA manuals to
clarify the precise semantics. I&#x27;ve been using a very simple script to help
convert an encoded instruction to a target-independent description of its
semantics, and thought I may as well share it as well as some thoughts on
its limitations.&lt;/p&gt;
&lt;h2 id=&quot;a-hrefhttpsgithubcommuxupmedleyinstruction_to_pcodea&quot;&gt;&lt;a href=&quot;#a-hrefhttpsgithubcommuxupmedleyinstruction_to_pcodea&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;&lt;a href=&quot;https://github.com/muxup/medley&quot;&gt;instruction_to_pcode&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/muxup/medley&quot;&gt;The script&lt;/a&gt; is simplicity itself, thanks to
the &lt;a href=&quot;https://github.com/angr/pypcode&quot;&gt;pypcode&lt;/a&gt; bindings to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Ghidra&quot;&gt;Ghidra&lt;/a&gt;&#x27;s SLEIGH library which provides
an interface to convert an input to the P-Code representation. Articles like
&lt;a href=&quot;https://riverloopsecurity.com/blog/2019/05/pcode/&quot;&gt;this one&lt;/a&gt; provide an
introduction and there&#x27;s the &lt;a href=&quot;https://htmlpreview.github.io/?https://github.com/NationalSecurityAgency/ghidra/blob/master/GhidraDocs/languages/html/pcoderef.html&quot;&gt;reference manual in the Ghidra
repo&lt;/a&gt;
but it&#x27;s probably easiest to just look at a few examples. P-Code is used as
the basis of Ghidra&#x27;s decompiler and provides a consistent human-readable
description of the semantics of instructions for supported targets.&lt;/p&gt;
&lt;p&gt;Here&#x27;s an example aarch64 instruction:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ./instruction_to_pcode aarch64 b874c925
-- 0x0: ldr w5, [x9, w20, SXTW #0x0]
0) unique[0x5f80:8] = sext(w20)
1) unique[0x7200:8] = unique[0x5f80:8]
2) unique[0x7200:8] = unique[0x7200:8] &amp;lt;&amp;lt; 0x0
3) unique[0x7580:8] = x9 + unique[0x7200:8]
4) unique[0x28b80:4] = *[ram]unique[0x7580:8]
5) x5 = zext(unique[0x28b80:4])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the above you can see that the disassembly for the instruction is dumped,
and then 5 P-Code instructions are printed showing the semantics. These P-Code
instructions directly use the register names for architectural registers (as a
reminder, &lt;a href=&quot;https://developer.arm.com/documentation/dui0801/l/Overview-of-AArch64-state/Registers-in-AArch64-state?lang=en&quot;&gt;AArch64 has 64-bit GPRs X0-X30, with the bottom halves accessible
through
W0-W30&lt;/a&gt;). Intermediate state is stored in &lt;code&gt;unique[addr:width]&lt;/code&gt; locations. So the above instruction sign-extends &lt;code&gt;w20&lt;/code&gt;, adds the result to &lt;code&gt;x9&lt;/code&gt;, reads a 32-bit value from the resulting address, then zero-extends it to 64 bits when storing into &lt;code&gt;x5&lt;/code&gt;.&lt;/p&gt;
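&lt;p&gt;To make those semantics concrete, the same sequence can be modelled in a few lines of plain Python (a hand-written sketch mirroring the P-Code above, not pypcode output; the register values and memory contents are made up for illustration):&lt;/p&gt;

```python
# Model of: ldr w5, [x9, w20, SXTW #0x0]
# sext/zext mirror the P-Code operations of the same name; the
# shift-by-zero step in the P-Code is a no-op and omitted here.

def sext32(v):
    """Sign-extend a 32-bit value to 64 bits."""
    v = v % 2 ** 32
    if v // 2 ** 31 == 1:  # top bit set, so the value is negative
        return (v - 2 ** 32) % 2 ** 64
    return v

x9, w20 = 0x1000, 0xFFFFFFFC          # w20 holds -4 as a 32-bit value
addr = (x9 + sext32(w20)) % 2 ** 64   # unique[0x7580:8]
ram = {0xFFC: 0xDEADBEEF}             # pretend memory: 32-bit value at 0xffc
x5 = ram[addr] % 2 ** 32              # 32-bit load, zero-extended into x5
```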
&lt;p&gt;The output is somewhat more verbose for architectures with flag registers,
e.g. &lt;code&gt;cmpb $0x2f,-0x1(%r11)&lt;/code&gt; produces:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ./instruction_to_pcode x86-64 --no-reverse-input &quot;41 80 7b ff 2f&quot;
-- 0x0: CMP byte ptr [R11 + -0x1],0x2f
0) unique[0x3100:8] = R11 + 0xffffffffffffffff
1) unique[0xbd80:1] = *[ram]unique[0x3100:8]
2) CF = unique[0xbd80:1] &amp;lt; 0x2f
3) unique[0xbd80:1] = *[ram]unique[0x3100:8]
4) OF = sborrow(unique[0xbd80:1], 0x2f)
5) unique[0xbd80:1] = *[ram]unique[0x3100:8]
6) unique[0x28e00:1] = unique[0xbd80:1] - 0x2f
7) SF = unique[0x28e00:1] s&amp;lt; 0x0
8) ZF = unique[0x28e00:1] == 0x0
9) unique[0x13180:1] = unique[0x28e00:1] &amp;amp; 0xff
10) unique[0x13200:1] = popcount(unique[0x13180:1])
11) unique[0x13280:1] = unique[0x13200:1] &amp;amp; 0x1
12) PF = unique[0x13280:1] == 0x0
&lt;/code&gt;&lt;/pre&gt;
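&lt;p&gt;Each of those flag-setting steps maps to simple arithmetic. Here&#x27;s a plain-Python sketch of the flag computations for this &lt;code&gt;cmp&lt;/code&gt; (hand-written to mirror the P-Code above, not generated from it):&lt;/p&gt;

```python
# Flags for: cmp byte ptr [R11 + -0x1], 0x2f
# val is the loaded byte, imm is 0x2f; all arithmetic is 8-bit.

def to_signed8(b):
    """Interpret an 8-bit value as signed."""
    return b - 256 if b // 128 == 1 else b

def cmp_flags(val, imm=0x2F):
    result = (val - imm) % 256
    return {
        # CF: unsigned borrow, i.e. val is below imm
        "CF": int(val - imm != result),
        # OF: sborrow() - true signed difference doesn't fit in 8 bits
        "OF": int(to_signed8(val) - to_signed8(imm) != to_signed8(result)),
        "SF": result // 128,  # sign bit of the result
        "ZF": int(result == 0),
        # PF: set when the low byte has an even number of one bits
        "PF": int(bin(result).count("1") % 2 == 0),
    }
```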
&lt;p&gt;But simple instructions that don&#x27;t set flags do produce concise P-Code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ./instruction_to_pcode riscv64 &quot;9d2d&quot;
-- 0x0: c.addw a0,a1
0) unique[0x15880:4] = a0 + a1
1) a0 = sext(unique[0x15880:4])
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;other-approaches&quot;&gt;&lt;a href=&quot;#other-approaches&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Other approaches&lt;/h2&gt;
&lt;p&gt;P-Code was an intermediate language I&#x27;d encountered before and of course
benefits from having an easy-to-use Python wrapper and fairly good support for
a range of ISAs in Ghidra. But there are lots of other options -
&lt;a href=&quot;https://angr.io/&quot;&gt;angr&lt;/a&gt; (which
uses Vex, taken from Valgrind) &lt;a href=&quot;https://docs.angr.io/en/latest/faq.html#why-did-you-choose-vex-instead-of-another-ir-such-as-llvm-reil-bap-etc&quot;&gt;compares some
options&lt;/a&gt;
and there&#x27;s &lt;a href=&quot;https://softsec.kaist.ac.kr/~soomink/paper/ase17main-mainp491-p.pdf&quot;&gt;more in this
paper&lt;/a&gt;.
Radare2 has &lt;a href=&quot;https://book.rada.re/disassembling/esil.html&quot;&gt;ESIL&lt;/a&gt;, but while
I&#x27;m sure you&#x27;d get used to it, it doesn&#x27;t pass the readability test for me.
The &lt;a href=&quot;https://rev.ng/&quot;&gt;rev.ng&lt;/a&gt; project uses QEMU&#x27;s
&lt;a href=&quot;https://www.qemu.org/docs/master/devel/tcg-ops.html&quot;&gt;TCG&lt;/a&gt;. This is an
attractive approach because you benefit from more testing and ISA extension
support for some targets vs P-Code (Ghidra support &lt;a href=&quot;https://github.com/NationalSecurityAgency/ghidra/pull/5778&quot;&gt;is
lacking&lt;/a&gt; for RVV,
bitmanip, and crypto extensions).&lt;/p&gt;
&lt;p&gt;Another route would be to pull out the semantic definitions from a formal spec
(like &lt;a href=&quot;https://www.cl.cam.ac.uk/~pes20/sail/&quot;&gt;Sail&lt;/a&gt;) or even an easy to read
simulator (e.g. &lt;a href=&quot;https://github.com/riscv-software-src/riscv-isa-sim&quot;&gt;Spike&lt;/a&gt;
for RISC-V). But in both cases the definitions are written to minimise
repetition, whereas when expanding the semantics we prefer explicitness, so we
would want to expand them into a form that differs somewhat from the Sail/Spike
code as written.&lt;/p&gt;
</content>
</entry>
<entry>
<title>Reflections on ten years of LLVM Weekly</title>
<published>2024-01-01T12:00:00Z</published>
<updated>2024-01-01T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2024q1/reflections-on-ten-years-of-llvm-weekly"/>
<id>https://muxup.com/2024q1/reflections-on-ten-years-of-llvm-weekly</id>
<content type="html">
&lt;p&gt;Today, with &lt;a href=&quot;https://llvmweekly.org/issue/522&quot;&gt;Issue #522&lt;/a&gt; I&#x27;m marking ten
years of authoring &lt;a href=&quot;https://llvmweekly.org/&quot;&gt;LLVM Weekly&lt;/a&gt;, a newsletter
summarising developments on projects under the LLVM umbrella (LLVM, Clang,
MLIR, Flang, libcxx, compiler-rt, lld, LLDB, ...). Somehow I&#x27;ve managed to
keep up an unbroken streak, publishing every single Monday since the first
issue back on &lt;a href=&quot;https://llvmweekly.org/issue/1&quot;&gt;Jan 6th 2014&lt;/a&gt; (the first Monday
of 2014 - you can also see the format hasn&#x27;t changed much!). With a milestone
like that, now is the perfect moment to jot down some reflections on the
newsletter and thoughts for the future.&lt;/p&gt;
&lt;h2 id=&quot;motivation-and-purpose&quot;&gt;&lt;a href=&quot;#motivation-and-purpose&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Motivation and purpose&lt;/h2&gt;
&lt;p&gt;Way back when I started &lt;a href=&quot;https://llvmweekly.org/&quot;&gt;LLVM Weekly&lt;/a&gt;, I&#x27;d been
working with LLVM for a few years as part of developing and supporting a
downstream compiler for a novel
research architecture. This was a very educational yet somewhat lonely
experience, and I sought to more closely follow upstream LLVM development
to keep better abreast of changes that might impact or help my work, to
learn more about parts of the compiler I wasn&#x27;t actively using, and also to
feel more of a connection to the wider LLVM community given my compiler work
was a solo effort. The calculus for kicking off an LLVM development newsletter
was dead simple: I found value in tracking development anyway, the
incremental effort to write up and share with others wasn&#x27;t &lt;em&gt;too&lt;/em&gt; great, and I
felt quite sure others would benefit as well.&lt;/p&gt;
&lt;p&gt;Looking back at my notes (I have a huge Markdown file with daily notes going back
to 2011 - a file of this rough size and format is also a good
&lt;a href=&quot;https://github.com/mawww/kakoune/issues/4685#issuecomment-1208129806&quot;&gt;stress&lt;/a&gt;
&lt;a href=&quot;https://github.com/helix-editor/helix/issues/3072#issuecomment-1208133990&quot;&gt;test&lt;/a&gt;
for text editors!) it seems I thought seriously about the idea of starting
something up at the beginning of December 2013. I brainstormed the format,
looked at other newsletters I might want to emulate, and went ahead and just
did it starting in the new year. It really was as simple as that. I figured
better to give it a try and stop it if it gets no traction rather than waste
lots of time putting out feelers on the level of interest and format. As a
sidenote, I was delighted to see many of the newsletters I studied at the time
are still going: &lt;a href=&quot;https://this-week-in-rust.org/&quot;&gt;This Week in Rust&lt;/a&gt;,
&lt;a href=&quot;https://perlweekly.com/&quot;&gt;Perl Weekly&lt;/a&gt; (I&#x27;ll admit this surprised me!),
&lt;a href=&quot;https://discourse.ubuntu.com/c/uwn/124&quot;&gt;Ubuntu Weekly News&lt;/a&gt;, &lt;a href=&quot;https://alan.petitepomme.net/cwn/index.html&quot;&gt;OCaml Weekly
News&lt;/a&gt;, and &lt;a href=&quot;https://wiki.haskell.org/Haskell_Weekly_News&quot;&gt;Haskell Weekly
News&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;readership-and-content&quot;&gt;&lt;a href=&quot;#readership-and-content&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Readership and content&lt;/h2&gt;
&lt;p&gt;The basic format of LLVM Weekly is incredibly simple - highlight relevant news
articles and blog posts, pick out some forum/mailing list discussions (sometimes
trying to summarise complex debates - but this is very challenging and time
intensive), and highlight some noteworthy commits from across the project.
More recently I&#x27;ve taken to advertising the scheduled &lt;a href=&quot;https://llvm.org/docs/GettingInvolved.html#online-sync-ups&quot;&gt;online
sync-ups&lt;/a&gt; and
&lt;a href=&quot;https://llvm.org/docs/GettingInvolved.html#office-hours&quot;&gt;office hours&lt;/a&gt; for
the week. Notably absent are any kind of ads or paid content. I respect that
others have made successful businesses in this kind of space, but although I&#x27;ve
always written LLVM Weekly on my own personal time I&#x27;ve never felt comfortable
trying to monetise other people&#x27;s attention or my relationship with the
community in this way.&lt;/p&gt;
&lt;p&gt;The target audience is really anyone with an interest in keeping track of LLVM
development, though I don&#x27;t tend to expand every acronym or give a
from-basics explanation for every term, so some familiarity with the project
is assumed if you want to understand every line. The newsletter is posted to
LLVM&#x27;s Discourse, to &lt;a href=&quot;https://llvmweekly.org/&quot;&gt;llvmweekly.org&lt;/a&gt;, and delivered
direct to people&#x27;s inboxes. I additionally post &lt;a href=&quot;https://twitter.com/llvmweekly&quot;&gt;on
Twitter&lt;/a&gt; and &lt;a href=&quot;https://fosstodon.org/@llvmweekly&quot;&gt;on
Mastodon&lt;/a&gt; linking to each issue. I don&#x27;t
attempt to track open rates or have functioning analytics, so only have a
rough idea of readership. There are ~3.5k active subscribers directly to the
mailing list, ~7.5k Twitter followers, ~180 followers on Mastodon (introduced
much more recently), and an unknown number of people reading via
llvmweekly.org or RSS. I&#x27;m pretty confident that I&#x27;m not just shouting in the
void at least.&lt;/p&gt;
&lt;p&gt;There are some gaps or blind spots of course. I make no attempt to link
to patches that are under review, even though many have interesting review
discussions: it would simply be too much work to sort through them, and if a
discussion is particularly contentious or requires input from a wider
cross-section of the LLVM community you&#x27;d expect an RFC to be posted anyway.
Although I do try to highlight MLIR threads or commits, as it&#x27;s not an area
of LLVM I&#x27;m working in right now I probably miss some things.
Thankfully Javed Absar has taken up writing an &lt;a href=&quot;https://discourse.llvm.org/c/mlir/mlir-news-mlir-newsletter/37&quot;&gt;MLIR
newsletter&lt;/a&gt;
that helps plug those gaps. I&#x27;m also not currently trawling through repos
under the &lt;a href=&quot;https://github.com/llvm/&quot;&gt;LLVM GitHub organisation&lt;/a&gt; other than the
main llvm-project monorepo, though perhaps I should...&lt;/p&gt;
&lt;p&gt;I&#x27;ve shied away from reposting job posts as the overhead is just too high. I
found dealing with requests to re-advertise (and considering whether this is
useful to the community) or determining whether ads are sufficiently
LLVM-related just wasn&#x27;t a good use of time when there&#x27;s a good alternative. People can &lt;a href=&quot;https://discourse.llvm.org/c/community/job-postings/&quot;&gt;check
the job post
category on LLVM
discourse&lt;/a&gt; or search for
LLVM on their favourite jobs site.&lt;/p&gt;
&lt;h2 id=&quot;how-it-works&quot;&gt;&lt;a href=&quot;#how-it-works&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;How it works&lt;/h2&gt;
&lt;p&gt;There are really two questions to be answered here: how I go about writing it
each week, and what tools and services are used. In terms of writing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I have a checklist I follow just to ensure nothing gets missed and help dive
back in quickly if splitting work across multiple days.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tig --since=$LAST_WEEK_DATE $DIR&lt;/code&gt; to step through commits in the past week
for each sub-project within the monorepo.
&lt;a href=&quot;https://jonas.github.io/tig/&quot;&gt;Tig&lt;/a&gt; is a fantastic text interface for git,
and I of course have an ugly script that I bind to a key that generates the
&lt;code&gt;[shorthash](github_link)&lt;/code&gt; style links I insert for each chosen commit.&lt;/li&gt;
&lt;li&gt;I make a judgement call as to whether I think a commit might be of interest
to others. This is bound to be somewhat flawed, but hopefully better than
random selection! I always really appreciate feedback if you think I missed
something important, or tips on things you think I should include next week.
&lt;ul&gt;
&lt;li&gt;There&#x27;s a cheat that practically guarantees a mention in LLVM Weekly
without even needing to drop me a note though - write documentation! It&#x27;s
very rare I see a commit that adds more docs and fail to highlight it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Similarly, I scan through &lt;a href=&quot;https://discourse.llvm.org/&quot;&gt;LLVM Discourse&lt;/a&gt;
posts over the past week and pick out discussions I think readers may be
interested in. Most RFCs will be picked up as part of this. In some cases if
there&#x27;s a lengthy discussion I might attempt to summarise or point to key
messages, but honestly this is rarer than I&#x27;d like as it can be incredibly
time consuming. I try very hard to remain a neutral voice and not to insert
personal views on technical discussions.&lt;/li&gt;
&lt;li&gt;Many ask how long it takes to write, and the answer is of course that it
varies. It&#x27;s easy to spend a lot of time trying to figure out the importance
of commits or discussions in parts of the compiler I don&#x27;t work with much,
or to better summarise content. The amount of activity can also vary a lot
week to week (especially on Discourse). It&#x27;s mostly in the 2.5-3.5h range
(very rarely any more than 4 hours) to write, copyedit, and send.&lt;/li&gt;
&lt;/ul&gt;
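&lt;p&gt;The link-generation step mentioned above is trivial to script. A hypothetical minimal version (the repository name, function name, and URL shape are my own assumptions for illustration, not the actual helper):&lt;/p&gt;

```python
def commit_link(short_hash, repo="llvm/llvm-project"):
    """Emit a Markdown link in the [shorthash](github_link) style."""
    return "[{h}](https://github.com/{r}/commit/{h})".format(h=short_hash, r=repo)

print(commit_link("1a2b3c4"))
```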
&lt;p&gt;There&#x27;s not much to be said on the tooling side, except that I could probably
benefit from refreshing my helper scripts. Mail sending is handled by
&lt;a href=&quot;https://www.mailgun.com/&quot;&gt;Mailgun&lt;/a&gt;, who have changed ownership three times
since I started. I handle double opt-in via a simple Python script on the
server-side and mail sending costs me $3-5 a month. Otherwise, I generate the
static HTML with some scripts that could do with a bit more love. The only
other running costs are the domain name fees and a VPS that hosts some other
things as well, so quite insignificant compared to the time commitment.&lt;/p&gt;
&lt;h2 id=&quot;how-you-can-help&quot;&gt;&lt;a href=&quot;#how-you-can-help&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;How you can help&lt;/h2&gt;
&lt;p&gt;I cannot emphasise enough that I&#x27;m not an expert on all parts of LLVM, and I&#x27;m
also only human and can easily miss things. If you did something you think
people may be interested in and I failed to cover it, I almost certainly
didn&#x27;t explicitly review it and deem it not worthy. Please do continue to help
me out by dropping links and suggestions. Writing commit messages that make it
clear if a change has wider impact also helps increase the chance I&#x27;ll pick it
up.&lt;/p&gt;
&lt;p&gt;I noted above that it is particularly time consuming to summarise back and
forth in lengthy RFC threads. Sometimes people step up and do this and I
always try to link to it when this happens. The person who initiated a thread
or proposal is best placed to write such a summary, and it&#x27;s also a useful tool
to check that you interpreted people&#x27;s suggestions/feedback correctly, but it
can still be helpful if others provide a similar service.&lt;/p&gt;
&lt;p&gt;Many people have fed back they find LLVM Weekly useful to stay on top of LLVM
developments. This is gratifying, but also a pretty huge responsibility. If
you have thoughts on things I could be doing differently to serve the
community even better without a big difference in time commitment, I&#x27;m always
keen to hear ideas and suggestions.&lt;/p&gt;
&lt;h2 id=&quot;miscellaneous-thoughts&quot;&gt;&lt;a href=&quot;#miscellaneous-thoughts&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Miscellaneous thoughts&lt;/h2&gt;
&lt;p&gt;To state the obvious, ten years is kind of a long time. A lot has happened
with me in that time - I&#x27;ve got married, we had a son, I co-founded and helped
grow a company, and then moved on from that, kicked off the upstream RISC-V
LLVM backend, and much more. One of the things I love about working with compilers
is that there are always new things to learn, and writing LLVM Weekly helps me
learn at least a little more each week in areas outside of where I&#x27;m currently
working. There have been a lot of changes in LLVM as well. Off the top of my
head: the move from SVN to Git, moving the Git repo to GitHub,
moving from Phabricator to GitHub PRs, Bugzilla to GitHub issues, mailing
lists to Discourse, relicensing to Apache 2.0 with LLVM exception, the wider
adoption of office hours and area-specific sync-up calls, and more. I think
even the LLVM Foundation was set up a little bit after LLVM Weekly started.
It&#x27;s comforting to see the &lt;a href=&quot;https://web.archive.org/web/20140102034931/http://llvm.org/&quot;&gt;llvm.org website design remains unchanged
though!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&#x27;s also been a time period where I&#x27;ve become increasingly involved in LLVM.
Upstream work - most notably initiating the RISC-V LLVM backend, organising an
LLVM conference, many talks, serving on various program committees for LLVM
conferences, etc etc. When I started I felt a lot like someone outside the
community looking in and documenting what I saw. That was probably accurate
too, given the majority of my work was downstream. While I don&#x27;t feel like an
LLVM &quot;insider&quot; (if such a thing exists?!), I certainly feel a lot more part of
the community than I did way back then.&lt;/p&gt;
&lt;p&gt;An obvious question is whether there are other ways of pulling together the
newsletter that are worth pursuing. My experience with large language models
so far has been that they haven&#x27;t been very helpful in reducing the effort for
the more time consuming aspects of producing LLVM Weekly, but perhaps that
will change in the future. If I could be automated away then that&#x27;s great -
perhaps I&#x27;m misjudging how much of my editorial input is signal rather than
just noise, but I don&#x27;t think we&#x27;re there yet for AI. More collaborative
approaches to producing content would be another avenue to explore. For the
current format, the risk is that the communication overhead and stress of
seeing if various contributions actually materialise before the intended
publication date is quite high. If I did want to spread the load or hand it
over, then a rotating editorship would probably be most interesting to me.
Even if multiple people contribute, each week a single person would act as a backstop
to make sure something goes out.&lt;/p&gt;
&lt;p&gt;The unbroken streak of LLVM Weekly editions each Monday has become a bit
totemic. It&#x27;s certainly not always convenient having this fixed commitment,
but it can also be nice to have this rhythm to the week. Even if it&#x27;s a bad
week, at least it&#x27;s something in the bag that people seem to appreciate.
Falling into bad habits and frequently missing weeks would be good for nobody,
but I suspect that a schedule that allowed the odd break now and then would be
just fine. Either way, I feel a sense of relief having hit the 10 year
unbroken streak. I don&#x27;t intend to start skipping weeks, but should life get
in the way and the streak gets broken I&#x27;ll feel rather more relaxed about it
having hit that arbitrary milestone.&lt;/p&gt;
&lt;h2 id=&quot;looking-forwards-and-thanks&quot;&gt;&lt;a href=&quot;#looking-forwards-and-thanks&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Looking forwards and thanks&lt;/h2&gt;
&lt;p&gt;So what&#x27;s next? LLVM Weekly continues, much as before. I don&#x27;t know if I&#x27;ll
still be writing it in another 10 years&#x27; time, but I&#x27;m not planning to stop
soon. If it ceases to be a good use of my time, ceases to have value for
others, or I find there&#x27;s a better way of generating similar impact then it
would only be logical to move on. But for now, onwards and upwards.&lt;/p&gt;
&lt;p&gt;Many thanks are due. Thank you to the people who make LLVM
what it is - both technically and in terms of its community that I&#x27;ve learned
so much from. Thank you to &lt;a href=&quot;https://www.igalia.com/&quot;&gt;Igalia&lt;/a&gt; where I work for
creating an environment where I&#x27;m privileged enough to be paid to contribute
upstream to LLVM (&lt;a href=&quot;https://www.igalia.com/contact/&quot;&gt;get in touch&lt;/a&gt; if you have
LLVM needs!). Thanks to my family for ongoing support and
of course putting up with the times my LLVM Weekly commitment is inconvenient.
Thank you to everyone who has been reading LLVM Weekly and especially those
sending in feedback or tips or suggestions for future issues.&lt;/p&gt;
&lt;p&gt;On a final note, if you&#x27;ve got this far you should make sure you are
&lt;a href=&quot;https://llvmweekly.org/&quot;&gt;subscribed&lt;/a&gt; to LLVM Weekly and &lt;a href=&quot;https://fosstodon.org/@llvmweekly&quot;&gt;follow on
Mastodon&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/llvmweekly&quot;&gt;on
Twitter&lt;/a&gt;.&lt;/p&gt;
</content>
</entry>
<entry>
<title>Let the (terminal) bells ring out</title>
<published>2023-12-24T12:00:00Z</published>
<updated>2023-12-24T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q4/let-the-terminal-bells-ring-out"/>
<id>https://muxup.com/2023q4/let-the-terminal-bells-ring-out</id>
<content type="html">
&lt;p&gt;I just wanted to take a few minutes to argue that the venerable terminal bell
is a helpful and perhaps overlooked tool for anyone who does a lot of their
work out of a terminal window. First, an important clarification. Bells
ringing, chiming, or (as is appropriate for the season) jingling all sound
very noisy - but although you can configure your terminal emulator to emit a
sound for the terminal bell, I&#x27;m actually advocating for configuring a
non-intrusive but persistent visual notification.&lt;/p&gt;
&lt;h2 id=&quot;bel&quot;&gt;&lt;a href=&quot;#bel&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;BEL&lt;/h2&gt;
&lt;p&gt;Our goal is to generate a visual indicator on demand (e.g. when a long-running
task has finished) and to do so with minimal fuss. This should work over ssh
and without worrying about forwarding connections to some notification
daemon. The ASCII &lt;code&gt;BEL&lt;/code&gt; control character (alternatively written as &lt;code&gt;BELL&lt;/code&gt; by
those willing to spend characters extravagantly) meets these requirements.
You&#x27;ll just need co-operation from your terminal emulator and window manager
to convert the bell to an appropriate notification.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;BEL&lt;/code&gt; is &lt;code&gt;7&lt;/code&gt; in ASCII, but can be printed using &lt;code&gt;\a&lt;/code&gt; in &lt;code&gt;printf&lt;/code&gt; (including
the &lt;code&gt;/usr/bin/printf&lt;/code&gt; you likely use from your shell, &lt;a href=&quot;https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html&quot;&gt;defined in
POSIX&lt;/a&gt;).
There&#x27;s even a &lt;a href=&quot;https://rosettacode.org/wiki/Terminal_control/Ringing_the_terminal_bell&quot;&gt;Rosetta Code
page&lt;/a&gt;
on ringing the terminal bell from various languages. Personally, I like to
define a shell alias such as:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;alias&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;bell=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;printf &amp;#39;\aBELL!\n&amp;#39;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Printing some text alongside the bell is helpful for confirming the bell was
triggered as expected even after it was dismissed. Then, when kicking off a long
operation like an LLVM compile and test run, use something like:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;cmake --build . &lt;span style=&quot;color: #000000&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./bin/llvm-lit -s test; bell
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;;&lt;/code&gt; ensures the bell is produced regardless of the exit code of the
previous commands. All being well, this sets the urgent hint on the X11 window
used by your terminal, and your window manager produces a subtle but
persistent visual indicator that is dismissed after you next give focus to the
source of the bell. Here&#x27;s how it looks for me in
&lt;a href=&quot;https://dwm.suckless.org/&quot;&gt;DWM&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/static/bell_example.png&quot; alt=&quot;Screenshot of DWM showing a notification from a bell&quot; title=&quot;DWM screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The above example shows 9 workspaces (some of them named), where the &lt;code&gt;llvm&lt;/code&gt;
workspace has been highlighted because a bell was produced there. You&#x27;ll also
spot that I have a &lt;code&gt;timers&lt;/code&gt; workspace, which I tend to use for miscellaneous
timers: e.g. a reminder before a meeting is due to start, or when I&#x27;m planning
to switch tasks. I have a small tool for this that I might share in a future post.&lt;/p&gt;
&lt;p&gt;A limitation versus triggering &lt;a href=&quot;https://specifications.freedesktop.org/notification-spec/latest/&quot;&gt;freedesktop.org Desktop
Notifications&lt;/a&gt;
is that there&#x27;s no payload / associated message. For me this isn&#x27;t a big deal:
such messages are distracting, and it&#x27;s easy enough to see the full context
when switching workspaces. It may, of course, be a problem for your preferred
workflow.&lt;/p&gt;
&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; put &lt;code&gt;\a&lt;/code&gt; in your terminal prompt (&lt;code&gt;$PS1&lt;/code&gt;), meaning a bell is
triggered after every command finishes. For me this would lead to too many
notifications for commands I didn&#x27;t want to carefully monitor the output for,
but your mileage may vary. After publishing this article, my
&lt;a href=&quot;https://igalia.com&quot;&gt;Igalia&lt;/a&gt; colleague Adrian Perez pointed me to a slight
variant on this that he uses: in Zsh &lt;code&gt;$TTYIDLE&lt;/code&gt; makes it easy to configure
behaviour based on the duration of a command and &lt;a href=&quot;https://github.com/aperezdc/dotfiles/blob/ce6a240bcbcac7b796895da581f0a6c5f23f31d5/dot.zsh--rc.zsh#L392&quot;&gt;he configures zsh so a bell
is produced for commands that take longer than 30
seconds to
complete&lt;/a&gt;.&lt;/p&gt;
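&lt;p&gt;The core of that idea can be sketched portably. A hypothetical helper (the function name and output text are my own; Adrian&#x27;s linked Zsh config is the real thing) that rings the bell only when a command ran longer than a threshold:&lt;/p&gt;

```shell
# ring_if_slow START_EPOCH END_EPOCH THRESHOLD_SECS
# Prints a BEL plus a note when the elapsed time meets the threshold.
ring_if_slow() {
  elapsed=$(( $2 - $1 ))
  if [ "$elapsed" -ge "$3" ]; then
    printf '\aslow command (%ss)\n' "$elapsed"
  fi
}

# In Zsh this would be wired up via preexec/precmd hooks recording
# $EPOCHSECONDS before and after each command.
ring_if_slow 100 140 30
```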
&lt;h2 id=&quot;terminal-emulator-support&quot;&gt;&lt;a href=&quot;#terminal-emulator-support&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Terminal emulator support&lt;/h2&gt;
&lt;p&gt;Unfortunately, setting the urgent hint upon a bell is not supported by
gnome-terminal, with a &lt;a href=&quot;https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/6698&quot;&gt;15 year-old issue left
unresolved&lt;/a&gt;. It
is however supported by the otherwise very similar xfce4-terminal (just enable
the visual bell in preferences), and I switched solely due to this issue.&lt;/p&gt;
&lt;p&gt;From what I can tell, this is the status of visual bell support via setting
the X11 urgent hint:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;xfce4-terminal: Supported. In Preferences -&amp;gt; Advanced ensure &quot;Visual bell&quot;
is ticked.&lt;/li&gt;
&lt;li&gt;xterm: Set &lt;code&gt;XTerm.vt100.bellIsUrgent: true&lt;/code&gt; in your &lt;code&gt;.Xresources&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;rxvt-unicode (urxvt): Set &lt;code&gt;URxvt.urgentOnBell: true&lt;/code&gt; in your &lt;code&gt;.Xresources&lt;/code&gt;
file.&lt;/li&gt;
&lt;li&gt;alacritty: Supported. Works out of the box with no additional configuration
needed.&lt;/li&gt;
&lt;li&gt;gnome-terminal: &lt;a href=&quot;https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/6698&quot;&gt;Not
supported&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;konsole: As far as I can tell it isn&#x27;t supported. Creating a new profile and
setting the &quot;Terminal bell mode&quot; to &quot;Visual Bell&quot; doesn&#x27;t seem to result in
the urgent hint being set.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2023-12-24: Add note about configuring a bell for commands taking
longer than a certain threshold duration in Zsh.&lt;/li&gt;
&lt;li&gt;2023-12-24: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Ownership you can count on</title>
<published>2023-12-20T12:00:00Z</published>
<updated>2023-12-20T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q4/ownership-you-can-count-on"/>
<id>https://muxup.com/2023q4/ownership-you-can-count-on</id>
<content type="html">
&lt;h2 id=&quot;introduction&quot;&gt;&lt;a href=&quot;#introduction&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Introduction&lt;/h2&gt;
&lt;p&gt;I came across the paper &lt;a href=&quot;https://web.archive.org/web/20220111001720/https://researcher.watson.ibm.com/researcher/files/us-bacon/Dingle07Ownership.pdf&quot;&gt;Ownership You Can Count
On&lt;/a&gt;
(by Adam Dingle and David F. Bacon, seemingly written in 2007) some years ago
and it stuck with me as being an interesting variant on traditional reference
counting. Since then I&#x27;ve come across references to it multiple times in the
programming language design and implementation community and thought it might
be worth jotting down some notes on the paper itself and the various attempts
to adopt its ideas.&lt;/p&gt;
&lt;h2 id=&quot;ownership-you-can-count-on-and-the-gel-language&quot;&gt;&lt;a href=&quot;#ownership-you-can-count-on-and-the-gel-language&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Ownership you can count on and the Gel language&lt;/h2&gt;
&lt;p&gt;The basic idea is very straight-forward. Introduce the idea of an owning
pointer type (not dissimilar to &lt;code&gt;std::unique_ptr&lt;/code&gt; in C++) where each object
may have only a single owning pointer and the object is freed when that
pointer goes out of scope. In addition to that allow an arbitrary number of
non-owning pointers to the object, but require that all non-owning pointers
have been destroyed by the time the object is freed. This requirement
prevents use-after-free and is implemented using run-time checks.
A reference count is incremented or decremented whenever a non-owning pointer
is created or destroyed (referred to as alias counting in the paper). That
count is checked when the object is freed and if it still has non-owning
references (i.e. the count is non-zero), then the program exits with an error.&lt;/p&gt;
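&lt;p&gt;The mechanics can be sketched in a few lines of C++ (a minimal sketch with illustrative class names of my own; Gel builds this into the language and runtime rather than a library):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdlib>

// Each object carries a count of live non-owning aliases. The count is
// only *inspected* when the owning pointer frees the object.
template <typename T>
struct Counted {
    T value;
    int aliases = 0;
};

template <typename T>
struct Owned {
    Counted<T>* obj;
    explicit Owned(T v) : obj(new Counted<T>{v}) {}
    ~Owned() {
        // The only check: a non-zero alias count at destruction means a
        // non-owning pointer would dangle, so the program exits.
        if (obj->aliases != 0) std::abort();
        delete obj;
    }
};

template <typename T>
struct Alias {
    Counted<T>* obj;
    explicit Alias(Owned<T>& o) : obj(o.obj) { ++obj->aliases; }
    ~Alias() { --obj->aliases; }  // plain decrement: no zero check, no free
};
```

Note how `Alias`'s destructor never compares the count against zero or frees anything, which is exactly where this differs from conventional reference counting.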
&lt;p&gt;Contrast this with a conventional reference counting system, where objects are
freed when their refcount reaches zero. In the alias counting scheme, the
refcount stored on the object is simply decremented when a non-owning
reference goes out of scope, and this refcount needs to be compared to 0 only
upon the owning pointer going out of scope rather than upon every decrement
(as an object is never freed as a result of the non-owning pointer count
reaching zero). Additionally, refcount manipulation is never needed when
passing around the owning pointer. The paper also describes an analysis
that allows many refcount operations to be elided in regions where there
aren&#x27;t modifications to owned pointers of the owned object&#x27;s
subclass/class/superclass (which guarantees the pointed-to object can&#x27;t be
destructed in this region). The paper also claims support for destroying data
structures with pointer cycles that can&#x27;t be automatically destroyed with
traditional reference counting. For cases where you might otherwise reach for
multiple ownership (e.g. an arbitrary graph), the authors suggest allocating
an array of owning pointers to hold your nodes, then using non-owning
pointers between them.&lt;/p&gt;
&lt;p&gt;The paper describes a C# derived language called Gel which only requires two
additional syntactic constructs to support the alias counting model: owned
pointers indicated by a &lt;code&gt;^&lt;/code&gt; (e.g. &lt;code&gt;Foo ^f = new Foo();&lt;/code&gt;) and a &lt;code&gt;take&lt;/code&gt; operator
to take ownership of a value from an owning field. Non-owning pointers are
written just as &lt;code&gt;Foo f&lt;/code&gt;. They also achieve a rather neat &lt;em&gt;erasure&lt;/em&gt;
property, whereby if you take a Gel program and remove all &lt;code&gt;^&lt;/code&gt; and &lt;code&gt;take&lt;/code&gt;
you&#x27;ll have a C# program that is valid as long as the original Gel program was
valid.&lt;/p&gt;
&lt;p&gt;That all sounds pretty great, right? Does this mean we can have it all: full
memory safety, deterministic destruction, low runtime and annotation overhead? As you&#x27;d
expect, there are some challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The usability in practice is going to depend a lot on how easy it is for a
programmer to maintain the invariant that no unowned pointers outlive the
owned pointer, especially as failure to do so results in a runtime crash.
&lt;ul&gt;
&lt;li&gt;Several languages have looked at integrating the idea, so there&#x27;s some
information later in this article on their experiences.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Although data structures with pointer cycles that couldn&#x27;t be automatically
destroyed by reference counting can be handled, there are also significant
limitations. Gel can&#x27;t destroy graphs containing non-owning pointers to
non-ancestor nodes unless the programmer writes logic to null out those
pointers. This could be frustrating and error-prone to handle.
&lt;ul&gt;
&lt;li&gt;The authors present a multi-phase object destruction mechanism aiming to
address these limitations, though the cost (potentially recursively
descending the graph of the ownership tree 3 times) depends on how much
can be optimised away.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Although it&#x27;s not a fundamental limitation of the approach, Gel doesn&#x27;t yet
provide any kind of polymorphism between owned and unowned references. This
would be necessary for any modern language with support for generics.&lt;/li&gt;
&lt;li&gt;The reference count elimination optimisation described in the paper assumes
single-threaded execution.
&lt;ul&gt;
&lt;li&gt;Though as noted there, thread escape analysis or structuring groups of
objects into independent regions (also see &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3622846&quot;&gt;recent work on
Verona&lt;/a&gt;) could provide a solution.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So in summary, an interesting idea that is meaningfully different to
traditional reference counting, but the largest program written using this
scheme is the Gel compiler itself and many of the obvious questions require
larger scale usage to judge the practicality of the scheme.&lt;/p&gt;
&lt;h2 id=&quot;influence-and-adoption-of-the-papers-ideas&quot;&gt;&lt;a href=&quot;#influence-and-adoption-of-the-papers-ideas&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Influence and adoption of the paper&#x27;s ideas&lt;/h2&gt;
&lt;p&gt;Ownership You Can Count On was written around 2007 and as far as I can tell
never published in the proceedings of a conference or workshop, or in a
journal. Flicking through the &lt;a href=&quot;https://code.google.com/archive/p/gel2/source/default/source&quot;&gt;Gel language
repository&lt;/a&gt; and
applying some world-class logical deduction based on the directory name
holding a draft version of the paper leads me to suspect it was submitted to
PLDI though. Surprisingly it has no academic citations, despite being shared
publicly on David F. Bacon&#x27;s site (and Bacon has a range of widely cited
papers related to reference counting / garbage collection). Yet, the work
has been used as the basis for memory management in one language
(&lt;a href=&quot;https://inko-lang.org/&quot;&gt;Inko&lt;/a&gt;) and was seriously evaluated (partially
implemented?) for both &lt;a href=&quot;https://nim-lang.org/&quot;&gt;Nim&lt;/a&gt; and
&lt;a href=&quot;https://vale.dev/&quot;&gt;Vale&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Inko started out with a garbage collector, but its creator Yorick Peterse
announced in 2021 &lt;a href=&quot;https://yorickpeterse.com/articles/friendship-ended-with-the-garbage-collector/&quot;&gt;a plan to adopt the scheme from Ownership You Can Count
On&lt;/a&gt;.
He then &lt;a href=&quot;https://yorickpeterse.com/articles/im-leaving-gitlab-to-work-on-inko-full-time/&quot;&gt;left his job to work on Inko
full-time&lt;/a&gt;
and &lt;a href=&quot;https://inko-lang.org/news/inko-0-10-0-released/#header-what-happened-since-the-last-release&quot;&gt;successfully transitioned to the new memory management scheme in the
Inko 0.10.0
release&lt;/a&gt;
about a year later. Inko as a language is more ambitious than Gel - featuring
parametric polymorphism, lightweight processes for concurrency, and more. Yet
it&#x27;s still early days and it&#x27;s not yet, for instance, good for drawing
conclusions about performance on modern systems as &lt;a href=&quot;https://github.com/jinyus/related_post_gen/pull/440#issuecomment-1816583612&quot;&gt;optimisations aren&#x27;t
currently
applied&lt;/a&gt;.
Dusty Phillips wrote a blog post earlier this year &lt;a href=&quot;https://dusty.phillips.codes/2023/06/26/understanding-inko-memory-management-through-data-structures/&quot;&gt;explaining Inko&#x27;s memory
management through some example data
structures&lt;/a&gt;,
which also includes some thoughts on the usability of the system and some
drawbacks. Some of the issues may be more of a result of the language being
young, e.g. the author notes it took a lot of trial and error to figure out
some of the described techniques (perhaps this will be easier once common
patterns are better documented and potentially supported by library functions
or syntax?), or that debuggability is poor when the program exits with a
dangling reference error.&lt;/p&gt;
&lt;p&gt;Nim was at one point going to move to Gel&#x27;s scheme (see the &lt;a href=&quot;https://nim-lang.org/araq/ownedrefs.html&quot;&gt;blog
post&lt;/a&gt; and
the &lt;a href=&quot;https://github.com/nim-lang/RFCs/issues/144&quot;&gt;RFC&lt;/a&gt;). I haven&#x27;t been able to
find a detailed explanation of why it was ultimately abandoned, though
Andreas Rumpf (Nim language creator) commented on a &lt;a href=&quot;https://forum.dlang.org/post/ejvshljmezqovmfprkww@forum.dlang.org&quot;&gt;Dlang forum discussion
thread&lt;/a&gt;
about the paper that &quot;Nim tried to use this in production before moving to ORC
and I&#x27;m not looking back, &#x27;ownership you can count on&#x27; was actually quite a
pain to work with...&quot;. Nim has since &lt;a href=&quot;https://nim-lang.org/blog/2020/10/15/introduction-to-arc-orc-in-nim.html&quot;&gt;adopted a more conventional non-atomic
reference counted
scheme&lt;/a&gt;
(ARC), with an optional cycle collector (ORC).&lt;/p&gt;
&lt;p&gt;Vale &lt;a href=&quot;https://verdagon.dev/blog/raii-next-steps&quot;&gt;previously adopted the Gel / Ownership You Can Count On
scheme&lt;/a&gt; (calling the unowned
references &quot;constraint references&quot;), but has since changed path slightly and
now uses &quot;&lt;a href=&quot;https://verdagon.dev/blog/generational-references&quot;&gt;generational
references&lt;/a&gt;&quot;. Rather than a
reference count, each allocation includes a generation counter which is
incremented each time it is reused. Fat pointers that include the expected
value of the generation counter are used, and checked before dereferencing. If
they don&#x27;t match, that indicates the memory location was since reallocated and
the program will fault. This also puts restrictions on the allocator&#x27;s
ability to reuse freed allocations without compromising safety. Vale&#x27;s memory
management scheme retains similarities to Gel: the language is still based
around single ownership and this is exploited to elide checks on owning
references/pointers.&lt;/p&gt;
&lt;h1 id=&quot;conclusion-and-other-related-work&quot;&gt;&lt;a href=&quot;#conclusion-and-other-related-work&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Conclusion and other related work&lt;/h1&gt;
&lt;p&gt;Some tangentially related things that didn&#x27;t make it into the main body of
text above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A language that references Gel in its design but goes in a slightly
different direction is Wouter van Oortmerssen&#x27;s
&lt;a href=&quot;https://strlen.com/lobster/&quot;&gt;Lobster&lt;/a&gt;.  Its &lt;a href=&quot;https://aardappel.github.io/lobster/memory_management.html&quot;&gt;memory management
scheme&lt;/a&gt; attempts
to infer ownership (and optimise in the case where single ownership can be
inferred) rather than requiring single ownership like the languages listed
above.&lt;/li&gt;
&lt;li&gt;One of the discussions on Ownership You Can Count On referenced this
&lt;a href=&quot;https://insights.sei.cmu.edu/documents/351/2013_019_001_55008.pdf&quot;&gt;Pointer Ownership
Model&lt;/a&gt;
paper as having similarities. It chooses to categorise pointers as either
&quot;responsible&quot; or &quot;irresponsible&quot; - cute!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And, that&#x27;s about it. If you&#x27;re hoping for a definitive answer on whether
alias counting is a winning idea or not I&#x27;m sorry to disappoint, but I&#x27;ve at
least succeeded in collecting together the various places I&#x27;ve seen it
explored and am looking forward to seeing how Inko&#x27;s adoption of it evolves.
I&#x27;d be very interested to hear any experience reports of adopting alias
counting or using a language like Inko that tries to use it.&lt;/p&gt;
</content>
</entry>
<entry>
<title>Storing data in pointers</title>
<published>2023-11-26T12:00:00Z</published>
<updated>2023-11-26T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q4/storing-data-in-pointers"/>
<id>https://muxup.com/2023q4/storing-data-in-pointers</id>
<content type="html">
&lt;h2 id=&quot;introduction&quot;&gt;&lt;a href=&quot;#introduction&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Introduction&lt;/h2&gt;
&lt;p&gt;On mainstream 64-bit systems, the maximum bit-width
of a virtual address is somewhat lower than 64 bits (commonly 48 bits). This
gives an opportunity to repurpose those unused bits for data storage, if
you&#x27;re willing to mask them out before using your pointer (or have a hardware
feature that does that for you - more on this later). I wondered what happens
to userspace programs relying on such tricks as processors gain support for
wider virtual addresses, hence this little blog post. TL;DR is that there&#x27;s no
real change unless certain hint values to enable use of wider addresses are
passed to &lt;code&gt;mmap&lt;/code&gt;, but read on for more details as well as other notes about
the general topic of storing data in pointers.&lt;/p&gt;
&lt;h2 id=&quot;storage-in-upper-bits-assuming-48-bit-virtual-addresses&quot;&gt;&lt;a href=&quot;#storage-in-upper-bits-assuming-48-bit-virtual-addresses&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Storage in upper bits assuming 48-bit virtual addresses&lt;/h2&gt;
&lt;p&gt;Assuming your platform has 48-bit wide virtual addresses, this is pretty
straightforward. You can stash whatever you want in those 16 bits, but you&#x27;ll
need to ensure you mask them out for every load and store (which is cheap,
but has at least some cost) and would want to be confident that there are no
other attempted users of these bits. The masking would be slightly different
in kernel space due to rules on how the upper bits are set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;x86-64 defines a &lt;a href=&quot;https://cdrdv2.intel.com/v1/dl/getContent/671200&quot;&gt;canonical form of
addresses&lt;/a&gt; (see 3.3.7.1).
This describes how on an implementation with 48-bit virtual addresses, bits
63:48 must be set to the value of bit 47 (i.e. sign extended). The &lt;a href=&quot;https://docs.kernel.org/arch/x86/x86_64/mm.html&quot;&gt;memory
map used by the Linux
kernel&lt;/a&gt; uses bit 47 to
split the address space between kernel and user addresses, so that bit will
always be 0 for user-space addresses meaning bits 63:48 must also be 0 to be
in canonical form.&lt;/li&gt;
&lt;li&gt;RISC-V has essentially the same restriction (see &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf&quot;&gt;4.5.1 in the RISC-V
privileged
specification&lt;/a&gt;)
&quot;instruction fetch addresses and load and store effective addresses, which
are 64 bits, must have bits 63-48 all equal to bit 47, or else a page-fault
exception will occur.&quot; The virtual memory layout used by the Linux kernel
&lt;a href=&quot;https://docs.kernel.org/arch/riscv/vm-layout.html&quot;&gt;uses the same approach as for
x86-64&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;AArch64 has a slight variant on the above which essentially provides a
49-bit address space (meaning user-space virtual memory can cover 256TiB
rather than 128TiB). As &lt;a href=&quot;https://documentation-service.arm.com/static/5efa1d23dbdee951c1ccdec5?token=&quot;&gt;described in the Armv8-A address translation
documentation&lt;/a&gt;
(section 3), for a 48-bit address space bits 63:48 must be all 0s or all 1s.
However, they don&#x27;t need to be a copy of bit 47, and a different address
translation table is used depending on whether bits 63:48 are 1 or 0. This
&lt;a href=&quot;https://docs.kernel.org/arch/arm64/memory.html&quot;&gt;allows splitting kernel/user addresses without giving up bit
47&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
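&lt;p&gt;Packing a tag into those upper 16 bits and masking it back out before dereference can be sketched as follows (a minimal illustration assuming user-space addresses with bits 63:48 all zero, as on the Linux configurations above; helper names are my own):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Stash a 16-bit tag in bits 63:48 of a user-space pointer, assuming the
// kernel only hands out addresses with those bits clear.
inline uintptr_t tag_ptr(void* p, uint16_t tag) {
    return reinterpret_cast<uintptr_t>(p) | (static_cast<uintptr_t>(tag) << 48);
}

inline uint16_t get_tag(uintptr_t tagged) {
    return static_cast<uint16_t>(tagged >> 48);
}

// Mask the tag out before every dereference so the address is canonical
// (bits 63:48 all zero for user-space addresses on these platforms).
inline void* untag_ptr(uintptr_t tagged) {
    return reinterpret_cast<void*>(tagged & ((uintptr_t{1} << 48) - 1));
}
```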
&lt;h2 id=&quot;what-if-virtual-addresses-are-wider-than-48-bits&quot;&gt;&lt;a href=&quot;#what-if-virtual-addresses-are-wider-than-48-bits&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;What if virtual addresses are wider than 48 bits?&lt;/h2&gt;
&lt;p&gt;So we&#x27;ve covered the easy case, where you can freely (ab)use the upper 16 bits
for your own purposes. But what if you&#x27;re running on a system that has wider
than 48 bit virtual addresses? How do you know that&#x27;s the case? And is it
possible to limit virtual addresses to the 48-bit range if you&#x27;re sure you
don&#x27;t need the extra bits?&lt;/p&gt;
&lt;p&gt;You can query the virtual address width from the command-line by &lt;code&gt;cat&lt;/code&gt;ting
&lt;code&gt;/proc/cpuinfo&lt;/code&gt;, which might include a line like &lt;code&gt;address sizes	: 39 bits physical, 48 bits virtual&lt;/code&gt;. I&#x27;d hope there&#x27;s a way to get the same information
without parsing &lt;code&gt;/proc/cpuinfo&lt;/code&gt;, but I haven&#x27;t been able to find it.&lt;/p&gt;
&lt;p&gt;As for how to keep using those upper bits on a system with wider virtual
addresses, helpfully the behaviour of &lt;code&gt;mmap&lt;/code&gt; is defined with this
compatibility in mind. It&#x27;s explicitly documented &lt;a href=&quot;https://docs.kernel.org/arch/x86/x86_64/5level-paging.html#user-space-and-large-virtual-address-space&quot;&gt;for
x86-64&lt;/a&gt;,
&lt;a href=&quot;https://docs.kernel.org/arch/arm64/memory.html#bit-userspace-vas&quot;&gt;for
AArch64&lt;/a&gt; and
&lt;a href=&quot;https://docs.kernel.org/arch/riscv/vm-layout.html#userspace-vas&quot;&gt;for RISC-V&lt;/a&gt;
that addresses beyond 48-bits won&#x27;t be returned unless a hint parameter beyond
a certain width is used (the details are slightly different for each target).
This means if you&#x27;re confident that nothing within your process is going to be
passing such hints to &lt;code&gt;mmap&lt;/code&gt; (including e.g. your &lt;code&gt;malloc&lt;/code&gt; implementation), or
at least that you&#x27;ll never need to try to reuse upper bits of addresses
produced in this way, then you&#x27;re free to presume the system uses no more than
48 bits of virtual address.&lt;/p&gt;
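&lt;p&gt;This guarantee is easy to check empirically. A Linux-only sketch (the function name is mine), mapping a page with a &lt;code&gt;NULL&lt;/code&gt; hint so the kernel stays within the default 48-bit range:&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>
#include <sys/mman.h>

// With a NULL hint, Linux on x86-64, AArch64 and RISC-V won't return a
// mapping above the default 48-bit user-space range, so bits 63:48 of
// the returned address are guaranteed clear and free for tagging.
void* map_page_in_48bit_range() {
    return mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```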
&lt;h2 id=&quot;top-byte-ignore-and-similar-features&quot;&gt;&lt;a href=&quot;#top-byte-ignore-and-similar-features&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Top byte ignore and similar features&lt;/h2&gt;
&lt;p&gt;Up to this point I&#x27;ve completely skipped over the various architectural
features that allow some of the upper bits to be ignored upon dereference,
essentially providing hardware support for this type of storage of additional
data within pointers by making additional masking unnecessary.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;x86-64 keeps things interesting by having slightly different variants of
this for Intel and AMD.
&lt;ul&gt;
&lt;li&gt;Intel introduced Linear Address Masking (LAM), documented in chapter 6 of
&lt;a href=&quot;https://cdrdv2.intel.com/v1/dl/getContent/671368&quot;&gt;their document on instruction set extensions and future
features&lt;/a&gt;. If enabled
this modifies the canonicality check so that, for instance, on a system
with 48-bit virtual addresses bit 47 must be equal to bit 63. This allows
bits 62:48 (15 bits) to be freely used with no masking needed.
&quot;LAM57&quot; allows 62:57 to be used (6 bits). It seems as if Linux is
currently opting to &lt;a href=&quot;https://lwn.net/Articles/902094/&quot;&gt;only support LAM57 and not
LAM48&lt;/a&gt;. Support for LAM can be
configured separately for user and supervisor mode, but I&#x27;ll refer you to
the Intel docs for details.&lt;/li&gt;
&lt;li&gt;AMD instead &lt;a href=&quot;https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf&quot;&gt;describes Upper Address
Ignore&lt;/a&gt;
(see section 5.10) which allows bits 63:57 (7 bits) to be used, and unlike
LAM doesn&#x27;t require bit 63 to match the upper bit of the virtual address.
As documented in LWN, this &lt;a href=&quot;https://lwn.net/Articles/888914/&quot;&gt;caused some concern from the Linux kernel
community&lt;/a&gt;. Unless I&#x27;m missing it, there
doesn&#x27;t seem to be any level of support merged in the Linux kernel at the
time of writing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;RISC-V has the proposed &lt;a href=&quot;https://github.com/riscv/riscv-j-extension/blob/1c7cf98295e678e015750ff0b7fdc54ed213b95e/zjpm-spec.pdf&quot;&gt;pointer masking
extension&lt;/a&gt;
which defines new supervisor-level extensions Ssnpm, Smnpm, and Smmpm to
control it. These allow &lt;code&gt;PMLEN&lt;/code&gt; to potentially be set to 7 (masking the
upper 7 bits) or 16 (masking the upper 16 bits). In usual RISC-V style, it&#x27;s
not mandated which of these are supported, but the &lt;a href=&quot;https://github.com/riscv/riscv-profiles/blob/ff79c48f975f93c25f6359d47d0f578b3ecb8555/rva23-profile.adoc&quot;&gt;draft RVA23 profile
mandates that PMLEN=7 must be supported at a
minimum&lt;/a&gt;.
Eagle-eyed readers will note that the proposed approach has the same issue
that caused concern with AMD&#x27;s Upper Address Ignore, namely that the most
significant bit is no longer required to be the same as the top bit of the
virtual address. This is
&lt;a href=&quot;https://github.com/riscv/riscv-j-extension/blob/1c7cf98295e678e015750ff0b7fdc54ed213b95e/zjpm/background.adoc#pointer-masking-and-privilege-modes&quot;&gt;noted&lt;/a&gt;
in the spec, with the suggestion that this is solvable at the ABI level and
some operating systems may choose to mandate that the MSB not be used for
tagging.&lt;/li&gt;
&lt;li&gt;AArch64 has the &lt;a href=&quot;https://developer.arm.com/documentation/den0024/a/ch12s05s01&quot;&gt;Top Byte
Ignore&lt;/a&gt; (TBI)
feature, which as the name suggests just means that the top 8 bits of a
virtual address are ignored when used for memory accesses and can be used to
store data. Any other bits between the virtual address width and top byte
must be set to all 0s or all 1s, as before. TBI is also used by Arm&#x27;s
&lt;a href=&quot;https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Arm_Memory_Tagging_Extension_Whitepaper.pdf&quot;&gt;Memory Tagging
Extension&lt;/a&gt;
(MTE), which uses 4 of those bits as the &quot;key&quot; to be compared against the
&quot;lock&quot; tag bits associated with a memory location being accessed. Armv8.3
defines another potential consumer of otherwise unused address bits,
&lt;a href=&quot;https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/pointer-auth-v7.pdf&quot;&gt;pointer
authentication&lt;/a&gt;
which uses 11 to 31 bits depending on the virtual address width if TBI isn&#x27;t
being used, or 3 to 23 bits if it is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A relevant historical note that multiple people pointed out: the original
Motorola 68000 had a 24-bit address bus and so the top byte was simply
ignored which caused &lt;a href=&quot;https://macgui.com/news/article.php?t=527&quot;&gt;well documented porting issues when trying to expand the
address space&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;storing-data-in-least-significant-bits&quot;&gt;&lt;a href=&quot;#storing-data-in-least-significant-bits&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Storing data in least significant bits&lt;/h2&gt;
&lt;p&gt;Another commonly used trick I&#x27;d be remiss not to mention is repurposing a
small number of the least significant bits in a pointer. If you know a certain
set of pointers will only ever be used to point to memory with a given minimal
alignment, you can exploit the fact that the lower bits corresponding to that
alignment will always be zero and store your own data there. As before,
you&#x27;ll need to account for the bits you repurpose when accessing the pointer -
in this case either by masking, or by adjusting the offset used to access the
address (if those least significant bits are known).&lt;/p&gt;
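&lt;p&gt;For example, with 8-byte-aligned allocations the low 3 bits are always zero and can carry a small tag (a minimal sketch with illustrative helper names):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// With 8-byte alignment the low 3 bits of a pointer are always zero,
// leaving room for a 3-bit tag that must be masked off before use.
inline uintptr_t set_low_tag(void* p, unsigned tag) {
    assert((reinterpret_cast<uintptr_t>(p) & 0x7) == 0 && tag < 8);
    return reinterpret_cast<uintptr_t>(p) | tag;
}

inline unsigned low_tag(uintptr_t tagged) { return tagged & 0x7; }

inline void* strip_low_tag(uintptr_t tagged) {
    return reinterpret_cast<void*>(tagged & ~uintptr_t{0x7});
}
```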
&lt;p&gt;As &lt;a href=&quot;https://fosstodon.org/@pervognsen@mastodon.social/111478311705167492&quot;&gt;suggested by Per
Vognsen&lt;/a&gt;,
after this article was first published, you can exploit x86&#x27;s &lt;a href=&quot;https://en.wikipedia.org/wiki/ModR/M#SIB_byte&quot;&gt;scaled index
addressing mode&lt;/a&gt; to use up
to 3 bits that are unused due to alignment, but store your data in the upper
bits instead; the scaled index addressing mode means there&#x27;s no need for
separate pointer manipulation upon access. E.g. for an 8-byte aligned address, store it
right-shifted by 3 and use the top 3 bits for metadata, then scaling by 8
using SIB when accessing (which effectively ignores the top 3 bits).  This has
some trade-offs, but is such a neat trick I felt I have to include it!&lt;/p&gt;
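&lt;p&gt;In C terms the representation looks like this (a sketch with my own helper names; on x86-64 the multiply by 8 on access comes for free via the SIB byte, whereas here it is explicit):&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Store an 8-byte-aligned address right-shifted by 3, freeing the top
// 3 bits for metadata. On access, multiplying by 8 (free via x86's
// scaled-index addressing) shifts the metadata bits out of the 64-bit
// result, so no separate masking is needed.
inline uint64_t pack(void* p, unsigned meta) {
    return (reinterpret_cast<uint64_t>(p) >> 3) |
           (static_cast<uint64_t>(meta) << 61);
}

inline unsigned metadata(uint64_t packed) { return packed >> 61; }

inline void* address(uint64_t packed) {
    // packed * 8: the metadata in bits 63:61 overflows out of range.
    return reinterpret_cast<void*>(packed << 3);
}
```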
&lt;h2 id=&quot;some-real-world-examples&quot;&gt;&lt;a href=&quot;#some-real-world-examples&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Some real-world examples&lt;/h2&gt;
&lt;p&gt;To state what I hope is obvious, this is far from an exhaustive list. The
point of this quick blog post was really to discuss cases where additional
data is stored alongside a pointer, but of course unused bits can also be
exploited to allow a more efficient tagged union representation (and this is
arguably more common), so I&#x27;ve included some examples of that below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href=&quot;https://www.cipht.net/2017/10/29/fixie-tries.html&quot;&gt;fixie trie&lt;/a&gt; is a
variant of the trie that uses 16 bits in each pointer to store a bitmap used
as part of the lookup logic. It also exploits the minimum alignment of
pointers to repurpose the least significant bit to indicate if a value is a
branch or a leaf.&lt;/li&gt;
&lt;li&gt;On the topic of storing data in the least significant bits, we have a handy
&lt;a href=&quot;https://github.com/llvm/llvm-project/blob/dc8b055c71d2ff2f43c0f4cac66e15a210b91e3b/llvm/include/llvm/ADT/PointerIntPair.h#L64&quot;&gt;PointerIntPair&lt;/a&gt;
class in LLVM to allow the easy implementation of this optimisation. There&#x27;s
also an &lt;a href=&quot;https://github.com/rust-lang/rfcs/pull/3204&quot;&gt;&#x27;alignment niches&#x27;
proposal&lt;/a&gt; for Rust which would
allow this kind of optimisation to be done automatically for &lt;code&gt;enum&lt;/code&gt;s (tagged
unions). Another example of repurposing the LSB found in the wild would be
the Linux kernel &lt;a href=&quot;https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/list_bl.h&quot;&gt;using it for a spin
lock&lt;/a&gt;
(thanks Vegard Nossum for the
&lt;a href=&quot;https://fosstodon.org/@vegard@mastodon.social/111478755690419785&quot;&gt;tip&lt;/a&gt;, who
notes this is used in the kernel&#x27;s directory entry cache hashtable). There
are surely many many more examples.&lt;/li&gt;
&lt;li&gt;Go repurposes both upper and lower bits in its
&lt;a href=&quot;https://github.com/golang/go/blob/master/src/runtime/tagptr_64bit.go&quot;&gt;taggedPointer&lt;/a&gt;,
used internally in its runtime implementation.&lt;/li&gt;
&lt;li&gt;If you have complete control over your heap then there&#x27;s more you can do to
make use of embedded metadata, including using additional bits by avoiding
allocation outside of a certain range and using redundant mappings to avoid
or reduce the need for masking. OpenJDK&#x27;s ZGC &lt;a href=&quot;https://dinfuehr.github.io/blog/a-first-look-into-zgc/&quot;&gt;is a good example of
this&lt;/a&gt;, utilising a
42-bit address space for objects and upon allocation mapping pages to
different aliases to allow pointers using their metadata bits to be
dereferenced without masking.&lt;/li&gt;
&lt;li&gt;A fairly common trick in language runtimes is to exploit the fact that
values can be stored inside the payload of double floating point NaN (not a
number) values and overlap it with pointers (knowing that the full 64 bits
aren&#x27;t needed) and even small integers. There&#x27;s a nice description of this
&lt;a href=&quot;https://github.com/WebKit/WebKit/blob/0a64dd54421137c48a57e6e0aab15a99139a8776/Source/JavaScriptCore/runtime/JSCJSValue.h#L403&quot;&gt;in
JavaScriptCore&lt;/a&gt;,
but it was famously used in
&lt;a href=&quot;http://lua-users.org/lists/lua-l/2009-11/msg00089.html&quot;&gt;LuaJIT&lt;/a&gt;. Andy Wingo
also has a &lt;a href=&quot;https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations&quot;&gt;helpful
write-up&lt;/a&gt;.
Along similar lines, OCaml steals just the least significant bit in order to
&lt;a href=&quot;https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/&quot;&gt;efficiently support unboxed
integers&lt;/a&gt;
(meaning integers are 63-bit on 64-bit platforms and 31-bit on 32-bit
platforms).&lt;/li&gt;
&lt;li&gt;Apple&#x27;s Objective-C implementation makes heavy use of unused pointer bits,
with some examples
&lt;a href=&quot;https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html&quot;&gt;documented&lt;/a&gt;
&lt;a href=&quot;https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and-you.html&quot;&gt;in&lt;/a&gt;
&lt;a href=&quot;https://www.mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html&quot;&gt;detail&lt;/a&gt;
on Mike Ash&#x27;s excellent blog (with a more recent scheme &lt;a href=&quot;https://alwaysprocessing.blog/2023/03/19/objc-tagged-ptr&quot;&gt;described on Brian
T. Kelley&#x27;s blog&lt;/a&gt;).
Inlining the reference count (falling back to a hash lookup upon overflow)
is a fun one. Another example of using the LSB
to store small strings in-line is &lt;a href=&quot;https://squoze.org/&quot;&gt;squoze&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;V8 opts to limit the heap used for V8 objects to 4GiB using &lt;a href=&quot;https://v8.dev/blog/pointer-compression&quot;&gt;pointer
compression&lt;/a&gt;, where an offset is
used alongside the 32-bit value (which itself might be a pointer or a 31-bit
integer, depending on the least significant bit) to refer to the memory
location.&lt;/li&gt;
&lt;li&gt;As this list is becoming more of a collection of things slightly outside the
scope of this article I might as well round it off with &lt;a href=&quot;https://en.wikipedia.org/wiki/XOR_linked_list&quot;&gt;the XOR linked
list&lt;/a&gt;, which reduces the
storage requirements for doubly linked lists by exploiting the reversibility
of the XOR operation.&lt;/li&gt;
&lt;li&gt;I&#x27;ve focused on storing data in conventional pointers on current commodity
architectures but there is of course a huge wealth of work involving tagged
memory (also an area where &lt;a href=&quot;https://github.com/lowRISC/lowrisc-site/blob/master/static/downloads/lowRISC-memo-2014-001.pdf&quot;&gt;I&#x27;ve
dabbled&lt;/a&gt; -
something for a future blog post perhaps) and/or alternative pointer
representations. I&#x27;ve touched on this with MTE (mentioned due to its
interaction with TBI), but another prominent example is of course
&lt;a href=&quot;https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf&quot;&gt;CHERI&lt;/a&gt; which moves
to using 128-bit capabilities in order to fit in additional inline metadata.
David Chisnall provided some &lt;a href=&quot;https://lobste.rs/s/5417dx/storing_data_pointers#c_j12qr0&quot;&gt;observations based on porting code to CHERI
that relies on the kind of tricks described in this
post&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;fin&quot;&gt;&lt;a href=&quot;#fin&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Fin&lt;/h2&gt;
&lt;p&gt;What did I miss? What did I get wrong? Let me know &lt;a href=&quot;https://fosstodon.org/@asb&quot;&gt;on
Mastodon&lt;/a&gt; or email (asb@asbradbury.org).&lt;/p&gt;
&lt;p&gt;You might be interested in the discussion of this article &lt;a href=&quot;https://lobste.rs/s/5417dx/storing_data_pointers&quot;&gt;on
lobste.rs&lt;/a&gt;, &lt;a href=&quot;https://news.ycombinator.com/item?id=38424090&quot;&gt;on
HN&lt;/a&gt;, &lt;a href=&quot;https://old.reddit.com/r/cpp/duplicates/184n4bd/storing_data_in_pointers/&quot;&gt;on various
subreddits&lt;/a&gt;,
or &lt;a href=&quot;https://fosstodon.org/@asb/111478289261238134&quot;&gt;on Mastodon&lt;/a&gt;.&lt;/p&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2023-12-02:
&lt;ul&gt;
&lt;li&gt;Reference Brian T. Kelley&#x27;s blog providing a more up-to-date description
of &quot;pointer tagging&quot; in Objective-C. &lt;a href=&quot;https://fosstodon.org/@uliwitness@chaos.social/111510381669628525&quot;&gt;Spotted on
Mastodon&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2023-11-27:
&lt;ul&gt;
&lt;li&gt;Mention Squoze (thanks to &lt;a href=&quot;https://fosstodon.org/@vanderZwan@vis.social/111482795620805222&quot;&gt;Job van der
Zwan&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Reworded the intro so as not to claim &quot;it&#x27;s quite well known&quot; that the
maximum virtual address width is typically less than 64 bits. This might
be interpreted as shaming readers for not being aware of that, which
wasn&#x27;t my intent.  Thanks to HN reader jonasmerlin for &lt;a href=&quot;https://news.ycombinator.com/item?id=38430812&quot;&gt;pointing this
out&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Mention CHERI in the list of &quot;real world examples&quot; which is becoming
dominated by instances of things somewhat different to what I was
describing! Thanks to Paul Butcher &lt;a href=&quot;https://www.linkedin.com/feed/update/urn:li:activity:7134613236737302528&quot;&gt;for the
suggestion&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Link to relevant posts on Mike Ash&#x27;s blog
(&lt;a href=&quot;https://lobste.rs/s/5417dx/storing_data_pointers#c_la63sf&quot;&gt;suggested&lt;/a&gt; by
Jens Alfke).&lt;/li&gt;
&lt;li&gt;Link to the various places this article is being discussed.&lt;/li&gt;
&lt;li&gt;Add link to M68k article
&lt;a href=&quot;https://fosstodon.org/@christer@mastodon.gamedev.place/111480158347208956&quot;&gt;suggested&lt;/a&gt;
by Christer Ericson (with multiple others suggesting something similar -
thanks!).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2023-11-26:
&lt;ul&gt;
&lt;li&gt;Minor typo fixes and rewordings.&lt;/li&gt;
&lt;li&gt;Note the Linux kernel repurposing the LSB as a spin lock (thanks to Vegard
Nossum for the
&lt;a href=&quot;https://fosstodon.org/@vegard@mastodon.social/111478755690419785&quot;&gt;suggestion&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Add SIB addressing idea &lt;a href=&quot;https://fosstodon.org/@pervognsen@mastodon.social/111478311705167492&quot;&gt;shared by Per
Vognsen&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Integrate note
&lt;a href=&quot;https://fosstodon.org/@wingo@mastodon.social/111478367520587737&quot;&gt;suggested&lt;/a&gt;
by Andy Wingo that explicit masking often isn&#x27;t needed when the least
significant pointer bits are repurposed.&lt;/li&gt;
&lt;li&gt;Add a reference to Arm Pointer Authentication (thanks to the
&lt;a href=&quot;https://twitter.com/tmikov/status/1728849257439150425&quot;&gt;suggestion&lt;/a&gt; from
Tzvetan Mikov).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2023-11-26: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>What&#x27;s new for RISC-V in LLVM 17</title>
<published>2023-10-10T12:00:00Z</published>
<updated>2023-10-10T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q4/whats-new-for-risc-v-in-llvm-17"/>
<id>https://muxup.com/2023q4/whats-new-for-risc-v-in-llvm-17</id>
<content type="html">
&lt;p&gt;LLVM 17 was &lt;a href=&quot;https://discourse.llvm.org/t/llvm-17-0-1-released/73549&quot;&gt;released in the past few
weeks&lt;/a&gt;, and I&#x27;m
&lt;a href=&quot;/2023q1/whats-new-for-risc-v-in-llvm-16&quot;&gt;continuing&lt;/a&gt; the
&lt;a href=&quot;/2022q3/whats-new-for-risc-v-in-llvm-15&quot;&gt;tradition&lt;/a&gt;
of writing up some selective highlights of what&#x27;s new as far as RISC-V is
concerned in this release. If you want more general, regular updates on what&#x27;s
going on in LLVM you should of course &lt;a href=&quot;https://llvmweekly.org/&quot;&gt;subscribe to my
newsletter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In case you&#x27;re not familiar with LLVM&#x27;s release schedule, it&#x27;s worth noting
that there are two major LLVM releases a year (i.e. one roughly every 6
months) and these are timed releases as opposed to being cut when a pre-agreed
set of feature targets have been met. We&#x27;re very fortunate to benefit from an
active and growing set of contributors working on RISC-V support in LLVM
projects, who are responsible for the work I describe below - thank you!
I coordinate biweekly sync-up calls for RISC-V LLVM contributors, so if you&#x27;re
working in this area please &lt;a href=&quot;https://discourse.llvm.org/c/code-generation/riscv/57&quot;&gt;consider dropping
in&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;code-size-reduction-extensions&quot;&gt;&lt;a href=&quot;#code-size-reduction-extensions&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Code size reduction extensions&lt;/h2&gt;
&lt;p&gt;A family of extensions referred to as the &lt;a href=&quot;https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-2/Zc_1.0.4-2.pdf&quot;&gt;RISC-V code size reduction
extensions&lt;/a&gt;
was ratified earlier this year. One aspect of this is providing ways of
referring to subsets of the standard compressed &#x27;C&#x27; (16-bit instructions)
extension that don&#x27;t include floating point loads/stores, as well as other
variants. But the more meaningful additions are the &lt;code&gt;Zcmp&lt;/code&gt; and &lt;code&gt;Zcmt&lt;/code&gt;
extensions, in both cases targeted at embedded rather than application cores,
reusing encodings otherwise used for double-precision FP stores.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Zcmp&lt;/code&gt; provides instructions that implement common stack frame manipulation
operations that would typically require a sequence of instructions, as well as
instructions for moving pairs of registers. The &lt;a href=&quot;https://github.com/llvm/llvm-project/blob/release/17.x/llvm/lib/Target/RISCV/RISCVMoveMerger.cpp&quot;&gt;RISCVMoveMerger
pass&lt;/a&gt;
performs the necessary peephole optimisation to produce &lt;code&gt;cm.mva01s&lt;/code&gt; or
&lt;code&gt;cm.mvsa01&lt;/code&gt; instructions for moving to/from registers a0-a1 and s0-s7 when
possible. It iterates over generated machine instructions, looking for pairs
of &lt;code&gt;c.mv&lt;/code&gt; instructions that can be replaced. &lt;code&gt;cm.push&lt;/code&gt; and &lt;code&gt;cm.pop&lt;/code&gt;
instructions are generated by appropriate modifications to the RISC-V function
frame lowering code, while the &lt;a href=&quot;https://github.com/llvm/llvm-project/blob/release/17.x/llvm/lib/Target/RISCV/RISCVPushPopOptimizer.cpp&quot;&gt;RISCVPushPopOptimizer
pass&lt;/a&gt;
looks for opportunities to convert a &lt;code&gt;cm.pop&lt;/code&gt; into a &lt;code&gt;cm.popretz&lt;/code&gt; (pop
registers, deallocate stack frame, and return zero) or &lt;code&gt;cm.popret&lt;/code&gt; (pop
registers, deallocate stack frame, and return).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Zcmt&lt;/code&gt; provides the &lt;code&gt;cm.jt&lt;/code&gt; and &lt;code&gt;cm.jalt&lt;/code&gt; instructions to reduce code size
needed for implementing a jump table. Although support is present in the
assembler, the patch to modify the linker to select these instructions is
still under review so we can hope to see full support in LLVM 18.&lt;/p&gt;
&lt;p&gt;The RISC-V code size reduction working group have &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1bFMyGkuuulBXuIaMsjBINoCWoLwObr1l9h5TAWN8s7k/edit#gid=1837831327&quot;&gt;estimates of the code size
impact of these
extensions&lt;/a&gt;
produced using &lt;a href=&quot;https://github.com/riscv/riscv-code-size-reduction/tree/main/benchmarks&quot;&gt;this analysis
script&lt;/a&gt;.
I&#x27;m not aware of whether a comparison has been made to the real-world results
of implementing support for the extensions in LLVM, but that would certainly
be interesting.&lt;/p&gt;
&lt;h2 id=&quot;vectorization&quot;&gt;&lt;a href=&quot;#vectorization&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Vectorization&lt;/h2&gt;
&lt;p&gt;LLVM has &lt;a href=&quot;https://llvm.org/docs/Vectorizers.html&quot;&gt;two forms of
auto-vectorization&lt;/a&gt;, the loop
vectorizer and the SLP (superword-level parallelism) vectorizer. The loop
vectorizer was enabled during the LLVM 16 development cycle, while the SLP
vectorizer &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/7f26c27e03f1&quot;&gt;was
enabled&lt;/a&gt; for this
release. Beyond that, there&#x27;s been a huge number of incremental improvements
for vector codegen, such that it isn&#x27;t always easy to pick out particular
highlights. But to pick a small set of changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases/tag/v0.12.0&quot;&gt;Version 0.12 of the RISC-V vector C intrinsics
specification&lt;/a&gt;
is now supported by Clang. As noted in the release notes, the hope is there
will not be new incompatibilities introduced prior to v1.0.&lt;/li&gt;
&lt;li&gt;There were lots of minor codegen improvements, one example would be
&lt;a href=&quot;https://github.com/llvm/llvm-project/commit/badf11de4ac6&quot;&gt;improvements to the RISCVInsertVSETVLI
pass&lt;/a&gt; to avoid
unnecessary insertions of the &lt;code&gt;vsetivli&lt;/code&gt; instruction used to
modify the &lt;code&gt;vtype&lt;/code&gt; control register.&lt;/li&gt;
&lt;li&gt;It&#x27;s not particularly user visible, but there was a lot of refactoring of
vector pseudoinstructions used internally during instruction selection
(following &lt;a href=&quot;https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295&quot;&gt;this
thread&lt;/a&gt;).
The &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/691618a7a959&quot;&gt;added
documentation&lt;/a&gt;
will likely be helpful if you&#x27;re hoping to better understand this.&lt;/li&gt;
&lt;li&gt;You might be aware that &lt;code&gt;LMUL&lt;/code&gt; in the RISC-V vector extension controls
grouping of vector registers, for instance rather than 32 vector registers,
you might want to set LMUL=4 to treat them as 8 registers that are 4 times
as large. The &quot;best&quot; LMUL is going to vary depending on both the target
microarchitecture and factors such as register pressure, but a change was
made so &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/8d16c6809a08&quot;&gt;LMUL=2 is the new
default&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvm.org/docs/CommandGuide/llvm-mca.html&quot;&gt;llvm-mca&lt;/a&gt; (the LLVM
Machine Code Analyzer) is a performance analysis tool that uses information
such as LLVM scheduling models to statically estimate the performance of
machine code on a specific CPU. There were at least two changes relevant to
llvm-mca and RISC-V vector support: &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/1a855819a87f&quot;&gt;scheduling information for RVV on
SiFive7 cores&lt;/a&gt;
(which of course is used outside of llvm-mca as well), and support for
&lt;a href=&quot;https://github.com/llvm/llvm-project/commit/ecf372f993fa&quot;&gt;vsetivli/vsetvli&lt;/a&gt;
&#x27;instruments&#x27;. llvm-mca has the concept of an &#x27;&lt;a href=&quot;https://llvm.org/docs/CommandGuide/llvm-mca.html#instrument-regions&quot;&gt;instrument
region&lt;/a&gt;&#x27;,
a section of assembly with an LLVM-MCA comment that can (for instance)
indicate the value of a control register that would affect scheduling. This
can be used to set &lt;code&gt;LMUL&lt;/code&gt; (register grouping) for RISC-V, however in the
case of the immediate forms of &lt;code&gt;vsetvl&lt;/code&gt; occurring in the input, &lt;code&gt;LMUL&lt;/code&gt; can be
statically determined.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to find out more about RISC-V vector support in LLVM, be sure to
check out &lt;a href=&quot;https://llvm.swoogo.com/2023devmtg/session/1767411/vector-codegen-in-the-risc-v-backend&quot;&gt;my Igalia colleague Luke Lau&#x27;s
talk&lt;/a&gt;
at the LLVM Dev Meeting this week (I&#x27;ll update this article when
slides+recording are available).&lt;/p&gt;
&lt;h2 id=&quot;other-isa-extensions&quot;&gt;&lt;a href=&quot;#other-isa-extensions&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Other ISA extensions&lt;/h2&gt;
&lt;p&gt;It wouldn&#x27;t be a RISC-V article without a list of hard to interpret strings
that claim to be ISA extension names (Zvfbfwma is a real extension, I
promise!). In addition to the code size reduction extension listed above
there&#x27;s been lots of newly added or updated extensions in this release cycle.
Do refer to the &lt;a href=&quot;https://releases.llvm.org/17.0.1/docs/RISCVUsage.html&quot;&gt;RISCVUsage
documentation&lt;/a&gt; for
something that aims to be a complete list of what is supported (occasionally
there are omissions) as well as clarity on what we mean by an extension being
marked as &quot;experimental&quot;.&lt;/p&gt;
&lt;p&gt;Here&#x27;s a partial list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code generation support for the &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/blob/main/src/zfinx.adoc&quot;&gt;Zfinx, Zdinx, Zhinx, and Zhinxmin
extensions&lt;/a&gt;.
These extensions provide support for single, double, and half precision
floating point instructions respectively (with Zhinxmin a minimal variant of Zhinx), but define them to operate on the
general purpose register file rather than requiring an additional floating
point register file. This reduces implementation cost on simple core
designs.&lt;/li&gt;
&lt;li&gt;Support for a whole range of vendor-defined extensions, e.g. XTHeadBa
(address generation), XTHeadBb (basic bit manipulation), Xsfvcp (SiFive
VCIX), XCVbitmanip (CORE-V bit manipulation custom instructions) and many
more (see the &lt;a href=&quot;https://releases.llvm.org/17.0.1/docs/ReleaseNotes.html#changes-to-the-risc-v-backend&quot;&gt;release
notes&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Experimental &lt;a href=&quot;https://github.com/riscv/riscv-crypto&quot;&gt;vector crypto extension&lt;/a&gt; support was
updated to version 0.5.1 of the specification.&lt;/li&gt;
&lt;li&gt;Experimental support was added for version 0.2 of the &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/blob/main/src/zfa.adoc&quot;&gt;Zfa
extension&lt;/a&gt;
(providing additional floating-point instructions).&lt;/li&gt;
&lt;li&gt;Assembler/disassembler support for an experimental family of extensions to support operations
on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Bfloat16_floating-point_format&quot;&gt;bfloat16 floating-point
format&lt;/a&gt;:
&lt;a href=&quot;https://github.com/riscv/riscv-bfloat16&quot;&gt;Zfbfmin, Zvfbfmin, and Zvfbfwma&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Assembler/disassembler support for the experimental
&lt;a href=&quot;https://github.com/riscv/riscv-zacas&quot;&gt;Zacas&lt;/a&gt; extension (atomic
compare-and-swap).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It landed after the 17.x branch so isn&#x27;t in this release, but in the future
you&#x27;ll be able to use &lt;code&gt;--print-supported-extensions&lt;/code&gt; with Clang to have it
print a table of supported ISA extensions (the same flag has now been
implemented for Arm and AArch64 too).&lt;/p&gt;
&lt;h2 id=&quot;other-additions-and-improvements&quot;&gt;&lt;a href=&quot;#other-additions-and-improvements&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Other additions and improvements&lt;/h2&gt;
&lt;p&gt;As always, it&#x27;s not possible to go into detail on every change. A selection of
other changes that I&#x27;m not able to delve into more detail on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initial RISC-V support was added to &lt;a href=&quot;https://github.com/llvm/llvm-project/blob/main/bolt/README.md&quot;&gt;LLVM&#x27;s BOLT post-link
optimizer&lt;/a&gt;
and various fixes / feature additions made to
&lt;a href=&quot;https://llvm.org/docs/JITLink.html&quot;&gt;JITLink&lt;/a&gt;, thanks to the work of my
Igalia colleague Job Noorman. There&#x27;s actually a lot to say about this work,
but I don&#x27;t need to because Job has written an &lt;a href=&quot;https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/&quot;&gt;excellent blog post on
it&lt;/a&gt;
that I highly encourage you go and read.&lt;/li&gt;
&lt;li&gt;LLD &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/85444794cdde&quot;&gt;gained
support&lt;/a&gt; for some
of the relaxations involving the global pointer.&lt;/li&gt;
&lt;li&gt;I expect there&#x27;ll be more to say about this in future releases, but there&#x27;s
been incremental progress on RISC-V
&lt;a href=&quot;https://llvm.org/docs/GlobalISel/index.html&quot;&gt;GlobalISel&lt;/a&gt; in the LLVM 17
development cycle (which has continued after). You might be interested in
the &lt;a href=&quot;https://llvm.org/devmtg/2023-05/slides/Tutorial-May11/01-Bradbury-GlobalISelTutorial.pdf&quot;&gt;slides from my GlobalISel by example talk at EuroLLVM this
year&lt;/a&gt;.
Ivan Baev at SiFive is also set to &lt;a href=&quot;https://riscvsummit2023.sched.com/event/1QUod&quot;&gt;speak about some of this
work&lt;/a&gt; at the RISC-V Summit in
November.&lt;/li&gt;
&lt;li&gt;Clang supports a form of control-flow integrity called
&lt;a href=&quot;https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-kcfi&quot;&gt;KCFI&lt;/a&gt;.
This is used by low-level software like the Linux kernel (see
&lt;code&gt;CONFIG_CFI_CLANG&lt;/code&gt; in the Linux tree) but the target-specific parts were
previously unimplemented for RISC-V. This gap &lt;a href=&quot;https://github.com/llvm/llvm-project/commit/83835e22c7cd&quot;&gt;was
filled&lt;/a&gt; for the
LLVM 17 release.&lt;/li&gt;
&lt;li&gt;LLVM has its own &lt;a href=&quot;https://libc.llvm.org/&quot;&gt;work-in-progress libc
implementation&lt;/a&gt;, in which
&lt;code&gt;memcmp&lt;/code&gt;, &lt;code&gt;bcmp&lt;/code&gt;, &lt;code&gt;memset&lt;/code&gt;, and &lt;code&gt;memcpy&lt;/code&gt; all gained optimised RISC-V
specific versions. There will of course be further updates for LLVM 18,
including the work from my colleague Mikhail R Gadelha on 32-bit RISC-V
support.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apologies if I&#x27;ve missed
your favourite new feature or improvement - the &lt;a href=&quot;https://releases.llvm.org/17.0.1/docs/ReleaseNotes.html#changes-to-the-risc-v-backend&quot;&gt;LLVM release
notes&lt;/a&gt;
will include some things I haven&#x27;t had space for here. Thanks again to
everyone who has been contributing to make RISC-V support in LLVM even better.&lt;/p&gt;
&lt;p&gt;If you have a RISC-V project you think my colleagues and I at Igalia may
be able to help with, then do &lt;a href=&quot;https://www.igalia.com/contact/&quot;&gt;get in touch&lt;/a&gt;
regarding our services.&lt;/p&gt;
</content>
</entry>
<entry>
<title>2023Q2 week log</title>
<published>2023-04-10T12:00:00Z</published>
<updated>2023-06-05T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q2/week-log"/>
<id>https://muxup.com/2023q2/week-log</id>
<content type="html">
&lt;p&gt;I tend to keep quite a lot of notes on the development-related activities (sometimes at
work, sometimes not) I do on a week-by-week basis, and thought it might be fun
to write up the parts that were public. This may or may not be of wider
interest, but it aims to be a useful aide-mémoire for my purposes at least.
Weeks with few entries might be due to focusing on downstream work (or perhaps
just a less productive week - I am only human!).&lt;/p&gt;
&lt;h2 id=&quot;week-of-29th-may-2023&quot;&gt;&lt;a href=&quot;#week-of-29th-may-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 29th May 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Posted &lt;a href=&quot;https://reviews.llvm.org/D151663&quot;&gt;D151663&lt;/a&gt;, implementing support for
bf16 truncate/extend of hard FP targets.&lt;/li&gt;
&lt;li&gt;Responded to user query about &lt;a href=&quot;https://discourse.llvm.org/t/csrs-defined-in-sstc-extension/70824/2&quot;&gt;gating of
CSRs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Filed &lt;a href=&quot;https://gitlab.xfce.org/apps/xfce4-terminal/-/issues/244&quot;&gt;issue&lt;/a&gt;
about shift and right-click handling in xfce4-terminal (after migrating to it due to
frustration with gnome-terminal &lt;a href=&quot;https://gitlab.gnome.org/GNOME/gnome-terminal/-/issues/6698&quot;&gt;not supporting setting the urgent hint upon
receiving a terminal
bell&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fosstodon.org/@asb/110475298100440985&quot;&gt;Spread the word&lt;/a&gt; about my
keynote about LLVM next week at the RISC-V Summit Europe.&lt;/li&gt;
&lt;li&gt;A few reviews on RISC-V psABI or ASM manual PRs (e.g. &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378&quot;&gt;atomics
ABI&lt;/a&gt;,
&lt;a href=&quot;https://github.com/riscv-non-isa/riscv-asm-manual/pull/86&quot;&gt;floating point in the asm
manual&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Less activity this week due to being on holiday.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/491&quot;&gt;LLVM Weekly #491&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-22nd-may-2023&quot;&gt;&lt;a href=&quot;#week-of-22nd-may-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 22nd May 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Posted a &lt;a href=&quot;https://reviews.llvm.org/D151434&quot;&gt;patch&lt;/a&gt; to generalise the
shouldExtendTypeInLibcall hook so it applies to half and bfloat16.&lt;/li&gt;
&lt;li&gt;Updated &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/367&quot;&gt;bfloat16 psABI
PR&lt;/a&gt;, which
has now been merged.&lt;/li&gt;
&lt;li&gt;Posted &lt;a href=&quot;https://reviews.llvm.org/D151563&quot;&gt;D151563&lt;/a&gt;, a patch to implement soft
FP legalisation for bf16 FP_EXTEND and BF16_TO_FP after abandoning a &lt;a href=&quot;https://reviews.llvm.org/D151436&quot;&gt;patch to
add an extendbfsf2 libcall&lt;/a&gt; (which would
match libgcc, but add no real value).&lt;/li&gt;
&lt;li&gt;Identified a bug in the ABI used for half FP libcalls and &lt;a href=&quot;https://reviews.llvm.org/D151284&quot;&gt;posted a
patch&lt;/a&gt; to fix it.&lt;/li&gt;
&lt;li&gt;Some misc small cleanups like &lt;a href=&quot;https://reviews.llvm.org/D151096&quot;&gt;making zfbfmin imply the F
extension&lt;/a&gt;, cleaning up bfloat16 tests
(&lt;a href=&quot;https://reviews.llvm.org/rGf3202b9da663&quot;&gt;1&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rGa6e2b1ee49f5&quot;&gt;2&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Prepared the agenda for and ran the biweekly &lt;a href=&quot;https://discourse.llvm.org/t/risc-v-llvm-sync-up-call-may-25th-2023/70873&quot;&gt;RISC-V LLVM sync-up
call&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/490&quot;&gt;LLVM Weekly #490&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-15th-may-2023&quot;&gt;&lt;a href=&quot;#week-of-15th-may-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 15th May 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Corrected Clang codegen support for half FP types when the zhinx extension
is available (&lt;a href=&quot;https://reviews.llvm.org/D150777&quot;&gt;D150777&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Rebased and committed patches to implement MC layer support for the bfloat16
extensions (unblocked now a new PDF was posted in the riscv-bfloat16 repo).
&lt;a href=&quot;https://reviews.llvm.org/D147610&quot;&gt;Zfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147611&quot;&gt;zvfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147612&quot;&gt;zvfbfwma&lt;/a&gt;. Also made a &lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/pull/48&quot;&gt;trivial typo fix
to the spec&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Looked at cleaning up the usage of &lt;code&gt;report_fatal_error&lt;/code&gt; in the RISC-V
backend (&lt;a href=&quot;https://reviews.llvm.org/D150674&quot;&gt;D150674&lt;/a&gt;) and also &lt;a href=&quot;https://reviews.llvm.org/D150669&quot;&gt;fixed a bug encountered while looking at
this&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Participated in the ongoing discussion about &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378#issuecomment-1549250676&quot;&gt;adding atomics lowering to the
RISC-V
psABI&lt;/a&gt;,
including a slightly altered lowering of some primitives in order to allow
for forwards compatibility with &quot;table A.7&quot;.&lt;/li&gt;
&lt;li&gt;Usual mix of upstream LLVM reviews, and a number of RISC-V psABI or ASM
manual reviews.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/489&quot;&gt;LLVM Weekly #489&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Missed some weeks - busy with EuroLLVM etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-17th-april-2023&quot;&gt;&lt;a href=&quot;#week-of-17th-april-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 17th April 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Still &lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/issues/33&quot;&gt;pinging&lt;/a&gt; for an
updated riscv-bfloat16 spec version that incorporates the &lt;code&gt;fcvt.bf16.s&lt;/code&gt;
encoding fix.&lt;/li&gt;
&lt;li&gt;Bumped the version of the experimental Zfa RISC-V extension supported by
LLVM to 0.2 (&lt;a href=&quot;https://reviews.llvm.org/D148634&quot;&gt;D148634&lt;/a&gt;). This was very
straightforward as after inspecting the spec history, it was clear there
were no changes that would impact the compiler.&lt;/li&gt;
&lt;li&gt;Filed a couple of pull requests against the &lt;a href=&quot;https://github.com/riscv/riscv-zacas&quot;&gt;riscv-zacas
repo&lt;/a&gt; (RISC-V Atomic Compare and Swap
extension).
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/riscv/riscv-zacas/pull/8&quot;&gt;#8&lt;/a&gt; made the
dependency on the A extension explicit.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/riscv/riscv-zacas/pull/7&quot;&gt;#7&lt;/a&gt; attempted to explicitly
reference the extension for misaligned atomics, though it seems it won&#x27;t be
merged. I do feel uncomfortable with RISC-V extensions that can have their
semantics changed by other standard extensions without this possibility
being called out very explicitly. As I note in the PR, failure to
appreciate this might mean that conformance tests written for &lt;code&gt;zacas&lt;/code&gt;
might fail on a system with &lt;code&gt;zacas_zam&lt;/code&gt;. I see a slight parallel to a
recent &lt;a href=&quot;https://lists.riscv.org/g/tech-profiles/message/94&quot;&gt;discussion about RISC-V
profiles&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Fixed the canonical ordering used for ISA naming strings in RISCVISAInfo
(this will mainly affect the string stored in build attributes). This was
fixed in &lt;a href=&quot;https://reviews.llvm.org/D148615&quot;&gt;D148615&lt;/a&gt; which built on the
&lt;a href=&quot;https://reviews.llvm.org/rGa35e67fc5be6&quot;&gt;pre-committed test case&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A whole bunch of upstream LLVM reviews. As noted in
&lt;a href=&quot;https://reviews.llvm.org/D148315#4279486&quot;&gt;D148315&lt;/a&gt; I&#x27;m thinking we should
probably relax the ordering rules for ISA strings in &lt;code&gt;-march&lt;/code&gt; in order to
avoid issues due to spec changes and incompatibilities between GCC and
Clang.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/485&quot;&gt;LLVM Weekly #485&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-10th-april-2023&quot;&gt;&lt;a href=&quot;#week-of-10th-april-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 10th April 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Some days off due to the Easter holidays, so less to report this week.&lt;/li&gt;
&lt;li&gt;Updated RISC-V bfloat16 patches
(&lt;a href=&quot;https://reviews.llvm.org/D147610&quot;&gt;Zfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147611&quot;&gt;Zvfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147612&quot;&gt;Zvfbfwma&lt;/a&gt;), incorporating new
&lt;code&gt;fcvt.bf16.s&lt;/code&gt; encoding. Also filed an
&lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/issues/40&quot;&gt;issue&lt;/a&gt; about the way in
which the dependencies of the vector bfloat16 extensions are specified.&lt;/li&gt;
&lt;li&gt;Blogged about &lt;a href=&quot;/2023q2/updating-wrens-benchmarks&quot;&gt;updating the Wren language
benchmarks&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Variety of upstream LLVM reviews.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/484&quot;&gt;LLVM Weekly #484&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-3rd-april-2023&quot;&gt;&lt;a href=&quot;#week-of-3rd-april-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 3rd April 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Some days off due to the Easter holidays, so less to report this week.&lt;/li&gt;
&lt;li&gt;Posted MC layer (assembler/disassembler) patches for the bfloat16
extensions:
&lt;a href=&quot;https://reviews.llvm.org/D147610&quot;&gt;Zfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147611&quot;&gt;Zvfbfmin&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147612&quot;&gt;Zvfbfwma&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;Also posted a PR to the riscv-bfloat16 spec to &lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/pull/34&quot;&gt;clarify the vector
extension dependencies&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Pinged on my &lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/issues/33&quot;&gt;bug report about the fcvt.bf16.s encoding
clash&lt;/a&gt;. Once this is
resolved, the LLVM MC layer patches can land.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Updated authorship information for
&lt;a href=&quot;https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/34&quot;&gt;riscv-toolchain-conventions&lt;/a&gt;
which now has a range of contributors beyond myself.&lt;/li&gt;
&lt;li&gt;Usual mix of upstream LLVM reviews. This included some discussion on
&lt;a href=&quot;https://reviews.llvm.org/D146463&quot;&gt;changing the shadow call stack register to
x3&lt;/a&gt; which spilled into the &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371&quot;&gt;psABI
PR&lt;/a&gt; where I
suggested some alternate wording.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/483&quot;&gt;LLVM Weekly #483&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2023-06-05: Added notes for the week of 22nd May 2023 and week of 29th May
2023.&lt;/li&gt;
&lt;li&gt;2023-05-22: Added notes for the week of 15th May 2023.&lt;/li&gt;
&lt;li&gt;2023-04-24: Added notes for the week of 17th April 2023.&lt;/li&gt;
&lt;li&gt;2023-04-17: Added notes for the week of 10th April 2023.&lt;/li&gt;
&lt;li&gt;2023-04-10: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Updating Wren&#x27;s benchmarks</title>
<published>2023-04-10T12:00:00Z</published>
<updated>2023-04-10T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q2/updating-wrens-benchmarks"/>
<id>https://muxup.com/2023q2/updating-wrens-benchmarks</id>
<content type="html">
&lt;p&gt;&lt;a href=&quot;https://wren.io/&quot;&gt;Wren&lt;/a&gt; is a &quot;small, fast, class-based, concurrent scripting
language&quot;, originally designed by Bob Nystrom (who you might recognise as the
author of &lt;a href=&quot;https://gameprogrammingpatterns.com/&quot;&gt;Game Programming Patterns&lt;/a&gt;
and &lt;a href=&quot;https://craftinginterpreters.com/&quot;&gt;Crafting Interpreters&lt;/a&gt;). It&#x27;s a really
fun language to study - the implementation is compact and easily readable, and
although class-based languages aren&#x27;t considered very hip these days there&#x27;s a
real elegance to its design. I saw Wren&#x27;s &lt;a href=&quot;https://wren.io/performance.html&quot;&gt;performance
page&lt;/a&gt; hadn&#x27;t been updated for a very long
time, and especially given the recent upstream interpreter performance work on
Python, was interested in seeing how performance on these microbenchmarks has
changed. Hence this quick post to share some new numbers.&lt;/p&gt;
&lt;h2 id=&quot;new-results&quot;&gt;&lt;a href=&quot;#new-results&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;New results&lt;/h2&gt;
&lt;p&gt;To cut to the chase, here are the results I get running the same set of
&lt;a href=&quot;https://github.com/wren-lang/wren/tree/main/test/benchmark&quot;&gt;benchmarks&lt;/a&gt;
across a collection of Python, Ruby, and Lua versions (those available in
current Arch Linux).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Method Call&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;wren0.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 25%;&quot;&gt;0.079s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.1 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 29%;&quot;&gt;0.090s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 33%;&quot;&gt;0.102s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby3.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 33%;&quot;&gt;0.104s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 39%;&quot;&gt;0.123s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 50%;&quot;&gt;0.156s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.11&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 54%;&quot;&gt;0.170s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 59%;&quot;&gt;0.184s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;mruby&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 62%;&quot;&gt;0.193s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.10&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.313s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;DeltaBlue&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;wren0.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 43%;&quot;&gt;0.086s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.11&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 53%;&quot;&gt;0.106s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.10&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.202s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Binary Trees&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.1 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 37%;&quot;&gt;0.073s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 58%;&quot;&gt;0.113s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby3.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 59%;&quot;&gt;0.115s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.11&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 70%;&quot;&gt;0.137s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 71%;&quot;&gt;0.138s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;wren0.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 73%;&quot;&gt;0.144s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;mruby&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 84%;&quot;&gt;0.163s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.10&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 95%;&quot;&gt;0.186s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 99%;&quot;&gt;0.195s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.196s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Recursive Fibonacci&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.1 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 22%;&quot;&gt;0.055s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 36%;&quot;&gt;0.090s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 43%;&quot;&gt;0.109s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby3.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 47%;&quot;&gt;0.117s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 50%;&quot;&gt;0.126s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 55%;&quot;&gt;0.138s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;wren0.4&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 59%;&quot;&gt;0.148s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.11&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 62%;&quot;&gt;0.157s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;mruby&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 73%;&quot;&gt;0.185s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.10&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.252s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;I&#x27;ve used essentially the same presentation and methodology as in the original
benchmark, partly to save time pondering the optimal approach, partly so I can
redirect any critiques to the original author (sorry Bob!). Benchmarks do not
measure interpreter startup time, and each benchmark is run ten times with the
median time reported (thermal throttling could potentially mean this isn&#x27;t the best
methodology, but changing the number of test repetitions to e.g. 1000 seems to
have little effect).&lt;/p&gt;
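&lt;p&gt;The median-of-ten approach can be sketched as follows. The
&lt;code&gt;median_time&lt;/code&gt; helper here is hypothetical, but mirrors the
&lt;code&gt;split(&quot;: &quot;)&lt;/code&gt; parsing used in the full script in the
appendix:&lt;/p&gt;

```python
import statistics

def median_time(raw_outputs):
    # Each benchmark prints its elapsed time as the final colon-separated
    # field on stdout; parse that out and take the median across
    # repetitions to damp the effect of outlier runs.
    times = [float(out.split(": ")[-1].strip()) for out in raw_outputs]
    return statistics.median(times)
```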
&lt;p&gt;The tests were run on a machine with an AMD Ryzen 9 5950X processor. wren 0.4
as of commit
&lt;a href=&quot;https://github.com/wren-lang/wren/commit/c2a75f1eaf9b1ba1245d7533a723360863fb012d&quot;&gt;c2a75f1&lt;/a&gt;
was used as well as the following Arch Linux packages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lua52-5.2.4-5&lt;/li&gt;
&lt;li&gt;lua53-5.3.6-1&lt;/li&gt;
&lt;li&gt;lua-5.4.4-3&lt;/li&gt;
&lt;li&gt;luajit-2.1.0.beta3.r471.g505e2c03-1&lt;/li&gt;
&lt;li&gt;mruby-3.1.0-1&lt;/li&gt;
&lt;li&gt;python-3.10.10-1&lt;/li&gt;
&lt;li&gt;python-3.11.3-1 (taken from Arch Linux&#x27;s staging repo)&lt;/li&gt;
&lt;li&gt;ruby2.7-2.7.7-1&lt;/li&gt;
&lt;li&gt;ruby-3.0.5-1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Python 3.10 and 3.11 packages were compiled with the same GCC version
(12.2.1 according to &lt;code&gt;python -VV&lt;/code&gt;), though this won&#x27;t necessarily be true for
all other packages (e.g. the lua52 and lua53 packages are several years old so
will have been built with an older GCC).&lt;/p&gt;
&lt;p&gt;I&#x27;ve submitted a &lt;a href=&quot;https://github.com/wren-lang/wren/pull/1164&quot;&gt;pull request to update the Wren performance
page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;old-results&quot;&gt;&lt;a href=&quot;#old-results&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Old results&lt;/h2&gt;
&lt;p&gt;The following results are copied from the &lt;a href=&quot;https://wren.io/performance.html&quot;&gt;Wren performance
page&lt;/a&gt; (&lt;a href=&quot;https://web.archive.org/web/20230326002211/https://wren.io/performance.html&quot;&gt;archive.org
link&lt;/a&gt;) for
ease of comparison. They were run on a MacBook Pro 2.3GHz Intel Core i7 with
Lua 5.2.3, LuaJIT 2.0.2, Python 2.7.5, Python 3.3.4, ruby 2.0.0p247.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Method Call&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;wren2015&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 14%;&quot;&gt;0.12s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.0 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 18%;&quot;&gt;0.16s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 23%;&quot;&gt;0.20s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 41%;&quot;&gt;0.35s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 91%;&quot;&gt;0.78s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.85s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;DeltaBlue&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;wren2015&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 22%;&quot;&gt;0.13s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 83%;&quot;&gt;0.48s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.57s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Binary Trees&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.0 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 20%;&quot;&gt;0.11s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;wren2015&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 41%;&quot;&gt;0.22s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 46%;&quot;&gt;0.24s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 71%;&quot;&gt;0.37s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 73%;&quot;&gt;0.38s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.52s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Recursive Fibonacci&lt;/strong&gt;:&lt;/p&gt;
&lt;table class=&quot;chart&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;luajit2.0 -joff&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 17%;&quot;&gt;0.10s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;wren2015&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 35%;&quot;&gt;0.20s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;ruby2.0&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 39%;&quot;&gt;0.22s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;lua5.2&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 49%;&quot;&gt;0.28s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python2.7&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 90%;&quot;&gt;0.51s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th&gt;python3.3&lt;/th&gt;&lt;td&gt;&lt;div class=&quot;chart-bar&quot; style=&quot;width: 100%;&quot;&gt;0.57s&amp;nbsp;&lt;/div&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;h2 id=&quot;observations&quot;&gt;&lt;a href=&quot;#observations&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Observations&lt;/h2&gt;
&lt;p&gt;A few takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LuaJIT&#x27;s bytecode interpreter remains incredibly fast (though see &lt;a href=&quot;https://sillycross.github.io/2022/11/22/2022-11-22/&quot;&gt;this blog
post&lt;/a&gt; for a methodology
to produce an even faster interpreter).&lt;/li&gt;
&lt;li&gt;The performance improvements in Python 3.11 were &lt;a href=&quot;https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-faster-cpython&quot;&gt;well
documented&lt;/a&gt;
and are very visible on this set of microbenchmarks.&lt;/li&gt;
&lt;li&gt;I was more surprised by the performance jump with Lua 5.4, especially as the
&lt;a href=&quot;https://www.lua.org/manual/5.4/readme.html#changes&quot;&gt;release notes&lt;/a&gt; give few
hints of performance improvements that would be reflected in these
microbenchmarks. The &lt;a href=&quot;https://lwn.net/Articles/826134/&quot;&gt;LWN article about the Lua 5.4
release&lt;/a&gt; however did note improved
performance on a range of benchmarks.&lt;/li&gt;
&lt;li&gt;Wren remains speedy (for these workloads at least), but engineering work on
other interpreters has narrowed that gap for some of these benchmarks.&lt;/li&gt;
&lt;li&gt;I haven&#x27;t taken the time to compare the January 2015 version of Wren used
for the benchmarks vs present-day Wren 0.4. It would be interesting to
explore that though.&lt;/li&gt;
&lt;li&gt;A tiny number of microbenchmarks have been used in this performance test. It
wouldn&#x27;t be wise to draw general conclusions - this is just a bit of fun.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;appendix-benchmark-script&quot;&gt;&lt;a href=&quot;#appendix-benchmark-script&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Appendix: Benchmark script&lt;/h2&gt;
&lt;p&gt;Health warning: this is incredibly quick and dirty (especially the repeated
switching between the python packages to allow testing both 3.10 and 3.11):&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;#!/usr/bin/env python3&lt;/span&gt;

&lt;span style=&quot;color: #177500&quot;&gt;# Copyright Muxup contributors.&lt;/span&gt;
&lt;span style=&quot;color: #177500&quot;&gt;# Distributed under the terms of the MIT license, see LICENSE for details.&lt;/span&gt;
&lt;span style=&quot;color: #177500&quot;&gt;# SPDX-License-Identifier: MIT&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;import&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;statistics&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;import&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;subprocess&lt;/span&gt;

&lt;span style=&quot;color: #000000&quot;&gt;out&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;open&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;out.md&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;w&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;encoding=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;utf-8&amp;quot;&lt;/span&gt;)


&lt;span style=&quot;color: #A90D91&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;run_single_bench&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;bench_name&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt;):
    &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;./test/benchmark/&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.2&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.lua&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.2&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.3&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.lua&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.3&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.4&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.lua&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.4&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;luajit2.1 -joff&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.lua&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;luajit&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;-joff&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;mruby&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.rb&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;mruby&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.10&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.py&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;subprocess.run&lt;/span&gt;(
            [
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;sudo&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;pacman&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;-U&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;--noconfirm&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/var/cache/pacman/pkg/python-3.10.10-1-x86_64.pkg.tar.zst&amp;quot;&lt;/span&gt;,
            ],
            &lt;span style=&quot;color: #000000&quot;&gt;check=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;True&lt;/span&gt;,
        )
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.11&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.py&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;subprocess.run&lt;/span&gt;(
            [
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;sudo&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;pacman&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;-U&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;--noconfirm&amp;quot;&lt;/span&gt;,
                &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/var/cache/pacman/pkg/python-3.11.3-1-x86_64.pkg.tar.zst&amp;quot;&lt;/span&gt;,
            ],
            &lt;span style=&quot;color: #000000&quot;&gt;check=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;True&lt;/span&gt;,
        )
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby2.7&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.rb&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby-2.7&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby3.0&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.rb&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;elif&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner_name&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;wren0.4&amp;quot;&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.wren&amp;quot;&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;./bin/wren_test&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;bench_file&lt;/span&gt;]
    &lt;span style=&quot;color: #A90D91&quot;&gt;else&lt;/span&gt;:
        &lt;span style=&quot;color: #A90D91&quot;&gt;raise&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;SystemExit&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Unrecognised runner&amp;quot;&lt;/span&gt;)

    &lt;span style=&quot;color: #000000&quot;&gt;times&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; []
    &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;_&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;range&lt;/span&gt;(&lt;span style=&quot;color: #1C01CE&quot;&gt;10&lt;/span&gt;):
        &lt;span style=&quot;color: #000000&quot;&gt;bench_out&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;subprocess.run&lt;/span&gt;(
            &lt;span style=&quot;color: #000000&quot;&gt;cmdline&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;capture_output=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;True&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;check=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;True&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;encoding=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;utf-8&amp;quot;&lt;/span&gt;
        )&lt;span style=&quot;color: #000000&quot;&gt;.stdout&lt;/span&gt;
        &lt;span style=&quot;color: #000000&quot;&gt;times.append&lt;/span&gt;(&lt;span style=&quot;color: #A90D91&quot;&gt;float&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;bench_out.split&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;: &amp;quot;&lt;/span&gt;)[&lt;span style=&quot;color: #000000&quot;&gt;-&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;]&lt;span style=&quot;color: #000000&quot;&gt;.strip&lt;/span&gt;()))
    &lt;span style=&quot;color: #A90D91&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;statistics.median&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;times&lt;/span&gt;)


&lt;span style=&quot;color: #A90D91&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;do_bench&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;name&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;file_base&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;runners&lt;/span&gt;):
    &lt;span style=&quot;color: #000000&quot;&gt;results&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; {}
    &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runners&lt;/span&gt;:
        &lt;span style=&quot;color: #000000&quot;&gt;results&lt;/span&gt;[&lt;span style=&quot;color: #000000&quot;&gt;runner&lt;/span&gt;] &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;run_single_bench&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;name&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;file_base&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;runner&lt;/span&gt;)
    &lt;span style=&quot;color: #000000&quot;&gt;results&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;dict&lt;/span&gt;(&lt;span style=&quot;color: #A90D91&quot;&gt;sorted&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;results.items&lt;/span&gt;(), &lt;span style=&quot;color: #000000&quot;&gt;key=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;lambda&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;kv&lt;/span&gt;: &lt;span style=&quot;color: #000000&quot;&gt;kv&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;]))
    &lt;span style=&quot;color: #000000&quot;&gt;longest_result&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;max&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;results.values&lt;/span&gt;())
    &lt;span style=&quot;color: #000000&quot;&gt;out.write&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;**{&lt;/span&gt;&lt;span style=&quot;color: #000000&quot;&gt;name&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}**:\n&amp;quot;&lt;/span&gt;)
    &lt;span style=&quot;color: #000000&quot;&gt;out.write&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;&amp;lt;table class=&amp;quot;chart&amp;quot;&amp;gt;\n&amp;#39;&lt;/span&gt;)
    &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;runner&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;result&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;results.items&lt;/span&gt;():
        &lt;span style=&quot;color: #000000&quot;&gt;percent&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;round&lt;/span&gt;((&lt;span style=&quot;color: #000000&quot;&gt;result&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;/&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;longest_result&lt;/span&gt;) &lt;span style=&quot;color: #000000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;100&lt;/span&gt;)
        &lt;span style=&quot;color: #000000&quot;&gt;out.write&lt;/span&gt;(
            &lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;&amp;quot;&amp;quot;\&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  &amp;lt;tr&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;    &amp;lt;th&amp;gt;{&lt;/span&gt;&lt;span style=&quot;color: #000000&quot;&gt;runner&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&amp;lt;/th&amp;gt;&amp;lt;td&amp;gt;&amp;lt;div class=&amp;quot;chart-bar&amp;quot; style=&amp;quot;width: {&lt;/span&gt;&lt;span style=&quot;color: #000000&quot;&gt;percent&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}%;&amp;quot;&amp;gt;{&lt;/span&gt;&lt;span style=&quot;color: #000000&quot;&gt;result&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:.3f}s&amp;amp;nbsp;&amp;lt;/div&amp;gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  &amp;lt;/tr&amp;gt;\n&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
        )
    &lt;span style=&quot;color: #000000&quot;&gt;out.write&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&amp;lt;/table&amp;gt;\n\n&amp;quot;&lt;/span&gt;)


&lt;span style=&quot;color: #000000&quot;&gt;all_runners&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&lt;/span&gt; [
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.2&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.3&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;lua5.4&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;luajit2.1 -joff&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;mruby&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.10&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.11&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby2.7&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ruby3.0&amp;quot;&lt;/span&gt;,
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;wren0.4&amp;quot;&lt;/span&gt;,
]
&lt;span style=&quot;color: #000000&quot;&gt;do_bench&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Method Call&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;method_call&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;all_runners&lt;/span&gt;)
&lt;span style=&quot;color: #000000&quot;&gt;do_bench&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Delta Blue&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;delta_blue&amp;quot;&lt;/span&gt;, [&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.10&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;python3.11&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;wren0.4&amp;quot;&lt;/span&gt;])
&lt;span style=&quot;color: #000000&quot;&gt;do_bench&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Binary Trees&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;binary_trees&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;all_runners&lt;/span&gt;)
&lt;span style=&quot;color: #000000&quot;&gt;do_bench&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Recursive Fibonacci&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;fib&amp;quot;&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;all_runners&lt;/span&gt;)
&lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Output written to out.md&amp;quot;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

</content>
</entry>
<entry>
<title>2023Q1 week log</title>
<published>2023-02-27T12:00:00Z</published>
<updated>2023-04-03T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q1/week-log"/>
<id>https://muxup.com/2023q1/week-log</id>
<content type="html">
&lt;p&gt;I tend to keep quite a lot of notes on the development-related activities
(sometimes at work, sometimes not) I do on a week-by-week basis, and thought it might be fun
to write up the parts that were public. This may or may not be of wider
interest, but it aims to be a useful aide-mémoire for my purposes at least.
Weeks with few entries might be due to focusing on downstream work (or perhaps
just a less productive week - I am only human!).&lt;/p&gt;
&lt;h2 id=&quot;week-of-27th-march-2023&quot;&gt;&lt;a href=&quot;#week-of-27th-march-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 27th March 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Submitted a &lt;a href=&quot;https://github.com/llvm/llvm-project-release-prs/pull/406&quot;&gt;backport request to
16.0.1&lt;/a&gt; for my
recent fixes to &lt;code&gt;llvm-objdump&lt;/code&gt; (and related tools) when encountering
unrecognised RISC-V base or ISA extension versions, or unrecognised ISA
extension names.&lt;/li&gt;
&lt;li&gt;Landed a tweak to the RISC-V ISA manual to &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/pull/1001&quot;&gt;make it clear that HINT
encodings aren&#x27;t
&quot;reserved&quot;&lt;/a&gt; in terms of
being part of the defined &quot;reserved instruction-set category&quot;. Thanks to
Andrew Waterman for suggesting a simpler fix than my first attempt.&lt;/li&gt;
&lt;li&gt;I&#x27;m now on Mastodon, at &lt;a href=&quot;https://fosstodon.org/@asb&quot;&gt;@asb@fosstodon.org&lt;/a&gt; and
&lt;a href=&quot;https://fosstodon.org/@llvmweekly&quot;&gt;@llvmweekly@fosstodon.org&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Proposed &lt;a href=&quot;https://reviews.llvm.org/D147183#4233360&quot;&gt;alternate wording for the RISC-V LLVM doc updates reflecting
recent ISA versioning
discussions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Set agenda for and ran the usual biweekly &lt;a href=&quot;https://discourse.llvm.org/t/risc-v-llvm-sync-up-call-30th-march-2023-note-daylight-savings-time-impact/69635&quot;&gt;RISC-V LLVM contributor sync up
call&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Added the Renesas R9A06G150 to my &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially available RISC-V silicon
list&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Posted and landed patches to implement MC layer and codegen support for the
experimental &lt;code&gt;Zicond&lt;/code&gt; (integer conditional operations) extension
(&lt;a href=&quot;https://reviews.llvm.org/D146946&quot;&gt;D146946&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D147147&quot;&gt;D147147&lt;/a&gt;). This is essentially the same
as the &lt;code&gt;XVentanaCondOps&lt;/code&gt; extension.&lt;/li&gt;
&lt;li&gt;Advertised the ongoing discussion about changing the shadow call stack
register on RISC-V &lt;a href=&quot;https://discourse.llvm.org/t/rfc-psa-changing-the-shadow-call-stack-register-on-risc-v/69537&quot;&gt;through an
RFC&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;It was pointed out to me that the expansion of &lt;code&gt;seq_cst&lt;/code&gt; atomic ops to
RISC-V lr/sc loops was slightly stronger than required by the mapping table
in the ISA manual. Specifically, &lt;code&gt;sc.{w|d}.rl&lt;/code&gt; is sufficient rather than
&lt;code&gt;sc.{w|d}.aqrl&lt;/code&gt;. Fixed with &lt;a href=&quot;https://reviews.llvm.org/D146933&quot;&gt;D146933&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Filed issue about the &lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/issues/33&quot;&gt;fcvt.bf16.s instruction encoding colliding with
fround.h&lt;/a&gt; (instructions
from &lt;code&gt;zfbfmin&lt;/code&gt; and &lt;code&gt;zfa&lt;/code&gt; respectively).&lt;/li&gt;
&lt;li&gt;Usual mix of upstream LLVM reviews. We were now able to
&lt;a href=&quot;https://reviews.llvm.org/D147179&quot;&gt;bump&lt;/a&gt; the versions of the standard ISA
extensions LLVM claims to support. As noted in my &lt;a href=&quot;https://discourse.llvm.org/t/rfc-resolving-issues-related-to-extension-versioning-in-risc-v/68472&quot;&gt;previous
RFC&lt;/a&gt;,
LLVM was reporting the wrong version information for the A/F/D extensions.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/482&quot;&gt;LLVM Weekly #482&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-20th-march-2023&quot;&gt;&lt;a href=&quot;#week-of-20th-march-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 20th March 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Landed patches to fix RISC-V ISA extension versioning related issues in
llvm-objdump and related tools (&lt;a href=&quot;https://reviews.llvm.org/D146070&quot;&gt;D146070&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D146113&quot;&gt;D146113&lt;/a&gt;, and
&lt;a href=&quot;https://reviews.llvm.org/D146114&quot;&gt;D146114&lt;/a&gt;). Also patches to fix an ABI bug
with &lt;code&gt;_Float16&lt;/code&gt; lowering on RISC-V
(&lt;a href=&quot;https://reviews.llvm.org/D142326&quot;&gt;D142326&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/D145074&quot;&gt;D145074&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Opened a few issues on the &lt;a href=&quot;https://github.com/sigoden/aichat&quot;&gt;aichat&lt;/a&gt;
(command line ChatGPT client) repo: one for &lt;a href=&quot;https://github.com/sigoden/aichat/issues/97&quot;&gt;maximum line
width&lt;/a&gt;, another for &lt;a href=&quot;https://github.com/sigoden/aichat/issues/99&quot;&gt;word
wrapping&lt;/a&gt;, and a &lt;a href=&quot;https://github.com/sigoden/aichat/issues/88#issuecomment-1484132478&quot;&gt;suggestion on
converting one-shot questions to
conversations&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Added the HPM6750 to my &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially available RISC-V silicon
list&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;My tutorial and lightning talk proposals were accepted for EuroLLVM!&lt;/li&gt;
&lt;li&gt;Some bits and pieces related to the RISC-V bfloat16 spec and also Zfh.
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/pull/31&quot;&gt;Drafted&lt;/a&gt; a Zfbfinxmin
extension definition (primarily for symmetry with the existing
&lt;code&gt;z*inx[min]&lt;/code&gt; extensions).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://reviews.llvm.org/D146435&quot;&gt;Fixing&lt;/a&gt; a missed predicate for
&lt;code&gt;PseudoQuietFCMP&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/pull/29&quot;&gt;Minor clarification&lt;/a&gt; to
the riscv-bfloat16 spec.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Usual mix of upstream LLVM reviews.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/481&quot;&gt;LLVM Weekly #481&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-13th-march-2023&quot;&gt;&lt;a href=&quot;#week-of-13th-march-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 13th March 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Most importantly, added
&lt;a href=&quot;https://github.com/muxup/muxup-site/commit/7159a2400e6535a288c78dfd4d71c1b544ddf51e#diff-196dde1107e14fd35d571db219211acb6853813d95a5c7faee5ac09e058f9203&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://github.com/muxup/muxup-site/commit/a1cb4d4256815bcfa8a6a4c5174a03ae077ee8c6#diff-4e9f5b15205b49dff89e5050a5a899e63213f1f015daeca45b76270bb2c009dd&quot;&gt;more&lt;/a&gt;
footer images for this site from the Quick Draw dataset. Thanks to my son
(Archie, 5) for the assistance.&lt;/li&gt;
&lt;li&gt;Reviewed submissions for &lt;a href=&quot;https://llvm.org/devmtg/2023-05/&quot;&gt;EuroLLVM&lt;/a&gt; (I&#x27;m
on the program committee).&lt;/li&gt;
&lt;li&gt;Added note to the &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially available RISC-V silicon
post&lt;/a&gt; about a
hardware bug in the Renesas RZ/Five.&lt;/li&gt;
&lt;li&gt;Finished writing and published &lt;a href=&quot;/2023q1/whats-new-for-risc-v-in-llvm-16&quot;&gt;what&#x27;s new for RISC-V in LLVM 16
article&lt;/a&gt; and took part in
some of the discussions in the
&lt;a href=&quot;https://news.ycombinator.com/item?id=35215826&quot;&gt;HN&lt;/a&gt; and &lt;a href=&quot;https://old.reddit.com/r/RISCV/comments/11veftz/whats_new_for_riscv_in_llvm_16/&quot;&gt;Reddit
threads&lt;/a&gt;
(it&#x27;s &lt;a href=&quot;https://lobste.rs/s/qcu7fc/what_s_new_for_risc_v_llvm_16&quot;&gt;on lobste.rs
too&lt;/a&gt;, but that
didn&#x27;t generate any comments).&lt;/li&gt;
&lt;li&gt;Investigated an issue where inline asm with the &lt;code&gt;m&lt;/code&gt; constraint was
generating worse code on LLVM vs GCC, finding that LLVM conservatively
lowers this to a single register, while GCC treats &lt;code&gt;m&lt;/code&gt; as reg+imm, relying
on users indicating &lt;code&gt;A&lt;/code&gt; when using a memory operand with an instruction that
can&#x27;t take an immediate offset. Worked with a colleague who posted
&lt;a href=&quot;https://reviews.llvm.org/D146245&quot;&gt;D146245&lt;/a&gt; to fix this.&lt;/li&gt;
&lt;li&gt;Set
&lt;a href=&quot;https://discourse.llvm.org/t/risc-v-llvm-sync-up-call-16th-march-2023-note-daylight-savings-impact/69244&quot;&gt;agenda&lt;/a&gt;
for and ran the biweekly RISC-V LLVM contributor sync call as usual.&lt;/li&gt;
&lt;li&gt;Bisected reported LLVM bug
&lt;a href=&quot;https://github.com/llvm/llvm-project/issues/61412&quot;&gt;#61412&lt;/a&gt;, which
as it happens was fixed that evening by
&lt;a href=&quot;https://reviews.llvm.org/D145474&quot;&gt;D145474&lt;/a&gt; being committed. We hope to
backport this to 16.0.1.&lt;/li&gt;
&lt;li&gt;Did some digging on a regression (compiler crash) for &lt;code&gt;-Oz&lt;/code&gt;, bisecting it to
the commit that enabled machine copy propagation by default. I found the
issue was due to machine copy propagation running after the machine
outliner, and incorrectly determining that some register writes in outlined
functions were not live-out. I posted and
landed &lt;a href=&quot;https://reviews.llvm.org/D146037&quot;&gt;D146037&lt;/a&gt; to fix this by running
machine copy propagation earlier in the pipeline, though a more principled
fix would be desirable.&lt;/li&gt;
&lt;li&gt;Filed a PR against the riscv-isa-manual to &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/pull/990&quot;&gt;disambiguate the use of the term
&quot;reserved&quot; for HINT
instructions&lt;/a&gt;. I&#x27;ve also
been looking at the proposed bfloat16 extension recently and filed an
&lt;a href=&quot;https://github.com/riscv/riscv-bfloat16/issues/27&quot;&gt;issue&lt;/a&gt; to clarify if
Zfbfinxmin will be defined (as all the other floating point extensions so
far have an &lt;code&gt;*inx&lt;/code&gt; twin).&lt;/li&gt;
&lt;li&gt;Almost finished work to resolve issues related to overzealous error checking
on RISC-V ISA naming strings (with llvm-objdump and related tools being the
final piece).
&lt;ul&gt;
&lt;li&gt;Landed &lt;a href=&quot;https://reviews.llvm.org/D145879&quot;&gt;D145879&lt;/a&gt; and
&lt;a href=&quot;https://reviews.llvm.org/D145882&quot;&gt;D145882&lt;/a&gt; to expand &lt;code&gt;RISCVISAInfo&lt;/code&gt; test
coverage and fix an issue that surfaced through that.&lt;/li&gt;
&lt;li&gt;Posted a pair of patches that makes llvm-objdump and related tools
tolerant of unrecognised versions of ISA extensions.
&lt;a href=&quot;https://reviews.llvm.org/D146070&quot;&gt;D146070&lt;/a&gt; resolves this for the base ISA
in a minimally invasive way, while
&lt;a href=&quot;https://reviews.llvm.org/D146114&quot;&gt;D146114&lt;/a&gt; solves this for other
extensions, moving the parsing logic to using the
&lt;code&gt;parseNormalizedArchString&lt;/code&gt; function I introduced to fix a similar issue
in LLD. This built on some directly committed work to &lt;a href=&quot;https://reviews.llvm.org/rG0ae8f5ac08ae&quot;&gt;expand
testing&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The usual assortment of upstream LLVM reviews.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/480&quot;&gt;LLVM Weekly #480&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-6th-march-2023&quot;&gt;&lt;a href=&quot;#week-of-6th-march-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 6th March 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Had a really useful meeting with a breakout group from the &lt;a href=&quot;https://docs.google.com/document/d/1G3ocHm2zE6AYTS2N3_3w2UxFnSEyKkcF57siLWe-NVs/edit&quot;&gt;RISC-V LLVM
sync-up
calls&lt;/a&gt;
about the long-standing issues related to ISA extension versioning, error
handling for this, and other related issues.
&lt;ul&gt;
&lt;li&gt;Related to this, posted &lt;a href=&quot;https://reviews.llvm.org/D145879&quot;&gt;D145879&lt;/a&gt; and
&lt;a href=&quot;https://reviews.llvm.org/D145882&quot;&gt;D145882&lt;/a&gt; to flesh out testing of
&lt;code&gt;RISCVISAInfo::parseArchString&lt;/code&gt; ahead of further improvements, and to
start to fix identified issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Incorporated more clarifications submitted to me about the &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially
available RISC-V
SoCs&lt;/a&gt; post,
particularly around SiFive cores.&lt;/li&gt;
&lt;li&gt;Shared &lt;a href=&quot;https://discourse.llvm.org/t/diversity-inclusion-strategic-planning-march-6-7/68794/8&quot;&gt;some
thoughts&lt;/a&gt;
on attracting more people to the LLVM Foundation strategic planning
sessions.&lt;/li&gt;
&lt;li&gt;Posted and committed an LLVM patch to &lt;a href=&quot;https://reviews.llvm.org/D145570&quot;&gt;migrate the RISC-V backend to using
shared MCELFStreamer code for attribute
emission&lt;/a&gt;. The initial implementation was
largely derived from Arm&#x27;s version of the same feature but a later
refactoring managed to move this logic to common code, which we can now
reuse.&lt;/li&gt;
&lt;li&gt;Some small tasks related to the RISC-V LLVM build-bot: trying to find a path
forwards for a &lt;a href=&quot;https://reviews.llvm.org/D143158&quot;&gt;simple patch to enable RISC-V support in
libcxx&lt;/a&gt;, and &lt;a href=&quot;https://reviews.llvm.org/D144465&quot;&gt;clarifying how often the
staging buildmaster configuration is
updated&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Posted a docs patch to &lt;a href=&quot;https://reviews.llvm.org/D145564&quot;&gt;clarify
Clang&#x27;s &lt;code&gt;-fexceptions&lt;/code&gt;&lt;/a&gt; in follow-up to
discussion in &lt;a href=&quot;https://github.com/llvm/llvm-project/issues/61216&quot;&gt;issue
61216&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Did some final preparations for the LLVM 16.0.0 release - committing some
&lt;a href=&quot;https://reviews.llvm.org/rGae37edf1486d&quot;&gt;cleaned up release notes&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A variety of upstream LLVM reviews, and left some thoughts on &lt;a href=&quot;https://github.com/llvm/llvm-project/issues/61179&quot;&gt;using
sub-modules for RVV intrinsics
tests&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/479&quot;&gt;LLVM Weekly #479&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-27th-february-2023&quot;&gt;&lt;a href=&quot;#week-of-27th-february-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 27th February 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Completed (to the point I was happy to publish at least) my attempt to
enumerate the &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially available RISC-V
SoCs&lt;/a&gt;. I&#x27;m very
grateful to have received a whole range of suggested additions and
clarifications over the weekend, which have all been incorporated.&lt;/li&gt;
&lt;li&gt;Ran the usual biweekly &lt;a href=&quot;https://discourse.llvm.org/t/risc-v-llvm-sync-up-call-2nd-march-2023/68876&quot;&gt;RISC-V LLVM sync-up
call&lt;/a&gt;.
Topics included outstanding issues for LLVM 16.x (no major issues now my
&lt;a href=&quot;https://github.com/llvm/llvm-project-release-prs/pull/324#issuecomment-1445012422&quot;&gt;backport
request&lt;/a&gt;
to fix an LLD regression was merged), an overview of &lt;code&gt;_Float16&lt;/code&gt; ABI
lowering fixes, GP relaxation in LLD, my recent RISC-V buildbot, and some
vectorisation related issues.&lt;/li&gt;
&lt;li&gt;Investigated and largely resolved issues related to ABI lowering of
&lt;code&gt;_Float16&lt;/code&gt; for RISC-V. Primarily, we weren&#x27;t handling the cases where a
GPR+FPR or a pair of FPRs are used to pass small structs including
&lt;code&gt;_Float16&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;Part of this work involved rebasing my
&lt;a href=&quot;https://reviews.llvm.org/D134050&quot;&gt;previous&lt;/a&gt;
&lt;a href=&quot;https://reviews.llvm.org/D140400&quot;&gt;patches&lt;/a&gt; to refactor our RISC-V ABI
lowering tests in Clang. Now that a version of my improvements to
&lt;code&gt;update_cc_test_checks.py --function-signature&lt;/code&gt; (required for the refactor)
landed as part of &lt;a href=&quot;https://reviews.llvm.org/D144963&quot;&gt;D144963&lt;/a&gt;, this can
hopefully be completed.&lt;/li&gt;
&lt;li&gt;Committed a number of simple test improvements related to half floats. e.g.
&lt;a href=&quot;https://reviews.llvm.org/rG570995eba2f9&quot;&gt;570995e&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG81979c3038de&quot;&gt;81979c3&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG34b412dc0efe&quot;&gt;34b412d&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Posted &lt;a href=&quot;https://reviews.llvm.org/D145070&quot;&gt;D145070&lt;/a&gt; to add proper coverage
for &lt;code&gt;_Float16&lt;/code&gt; ABI lowering, and
&lt;a href=&quot;https://reviews.llvm.org/D145074&quot;&gt;D145074&lt;/a&gt; to fix it. Also
&lt;a href=&quot;https://reviews.llvm.org/D145071&quot;&gt;D145071&lt;/a&gt; to set the &lt;code&gt;HasLegalHalfType&lt;/code&gt;
property, but the semantics of that are less clear.&lt;/li&gt;
&lt;li&gt;Posted a &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/367&quot;&gt;strawman psABI
patch&lt;/a&gt; for
&lt;code&gt;__bf16&lt;/code&gt;, needed for the RISC-V bfloat16 extension.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Attended the &lt;a href=&quot;https://community.riscv.org/events/details/risc-v-international-cambridge-risc-v-group-presents-cheri-risc-v-full-stack-security-using-open-source-hardware-and-software/&quot;&gt;Cambridge RISC-V
Meetup&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;After seeing the Helix editor &lt;a href=&quot;https://lobste.rs/s/nvoikx/helix_notes&quot;&gt;discussed on
lobste.rs&lt;/a&gt;, retried my previously
shared &lt;a href=&quot;https://github.com/helix-editor/helix/issues/3072#issuecomment-1208133990&quot;&gt;large Markdown file test
case&lt;/a&gt;.
Unfortunately it&#x27;s still unusably slow to edit, seemingly due to a
tree-sitter related issue.&lt;/li&gt;
&lt;li&gt;Cleaned up the static site generator used for this site a bit. e.g. now my
fixes (&lt;a href=&quot;https://github.com/miyuchina/mistletoe/pull/157&quot;&gt;#157&lt;/a&gt;,
&lt;a href=&quot;https://github.com/miyuchina/mistletoe/pull/158&quot;&gt;#158&lt;/a&gt;,
&lt;a href=&quot;https://github.com/miyuchina/mistletoe/pull/159&quot;&gt;#159&lt;/a&gt;) for the
&lt;code&gt;traverse()&lt;/code&gt; helper in &lt;a href=&quot;https://github.com/miyuchina/mistletoe&quot;&gt;mistletoe&lt;/a&gt;
were merged upstream, I
&lt;a href=&quot;https://github.com/muxup/muxup-site/commit/52989cf7462d7900bbef5bc2ca9f976af8022ade&quot;&gt;removed&lt;/a&gt;
my downstream version.&lt;/li&gt;
&lt;li&gt;The usual mix of upstream LLVM reviews.&lt;/li&gt;
&lt;li&gt;Had a day off for my birthday.&lt;/li&gt;
&lt;li&gt;Publicly shared this week log for the first time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/478&quot;&gt;LLVM Weekly #478&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-20th-february-2023&quot;&gt;&lt;a href=&quot;#week-of-20th-february-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 20th February 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Iterated on &lt;a href=&quot;https://reviews.llvm.org/D144353&quot;&gt;D144353&lt;/a&gt; (aiming to fix LLD
regression related to merging RISC-V attributes) based on review feedback
and committed it.
&lt;ul&gt;
&lt;li&gt;Created &lt;a href=&quot;https://github.com/llvm/llvm-project/issues/60889&quot;&gt;an issue to track this as a
regression&lt;/a&gt;, aiming for
a backport into 16.0.0, and &lt;a href=&quot;https://github.com/llvm/llvm-project-release-prs/pull/324#issuecomment-1445012422&quot;&gt;requested that
backport&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Also some related discussion in ClangBuiltLinux issues
&lt;a href=&quot;https://github.com/ClangBuiltLinux/linux/issues/1777&quot;&gt;#1777&lt;/a&gt; and
&lt;a href=&quot;https://github.com/ClangBuiltLinux/linux/issues/1808&quot;&gt;#1808&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Committed &lt;a href=&quot;https://reviews.llvm.org/D143172&quot;&gt;my llvm-zorg patch to add the qemu-user based RISC-V
builder&lt;/a&gt;, after finalising provisioning
the machine to run it. The builder is live on the LLVM staging buildmaster
&lt;a href=&quot;https://lab.llvm.org/staging/#/builders/241&quot;&gt;as
clang-rv64gc-qemu-user-single-stage&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;Worked to resolve remaining test failures and stability issues. One
recurrent issue was an assert in &lt;code&gt;___pthread_mutex_lock&lt;/code&gt; when executing
&lt;code&gt;ccache&lt;/code&gt;. Setting &lt;code&gt;inode_cache=false&lt;/code&gt; in the local &lt;code&gt;ccache&lt;/code&gt; config seems
to avoid this.&lt;/li&gt;
&lt;li&gt;Posted a couple of patches - &lt;a href=&quot;https://reviews.llvm.org/D144464&quot;&gt;D144464&lt;/a&gt;
and &lt;a href=&quot;https://reviews.llvm.org/D144465&quot;&gt;D144465&lt;/a&gt; to tweak the LLVM docs on
setting up a builder, based on my experience doing so.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Chased for reviews and clarification about pre-commit test requirements for
my libcxx RISC-V test fix patch,
&lt;a href=&quot;https://reviews.llvm.org/D143158&quot;&gt;D143158&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Committed a couple of further test updates for Wasm in LLVM ahead of some
upcoming patches. &lt;a href=&quot;https://reviews.llvm.org/rG771261ff0128&quot;&gt;771261f&lt;/a&gt;
&lt;a href=&quot;https://reviews.llvm.org/rG1ae859753c06&quot;&gt;1ae8597&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Left some quick notes on the &lt;a href=&quot;https://discourse.llvm.org/t/rfc-rfc-shepherds/68666/8&quot;&gt;LLVM RFC
shepherds&lt;/a&gt; proposal.&lt;/li&gt;
&lt;li&gt;A variety of upstream LLVM reviews, and received a &lt;a href=&quot;https://reviews.llvm.org/D143115#4151994&quot;&gt;useful clarification on
the RISC-V psABI and the ratification
lifecycle&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/477&quot;&gt;LLVM Weekly #477&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-13th-february-2023&quot;&gt;&lt;a href=&quot;#week-of-13th-february-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 13th February 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After a fair bit of investigation and thinking about reported compatibility
issues between GNU and LLVM tools (particularly binutils ld and lld) due to
RISC-V extension versioning, &lt;a href=&quot;https://discourse.llvm.org/t/rfc-resolving-issues-related-to-extension-versioning-in-risc-v/68472&quot;&gt;posted an RFC outlining the major issues and a
proposed fix for what I consider to be a regression in
lld&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;Landed a few LLVM patches cleaning up tests related to this.
&lt;a href=&quot;https://reviews.llvm.org/rG8b5004864aab&quot;&gt;8b50048&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG574d0c2ec107&quot;&gt;574d0c2&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rGd05e1e99b1d6&quot;&gt;d05e1e9&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Posted &lt;a href=&quot;https://reviews.llvm.org/D144353&quot;&gt;D144353&lt;/a&gt;, a proposed fix for the
LLD regression due to overzealous checking of extensions/versions when
merging RISC-V attributes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Organised agenda for and ran the bi-weekly &lt;a href=&quot;https://discourse.llvm.org/t/risc-v-llvm-sync-up-call-16th-february-2023/68500&quot;&gt;RISC-V LLVM contributor
call&lt;/a&gt;.
Key discussion items were the extension versioning related compatibility
issue mentioned above and support for emulated TLS (where I&#x27;d &lt;a href=&quot;https://reviews.llvm.org/D143708#4118468&quot;&gt;left some
comments&lt;/a&gt; the previous week).&lt;/li&gt;
&lt;li&gt;Updated my patch (&lt;a href=&quot;https://reviews.llvm.org/D143172&quot;&gt;D143172&lt;/a&gt;) to register
and configure a RISC-V qemu-user based builder with LLVM&#x27;s staging
buildmaster, based on review feedback.&lt;/li&gt;
&lt;li&gt;A variety of upstream LLVM reviews. Also landed
&lt;a href=&quot;https://reviews.llvm.org/D143507&quot;&gt;D143507&lt;/a&gt;, marking Zawrs as
non-experimental.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/476&quot;&gt;LLVM Weekly #476&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;week-of-6th-february-2023&quot;&gt;&lt;a href=&quot;#week-of-6th-february-2023&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Week of 6th February 2023&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Left feedback on the proposed RISC-V psABI
&lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/365&quot;&gt;patch clarifying treatment of empty structs or unions in the FP calling
convention&lt;/a&gt;.
This is a follow-up to the &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/358&quot;&gt;issue I
filed&lt;/a&gt; on
this topic (where I have &lt;a href=&quot;https://reviews.llvm.org/D142327&quot;&gt;D142327&lt;/a&gt; queued
up for LLVM to fix our incorrect handling).&lt;/li&gt;
&lt;li&gt;Responded to a question on LLVM&#x27;s Discourse &lt;a href=&quot;https://discourse.llvm.org/t/support-for-zicsr-and-zifencei-extensions/68369/2&quot;&gt;about zicsr and zifencei
support in
LLVM&lt;/a&gt;.
As noted, the issue is that we haven&#x27;t moved to RV32I/RV64I 2.1 yet, which
split out Zicsr and Zifencei. Unfortunately this is a backwards-incompatible
change, so requires some care.&lt;/li&gt;
&lt;li&gt;Worked with a colleague trying to reproduce an assertion failure in his
committed patch &lt;a href=&quot;https://reviews.llvm.org/rGeb66833d19573df97034a81279eda31b8d19815b&quot;&gt;adding support for WebAssembly externref in
Clang&lt;/a&gt;
that appeared only on an MSan buildbot. The &lt;a href=&quot;https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild&quot;&gt;sanitizers project
guidance&lt;/a&gt;
is useful for this, but I ended up &lt;a href=&quot;https://gist.github.com/asb/645a071903f0c3cf9ef6c59a3d3e0810&quot;&gt;rolling a slightly hacky
script&lt;/a&gt; as I
stepped through each part of the multi-stage build and test sequence.&lt;/li&gt;
&lt;li&gt;Left my thoughts on a &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-c-api-doc/issues/32&quot;&gt;proposed RISC-V preprocessor define to specify
support for and performance of misaligned
loads/stores&lt;/a&gt;. I
like the idea of the define, but prefer sticking to the &lt;code&gt;Zicclsm&lt;/code&gt;
terminology introduced in the RISC-V profiles.&lt;/li&gt;
&lt;li&gt;Posted patch &lt;a href=&quot;https://reviews.llvm.org/D143507&quot;&gt;D143507&lt;/a&gt; to mark RISC-V
Zawrs as non-experimental, after confirming there were no relevant changes
between the implemented 1.0-rc3 spec and the ratified version.&lt;/li&gt;
&lt;li&gt;A series of WebAssembly GC type related patches remains a work in progress
downstream, but I landed a couple of related minor test cleanups.
&lt;a href=&quot;https://reviews.llvm.org/rG604c9a07f3a9&quot;&gt;604c9a0&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG3a80dc27ed45&quot;&gt;3a80dc2&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Many upstream RISC-V LLVM reviews.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://llvmweekly.org/issue/475&quot;&gt;LLVM Weekly #475&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2023-04-03: Added notes for the week of 27th March 2023.&lt;/li&gt;
&lt;li&gt;2023-03-27: Added notes for the week of 20th March 2023.&lt;/li&gt;
&lt;li&gt;2023-03-20: Added notes for the week of 13th March 2023.&lt;/li&gt;
&lt;li&gt;2023-03-13: Added notes for the week of 6th March 2023.&lt;/li&gt;
&lt;li&gt;2023-03-06: Added notes for the week of 27th February 2023.&lt;/li&gt;
&lt;li&gt;2023-02-27: Added in a forgotten note about trivial buildbot doc
improvements.&lt;/li&gt;
&lt;li&gt;2023-02-27: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>What&#x27;s new for RISC-V in LLVM 16</title>
<published>2023-03-18T12:00:00Z</published>
<updated>2023-03-18T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2023q1/whats-new-for-risc-v-in-llvm-16"/>
<id>https://muxup.com/2023q1/whats-new-for-risc-v-in-llvm-16</id>
<content type="html">
&lt;p&gt;LLVM 16.0.0 was &lt;a href=&quot;https://discourse.llvm.org/t/llvm-16-0-0-release/69326&quot;&gt;just
released today&lt;/a&gt;, and
as &lt;a href=&quot;/2022q3/whats-new-for-risc-v-in-llvm-15&quot;&gt;I did for LLVM 15&lt;/a&gt;, I
wanted to highlight some of the RISC-V specific changes and improvements. This
is very much a tour of a chosen subset of additions rather than an attempt to
be exhaustive. If you&#x27;re interested in RISC-V, you may also want to check out
my recent attempt to enumerate the &lt;a href=&quot;/2023q1/commercially-available-risc-v-silicon&quot;&gt;commercially available RISC-V
SoCs&lt;/a&gt; and if you want
to find out what&#x27;s going on in LLVM as a whole on a week-by-week basis, then
I&#x27;ve got &lt;a href=&quot;https://llvmweekly.org/&quot;&gt;the perfect newsletter for you&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In case you&#x27;re not familiar with LLVM&#x27;s release schedule, it&#x27;s worth noting
that there are two major LLVM releases a year (i.e. one roughly every 6
months) and these are timed releases as opposed to being cut when a pre-agreed
set of feature targets has been met. We&#x27;re very fortunate to benefit from an
active and growing set of contributors working on RISC-V support in LLVM
projects, who are responsible for the work I describe below - thank you!
I coordinate biweekly sync-up calls for RISC-V LLVM contributors, so if you&#x27;re
working in this area please &lt;a href=&quot;https://discourse.llvm.org/c/code-generation/riscv/57&quot;&gt;consider dropping
in&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;documentation&quot;&gt;&lt;a href=&quot;#documentation&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Documentation&lt;/h2&gt;
&lt;p&gt;LLVM 16 is the first release featuring a user guide for the RISC-V target
(&lt;a href=&quot;https://releases.llvm.org/16.0.0/docs/RISCVUsage.html&quot;&gt;16.0.0 version&lt;/a&gt;,
&lt;a href=&quot;https://llvm.org/docs/RISCVUsage.html&quot;&gt;current HEAD&lt;/a&gt;). This fills a
long-standing gap in our documentation, whereby it was difficult to tell at a
glance the expected level of support for the various RISC-V instruction set
extensions (standard, vendor-specific, and experimental extensions of either
type) in a given LLVM release. We&#x27;ve tried to keep it concise but informative,
and add a brief note to describe any known limitations that end users should
know about. Thanks again to Philip Reames for kicking this off, and the
reviewers and contributors for ensuring it&#x27;s kept up to date.&lt;/p&gt;
&lt;h2 id=&quot;vectorization&quot;&gt;&lt;a href=&quot;#vectorization&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Vectorization&lt;/h2&gt;
&lt;p&gt;LLVM 16 was a big release for vectorisation. As well as a long-running strand
of work making incremental improvements (e.g. better cost modelling) and
fixes, scalable vectorization was &lt;a href=&quot;https://reviews.llvm.org/rG15c645f7ee67&quot;&gt;enabled by
default&lt;/a&gt;. This allows LLVM&#x27;s &lt;a href=&quot;https://llvm.org/docs/Vectorizers.html#loop-vectorizer&quot;&gt;loop
vectorizer&lt;/a&gt; to use
scalable vectors when profitable. Follow-on work
&lt;a href=&quot;https://reviews.llvm.org/rGb45a262679ab&quot;&gt;enabled&lt;/a&gt; support for loop
vectorization using fixed length vectors and &lt;a href=&quot;https://reviews.llvm.org/rG269bc684e7a0&quot;&gt;disabled vectorization of
epilogue loops&lt;/a&gt;. See the talk
&lt;a href=&quot;https://www.youtube.com/watch?v=daWLCyhwrZ8&quot;&gt;optimizing code for scalable vector
architectures&lt;/a&gt;
(&lt;a href=&quot;https://llvm.org/devmtg/2021-11/slides/2021-OptimizingCodeForScalableVectorArchitectures.pdf&quot;&gt;slides&lt;/a&gt;)
by Sander de Smalen for more information about scalable vectorization in LLVM
and &lt;a href=&quot;https://eupilot.eu/wp-content/uploads/2022/11/RISC-V-VectorExtension-1-1.pdf&quot;&gt;introduction to the RISC-V vector
extension&lt;/a&gt;
by Roger Ferrer Ibáñez for an overview of the vector extension and some of its
codegen challenges.&lt;/p&gt;
&lt;p&gt;The RISC-V vector intrinsics supported by Clang have changed (to match e.g.
&lt;a href=&quot;https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/186&quot;&gt;this&lt;/a&gt; and
&lt;a href=&quot;https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/185&quot;&gt;this&lt;/a&gt;) during
the 16.x development process in a backwards incompatible way, as the &lt;a href=&quot;https://github.com/riscv-non-isa/rvv-intrinsic-doc&quot;&gt;RISC-V
Vector Extension Intrinsics
specification&lt;/a&gt; evolves
towards a v1.0. In retrospect, it would have been better to keep the
intrinsics behind an experimental flag even after the vector codegen and MC
layer (assembler/disassembler) support became stable, and this is something
we&#x27;ll be more careful about for future extensions. The good news is that thanks to
Yueh-Ting Chen, headers &lt;a href=&quot;https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/master/auto-generated/rvv-v0p10-compatible-headers&quot;&gt;are
available&lt;/a&gt;
that provide the old-style intrinsics mapped to the new version.&lt;/p&gt;
&lt;h2 id=&quot;support-for-new-instruction-set-extensions&quot;&gt;&lt;a href=&quot;#support-for-new-instruction-set-extensions&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Support for new instruction set extensions&lt;/h2&gt;
&lt;p&gt;I refer to &#x27;experimental&#x27; support many times below. See the &lt;a href=&quot;https://releases.llvm.org/16.0.0/docs/RISCVUsage.html#experimental-extensions&quot;&gt;documentation on
experimental extensions within RISC-V
LLVM&lt;/a&gt;
for guidance on what that means. One point to highlight is that the extensions
remain experimental until they are ratified, which is why some extensions on
the list below are &#x27;experimental&#x27; despite the fact that the LLVM support needed is
trivial. On to the list of newly added instruction set extensions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Experimental support for the
&lt;a href=&quot;https://github.com/riscv/riscv-code-size-reduction/releases/tag/V0.70.1-TOOLCHAIN-DEV&quot;&gt;Zca, Zcf, and
Zcd&lt;/a&gt;
instruction set extensions. These are all 16-bit instructions and are being
defined as part of the output of the RISC-V code size reduction working
group.
&lt;ul&gt;
&lt;li&gt;Zca is just a subset of the standard &#x27;C&#x27; compressed instruction set
extension but without floating point loads/stores.&lt;/li&gt;
&lt;li&gt;Zcf is also a subset of the standard &#x27;C&#x27; compressed instruction set
extension, including just the single precision floating point loads and
stores (&lt;code&gt;c.flw&lt;/code&gt;, &lt;code&gt;c.flwsp&lt;/code&gt;, &lt;code&gt;c.fsw&lt;/code&gt;, &lt;code&gt;c.fswsp&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Zcd, as you might have guessed, just includes the double precision
floating point loads and stores from the standard &#x27;C&#x27; compressed
instruction set extension (&lt;code&gt;c.fld&lt;/code&gt;, &lt;code&gt;c.fldsp&lt;/code&gt;, &lt;code&gt;c.fsd&lt;/code&gt;, &lt;code&gt;c.fsdsp&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Experimental assembler/disassembler support for the
&lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20220831-bf5a151&quot;&gt;Zihintntl&lt;/a&gt;
instruction set extension. This provides a small set of instructions that
can be used to hint that the memory accesses of the following instruction
exhibit poor temporal locality.&lt;/li&gt;
&lt;li&gt;Experimental assembler/disassembler support for the
&lt;a href=&quot;https://github.com/riscv/riscv-zawrs/releases/download/V1.0-rc3/Zawrs.pdf&quot;&gt;Zawrs&lt;/a&gt;
instruction set extension, providing a pair of instructions meant for use in
a polling loop allowing a core to enter a low-power state and wait on a
store to a memory location.&lt;/li&gt;
&lt;li&gt;Experimental support for the
&lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf&quot;&gt;Ztso&lt;/a&gt;
extension, which for now just means setting the appropriate ELF header flag.
If a core implements Ztso, it implements the Total Store Ordering memory
consistency model. Future releases will provide alternate lowerings of
atomics operations that take advantage of this.&lt;/li&gt;
&lt;li&gt;Code generation support for the &lt;a href=&quot;https://drive.google.com/file/d/1z3tQQLm5ALsAD77PM0l0CHnapxWCeVzP/view&quot;&gt;Zfhmin
extension&lt;/a&gt;
(load/store, conversion, and GPR/FPR move support for 16-bit floating point
values).&lt;/li&gt;
&lt;li&gt;Codegen and assembler/disassembler support for the
&lt;a href=&quot;https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf&quot;&gt;XVentanaCondOps&lt;/a&gt;
vendor extension, which provides conditional arithmetic and move/select
operations.&lt;/li&gt;
&lt;li&gt;Codegen and assembler/disassembler support for the
&lt;a href=&quot;https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadvdot.adoc&quot;&gt;XTHeadVdot&lt;/a&gt;
vendor extension, which implements vector integer dot product instructions
(four 8-bit multiplies accumulated into a 32-bit add).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;lldb&quot;&gt;&lt;a href=&quot;#lldb&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;LLDB&lt;/h2&gt;
&lt;p&gt;LLDB has started to become usable for RISC-V in this period due to
work by contributor &#x27;Emmer&#x27;. As they &lt;a href=&quot;https://discourse.llvm.org/t/is-lldb-for-riscv-ready-to-use/68326/2&quot;&gt;summarise
here&lt;/a&gt;,
LLDB should be usable for debugging RV64 programs locally but support is
lacking for remote debug (e.g. via the gdb server protocol). During the LLVM
16 development window, LLDB gained &lt;a href=&quot;https://reviews.llvm.org/rG4fc7e9cba24b&quot;&gt;support for software single stepping on
RISC-V&lt;/a&gt;, support in
&lt;code&gt;EmulateInstructionRISCV&lt;/code&gt; for
&lt;a href=&quot;https://reviews.llvm.org/rGff7b876aa75d&quot;&gt;RV{32,64}I&lt;/a&gt;, as well as extensions
&lt;a href=&quot;https://reviews.llvm.org/rG49f9af1864d9&quot;&gt;A and M&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG05ae747a5353&quot;&gt;C&lt;/a&gt;,
&lt;a href=&quot;https://reviews.llvm.org/rG6d4ab6d92179&quot;&gt;RV32F&lt;/a&gt; and
&lt;a href=&quot;https://reviews.llvm.org/rG2d7f43f9eaf3&quot;&gt;RV64F&lt;/a&gt;, and
&lt;a href=&quot;https://reviews.llvm.org/rG6493fc4bccd2&quot;&gt;D&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;short-forward-branch-optimisation&quot;&gt;&lt;a href=&quot;#short-forward-branch-optimisation&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Short forward branch optimisation&lt;/h2&gt;
&lt;p&gt;Another improvement that&#x27;s fun to look more closely at is support for &quot;short
forward branch optimisation&quot; for the &lt;a href=&quot;https://www.sifive.com/press/sifive-core-ip-7-series-creates-new-class-of-embedded&quot;&gt;SiFive 7
series&lt;/a&gt;
cores. What does this mean? Well, let&#x27;s start by looking at the problem it&#x27;s
trying to solve. The base RISC-V ISA doesn&#x27;t include conditional moves or
predicated instructions, which can be a downside if your code features
unpredictable short forward branches (with the ensuing cost in terms of
branch mispredictions and bloating branch predictor state). The &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf&quot;&gt;ISA
spec&lt;/a&gt;
includes commentary on this decision (page 23), noting some disadvantages of
adding such instructions to the specification and noting microarchitectural
techniques exist to convert short forward branches into predicated code
internally. In the case of the SiFive 7 series, this is achieved using
&lt;a href=&quot;https://en.wikichip.org/wiki/macro-operation_fusion&quot;&gt;macro-op fusion&lt;/a&gt; where a
branch over a single ALU instruction is fused and executed as a single
conditional instruction.&lt;/p&gt;
&lt;p&gt;In the LLVM 16 cycle, compiler optimisations targeting this microarchitectural
feature were enabled for &lt;a href=&quot;https://reviews.llvm.org/rG2b32e4f98b4f&quot;&gt;conditional move style
sequences&lt;/a&gt; (i.e. branch over a
register move) as well as for &lt;a href=&quot;https://reviews.llvm.org/rGda7415acdafb&quot;&gt;other ALU
operations&lt;/a&gt;. The job of the
compiler here is of course to emit a sequence compatible with the
micro-architectural optimisation when possible and profitable. I&#x27;m not aware
of other RISC-V designs implementing a similar optimisation - although there
are developments in terms of instructions to support such operations directly
in the ISA which would avoid the need for such microarchitectural tricks. See
&lt;a href=&quot;https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf&quot;&gt;XVentanaCondOps&lt;/a&gt;,
&lt;a href=&quot;https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadcondmov.adoc&quot;&gt;XTheadCondMov&lt;/a&gt;,
the previously proposed but now abandoned &lt;a href=&quot;https://github.com/riscv/riscv-bitmanip/releases/download/v0.93/bitmanip-0.93.pdf&quot;&gt;Zbt
extension&lt;/a&gt;
(part of the earlier bitmanip spec) and more recently the proposed
&lt;a href=&quot;https://github.com/riscv/riscv-zicond&quot;&gt;Zicond&lt;/a&gt; (integer conditional
operations) standard extension.&lt;/p&gt;
&lt;h2 id=&quot;atomics&quot;&gt;&lt;a href=&quot;#atomics&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Atomics&lt;/h2&gt;
&lt;p&gt;It&#x27;s perhaps not surprising that code generation for atomics can be tricky to
understand, and the &lt;a href=&quot;https://llvm.org/docs/Atomics.html#atomics-and-codegen&quot;&gt;LLVM documentation on atomics codegen and
libcalls&lt;/a&gt; is actually
one of the best references on the topic I&#x27;ve found. A particularly important
note in that document is that if a backend supports any inline lock-free
atomic operations at a given size, all operations of that size must be
supported in a lock-free manner. If targeting a RISC-V CPU without the atomics
extension, all atomics operations would usually be lowered to &lt;code&gt;__atomic_*&lt;/code&gt;
libcalls. But if we know a bit more about the target, it&#x27;s possible to do
better - for instance, a single-core microcontroller could implement an atomic
operation in a lock-free manner by disabling interrupts (and conventionally,
lock-free implementations of atomics are provided through &lt;code&gt;__sync_*&lt;/code&gt;
libcalls).  This kind of setup is exactly what the &lt;a href=&quot;https://reviews.llvm.org/rGf5ed0cb217a9988f97b55f2ccb053bca7b41cc0c&quot;&gt;&lt;code&gt;+forced-atomics&lt;/code&gt;
feature&lt;/a&gt;
enables, where atomic load/store can be lowered to a load/store with
appropriate fences (as is supported in the base ISA) while other atomic
operations generate a &lt;code&gt;__sync_*&lt;/code&gt; libcall.&lt;/p&gt;
&lt;p&gt;There&#x27;s also been a very minor improvement for targets with native atomics
support (the &#x27;A&#x27; instruction set extension) that I may as well mention while
on the topic. As you might know, atomic operations such as compare and swap
are lowered to an instruction sequence involving &lt;code&gt;lr.{w,d}&lt;/code&gt; (load reserved) and
&lt;code&gt;sc.{w,d}&lt;/code&gt; (store conditional). There are very specific rules about these
instruction sequences that must be met to align with the &lt;a href=&quot;https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf&quot;&gt;architectural
forward progress
guarantee&lt;/a&gt; (section 8.3, page 51),
which is why we expand to a fixed instruction sequence at a very late stage in
compilation (see &lt;a href=&quot;https://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html&quot;&gt;original
RFC&lt;/a&gt;). This
means the sequence of instructions implementing the atomic operation is
opaque to LLVM&#x27;s optimisation passes and is treated as a single unit. The
obvious disadvantage of avoiding LLVM&#x27;s optimisations is that sometimes there
are optimisations that would be helpful and wouldn&#x27;t break that
forward-progress guarantee. One that came up in real-world code was the lack
of branch folding, which would have simplified a branch in the expanded
&lt;code&gt;cmpxchg&lt;/code&gt; sequence that just targets another branch with the same condition
(by just folding in the eventual target). With some &lt;a href=&quot;https://reviews.llvm.org/rGce381281940f&quot;&gt;relatively simple
logic&lt;/a&gt;, this suboptimal codegen is
resolved.&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;; Before                 =&amp;gt; After&lt;/span&gt;
&lt;span style=&quot;color: #000000&quot;&gt;.loop:&lt;/span&gt;                   &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #836C28&quot;&gt;.loop&lt;/span&gt;
  &lt;span style=&quot;color: #000000&quot;&gt;lr.w.aqrl&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a3&lt;/span&gt;, (&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;)     &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;lr.w.aqrl&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a3&lt;/span&gt;, (&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;)
  &lt;span style=&quot;color: #000000&quot;&gt;bne&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a3&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a1&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;.afterloop&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;bne&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a3&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a1&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;.loop&lt;/span&gt;
  &lt;span style=&quot;color: #000000&quot;&gt;sc.w.aqrl&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a4&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a2&lt;/span&gt;, (&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;) &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;sc.w.aqrl&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a4&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a2&lt;/span&gt;, (&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;)
  &lt;span style=&quot;color: #000000&quot;&gt;bnez&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a4&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;.loop&lt;/span&gt;         &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;bnez&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a4&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;.loop&lt;/span&gt;
&lt;span style=&quot;color: #000000&quot;&gt;.afterloop:&lt;/span&gt;              &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt;
  &lt;span style=&quot;color: #000000&quot;&gt;bne&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a3&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a1&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;.loop&lt;/span&gt;      &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt;
  &lt;span style=&quot;color: #000000&quot;&gt;ret&lt;/span&gt;                    &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;ret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;assorted-optimisations&quot;&gt;&lt;a href=&quot;#assorted-optimisations&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Assorted optimisations&lt;/h2&gt;
&lt;p&gt;As you can imagine, there&#x27;s been a lot of incremental minor improvements over
the past ~6 months. I unfortunately only have space (and patience) to highlight
a few of them.&lt;/p&gt;
&lt;p&gt;A new pre-regalloc pseudo instruction expansion pass was
&lt;a href=&quot;https://reviews.llvm.org/rG260a64106854986a981e49ed87ee740460a23eb5&quot;&gt;added&lt;/a&gt;
in order to allow &lt;a href=&quot;https://reviews.llvm.org/rG0bc177b6f54b&quot;&gt;optimising&lt;/a&gt; the
global address access instruction sequences such as those found in the &lt;a href=&quot;https://github.com/riscv-non-isa/riscv-toolchain-conventions/blob/master/README.mkd#specifying-the-target-code-model-with--mcmodel&quot;&gt;medany
code
model&lt;/a&gt;
(and was later &lt;a href=&quot;https://reviews.llvm.org/rGda5b1bf5bb0f&quot;&gt;broadened further&lt;/a&gt;).
This results in improvements such as the following (note: this transformation
was already supported for the medlow code model):&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;; Before                            =&amp;gt; After&lt;/span&gt;
&lt;span style=&quot;color: #000000&quot;&gt;.Lpcrel_hi1:&lt;/span&gt;                        &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #836C28&quot;&gt;.Lpcrel_hi1&lt;/span&gt;
&lt;span style=&quot;color: #000000&quot;&gt;auipc&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;%pcrel_hi1&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;ga&lt;/span&gt;)            &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;auipc&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;%pcrel_hi1&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;ga+&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;4&lt;/span&gt;)
&lt;span style=&quot;color: #000000&quot;&gt;addi&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;%pcrel_lo&lt;/span&gt;(.&lt;span style=&quot;color: #000000&quot;&gt;Lpcrel_hi1&lt;/span&gt;) &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #000000&quot;&gt;lw&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #1C01CE&quot;&gt;4&lt;/span&gt;(&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;)                        &lt;span style=&quot;color: #000000&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;lw&lt;/span&gt; &lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;, &lt;span style=&quot;color: #000000&quot;&gt;%pcrel_lo&lt;/span&gt;(.&lt;span style=&quot;color: #000000&quot;&gt;Lpcrel_hi1&lt;/span&gt;)(&lt;span style=&quot;color: #000000&quot;&gt;a0&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A missing target hook (&lt;code&gt;isUsedByReturnOnly&lt;/code&gt;) had been preventing tail calling
libcalls in some cases. This was
&lt;a href=&quot;https://reviews.llvm.org/rG47b1f8362aa4&quot;&gt;fixed&lt;/a&gt;, and later support was added
for &lt;a href=&quot;https://reviews.llvm.org/rGe94dc58dff1d&quot;&gt;generating an inlined sequence of
instructions&lt;/a&gt; for some of the
floating point libcalls.&lt;/p&gt;
&lt;p&gt;The RISC-V compressed instruction set extension defines a number of 16-bit
encodings that map to a 32-bit longer form (with restrictions on addressable
registers in the compressed form of course). The conversion of 32-bit
instructions to their 16-bit forms when possible happens at a very late stage, after
instruction selection. But of course over time, we&#x27;ve introduced more tuning
to influence codegen decisions in cases where a choice can be made to produce
an instruction that can be compressed, rather than one that can&#x27;t. A recent
addition to this was the &lt;a href=&quot;https://reviews.llvm.org/rGd64d3c5a8f81&quot;&gt;RISCVStripWSuffix
pass&lt;/a&gt;, which for RV64 targets will
convert &lt;code&gt;addw&lt;/code&gt; and &lt;code&gt;slliw&lt;/code&gt; to &lt;code&gt;add&lt;/code&gt; or &lt;code&gt;slli&lt;/code&gt; respectively when it can be
determined that all the users of its result only use the lower 32 bits. This
is a minor code size saving, as &lt;code&gt;slliw&lt;/code&gt; has no matching compressed instruction
and &lt;code&gt;c.addw&lt;/code&gt; can address a more restricted set of registers than &lt;code&gt;c.add&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&quot;other&quot;&gt;&lt;a href=&quot;#other&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Other&lt;/h2&gt;
&lt;p&gt;At the risk of repeating myself, this has been a selective tour of some
additions I thought it would be fun to write about. Apologies if I&#x27;ve missed
your favourite new feature or improvement - the &lt;a href=&quot;https://releases.llvm.org/16.0.0/docs/ReleaseNotes.html#changes-to-the-risc-v-backend&quot;&gt;LLVM release
notes&lt;/a&gt;
will include some things I haven&#x27;t had space for here. Thanks again to
everyone who has been contributing to make RISC-V support in LLVM even better.&lt;/p&gt;
&lt;p&gt;If you have a RISC-V project you think my colleagues and I at Igalia may
be able to help with, then do &lt;a href=&quot;https://www.igalia.com/contact/&quot;&gt;get in touch&lt;/a&gt;
regarding our services.&lt;/p&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2023-03-19: Clarified that Zawrs and Zihintntl support just involves
the MC layer (assembler/disassembler).&lt;/li&gt;
&lt;li&gt;2023-03-18: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
</feed>
