<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Muxup</title>
<subtitle>Adventures in collaborative open source development</subtitle>
<link href="https://muxup.com/feed.xml" rel="self" type="application/atom+xml"/>
<link href="https://muxup.com"/>
<updated>2026-05-11T12:00:00Z</updated>
<id>https://muxup.com/feed.xml</id>
<entry>
<title>Building 32-bit RISC-V sysroots and images with Yocto</title>
<published>2026-05-11T12:00:00Z</published>
<updated>2026-05-11T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/building-32-bit-risc-v-sysroots-and-images-with-yocto"/>
<id>https://muxup.com/building-32-bit-risc-v-sysroots-and-images-with-yocto</id>
<content type="html">
&lt;p&gt;Thanks to the Debian 64-bit RISC-V port it&#x27;s really easy to build a sysroot
appropriate for cross-compiling Clang/LLVM and its separate &lt;a href=&quot;https://llvm.org/docs/TestSuiteGuide.html&quot;&gt;test
suite&lt;/a&gt;. Either &lt;a href=&quot;/2024q4/rootless-cross-architecture-debootstrap&quot;&gt;use my
rootless-debootstrap-wrapper
script&lt;/a&gt; or the
&lt;a href=&quot;https://llvm.org/docs/HowToCrossCompileLLVM.html#setting-up-a-sysroot&quot;&gt;command I documented in LLVM&#x27;s cross-compilation
instructions&lt;/a&gt;,
being sure to &lt;a href=&quot;https://llvm.org/docs/HowToCrossCompileLLVM.html#working-around-a-ninja-dependency-issue&quot;&gt;see the note on working around a Ninja dependency
issue&lt;/a&gt;.
For a bootable QEMU image, Debian-based recipes are &lt;a href=&quot;/2026q2/bootable-qemu-image-menagerie-with-rootless-debootstrap&quot;&gt;similarly
straightforward&lt;/a&gt;.
But we don&#x27;t have the luxury of a precompiled distribution for 32-bit RISC-V
and so we&#x27;ll lean on &lt;a href=&quot;https://www.yoctoproject.org/&quot;&gt;Yocto&lt;/a&gt; to produce the
needed sysroot by building from source. I cover three cases: 1) building a
sysroot for cross-compiling projects like LLVM, 2) doing the same but in a way
that requires fewer build steps, 3) building an image approximating my
debootstrap image recipes.&lt;/p&gt;
&lt;p&gt;In this article I use release 5.3 (&#x27;Whinlatter&#x27;), which &lt;a href=&quot;https://docs.yoctoproject.org/dev/migration-guides/release-notes-5.3.html&quot;&gt;introduced the
&lt;code&gt;bitbake-setup&lt;/code&gt; helper
tool&lt;/a&gt;.
For documentation, I found the &lt;a href=&quot;https://docs.yoctoproject.org/dev/brief-yoctoprojectqs/index.html&quot;&gt;Yocto quick build
guide&lt;/a&gt;,
&lt;a href=&quot;https://docs.yoctoproject.org/dev/brief-yoctoprojectqs/index.html&quot;&gt;bitbake-setup
docs&lt;/a&gt;, and
&lt;a href=&quot;https://docs.yoctoproject.org/dev-manual/customizing-images.html#customizing-images&quot;&gt;image customisation
guide&lt;/a&gt;
helpful.&lt;/p&gt;
&lt;p&gt;I&#x27;m not a Yocto developer, so if you&#x27;re reading this and think there are other
approaches to consider or alternative ways of solving the problem that are
better, please do drop me a note!&lt;/p&gt;
&lt;h2 id=&quot;common-setup&quot;&gt;&lt;a href=&quot;#common-setup&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Common setup&lt;/h2&gt;
&lt;p&gt;I&#x27;m running on Arch Linux, which isn&#x27;t one of the tested Yocto host
distributions, but it seemed to work just fine.&lt;/p&gt;
&lt;p&gt;I found I needed to enable the &lt;code&gt;en_US&lt;/code&gt; locale:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo sed /etc/locale.gen -i -e &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;s/^\#en_US.UTF-8 UTF-8.*/en_US.UTF-8 UTF-8/&amp;quot;&lt;/span&gt;
sudo locale-gen
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And install the following additional packages:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo pacman -S inetutils chrpath cpio diffstat rpcsvc-proto flex bison zstd
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now we will check out bitbake into a work directory and set a directory to be
used to hold downloaded files:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;mkdir yocto-work &lt;span style=&quot;color: #000&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; yocto-work
git clone https://git.openembedded.org/bitbake
./bitbake/bin/bitbake-setup settings &lt;span style=&quot;color: #A90D91&quot;&gt;set&lt;/span&gt; default dl-dir &lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;/.cache/yocto/dl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;producing-a-sysroot-based-on-core-image-minimal&quot;&gt;&lt;a href=&quot;#producing-a-sysroot-based-on-core-image-minimal&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Producing a sysroot based on core-image-minimal&lt;/h2&gt;
&lt;p&gt;As is often the case, the workload I&#x27;m interested in here is LLVM. If you&#x27;re
looking to build a sysroot to cross-compile something else, you may need a
slightly different package list.&lt;/p&gt;
&lt;p&gt;In this first stanza, we use &lt;code&gt;bitbake-setup&lt;/code&gt; to initialise our development
environment. Because there isn&#x27;t a predefined machine target for riscv32 in
&lt;code&gt;bitbake/default-registry/configurations/poky-whinlatter.conf.json&lt;/code&gt;, we
avoid selecting &lt;code&gt;machine&lt;/code&gt; and will address it later. Importantly, we set a
&lt;code&gt;SSTATE_DIR&lt;/code&gt; which will be used for the shared state cache, avoiding
rebuilding packages when not necessary (I&#x27;m not totally sure why this isn&#x27;t
exposed in &lt;code&gt;bitbake-setup settings&lt;/code&gt; like &lt;code&gt;dl-dir&lt;/code&gt; is).&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;./bitbake/bin/bitbake-setup init --non-interactive &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --skip-selection machine &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  ./bitbake/default-registry/configurations/poky-whinlatter.conf.json &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  poky &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  distro/poky

&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;SSTATE_DIR = &amp;quot;%s&amp;quot;\n&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.cache/yocto/sstate&amp;quot;&lt;/span&gt; &amp;gt;&amp;gt; bitbake-builds/site.conf
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
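&lt;p&gt;One thing to watch with the &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; append above is that re-running it adds a
duplicate line to &lt;code&gt;site.conf&lt;/code&gt;. A minimal sketch of an idempotent alternative
follows. &lt;code&gt;append_once&lt;/code&gt; is a hypothetical helper name, not something provided
by Yocto or bitbake:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: append a line to a file only if it is not already there.
append_once() {
  file=$1
  line=$2
  # -x matches whole lines only, -F treats the line as a fixed string.
  if ! grep -qxF -- &quot;$line&quot; &quot;$file&quot; 2&gt;/dev/null; then
    printf &#39;%s\n&#39; &quot;$line&quot; &gt;&gt; &quot;$file&quot;
  fi
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It would be called with the file as the first argument and the full
&lt;code&gt;SSTATE_DIR = &quot;...&quot;&lt;/code&gt; line as the second, in place of the &lt;code&gt;printf&lt;/code&gt; above.&lt;/p&gt;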

&lt;p&gt;With that done, we can source the generated definitions to enter the build
environment (note we&#x27;re using the default setup directory; you can override it
to something other than &lt;code&gt;poky-whinlatter&lt;/code&gt; by using &lt;code&gt;--setup-dir-name&lt;/code&gt;) and
use &lt;code&gt;enable-fragment&lt;/code&gt; to set the qemuriscv32 machine:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;. bitbake-builds/poky-whinlatter/build/init-build-env
bitbake-config-build enable-fragment machine/qemuriscv32
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now configure the build, indicating the additional libraries that need to be
present, and run &lt;code&gt;bitbake&lt;/code&gt; to actually produce it:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;cat &amp;gt;&amp;gt; conf/local.conf &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;&amp;#39;EOF&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;IMAGE_INSTALL:append = &amp;quot; \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  glibc-dev \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libgcc \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libgcc-dev \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libatomic \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libatomic-dev \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libstdc++ \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  libstdc++-dev \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

bitbake core-image-minimal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This results in 4482 build tasks and takes quite some time to complete if you
haven&#x27;t run it before (i.e. aren&#x27;t hitting in the sstate cache). The next
section of this article explores how to produce the needed output while
building much less, but let&#x27;s finish the job and extract a rootfs from what
was built. I would like to now &lt;a href=&quot;https://docs.yoctoproject.org/sdk-manual/appendix-obtain.html#extracting-the-root-filesystem&quot;&gt;follow advice in the
documentation&lt;/a&gt;
and do &lt;code&gt;runqemu-extract-sdk tmp/deploy/images/qemuriscv32/core-image-minimal-qemuriscv32.rootfs.tar.zst ~/rv32sysroot&lt;/code&gt;, except that fails because the &lt;code&gt;runqemu-extract-sdk&lt;/code&gt; script
doesn&#x27;t recognise .tar.zst (I&#x27;ve &lt;a href=&quot;https://lists.openembedded.org/g/openembedded-core/message/229316&quot;&gt;submitted a
patch&lt;/a&gt;). So
instead we manually extract the .tar from the .tar.zst and then run the
runqemu-extract-sdk script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;zstd -d -k -f tmp/deploy/images/qemuriscv32/core-image-minimal-qemuriscv32.rootfs.tar.zst
runqemu-extract-sdk tmp/deploy/images/qemuriscv32/core-image-minimal-qemuriscv32.rootfs.tar ~/rv32sysroot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, you have a sysroot that&#x27;s &lt;em&gt;almost&lt;/em&gt; directly usable for
cross-compiling Clang/LLVM (with &lt;code&gt;--target=riscv32-poky-linux&lt;/code&gt;) but there are
three finalisation steps we will perform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add an additional symlink to the tree so that upstream Clang&#x27;s search
procedure for the GCC install finds the correct directory. The combination
of
&lt;a href=&quot;https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang/0010-clang-Define-releative-gcc-installation-dir.patch?h=whinlatter&quot;&gt;these&lt;/a&gt;
&lt;a href=&quot;https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/clang/clang/0018-llvm-clang-Insert-anchor-for-adding-OE-distro-vendor.patch?h=whinlatter&quot;&gt;two&lt;/a&gt;
downstream patches which Yocto applies to its own Clang builds would make
this unnecessary. I&#x27;m not sure if upstreaming has ever been pursued.&lt;/li&gt;
&lt;li&gt;Convert all absolute symlinks to relative ones. Yocto provides a Python
script for this, which is in our &lt;code&gt;$PATH&lt;/code&gt; after sourcing
&lt;code&gt;build/init-build-env&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;(Optional) Apply the workaround &lt;a href=&quot;https://llvm.org/docs/HowToCrossCompileLLVM.html#working-around-a-ninja-dependency-issue&quot;&gt;for a Ninja
issue&lt;/a&gt;
that would otherwise mean incremental builds don&#x27;t work.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rv32sysroot/usr/lib/gcc&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ln -s ../riscv32-poky-linux &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rv32sysroot/usr/lib/gcc/riscv32-poky-linux&amp;quot;&lt;/span&gt;
sysroot-relativelinks.py &lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;/rv32sysroot
ln -s usr/include &lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;/rv32sysroot/include
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
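&lt;p&gt;To confirm the three finalisation steps took effect, a quick sanity check along
the following lines can help. This is only a sketch: &lt;code&gt;check_rv32_sysroot&lt;/code&gt; is a
hypothetical helper name, and it relies on the &lt;code&gt;-lname&lt;/code&gt; test as found in GNU
&lt;code&gt;find&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: report problems with a finalised sysroot directory.
# Returns 0 if all checks pass and 1 otherwise.
check_rv32_sysroot() {
  sysroot=$1
  rc=0
  # The symlink that lets upstream Clang find the GCC install directory.
  if [ ! -L &quot;$sysroot/usr/lib/gcc/riscv32-poky-linux&quot; ]; then
    echo &quot;missing usr/lib/gcc/riscv32-poky-linux symlink&quot;
    rc=1
  fi
  # The symlink added as the Ninja workaround.
  if [ ! -L &quot;$sysroot/include&quot; ]; then
    echo &quot;missing include symlink&quot;
    rc=1
  fi
  # Any symlink whose target starts with / would resolve against the host.
  abs_links=$(find &quot;$sysroot&quot; -type l -lname &#39;/*&#39;)
  if [ -n &quot;$abs_links&quot; ]; then
    echo &quot;absolute symlinks remain:&quot;
    echo &quot;$abs_links&quot;
    rc=1
  fi
  return $rc
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running &lt;code&gt;check_rv32_sysroot &quot;$HOME/rv32sysroot&quot;&lt;/code&gt; should then print nothing and
return success.&lt;/p&gt;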

&lt;h2 id=&quot;producing-a-sysroot-with-fewer-build-steps&quot;&gt;&lt;a href=&quot;#producing-a-sysroot-with-fewer-build-steps&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Producing a sysroot with fewer build steps&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;core-image-minimal&lt;/code&gt; recipe above is straightforward, but does a lot more
work than strictly necessary. We can reduce this by instead adding a
dependency-only recipe that explicitly lists the needed build-time
dependencies and contains logic to produce the sysroot.&lt;/p&gt;
&lt;p&gt;First, create a layer:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;. bitbake-builds/poky-whinlatter/build/init-build-env
bitbake-layers create-layer --add-layer ../layers/meta-rv32-llvm-sysroot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then add the recipe:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;recipe_dir=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;../layers/meta-rv32-llvm-sysroot/recipes-devtools/rv32-llvm-deps-sysroot&amp;quot;&lt;/span&gt;
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$recipe_dir&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

cat &amp;gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$recipe_dir&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rv32-llvm-deps-sysroot.bb&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;&amp;#39;EOF&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;SUMMARY = &amp;quot;Dependency-only recipe to export an RV32 sysroot&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;LICENSE = &amp;quot;MIT-0&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;INHIBIT_DEFAULT_DEPS = &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EXCLUDE_FROM_WORLD = &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PACKAGE_ARCH = &amp;quot;${MACHINE_ARCH}&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;DEPENDS = &amp;quot;virtual/libc libgcc virtual/${MLPREFIX}compilerlibs zlib&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;inherit deploy nopackages&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;do_configure[noexec] = &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;do_compile[noexec] = &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;do_install[noexec] = &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;do_populate_sysroot[noexec] = &amp;quot;1&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;do_deploy() {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  export_dir=&amp;quot;${DEPLOYDIR}/${PN}-${MACHINE}&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  rm -rf &amp;quot;$export_dir&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  mkdir -p &amp;quot;$export_dir&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  cp -a &amp;quot;${RECIPE_SYSROOT}/.&amp;quot; &amp;quot;$export_dir/&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;  sysroot-relativelinks.py &amp;quot;$export_dir&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;  mkdir -p &amp;quot;$export_dir/usr/lib/gcc&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  ln -s ../riscv32-poky-linux &amp;quot;$export_dir/usr/lib/gcc/riscv32-poky-linux&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  ln -s usr/include &amp;quot;$export_dir/include&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;}&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;addtask deploy after do_prepare_recipe_sysroot before do_build&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;do_deploy&lt;/code&gt; function implements the sysroot preparation logic that largely
mirrors the previous section. Otherwise, &lt;code&gt;DEPENDS&lt;/code&gt; specifies the needed
dependencies (of these, &lt;code&gt;virtual/${MLPREFIX}compilerlibs&lt;/code&gt; is a bit magic -
this resolves to the compiler runtime provider which pulls in things like
&lt;code&gt;libstdc++&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Build the sysroot with:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;bitbake rv32-llvm-deps-sysroot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This performs ~850 build tasks and will produce the sysroot at
&lt;code&gt;tmp/deploy/images/qemuriscv32/rv32-llvm-deps-sysroot-qemuriscv32/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The sysroot is slightly larger than the one in the section above because it
contains large unstripped static archives like &lt;code&gt;usr/lib/libstdc++.a&lt;/code&gt;.&lt;/p&gt;
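&lt;p&gt;If you want to see which archives account for the difference, something like
the following sketch lists them. &lt;code&gt;list_large_archives&lt;/code&gt; is a hypothetical helper
name and the 1 MiB threshold is an arbitrary choice:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: list static archives larger than 1 MiB under a directory.
list_large_archives() {
  find &quot;$1&quot; -type f -name &#39;*.a&#39; -size +1024k
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pass it the sysroot directory, e.g.
&lt;code&gt;list_large_archives tmp/deploy/images/qemuriscv32/rv32-llvm-deps-sysroot-qemuriscv32&lt;/code&gt;.&lt;/p&gt;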
&lt;h2 id=&quot;producing-a-featureful-image-bootable-in-qemu&quot;&gt;&lt;a href=&quot;#producing-a-featureful-image-bootable-in-qemu&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Producing a featureful image bootable in QEMU&lt;/h2&gt;
&lt;p&gt;Watch this space!&lt;/p&gt;

&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-05-11: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Bootable QEMU image menagerie with rootless debootstrap</title>
<published>2026-05-04T12:00:00Z</published>
<updated>2026-05-04T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2026q2/bootable-qemu-image-menagerie-with-rootless-debootstrap"/>
<id>https://muxup.com/2026q2/bootable-qemu-image-menagerie-with-rootless-debootstrap</id>
<content type="html">
&lt;p&gt;Quite some time ago I shared a script and methodology for &lt;a href=&quot;/2024q4/rootless-cross-architecture-debootstrap&quot;&gt;performing a
cross-architecture debootstrap in a rootless
way&lt;/a&gt;. I had a short
note on producing an image bootable in QEMU, but it was fairly minimal. This
page provides a cookbook / quick reference on producing such images across
various Debian target architectures supported by QEMU. The goal is that the
starting point here &quot;gets the basics right&quot; for local experimentation, but of
course you are encouraged to evolve the recipe for your needs.&lt;/p&gt;
&lt;p&gt;The basic process is to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build a root filesystem with &lt;code&gt;rootless-debootstrap-wrapper&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Configure just enough networking, DNS, serial login, and SSH.&lt;/li&gt;
&lt;li&gt;Create a 30 GiB ext4 filesystem image directly with &lt;code&gt;mkfs.ext4&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Boot it with &lt;code&gt;qemu-system-*&lt;/code&gt;, passing the Debian kernel and initrd directly.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We use Debian trixie for amd64, arm64, armhf, ppc64el, riscv64, and s390x.
We use sid for ppc64 big endian and loong64. I ran all of this on a current
Arch Linux install.&lt;/p&gt;
&lt;h2 id=&quot;common-setup&quot;&gt;&lt;a href=&quot;#common-setup&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Common setup&lt;/h2&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo pacman -S debootstrap fakeroot qemu-user-static qemu-user-static-binfmt &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  qemu-emulators-full e2fsprogs socat debian-archive-keyring debian-ports-archive-keyring
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Put
&lt;a href=&quot;https://github.com/muxup/medley/blob/main/rootless-debootstrap-wrapper&quot;&gt;&lt;code&gt;rootless-debootstrap-wrapper&lt;/code&gt;&lt;/a&gt;
somewhere in your &lt;code&gt;PATH&lt;/code&gt;, then create a working directory:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;mkdir -p qemu-debian-images
&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; qemu-debian-images
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Paste the following function definition into your terminal; it will be called
later to do the common guest-side configuration. The main thing that&#x27;s slightly
non-standard in this setup is the pair of systemd drop-in overrides, which allow
authorised SSH keys to be specified by the systemd credential mechanism. If
that&#x27;s not something you&#x27;re interested in doing, you can skip the parts touching
&lt;code&gt;/etc/systemd/system/ssh*&lt;/code&gt; altogether.&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;configure_qemu_rootfs&lt;span style=&quot;color: #000&quot;&gt;()&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;{&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;rootfs=$1&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;console=$2&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;suite=$3&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;hostname=$4&lt;/span&gt;

  &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$rootfs&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/_enter&amp;quot;&lt;/span&gt; sh &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;mkdir -p /etc/systemd/network /etc/ssh/sshd_config.d /etc/systemd/system/ssh.service.d /etc/systemd/system/sshd-vsock@.service.d&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;cat &amp;gt; /etc/systemd/network/10-qemu.network &amp;lt;&amp;lt;&amp;#39;INNER&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Match]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Type=ether&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[Network]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;DHCP=yes&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;INNER&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;cat &amp;gt; /etc/ssh/sshd_config.d/20-qemu-login.conf &amp;lt;&amp;lt;&amp;#39;INNER&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PermitRootLogin yes&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PasswordAuthentication yes&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;INNER&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;cat &amp;gt; /etc/systemd/system/ssh.service.d/10-ephemeral-authorized-keys.conf &amp;lt;&amp;lt;&amp;#39;INNER&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Service]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ImportCredential=ssh.ephemeral-authorized_keys-all&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ExecStart=&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ExecStart=/usr/sbin/sshd -D \$SSHD_OPTS -o &amp;quot;AuthorizedKeysFile .ssh/authorized_keys&amp;quot; -o &amp;quot;AuthorizedKeysCommand /usr/bin/cat \${CREDENTIALS_DIRECTORY}/ssh.ephemeral-authorized_keys-all&amp;quot; -o &amp;quot;AuthorizedKeysCommandUser root&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;INNER&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;cat &amp;gt; /etc/systemd/system/sshd-vsock@.service.d/10-ephemeral-authorized-keys.conf &amp;lt;&amp;lt;&amp;#39;INNER&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Service]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ImportCredential=ssh.ephemeral-authorized_keys-all&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ExecStart=&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ExecStart=-/usr/sbin/sshd -i \$SSHD_OPTS -o &amp;quot;AuthorizedKeysFile .ssh/authorized_keys&amp;quot; -o &amp;quot;AuthorizedKeysCommand /usr/bin/cat \${CREDENTIALS_DIRECTORY}/ssh.ephemeral-authorized_keys-all&amp;quot; -o &amp;quot;AuthorizedKeysCommandUser root&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;INNER&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;/usr/bin/systemd-firstboot --locale=C.UTF-8 --hostname=${hostname} --force&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ln -sf ../locale.conf /etc/default/locale&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;printf &amp;#39;127.0.1.1 %s\n&amp;#39; &amp;quot;$hostname&amp;quot; &amp;gt;&amp;gt; /etc/hosts&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;printf &amp;#39;uninitialized\n&amp;#39; &amp;gt; /etc/machine-id&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;mkdir -p /var/lib/dbus&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;rm -f /var/lib/dbus/machine-id&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;ln -sf /etc/machine-id /var/lib/dbus/machine-id&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;systemctl enable systemd-networkd systemd-resolved systemd-timesyncd ssh&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;systemctl enable serial-getty@${console}.service&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;ln -sf ../run/systemd/resolve/resolv.conf /etc/resolv.conf&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;printf &amp;#39;root:root\n&amp;#39; | chpasswd&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;adduser --gecos &amp;quot;,,,&amp;quot; --disabled-password user&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;usermod -aG sudo user&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;printf &amp;#39;user:user\n&amp;#39; | chpasswd&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

  &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$suite&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; trixie &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt;; &lt;span style=&quot;color: #A90D91&quot;&gt;then&lt;/span&gt;
    cat &amp;gt;&amp;gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$rootfs&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/etc/apt/sources.list&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;&amp;#39;EOF&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;deb https://security.debian.org/debian-security trixie-security main&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;deb https://deb.debian.org/debian trixie-updates main&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;fi&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This should &lt;em&gt;not&lt;/em&gt; be exposed on any public network without further
configuration. You can log in as either root or &lt;code&gt;user&lt;/code&gt; via ssh, using
password &lt;code&gt;root&lt;/code&gt; or &lt;code&gt;user&lt;/code&gt; respectively. The commands below expose ssh via a
unix domain socket. One potential gotcha: this unix domain socket must not
have any &lt;code&gt;-&lt;/code&gt; in its name as that collides with the splitting done for the
&lt;code&gt;hostfwd&lt;/code&gt; argument. The examples given below avoid this issue. The boot
commands pass &lt;code&gt;net.ifnames=0&lt;/code&gt;, so the single QEMU network device is
consistently named &lt;code&gt;eth0&lt;/code&gt; and matched by the networkd config above (I found
this more reliable than &lt;code&gt;ln -sf /dev/null /etc/udev/rules.d/80-net-setup-link.rules&lt;/code&gt;).&lt;/p&gt;
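&lt;p&gt;The &lt;code&gt;-&lt;/code&gt; restriction is easy to trip over when choosing your own socket
location. A minimal sketch of a guard follows. &lt;code&gt;hostfwd_socket_ok&lt;/code&gt; is a
hypothetical helper name, and it checks the whole path on the assumption that a
&lt;code&gt;-&lt;/code&gt; anywhere in the string handed to &lt;code&gt;hostfwd&lt;/code&gt; is a problem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: succeed only if the socket path contains no &#39;-&#39;.
hostfwd_socket_ok() {
  case $1 in
    *-*) return 1 ;;
    *) return 0 ;;
  esac
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, &lt;code&gt;hostfwd_socket_ok /tmp/qemu_ssh.sock&lt;/code&gt; succeeds while
&lt;code&gt;hostfwd_socket_ok /tmp/qemu-ssh.sock&lt;/code&gt; fails.&lt;/p&gt;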
&lt;p&gt;For simplicity we make use of &lt;code&gt;mkfs.ext4&lt;/code&gt;&#x27;s ability to populate the image from
a directory. Pleasingly, &lt;code&gt;mkfs.xfs&lt;/code&gt; gained a similar ability in the &lt;a href=&quot;https://lwn.net/Articles/1042751/&quot;&gt;xfsprogs
6.17.0 release&lt;/a&gt; in Oct 2025. If you have
a new enough version and prefer an XFS rootfs over ext4, you can tweak the
recipes below to do the following for the final image population step:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.xfs -f -q -L rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
    -d file,name&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt;,size&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;30g &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
    -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;,atime&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
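&lt;p&gt;To check whether your &lt;code&gt;mkfs.xfs&lt;/code&gt; is new enough before relying on this, a
version comparison along these lines is one option. Both function names are
hypothetical, the comparison depends on the &lt;code&gt;-V&lt;/code&gt; option of GNU &lt;code&gt;sort&lt;/code&gt;, and the
sketch assumes &lt;code&gt;mkfs.xfs -V&lt;/code&gt; prints a line ending in the version number:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: succeed if version $1 is at least version $2.
# Uses the version sort (-V) option of GNU sort.
version_ge() {
  [ &quot;$(printf &#39;%s\n%s\n&#39; &quot;$1&quot; &quot;$2&quot; | sort -V | head -n 1)&quot; = &quot;$2&quot; ]
}

# Assumption: `mkfs.xfs -V` prints a line whose last field is the version.
mkfs_xfs_can_populate() {
  version_ge &quot;$(mkfs.xfs -V | awk &#39;{ print $NF }&#39;)&quot; 6.17.0
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;mkfs_xfs_can_populate&lt;/code&gt; fails, stick with the &lt;code&gt;mkfs.ext4&lt;/code&gt; path used in the
recipes below.&lt;/p&gt;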

&lt;h2 id=&quot;amd64--x86-64&quot;&gt;&lt;a href=&quot;#amd64--x86-64&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;amd64 / x86-64&lt;/h2&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/amd64-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;amd64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-amd64,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttyS0 trixie qemu-amd64-trixie
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinuz-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; amd64-trixie-qemu
qemu-system-x86_64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -accel kvm &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine q35 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu host &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-pci,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_amd64.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-pci,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-pci,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttyS0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above assumes you are running on an x86-64 host and so enables KVM. If
not, drop &lt;code&gt;-accel kvm&lt;/code&gt; and use &lt;code&gt;-cpu max&lt;/code&gt; instead of &lt;code&gt;-cpu host&lt;/code&gt;.&lt;/p&gt;
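&lt;p&gt;If you want one script that works on either kind of host, the choice can be made automatically. The following is a sketch only: &lt;code&gt;qemu_amd64_accel_args&lt;/code&gt; is a hypothetical helper, and checking that &lt;code&gt;/dev/kvm&lt;/code&gt; is writable is my assumption about how to detect usable KVM rather than something the recipe above requires.&lt;/p&gt;

```shell
# Hypothetical helper: print the acceleration and CPU flags for the amd64
# guest. $1 is the host architecture as reported by `uname -m`; $2 is the
# KVM device node to probe (defaults to /dev/kvm).
qemu_amd64_accel_args() {
  host_arch=$1
  kvm_dev=${2:-/dev/kvm}
  if [ "$host_arch" = x86_64 ]; then
    # Assumption: a writable KVM device node means KVM is usable.
    if [ -w "$kvm_dev" ]; then
      echo "-accel kvm -cpu host"
      return
    fi
  fi
  # Otherwise fall back to emulation with the most featureful CPU model.
  echo "-cpu max"
}

qemu_amd64_accel_args "$(uname -m)"
```

&lt;p&gt;The printed flags would then replace the &lt;code&gt;-accel kvm&lt;/code&gt; and &lt;code&gt;-cpu host&lt;/code&gt; lines in the boot command, for instance via an unquoted command substitution so that they split into separate arguments.&lt;/p&gt;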
&lt;h2 id=&quot;arm64--aarch64&quot;&gt;&lt;a href=&quot;#arm64--aarch64&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;arm64 / AArch64&lt;/h2&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/arm64-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;arm64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-arm64,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttyAMA0 trixie qemu-arm64-trixie
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinuz-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; arm64-trixie-qemu
qemu-system-aarch64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine virt &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu cortex-a57 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-device,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_arm64.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-device,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-device,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttyAMA0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;armhf--32-bit-arm&quot;&gt;&lt;a href=&quot;#armhf--32-bit-arm&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;armhf / 32-bit ARM&lt;/h2&gt;
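&lt;p&gt;For this one I had to add the relevant virtio modules (&lt;code&gt;virtio_mmio&lt;/code&gt;, &lt;code&gt;virtio_blk&lt;/code&gt;, and &lt;code&gt;virtio_net&lt;/code&gt;) to the initrd and then regenerate it.&lt;/p&gt;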
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/armhf-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;armhf &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-armmp,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttyAMA0 trixie qemu-armhf-trixie
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;%s\n&amp;#39;&lt;/span&gt; virtio_mmio virtio_blk virtio_net &amp;gt;&amp;gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/etc/initramfs-tools/modules&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/_enter&amp;quot;&lt;/span&gt; update-initramfs -u -k all
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinuz-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; armhf-trixie-qemu
qemu-system-arm &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine virt &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu cortex-a15 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 4G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-device,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_armhf.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-device,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-device,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttyAMA0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;riscv64&quot;&gt;&lt;a href=&quot;#riscv64&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;riscv64&lt;/h2&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/riscv64-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;riscv64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-riscv64,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttyS0 trixie qemu-riscv64-trixie
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinux-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; riscv64-trixie-qemu
qemu-system-riscv64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine virt &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu rv64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-device,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_riscv64.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-device,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-device,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -bios /usr/share/qemu/opensbi-riscv64-generic-fw_dynamic.bin &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttyS0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above assumes you have OpenSBI installed in &lt;code&gt;/usr/share/qemu&lt;/code&gt; (it is put
there by the &lt;code&gt;qemu-system-riscv-firmware&lt;/code&gt; package on Arch).&lt;/p&gt;
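&lt;p&gt;On other distributions the firmware may live elsewhere, so it can be worth checking before booting. The sketch below uses a hypothetical &lt;code&gt;first_existing&lt;/code&gt; helper. The first candidate is the path used above; the second is my assumption about where a Debian-style &lt;code&gt;opensbi&lt;/code&gt; package installs the file, so adjust the list for your system.&lt;/p&gt;

```shell
# Hypothetical helper: print the first argument that names an existing file.
# Returns non-zero if none of the candidates exist.
first_existing() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# First path: as used in the boot command above (Arch).
# Second path: an assumed Debian-style location; verify it on your system.
if opensbi=$(first_existing \
    /usr/share/qemu/opensbi-riscv64-generic-fw_dynamic.bin \
    /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_dynamic.bin); then
  echo "pass to QEMU: -bios $opensbi"
else
  echo "no OpenSBI firmware found; install it or adjust the -bios path"
fi
```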
&lt;h2 id=&quot;ppc64el&quot;&gt;&lt;a href=&quot;#ppc64el&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;ppc64el&lt;/h2&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/ppc64el-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;ppc64el &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-powerpc64le,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; hvc0 trixie qemu-ppc64el-trixie
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinux-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; ppc64el-trixie-qemu
qemu-system-ppc64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine pseries &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu power9 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-pci,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_ppc64el.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-pci,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-pci,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=hvc0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;s390x-systemz&quot;&gt;&lt;a href=&quot;#s390x-systemz&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;s390x (SystemZ)&lt;/h2&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/s390x-trixie-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;s390x &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;trixie &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-s390x,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttysclp0 trixie qemu-s390x-trixie
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinuz-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; s390x-trixie-qemu
qemu-system-s390x &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine s390-ccw-virtio &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-ccw,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_s390x.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-ccw,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-ccw,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttysclp0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;ppc64-big-endian&quot;&gt;&lt;a href=&quot;#ppc64-big-endian&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;ppc64 big-endian&lt;/h2&gt;
&lt;p&gt;This is a Debian ports target, so we use &lt;code&gt;sid&lt;/code&gt; and the ports mirror.&lt;/p&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/ppc64-sid-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;ppc64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;sid &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian-ports &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --keyring&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/usr/share/keyrings/debian-ports-archive-keyring.gpg &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-powerpc64,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; hvc0 sid qemu-ppc64-sid
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinux-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; ppc64-sid-qemu
qemu-system-ppc64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine pseries &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -cpu power9 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-pci,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_ppc64.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-pci,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-pci,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=hvc0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;loong64--loongarch&quot;&gt;&lt;a href=&quot;#loong64--loongarch&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;loong64 / LoongArch&lt;/h2&gt;
&lt;p&gt;For this one, we need EDK2 firmware, which you can obtain from Debian&#x27;s
&lt;code&gt;qemu-efi-loongarch64&lt;/code&gt; package (&lt;code&gt;QEMU_EFI.fd&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Build:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WORK=$PWD&lt;/span&gt;/loong64-sid-qemu
&lt;span style=&quot;color: #000&quot;&gt;ROOTFS=$WORK&lt;/span&gt;/rootfs
mkdir -p &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

rootless-debootstrap-wrapper &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --arch&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;loong64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --suite&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;sid &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --mirror&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;https://deb.debian.org/debian &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --cache-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/debcache&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --target-dir&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --include&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;linux-image-loong64,zstd,dbus,systemd-resolved,systemd-timesyncd,openssh-server,sudo

configure_qemu_rootfs &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; ttyS0 sid qemu-loong64-sid
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/vmlinuz-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/kernel&amp;quot;&lt;/span&gt;
cp &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;/boot/initrd.img-* &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/initrd&amp;quot;&lt;/span&gt;
fakeroot -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.fakeroot.env&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  mkfs.ext4 -q -L rootfs -d &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ROOTFS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WORK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/rootfs.img&amp;quot;&lt;/span&gt; 30G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Boot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; loong64-sid-qemu
cp ../QEMU_EFI.fd .
qemu-system-loongarch64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -machine virt,firmware&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;QEMU_EFI.fd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -smp &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -m 8G &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -drive &lt;span style=&quot;color: #000&quot;&gt;file=&lt;/span&gt;rootfs.img,if&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;none,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd,format&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;raw &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-blk-pci,drive&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;hd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;unix:/tmp/qemu_loong64.sock-:22 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-net-pci,netdev&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -object rng-random,filename&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;/dev/urandom,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -device virtio-rng-pci,rng&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;rng &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -kernel kernel &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -initrd initrd &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -nographic &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -append &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;rw root=LABEL=rootfs console=ttyS0 net.ifnames=0&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;logging-in&quot;&gt;&lt;a href=&quot;#logging-in&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Logging in&lt;/h2&gt;
&lt;p&gt;As noted above, you can log in with &lt;code&gt;root&lt;/code&gt;/&lt;code&gt;root&lt;/code&gt; or &lt;code&gt;user&lt;/code&gt;/&lt;code&gt;user&lt;/code&gt;. The launch
commands above run QEMU with &lt;code&gt;-nographic&lt;/code&gt;, which connects your terminal to
the guest serial console. &lt;code&gt;Ctrl-c&lt;/code&gt; alone won&#x27;t kill the virtual
machine, so it&#x27;s helpful to know:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ctrl-a x&lt;/code&gt; exits QEMU immediately.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Ctrl-a c&lt;/code&gt; switches between the guest serial console and the QEMU monitor.
From the monitor, &lt;code&gt;quit&lt;/code&gt; exits QEMU and &lt;code&gt;system_powerdown&lt;/code&gt; asks the guest to
shut down cleanly.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Ctrl-a h&lt;/code&gt; prints QEMU&#x27;s help for the other &lt;code&gt;Ctrl-a&lt;/code&gt; shortcuts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the guest is booted, you can connect via ssh to the Unix domain socket
that forwards to guest port 22. Assuming you&#x27;re on a recent system with
&lt;code&gt;systemd-ssh-proxy&lt;/code&gt; (and the ssh config file it adds) present, this can be
done with e.g.:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ssh root@unix/tmp/qemu_amd64.sock
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Without &lt;code&gt;systemd-ssh-proxy&lt;/code&gt;, you can specify &lt;code&gt;ProxyCommand&lt;/code&gt; instead:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;# For socat:&lt;/span&gt;
ssh -o &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ProxyCommand=socat - UNIX-CONNECT:/tmp/qemu_amd64.sock&amp;quot;&lt;/span&gt; root@vm
&lt;span style=&quot;color: #177500&quot;&gt;# Or for OpenBSD netcat:&lt;/span&gt;
ssh -o &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ProxyCommand=nc -U /tmp/qemu_amd64.sock&amp;quot;&lt;/span&gt; root@vm
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you&#x27;d rather use a TCP port, replace the &lt;code&gt;-netdev&lt;/code&gt; part of the qemu launch
command with something like the following and connect to &lt;code&gt;localhost:2222&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-netdev user,id&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;net,hostfwd&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;tcp:127.0.0.1:2222-:22
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The systemd-provided config for use of &lt;code&gt;systemd-ssh-proxy&lt;/code&gt; disables host
identity checks, which is what you typically want with this setup. If using
one of the &lt;code&gt;ProxyCommand&lt;/code&gt; options above, you may want to add &lt;code&gt;-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null&lt;/code&gt; to your &lt;code&gt;ssh&lt;/code&gt;
invocation.&lt;/p&gt;
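&lt;p&gt;If you connect often, the same options can live in your ssh client
configuration rather than on the command line. A minimal sketch for
&lt;code&gt;~/.ssh/config&lt;/code&gt;, assuming the socat variant and the amd64 socket path used
above (the &lt;code&gt;qemu-amd64&lt;/code&gt; alias is just an illustrative name):&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Hypothetical alias; adjust the socket path for other architectures.
Host qemu-amd64
    User root
    ProxyCommand socat - UNIX-CONNECT:/tmp/qemu_amd64.sock
    # Throwaway VM: skip host key verification, as with the -o options above.
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With that in place, &lt;code&gt;ssh qemu-amd64&lt;/code&gt; should behave like the longer
&lt;code&gt;ProxyCommand&lt;/code&gt; invocation.&lt;/p&gt;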
&lt;h2 id=&quot;alternative-ssh-over-vsock&quot;&gt;&lt;a href=&quot;#alternative-ssh-over-vsock&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Alternative: SSH over vsock&lt;/h2&gt;
&lt;p&gt;It&#x27;s possible to avoid QEMU user-mode networking and use ssh via &lt;code&gt;AF_VSOCK&lt;/code&gt;.
This can even work without any additional image changes as
&lt;code&gt;systemd-ssh-generator&lt;/code&gt; in the guest will generate an appropriate
socket-activated sshd service if vsock is present. On the host, you&#x27;ll need to
pick a numeric address for the vsock (&#x27;guest CID&#x27;) that isn&#x27;t already in use on
the system, and change the qemu command line to add the appropriate vsock
device with that CID assigned. The vsock device used depends on the machine
being emulated - e.g. whether to attach on PCI or the virtio device bus.&lt;/p&gt;
&lt;p&gt;For amd64, ppc64el, ppc64, and loong64, add:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-device vhost-vsock-pci,guest-cid&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For arm64, armhf, and riscv64, add:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-device vhost-vsock-device,guest-cid&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For s390x, use:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-device vhost-vsock-ccw,guest-cid&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Assuming your host has &lt;code&gt;systemd-ssh-proxy&lt;/code&gt; and its OpenSSH config installed,
you can connect with:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;ssh root@vsock/42
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;using-an-injected-ssh-key&quot;&gt;&lt;a href=&quot;#using-an-injected-ssh-key&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Using an injected SSH key&lt;/h2&gt;
&lt;p&gt;Images set up using the recipes above allow a public key to be specified at
boot time using the systemd system credential mechanism. Just append the
following to the qemu launch command and you can ssh in using that key:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-smbios &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;type=11,value=io.systemd.credential.binary:ssh.ephemeral-authorized_keys-all=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;base64 -w0 ~/.ssh/id_ed25519.pub&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-05-11:
&lt;ul&gt;
&lt;li&gt;Add notes on ssh over AF_VSOCK.&lt;/li&gt;
&lt;li&gt;Add note about ssh host key checking.&lt;/li&gt;
&lt;li&gt;Add support for injecting ssh keys using systemd&#x27;s credential mechanism.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-05-10:
&lt;ul&gt;
&lt;li&gt;Add note about serial console shortcuts.&lt;/li&gt;
&lt;li&gt;Use systemd-ssh-proxy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-05-09:
&lt;ul&gt;
&lt;li&gt;Tweak shell commands slightly (no &lt;code&gt;cp -f&lt;/code&gt;) and use &lt;code&gt;Type=ether&lt;/code&gt; for
systemd-networkd match.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-05-07:
&lt;ul&gt;
&lt;li&gt;Add note about how to use an XFS rootfs.&lt;/li&gt;
&lt;li&gt;Get rid of vestigial errexit usage in common setup script.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-05-05: Use &lt;code&gt;net.ifnames=0&lt;/code&gt; command line argument rather than
&lt;code&gt;ln -sf /dev/null /etc/udev/rules.d/80-net-setup-link.rules&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;2026-05-04: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Minipost: Routing a Linux user&#x27;s traffic through a WireGuard interface</title>
<published>2026-03-31T12:00:00Z</published>
<updated>2026-03-31T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2026q1/minipost-routing-a-linux-users-traffic-through-a-wireguard-interface"/>
<id>https://muxup.com/2026q1/minipost-routing-a-linux-users-traffic-through-a-wireguard-interface</id>
<content type="html">
&lt;p&gt;Simple goal: take advantage of my home router&#x27;s WireGuard support and have one
of my external servers connect using this, and pass all traffic from a certain
user through that interface.&lt;/p&gt;
&lt;h2 id=&quot;create-wireguard-credentials&quot;&gt;&lt;a href=&quot;#create-wireguard-credentials&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Create WireGuard credentials&lt;/h2&gt;
&lt;p&gt;This part of the note won&#x27;t be that useful to you unless you&#x27;re using a
Fritzbox router. But if you&#x27;re me or someone suspiciously like me you may want
to know to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Navigate to &lt;code&gt;https://192.168.178.1/#/access/wireguard&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Click &quot;Add WireGuard connection&quot; and ensure &quot;Connect a single device&quot; is
selected on the modal that appears. Then click &quot;Next&quot;.&lt;/li&gt;
&lt;li&gt;Enter a unique name for the connection (I typically use
&lt;code&gt;$remote_host_name-wg&lt;/code&gt;) and click Finish. Follow the request to confirm by
pressing a button on the router.&lt;/li&gt;
&lt;li&gt;Click &quot;Download settings&quot; and a &lt;code&gt;wg_config.conf&lt;/code&gt; will be downloaded.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;add-user&quot;&gt;&lt;a href=&quot;#add-user&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Add user&lt;/h2&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;VPN_USER=&lt;/span&gt;asbvpn
sudo useradd -m -g users -G wheel -s /bin/bash &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
sudo passwd &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;configure-systemd-networkd&quot;&gt;&lt;a href=&quot;#configure-systemd-networkd&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Configure systemd-networkd&lt;/h2&gt;
&lt;p&gt;First, extract the relevant values from &lt;code&gt;wg_config.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;WG_CONF=&lt;/span&gt;wg_config.conf
&lt;span style=&quot;color: #000&quot;&gt;PRIVATE_KEY=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;sed -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;s/^PrivateKey = //p&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WG_CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;PUBLIC_KEY=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;sed -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;s/^PublicKey = //p&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WG_CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;PRESHARED_KEY=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;sed -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;s/^PresharedKey = //p&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WG_CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;ENDPOINT=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;sed -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;s/^Endpoint = //p&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WG_CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;ADDRS=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;sed -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;s/^Address = //p&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$WG_CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;IPV4_ADDR=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;%s\n&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ADDRS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; | cut -d, -f1&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;IPV6_ADDR=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;%s\n&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$ADDRS&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; | cut -d, -f2&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What we want to do is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define the WireGuard interface &lt;code&gt;wg0&lt;/code&gt; and specify the necessary keys, IP
addresses, etc. for it to be brought up successfully.&lt;/li&gt;
&lt;li&gt;Specify a routing policy so that all traffic from the given user account
goes via that interface.
&lt;ul&gt;
&lt;li&gt;As you can see below, we specify a RouteTable called &quot;vpn&quot;, associate that
with the interface, and specify rules for that table.&lt;/li&gt;
&lt;li&gt;Ideally this would &quot;fail closed&quot; and no traffic from the user would be
routed if &lt;code&gt;wg0&lt;/code&gt; is down. That appears to require additional rules managed
outside of systemd-networkd, which I haven&#x27;t tried to implement.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above can be achieved with:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo mkdir -p /etc/systemd/networkd.conf.d
sudo tee /etc/systemd/networkd.conf.d/90-vpn-table.conf &amp;gt;/dev/null &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;&amp;#39;EOF&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Network]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;RouteTable=vpn:100&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

sudo tee /etc/systemd/network/50-wg0.netdev &amp;gt;/dev/null &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[NetDev]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Name=wg0&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Kind=wireguard&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[WireGuard]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PrivateKey=$PRIVATE_KEY&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;RouteTable=vpn&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[WireGuardPeer]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PublicKey=$PUBLIC_KEY&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;PresharedKey=$PRESHARED_KEY&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;AllowedIPs=0.0.0.0/0&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;AllowedIPs=::/0&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Endpoint=$ENDPOINT&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

sudo tee /etc/systemd/network/50-wg0.network &amp;gt;/dev/null &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Match]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Name=wg0&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[Network]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Address=$IPV4_ADDR&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Address=$IPV6_ADDR&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[RoutingPolicyRule]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;User=$VPN_USER&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Table=vpn&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Priority=10000&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Family=both&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

sudo chgrp systemd-network /etc/systemd/network/50-wg0.netdev
sudo chmod &lt;span style=&quot;color: #1C01CE&quot;&gt;0640&lt;/span&gt; /etc/systemd/network/50-wg0.netdev
sudo systemctl restart systemd-networkd
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Doing it this way, we&#x27;ve stored the secret keys in the 50-wg0.netdev file
itself but restricted access to the file. It&#x27;s possible to have the keys
stored in a separate file, but for my setup it didn&#x27;t seem worthwhile.&lt;/p&gt;
&lt;p&gt;Then check the status with e.g.:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo networkctl status wg0
sudo ip rule show
sudo ip route show table &lt;span style=&quot;color: #1C01CE&quot;&gt;100&lt;/span&gt;
sudo wg show wg0
sudo -u &lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt; curl https://ifconfig.me/all
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;IPv6 does not work in this setup (&lt;code&gt;curl -6 google.com&lt;/code&gt; will fail).&lt;/p&gt;
&lt;h2 id=&quot;copying-authorized_keys-to-new-user&quot;&gt;&lt;a href=&quot;#copying-authorized_keys-to-new-user&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Copying authorized_keys to new user&lt;/h2&gt;
&lt;p&gt;This is more a note to myself than anything else:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo install -d -m &lt;span style=&quot;color: #1C01CE&quot;&gt;700&lt;/span&gt; -o &lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt; -g users /home/&lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt;/.ssh
sudo install -m &lt;span style=&quot;color: #1C01CE&quot;&gt;600&lt;/span&gt; -o &lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt; -g users &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh/authorized_keys&amp;quot;&lt;/span&gt; /home/&lt;span style=&quot;color: #000&quot;&gt;$VPN_USER&lt;/span&gt;/.ssh/authorized_keys
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-03-31: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Minipost: Additional figures for per-query energy consumption of LLMs</title>
<published>2026-02-17T12:00:00Z</published>
<updated>2026-02-17T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2026q1/minipost-additional-figures-for-per-query-energy-consumption-of-LLMs"/>
<id>https://muxup.com/2026q1/minipost-additional-figures-for-per-query-energy-consumption-of-LLMs</id>
<content type="html">
&lt;p&gt;Last month I wrote up a fairly long piece on &lt;a href=&quot;/2026q1/per-query-energy-consumption-of-llms&quot;&gt;per-query energy consumption of
LLMs using the data from
InferenceMAX&lt;/a&gt; (note:
InferenceMAX has since been renamed to InferenceX). Much of the write-up was
dedicated to exploring what you can actually conclude from these figures and
how that interacts with some of the implementation decisions in the benchmark,
but I feel the results still give a useful yardstick. Beyond concerns about
overly-specialised serving engine configurations and whether the workload is
representative of real-world model serving in a paid API host, the other
obvious limitation is that InferenceMAX is only testing GPT-OSS 120b and
DeepSeek R1 0528 when there is a world of other models out there. I dutifully
added &quot;run my own tests using other models&quot; to the todo list and here we are.
By &quot;here we are&quot; I of course mean I made no progress towards that goal but
&lt;a href=&quot;https://muellerzr.github.io/&quot;&gt;Zach Mueller&lt;/a&gt; at &lt;a href=&quot;https://lambda.ai/&quot;&gt;Lambda&lt;/a&gt;
started publishing &lt;a href=&quot;https://lambda.ai/inference-models&quot;&gt;model cards with the needed
data&lt;/a&gt; - thanks Zach!&lt;/p&gt;
&lt;p&gt;The setup for Lambda is simple - each model card lists the observed token
generation throughput and total throughput (along with other stats) for an
input sequence length / output sequence length (ISL/OSL) of 8192/1024, as
benchmarked using &lt;code&gt;vllm bench serve&lt;/code&gt;. The command used to serve the LLM (using
sglang or vllm depending on the model) is also given. As a starting point this
is no worse than the InferenceMAX data, and potentially somewhat better due to
figures being taken from a configuration that&#x27;s not &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/359#issue-3750796719&quot;&gt;overly specialised to a
particular query
length&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The figures each Lambda model card gives us that are relevant for calculating
the energy per query are: the hardware used, token generation throughput and
total token throughput (input+output tokens). Other statistics such as the
time to first token, inter-token latency, and parallel requests tested help
confirm whether this is a configuration someone would realistically use. Using
an equivalent methodology to before, we get the Watt hours per query by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Determine the total Watts for the GPU cluster. We take the figure used by
SemiAnalysis (2.17kW for a single B200) and multiply by the number of GPUs.&lt;/li&gt;
&lt;li&gt;Calculate the joules per token by dividing this total Watts figure by the
total token throughput in tokens per second. This gives a weighted average of
the joules per token for the measured workload, reflecting the ISL:OSL ratio.&lt;/li&gt;
&lt;li&gt;Multiply this weighted average of joules per token by the tokens per query
(ISL+OSL) to get the joules per query. Then divide by 3600 to get Wh.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Collecting the data from the individual model cards we can generate the
following (as before, using minutes of PlayStation 5 gameplay as a point of
comparison):&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;data&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; {
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Qwen/Qwen3.5-397B-A17B&amp;quot;&lt;/span&gt;: {
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;8&lt;/span&gt;,
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;11092&lt;/span&gt;,
    },
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;MiniMaxAI/MiniMax-M2.5&amp;quot;&lt;/span&gt;: {
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;,
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;8062&lt;/span&gt;,
    },
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;zai-org/GLM-5-FP8&amp;quot;&lt;/span&gt;: {
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;8&lt;/span&gt;,
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;6300&lt;/span&gt;,
    },
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;zai-org/GLM-4.7-Flash&amp;quot;&lt;/span&gt;: {
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;,
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;8125&lt;/span&gt;,
    },
    &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;arcee-ai/Trinity-Large-Preview&amp;quot;&lt;/span&gt;: {
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;8&lt;/span&gt;,
        &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;: &lt;span style=&quot;color: #1C01CE&quot;&gt;15611&lt;/span&gt;,
    },
}

&lt;span style=&quot;color: #177500&quot;&gt;# 8192 + 1024&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;TOKENS_PER_QUERY&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;9216&lt;/span&gt;

&lt;span style=&quot;color: #177500&quot;&gt;# Taken from &amp;lt;https://inferencex.semianalysis.com/&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;B200_KW&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;2.17&lt;/span&gt;

&lt;span style=&quot;color: #177500&quot;&gt;# Reference power draw for PS5 playing a game. Taken from&lt;/span&gt;
&lt;span style=&quot;color: #177500&quot;&gt;# &amp;lt;https://www.playstation.com/en-gb/legal/ecodesign/&amp;gt; (&amp;quot;Active Power&lt;/span&gt;
&lt;span style=&quot;color: #177500&quot;&gt;# Consumption&amp;quot;). Ranges from ~217W to ~197W depending on model.&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;PS5_KW&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;0.2&lt;/span&gt;


&lt;span style=&quot;color: #A90D91&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;wh_per_query&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;num_b200&lt;/span&gt;, &lt;span style=&quot;color: #000&quot;&gt;total_throughput&lt;/span&gt;, &lt;span style=&quot;color: #000&quot;&gt;tokens_per_query&lt;/span&gt;):
    &lt;span style=&quot;color: #000&quot;&gt;total_cluster_kw&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;num_b200&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;B200_KW&lt;/span&gt;
    &lt;span style=&quot;color: #000&quot;&gt;total_cluster_watts&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;total_cluster_kw&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;1000&lt;/span&gt;
    &lt;span style=&quot;color: #177500&quot;&gt;# joules_per_token is a weighted average for the measured mix of input&lt;/span&gt;
    &lt;span style=&quot;color: #177500&quot;&gt;# and output tokens.&lt;/span&gt;
    &lt;span style=&quot;color: #000&quot;&gt;joules_per_token&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;total_cluster_watts&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;/&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;total_throughput&lt;/span&gt;
    &lt;span style=&quot;color: #000&quot;&gt;joules_per_query&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;joules_per_token&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;tokens_per_query&lt;/span&gt;
    &lt;span style=&quot;color: #177500&quot;&gt;# Convert joules to watt-hours&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;joules_per_query&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;/&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;3600.0&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;ps5_minutes&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt;):
    &lt;span style=&quot;color: #000&quot;&gt;ps5_watts&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;PS5_KW&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;1000&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;return&lt;/span&gt; (&lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;/&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;ps5_watts&lt;/span&gt;) &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;60.0&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;MODEL_WIDTH&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;31&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;WH_WIDTH&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;8&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;PS5_WIDTH&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;8&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;header&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&amp;#39;Model&amp;#39;:&amp;lt;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;MODEL_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}} | {&amp;#39;Wh/q&amp;#39;:&amp;lt;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;WH_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}} | {&amp;#39;PS5 min&amp;#39;:&amp;lt;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;PS5_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}}&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;separator&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;MODEL_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;} | {&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;WH_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;} | {&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;PS5_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;header&lt;/span&gt;)
&lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;separator&lt;/span&gt;)

&lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;model&lt;/span&gt;, &lt;span style=&quot;color: #000&quot;&gt;vals&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;data.items&lt;/span&gt;():
    &lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;wh_per_query&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;vals&lt;/span&gt;[&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;num_b200&amp;quot;&lt;/span&gt;], &lt;span style=&quot;color: #000&quot;&gt;vals&lt;/span&gt;[&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;total_throughput&amp;quot;&lt;/span&gt;], &lt;span style=&quot;color: #000&quot;&gt;TOKENS_PER_QUERY&lt;/span&gt;)
    &lt;span style=&quot;color: #000&quot;&gt;ps5_min&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;ps5_minutes&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt;)

    &lt;span style=&quot;color: #000&quot;&gt;wh_str&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:.2f}&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;10&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;else&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;wh&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:.1f}&amp;quot;&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;model.strip&lt;/span&gt;()&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;lt;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;MODEL_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}} | {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;wh_str&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;lt;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;WH_WIDTH&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}} | {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;ps5_min&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:.2f}&amp;quot;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This gives the following figures (reordered by ascending Wh per query, with
a column added for interactivity, i.e. 1/TPOT):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Model&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;zai-org/GLM-4.7-Flash (bf16)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;34.0&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.68&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;MiniMaxAI/MiniMax-M2.5 (fp8)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;30.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.38&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;arcee-ai/Trinity-Large-Preview (bf16)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;58.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;2.85&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;Qwen/Qwen3.5-397B-A17B (bf16)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;41.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.01&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;zai-org/GLM-5-FP8 (fp8)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;23.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;7.05&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;2.12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As a point of comparison, the most efficient 8 GPU deployment of fp8 DeepSeek
R1 0528 from my figures in the &lt;a href=&quot;/2026q1/per-query-energy-consumption-of-llms&quot;&gt;previous
article&lt;/a&gt; was 3.32 Wh
per query.&lt;/p&gt;
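&lt;p&gt;To put that figure in the same units as the table, here is a small sketch
that re-derives one table row by hand and converts the 3.32 Wh DeepSeek R1
figure to PS5 minutes. It reuses the constants from the script above (2.17kW
per B200, 200W for the PS5). The function and variable names are just for
illustration, and the previous article&#x27;s figure comes from a different benchmark
setup, so the comparison is only a rough one.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Constants as used in the script above.
B200_WATTS = 2170.0
PS5_WATTS = 200.0
TOKENS_PER_QUERY = 8192 + 1024


def wh_per_query(num_b200, total_throughput):
    # Watts / (tokens per second) = joules per token.
    joules_per_token = num_b200 * B200_WATTS / total_throughput
    # Joules per query, then 3600 J per Wh.
    return joules_per_token * TOKENS_PER_QUERY / 3600.0


def ps5_minutes(wh):
    # Wh / W gives hours of PS5 gameplay; convert to minutes.
    return wh / PS5_WATTS * 60.0


# GLM-4.7-Flash row: 1 B200 at 8125 tokens/s total throughput.
glm_flash_wh = wh_per_query(1, 8125)
# DeepSeek R1 0528 figure quoted from the previous article.
deepseek_ps5_min = ps5_minutes(3.32)

print(round(glm_flash_wh, 2), round(deepseek_ps5_min, 2))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That works out to about 0.68 Wh for the GLM-4.7-Flash row, matching the
table, and roughly one minute of PS5 gameplay for the DeepSeek R1 figure.&lt;/p&gt;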
&lt;p&gt;And that&#x27;s all I really have for today. Some interesting datapoints with
hopefully more to come as Lambda puts up more model cards in this format.
There&#x27;s a range of interesting potential further experiments to do, but for
now, I just wanted to share this initial look.&lt;/p&gt;

&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-02-17: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>shandbox</title>
<published>2026-02-11T12:00:00Z</published>
<updated>2026-02-11T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/shandbox"/>
<id>https://muxup.com/shandbox</id>
<content type="html">
&lt;p&gt;&lt;a href=&quot;https://github.com/muxup/medley/blob/main/shandbox&quot;&gt;&lt;code&gt;shandbox&lt;/code&gt;&lt;/a&gt; is a simple
Linux sandboxing script that serves my needs well. Perhaps it works for you
too? No dependencies beyond a shell and util-linux (&lt;code&gt;unshare&lt;/code&gt; and &lt;code&gt;nsenter&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;In short, it aims to provide fairly good isolation for personal files (i.e.
your &lt;code&gt;$HOME&lt;/code&gt;) while being very convenient for day to day use. It&#x27;s designed to
be run as an unprivileged user - as long as you can make new namespaces you
should be good to go. By default &lt;code&gt;/home/youruser/sandbox&lt;/code&gt; shows up as
&lt;code&gt;/home/sandbox&lt;/code&gt; within the sandbox, and other than standard paths like &lt;code&gt;/usr&lt;/code&gt;,
&lt;code&gt;/etc&lt;/code&gt;, &lt;code&gt;/tmp&lt;/code&gt;, and so on it&#x27;s left for you to either copy things into the
sandbox or expose them via a mount. There&#x27;s a single shared sandbox (i.e.
processes within the sandbox can see and interact with each other, and the
exposed sandbox filesystem is shared as well), which trades away some of the
security you might get with a larger number of more targeted sandboxes in
exchange for ease of use. On the other hand, you only gain security from a sandbox if you
actually use it and this is a setup that offers very low friction for me. The
network is not namespaced (although this is something you could change with a
simple edit).&lt;/p&gt;
&lt;p&gt;Usability is both subjective and highly dependent on your actual use case, so
the tradeoffs may or may not align with what is interesting for you!
&lt;a href=&quot;https://github.com/containers/bubblewrap&quot;&gt;Bubblewrap&lt;/a&gt; is an example of a
mature alternative unprivileged sandboxing
tool that offers a lot of configurability as well as options with greater
degrees of sandboxing. Beyond that, look to
&lt;a href=&quot;https://firecracker-microvm.github.io/&quot;&gt;Firecracker&lt;/a&gt; based solutions or
&lt;a href=&quot;https://gvisor.dev/&quot;&gt;gvisor&lt;/a&gt;. &lt;code&gt;shandbox&lt;/code&gt; aims to provide as
reasonable a sandbox as Linux namespaces alone are able to offer, but if
you&#x27;re looking for a security property stronger than &quot;makes it harder for
something to edit or access unwanted files&quot; it&#x27;s down to you to both carefully
review its implementation and consider alternatives.&lt;/p&gt;
&lt;h2 id=&quot;usage-example&quot;&gt;&lt;a href=&quot;#usage-example&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Usage example&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;$ shandbox run uvx pycowsay
Installed 1 package in 5ms

  ------------
&amp;lt; Hello, world &amp;gt;
  ------------
   \   ^__^
    \  (oo)\_______
       (__)\       )\/\
           ||----w |
           ||     ||
$ shandbox status
running (pid 1589364)

log:
  2026-02-11 13:02:51 stopped
  2026-02-11 13:05:06 started (pid 1589289)
$ shandbox add-mount ~/repos/llvm-project /home/sandbox/llvm-project
mounted /home/asb/repos/llvm-project -&amp;gt; /home/sandbox/llvm-project
$ shandbox run touch /home/sandbox/llvm-project/write-attempt
touch: cannot touch &#x27;/home/sandbox/llvm-project/write-attempt&#x27;: Read-only file system
$ shandbox remove-mount /home/sandbox/llvm-project
unmounted /home/sandbox/llvm-project
$ shandbox add-mount --read-write ~/repos/llvm-project /home/sandbox/llvm-project
mounted /home/asb/repos/llvm-project -&amp;gt; /home/sandbox/llvm-project
$ shandbox run touch /home/sandbox/llvm-project/write-attempt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;shandbox enter&lt;/code&gt; will open a shell within the sandbox for easy interactive
usage. As a convenience, if the current working directory is in
&lt;code&gt;$HOME/sandbox&lt;/code&gt; (e.g. &lt;code&gt;$HOME/sandbox/foo&lt;/code&gt;) then the working directory within
the sandbox for &lt;code&gt;shandbox run&lt;/code&gt; or &lt;code&gt;shandbox enter&lt;/code&gt; will be set to the
appropriate path within the sandbox (&lt;code&gt;/home/sandbox/foo&lt;/code&gt; in this case).
This translation is only applied in that case, where the mapping is trivial.
Environment variables are not passed through.&lt;/p&gt;
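&lt;p&gt;As a rough illustration of that working directory translation, here is a
minimal Python sketch. The script itself is a shell script, so this is not its
implementation. The function name &lt;code&gt;to_sandbox_path&lt;/code&gt; and its parameters
are hypothetical, and the example paths are simply those used above.&lt;/p&gt;

```python
from pathlib import PurePosixPath


def to_sandbox_path(host_cwd, host_sandbox_home, sandbox_home="/home/sandbox"):
    """Translate a host working directory to its in-sandbox equivalent.

    Returns None when host_cwd is not inside host_sandbox_home, in which
    case no translation applies.
    """
    try:
        # relative_to compares whole path components, so a sibling such as
        # /home/asb/sandboxes does not count as being inside /home/asb/sandbox.
        relative = PurePosixPath(host_cwd).relative_to(host_sandbox_home)
    except ValueError:
        return None
    return str(PurePosixPath(sandbox_home) / relative)


print(to_sandbox_path("/home/asb/sandbox/foo", "/home/asb/sandbox"))  # /home/sandbox/foo
print(to_sandbox_path("/home/asb/repos", "/home/asb/sandbox"))        # None
```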
&lt;h2 id=&quot;functionality-overview&quot;&gt;&lt;a href=&quot;#functionality-overview&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Functionality overview&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;shandbox start&lt;/code&gt;: Start the sandbox, creating the necessary namespaces and
mount layout. Fails if the sandbox is already running.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox stop&lt;/code&gt;: Stop the sandbox by killing the process holding the
namespaces. Fails if the sandbox is not running.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox restart&lt;/code&gt;: Stop the sandbox and start it again.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox status&lt;/code&gt;: Print whether the sandbox is running and if it is, the
pid. Also print the last 20 lines of the log.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox enter&lt;/code&gt;: Open bash within the sandbox, starting the sandbox first
if it&#x27;s not already running.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox run &amp;lt;command&amp;gt; [args...]&lt;/code&gt;: Run a command inside the sandbox. The
current working directory is translated to an in-sandbox path if it falls
within the sandbox home directory. Starts the sandbox first if it isn&#x27;t
already running.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox add-mount [--read-write] &amp;lt;host-path&amp;gt; &amp;lt;sandbox-path&amp;gt;&lt;/code&gt;: Bind-mount a
host path into the running sandbox. Mounts are read-only by default; pass
&lt;code&gt;--read-write&lt;/code&gt; to allow writes. The sandbox must already be running.
Both directories and individual files are supported.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shandbox remove-mount &amp;lt;sandbox-path&amp;gt;&lt;/code&gt;: Remove a previously added bind mount
from the running sandbox.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;implementation-approach&quot;&gt;&lt;a href=&quot;#implementation-approach&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Implementation approach&lt;/h2&gt;
&lt;p&gt;The core sandboxing functionality is provided by the Linux namespaces
functionality exposed by
&lt;a href=&quot;https://manpages.debian.org/unstable/util-linux/unshare.1.en.html&quot;&gt;&lt;code&gt;unshare&lt;/code&gt;&lt;/a&gt;
and
&lt;a href=&quot;https://manpages.debian.org/unstable/util-linux/nsenter.1.en.html&quot;&gt;&lt;code&gt;nsenter&lt;/code&gt;&lt;/a&gt;.
The &lt;a href=&quot;https://github.com/muxup/medley/blob/main/shandbox&quot;&gt;script&#x27;s
implementation&lt;/a&gt; should be
quite readable but I&#x27;ll try to summarise some key points here.&lt;/p&gt;
&lt;p&gt;The goal is that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Within the sandbox, you appear as an unprivileged user, with uid and gid
equal to your usual Linux user.&lt;/li&gt;
&lt;li&gt;It should be possible to expose additional files or directories to the
sandbox once it&#x27;s running.&lt;/li&gt;
&lt;li&gt;Applications running within the sandbox have no way (modulo bugs or
vulnerabilities in the kernel or accessible applications) of reaching files
on the host filesystem that aren&#x27;t explicitly exposed.
&lt;ul&gt;
&lt;li&gt;To underline: This is a goal, it is &lt;em&gt;not&lt;/em&gt; a guarantee.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;It&#x27;s possible to launch multiple processes within the sandbox which can all
see each other, and have the same shared sandboxed filesystem.&lt;/li&gt;
&lt;li&gt;This is all doable as an unprivileged user.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To implement that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Two sets of namespaces are used to provide this isolation: the outer
&#x27;shandbox_root&#x27; has the user mapped to root within the namespace and retains
access to standard / (allowing us to mount additional paths into it after the
sandbox has started). The inner &#x27;shandbox_user&#x27; represents a new user
namespace mapping our uid/gid to an unprivileged user, but other namespaces
are shared with &#x27;shandbox_root&#x27;. Sandboxed processes are launched within the
namespaces of &#x27;shandbox_user&#x27;.&lt;/li&gt;
&lt;li&gt;The process IDs of the initial process within &#x27;sandbox_root&#x27; and
&#x27;sandbox_user&#x27; are saved and recalled so the script can use &lt;code&gt;nsenter&lt;/code&gt; to
enter the namespace.&lt;/li&gt;
&lt;li&gt;To help make it easier to tell when you&#x27;re in the sandbox, a dummy
&lt;code&gt;/etc/passwd&lt;/code&gt; is bind-mounted naming the current user as &lt;code&gt;sandbox&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;When &lt;code&gt;shandbox start&lt;/code&gt; is executed, the necessary directories are bind
mounted in a directory that will be used as root (&lt;code&gt;/&lt;/code&gt;) for the user sandbox
in &lt;code&gt;.local/share/shandbox/root&lt;/code&gt;. This happens within the sandbox_root
namespace, which then uses &lt;code&gt;unshare&lt;/code&gt; again to create a new user namespace
with an unprivileged user, executing within a chroot.&lt;/li&gt;
&lt;li&gt;&#x27;sandbox_root&#x27; retains access to the host filesystem, which is necessary to
allow mounting additional paths after the fact. Without this requirement, we
could likely rewrite &lt;code&gt;shandbox start&lt;/code&gt; to use &lt;code&gt;pivot_root&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;making-it-your-own&quot;&gt;&lt;a href=&quot;#making-it-your-own&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Making it your own&lt;/h2&gt;
&lt;p&gt;The script should be straightforward enough to customise to your needs if
they&#x27;re not too dissimilar to what is offered out of the box. Some variables
at the top provide things you may be more likely to want to change, such as
the home directory location, and a list of files or directories in &lt;code&gt;$HOME&lt;/code&gt; to
always bind-mount into the sandbox home:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;SANDBOX_HOME_DIR=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/sandbox&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;HOME_FILES_TO_MAP=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.bashrc .vimrc&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;HOME_DIRS_TO_MAP=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.vim bin&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;SB_HOME=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/sandbox&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;SB_PATH=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$SB_HOME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/bin:/usr/local/bin:/usr/bin&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-02-11: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Per-query energy consumption of LLMs</title>
<published>2026-01-07T12:00:00Z</published>
<updated>2026-01-07T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2026q1/per-query-energy-consumption-of-llms"/>
<id>https://muxup.com/2026q1/per-query-energy-consumption-of-llms</id>
<content type="html">
&lt;p&gt;How much energy is consumed when querying an LLM? We&#x27;re largely in the dark
when it comes to proprietary models, but for open weight models that anyone
can host on readily available, albeit eye-wateringly expensive, hardware this
is something that can be measured and reported, right? In fact, given other
people are &lt;a href=&quot;https://inferencemax.semianalysis.com/&quot;&gt;doing the hard work&lt;/a&gt; of
setting up and running benchmarks across all kinds of different hardware and
software configurations for common open weight models, can we just re-use that
to get a reasonable figure in terms of Watt-hours (Wh) per query?&lt;/p&gt;
&lt;p&gt;For the kind of model you can run locally on a consumer GPU then of course
there&#x27;s some value in seeing how low the per-query energy usage might be on a
large scale commercial setup. But my main interest is in larger and more
capable models, the kind that you wouldn&#x27;t realistically run locally and end
up using in a pay-per-token manner either directly with your host of choice or
through an intermediary like &lt;a href=&quot;https://openrouter.ai/&quot;&gt;OpenRouter&lt;/a&gt;. In these
cases where models are efficiently served with a minimum of 4-8 GPUs or even
&lt;a href=&quot;https://www.perplexity.ai/hub/blog/lower-latency-and-higher-throughput-with-multi-node-deepseek-deployment&quot;&gt;multi-node
clusters&lt;/a&gt;
it&#x27;s not easy to get a feel for the resources you&#x27;re using. I&#x27;m pretty happy
that simple back of the envelope maths shows that whether providers are
properly amortising the cost of their GPUs or not, it&#x27;s implausible that
they&#x27;re selling per-token API access for open models at below the cost of
electricity. That gives a kind of upper bound on energy usage, and looking at
the pennies I spend on such services it&#x27;s clearly a drop in the ocean compared
to my overall energy footprint. But it&#x27;s not a very tight bound, which means
it&#x27;s hard to assess the impact of increasing my usage.&lt;/p&gt;
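&lt;p&gt;To make that upper bound concrete, here is a minimal sketch of the
arithmetic. The spend and electricity price are made-up placeholder values
rather than figures from any provider, and the function name
&lt;code&gt;max_energy_wh&lt;/code&gt; is hypothetical. It assumes, unrealistically, that
every penny of the spend goes on electricity, which is why the result is an
upper bound rather than an estimate.&lt;/p&gt;

```python
def max_energy_wh(spend, electricity_price_per_kwh):
    """Upper bound on energy (in Wh) if all of `spend` went on electricity.

    Both arguments must be in the same currency.
    """
    if electricity_price_per_kwh == 0:
        raise ValueError("electricity price must be non-zero")
    kwh = spend / electricity_price_per_kwh
    return kwh * 1000


# Hypothetical figures: 0.50 of spend with electricity at 0.25 per kWh.
print(max_energy_wh(0.50, 0.25))  # 2000.0
```

&lt;p&gt;The real energy use will be well below this, as the price also has to cover
hardware, hosting, and margin, which is exactly why the bound is not a tight
one.&lt;/p&gt;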
&lt;p&gt;We can look at things like &lt;a href=&quot;https://arxiv.org/pdf/2508.15734&quot;&gt;Google&#x27;s published figures on energy usage for
Gemini&lt;/a&gt; but this doesn&#x27;t help much. They
don&#x27;t disclose the length of the median prompt and its response, or details of
the model used to serve that median query, meaning it&#x27;s not helpful for
either estimating how it might apply to other models or how it might apply to
your own usage (which may be far away from this mysterious median query).
Mistral &lt;a href=&quot;https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai&quot;&gt;released
data&lt;/a&gt;
on the per query environmental impact (assuming a 400 token query), but
the size of the Mistral Large 2 model is not disclosed and they don&#x27;t calculate
a Wh per query figure. CO2 and water per query are very helpful to evaluate a
particular deployment, but the actual energy used is a better starting point
that can be applied to other providers assuming different levels of carbon
intensity. If one of the API providers were to share statistics based on a
real world deployment of one of the open models with a much higher degree of
transparency (i.e. sharing stats on the number of queries served during the
period, statistics on their length, and measured system power draw) that would
be a useful source of data. But today we&#x27;re looking at what we can conclude
from the &lt;a href=&quot;https://inferencemax.semianalysis.com/&quot;&gt;InferenceMAX benchmark
suite&lt;/a&gt; published results.&lt;/p&gt;
&lt;p&gt;I&#x27;d started looking at options for getting good figures thinking I might
have to invest in the hassle and expense of renting a multi-GPU cloud
instance to run my own benchmarks, then felt InferenceMAX may make that
unnecessary. After writing this up along with all my provisos I&#x27;m perhaps
tempted again to try to generate figures myself. Anyway, read on for a more
detailed look at that benchmark suite. You can scroll past all the provisos
and &lt;a href=&quot;#results&quot;&gt;jump ahead to the figures&lt;/a&gt; giving the Wh/query
figures implied by the benchmark results across different GPUs, different
average input/output sequence lengths, and for gpt-oss 120B and
DeepSeek-R1-0528. But I hope you&#x27;ll feel a bit guilty about it.&lt;/p&gt;
&lt;p&gt;If you see any errors, please let me know.&lt;/p&gt;
&lt;h2 id=&quot;high-level-notes-on-inferencemax&quot;&gt;&lt;a href=&quot;#high-level-notes-on-inferencemax&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;High-level notes on InferenceMAX&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://inferencemax.semianalysis.com/&quot;&gt;InferenceMAX benchmark suite&lt;/a&gt; has the
&lt;a href=&quot;https://newsletter.semianalysis.com/p/inferencemax-open-source-inference&quot;&gt;stated
goal&lt;/a&gt;
to &quot;provide benchmarks that both emulate real world applications as much as
possible and reflect the continuous pace of software innovation.&quot; They
differentiate themselves from other benchmarking efforts noting &quot;Existing
performance benchmarks quickly become obsolete because they are static, and
participants often game the benchmarks with unrealistic, highly specific
configurations.&quot;&lt;/p&gt;
&lt;p&gt;The question I&#x27;m trying to answer is &quot;what is the most &#x27;useful AI&#x27; I can
expect for a modern GPU cluster in a realistic deployment and how much energy
does it consume&quot;. Any benchmark is going to show peak throughput higher than
you&#x27;d expect to achieve in a real workload and there&#x27;s naturally a desire to
keep it pinned on a specific model for as long as it isn&#x27;t &lt;em&gt;totally&lt;/em&gt;
irrelevant in order to enable comparisons as hardware and software evolves
with a common point of reference. But although I might make slightly
different choices about what gets benchmarked and how, the InferenceMAX setup
at first look seems broadly aligned with what I want to achieve.&lt;/p&gt;
&lt;p&gt;They benchmark
&lt;a href=&quot;https://huggingface.co/deepseek-ai/DeepSeek-R1-0528&quot;&gt;DeepSeek-R1-0528&lt;/a&gt; (both
at the native fp8 quantisation and at fp4) which is a 671B parameter model
with 37B active weights released ~7 months ago and seems a fair representative
of a large MoE open weight model.
&lt;a href=&quot;https://huggingface.co/openai/gpt-oss-120b&quot;&gt;gpt-oss-120b&lt;/a&gt; is also
benchmarked, providing a point of comparison for a much smaller and more
efficient to run model. Different input sequence length and output sequence
length (ISL and OSL - the number of input and output tokens) combinations are
tested: 1k/1k, 1k/8k, 8k/1k, which provides coverage of different query types.
The suite also tests against a wide range of GPUs (including the 72-GPU GB200
NVL72 cluster) and sweeps different settings.&lt;/p&gt;
&lt;p&gt;At the time of writing, what you might reasonably consider to be
&#x27;InferenceMAX&#x27; is split into around three pieces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The frontend website you can &lt;a href=&quot;https://inferencemax.semianalysis.com/&quot;&gt;see at
inferencemax.semianalysis.com&lt;/a&gt; (not
currently open source but &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/315&quot;&gt;planned to
be&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The &lt;a href=&quot;https://github.com/kimbochen/bench_serving&quot;&gt;script for executing queries against the LLM serving infrastructure and
collecting stats&lt;/a&gt; (currently in
a separate repo but &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/338&quot;&gt;planned to be incorporated into the main InferenceMAX
repository&lt;/a&gt;),&lt;/li&gt;
&lt;li&gt;The wrapper/runner scripts and GitHub actions workflows that live in the
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX&quot;&gt;main InferenceMAX
repository&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;This is actively contributed to by at least Nvidia and AMD engineers.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;GitHub Actions is used to orchestrate the runs, ultimately producing a zip
file containing JSON with the statistics of each configuration (e.g.
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/actions/runs/20216709902/job/58149531774&quot;&gt;here&lt;/a&gt;).
The &lt;code&gt;benchmark_serving.py&lt;/code&gt; script is invoked via the &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/benchmark_lib.sh#L107&quot;&gt;&lt;code&gt;run_benchmark_serving&lt;/code&gt; wrapper
in
&lt;code&gt;benchmark_lib.sh&lt;/code&gt;&lt;/a&gt;
which hardcodes some options and passes through some others from the workflow
YAML. The results logged by &lt;code&gt;benchmark_serving.py&lt;/code&gt; are &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/process_result.py&quot;&gt;processed in
InferenceMAX&#x27;s &lt;code&gt;process_result.py&lt;/code&gt;
helper&lt;/a&gt;
which will produce JSON in the desired output format. Together, these scripts
provide statistics like throughput (input and output token), end to end
latency, interactivity (output tokens per second) etc.&lt;/p&gt;
&lt;h2 id=&quot;further-studying-the-benchmark-setup&quot;&gt;&lt;a href=&quot;#further-studying-the-benchmark-setup&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Further studying the benchmark setup&lt;/h2&gt;
&lt;p&gt;So, let&#x27;s look at the benchmarking logic in more detail to look for any
surprises or things that might affect the accuracy of the Wh-per-query figure
I want to generate. I&#x27;ll note that InferenceMAX is an ongoing project that is
actively being developed. These observations are based on a recent repo
checkout, but of course things may have changed since then if you&#x27;re reading
this post some time after it was first published.&lt;/p&gt;
&lt;p&gt;Looking through I made the following observations. Some represent potential
issues (see the next subheading for a list of the upstream issues I filed),
while others are just notes based on aspects of the benchmark I wanted to
better understand.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One of the required arguments to the benchmark serving script is
&lt;code&gt;--random-range-ratio&lt;/code&gt;. This is set by default &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-tmpl.yml#L56&quot;&gt;to 0.8 in
&lt;code&gt;benchmark-tmpl.yml&lt;/code&gt;&lt;/a&gt;
and &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-multinode-tmpl.yml#L49&quot;&gt;in
&lt;code&gt;benchmark-multinode-tmpl.yml&lt;/code&gt;&lt;/a&gt;
and is not overridden elsewhere.
&lt;ul&gt;
&lt;li&gt;This argument is ultimately used in
&lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L366&quot;&gt;&lt;code&gt;sample_random_requests&lt;/code&gt;&lt;/a&gt;.
It uses &lt;code&gt;np.random.randint&lt;/code&gt; to sample input/output lengths between the
&lt;code&gt;range_ratio * {input,output}_len&lt;/code&gt; and &lt;code&gt;{input,output}_len&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Taken together, this logic means that for a workload advertised as having
8k input or output tokens (8192), the benchmark will actually run with an
average ~7373 (&lt;code&gt;0.9*num_tokens&lt;/code&gt;, due to the length being a random number
between &lt;code&gt;0.8*num_tokens&lt;/code&gt; and &lt;code&gt;num_tokens&lt;/code&gt;) tokens.&lt;/li&gt;
&lt;li&gt;Because the throughput figures are &lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L498&quot;&gt;calculated using the actual input and
output token
lengths&lt;/a&gt;,
the figure &lt;em&gt;does&lt;/em&gt; represent what was observed, it&#x27;s just the workload
doesn&#x27;t quite match the description. The reported end to end latency for
instance will be misleadingly lower than you would get for a workload that
actually did have the expected input / output sequence lengths.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The various request functions in
&lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/backend_request_func.py&quot;&gt;backend_request_func.py&lt;/a&gt;
will set &lt;code&gt;output.success = False&lt;/code&gt; if they don&#x27;t get an HTTP 200 status code
back for a request. There is no logic to retry a refused request and
&lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L485&quot;&gt;metrics will be calculated skipping any failed
requests&lt;/a&gt;.
This means an overloaded server will perform better on this benchmark for
metrics like E2E latency and TTFT if it refuses requests rather than accept
them and serve them slowly. As the number of failed requests isn&#x27;t included
in the results json it&#x27;s not easy to tell if this is a factor for any
benchmarks.&lt;/li&gt;
&lt;li&gt;Many of the various scripts in the benchmarks/ subdirectory &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_b200_docker.sh#L22&quot;&gt;set a
max-model-len
parameter&lt;/a&gt;
or the similar &lt;code&gt;--max_seq_len&lt;/code&gt; parameter for trt-llm (e.g. &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_b200_trt_docker.sh#L65&quot;&gt;the b200
config&lt;/a&gt;)
which if I&#x27;m not mistaken will ultimately be set from the max_model_len
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/matrix_logic/generate_sweep_configs.py&quot;&gt;defined in
generate_sweep_configs.py&lt;/a&gt;.
This parameter is &lt;a href=&quot;https://docs.vllm.ai/en/latest/cli/serve/#-max-model-len&quot;&gt;documented in
vllm&lt;/a&gt; and &lt;a href=&quot;https://nvidia.github.io/TensorRT-LLM/1.0.0rc2/commands/trtllm-serve.html#cmdoption-trtllm-serve-serve-max_seq_len&quot;&gt;in
TensorRT-LLM&lt;/a&gt;
and controls the maximum supported length of a request, including both the
prompt and any generated output. Setting it 20 or 200 tokens above the sum
of the benchmarked ISL+OSL to minimise memory use does not seem like a
realistic real-world deployment, which seems the wrong choice given the
InferenceMAX complaint that in other suites &quot;participants often
game the benchmarks with unrealistic, highly specific configurations&quot;.
Benchmarks naturally show a &#x27;best case&#x27;, but if you&#x27;re generating figures
like $ per M tokens it&#x27;s a figure that makes little sense if it reflects a
configuration you wouldn&#x27;t feasibly use/sell.&lt;/li&gt;
&lt;li&gt;Throughput is &lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L546&quot;&gt;calculated in
&lt;code&gt;benchmark_serving.py&lt;/code&gt;&lt;/a&gt;
based on the total number of tokens divided by the duration of the
benchmark. This is then normalised on a per-GPU basis &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/process_result.py#L90&quot;&gt;in
process_result.py&lt;/a&gt;.
No problems here, I just wanted to clarify the source of the figure.&lt;/li&gt;
&lt;li&gt;In terms of the source of the input tokens themselves, we can see that
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/benchmark_lib.sh#L222&quot;&gt;&lt;code&gt;--dataset-name random&lt;/code&gt; is always passed to
&lt;code&gt;benchmark_serving.py&lt;/code&gt;&lt;/a&gt;.
This leads to
&lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L366&quot;&gt;&lt;code&gt;sample_random_requests&lt;/code&gt;&lt;/a&gt;
being called, which will pick random token ids and create a list of tokens
of the desired length (mapping these randomly picked IDs to tokens).
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;--ignore-eos&lt;/code&gt; flag is passed to the &lt;code&gt;benchmark_serving.py&lt;/code&gt; script
which will in turn set this option in the JSON when making the LLM request.
&lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/backend_request_func.py&quot;&gt;&lt;code&gt;backend_request_func.py&lt;/code&gt;&lt;/a&gt;
sets this and also sets &lt;code&gt;max_tokens&lt;/code&gt; to the desired &lt;code&gt;output_len&lt;/code&gt; which
&lt;em&gt;should&lt;/em&gt; ensure that the response has that exact desired number of output
tokens. &lt;code&gt;ignore_eos&lt;/code&gt; means that the LLM server will keep generating tokens
even after seeing the end of sequence token.&lt;/li&gt;
&lt;li&gt;It&#x27;s interesting that some of the benchmark configurations enable
multi-token prediction, and presumably find it beneficial even given the
totally random token inputs. Is it possible that such configurations
benefit from undesirable looped outputs (due to a combination of random
inputs and continuing to sample tokens past the EOS marker) that
potentially are very predictable and give an extra boost?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;--num-prompts&lt;/code&gt; parameter controls the total number of requests that are
issued. The benchmark script is written so it will wait for all of these to
complete (either successfully or unsuccessfully). This is
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_h100_slurm.sh#L51&quot;&gt;typically&lt;/a&gt;
set to the concurrency times 10, but some benchmark setups set it higher
(presumably as the default figure finishes too quickly for good results).&lt;/li&gt;
&lt;li&gt;In terms of how requests are submitted with a certain level of concurrency:
&lt;ul&gt;
&lt;li&gt;See above for a discussion of the total number of requests&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--request-rate inf&lt;/code&gt; is always passed, so there&#x27;s no limit on submitting
requests up to the concurrency limit.&lt;/li&gt;
&lt;li&gt;It &lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L962&quot;&gt;precomputes a list of requests to
submit&lt;/a&gt;
and then &lt;a href=&quot;https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L664&quot;&gt;uses a semaphore to limit
concurrency&lt;/a&gt;
but otherwise continuously submits requests up to the concurrency limit,
and then waits until they all complete.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;There are no tests that the configuration is serving the model with the
expected quality currently, but there&#x27;s an &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/123&quot;&gt;issue tracking at least adding a
simple quality
benchmark&lt;/a&gt;.
Although none of the explored settings &lt;em&gt;should&lt;/em&gt; impact the quality of output,
it&#x27;s always possible they trigger a bug, in which case the configuration
isn&#x27;t interesting to benchmark.&lt;/li&gt;
&lt;li&gt;It would be helpful for reproducibility if more complete system information
for the benchmark runners was released. This is &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/393&quot;&gt;being worked
on&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You should of course consider whether the tested input and output sequence
lengths correspond to a workload you are interested in (thank you to Aaron
Zhao for &lt;a href=&quot;https://www.linkedin.com/feed/update/urn:li:activity:7414767337058242562?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7414767337058242562%2C7415321431900905472%29&amp;amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287415321431900905472%2Curn%3Ali%3Aactivity%3A7414767337058242562%29&quot;&gt;reminding me to mention
this&lt;/a&gt;).
This benchmarking approach also doesn&#x27;t consider caching. Both factors could
be highly relevant if trying to estimate energy cost for a long context chat
or &#x27;agentic&#x27; flow. But I&#x27;m happy enough with the tested workloads as a
starting point, and my main focus here is trying to get a degree of comfort
with the reported numbers for the ISL/OSL combinations they&#x27;ve chosen to
test.&lt;/li&gt;
&lt;/ul&gt;
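&lt;p&gt;The effect of &lt;code&gt;--random-range-ratio&lt;/code&gt; noted in the first item above
can be sketched as follows. This is a stand-alone approximation rather than
the benchmark&#x27;s own code: it uses Python&#x27;s &lt;code&gt;random&lt;/code&gt; module in place
of &lt;code&gt;np.random.randint&lt;/code&gt;, assumes both ends of the range are included,
and the helper names &lt;code&gt;sample_lengths&lt;/code&gt; and
&lt;code&gt;expected_length&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import random


def sample_lengths(nominal_len, range_ratio, num_requests, seed=0):
    """Draw request lengths uniformly between range_ratio * nominal_len and nominal_len."""
    rng = random.Random(seed)
    low = int(nominal_len * range_ratio)
    # random.randint is inclusive at both ends. It stands in here for the
    # NumPy-based sampling used by the real benchmark script.
    return [rng.randint(low, nominal_len) for _ in range(num_requests)]


def expected_length(nominal_len, range_ratio):
    """Mean of a uniform integer distribution over [low, nominal_len]."""
    low = int(nominal_len * range_ratio)
    return (low + nominal_len) / 2


lengths = sample_lengths(8192, 0.8, 100_000)
print(sum(lengths) / len(lengths))  # empirical mean, close to the value below
print(expected_length(8192, 0.8))   # 7372.5
```

&lt;p&gt;So a workload labelled as 8k tokens averages roughly 10% fewer tokens than
the label suggests, which matters when converting throughput figures into a
per-query number.&lt;/p&gt;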
&lt;h2 id=&quot;filed-issues&quot;&gt;&lt;a href=&quot;#filed-issues&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Filed issues&lt;/h2&gt;
&lt;p&gt;I ended up filing the following issues upstream:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FIXED &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/293&quot;&gt;Token throughput per MW is described as reflecting the generated tokens but
is actually processed+generated
tokens&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;The companion &lt;a href=&quot;https://newsletter.semianalysis.com/p/inferencemax-open-source-inference&quot;&gt;article introducing
InferenceMAX&lt;/a&gt;
previously defined throughput as the rate at which the GPU
&lt;strong&gt;generates&lt;/strong&gt; tokens yet the figure displayed in the UI was the total
number of output &lt;em&gt;and&lt;/em&gt; input tokens per second. The definition in the
article has now been fixed, and changes to the UI make it more obvious
based on context that throughput refers to input+output tokens (as y-axis
metric options now exist to show &quot;input token throughput per GPU&quot; and
&quot;output token throughput per GPU&quot;).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=yYha_OtxA14&quot;&gt;This talking head video from
Nvidia&lt;/a&gt; seems to make the
same error, talking about the number of tokens &#x27;generated&#x27; per second per
GPU when, looking at the relevant results, these seem to be the total throughput
(i.e. output &lt;strong&gt;plus&lt;/strong&gt; the much faster to process input tokens).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/299&quot;&gt;Presented input/output token throughput per GPU for disaggregated setups
not usefully comparable to standard multi-gpu
setups&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;In disaggregated setups you have some number of GPUs dedicated to prefill
(processing input tokens) and some number dedicated to decode (generating
output tokens). In this case, the reported input/output throughput figures
refer to the input or output throughput per prefill GPU or per decode GPU.
It doesn&#x27;t make sense (IMHO) to plot this figure against the input/output
throughput figures for a non-disaggregated setup. To make it comparable,
the input/output throughput per GPU should be calculated by averaging
across the whole cluster rather than just the GPUs dedicated to prefill or
decode respectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/300&quot;&gt;Standard deviatation of interactivity (std_intvty) in result json is
incorrectly
calculated&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;Not a big issue as the figure isn&#x27;t used anywhere. Interactivity
(tokens/second) metrics are calculated from the recorded time per output
token. &lt;code&gt;1000/$tpot_metric&lt;/code&gt; is correct for the mean, median, and p99 figures
but mathematically incorrect for the standard deviation. e.g. a small
standard deviation for time per output token will result in a huge
standard deviation being computed for interactivity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;FIXED &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/349&quot;&gt;Reference kW figures no longer shown in frontend for each
GPU&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;At some point updates to the frontend logic meant that the per-GPU kW
figures used in calculating the token throughput per utility MW were no
longer displayed. This has now been fixed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/350&quot;&gt;How will full workflow run output be retained beyond 90
days&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;The benchmark frontend helpfully links to the GitHub Actions run that
generated the displayed results and has a datepicker to view previous
results. Clicking through to GitHub means you can download the original
.zip of the JSON format benchmark results which is something I take
advantage of in the analysis later in this article. According to GitHub
docs, &lt;a href=&quot;https://docs.github.com/en/organizations/managing-organization-settings/configuring-the-retention-period-for-github-actions-artifacts-and-logs-in-your-organization&quot;&gt;the maximum retention period for Actions artifacts and logs is 90
days for a public
repo&lt;/a&gt;.
It would be good to have a mechanism so that this information is backed up
rather than lost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/365&quot;&gt;Contents of CONFIG_DIR path as used in launch_gb200-nv.sh is
undisclosed&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;Most benchmark configuration lives in the main repository, but
unfortunately one of the Nvidia DeepSeek R1 configurations &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/blob/ff7dfc7365034aa84245f41c517c38618860d484/runners/launch_gb200-nv.sh#L26&quot;&gt;relies on
a config dir that&#x27;s not publicly
available&lt;/a&gt;
meaning it can&#x27;t be audited or reproduced. This is a case where tightening
up benchmark rules and review process can hopefully avoid it happening in
the future.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/359&quot;&gt;Reconsider allowing setting max_model_len / max_seq_len to
isl+osl+tiny_margin&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;As explained above, a number of benchmarks set &lt;code&gt;max_model_len&lt;/code&gt; (or for
Nvidia&#x27;s TensorRT, &lt;code&gt;--max_seq_len&lt;/code&gt;) to some figure that is just above
ISL+OSL. Although some degree of tuning is expected, to me this goes
against the idea that &quot;&lt;a href=&quot;https://newsletter.semianalysis.com/p/inferencemax-open-source-inference&quot;&gt;We want server configs to reflect real world
deployments as much as
possible&lt;/a&gt;&quot;
and the stated goal &quot;to provide benchmarks that both emulate real world
applications as much as possible and reflect the continuous pace of
software innovation&quot;. It&#x27;s hard to imagine a realistic deployment that
would configure their serving engine in a way such that it errors if
input+output tokens passes ~2k tokens for instance. Looking at the
&lt;a href=&quot;https://openrouter.ai/deepseek/deepseek-r1-0528&quot;&gt;DeepSeek R1 0528 providers on
OpenRouter&lt;/a&gt;, the vast
majority offer greater than 128k context.&lt;/li&gt;
&lt;li&gt;By my understanding, with PagedAttention the KV cache is dynamically
allocated anyway so this setting would largely impact other data
structures. Plus vllm at least contains a startup check that there is
sufficient VRAM to serve at least one request at the maximum configured
context. I would really like to see what impact this setting has on
benchmarks.&lt;/li&gt;
&lt;li&gt;The repository maintainers renamed my issue to a title that doesn&#x27;t
reflect my report. I&#x27;m hopeful they will review my recent comment and
title it back.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/357&quot;&gt;Some reported metrics will be inflated if a serving engine sheds
load&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;This covers the observation made above that failed requests are simply
skipped. As the number of failed requests isn&#x27;t tracked, it&#x27;s not easy to
see if a particular configuration may appear better (better E2E latency,
lower time to first token) as a result of shedding load rather than
queueing.&lt;/li&gt;
&lt;li&gt;The repository maintainers renamed this issue to &quot;[feature suggestion for
vllm/vllm benchmark_serving]&quot; and closed it. I&#x27;m hopeful they will read my
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/357#issuecomment-3680821210&quot;&gt;response&lt;/a&gt;
and reconsider on the grounds that:
&lt;ul&gt;
&lt;li&gt;The benchmark_serving script isn&#x27;t doing anything &quot;wrong&quot; necessarily.
It is simply making an implementation choice with potential impact on
results that the InferenceMAX harness isn&#x27;t tracking.&lt;/li&gt;
&lt;li&gt;The script is planned to be added to the repo soon anyway.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/issues/356&quot;&gt;Benchmarked ISL and OSL averages 0.9*target_length meaning results are
over-optimistic&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;This is the problem mentioned above where the introduced variance in
input/output sequence length has an average lower than the headline rate.
As noted, this means specifically the end to end latency figure is
misleading, but also impacts tokens/second and throughput to the extent
that the cost of serving a query doesn&#x27;t scale with O(n).&lt;/li&gt;
&lt;li&gt;This will be fixed by &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/pull/339&quot;&gt;PR
339&lt;/a&gt; which
upstreams the &lt;code&gt;benchmark_serving.py&lt;/code&gt; script and in that modified branch
changes &lt;code&gt;sample_random_requests&lt;/code&gt; to sample a range with multiplier between
&lt;code&gt;1 - RANGE_RATIO&lt;/code&gt; and &lt;code&gt;1 + RANGE_RATIO&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
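&lt;p&gt;To make the &lt;code&gt;std_intvty&lt;/code&gt; point from the list above concrete, here is a
minimal sketch using made-up time per output token samples (they are not taken
from any benchmark run). It compares applying &lt;code&gt;1000/x&lt;/code&gt; to the standard
deviation with computing the standard deviation of the per-sample interactivity,
and also shows the usual first-order approximation for the spread of a
reciprocal.&lt;/p&gt;

```python
import statistics

# Hypothetical time-per-output-token samples in milliseconds (illustrative only).
tpot_ms = [24.0, 25.0, 26.0, 25.5, 24.5]

# Interactivity (tokens/second) for each individual sample.
interactivity = [1000.0 / t for t in tpot_ms]

# What you get if 1000/x is applied directly to the standard deviation.
naive_std = 1000.0 / statistics.stdev(tpot_ms)

# Standard deviation computed on the interactivity values themselves.
actual_std = statistics.stdev(interactivity)

# First-order (delta method) approximation: std(1000/x) ~= 1000 * std(x) / mean(x)^2.
approx_std = 1000.0 * statistics.stdev(tpot_ms) / statistics.mean(tpot_ms) ** 2

print(f"naive: {naive_std:.1f}  actual: {actual_std:.2f}  approx: {approx_std:.2f}")
```

&lt;p&gt;With these samples the naive figure is in the thousands of tokens/second
while the real spread is only a little over one token/second, which is the
&quot;small standard deviation becomes a huge one&quot; effect described above.&lt;/p&gt;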
&lt;p&gt;In the best case, you&#x27;d hope to look at the benchmark results, accept they
probably represent a higher degree of efficiency than you&#x27;d likely get on a
real workload, assume that an API provider might achieve 50% of that, and double
the effective cost per query to give a very rough upper estimate on per-query
cost. But that only really works if the reported benchmark results roughly match
the achievable throughput in a setup configured for commercial serving. Given the
tuning to specific isl/osl values, I&#x27;m not at all confident that&#x27;s the case and
I don&#x27;t know how wide the gap is.&lt;/p&gt;
&lt;h2 id=&quot;generating-results&quot;&gt;&lt;a href=&quot;#generating-results&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Generating results&lt;/h2&gt;
&lt;p&gt;Firstly I wrote a &lt;a href=&quot;https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38#file-check-py&quot;&gt;quick
script&lt;/a&gt;
to check some assumptions about the data and look for anything that seems
anomalous. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check that total throughput per GPU matches what you&#x27;d expect based on the
input token and output token throughput per GPU, even in the disaggregated
case. i.e. the total throughput per GPU averaged over the whole cluster
should equal the sum of the input and output throughput per GPU provided
those figures are averaged over the whole cluster.&lt;/li&gt;
&lt;li&gt;The ratio of input token throughput to output token throughput should be
almost equal to the ratio of input to output tokens in the
benchmark&#x27;s workload. If not, there is something surprising that needs
investigating.&lt;/li&gt;
&lt;/ul&gt;
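&lt;p&gt;The two checks above can be sketched roughly as follows. This is not the
linked script: the dictionary keys (&lt;code&gt;total_tput_per_gpu&lt;/code&gt;,
&lt;code&gt;input_tput_per_gpu&lt;/code&gt;, &lt;code&gt;output_tput_per_gpu&lt;/code&gt;) and the 5% tolerance are
hypothetical names and values chosen for illustration, and the figures are
assumed to already be averaged over the whole cluster.&lt;/p&gt;

```python
import math


def check_result(result, isl, osl, rel_tol=0.05):
    """Return a list of warnings for one benchmark result (empty if it looks sane)."""
    warnings = []
    total = result["total_tput_per_gpu"]
    inp = result["input_tput_per_gpu"]
    out = result["output_tput_per_gpu"]

    # Check 1: total throughput per GPU should equal input + output throughput.
    if not math.isclose(total, inp + out, rel_tol=rel_tol):
        warnings.append("total throughput does not match input + output")

    # Check 2: the input:output throughput ratio should track the isl:osl ratio.
    if not math.isclose(inp / out, isl / osl, rel_tol=rel_tol):
        warnings.append("input:output throughput ratio does not match isl:osl")

    return warnings


# Hypothetical figures: a consistent result, and one where the input/output
# numbers were averaged over only part of the cluster.
consistent = {"total_tput_per_gpu": 9000.0,
              "input_tput_per_gpu": 8000.0,
              "output_tput_per_gpu": 1000.0}
mismatched = {"total_tput_per_gpu": 9000.0,
              "input_tput_per_gpu": 16000.0,
              "output_tput_per_gpu": 2000.0}

print(check_result(consistent, isl=8192, osl=1024))
print(check_result(mismatched, isl=8192, osl=1024))
```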
&lt;p&gt;Based on the information available in the generated result JSON and the
reported all-in power per GPU (based on SemiAnalysis&#x27; model), we can calculate
the Watt hours per query. First calculate the joules per token (watts per GPU
divided by the total throughput per GPU). This gives a weighted average of the
joules per token for the measured workload (i.e. reflecting the ratio of
isl:osl). Multiplying joules per token by the tokens per query (isl+osl) gives
the joules per query, and we can just divide by 3600 to get Wh.&lt;/p&gt;
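&lt;p&gt;As a minimal sketch of that arithmetic (the function and parameter names are
my own for illustration, and the example inputs are made up rather than taken
from the results):&lt;/p&gt;

```python
def wh_per_query(watts_per_gpu, total_tput_per_gpu, isl, osl):
    """Estimate watt hours per query from all-in power and total throughput per GPU."""
    # Watts are joules per second, so dividing by tokens per second gives joules per token.
    joules_per_token = watts_per_gpu / total_tput_per_gpu
    # Scale up to a whole query of isl input tokens plus osl output tokens.
    joules_per_query = joules_per_token * (isl + osl)
    # 1 Wh = 3600 J.
    return joules_per_query / 3600.0


# Hypothetical example: 2000 W all-in per GPU and 5000 tokens/second per GPU
# on an 8192/1024 workload.
example_wh = wh_per_query(2000.0, 5000.0, isl=8192, osl=1024)
print(f"{example_wh:.2f} Wh/query")
```

&lt;p&gt;For these invented inputs the result is roughly 1.02 Wh per query.&lt;/p&gt;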
&lt;p&gt;There is some imprecision because we&#x27;re constructing the figure for e.g.
8192/1024 ISL based on measurements with an average &lt;code&gt;0.9*8192&lt;/code&gt; input and
&lt;code&gt;0.9*1024&lt;/code&gt; output length. The whole calculation would be much simpler if the
benchmark harness recorded the number of queries executed and in what time,
meaning we can directly calculate the Wh/query from the Wh for the system over
the benchmark duration divided by the number of queries served (and
remembering that in the current setup each query is on average 90% of the
advertised sequence length).&lt;/p&gt;
&lt;p&gt;This logic is wrapped up in a &lt;a href=&quot;https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38#file-process_results-py&quot;&gt;simple
script&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There&#x27;s been a recent change to &lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/pull/381&quot;&gt;remove the &#x27;full sweep&#x27;
workflows&lt;/a&gt; in favour of
only triggering a subset of runs when there is a relevant change. But I
grabbed my results from before this happened, from a December 15th 2025 run.
However when finalising this article I spotted Nvidia managed to land some new
NVL72 DeepSeek R1 0528 configurations just before Christmas, so I&#x27;ve merged in
those results as well, using a run from December 19th. All data and scripts are
collected together &lt;a href=&quot;https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38&quot;&gt;in this
Gist&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;results&quot;&gt;&lt;a href=&quot;#results&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;As well as giving the calculated Wh per query, the script also gives a
comparison point of minutes of PS5 gameplay (&lt;a href=&quot;https://www.playstation.com/en-gb/legal/ecodesign/&quot;&gt;according to
Sony&lt;/a&gt;, &quot;Active Power
Consumption&quot; ranges from ~217W to ~197W depending on model - we&#x27;ll just use
200W). The idea here is to provide some kind of reference point for what a
given Wh figure means in real-world terms, rather than focusing solely on the
relative differences between different deployments. Comparisons to &quot;minutes of
internet streaming&quot; seem popular at the moment, presumably because it&#x27;s an
activity basically everyone does. I&#x27;m steering away from that because I&#x27;d
be comparing one value that&#x27;s hard to estimate accurately and has many
provisos to another figure that&#x27;s hard to estimate accurately and has many
provisos, which just injects more error and uncertainty into this effort to
better measure/understand/contextualise energy used for LLM inference.&lt;/p&gt;
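&lt;p&gt;The conversion to PS5 minutes is simple enough to sketch directly, assuming
the 200W figure chosen above (the function name is just for illustration):&lt;/p&gt;

```python
PS5_WATTS = 200.0  # rounded "Active Power Consumption" figure used in this article


def ps5_minutes(wh):
    """Minutes of PS5 gameplay that would consume the given number of watt hours."""
    hours = wh / PS5_WATTS
    return hours * 60.0


# The 0.96 and 3.74 Wh/query figures discussed in the results below.
for wh in (0.96, 3.74):
    print(f"{wh} Wh is about {ps5_minutes(wh):.2f} minutes of PS5 gameplay")
```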
&lt;p&gt;I&#x27;m now going to cherry-pick some results for discussion. Firstly for DeepSeek
R1 0528 with 8k/1k ISL/OSL, we see that the reported configurations that give
a usable level of interactivity at fp8 report between 0.96-3.74 Wh/query
(equivalent to 0.29-1.12 minutes of PS5 gaming). The top row, which is
substantially more efficient, is the newer
&lt;a href=&quot;https://github.com/SemiAnalysisAI/InferenceX/commit/c040b5cf23ced2c7e23d1da03e1abae89e6426aa&quot;&gt;GB200 NVL72 configuration added at the end of
last
year&lt;/a&gt;.
It&#x27;s not totally easy to trace the configuration changes given they&#x27;re
accompanied by a reworking of the associated scripts, but as far as I can see
the configuration ultimately used is &lt;a href=&quot;https://github.com/ai-dynamo/dynamo/blob/b7107d008/examples/backends/sglang/slurm_jobs/scripts/gb200-fp8/disagg/8k1k-max-tpt.sh&quot;&gt;this file from the dynamo
repository&lt;/a&gt;.
Looking at the JSON the big gain comes from significantly higher prefill
throughput (with output throughput per GPU remaining roughly the same). This
indicates the older results (the second row) were bottlenecked waiting for
prefill to complete.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;39.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;36.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;gb200 dynamo-sglang (72 GPUs disagg, conc: 2048, pfill_dp_attn, dec_dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.96&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;31.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;55.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;gb200 dynamo-sglang (72 GPUs disagg, conc: 1024, pfill_dp_attn, dec_dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;3.13&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;20.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;48.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;h200 trt (8 GPUs, conc: 64, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;3.32&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;19.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;49.6&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;h200 sglang (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;3.39&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;23.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;39.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;3.39&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;22.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;44.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 sglang (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;3.74&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Now taking a look at the results for an fp4 quantisation of the same workload,
the result is significantly cheaper to serve with similar or better
interactivity and the NVL72 setup Nvidia submitted does have a significant
advantage over the 4/8 GPU clusters. This time we see 0.63-1.67 Wh/query
(equivalent to 0.19-0.50 minutes of PS5 power draw while gaming). Serving at a
lower quantisation impacts the quality of results of course, but the improved
efficiency, including on smaller 4 GPU setups, helps demonstrate why models like
&lt;a href=&quot;https://huggingface.co/moonshotai/Kimi-K2-Thinking&quot;&gt;Kimi K2 thinking&lt;/a&gt; are
distributed as &quot;native int4&quot;, with benchmark results reported at this
quantisation and quantisation aware training used to maintain quality of
result.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;41.6&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;24.6&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;gb200 dynamo-trt (40 GPUs disagg, conc: 1075, pfill_dp_attn, dec_dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.63&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;22.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;43.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (4 GPUs, conc: 128, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.93&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;18.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;59.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 sglang (4 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.25&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;30.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;39.4&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 sglang (4 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.67&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Looking now at the 1k/8k workload (i.e. generating significant output), the
cost is 15.0-16.3 Wh/query (equivalent to 4.49-4.89 minutes of PS5 power draw
while gaming). As expected, this is significantly higher than the 8k/1k
workload as prefill (processing input tokens) is much cheaper per token than
decode (generating output tokens).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;42.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;176.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 sglang (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;15.0&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;31.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;232.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;h200 sglang (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;15.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.76&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;31.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;237.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;h200 trt (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;16.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.88&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp8 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;39.1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;189.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (8 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;16.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.89&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Again, fp4 has a significant improvement in efficiency:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;29.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;251.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (4 GPUs, conc: 256, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;2.73&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;37.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;197.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (8 GPUs, conc: 256, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.31&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;34.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;221.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 sglang (4 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.75&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 DS R1 0528 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;33.1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;223.1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (4 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;4.79&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;1.44&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As you&#x27;d expect for a much smaller model at native fp4 quantisation,
GPT-OSS-120B is much cheaper to serve. e.g. for 8k/1k:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;45.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;20.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (1 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.11&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;93.1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;10.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (2 GPUs, conc: 128, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.11&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;44.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;21.4&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 vllm (1 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.11&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;145.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;6.7&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (2 GPUs, conc: 64, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.14&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 8k/1k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;103.8&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;9.2&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 vllm (2 GPUs, conc: 64)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.20&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.06&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Or for 1k/8k:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Workload&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Intvty (tok/s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;E2EL (s)&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Details&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Wh/Q&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;PS5 min&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;80.5&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;91.6&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (1 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.49&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;72.3&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;102.0&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200 vllm (1 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.55&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;144.9&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;51.1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (2 GPUs, conc: 128, dp_attn)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.55&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;fp4 GPT-OSS 120B 1k/8k&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;129.4&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;57.0&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;b200-trt trt (2 GPUs, conc: 128)&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.61&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;0.18&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;#conclusion&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Well, this took rather a lot more work than I thought it would and I&#x27;m
not yet fully satisfied with the result. Partly we have to accept a degree of
fuzziness about marginal energy usage of an individual query - it&#x27;s going to
depend on the overall workload of the system so there&#x27;s going to be some
approximation when you try to cost a single query.&lt;/p&gt;
&lt;p&gt;I&#x27;m glad that InferenceMAX exists and am especially glad that it&#x27;s open and
publicly developed, which is what has allowed me to dive into its
implementation to the extent I have and flag concerns/issues. I feel it&#x27;s not
yet fully living up to its aim of providing results that reflect real world
application, but I hope that will improve with further maturation and better
rules for benchmark participants. Of course, it may still make most sense to
collect benchmark figures myself, and even then, being able to refer to
the benchmarked configurations and get an indication of what hardware can
achieve what performance is helpful. Renting a 72-GPU cluster is
expensive and as far as I can see not typically available for a short time, so
any benchmarking run by myself would be limited to 4-8 GPU configurations. If
the gap in efficiency is huge for such setups vs the NVL72 then these smaller
setups are maybe less interesting.&lt;/p&gt;
&lt;p&gt;If I found the time to run benchmarks myself, what would I be testing? I&#x27;d
move to &lt;a href=&quot;https://huggingface.co/deepseek-ai/DeepSeek-V3.2&quot;&gt;DeepSeek V3.2&lt;/a&gt;. One
of the big features of this release was the movement to a new attention
mechanism which &lt;a href=&quot;https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/assets/paper.pdf#section.3&quot;&gt;scales &lt;em&gt;much&lt;/em&gt; closer to linearly with sequence
length&lt;/a&gt;.
With e.g. &lt;a href=&quot;https://github.com/MoonshotAI/Kimi-Linear&quot;&gt;Kimi Linear&lt;/a&gt; and
&lt;a href=&quot;https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&quot;&gt;Qwen3-Next&lt;/a&gt;,
other labs are moving in a similar direction experimentally at least. I&#x27;d
try to set up an 8 GPU configuration with sglang/vllm configured in a way that it
would be capable of serving a commercial workload with varied input/output
sequence lengths and test this is the case (Chutes &lt;a href=&quot;https://chutes.ai/app/chute/398651e1-5f85-5e50-a513-7c5324e8e839?tab=source&quot;&gt;provide their deployed
configs&lt;/a&gt;
which may be another reference point). I&#x27;d want to see how much the effective
Wh per million input/output tokens varies depending on the different isl/osl
workloads. These &lt;em&gt;should&lt;/em&gt; be relatively similar given the linear attention
mechanism, and if so it&#x27;s a lot easier to estimate the rough energy cost of a
series of your own queries of varied length. I would stick with the random
input tokens for the time being.&lt;/p&gt;
&lt;p&gt;So where does that leave us? All of this and we&#x27;ve got figures for two
particular models, with one benchmark harness, a limited set of input/output
sequence lengths, and a range of
potential issues that might impact the conclusion. I think this is a useful
yardstick / datapoint, though I&#x27;d like to get towards something that&#x27;s even
more useful and that I have more faith in.&lt;/p&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2026-02-17:
&lt;ul&gt;
&lt;li&gt;Changed GitHub links to point to SemiAnalysisAI/InferenceX rather than
InferenceMAX/InferenceMAX, as they were broken by the upstream rename.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-01-09:
&lt;ul&gt;
&lt;li&gt;Fix broken link.&lt;/li&gt;
&lt;li&gt;Add note that more complete system info would be helpful for
reproducibility.&lt;/li&gt;
&lt;li&gt;Add note about variety of input/output sequence lengths tested.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2026-01-07: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>QEMU-based instruction execution counting</title>
<published>2025-12-02T12:00:00Z</published>
<updated>2025-12-02T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2025q4/qemu-based-instruction-execution-counting"/>
<id>https://muxup.com/2025q4/qemu-based-instruction-execution-counting</id>
<content type="html">
&lt;p&gt;Although analysing performance by way of instruction counting has obvious
limitations, it can be helpful (especially when combined with appropriate
analysis scripts) to get rapid feedback on the impact of code generation
changes or to explore hypotheses about why code from one compiler might be
performing differently from another - for instance, by looking at instruction
mix in the most executed translation blocks. In this post we&#x27;ll look at how to
capture the necessary data to perform such an analysis using a QEMU plugin.
Future posts will give details of the analysis scripts I&#x27;ve used, and walk
through an example or two of putting them to use.&lt;/p&gt;
&lt;h2 id=&quot;modifying-qemu&quot;&gt;&lt;a href=&quot;#modifying-qemu&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Modifying QEMU&lt;/h2&gt;
&lt;p&gt;Over the past few years, QEMU&#x27;s plugin API has developed a fair bit. QEMU
includes several plugins, and &lt;code&gt;hotblocks&lt;/code&gt; provides &lt;em&gt;almost&lt;/em&gt; what we want but
doesn&#x27;t allow configurability of the number of blocks it will print
information on. I submitted a &lt;a href=&quot;https://lore.kernel.org/qemu-devel/cf5a00136738b981a12270b76572e8d502daf208.1753857212.git.asb@igalia.com/T/&quot;&gt;small patch
series&lt;/a&gt;
(and &lt;a href=&quot;https://lore.kernel.org/qemu-devel/cover.1764716538.git.asb@igalia.com/&quot;&gt;submitted it a second
time&lt;/a&gt;)
addressing this and other minor issues found along the way. The series has now
been &lt;a href=&quot;https://lore.kernel.org/qemu-devel/87o6o3ucy6.fsf@draig.linaro.org/&quot;&gt;accepted by the
maintainer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To build QEMU with this patch:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;git clone https://github.com/qemu/qemu &lt;span style=&quot;color: #000&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; qemu
git checkout v10.1.2
cat - &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;&amp;#39;EOF&amp;#39; &amp;gt; hotblocks.patch&lt;/span&gt;
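&lt;span style=&quot;color: #C41A16&quot;&gt;diff --git a/contrib/plugins/hotblocks.c b/contrib/plugins/hotblocks.c&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;index 98404b6885..8ecf033997 100644&lt;/span&gt;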
&lt;span style=&quot;color: #C41A16&quot;&gt;--- a/contrib/plugins/hotblocks.c&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+++ b/contrib/plugins/hotblocks.c&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;@@ -73,28 +73,29 @@ static void exec_count_free(gpointer key, gpointer value, gpointer user_data)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt; static void plugin_exit(qemu_plugin_id_t id, void *p)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt; {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;     g_autoptr(GString) report = g_string_new(&amp;quot;collected &amp;quot;);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;-    GList *counts, *it;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    GList *counts, *sorted_counts, *it;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;     int i;&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;     g_string_append_printf(report, &amp;quot;%d entries in the hash table\n&amp;quot;,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                            g_hash_table_size(hotblocks));&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;     counts = g_hash_table_get_values(hotblocks);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;-    it = g_list_sort_with_data(counts, cmp_exec_count, NULL);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    sorted_counts = g_list_sort_with_data(counts, cmp_exec_count, NULL);&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;-    if (it) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    if (sorted_counts) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;         g_string_append_printf(report, &amp;quot;pc, tcount, icount, ecount\n&amp;quot;);&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;-        for (i = 0; i &amp;lt; limit &amp;amp;&amp;amp; it-&amp;gt;next; i++, it = it-&amp;gt;next) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+        for (i = 0, it = sorted_counts; (limit == 0 || i &amp;lt; limit) &amp;amp;&amp;amp; it;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+             i++, it = it-&amp;gt;next) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;             ExecCount *rec = (ExecCount *) it-&amp;gt;data;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;             g_string_append_printf(&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;-                report, &amp;quot;0x%016&amp;quot;PRIx64&amp;quot;, %d, %ld, %&amp;quot;PRId64&amp;quot;\n&amp;quot;,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+                report, &amp;quot;0x%016&amp;quot;PRIx64&amp;quot;, %d, %ld, %&amp;quot;PRIu64&amp;quot;\n&amp;quot;,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                 rec-&amp;gt;start_addr, rec-&amp;gt;trans_count,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                 rec-&amp;gt;insns,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                 qemu_plugin_u64_sum(&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                     qemu_plugin_scoreboard_u64(rec-&amp;gt;exec_count)));&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;         }&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;-        g_list_free(it);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+        g_list_free(sorted_counts);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;     }&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;     qemu_plugin_outs(report-&amp;gt;str);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;@@ -170,6 +171,13 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                 fprintf(stderr, &amp;quot;boolean argument parsing failed: %s\n&amp;quot;, opt);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;                 return -1;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;             }&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+        } else if (g_strcmp0(tokens[0], &amp;quot;limit&amp;quot;) == 0) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+            char *endptr = NULL;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+            limit = g_ascii_strtoull(tokens[1], &amp;amp;endptr, 10);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+            if (endptr == tokens[1] || *endptr != &amp;#39;\0&amp;#39;) {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+                fprintf(stderr, &amp;quot;unsigned integer parsing failed: %s\n&amp;quot;, opt);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+                return -1;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+            }&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;         } else {&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;             fprintf(stderr, &amp;quot;option parsing failed: %s\n&amp;quot;, opt);&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;             return -1;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;index 4a7d1f4178..e8793b0f9c 100644&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;--- a/docs/about/emulation.rst&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+++ b/docs/about/emulation.rst&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;@@ -463,6 +463,18 @@ Example::&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;   0x000000004002b0, 1, 4, 66087&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;   ...&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;+Behaviour can be tweaked with the following arguments:&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+.. list-table:: Hot Blocks plugin arguments&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+  :widths: 20 80&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+  :header-rows: 1&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+  * - Option&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    - Description&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+  * - inline=true|false&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    - Use faster inline addition of a single counter.&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+  * - limit=N&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;+    - The number of blocks to be printed. (Default: N = 20, use 0 for no limit).&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt; Hot Pages&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt; .........&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
patch -p1 &amp;lt; hotblocks.patch
./configure --prefix&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(pwd)&lt;/span&gt;/inst --target-list&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;riscv32-linux-user riscv64-linux-user&amp;quot;&lt;/span&gt;
make -j&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;nproc&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;using-this-plugin-to-capture-statistics-from-running-a-binary-under-qemu-user&quot;&gt;&lt;a href=&quot;#using-this-plugin-to-capture-statistics-from-running-a-binary-under-qemu-user&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Using this plugin to capture statistics from running a binary under qemu-user&lt;/h2&gt;
&lt;p&gt;Assuming you have an &lt;a href=&quot;/2024q4/rootless-cross-architecture-debootstrap&quot;&gt;appropriate
sysroot&lt;/a&gt;, you can
run a binary and have the execution information emitted to stderr by doing
something like:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;QEMUDIR=$HOME&lt;/span&gt;/qemu/build
&lt;span style=&quot;color: #000&quot;&gt;SYSROOT=$HOME&lt;/span&gt;/rvsysroot
&lt;span style=&quot;color: #000&quot;&gt;$QEMUDIR&lt;/span&gt;/qemu-riscv64 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -L &lt;span style=&quot;color: #000&quot;&gt;$SYSROOT&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -plugin &lt;span style=&quot;color: #000&quot;&gt;$QEMUDIR&lt;/span&gt;/contrib/plugins/libhotblocks.so,limit&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;,inline&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;on &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -d plugin,nochain &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  my_rv64_binary
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This produces output like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collected 2229 entries in the hash table
pc, tcount, icount, ecount
0x00007fffee7012ba, 1, 1, 3737
0x00007fffee7012be, 1, 3, 3737
0x00007ffff741e738, 1, 23, 1074
0x00007fffee71bb38, 1, 5, 884
0x00007ffff741bb2e, 1, 11, 662
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This listing indicates the address of the translation block, the number of
times it&#x27;s been translated, the number of instructions it contains, and the
number of times it was executed. Note that a translation block is not the same
as a basic block in the compiler. A translation block can span multiple basic
blocks in the case of fallthrough, and this can also mean an instruction may
show up in multiple translation blocks.&lt;/p&gt;
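&lt;p&gt;As a rough illustration of working with this format, the following Python
sketch parses the listing and ranks blocks by dynamic instruction count
(&lt;code&gt;icount * ecount&lt;/code&gt;) rather than by execution count alone. The names
&lt;code&gt;Block&lt;/code&gt;, &lt;code&gt;parse_hotblocks&lt;/code&gt; and
&lt;code&gt;hottest_by_instructions&lt;/code&gt; are made up for this example and are not
part of QEMU.&lt;/p&gt;

```python
from typing import NamedTuple


class Block(NamedTuple):
    pc: int      # start address of the translation block
    tcount: int  # number of times the block was translated
    icount: int  # number of instructions in the block
    ecount: int  # number of times the block was executed


def parse_hotblocks(text):
    """Parse hotblocks output, skipping the summary and header lines."""
    blocks = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Data rows have exactly four fields and start with a hex address.
        if len(fields) != 4 or not fields[0].startswith("0x"):
            continue
        blocks.append(Block(int(fields[0], 16), *(int(f) for f in fields[1:])))
    return blocks


def hottest_by_instructions(blocks, n=3):
    """Return the n blocks with the highest icount * ecount."""
    return sorted(blocks, key=lambda b: b.icount * b.ecount, reverse=True)[:n]


# The listing shown above, reused as sample input.
SAMPLE = """\
collected 2229 entries in the hash table
pc, tcount, icount, ecount
0x00007fffee7012ba, 1, 1, 3737
0x00007fffee7012be, 1, 3, 3737
0x00007ffff741e738, 1, 23, 1074
0x00007fffee71bb38, 1, 5, 884
0x00007ffff741bb2e, 1, 11, 662
"""

if __name__ == "__main__":
    for b in hottest_by_instructions(parse_hotblocks(SAMPLE)):
        print(f"{b.pc:#018x} {b.icount * b.ecount}")
```

&lt;p&gt;Ranking this way can reorder the list: a block with many instructions
and a moderate execution count may account for more executed instructions
than a short block that runs more often.&lt;/p&gt;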
&lt;p&gt;At least for my use cases, I need something a bit more involved than this. In
order to add collection of these statistics to an existing benchmark harness I
need a wrapper script that transparently collects these statistics to a file.
It&#x27;s also helpful to capture the runtime address of executable mappings for
loaded libraries, allowing translation blocks to be attributed easily to
either the binary itself or &lt;code&gt;libc&lt;/code&gt;, &lt;code&gt;libm&lt;/code&gt; etc. We have &lt;code&gt;gdb&lt;/code&gt; connect to
QEMU&#x27;s gdbserver in order to dump those mappings. Do ensure you&#x27;re using a
recent version of QEMU (the version suggested in the patch application
instructions is definitely good) for this as I wasted quite some time running
into a &lt;a href=&quot;https://github.com/qemu/qemu/commit/8b647bd352505234cab2acd2422aba183a1aa1fd&quot;&gt;bug with file descriptor
numbers&lt;/a&gt;
that caused odd breakage.&lt;/p&gt;
&lt;p&gt;This &lt;code&gt;qemu-forwarder.sh&lt;/code&gt; script will capture the plugin&#x27;s output in a
&lt;code&gt;.qemu_out&lt;/code&gt; file and the mappings in a &lt;code&gt;.map&lt;/code&gt; file, both of which can be later
consumed by a detailed analysis script.&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;#!/bin/sh&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;QEMUDIR=$HOME&lt;/span&gt;/qemu/build
&lt;span style=&quot;color: #000&quot;&gt;SYSROOT=$HOME&lt;/span&gt;/rvsysroot
&lt;span style=&quot;color: #000&quot;&gt;QEMU=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$QEMUDIR&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/qemu-riscv64 \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  -L &lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$SYSROOT&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt; \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  -plugin &lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$QEMUDIR&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/contrib/plugins/libhotblocks.so,limit=0,inline=on \&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;  -d plugin,nochain&amp;quot;&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;SUFFIX=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; -e &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$1&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;.qemu_out&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt;; &lt;span style=&quot;color: #A90D91&quot;&gt;then&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;NUM=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;while&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; -e &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$1&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;.qemu_out.&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NUM&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt;; &lt;span style=&quot;color: #A90D91&quot;&gt;do&lt;/span&gt;
    &lt;span style=&quot;color: #000&quot;&gt;NUM=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$((&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;NUM&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;+&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;))&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;done&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;SUFFIX=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;.&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NUM&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;fi&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;GDB_SOCK=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;mktemp -u&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt;
setarch &lt;span style=&quot;color: #A90D91&quot;&gt;$(&lt;/span&gt;uname -m&lt;span style=&quot;color: #A90D91&quot;&gt;)&lt;/span&gt; -R &lt;span style=&quot;color: #000&quot;&gt;$QEMU&lt;/span&gt; -g &lt;span style=&quot;color: #000&quot;&gt;$GDB_SOCK&lt;/span&gt; -D &lt;span style=&quot;color: #000&quot;&gt;$1&lt;/span&gt;.qemu_out&lt;span style=&quot;color: #000&quot;&gt;$SUFFIX&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$@&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &amp;amp;
&lt;span style=&quot;color: #000&quot;&gt;QEMU_PID=$!&lt;/span&gt;

&lt;span style=&quot;color: #000&quot;&gt;RETRY_COUNT=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;while&lt;/span&gt; ! &lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; -e &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$GDB_SOCK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt;; &lt;span style=&quot;color: #A90D91&quot;&gt;do&lt;/span&gt;
  &lt;span style=&quot;color: #000&quot;&gt;RETRY_COUNT=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$((&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;RETRY_COUNT&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;+&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;))&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;$RETRY_COUNT&lt;/span&gt; -eq &lt;span style=&quot;color: #1C01CE&quot;&gt;10&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt;; &lt;span style=&quot;color: #A90D91&quot;&gt;then&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;echo&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Timed out waiting for gdb socket to be created&amp;quot;&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;exit&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;fi&lt;/span&gt;
  sleep &lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;.1
  &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; ! &lt;span style=&quot;color: #A90D91&quot;&gt;kill&lt;/span&gt; -0 &lt;span style=&quot;color: #000&quot;&gt;$QEMU_PID&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;&amp;gt;/dev/null; &lt;span style=&quot;color: #A90D91&quot;&gt;then&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;echo&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;QEMU process died before gdb socket was created&amp;quot;&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;wait&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;$QEMU_PID&lt;/span&gt;
    &lt;span style=&quot;color: #A90D91&quot;&gt;exit&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;$?&lt;/span&gt;
  &lt;span style=&quot;color: #A90D91&quot;&gt;fi&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;done&lt;/span&gt;

gdb -batch &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;set pagination off&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;target remote &lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$GDB_SOCK&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;break main&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;continue&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;set logging file &lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$1&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;.map&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$SUFFIX&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;set logging enabled on&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;info proc mappings&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -ex &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;detach&amp;quot;&lt;/span&gt; &amp;gt; /dev/null &lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;&amp;gt;&amp;amp;&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;wait&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;$QEMU_PID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
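&lt;p&gt;To give a sense of how the &lt;code&gt;.map&lt;/code&gt; file might be consumed, here
is a minimal Python sketch that attributes a translation block address to a
mapping. It assumes each mapping row in the output of
&lt;code&gt;info proc mappings&lt;/code&gt; starts with hexadecimal start and end
addresses and ends with the backing file path when there is one; the exact
columns vary between gdb versions. The sample listing and its addresses are
invented for illustration, and &lt;code&gt;parse_mappings&lt;/code&gt; and
&lt;code&gt;attribute&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```python
def parse_mappings(text):
    """Return (start, end, objfile) tuples from `info proc mappings` output.

    Assumption: each mapping row begins with hex start and end addresses,
    and a backing file path (or a name like [stack]), if any, comes last.
    """
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) in (0, 1) or not fields[0].startswith("0x"):
            continue
        try:
            start, end = int(fields[0], 16), int(fields[1], 16)
        except ValueError:
            continue
        last = fields[-1]
        objfile = last if last.startswith(("/", "[")) else ""
        mappings.append((start, end, objfile))
    return mappings


def attribute(pc, mappings):
    """Return the objfile whose mapping contains pc, or None if unmapped."""
    for start, end, objfile in mappings:
        if pc in range(start, end):  # start inclusive, end exclusive
            return objfile or "[anonymous]"
    return None


# Invented sample in the assumed format; not real output.
SAMPLE_MAP = """\
process 1
Mapped address spaces:

          Start Addr           End Addr       Size     Offset  Perms  objfile
             0x10000            0x14000     0x4000        0x0  r-xp   /tmp/my_rv64_binary
      0x7f0000100000     0x7f0000200000   0x100000        0x0  r-xp   /tmp/rvsysroot/lib/libc.so.6
      0x7f0000300000     0x7f0000400000   0x100000        0x0  rw-p
"""

if __name__ == "__main__":
    maps = parse_mappings(SAMPLE_MAP)
    for pc in (0x10abc, 0x7f0000100040, 0x7f0000300010, 0x20000):
        print(hex(pc), attribute(pc, maps))
```

&lt;p&gt;A linear scan is adequate for the handful of mappings a typical process
has. Disabling address space randomisation with &lt;code&gt;setarch -R&lt;/code&gt;, as
the script does, should also help keep the addresses comparable between
runs.&lt;/p&gt;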

&lt;p&gt;The above will work under LLVM&#x27;s &lt;code&gt;lit&lt;/code&gt;, though you will need to use a recent
enough version that doesn&#x27;t strip &lt;code&gt;HOME&lt;/code&gt; from the environment (or else edit
the script accordingly). It also writes its output to sequentially numbered
files, again so that it can run from &lt;code&gt;lit&lt;/code&gt; as used by
&lt;code&gt;llvm-test-suite&lt;/code&gt;&#x27;s SPEC configuration, which can invoke the same
binary multiple times for a given benchmark (e.g. 500.perlbench_r).&lt;/p&gt;
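&lt;p&gt;Because the first run writes &lt;code&gt;.qemu_out&lt;/code&gt; and later runs append
&lt;code&gt;.1&lt;/code&gt;, &lt;code&gt;.2&lt;/code&gt; and so on, a plain lexical sort would place
&lt;code&gt;.10&lt;/code&gt; before &lt;code&gt;.2&lt;/code&gt;. A small Python sketch that pairs each
&lt;code&gt;.qemu_out&lt;/code&gt; file with its &lt;code&gt;.map&lt;/code&gt; file in run order might
look like the following. The helper names &lt;code&gt;run_index&lt;/code&gt; and
&lt;code&gt;paired_runs&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from pathlib import Path


def run_index(path, marker):
    """Return the run number encoded after `marker` in the file name.

    "foo.qemu_out" is run 0 and "foo.qemu_out.3" is run 3.
    """
    tail = path.name.rpartition(marker)[2]
    return int(tail[1:]) if tail else 0


def paired_runs(binary):
    """Yield (qemu_out, map) path pairs for `binary` in run order."""
    binary = Path(binary)
    outs = sorted(binary.parent.glob(binary.name + ".qemu_out*"),
                  key=lambda p: run_index(p, ".qemu_out"))
    for out in outs:
        n = run_index(out, ".qemu_out")
        suffix = f".{n}" if n else ""
        # The map path is derived by name only; it is not checked to exist.
        yield out, binary.with_name(binary.name + ".map" + suffix)


if __name__ == "__main__":
    import sys
    for out, mapping in paired_runs(sys.argv[1]):
        print(out, mapping)
```

&lt;p&gt;This assumes nothing other than the forwarder script has created files
matching &lt;code&gt;.qemu_out*&lt;/code&gt; next to the binary.&lt;/p&gt;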
&lt;h2 id=&quot;analysing-the-output&quot;&gt;&lt;a href=&quot;#analysing-the-output&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Analysing the output&lt;/h2&gt;
&lt;p&gt;A follow-up post will introduce the scripting I&#x27;ve built around this.&lt;/p&gt;
&lt;h2 id=&quot;recording-and-analysing-results-from-running-spec&quot;&gt;&lt;a href=&quot;#recording-and-analysing-results-from-running-spec&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Recording and analysing results from running SPEC&lt;/h2&gt;
&lt;p&gt;Assuming you have &lt;code&gt;qemu-forwarder.sh&lt;/code&gt;, in your llvm-test-suite directory:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;CONF=&lt;/span&gt;clang-head-test
&lt;span style=&quot;color: #000&quot;&gt;CLANG_BIN_DIR=$HOME&lt;/span&gt;/llvm-project/build/release/bin
&lt;span style=&quot;color: #000&quot;&gt;CFLAGS=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;-march=rv64gc_zba_zbb_zbs&amp;quot;&lt;/span&gt;
cat - &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF &amp;gt; $CONF.cmake&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_SYSTEM_NAME Linux)&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_SYSROOT $HOME/rvsysroot)&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_C_COMPILER $CLANG_BIN_DIR/clang)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_CXX_COMPILER $CLANG_BIN_DIR/clang++)&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_C_FLAGS_INIT &amp;quot;$CFLAGS&amp;quot;)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_CXX_FLAGS_INIT &amp;quot;$CFLAGS&amp;quot;)&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_LINKER_TYPE LLD)&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
cmake -G Ninja &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -B build.&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --toolchain&lt;span style=&quot;color: #000&quot;&gt;=$CONF&lt;/span&gt;.cmake &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_SPEC2017_ROOT&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;~/cpu2017 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_SUBDIRS&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;External/SPEC &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_COLLECT_CODE_SIZE&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;OFF &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_COLLECT_COMPILE_TIME&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;OFF &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_USER_MODE_EMULATION&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;ON &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DTEST_SUITE_RUN_UNDER&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;$(pwd)&lt;/span&gt;/qemu-forwarder.sh
cmake --build build.&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt;
&lt;span style=&quot;color: #000&quot;&gt;$CLANG_BIN_DIR&lt;/span&gt;/llvm-lit -v --filter-out&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;.+_s|specrand&amp;#39;&lt;/span&gt; build.&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;526.blender_r&lt;/code&gt; test takes twice as long as the others, so you may wish to
skip it by instead executing something like:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;$CLANG_BIN_DIR&lt;/span&gt;/llvm-lit -v --filter-out&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;.+_s|specrand|blender&amp;#39;&lt;/span&gt; build.&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you want to re-run tests you must delete the previous &lt;code&gt;.qemu_out&lt;/code&gt; and
&lt;code&gt;.map&lt;/code&gt; files, which can be done with:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #000&quot;&gt;[&lt;/span&gt; -n &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;]&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; find &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;build.&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$CONF&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; -type f -name &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;*.qemu_out*&amp;quot;&lt;/span&gt; -exec sh -c &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;    for q_file do&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;        base_path=&amp;quot;${q_file%.qemu_out*}&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;        rm -f &amp;quot;$q_file&amp;quot; &amp;quot;${base_path}.map&amp;quot;*&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;    done&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;&lt;/span&gt; sh &lt;span style=&quot;color: #000&quot;&gt;{}&lt;/span&gt; +
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In order to compare two SPEC builds, you can use something like the following
hacky script. Using the captured translation block execution data to generate
a plain executed instruction count is overkill as the example
&lt;a href=&quot;https://www.qemu.org/docs/master/about/emulation.html#instruction&quot;&gt;tests/tcg/plugin/insn.c&lt;/a&gt;
can easily dump this for you directly. But by collecting the data upfront,
you can easily dive right into a more detailed analysis when you see a
surprising difference in executed instruction counts without rerunning the
binary.&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #177500&quot;&gt;#!/usr/bin/env python3&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;from&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;pathlib&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;import&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;Path&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;from&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;collections&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;import&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;defaultdict&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;import&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;sys&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;collect_totals&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;root_dir&lt;/span&gt;):
    &lt;span style=&quot;color: #000&quot;&gt;totals&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;defaultdict&lt;/span&gt;(&lt;span style=&quot;color: #A90D91&quot;&gt;int&lt;/span&gt;)
    &lt;span style=&quot;color: #000&quot;&gt;root_path&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;Path&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;root_dir&lt;/span&gt;)&lt;span style=&quot;color: #000&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;External&amp;quot;&lt;/span&gt;

    &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;file_path&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;root_path.rglob&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;*.qemu_out*&amp;quot;&lt;/span&gt;):
        &lt;span style=&quot;color: #000&quot;&gt;benchmark_name&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;file_path.parts&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;4&lt;/span&gt;]

        &lt;span style=&quot;color: #A90D91&quot;&gt;try&lt;/span&gt;:
            &lt;span style=&quot;color: #A90D91&quot;&gt;with&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;file_path.open&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;r&amp;quot;&lt;/span&gt;) &lt;span style=&quot;color: #A90D91&quot;&gt;as&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;f&lt;/span&gt;:
                &lt;span style=&quot;color: #000&quot;&gt;file_total&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;
                &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;line&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;f&lt;/span&gt;:
                    &lt;span style=&quot;color: #000&quot;&gt;parts&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;line.strip&lt;/span&gt;()&lt;span style=&quot;color: #000&quot;&gt;.split&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;,&amp;#39;&lt;/span&gt;)
                    &lt;span style=&quot;color: #177500&quot;&gt;# Only sum lines that match the expected format.&lt;/span&gt;
                    &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;len&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;parts&lt;/span&gt;) &lt;span style=&quot;color: #000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;4&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;and&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;parts&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;]&lt;span style=&quot;color: #000&quot;&gt;.strip&lt;/span&gt;()&lt;span style=&quot;color: #000&quot;&gt;.isdigit&lt;/span&gt;():
                        &lt;span style=&quot;color: #177500&quot;&gt;# icount * ecount.&lt;/span&gt;
                        &lt;span style=&quot;color: #000&quot;&gt;file_total&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;int&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;parts&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;]) &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;int&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;parts&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;3&lt;/span&gt;])
                &lt;span style=&quot;color: #000&quot;&gt;totals&lt;/span&gt;[&lt;span style=&quot;color: #000&quot;&gt;benchmark_name&lt;/span&gt;] &lt;span style=&quot;color: #000&quot;&gt;+=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;file_total&lt;/span&gt;
        &lt;span style=&quot;color: #A90D91&quot;&gt;except&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;Exception&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;as&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;e&lt;/span&gt;:
            &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;Error reading {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;file_path&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}: {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;e&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&amp;quot;&lt;/span&gt;)

    &lt;span style=&quot;color: #A90D91&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;totals&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;__name__&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;==&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;__main__&amp;quot;&lt;/span&gt;:
    &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;len&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;sys.argv&lt;/span&gt;) &lt;span style=&quot;color: #000&quot;&gt;!=&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;3&lt;/span&gt;:
        &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;Usage: spec-compare-helper &amp;lt;dir_a&amp;gt; &amp;lt;dir_b&amp;gt;&amp;quot;&lt;/span&gt;)
        &lt;span style=&quot;color: #000&quot;&gt;sys.exit&lt;/span&gt;(&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;)

    &lt;span style=&quot;color: #000&quot;&gt;dir_a&lt;/span&gt;, &lt;span style=&quot;color: #000&quot;&gt;dir_b&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;sys.argv&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;], &lt;span style=&quot;color: #000&quot;&gt;sys.argv&lt;/span&gt;[&lt;span style=&quot;color: #1C01CE&quot;&gt;2&lt;/span&gt;]
    &lt;span style=&quot;color: #000&quot;&gt;totals_a&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;collect_totals&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;dir_a&lt;/span&gt;)
    &lt;span style=&quot;color: #000&quot;&gt;totals_b&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;collect_totals&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;dir_b&lt;/span&gt;)

    &lt;span style=&quot;color: #000&quot;&gt;benchmarks&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;sorted&lt;/span&gt;(&lt;span style=&quot;color: #A90D91&quot;&gt;set&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;totals_a.keys&lt;/span&gt;()) &lt;span style=&quot;color: #000&quot;&gt;|&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;set&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;totals_b.keys&lt;/span&gt;()))

    &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&amp;#39;Benchmark&amp;#39;:&amp;lt;20} {&amp;#39;DirA&amp;#39;:&amp;gt;15} {&amp;#39;DirB&amp;#39;:&amp;gt;15} {&amp;#39;Diff (%)&amp;#39;:&amp;gt;10}&amp;quot;&lt;/span&gt;)
    &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;=&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;60&lt;/span&gt;)

    &lt;span style=&quot;color: #A90D91&quot;&gt;for&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;benchmark&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;in&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;benchmarks&lt;/span&gt;:
        &lt;span style=&quot;color: #000&quot;&gt;val_a&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;totals_a.get&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;benchmark&lt;/span&gt;, &lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;)
        &lt;span style=&quot;color: #000&quot;&gt;val_b&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;totals_b.get&lt;/span&gt;(&lt;span style=&quot;color: #000&quot;&gt;benchmark&lt;/span&gt;, &lt;span style=&quot;color: #1C01CE&quot;&gt;0&lt;/span&gt;)
        &lt;span style=&quot;color: #000&quot;&gt;diff_pct&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt; ((&lt;span style=&quot;color: #000&quot;&gt;val_b&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;-&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;val_a&lt;/span&gt;) &lt;span style=&quot;color: #000&quot;&gt;/&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;val_a&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;*&lt;/span&gt; &lt;span style=&quot;color: #1C01CE&quot;&gt;100&lt;/span&gt;) &lt;span style=&quot;color: #A90D91&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;val_a&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;else&lt;/span&gt; &lt;span style=&quot;color: #A90D91&quot;&gt;float&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;inf&amp;quot;&lt;/span&gt;)

        &lt;span style=&quot;color: #A90D91&quot;&gt;print&lt;/span&gt;(&lt;span style=&quot;color: #C41A16&quot;&gt;f&amp;quot;{&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;benchmark&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;lt;20} {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;val_a&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;gt;15} {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;val_b&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;gt;15} {&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;diff_pct&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:&amp;gt;9.2f}%&amp;quot;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Which produces output looking something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Benchmark                       DirA            DirB   Diff (%)
============================================================
500.perlbench_r         180245097594    182078714777      1.02%
502.gcc_r               220874510659    219647717585     -0.56%
505.mcf_r               131589945456    134271153130      2.04%
508.namd_r              220648061019    216682202888     -1.80%
510.parest_r            291341820355    291844973715      0.17%
511.povray_r             31911866906     31103201809     -2.53%
519.lbm_r                94166321698     86910581403     -7.71%
520.omnetpp_r           138002605692    137676301622     -0.24%
523.xalancbmk_r         283566182007    284735075518      0.41%
525.x264_r              380165035845    379862173371     -0.08%
526.blender_r           660528270138    659361380750     -0.18%
531.deepsjeng_r         355058534962    349621355155     -1.53%
538.imagick_r           238573643488    238560676372     -0.01%
541.leela_r             421886351310    405423320484     -3.90%
544.nab_r               415595728542    391443973852     -5.81%
557.xz_r                132548718317    130229753780     -1.75%
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#x27;s worth highlighting that as we&#x27;re running this under user-mode emulation,
the dynamic instruction count naturally never counts any instructions on the
kernel side that you would see if profiling a real system.&lt;/p&gt;
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2025-12-15: Note that the qemu patches have now been accepted in the
maintainer&#x27;s tree.&lt;/li&gt;
&lt;li&gt;2025-12-02: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Minipost: Olmo 3 training cost</title>
<published>2025-12-01T12:00:00Z</published>
<updated>2025-12-01T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2025q4/minipost-olmo3-training-cost"/>
<id>https://muxup.com/2025q4/minipost-olmo3-training-cost</id>
<content type="html">
&lt;p&gt;Recently I jotted down some notes on &lt;a href=&quot;/2025q4/minipost-llm-inference-vs-training-cost-for-deepseek&quot;&gt;LLM inference vs training costs for
DeepSeek&lt;/a&gt;
and I wanted to add on an additional datapoint for training cost based on the
recently released &lt;a href=&quot;https://allenai.org/blog/olmo3&quot;&gt;Olmo3 models&lt;/a&gt; from the
Allen Institute for AI (&quot;Ai2&quot;). The model family has 7B and 32B parameter
models, with &#x27;Think&#x27; variants available for 7B and 32B but so far only a 7B
&#x27;Instruct&#x27; non-reasoning version (but &lt;a href=&quot;https://xcancel.com/allen_ai/status/1991545790263857609&quot;&gt;watch this
space&lt;/a&gt;). What&#x27;s
particularly interesting about the Olmo models to me is that beyond providing
open weights, the training scripts and datasets are openly available as well.&lt;/p&gt;
&lt;p&gt;Going by the reported benchmarks at least it&#x27;s competitive with less open
models at a similar size, and importantly they&#x27;ve increased the supported
context length from the rather limiting 4k tokens supported by the Olmo 2
series to a much more usable 64k tokens. Given their relatively small size, these
models are less capable than chunkier models like DeepSeek R1/V3.x or
Kimi K2, but I&#x27;ve been impressed by the capability of 32B dense models for
basic queries, and from my non-scientific testing both the 32B and 7B Olmo3
variants seem to do a reasonable job of summarising things like discussion
threads. You can experiment yourself at
&lt;a href=&quot;https://playground.allenai.org/&quot;&gt;playground.allenai.org&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;energy-required-for-training-olmo-3&quot;&gt;&lt;a href=&quot;#energy-required-for-training-olmo-3&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Energy required for training Olmo 3&lt;/h2&gt;
&lt;p&gt;One of the neat things about this level of openness is that it &lt;em&gt;should&lt;/em&gt; act as
a floor in terms of performance for future models of this size class assuming
they&#x27;re appropriately funded and don&#x27;t take too many risks chasing novelty.
Rerunning the training process with an updated dataset and some minor tweaks
is something you could imagine doing on some regular cadence, ideally as a
shared endeavour. Imagining this effort in the future, how much energy is
required? The initial version of the &lt;a href=&quot;http://allenai.org/papers/olmo3&quot;&gt;detailed Olmo 3 technical
report&lt;/a&gt; unfortunately has little to say on
this. We can get a back of the envelope figure in terms of GPU hours for
pre-training based on the reported 7700 tokens per second per GPU for the 7B
base model and 1900 tokens per second for the 32B base model and the ~6T token
dataset. But even better than that, we can just &lt;strong&gt;ask&lt;/strong&gt; the Ai2 folks
(sometimes the internet really does work wonderfully!). After asking on their
&lt;a href=&quot;https://discord.gg/ai2&quot;&gt;public Discord&lt;/a&gt; I was rapidly furnished with this
helpful answer:&lt;/p&gt;
&lt;blockquote&gt;
For some detailed numbers, we measured power consumption throughout training,
along with total GPU hours. We used ~234k H100 hours to pretrain the 7B, and
~1.05m H100 hours to pretrain the 32B. 1900 TPS is generally what our trainer
is capable of, but with restarts, evaluations, checkpointing, and occasional
network issues, the 32B took 1.05m hours. We measured an average power
consumption of ~621W while pretraining the 7B and ~649W while pretraining the
32B, and this means that our GPUs consumed ~146MWh for the 7B and ~681MWh for
the 32B. We&#x27;ll include more detailed GPU hour information in a future version
of the paper, including for post-training!
&lt;p&gt;&lt;em&gt;Ai2 Olmo 3 team &lt;a href=&quot;https://discord.com/channels/1241138968448340109/1441462011618922647/1441471645046014038&quot;&gt;on their
Discord&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So that&#x27;s 0.681 GWh of GPU energy consumption for pretraining the 32B model
and 0.146 GWh for pretraining the 7B model. As noted in the quote, this is
inclusive of restarts, checkpointing etc., though perhaps not of earlier
experimentation. I look forward to an updated
technical report with full details, but pretraining should cover the bulk of
the compute requirements (as a reference point, today&#x27;s &lt;a href=&quot;https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf&quot;&gt;DeepSeek V3.2
paper&lt;/a&gt;
found it notable that the post-training compute budget exceeded 10% of the
pretraining cost).&lt;/p&gt;
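&lt;p&gt;As a rough cross-check of the numbers above, here is a minimal sketch that
derives the back of the envelope GPU hours from the reported throughput and
the GPU energy from the quoted hours and average power. It assumes the
dataset is exactly 6T tokens and that the 1900 tokens per second figure is per
GPU, like the 7700 figure. The function and variable names are made up for
this illustration.&lt;/p&gt;

```python
# Figures are the ones quoted above; names are made up for this sketch.
TOKENS = 6e12  # "~6T token dataset", taken here as exactly 6 trillion


def ideal_gpu_hours(tokens, tokens_per_sec_per_gpu):
    """GPU hours if every GPU ran at the quoted throughput with no overheads."""
    return tokens / tokens_per_sec_per_gpu / 3600


def energy_mwh(gpu_hours, avg_watts):
    """GPU-only energy: hours * watts gives Wh, divided by 1e6 for MWh."""
    return gpu_hours * avg_watts / 1e6


runs = {
    # name: (tokens/s per GPU, reported H100 hours, reported average watts)
    "7B": (7700, 234_000, 621),
    "32B": (1900, 1_050_000, 649),
}

for name, (tps, reported_hours, watts) in runs.items():
    ideal = ideal_gpu_hours(TOKENS, tps)
    overhead = reported_hours / ideal
    print(f"{name}: ideal {ideal:,.0f} GPU hours, reported {reported_hours:,} "
          f"({overhead:.2f}x), energy {energy_mwh(reported_hours, watts):.0f} MWh")
```

&lt;p&gt;Under those assumptions the reported hours come out somewhat above the
ideal figures, which is consistent with the restarts, evaluations and
checkpointing mentioned in the quote.&lt;/p&gt;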
&lt;p&gt;The 0.681 GWh figure doesn&#x27;t account for full system power and cooling
cost. I&#x27;d love to be corrected, but I believe a 1.5x-2x multiplier would be an
assumption towards the upper end. For the sake of a yardstick, let&#x27;s look at
a few comparisons based on the reported number:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;0.681 GWh of electricity would cost about £180k at UK residential rates
(capped at 26.35p per kWh currently). Substantially less in the USA.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.gov.wales/sites/default/files/publications/2023-11/leisure-centre-decarbonisation-guidance-note.pdf&quot;&gt;A larger leisure centre with a pool consumes ~2.5 GWh of energy per
year&lt;/a&gt;.
I don&#x27;t know if the idea of a &quot;leisure centre&quot; translates outside of the UK,
but basically it&#x27;s a swimming pool plus gym, squash/tennis courts etc.
&lt;ul&gt;
&lt;li&gt;The linked page claims ~2 GWh of energy in gas and 0.5 GWh in electricity.
For the gas, to compare like with like you&#x27;d need to consider the source
of energy for the electricity used for Olmo training.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;0.681 GWh is ~0.11% of &lt;a href=&quot;https://www.home.cern/resources/faqs/facts-and-figures-about-lhc&quot;&gt;LHC&#x27;s annual 600 GWh energy
consumption&lt;/a&gt;
or ~0.05% of CERN&#x27;s annual consumption.&lt;/li&gt;
&lt;li&gt;We can estimate a Boeing 787-9 flying from London Heathrow to SFO
consumes jet fuel containing ~0.58 GWh of energy.
&lt;ul&gt;
&lt;li&gt;Calculated with 8638km distance, 5.62kg fuel/km (taking the most economic
787-9 long haul figure from &lt;a href=&quot;https://en.wikipedia.org/wiki/Fuel_economy_in_aircraft&quot;&gt;this table on
Wikipedia&lt;/a&gt; and
&lt;a href=&quot;https://en.wikipedia.org/wiki/Jet_fuel#Typical_physical_properties_for_Jet_A_and_Jet_A-1&quot;&gt;11.95kWh/kg specific energy of jet
fuel&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;This is a yardstick rather than a direct comparison. A direct comparison
to the GWh of electricity used for the GPU compute of the LLM would depend
on the source of the electricity. If it was e.g. gas rather than
solar/hydro/wind then you&#x27;d want to compare the number of GWh consumed to
create that electricity which would of course be higher.&lt;/li&gt;
&lt;li&gt;As a further point of reference, FlightAware indicates 5 separate
direct LHR to SFO flights scheduled per day.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
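&lt;p&gt;The arithmetic behind these yardsticks can be reproduced with a short
sketch. All inputs are the figures given in the list above; the variable
names are just for illustration, and none of this accounts for the full system
power multiplier or for how the electricity was generated.&lt;/p&gt;

```python
TRAINING_GWH = 0.681  # reported GPU energy for the 32B pretraining run

# UK residential cost: GWh to kWh, then pence per kWh to pounds.
cost_gbp = TRAINING_GWH * 1e6 * 26.35 / 100

# Share of the LHC's quoted 600 GWh annual consumption, as a percentage.
lhc_share_pct = TRAINING_GWH / 600 * 100

# One LHR to SFO flight: km * kg of fuel per km * kWh per kg, then kWh to GWh.
flight_gwh = 8638 * 5.62 * 11.95 / 1e6

print(f"UK residential cost: about {cost_gbp:,.0f} GBP")
print(f"Share of LHC annual consumption: {lhc_share_pct:.2f}%")
print(f"Energy in fuel for one flight: {flight_gwh:.2f} GWh")
print(f"Training run in units of flights: {TRAINING_GWH / flight_gwh:.2f}")
```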
&lt;h2 id=&quot;more-efficient-llm-training&quot;&gt;&lt;a href=&quot;#more-efficient-llm-training&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;More efficient LLM training&lt;/h2&gt;
&lt;p&gt;We can hope for new breakthroughs, more efficient hardware, better datasets
and so on. But here is some work I noticed in the area. Fair warning: this
isn&#x27;t my field, and we have to recognise applying a research result to a
production training run is sure to have challenges even if the research
suggests the trade-offs are worthwhile. So consider this vague gesticulating
about seemingly interesting work that is going on and find someone who knows
what they&#x27;re talking about to confirm the degree to which it is
interesting/viable.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mixture of Experts (MoE) models are substantially cheaper to train, which is
one reason the industry has moved in that direction. The next Ai2 Olmo
model is &lt;a href=&quot;https://old.reddit.com/r/LocalLLaMA/comments/1p24aet/ai2_just_announced_olmo_3_a_leading_fully_open_lm/npzqw4h/?context=3&quot;&gt;expected to be
MoE&lt;/a&gt;.
The &lt;a href=&quot;https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&quot;&gt;Qwen
blog&lt;/a&gt; has
a
&lt;a href=&quot;https://img.alicdn.com/imgextra/i1/O1CN01FUbdQa1i6J7tAfCCn_%21%216000000004363-2-tps-2860-1114.png&quot;&gt;graph&lt;/a&gt;
comparing the relative training cost in GPU hours of the dense Qwen3-32B vs
Qwen3-30B-A3B vs Qwen3-Next-80B-A3B, where the latter
makes further architectural changes, reporting a 10.7x reduction. ~2.5x of
that is going to come from the reduced corpus size (15T tokens down from
36T), but that still leaves plenty of improvement from other factors.&lt;/li&gt;
&lt;li&gt;Maybe it will be shown viable to train in lower precision such as MXFP8 or
even NVFP4, which would allow much more throughput for a similar energy
budget. Nvidia have worked to demonstrate this can be effective for
&lt;a href=&quot;https://arxiv.org/pdf/2506.08027&quot;&gt;both&lt;/a&gt;
&lt;a href=&quot;https://arxiv.org/pdf/2509.25149&quot;&gt;formats&lt;/a&gt; (see also &lt;a href=&quot;https://arxiv.org/pdf/2512.02010&quot;&gt;this work from
MIT&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Also from Nvidia, &lt;a href=&quot;https://arxiv.org/pdf/2511.16664&quot;&gt;Nemotron Elastic&lt;/a&gt;
showed a model architecture that allows deriving smaller models without
doing separate pre-training runs.&lt;/li&gt;
&lt;/ul&gt;
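&lt;p&gt;To make the Qwen comparison concrete, here is a small sketch separating
the corpus size effect from everything else. It assumes the 10.7x figure is
relative to the dense Qwen3-32B, that the 36T token corpus is the one used for
the dense model, and that training cost scales linearly with tokens seen. The
variable names are hypothetical.&lt;/p&gt;

```python
reported_reduction = 10.7  # GPU hour reduction read from the linked graph
tokens_dense = 36e12       # assumed corpus size for the dense model
tokens_next = 15e12        # corpus size for Qwen3-Next-80B-A3B

# Factor explained purely by training on fewer tokens (linear scaling assumed).
corpus_factor = tokens_dense / tokens_next

# What is left over for sparsity and the other architectural changes.
remaining_factor = reported_reduction / corpus_factor

print(f"From the smaller corpus: {corpus_factor:.1f}x")
print(f"From other factors: {remaining_factor:.1f}x")
```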
&lt;p&gt;Finally, the cheapest way to train an LLM from scratch is...to find a way to
avoid the need to. For models like Olmo 3 that release the base model and
checkpoints, people can apply their own post-training or perform additional
pre-training.&lt;/p&gt;
&lt;h2 id=&quot;bonus-comparison-point-apertus&quot;&gt;&lt;a href=&quot;#bonus-comparison-point-apertus&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Bonus comparison point: Apertus&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.swiss-ai.org/apertus&quot;&gt;Apertus&lt;/a&gt; is a Swiss project to produce an
open LLM, with 70B and 8B models released so far. Their &lt;a href=&quot;https://arxiv.org/pdf/2509.14233&quot;&gt;full tech
report&lt;/a&gt; notes the following: &quot;Once a
production environment has been set up, we estimate that the model can be
realistically trained in approximately 90 days on 4096 GPUs, accounting for
overheads. If we assume 560 W power usage per Grace-Hopper module in this
period, below the set power limit of 660 W, we can estimate 5 GWh power usage
for the compute of the pretraining run.&quot;&lt;/p&gt;
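&lt;p&gt;The quoted estimate is easy to sanity check. This sketch uses only the
figures from the quote above and treats the 90 days as continuous use of all
4096 modules, which is an assumption on my part.&lt;/p&gt;

```python
days, gpus, watts = 90, 4096, 560

# Total module hours, assuming every module is busy for the whole period.
gpu_hours = days * 24 * gpus

# Wh to GWh.
energy_gwh = gpu_hours * watts / 1e9

print(f"{gpu_hours:,} GPU hours, about {energy_gwh:.2f} GWh")
```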
&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2025-12-04: Add link to &quot;Four Over Six&quot; NVFP4 training paper.&lt;/li&gt;
&lt;li&gt;2025-12-02: Added clarifying note about energy via gas in the
leisure centre comparison.&lt;/li&gt;
&lt;li&gt;2025-12-01: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Minipost: Benchmarking the Hetzner AX102 vs CCX53</title>
<published>2025-11-30T12:00:00Z</published>
<updated>2025-11-30T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2025q4/minipost-benchmarking-hetzner-ax102-vs-ccx53"/>
<id>https://muxup.com/2025q4/minipost-benchmarking-hetzner-ax102-vs-ccx53</id>
<content type="html">
&lt;p&gt;I recently had reason to do a quick comparison of the performance of the
&lt;a href=&quot;https://www.hetzner.com/dedicated-rootserver/ax102/&quot;&gt;Hetzner AX102&lt;/a&gt; dedicated
server and the high-end &#x27;dedicated&#x27; CCX53 VPS on &lt;a href=&quot;https://www.hetzner.com/cloud/&quot;&gt;Hetzner
Cloud&lt;/a&gt; and thought I may as well write up the
results for posterity. I&#x27;m incapable of starting a post without some kind of
disclaimer so here comes the one for this post: naturally the two products
have major differences in terms of flexibility (spin-up/down at will, vs pay a
small setup fee and endure a wait time depending on hardware availability). So
depending on your use case, your requirements with respect to that flexibility
may override any cost differential.&lt;/p&gt;
&lt;h2 id=&quot;specs&quot;&gt;&lt;a href=&quot;#specs&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Specs&lt;/h2&gt;
&lt;p&gt;All costs are exclusive of VAT, assuming the lowest cost data center location,
and inclusive of IPv4 address.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AX102&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;16 core Ryzen 9 7950X3D (32 threads)&lt;/li&gt;
&lt;li&gt;128GB DDR5 RAM&lt;/li&gt;
&lt;li&gt;2 x 1.92TB NVMe&lt;/li&gt;
&lt;li&gt;104 EUR/month, 39 EUR one-off setup fee.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;CCX53&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unknown AMD CPU exposing 32 vCPUs (physical cores? threads?)&lt;/li&gt;
&lt;li&gt;128GB RAM&lt;/li&gt;
&lt;li&gt;600GB NVMe&lt;/li&gt;
&lt;li&gt;192.49 EUR/month maximum charge. 0.3085 EUR per hour (if you keep the same
VPS active over the month it won&#x27;t exceed the monthly price cap, so you
effectively get a small discount on the per-hour cost).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;benchmark&quot;&gt;&lt;a href=&quot;#benchmark&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Benchmark&lt;/h2&gt;
&lt;p&gt;Building Clang+LLVM+LLD, everyone&#x27;s favourite workload! Both systems are
running an up to date Arch Linux (more details on setting this up on the CCX53
in the appendix below) with clang 21.1.6. The dedicated machine has the
advantage of RAID 0 across the two SSDs, but also has encrypted rootfs
configured. I didn&#x27;t bother to set that up for the CCX53 VPS.&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo pacman -Syu --needed clang lld cmake ninja wget
&lt;span style=&quot;color: #000&quot;&gt;LLVM_VER=&lt;/span&gt;&lt;span style=&quot;color: #1C01CE&quot;&gt;21&lt;/span&gt;.1.6
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-&lt;span style=&quot;color: #C41A16&quot;&gt;${&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;LLVM_VER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&lt;/span&gt;/llvm-project-&lt;span style=&quot;color: #C41A16&quot;&gt;${&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;LLVM_VER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&lt;/span&gt;.src.tar.xz
tar -xvf llvm-project-&lt;span style=&quot;color: #C41A16&quot;&gt;${&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;LLVM_VER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&lt;/span&gt;.src.tar.xz
&lt;span style=&quot;color: #A90D91&quot;&gt;cd&lt;/span&gt; llvm-project-&lt;span style=&quot;color: #C41A16&quot;&gt;${&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;LLVM_VER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;}&lt;/span&gt;.src

cmake -G Ninja &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DLLVM_ENABLE_PROJECTS&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;clang;lld&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DLLVM_TARGETS_TO_BUILD&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;all&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DLLVM_CCACHE_BUILD&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;OFF &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DCMAKE_C_COMPILER&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;clang &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DCMAKE_CXX_COMPILER&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;clang++ &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DLLVM_ENABLE_LLD&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;ON &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DCMAKE_BUILD_TYPE&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;Release &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -DLLVM_ENABLE_ASSERTIONS&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;ON &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -S llvm &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  -B build
&lt;span style=&quot;color: #A90D91&quot;&gt;time&lt;/span&gt; cmake --build build

&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;### Version info ###\n&amp;quot;&lt;/span&gt;
clang --version | head -n &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On both machines, ninja shows 5575 build steps.&lt;/p&gt;
&lt;p&gt;Results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AX102
&lt;ul&gt;
&lt;li&gt;10m27s (627s)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;CCX53
&lt;ul&gt;
&lt;li&gt;14m11s (851s, about 1.36x the AX102)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running the clang and LLVM tests with &lt;code&gt;./build/bin/llvm-lit -s --order=lexical llvm/test clang/test&lt;/code&gt; (which shows 9402 tests) gives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AX102
&lt;ul&gt;
&lt;li&gt;3m39s (219s)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;CCX53
&lt;ul&gt;
&lt;li&gt;4m28s (268s, about 1.22x the AX102)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I ran these multiple times, and in the case of the CCX53 across two different
VMs in different regions and saw only a few percentage points variance.&lt;/p&gt;
&lt;p&gt;Focusing on the results for building clang/llvm/lld, let&#x27;s figure out the cost
for 1000 from-scratch builds. Not so much because it&#x27;s a representative workload, but
because it gives an easy to compare metric that captures both the difference
in price and in performance. So calculating &lt;code&gt;time_per_build_in_hours * 1000 * cost_per_hour&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AX102
&lt;ul&gt;
&lt;li&gt;(626.6 / 3600) * 1000 * (104/720) = &lt;strong&gt;25.14 EUR&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Or if you include the setup fee and assume it&#x27;s amortised over 12 months:
&lt;ul&gt;
&lt;li&gt;(626.6/3600) * 1000 * ((104 + (39/12))/720) = &lt;strong&gt;25.93 EUR&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;CCX53
&lt;ul&gt;
&lt;li&gt;(850.6 / 3600) * 1000 * (192.49/720) = &lt;strong&gt;63.17 EUR&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Or using the 0.3085 EUR/hr price which you would pay if you didn&#x27;t run for
the whole month:
&lt;ul&gt;
&lt;li&gt;(850.6 / 3600) * 1000 * 0.3085 = &lt;strong&gt;72.89 EUR&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
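&lt;p&gt;For anyone wanting to plug in their own build times or prices, the
calculation above can be sketched as follows. The helper name is made up, and
the 720 hour month simply mirrors the figures used above.&lt;/p&gt;

```python
HOURS_PER_MONTH = 720  # 30-day month, as used in the figures above


def cost_per_1000_builds(build_seconds, eur_per_hour):
    """time_per_build_in_hours * 1000 * cost_per_hour"""
    return build_seconds / 3600 * 1000 * eur_per_hour


ax102 = cost_per_1000_builds(626.6, 104 / HOURS_PER_MONTH)
# Setup fee amortised over 12 months.
ax102_with_setup = cost_per_1000_builds(626.6, (104 + 39 / 12) / HOURS_PER_MONTH)
ccx53_capped = cost_per_1000_builds(850.6, 192.49 / HOURS_PER_MONTH)
ccx53_hourly = cost_per_1000_builds(850.6, 0.3085)

for name, eur in [
    ("AX102", ax102),
    ("AX102 incl. setup fee", ax102_with_setup),
    ("CCX53 at monthly cap", ccx53_capped),
    ("CCX53 at hourly rate", ccx53_hourly),
]:
    print(f"{name}: {eur:.2f} EUR")
```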
&lt;h2 id=&quot;appendix-ccx53-arch-linux-setup&quot;&gt;&lt;a href=&quot;#appendix-ccx53-arch-linux-setup&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Appendix: CCX53 Arch Linux setup&lt;/h2&gt;
&lt;p&gt;This could be scripted, but I just created the VPS via their web UI. Once
it was provisioned, I used the same web UI to boot it into a rescue system,
then did an Arch bootstrap that roughly mirrors the &lt;a href=&quot;/arch-linux-on-remote-server-setup-runbook&quot;&gt;one I use on a dedicated
build machine&lt;/a&gt; except
that we don&#x27;t bother with encrypting the rootfs. The CCX* server types at
least &lt;a href=&quot;https://docs.hetzner.cloud/changelog#2023-08-23-new-server-types-with-dedicated-amd-vcpus&quot;&gt;use
UEFI&lt;/a&gt;
so we can keep using efistub for boot.&lt;/p&gt;
&lt;p&gt;First get a bootstrap environment and enter it:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;wget http://mirror.hetzner.de/archlinux/iso/latest/archlinux-bootstrap-x86_64.tar.zst
tar -xvf archlinux-bootstrap-x86_64.tar.zst --numeric-owner
sed -i &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;1s;^;Server=https://mirror.hetzner.de/archlinux/$repo/os/$arch\n\n;&amp;#39;&lt;/span&gt; root.x86_64/etc/pacman.d/mirrorlist
mount --bind root.x86_64/ root.x86_64/ &lt;span style=&quot;color: #177500&quot;&gt;# See &amp;lt;https://bugs.archlinux.org/task/46169&amp;gt;&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;About to enter bootstrap chroot\n===============================\n&amp;quot;&lt;/span&gt;
./root.x86_64/bin/arch-chroot root.x86_64/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now set info that will be used throughout the process:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span style=&quot;color: #A90D91&quot;&gt;export&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;NEW_HOST_NAME=&lt;/span&gt;archvps
&lt;span style=&quot;color: #A90D91&quot;&gt;export&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;PUBLIC_SSH_KEY=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOfpPQ1j+XLsapAhONAQmvu6TZGT5y8jeziM4Vio1NrA asb@plurp&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;export&lt;/span&gt; &lt;span style=&quot;color: #000&quot;&gt;NEW_USER=&lt;/span&gt;asb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And now proceed to set up the disks, create filesystems, perform an initial
bootstrap and chroot into the new rootfs:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pacman-key --init
pacman-key --populate archlinux
pacman -Sy --noconfirm xfsprogs dosfstools

sfdisk /dev/sda &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;label: gpt&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;start=1MiB, size=255MiB, type=uefi&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;start=256MiB, type=linux&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;

mkfs.fat -F32 /dev/sda1
mkfs.xfs /dev/sda2

mount /dev/sda2 /mnt
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot
pacstrap /mnt base linux linux-firmware efibootmgr &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  xfsprogs dosfstools &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  python3 &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  openssh sudo net-tools git man-db man-pages vim
genfstab -U /mnt &amp;gt;&amp;gt; /mnt/etc/fstab

&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;About to enter newrootfs chroot\n===============================\n&amp;quot;&lt;/span&gt;
arch-chroot /mnt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Do final configuration from within the chroot:&lt;/p&gt;
&lt;div class=&quot;highlight&quot; style=&quot;background: #ffffff&quot;&gt;&lt;pre style=&quot;line-height: 125%;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sed /etc/locale.gen -i -e &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;s/^\#en_GB.UTF-8 UTF-8.*/en_GB.UTF-8 UTF-8/&amp;quot;&lt;/span&gt;
locale-gen
&lt;span style=&quot;color: #177500&quot;&gt;# Ignore &amp;quot;System has not been booted with systemd&amp;quot; and &amp;quot;Failed to connect to bus&amp;quot; error for next command.&lt;/span&gt;
systemd-firstboot --locale&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;en_GB.UTF-8 --timezone&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;UTC --hostname&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_HOST_NAME&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules &lt;span style=&quot;color: #177500&quot;&gt;# disable persistent network names&lt;/span&gt;

&lt;span style=&quot;color: #177500&quot;&gt;# No longer need to disable large fallback image as Arch stopped generating it&lt;/span&gt;
&lt;span style=&quot;color: #177500&quot;&gt;# by default&lt;/span&gt;

&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;efibootmgr before changes:\n==========================\n&amp;quot;&lt;/span&gt;
efibootmgr -u
&lt;span style=&quot;color: #177500&quot;&gt;# Set up efistub&lt;/span&gt;
efibootmgr &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --disk /dev/sda &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --part &lt;span style=&quot;color: #1C01CE&quot;&gt;1&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --create &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --label &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;Arch Linux&amp;#39;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --loader /vmlinuz-linux &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --unicode &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;root=/dev/sda2 rw initrd=\initramfs-linux.img&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;\&lt;/span&gt;
  --verbose
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;efibootmgr after changes:\n=========================\n&amp;quot;&lt;/span&gt;
efibootmgr -u

mkswap --size&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;8G --file /swapfile
cat - &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF &amp;gt; /etc/systemd/system/swapfile.swap&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Unit]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Description=Swap file&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[Swap]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;What=/swapfile&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[Install]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;WantedBy=multi-user.target&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
systemctl &lt;span style=&quot;color: #A90D91&quot;&gt;enable&lt;/span&gt; swapfile.swap

cat - &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;lt;&amp;lt;EOF &amp;gt; /etc/systemd/network/10-eth0.network&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;[Match]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Name=eth0&lt;/span&gt;

&lt;span style=&quot;color: #C41A16&quot;&gt;[Network]&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;DHCP=yes&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Address=$(ip -6 addr show dev eth0 scope global | grep &amp;quot;scope global&amp;quot; | cut -d&amp;#39; &amp;#39; -f6)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Gateway=$(ip route show | head -n 1 | cut -d&amp;#39; &amp;#39; -f 3)&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;Gateway=fe80::1&lt;/span&gt;
&lt;span style=&quot;color: #C41A16&quot;&gt;EOF&lt;/span&gt;
systemctl &lt;span style=&quot;color: #A90D91&quot;&gt;enable&lt;/span&gt; systemd-networkd.service systemd-resolved.service systemd-timesyncd.service
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;PasswordAuthentication no\n&amp;quot;&lt;/span&gt; &amp;gt; /etc/ssh/sshd_config.d/20-no-password-auth.conf
systemctl &lt;span style=&quot;color: #A90D91&quot;&gt;enable&lt;/span&gt; sshd.service
useradd -m -g users -G wheel -s /bin/bash &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;
usermod --pass&lt;span style=&quot;color: #000&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;#39;!&amp;#39;&lt;/span&gt; root &lt;span style=&quot;color: #177500&quot;&gt;# disable root login&lt;/span&gt;
chmod +w /etc/sudoers
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;%%wheel ALL=(ALL) ALL\n&amp;quot;&lt;/span&gt; &amp;gt;&amp;gt; /etc/sudoers
chmod -w /etc/sudoers
mkdir &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh&amp;quot;&lt;/span&gt;
&lt;span style=&quot;color: #A90D91&quot;&gt;printf&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;%s\n&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$PUBLIC_SSH_KEY&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt; &amp;gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh/authorized_keys&amp;quot;&lt;/span&gt;
chmod &lt;span style=&quot;color: #1C01CE&quot;&gt;700&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh&amp;quot;&lt;/span&gt;
chmod &lt;span style=&quot;color: #1C01CE&quot;&gt;600&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh/authorized_keys&amp;quot;&lt;/span&gt;
chown -R &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;:users&amp;quot;&lt;/span&gt; &lt;span style=&quot;color: #C41A16&quot;&gt;&amp;quot;/home/&lt;/span&gt;&lt;span style=&quot;color: #000&quot;&gt;$NEW_USER&lt;/span&gt;&lt;span style=&quot;color: #C41A16&quot;&gt;/.ssh&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now set a password for the new user:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;passwd &quot;$NEW_USER&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then press ctrl-d twice to exit both chroots and, back in the rescue system, set a symlink for resolv.conf:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ln -sf ../run/systemd/resolve/stub-resolv.conf root.x86_64/mnt/etc/resolv.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, &lt;code&gt;reboot&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Remember to run &lt;code&gt;ssh-keygen -R $THE_IP_ADDRESS&lt;/code&gt; on your local machine so you don&#x27;t get ssh host
key verification errors.&lt;/p&gt;

&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2025-11-30: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
<entry>
<title>Minipost: LLM inference vs training costs for DeepSeek</title>
<published>2025-11-29T12:00:00Z</published>
<updated>2025-11-29T12:00:00Z</updated>
<link rel="alternate" href="https://muxup.com/2025q4/minipost-llm-inference-vs-training-cost-for-deepseek"/>
<id>https://muxup.com/2025q4/minipost-llm-inference-vs-training-cost-for-deepseek</id>
<content type="html">
&lt;p&gt;Tl;dr: Based on published data from DeepSeek, we can estimate it takes
something like ~70 days of inference traffic (served by DeepSeek themselves,
ignoring any other providers) to match the GPU hours used for the final
training run for V3 and R1.&lt;/p&gt;
&lt;p&gt;Simon Willison recently &lt;a href=&quot;https://bsky.app/profile/simonwillison.net/post/3m6qdf5rffs2l&quot;&gt;reshared some figures on inference costs for
LLMs&lt;/a&gt;. I
couldn&#x27;t agree more with the comment further down that thread &quot;The big AI labs
continue to be infuriatingly opaque about the actual figures for their total
electricity and water consumption&quot;.&lt;/p&gt;
&lt;p&gt;A number of responses wonder about the cost of training. If you accept the
reported figures for serving a query, what impact does it have if you amortise
the energy spent training the model over the served queries? Mistral did this
for their &lt;a href=&quot;https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai&quot;&gt;lifecycle
analysis&lt;/a&gt;
but they grouped together &quot;training and inference&quot; and kept confidential the
ratio of energy for training vs inference by reporting a figure that combined
the training cost with 18 months of usage. The thread reminded me of another
datapoint available for DeepSeek that seemed worth writing up. I think this
gives some helpful intuition for the amortised cost of training for a widely
used model of that size but, to state the obvious, any attempt to apply that
intuition to other models is totally reliant on how widely used they are.&lt;/p&gt;
&lt;p&gt;DeepSeek have published figures both on training and on inference for
DeepSeek&#x27;s website and API users. I will attempt to consistently refer to the
figure for training as &quot;final run training cost&quot; to reflect the fact the
number of GPU hours used in experimentation and failed attempts isn&#x27;t
reported. For final run training for DeepSeek-R1:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2.788M H800 GPU hours for V3 which serves as the R1 base (see Table 1
in the &lt;a href=&quot;https://arxiv.org/pdf/2412.19437&quot;&gt;V3 technical report&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;0.147M H800 GPU hours for building R1 on top of V3 (see Supplementary Table
4 in the &lt;a href=&quot;https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_2025_9422_MOESM1_ESM.pdf&quot;&gt;supplementary
information&lt;/a&gt;
for the &lt;a href=&quot;https://www.nature.com/articles/s41586-025-09422-z&quot;&gt;R1 Nature
article&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total&lt;/strong&gt;: 2.935M H800 GPU hours&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now for inference, back in February 2025 DeepSeek wrote up &lt;a href=&quot;https://github.com/deepseek-ai/open-infra-index/blob/main/202502OpenSourceWeek/day_6_one_more_thing_deepseekV3R1_inference_system_overview.md&quot;&gt;details of their
inference
system&lt;/a&gt;
covering cost of serving, profit margin, and load over a 24h period.
So yes, we&#x27;re extrapolating from this datapoint and assuming it&#x27;s
representative. Given the worldwide inference of DeepSeek R1/V3 is surely much
larger (being openly licensed there are many vendors who are serving it), I&#x27;m
not overly worried about this aspect. Their reported average inference serving
infrastructure occupancy is 226.75 nodes (each node containing 8 H800 GPUs),
meaning &lt;strong&gt;43536 H800 GPU hours per day&lt;/strong&gt; (226.75 * 8 * 24). At that rate, it will take &lt;strong&gt;~67.4
days&lt;/strong&gt; of traffic for the same number of H800 GPU hours to be used for
inference as for the final training run.&lt;/p&gt;
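&lt;p&gt;The arithmetic behind those two figures can be reproduced with a few lines of awk. This is a sketch using only the numbers quoted above; the &lt;code&gt;break_even&lt;/code&gt; function name is hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper reproducing the estimate from the figures quoted above.
# $1: final run training cost in GPU hours, $2: average occupied nodes.
break_even() {
  awk -v train="$1" -v nodes="$2" 'BEGIN {
    gpus_per_node = 8                     # each node holds 8 H800 GPUs
    per_day = nodes * gpus_per_node * 24  # GPU hours of inference per day
    printf "inference GPU hours per day: %d\n", per_day
    printf "days to match final run training: %.1f\n", train / per_day
  }'
}

# 2.788M (V3) + 0.147M (R1 on top of V3) GPU hours, 226.75 nodes on average.
break_even 2935000 226.75
```

&lt;p&gt;With the reported inputs this prints 43536 GPU hours per day and 67.4 days.&lt;/p&gt;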
&lt;p&gt;All this to say, for a widely used model of DeepSeek R1 scale, accounting for
the amortised final run training cost when looking at the cost of inference is
more likely to mean a multiplier of 2x or less than something much
larger. In terms of energy, this does assume that the power draw of the H800
GPUs while running inference is similar to the draw during training. And to
underline again, the reported training cost surely doesn&#x27;t include
experimentation, aborted runs etc.&lt;/p&gt;
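&lt;p&gt;To make the multiplier concrete: after d days of serving at 43536 GPU hours per day, adding the 2.935M GPU hours of final run training scales the total by 1 + 2935000 / (43536 * d). A rough sketch follows, where the &lt;code&gt;amortised_multiplier&lt;/code&gt; name and the day counts are arbitrary illustrative choices and the load is assumed to stay constant at the reported average.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: factor by which inference GPU hours grow once the
# final run training GPU hours are amortised over `days` of serving.
# Assumes a constant load of 43536 GPU hours per day (the reported average).
amortised_multiplier() {
  awk -v days="$1" 'BEGIN {
    train = 2935000    # final run training GPU hours (V3 + R1)
    per_day = 43536    # inference GPU hours per day
    printf "%.2f\n", 1 + train / (per_day * days)
  }'
}

# Illustrative serving periods only, not figures from DeepSeek.
for d in 30 68 365; do
  printf "%s days: %sx\n" "$d" "$(amortised_multiplier "$d")"
done
```

&lt;p&gt;Under these assumptions the multiplier is 3.25x after 30 days, drops just below 2x at 68 days, and is about 1.18x after a year.&lt;/p&gt;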

&lt;hr style=&quot;margin-top:1.75rem&quot;/&gt;&lt;details id=&quot;article-changelog&quot;&gt;&lt;summary&gt;&lt;a href=&quot;#article-changelog&quot; class=&quot;anchor&quot; aria-hidden=&quot;true&quot; tabindex=&quot;-1&quot;&gt;&lt;/a&gt;Article changelog&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;2025-11-29: Initial publication date.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
</content>
</entry>
</feed>
