Storing data in pointers
Introduction
On mainstream 64-bit systems, the maximum bit-width
of a virtual address is somewhat lower than 64 bits (commonly 48 bits). This
gives an opportunity to repurpose those unused bits for data storage, if
you're willing to mask them out before using your pointer (or have a hardware
feature that does that for you - more on this later). I wondered what happens
to userspace programs relying on such tricks as processors gain support for
wider virtual addresses, hence this little blog post. TL;DR is that there's no
real change unless certain hint values to enable use of wider addresses are
passed to mmap
, but read on for more details as well as other notes about
the general topic of storing data in pointers.
Storage in upper bits assuming 48-bit virtual addresses
Assuming your platform has 48-bit wide virtual addresses, this is pretty
straightforward. You can stash whatever you want in those 16 bits, but you'll
need to ensure you masking them out for every load and store (which is cheap,
but has at least some cost) and would want to be confident that there's no
other attempted users for these bits. The masking would be slightly different
in kernel space due to rules on how the upper bits are set:
What if virtual addresses are wider than 48 bits?
So we've covered the easy case, where you can freely (ab)use the upper 16 bits
for your own purposes. But what if you're running on a system that has wider
than 48 bit virtual addresses? How do you know that's the case? And is it
possible to limit virtual addresses to the 48-bit range if you're sure you
don't need the extra bits?
You can query the virtual address width from the command-line by cat
ting
/proc/cpuinfo
, which might include a line like address sizes : 39 bits physical, 48 bits virtual
. I'd hope there's a way to get the same information
without parsing /proc/cpuinfo
, but I haven't been able to find it.
As for how to keep using those upper bits on a system with wider virtual
addresses, helpfully the behaviour of mmap
is defined with this
compatibility in mind. It's explicitly documented for
x86-64,
for
AArch64 and
for RISC-V
that addresses beyond 48-bits won't be returned unless a hint parameter beyond
a certain width is used (the details are slightly different for each target).
This means if you're confident that nothing within your process is going to be
passing such hints to mmap
(including e.g. your malloc
implementation), or
at least that you'll never need to try to reuse upper bits of addresses
produced in this way, then you're free to presume the system uses no more than
48 bits of virtual address.
Top byte ignore and similar features
Up to this point I've completely skipped over the various architectural
features that allow some of the upper bits to be ignored upon dereference,
essentially providing hardware support for this type of storage of additional
data within pointeres by making additional masking unnecessary.
- x86-64 keeps things interesting by having slightly different variants of
this for Intel and AMD.
- Intel introduced Linear Address Masking (LAM), documented in chapter 6 of
their document on instruction set extensions and future
features. If enabled
this modifies the canonicality check so that, for instance, on a system
with 48-bit virtual addresses bit 47 must be equal to bit 63. This would
allow bits 62:48 (15 bits) can be freely used with no masking needed.
"LAM57" allows 62:57 to be used (6 bits). It seems as if Linux is
currently opting to only support LAM57 and not
LAM48. Support for LAM can be
configured separately for user and supervisor mode, but I'll refer you to
the Intel docs for details.
- AMD instead describes Upper Address
Ignore
(see section 5.10) which allows bits 63:57 (7 bits) to be used, and unlike
LAM doesn't require bit 63 to match the upper bit of the virtual address.
As documented in LWN, this caused some concern from the Linux kernel
community. Unless I'm missing it, there
doesn't seem to be any level of support merged in the Linux kernel at the
time of writing.
- RISC-V has the proposed pointer masking
extension
which defines new supervisor-level extensions Ssnpm, Smnpm, and Smmpm to
control it. These allow
PMLEN
to potentially be set to 7 (masking the
upper 7 bits) or 16 (masking the upper 16 bits). In usual RISC-V style, it's
not mandated which of these are supported, but the draft RVA23 profile
mandates that PMLEN=7 must be supported at a
minimum.
Eagle-eyed readers will note that the proposed approach has the same issue
that caused concern with AMD's Upper Address Ignore, namely that the most
significant bit is no longer required to be the same as the top bit of the
virtual address. This is
noted
in the spec, with the suggestion that this is solvable at the ABI level and
some operating systems may choose to mandate that the MSB not be used for
tagging.
- AArch64 has the Top Byte
Ignore (TBI)
feature, which as the name suggests just means that the top 8 bits of a
virtual address are ignored when used for memory accesses and can be used to
store data. Any other bits between the virtual address width and top byte
must be set to all 0s or all 1s, as before. TBI is also used by Arm's
Memory Tagging
Extension
(MTE), which uses 4 of those bits as the "key" to be compared against the
"lock" tag bits associated with a memory location being accessed. Armv8.3
defines another potential consumer of otherwise unused address bits,
pointer
authentication
which uses 11 to 31 bits depending on the virtual address width if TBI isn't
being used, or 3 to 23 bits if it is.
A relevant historical note that multiple people pointed out: the original
Motorola 68000 had a 24-bit address bus and so the top byte was simply
ignored which caused well documented porting issues when trying to expand the
address space.
Storing data in least significant bits
Another commonly used trick I'd be remiss not to mention is repurposing a
small number of the least significant bits in a pointer. If you know a certain
set of pointers will only ever be used to point to memory with a given minimal
alignment, you can exploit the fact that the lower bits corresponding to that
alignment will always be zero and store your own data there. As before,
you'll need to account for the bits you repurpose when accessing the pointer -
in this case either by masking, or by adjusting the offset used to access the
address (if those least significant bits are known).
As suggested by Per
Vognsen,
after this article was first published, you can exploit x86's scaled index
addressing mode to use up
to 3 bits that are unused due to alignment, but storing your data in the upper bits.
The scaled index addressing mode meaning there's no need for separate pointer
manipulation upon access. e.g. for an 8-byte aligned address, store it
right-shifted by 3 and use the top 3 bits for metadata, then scaling by 8
using SIB when accessing (which effectively ignores the top 3 bits). This has
some trade-offs, but is such a neat trick I felt I have to include it!
Some real-world examples
To state what I hope is obvious, this is far from an exhaustive list. The
point of this quick blog post was really to discuss cases where additional
data is stored alongside a pointer, but of course unused bits can also be
exploited to allow a more efficient tagged union representation (and this is
arguably more common), so I've included some examples of that below:
- The fixie trie, is a
variant of the trie that uses 16 bits in each pointer to store a bitmap used
as part of the lookup logic. It also exploits the minimum alignment of
pointers to repurpose the least significant bit to indicate if a value is a
branch or a leaf.
- On the topic of storing data in the least significant bits, we have a handy
PointerIntPair
class in LLVM to allow the easy implementation of this optimisation. There's
also an 'alignment niches'
proposal for Rust which would
allow this kind of optimisation to be done automatically for
enum
s (tagged
unions). Another example of repurposing the LSB found in the wild would be
the Linux kernel using it for a spin
lock
(thanks Vegard Nossum for the
tip, who
notes this is used in the kernel's directory entry cache hashtable). There
are surely many many more examples.
- Go repurposes both upper and lower bits in its
taggedPointer,
used internally in its runtime implementation.
- If you have complete control over your heap then there's more you can do to
make use of embedded metadata, including using additional bits by avoiding
allocation outside of a certain range and using redundant mappings to avoid
or reduce the need for masking. OpenJDK's ZGC is a good example of
this, utilising a
42-bit address space for objects and upon allocation mapping pages to
different aliases to allow pointers using their metadata bits to be
dereferenced without masking.
- A fairly common trick in language runtimes is to exploit the fact that
values can be stored inside the payload of double floating point NaN (not a
number) values and overlap it with pointers (knowing that the full 64 bits
aren't needed) and even small integers. There's a nice description of this
in
JavaScriptCore,
but it was famously used in
LuaJIT. Andy Wingo
also has a helpful
write-up.
Along similar lines, OCaml steals just the least significant bit in order to
efficiently support unboxed
integers
(meaning integers are 63-bit on 64-bit platforms and 31-bit on 32-bit
platforms).
- Apple's Objective-C implementation makes heavy use of unused pointer bits,
with some examples
documented
in
detail
on Mike Ash's excellent blog (with a more recent scheme described on Brian
T. Kelley's blog.
Inlining the reference count (falling back to a hash lookup upon overflow)
is a fun one. Another example of using the LSB
to store small strings in-line is squoze.
- V8 opts to limit the heap used for V8 objects to 4GiB using pointer
compression, where an offset is
used alongside the 32-bit value (which itself might be a pointer or a 31-bit
integer, depending on the least significant bit) to refer to the memory
location.
- As this list is becoming more of a collection of things slightly outside the
scope of this article I might as well round it off with the XOR linked
list, which reduces the
storage requirements for doubly linked lists by exploiting the reversibility
of the XOR operation.
- I've focused on storing data in conventional pointers on current commodity
architectures but there is of course a huge wealth of work involving tagged
memory (also an area where I've
dabbled -
something for a future blog post perhaps) and/or alternative pointer
representations. I've touched on this with MTE (mentioned due to its
interaction with TBI), but another prominent example is of course
CHERI which moves
to using 128-bit capabilities in order to fit in additional inline metadata.
David Chisnall provided some observations based on porting code to CHERI
that relies on the kind of tricks described in this
post.
Fin
What did I miss? What did I get wrong? Let me know on
Mastodon or email (asb@asbradbury.org).
You might be interested in the discussion of this article on
lobste.rs, on
HN, on various
subreddits,
or on Mastodon.
Article changelog
- 2023-12-02:
- Reference Brian T. Kelley's blog providing a more up-to-date description
of "pointer tagging" in Objective-C. Spotted on
Mastodon.
- 2023-11-27:
- Mention Squoze (thanks to Job van der
Zwan).
- Reworded the intro so as not to claim "it's quite well known" that the
maximum virtual address width is typically less than 64 bits. This might
be interpreted as shaming readers for not being aware of that, which
wasn't my intent. Thanks to HN reader jonasmerlin for pointing this
out.
- Mention CHERI is the list of "real world examples" which is becoming
dominated by instances of things somewhat different to what I was
describing! Thanks to Paul Butcher for the
suggestion.
- Link to relevant posts on Mike Ash's blog
(suggested by
Jens Alfke).
- Link to the various places this article is being discussed.
- Add link to M68k article
suggested
by Christer Ericson (with multiple others suggesting something similar -
thanks!).
- 2023-11-26:
- Minor typo fixes and rewordings.
- Note the Linux kernel repurposing the LSB as a spin lock (thanks to Vegard
Nossum for the
suggestion).
- Add SIB addressing idea shared by Per
Vognsen.
- Integrate note
suggested
by Andy Wingo that explicit masking often isn't needed when the least
significant pointer bits are repurposed.
- Add a reference to Arm Pointer Authentication (thanks to the
suggestion from
Tzvetan Mikov).
- 2023-11-26: Initial publication date.