Storing data in pointers

2023Q4. Last update 2023Q4. History↓

Introduction

On mainstream 64-bit systems, the maximum bit-width of a virtual address is somewhat lower than 64 bits (commonly 48 bits). This gives an opportunity to repurpose those unused bits for data storage, if you're willing to mask them out before using your pointer (or have a hardware feature that does that for you - more on this later). I wondered what happens to userspace programs relying on such tricks as processors gain support for wider virtual addresses, hence this little blog post. TL;DR is that there's no real change unless certain hint values to enable use of wider addresses are passed to mmap, but read on for more details as well as other notes about the general topic of storing data in pointers.

Storage in upper bits assuming 48-bit virtual addresses

Assuming your platform has 48-bit wide virtual addresses, this is pretty straightforward. You can stash whatever you want in those 16 bits, but you'll need to ensure you masking them out for every load and store (which is cheap, but has at least some cost) and would want to be confident that there's no other attempted users for these bits. The masking would be slightly different in kernel space due to rules on how the upper bits are set:

What if virtual addresses are wider than 48 bits?

So we've covered the easy case, where you can freely (ab)use the upper 16 bits for your own purposes. But what if you're running on a system that has wider than 48 bit virtual addresses? How do you know that's the case? And is it possible to limit virtual addresses to the 48-bit range if you're sure you don't need the extra bits?

You can query the virtual address width from the command-line by catting /proc/cpuinfo, which might include a line like address sizes : 39 bits physical, 48 bits virtual. I'd hope there's a way to get the same information without parsing /proc/cpuinfo, but I haven't been able to find it.

As for how to keep using those upper bits on a system with wider virtual addresses, helpfully the behaviour of mmap is defined with this compatibility in mind. It's explicitly documented for x86-64, for AArch64 and for RISC-V that addresses beyond 48-bits won't be returned unless a hint parameter beyond a certain width is used (the details are slightly different for each target). This means if you're confident that nothing within your process is going to be passing such hints to mmap (including e.g. your malloc implementation), or at least that you'll never need to try to reuse upper bits of addresses produced in this way, then you're free to presume the system uses no more than 48 bits of virtual address.

Top byte ignore and similar features

Up to this point I've completely skipped over the various architectural features that allow some of the upper bits to be ignored upon dereference, essentially providing hardware support for this type of storage of additional data within pointeres by making additional masking unnecessary.

A relevant historical note that multiple people pointed out: the original Motorola 68000 had a 24-bit address bus and so the top byte was simply ignored which caused well documented porting issues when trying to expand the address space.

Storing data in least significant bits

Another commonly used trick I'd be remiss not to mention is repurposing a small number of the least significant bits in a pointer. If you know a certain set of pointers will only ever be used to point to memory with a given minimal alignment, you can exploit the fact that the lower bits corresponding to that alignment will always be zero and store your own data there. As before, you'll need to account for the bits you repurpose when accessing the pointer - in this case either by masking, or by adjusting the offset used to access the address (if those least significant bits are known).

As suggested by Per Vognsen, after this article was first published, you can exploit x86's scaled index addressing mode to use up to 3 bits that are unused due to alignment, but storing your data in the upper bits. The scaled index addressing mode meaning there's no need for separate pointer manipulation upon access. e.g. for an 8-byte aligned address, store it right-shifted by 3 and use the top 3 bits for metadata, then scaling by 8 using SIB when accessing (which effectively ignores the top 3 bits). This has some trade-offs, but is such a neat trick I felt I have to include it!

Some real-world examples

To state what I hope is obvious, this is far from an exhaustive list. The point of this quick blog post was really to discuss cases where additional data is stored alongside a pointer, but of course unused bits can also be exploited to allow a more efficient tagged union representation (and this is arguably more common), so I've included some examples of that below:

Fin

What did I miss? What did I get wrong? Let me know on Mastodon or email (asb@asbradbury.org).

You might be interested in the discussion of this article on lobste.rs, on HN, on various subreddits, or on Mastodon.


Article changelog