Missing FEATs: D128
26 Dec 2023
The 2022 and 2023 ARM architecture extensions introduced quite a lot of interesting features, but many of them sadly remain without good documentation in the official architecture reference manual. Luckily, we don't just have the manual: we also have ASL and FVPs.
FEAT_D128 introduces a new form of 128-bit (rather than 64-bit) wide memory descriptors, as part of the «VMSAv9-128».
What's in those new descriptor formats?
General Outline
The use of the new descriptor format is controlled by TCR2_ELx.D128 for Stage 1 translations, except that it cannot be enabled for the EL2 translation regime (i.e. the single-stage single-privilege-level translation regime used by nVHE EL2 code). When it is enabled, several other translation features must be enabled too, as shown by the constraint that the corresponding TCR2_ELx bits are RES1 in this situation: PnCH (part of the translation hardening extension, which I will discuss in another post); AIE, which makes memory attribute indices 4 bits wide rather than 3; and, most interestingly, PIE. The new descriptor formats rely on the permission indirection extension and do not contain the legacy AP[2:1] bits.
For Stage 2 translations, the use of the new descriptor format is controlled by VTCR_EL2.D128. As in Stage 1, VTCR_EL2.S2PIE is also RES1 when this is enabled. Strangely, VTCR_EL2.AssuredOnly is RES0 when D128 is enabled, but the AssuredOnly bit (in bit 114 rather than bit 58, where it is located for 64-bit descriptors) is nevertheless unconditionally enabled when D128 is enabled.
Before considering the actual details of the descriptor formats,
it is worth noting that the doubling in size of the descriptors
naturally also halves the number of them that fit within one
page. Rather than make a table at a given level take up two
pages, the architecture reduces the number of bits resolved with
each table lookup. For example, a configuration with 48-bit
input addresses and 4k pages would usually use four levels of
tables: there are \(\oldstyle 36\) (\(\oldstyle 48 - 12\)) bits
of input address to resolve, and since each 4k table can contain
\(\oldstyle 2^9\) (\(\oldstyle 512 = 4096/8\)) 8-byte
descriptors, each table resolves 9 bits leading to four tables.
With D128 enabled, however, each table contains only 256 16-byte descriptors, and so each level of lookup resolves only 8 bits of the input address. Resolving all 36 bits then takes 5 table lookups, since four lookups would cover only 32 of them.
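The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the counting argument: the function name and parameters are made up for illustration, and it assumes every table occupies exactly one page (so it ignores concatenated tables and the skip-level mechanism described later).

```python
def lookup_levels(input_address_bits: int, page_bytes: int, descriptor_bytes: int) -> int:
    """Count the table lookups needed to resolve an input address,
    assuming each table is exactly one page (illustrative assumption)."""
    offset_bits = page_bytes.bit_length() - 1        # bits covered by the page offset
    entries = page_bytes // descriptor_bytes         # descriptors per one-page table
    bits_per_level = entries.bit_length() - 1        # log2(entries)
    bits_to_resolve = input_address_bits - offset_bits
    return -(-bits_to_resolve // bits_per_level)     # ceiling division


print(lookup_levels(48, 4096, 8))    # 64-bit descriptors  -> 4
print(lookup_levels(48, 4096, 16))   # 128-bit descriptors -> 5
```

The ceiling division is what turns the 4.5 "levels' worth" of bits in the 128-bit case into a fifth lookup.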
Table Descriptors
Bits | Field
---|---
127 | NSTable
126:125 | APTable
124:123 | *XNTable
122:115 |
114 | Protected or AssuredOnly
113:111 | DisCH (a single bit within this range)
110:109 | skl
108:56 |
55:\(m\) | Next-level table address
\(m-1\):11 |
10 | A
9:7 |
6 | nT
5:1 |
0 | 1
Turning our attention first to the table descriptor format, the most obvious change, beyond the extension of the next-level table's physical address to 56 bits, is the removal of bit 1's function of indicating whether this descriptor is intended for a table or a block. This function has instead been taken on by the two-bit skl field, which specifies a number of descriptor levels, after this one, to skip. A descriptor is treated as a block descriptor if its skip-level field indicates that the next descriptor would be for Level 4 (n.b. as in the VMSAv8-64, the maximum lookup level is 3, and translation configurations which require more than 4 levels begin at levels below 0).
Unlike the traditional block/table descriptor dichotomy, however, earlier table descriptors can also contain non-zero skl fields. In these cases, the number of bits of input address resolved by the next table is increased, with a concomitant increase in table size beyond one page. TTBRx_ELx (and VTTBR_EL2) also contain SKLx (resp. SKL) fields which behave in a similar way to skip some number of initial levels of lookup.
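One way to read the skip-level rules described above is sketched below, for 4k pages only. The function name, the return shape, and the decision to reject an skl that would skip past Level 4 are all assumptions made for this illustration; they are not taken from the ASL. The sketch also assumes that skipping a level simply adds that level's 8 bits to the index of the next table.

```python
STRIDE_BITS_4K_D128 = 8   # 4096 / 16 = 256 descriptors per one-page table
DESCRIPTOR_BYTES = 16


def follow_descriptor(level: int, skl: int):
    """Classify a valid descriptor found at `level` with skip-level field `skl`.

    Returns ("leaf", None, None) for a block/page descriptor, or
    ("table", next_level, table_bytes) for a table descriptor.
    """
    if not 0 <= skl <= 3:
        raise ValueError("skl is a two-bit field")
    target = level + 1 + skl              # level of the next descriptor
    if target > 4:
        # Assumption: treated as an error here; the architected behaviour
        # is not covered by this sketch.
        raise ValueError("skl skips past the last lookup level")
    if target == 4:
        return ("leaf", None, None)       # the "next descriptor" would be Level 4
    resolved_bits = STRIDE_BITS_4K_D128 * (1 + skl)
    return ("table", target, DESCRIPTOR_BYTES << resolved_bits)


print(follow_descriptor(3, 0))   # an ordinary page descriptor
print(follow_descriptor(1, 2))   # a block descriptor at Level 1
print(follow_descriptor(0, 1))   # a table descriptor that skips Level 1
```

Under these assumptions, a Level 0 descriptor with skl = 1 points at a 1 MiB table indexed by 16 bits of input address, which is the "table size beyond one page" effect mentioned above.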
The purpose of assigning bits 123 to 126 as various XNTable and APTable bits is unclear: the ASL (in AArch64.S1ApplyTablePerms) quite deliberately saves these bits as APTable/XNTable/PXNTable/UXNTable, but the saved values are never used, since D128 implies PIE in every regime.
The FEAT_THE Protected (Stage 1)/AssuredOnly (Stage 2) bit is moved from bit 52 to bit 114, presumably to make room for the larger physical addresses.
The nT bit, used to avoid break-before-make when replacing a block descriptor with a table descriptor or vice versa and previously assigned to bit 16 of leaf descriptors, is given bit 6 of table descriptors, presumably since the skl field makes «blocks of tables» possible.
The DisCH bit is entirely new, and, for Stage 1 translations only, disables the effect of the Contiguous bit in leaf descriptors under this table. Similarly, the A bit is new and, for both Stage 1 and Stage 2 translations, provides a table-level Access flag, which can be managed by hardware.
Leaf Descriptors
Bits | Field
---|---
127 | NS
126:125 |
124:121 | POI
120:119 |
118:115 | PII
114 | P
113 | G
112 |
111 | C
110:109 | skl
108:56 |
55:\(m\) | Base address
\(m-1\):12 |
11 | NSE or nG or FnXS
10 | AF
9:8 | SH
7 | nDirty
6 | nT
5:2 | AttrIndex
1 |
0 | 1
The changes to the leaf descriptors follow a similar pattern, with fewer feature additions. The skip-level bits are in the same position and continue to replace the function of bit 1. The NS bit is moved to bit 127, consistent with the table descriptor format above. Since the 128-bit descriptors are used only with permission indirection enabled, the AP[2:1] bits are no longer necessary, and bit 6 is reassigned to nT while bit 7 unconditionally reprises its role as the dirty bit. The memory attribute index is extended to 4 bits at Stage 1, as well as Stage 2, taking over the NS bit's former position, and the permission overlay index is also extended to 4 bits. The permission indirection bits are consolidated into bits 115 to 118, rather than being spread across bits 6, 51, 53, and 54 as they were before. The Protected bit occupies the same position as it does for table descriptors; the Guarded bit (part of FEAT_GCS) is next to it in bit 113, and the Contiguous bit is moved to bit 111. Altogether rather pedestrian changes in return for doubling the size of descriptors!
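For poking at descriptors dumped from an FVP, the leaf layout above can be turned into a small decoder. This is a sketch rather than anything official: the bit positions are the ones given in this post, the Python names (including `bit0` and `NSE_nG_FnXS`) are invented for illustration, and the base address is left out because its lower bound \(m\) depends on the configuration.

```python
# (high bit, low bit) of each labelled leaf-descriptor field, as listed in this post.
LEAF_FIELDS = {
    "NS": (127, 127),
    "POI": (124, 121),
    "PII": (118, 115),
    "P": (114, 114),
    "G": (113, 113),
    "C": (111, 111),
    "skl": (110, 109),
    "NSE_nG_FnXS": (11, 11),   # meaning depends on the translation regime
    "AF": (10, 10),
    "SH": (9, 8),
    "nDirty": (7, 7),
    "nT": (6, 6),
    "AttrIndex": (5, 2),
    "bit0": (0, 0),            # shown as a constant 1 in the layout above
}


def bits(value: int, high: int, low: int) -> int:
    """Extract value[high:low] (inclusive) from an integer."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def decode_leaf(descriptor: int) -> dict:
    """Split a 128-bit leaf descriptor into its labelled fields."""
    return {name: bits(descriptor, hi, lo) for name, (hi, lo) in LEAF_FIELDS.items()}


# A made-up descriptor: bit 0 set, AttrIndex 0b1010, AF set, PII 0b0011.
example = 1 | (0b1010 << 2) | (1 << 10) | (0b0011 << 115)
fields = decode_leaf(example)
print(fields["AttrIndex"], fields["AF"], fields["PII"])
```

Python integers are arbitrary precision, so the 128-bit value needs no special handling; in C the same thing would need a pair of 64-bit words or `unsigned __int128`.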