Test Points That ISA Simulation Cannot Cover in ARM Architecture Verification (Full Version)

Published 2026-05-24 | Updated 2026-05-26 | tech

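Background: In CPU top-level verification, a common approach is co-simulation of an ISA model (an instruction set simulator) with the RTL (register-transfer level) design: the same program runs on both, and the results are compared to check that the architectural behavior is correct. Some behaviors, however, cannot be modeled by an ISA simulator, so co-simulation cannot produce a meaningful comparison for them. Based on the Arm Architecture Reference Manual (ARM DDI 0487), this article systematically catalogs these blind spots.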

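I. TLB Architecture (D8 + K1)

1.1 Fault Caching Asymmetry (D8.15.1)

Different types of translation table fault behave very differently with respect to TLB caching:

Fault type	Cached in the TLB?	Spec tag
Translation fault	Not cached	RMLNTS, IXFTPJ
Address size fault	Not cached	RGGQPR, IPBYZQ
Access flag fault	Not cached	IPNQBP, ITQXPT
Permission fault	May be cached	RJVXRH, IWJLXV

Spec text (Permission fault):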

“A translation table entry that generates a Permission fault is permitted to be cached in a TLB.” (RJVXRH)

“If software updates a stage 1 or stage 2 translation table due to a Permission fault, then the software is required to invalidate the appropriate TLB entry to prevent stale information in a TLB from being used by a subsequent memory access.” (IWJLXV)

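Why can't this be co-simulated?

The ISA model has no TLB structure; every translation starts again from the translation tables in memory
The RTL TLB may hold an old entry that generates a Permission fault, so changing the permissions without a TLBI makes the two diverge
Most importantly, Permission fault is the only fault type that may be cached, so software needs no TLBI after fixing a Translation or Access flag fault but must issue one after fixing a Permission fault
The ISA model cannot see this difference because it never observes a stale entry

Impact: HIGH. This directly masks missing TLB maintenance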
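The divergence can be reproduced with a toy model. The sketch below is only an illustration: `WalkOnlyModel`, `TlbModel`, and the permission strings are hypothetical names invented here, and a real TLB is far more complex than a dictionary.

```python
class WalkOnlyModel:
    """ISA-style reference: every access re-reads the page table."""

    def __init__(self, page_table):
        self.page_table = page_table

    def store(self, va):
        return "ok" if "w" in self.page_table[va] else "permission_fault"


class TlbModel(WalkOnlyModel):
    """RTL-style model: the first walk fills a TLB entry that is reused."""

    def __init__(self, page_table):
        super().__init__(page_table)
        self.tlb = {}

    def store(self, va):
        if va not in self.tlb:
            # The fill may cache an entry that generates a Permission fault.
            self.tlb[va] = self.page_table[va]
        return "ok" if "w" in self.tlb[va] else "permission_fault"

    def tlbi(self, va):
        self.tlb.pop(va, None)


def run_scenario(do_tlbi):
    table = {0x1000: "r"}                 # read-only mapping (assumed)
    isa, rtl = WalkOnlyModel(table), TlbModel(table)
    first = (isa.store(0x1000), rtl.store(0x1000))
    table[0x1000] = "rw"                  # software fixes the permission
    if do_tlbi:
        rtl.tlbi(0x1000)
    second = (isa.store(0x1000), rtl.store(0x1000))
    return first, second
```

Without the TLBI, the second store succeeds in the walk-only model and still faults in the TLB model; with the TLBI, both succeed.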

1.2 TLB Multi-Hit Conflicts (D8.15.1.6)

Spec text:

“When a lookup address hits multiple TLB entries, it is IMPLEMENTATION DEFINED whether a TLB conflict abort is generated.” (RZQNWZ)

“When a TLB has not been properly invalidated, such as when architecturally required TLB invalidation is not done, an address lookup might hit multiple TLB entries.” (ICNNYQ)

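How a TLB conflict abort is reported:

Instruction fetch → Instruction Abort
Data access or cache maintenance instruction → Data Abort
AT instruction (in specific cases) → Data Abort
AT instruction (other cases) → no abort is generated (IDHRWD)
Disabled translation stage → no abort is generated (RZMVZT)

Why can't this be co-simulated?

The TLB in the ISA model is always empty (every access walks the translation tables), so multiple entries can never match the same address
The entire TLB conflict abort exception class is invisible in ISA simulation
It is the only direct symptom by which hardware reveals a missing TLBI
Both the triggering and the reporting of a TLB conflict abort are IMPLEMENTATION DEFINED

Impact: HIGH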

1.3 TLB Lockdown and TLBI Interaction (D8.16.2, D8.17.4)

Spec text (Lockdown):

“TLB lockdown support is IMPLEMENTATION DEFINED.” (RSSQZC)

“A locked TLB entry is guaranteed to remain in the TLB, unless the locked TLB entry is affected by a TLB maintenance operation.” (RXXDPS)

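Spec text (effect of TLBI on locked entries):

TLBI ALL operations: if an entry is locked, the IMPLEMENTATION DEFINED behavior is either that the entry is not affected or that an IMPLEMENTATION DEFINED Data Abort is generated (RVGPNS)

TLBI by VA/ASID operations: if an entry is locked, the IMPLEMENTATION DEFINED behavior is that the entry is invalidated, that it is not affected, or that an IMPLEMENTATION DEFINED Data Abort is generated (RBMHZW)

Why can't this be co-simulated?

The ISA model has no TLB entries, so there is nothing to lock
A TLBI is always a no-op in the ISA model (the TLB is empty) and always succeeds
The case where a locked entry survives TLBI ALL cannot be tested
The case where TLBI ALL triggers an IMPLEMENTATION DEFINED Data Abort cannot be tested
TLB lockdown, which RTOS and safety-critical systems commonly use, cannot be verified at all

Impact: HIGH (for software that uses TLB lockdown)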

1.4 Break-Before-Make Violations (D8.17.1)

Spec text:

BBM sequence requirement: Invalid entry → DSB → TLBI → DSB → New entry → DSB (RDDMVT)

“If translation table entries are changed without appropriate TLB maintenance operations… it is possible that TLBs concurrently hold multiple different copies of those translation table entries.” (RFVQCK)

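Why can't this be co-simulated?

The ISA model has no TLB, so there is no window in which an "old entry" and a "new entry" coexist
After the invalid entry is written, the ISA model reads the new value immediately and needs no DSB/TLBI
It cannot detect whether software skipped the TLBI step
It cannot detect whether software skipped a DSB barrier
BBM violations are among the hardest problems to localize in chip design, yet ISA simulation cannot cover them at all

Impact: HIGH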
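Because the reference model cannot see a skipped step, one option is a separate checker that inspects the operations a test actually issued. The sketch below assumes a hypothetical testbench that logs operation names for a single translation table entry update; the names in `BBM_SEQUENCE` are illustrative labels, not architectural mnemonics, and the checker only compares against the six-step order listed in this section.

```python
# Required order from the text: invalid entry, DSB, TLBI, DSB, new entry, DSB.
BBM_SEQUENCE = ["write_invalid", "dsb", "tlbi", "dsb", "write_new", "dsb"]


def first_bbm_violation(ops):
    """Return the index of the first step that deviates, or None if ops match.

    `ops` is a hypothetical list of operation names logged by a testbench
    for one translation table entry update.
    """
    for i, expected in enumerate(BBM_SEQUENCE):
        if i >= len(ops) or ops[i] != expected:
            return i
    return None
```

A log that goes straight from the first DSB to the new entry is flagged at index 2, the position where the TLBI was expected.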

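1.5 CONSTRAINED UNPREDICTABLE Behavior with Multiple TLB Entries (D8.17.1)

Spec text:

CU behavior permits one of the following outcomes: translation using one of the matching entries, or using an amalgamation of the multiple matching entries (RFVQCK)

However, it must not permit access to regions of memory that could not otherwise be accessed in the current Security state or Exception level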

“If multiple TLB entries translate the same address, then the minimum set of TLB maintenance operations required to guarantee all TLB entries associated with that address and translation regime have been invalidated is IMPLEMENTATION DEFINED.” (RGRVDR)

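Why can't this be co-simulated?

The ISA model always uses the single entry in memory and cannot explore the combination space of CU behaviors
The security invariant cannot be verified (the CU definition forbids privilege escalation, but there is no way to confirm that a specific implementation obeys the rule)
The minimum TLBI scope is IMPLEMENTATION DEFINED, and the ISA model can offer no guidance

Impact: HIGH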

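1.6 Intermediate TLB / PA-Indexed Caches + CnP Sharing Across PEs (D8.17)

Spec text (FEAT_nTLBPA):

If FEAT_nTLBPA is not supported, intermediate TLB structures may be indexed by the PA or IPA of the location holding the translation table entry (RBKKRB, RTJQVP)

FEAT_nTLBPA allows software to determine whether intermediate TLB caching structures indexed by PA/IPA exist (IPFNFJ)

Spec text (Common not Private, CnP):

If PEs sharing CnP translation tables have different TTBR values, the system is misconfigured, and the TLB may generate a conflict abort or exhibit CU behavior (RQLGWZ)

Why can't this be co-simulated?

The ISA model has no intermediate TLB caching structures and no notion of PA/IPA indexing
A single-PE ISA simulation cannot detect cross-PE TLB sharing problems
In nested virtualization, where the translation tables themselves are also mapped, the ISA model cannot detect a stale intermediate cache

Impact: MEDIUM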

1.7 Speculative Access Flag Updates (D8)

Spec text (D8, page 107):

“If a speculative update of a stage 1 Access flag would otherwise be permitted, but the stage 2 translation of the stage 1 descriptor is read-only, then the speculative update of the stage 1 Access flag does not occur.”

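This implies that in the normal case (stage 2 is read-write), hardware is allowed to speculatively set the AF bit of a translation table entry during a speculative translation.

Why can't this be co-simulated?

The ISA model updates AF only when an architecturally committed instruction performs address translation
The RTL can set AF during translations on a speculative path
For addresses touched only on a speculative path, the AF bit may be set to 1 in the RTL even though the ISA model never accessed that address
Software can observe the difference by walking the translation tables: entries with AF=1 in the RTL's tables have AF=0 in the ISA model's tables
This differs from the AF updates required for committed accesses; speculative AF updates are outside the scope of the ISA model

Impact: HIGH. The AF difference is software-visible and affects later behavior (for example, swap policy)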

1.8 Stale TLB Entries After EPDn Disables Table Walks (D8.15.1)

Spec text:

“If a stage 1 translation regime supports two VA ranges and TCR_ELx.EPDn is 1, then when a TLB miss occurs based on TTBRn_ELx, a level 0 Translation fault is generated, and no translation table walk is done.”

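EPD0/EPD1 (bits in TCR_EL1, TCR_EL2, and TCR_EL3) control whether translation table walks using the corresponding TTBR are allowed. When EPDn=1, a TLB miss on an access in that TTBR's range performs no translation table walk and reports a level 0 Translation fault directly.

Why can't this be co-simulated?

Software flow:
1. TCR_EL1.EPD0 = 0 (TTBR0 walks allowed); TTBR0 points to valid translation tables
2. Access a VA in the TTBR0 range → the RTL TLB caches the translation (TLB fill)
3. Software sets TCR_EL1.EPD0 = 1 (TTBR0 walks disabled)
4. Access the same VA again

RTL: TLB hit (stale entry) → translation succeeds, access completes normally
ISA: no TLB → always a "TLB miss" → checks EPD0=1 → reports a level 0 Translation fault

Root cause analysis:

EPDn is checked only after a TLB miss, so a TLB hit bypasses EPDn
The ISA model has no TLB structure and can never hit, so EPDn always takes effect in the ISA model
If software sets EPDn=1 without executing a TLBI, entries remaining in the RTL TLB can still serve translations
When the TLB holds an entry, the RTL access completes normally while the ISA model reports a fault, so the two diverge completely

Verification impact analysis:

If software ensures the TLB has been invalidated (by executing a TLBI) before setting EPDn=1, the ISA model and the RTL behave identically
If software does not execute a TLBI, the RTL may access normally while the ISA model faults, and co-simulation cannot compare them
A subtler case: EPDn goes 0→1→0 with no TLBI in between, and entries left in the TLB remain valid once EPDn returns to 0

Impact: HIGH. This is a typical blind spot in the interaction between TLB maintenance and translation control fields; the architecture makes software responsible for TLB consistency when EPDn changes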
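The four-step flow above can be sketched as a small lookup function in which the EPD bit is consulted only on a miss. Everything here is an illustrative assumption: the function name, the dictionary-based TLB, and the addresses are made up, and passing `tlb=None` stands in for an ISA simulator with no TLB.

```python
def translate(va, tlb, page_table, epd):
    """Toy stage 1 lookup: EPD is only consulted on a TLB miss.

    Passing tlb=None models an ISA simulator with no TLB at all.
    """
    if tlb is not None and va in tlb:
        return tlb[va]                    # hit: EPD is never checked
    if epd:
        return "translation_fault_l0"     # miss with EPDn=1: no walk
    pa = page_table[va]                   # walk the (toy) page table
    if tlb is not None:
        tlb[va] = pa                      # fill the TLB
    return pa


def epd_scenario():
    table, tlb = {0x4000: 0x80004000}, {}
    # Steps 1-2: EPD0=0, both models translate; the RTL-like TLB fills.
    assert translate(0x4000, None, table, epd=False) == 0x80004000
    assert translate(0x4000, tlb, table, epd=False) == 0x80004000
    # Steps 3-4: EPD0=1 without a TLBI, then the same VA is accessed again.
    isa = translate(0x4000, None, table, epd=True)
    rtl = translate(0x4000, tlb, table, epd=True)
    return isa, rtl
```

In this sketch `epd_scenario()` returns a level 0 Translation fault for the TLB-less model and a successful translation for the model with a stale entry.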

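1.9 TLB Caching Consistency of System Register Fields (D8 + D24)

The Arm ARM explicitly states in many places that certain system register fields are “permitted to be cached in a TLB”. This means the field value may be captured into a TLB entry when the entry is filled, and later TLB hits use the cached value instead of re-reading the register.

Why can't this be co-simulated?

Software flow:
1. TLB fill → the value of system register field X is cached in the TLB entry
2. Software writes a new value to X (for example, changing memory attributes, permissions, or endianness)
3. Access the same VA again

RTL: TLB hit → uses the old value of X cached in the entry → behavior follows the old configuration
ISA: no TLB → always reads the current value of X → behavior follows the new configuration

This is a systematic blind spot. D8 and D24 together mark 30+ fields as cacheable in a TLB:

Field	Register	Function	Blind-spot impact
WXN	SCTLR_EL1/EL2/EL3	Write implies Execute-Never	After changing WXN, the TLB uses old permissions
PrivWXN / UnprivWXN	SCTLR_EL2	Privileged/unprivileged WXN	Same as above
SIF	SCR_EL3	Secure Instruction Fetch	After changing SIF, Secure instruction fetch permissions diverge
E2H	HCR_EL2	EL2 Host mode	Changing E2H requires TLBI ALLE2 (stated by the architecture)
VM	HCR_EL2	Stage 2 translation enable	After changing VM, the TLB holds old stage 1-only entries
DC	HCR_EL2	Default Cacheable	After changing DC, memory types diverge
PTW	HCR_EL2	Protected Table Walk	After changing PTW, walk behavior differs
EE	SCTLR_EL1	Data access endianness	After changing EE, the TLB uses the wrong byte order → data corruption
HAFT	TCR_ELx	Hardware management of AF/dirty state	After changing HAFT, the AF update policy diverges
AMAIR_ELx.Attr	AMAIR_EL1/EL2/EL3	Memory attribute indirection	After changing attributes, the TLB uses old memory attributes
PTTWI	-	Table Walk Indirection	Walk address indirection parameters go stale
nTLSMD	SCTLR_EL2	LDM/STM to Device trap	Trap behavior diverges
CnP	TTBR_ELx	Common not Private	Covered in 1.6
GPCCR_EL3.*	GPCCR_EL3	Full GPC configuration	See Section XI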

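Key case study: SCTLR_EL1.EE (endianness)

SCTLR_EL1.EE controls the endianness of EL1 data accesses. When the TLB has cached the EE value:

1. EE=0 (little-endian) → TLB fill → entry marked as LE
2. Software sets EE=1 (big-endian) without executing a TLBI
3. Access the same VA again
   RTL: TLB hit → uses cached EE=0 → reads little-endian → data = 0x1234
   ISA: reads current EE=1 → reads big-endian → data = 0x3412
   → the data values no longer agree

Verification impact analysis:

This is not a problem of individual fields but a class of blind spot: every field that is “permitted to be cached in a TLB” carries this risk
For architectural consistency, software must execute a TLBI after modifying these fields, but omissions and ordering mistakes are common
For fields such as EE, DC, and WXN that directly affect data values or whether an access succeeds, the impact is HIGH
For fields such as HAFT and nTLSMD that affect only specific edge behaviors, the impact is MEDIUM

Impact: HIGH. A systematic blind spot affecting 10+ architectural register fields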
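The two data values in the EE case study come from reading the same two bytes with different byte orders. A minimal sketch, assuming the halfword in memory consists of the bytes 0x34 and 0x12 (the helper name is hypothetical):

```python
def load_halfword(memory, big_endian):
    """Interpret two bytes as a 16-bit value under the given endianness."""
    return int.from_bytes(memory, "big" if big_endian else "little")


memory = bytes([0x34, 0x12])
rtl_value = load_halfword(memory, big_endian=False)  # stale cached EE=0
isa_value = load_halfword(memory, big_endian=True)   # current EE=1
```

Here `rtl_value` is 0x1234 and `isa_value` is 0x3412, which are the two values in the scenario.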

1.10 NFD0/NFD1: Inconsistent Handling of Non-Fault TLB Misses (D24)

Spec text:

“NFD0, bit [53]. Controls how a TLB miss is reported in response to a non-fault unprivileged access for a virtual address that is translated using TTBR0_EL1.”
“0b1: A TLB miss on a virtual address … causes the access to fail without taking an exception.”

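NFD0/NFD1 (bits in TCR_EL1 and TCR_EL2) control how a TLB miss is handled for SVE Non-Fault Unprivileged (NFU) loads:

NFDn=1: TLB miss → no table walk → fails silently (returns 0)
NFDn=0: TLB miss → normal walk → result depends on the page permissions

Why can't this be co-simulated?

1. Ordinary load to VA X → TLB miss → walk → TLB filled → returns data 0x42
2. LDFF1 (NFU load) to the same VA X, with NFDn=1
   RTL: TLB hit (from step 1) → returns data 0x42 (NFDn has no effect because there is no TLB miss)
   ISA: no TLB → always a TLB miss → NFDn=1 → fail → returns 0
   → co-sim comparison: 0x42 vs 0 → MISMATCH

The deeper issue: NFDn is designed to control the timing side channel of NFU loads, but its trigger condition is a "TLB miss". The ISA model always misses (it has no TLB structure), so NFDn always takes effect there. A TLB hit in the RTL bypasses the NFDn check.

Verification impact analysis:

Requires FEAT_SVE + NFDn=1 + an earlier non-NFU access to the same VA
The effect is limited to the data values of individual elements of SVE NFU loads
Scenarios that verify instructions such as LDFF1/LDNF1 are affected

Impact: MEDIUM. The scenario is narrow, but the data difference is directly observable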

II. Speculative Clearing of the Exclusive Monitor (B2.12.1.1)

Spec text:

“The architecture permits a local monitor to transition to the Open Access state as a result of speculation, or from some other cause.”

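The architecture imposes only two constraints:

Speculation must not move the monitor into the Exclusive Access state (speculation can only clear it)
Forward progress must not be delayed indefinitely

Why can't this be co-simulated?

LDXR x0, [addr]    // ISA: monitor→Exclusive, RTL: monitor→Exclusive (consistent)
  ...loads/stores on a speculatively executed path...
STXR w1, x2, [addr] // ISA: succeeds (w1=0), RTL: speculation cleared the monitor → fails (w1=1)

The ISA model has no notion of speculative execution and cannot predict when speculation happens or whether it clears the monitor
The STXR return value will show expected mismatches between the ISA model and the RTL
The verification framework must handle this difference without reporting it as a bug

Impact: HIGH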
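One way a comparison framework might tolerate this difference is to classify the STXR status result instead of requiring equality. The sketch below is an assumption about how such a rule could look, not an established checker: the function name and the three labels are invented, and treating "RTL succeeded while the reference failed" as suspicious rests on the constraint above that speculation can clear the monitor but cannot set it.

```python
def classify_stxr_status(isa_status, rtl_status):
    """Classify an STXR status comparison (0 = success, 1 = failure)."""
    if isa_status == rtl_status:
        return "match"
    if isa_status == 0 and rtl_status == 1:
        # The RTL lost the reservation, for example through speculation.
        return "expected_mismatch"
    # The RTL succeeded where the reference failed: worth investigating.
    return "suspicious"
```

After an expected mismatch, a framework would also need to bring the reference model back in line with the RTL (status register and memory contents) before continuing; how that is done depends on the co-simulation environment.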

III. BRBE: Branch Record Buffer Extension (D19)

3.1 Misprediction Flag (D19.1.3)

Spec text:

“Branch prediction behavior is IMPLEMENTATION DEFINED and this is an indication of whether such prediction succeeded, or not.”

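Definition of a misprediction:

The direction of a conditional branch was mispredicted at least once
The branch target address was mispredicted at least once
The branch was not predicted by the branch predictor at all

Why can't this be co-simulated?

BRBINF<n>_EL1.MPRED records the outcome of the branch predictor
The ISA model has no branch predictor and cannot tell whether the RTL predicted a branch correctly
For exception records MPRED is fixed at 0; this is the only case that can be co-simulated

Impact: HIGH

3.2 Cycle Count Field (D19.1.2)

Spec text: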

“Each Branch record contains a cycle count value which indicates the number of PE clock cycles that occurred between the previous Branch record being generated and this Branch record being generated.”

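Validity states of the CC value:

Condition	CCU	CC
Unknown (first enable, leaving a prohibited region, and so on)	1	All zeros
Overflow	0	All ones
Valid	0	Encoded cycle count value

Why can't this be co-simulated?

Cycle count = the real number of PE cycles, which is pure timing information
In multithreaded implementations only cycles in which the thread is active are counted
The ISA model has no notion of cycles

Impact: HIGH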
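Although the count itself has no reference value, a checker can still confirm that each record falls into one of the three states in the table. A minimal sketch, in which the function name, the `"unexpected"` label, and the default field width `cc_bits=14` are assumptions made for illustration:

```python
def classify_cycle_count(ccu, cc, cc_bits=14):
    """Map a (CCU, CC) pair onto the three states in the table above.

    cc_bits is an assumed field width used only to recognise "all ones".
    """
    all_ones = (1 << cc_bits) - 1
    if ccu == 1:
        # The table pairs CCU=1 with CC all zeros; anything else is flagged.
        return "unknown" if cc == 0 else "unexpected"
    if cc == all_ones:
        return "overflow"
    return "valid"
```

A comparison flow could then check the state of each RTL record and skip the numeric comparison of valid counts.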

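3.3 CONSTRAINED UNPREDICTABLE Behavior (D19)

CU scenario	Spec location	Description
BRB IALL interacting with out-of-order branches	D19.4.1	Whether branches that have executed but not committed are invalidated by IALL
Whether BRB INJ generates an event after injection	D19.3	IMPLEMENTATION DEFINED
Basic block capture on freeze	D19.3	Whether the current basic block is captured on freeze
BRB INJ outside a prohibited region	D19.5.1	Whether the record is injected and whether the cycle count becomes unknown

Why can't this be co-simulated?

The ISA model cannot determine which branches are still speculative at the time of IALL
The check must confirm that the RTL stays within the permitted constraints in every CU scenario, rather than doing a simple comparison

Impact: MEDIUM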

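IV. PMU: Performance Monitors (D13/D14)

4.1 Events That Can Be Co-Simulated

INST_RETIRED: the ISA model knows how many instructions were executed
EXC_TAKEN, EXC_RETURN: the ISA model knows about exception events
BR_RETIRED: the ISA model can count branch instructions

4.2 Events That Cannot Be Co-Simulated (Microarchitecture-Dependent)

PMU event	Why it cannot be co-simulated
`L1D_CACHE_REFILL` / `L1I_CACHE_REFILL` / `L2D_CACHE_REFILL`	The ISA model has no cache and knows nothing about refills
`L1D_TLB_REFILL` / `L1I_TLB_REFILL` / `L2D_TLB_REFILL`	The ISA model has no TLB (it walks the translation tables directly), so there is no refill to count
`STALL_FRONTEND` / `STALL_BACKEND`	The ISA model has no pipeline and no stalls
`BUS_ACCESS`	The ISA model has no bus model
`MEM_ACCESS`	The ISA model does not know the number of microarchitecture-level memory accesses

4.3 Speculative / Out-of-Order Counting (D13)

Spec text: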

“Events can be counted speculatively, out-of-order, or both with respect to the simple sequential execution of the program. Events might also be counted simultaneously by other event counters when the overflow occurs, including events from different instructions.”

“The architecture does not define the point in a pipeline where the event counter is incremented.”

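Why can't this be co-simulated?

The ISA model executes in order and counts only at architectural retirement, which is entirely unlike the RTL
Speculatively executed instructions in the RTL may also trigger PMU events
The pipeline stage at which a PMU counter increments is not defined
The same execution can yield different PMU counts on different microarchitectures
Even architectural events such as INST_RETIRED have a window of inaccuracy

Impact: HIGH

4.4 Threshold/Edge Counting + Multithreading (D13)

Threshold counting depends on the per-cycle event density (for example, STALL_SLOT), and the single-issue ISA model is entirely unlike wide-issue RTL. The MT bit controls counting across PEs, and the ISA model cannot simulate a multithreaded environment.

Impact: HIGH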

V. SPE: Statistical Profiling Extension (D17/D18)

5.1 Sample Population: Micro-Ops Rather Than Architectural Instructions (D17)

Spec text:

“If FEAT_SPE_ArchInst is not implemented, IMPLEMENTATION DEFINED microarchitectural operations (micro-ops).”

“An architecture instruction might create more than one micro-op for each instruction. A micro-op might also be removed or merged with another micro-op in the Execution stream, so an architecture instruction might create no micro-ops for an instruction.”

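Why can't this be co-simulated?

The ISA model operates at the level of architectural instructions and does not know how hardware splits them into micro-ops
The same instruction may produce zero, one, or several micro-ops on different microarchitectures
The sample population itself is unknowable at the ISA level

Impact: HIGH

5.2 Random Sampling Perturbation (D17)

Spec text: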

“The random number generator is IMPLEMENTATION DEFINED.”

“It is IMPLEMENTATION DEFINED whether the PE adds the random number to the sample interval counter prior to counting down the interval, or after the counter reaches zero and the counter has been reloaded.”

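Why can't this be co-simulated?

Which instruction is sampled depends on the hardware RNG
The ISA model cannot reproduce the RNG sequence of a specific implementation
Even with a known RNG sequence, the perturbation interacts differently with the pipeline

Impact: HIGH

5.3 Cache/TLB Miss Flags (D18 Events Packet)

The Events Packet in an SPE sample record contains many optional, microarchitecture-dependent flag bits:

Bit field	Description	Optional?
E[3]	L1 data cache refill/miss	IMPLEMENTATION DEFINED / CU for stores
E[7]	Branch mispredicted	Depends on the branch predictor
E[8]	Last Level cache access	Optional
E[9]	Last Level cache miss	Optional
E[10]	Remote access	Optional
E[19]	L2 data cache access	Optional
E[20]	L2 data cache miss	Optional
E[21]	Cache data modified	Optional
E[22]	Recently fetched	Optional
E[23]	Data snooped	Optional
E[15:12]	IMPLEMENTATION DEFINED	-
E[31:26]	IMPLEMENTATION DEFINED (before SPEv1p3)	-
E[63:48]	IMPLEMENTATION DEFINED	-

Why can't this be co-simulated?

The ISA model has no cache hierarchy and cannot determine L1/L2/LLC hits or misses
The ISA model has no branch predictor and cannot determine E[7] mispredicted
The Data source field is entirely IMPLEMENTATION DEFINED
Cache events for stores are even less predictable (write buffer merging, store-forwarding, and so on)

Impact: HIGH

5.5 RNDR/RNDRRS NZCV and the “Reasonable Period” (C6 + K12)

Spec text: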

“When a valid random number is returned, the PSTATE.NZCV flags are set to 0b0000. If the random number hardware is not capable of returning a random number in a reasonable period of time, the PSTATE.NZCV flags are set to 0b0100, and the random number generation instructions return the value 0. The definition of ‘reasonable period of time’ is IMPLEMENTATION DEFINED.”

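Why can't this be co-simulated?

The ISA model cannot know whether the RTL's hardware RNG has enough entropy
If the hardware RNG lacks entropy, the RTL's RNDR returns NZCV=0b0100 and data=0
In co-simulation the ISA model can only assume that a valid random number is always returned (NZCV=0b0000)
When the RTL returns the special status because of insufficient entropy, the ISA and RTL values necessarily differ
The verification framework must handle this case

Impact: MEDIUM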
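Since the random value itself can never be compared, a framework could instead check that each RTL result is one of the two outcomes in the quoted spec text. The function below is a hypothetical sketch of such a legality check; the name and signature are invented for illustration.

```python
def rndr_result_is_legal(nzcv, value):
    """Check an RNDR/RNDRRS result against the two outcomes quoted above."""
    if nzcv == 0b0000:
        return True           # valid random number: any value is acceptable
    if nzcv == 0b0100:
        return value == 0     # no number available: the result must be 0
    return False              # any other flag setting is not a listed outcome
```

After the check passes, the framework would copy the RTL value and flags into the reference model so that later instructions compare cleanly.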

5.4 Latency Counters (D18)

The definition of "complete" for the latency counters is an IMPLEMENTATION DEFINED choice:

“It is IMPLEMENTATION DEFINED whether the operation has committed its results to the architectural state of the PE.”

Overlap counting for translation latency is also an IMPLEMENTATION DEFINED choice:

“It is IMPLEMENTATION DEFINED whether a cycle is counted if a part of the operation is accessing memory, having completed an address translation for that part.”

Impact: HIGH (latency values) / MEDIUM (translation latency)

VI. ETE: Embedded Trace Extension + TRBE: Trace Buffer Extension (D4/D5/D6)

6.1 E/N Atom Generation (D4)

Spec text:

“The Atom element is one of the following types: E Atom, N Atom.” (RPRNZH)

“For conditional branches: E Atom = branch was taken, N Atom = branch was not taken.” (Table D4-10)

“A FEAT_ETE trace unit traces speculatively-executed instructions in the same way as all other instructions, so that both speculatively-executed instructions and architecturally-executed instructions appear in the instruction trace element stream.” (IRTJNK)

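Why can't this be co-simulated?

The ISA model has no trace generation logic and does not know which instructions are "P0 instructions"
The ISA model has no notion of branch taken/not taken at the trace layer
Atoms are generated while speculative and resolved by Commit/Cancel elements
The ISA model retires in order and never produces speculative trace that is corrected later

Impact: HIGH

6.2 Speculation Depth and Commit/Cancel Resolution (D4.8)

Spec text: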

“Each P0 element is traced and is considered speculative until either committed by a Commit element or canceled by a Cancel element.” (IYYMXT)

“TRCIDR8.MAXSPEC indicates the IMPLEMENTATION DEFINED maximum speculation depth.” (IRKYCD)

“The level of speculation that is revealed in the trace is IMPLEMENTATION SPECIFIC.”

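Why can't this be co-simulated?

The ISA model has no pipeline and no speculation tracking
There are no rewind points and no tag-based speculation system
The ISA model cannot produce speculative atoms and then commit or cancel them
A fully independent trace generation engine is required

Impact: HIGH

6.3 Cycle Count Elements (D4.11)

Spec text: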

“A Cycle Count element indicates the number of PE clock cycles between the two most recent Commit elements.” (RVZXNN)

“The cycle counter has an IMPLEMENTATION DEFINED size of between 12 and 20 bits.” (RTYNZR)

“The first Cycle Count element after the PE clock has been restarted should have an UNKNOWN cycle count.” (IPDBDY)

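Why can't this be co-simulated?

There is no PE clock, no cycle counter, and no TRCCCCTLR threshold
Unknown cycle count scenarios are especially hard to model
Cycle counts must be injected from an external timing model

Impact: HIGH

6.4 Timestamp Elements (D4.11)

Spec text: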

“A timestamp value of zero indicates that the timestamp value is UNKNOWN.” (RBRJJF)

“The source for the payload of Timestamp elements is controlled by the TRFCR registers and the virtual timers.” (IYQJDR)

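Why can't this be co-simulated?

The ISA model has no global timestamp source
Timestamps come from system-wide timer infrastructure
An UNKNOWN timestamp (zero) is the only case the ISA model can produce natively

Impact: MEDIUM-HIGH

6.5 Atom Packing Protocol (D5)

The ETE protocol has 7 atom packet formats, carrying from 1 to 23 atoms. Atom packing is a protocol-layer compression scheme, and the ISA model has no notion of packing at all.

Packet format	Contents
Atom Format 1	1 atom
Atom Format 2	2 atoms
Atom Format 3	3 atoms
Atom Format 4	4 atoms
Atom Format 5.1/5.2	Fixed pattern or 5 atoms
Atom Format 6	3-23 E atoms + trailing

Impact: HIGH

6.6 TRBE Buffer-Full Behavior (D6)

Spec text: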

Circular Buffer mode / Wrap mode / Fill mode (IGYHBH)

“When the Trace Buffer Unit is enabled and running, and the Trace Buffer Unit is not able to accept the trace data, the Trace Buffer Unit rejects the trace data from the trace unit.” (RLNTVR)

“The access granule for writes to the trace buffer by the Trace Buffer Unit is IMPLEMENTATION DEFINED, up to a maximum of 2KB, and might vary from time to time.” (RBWNRF)

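Why can't this be co-simulated?

The ISA model has no memory buffer, write pointer, or base/limit pointers
There is no wrap detection, no management event, and no overflow-to-memory
Losing trace data when the buffer is full requires an implementation-specific state machine
Write alignment and access granule are both IMPLEMENTATION DEFINED

Impact: HIGH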

VII. Debug Watchpoints and Speculative Execution (D2.9)

7.1 SVE/SME Watchpoint Trigger Conditions (D2.9.6.3)

Spec text:

“For SVE predicated vector load or store instructions… when the instruction performs a non-speculative single-copy atomic access matching a configured watchpoint due to an Active element, a Watchpoint exception is generated.”

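Why can't this be co-simulated?

Instructions such as SVE gather/scatter are split internally into multiple atomic accesses
Which of these are speculative and which are non-speculative depends on the RTL implementation
The ISA model treats the whole SVE instruction as a single operation and does not know which internal accesses are speculative
First-fault loads (such as LDFF1) speculatively probe addresses beyond the current vector length, and these speculative accesses should not trigger a watchpoint

Impact: HIGH

7.2 Debug Events and Committed/Speculative Interaction (D2.9)

Spec text:

Speculation constraints related to watchpoints/breakpoints: the architecture does not guarantee whether debug events on a speculative path are triggered

Why can't this be co-simulated?

The ISA model does not know which instructions are speculative in the RTL
It cannot predict whether a watchpoint on a speculative path will trigger

Impact: HIGH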

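VIII. RAS / Error Reporting (D20)

8.1 Error Injection and SEI/SDE

Not yet analyzed in depth. Further study of D20 is needed to identify co-simulation blind spots in mechanisms such as RAS error injection, SEI (System Error Interrupt), and SDE (Software Delegated Exception).

8.2 Speculative SError (FEAT_SpecSEI)

Spec text: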

“PE can generate SError interrupt exceptions from speculative reads of memory, including speculative instruction fetches. FEAT_SpecSEI is OPTIONAL from Armv8.0.”

The ID_AA64MMFR1_EL1.SpecSEI field defines the support level:

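Never generates an SError from speculative reads
May generate an SError from speculative reads of memory
Generates an SError only from speculative instruction fetches

Why can't this be co-simulated?

The ISA model performs no speculative memory reads or speculative instruction fetches
If the RTL encounters a memory error during a speculative read (such as an uncorrectable ECC error or poison), it generates an SError interrupt
ISA simulation never triggers this scenario
An SError exception changes PSTATE and runs an exception handler, so the ISA and RTL execution flows diverge fundamentally

Impact: MEDIUM. The scenario requires RAS error injection to trigger, which is hard for a co-simulation framework to arrange in advance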

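IX. K1 CONSTRAINED UNPREDICTABLE Behaviors (K1)

The K1 chapter defines a large number of CONSTRAINED UNPREDICTABLE behaviors for AArch64. The ones below relate directly to co-simulation blind spots.

9.1 Speculative Accesses After Mapping Non-Idempotent Memory as Normal (K1.2.10)

Spec text: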

“If non-idempotent memory locations are mapped using the Normal memory type, the state of the non-idempotent memory location may become corrupted in following circumstances:
— Speculative read accesses may cause accesses to the non-idempotent memory locations that would not occur as part of a simple sequential execution.
— Writes to non-idempotent memory locations might be merged or split. In this case, the number and size of writes seen by the memory location might not be the number and size that occur as part of a simple sequential execution.”

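Why can't this be co-simulated?

The ISA model has no speculative execution and generates no speculative accesses to non-idempotent memory
The ISA model never merges or splits writes; every write has the size the program specifies
If software wrongly maps non-idempotent peripheral memory as Normal, speculative accesses in the RTL may corrupt peripheral state, and the ISA model never observes the problem
This is a typical case of a software bug that only hardware can expose

Impact: HIGH

9.2 Instruction Fetch from Device Memory (K1.2.11)

Spec text: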

“If a location in memory has the Device attribute and is not marked as execute-never, then an implementation might perform speculative instruction accesses to this memory location at times when address translation is enabled.”

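The CU behavior lets the implementation choose:

Treat the access as Normal Non-cacheable
Generate a Permission fault

Why can't this be co-simulated?

The ISA model performs no speculative instruction fetch
A speculative fetch in the RTL may issue a real read to Device memory and cause side effects
The handling (treat as Normal or report a fault) is IMPLEMENTATION DEFINED

Impact: MEDIUM

9.3 Out-of-Range Cache Maintenance by Set/Way/Index (K1.2.18)

Spec text: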

“In the cache maintenance by set/way instructions DC CISW, DC CSW, and DC ISW, if any set/way/index argument is larger than the value supported by the implementation, then the behavior is CONSTRAINED UNPREDICTABLE and one of the following occurs:
— The instruction is UNDEFINED.
— The instruction performs cache maintenance on one of: No cache lines / A single arbitrary cache line / Multiple arbitrary cache lines.”

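Why can't this be co-simulated?

The ISA model does not know the cache geometry (the number of sets, ways, and indexes)
The ISA model cannot tell whether a given set/way value exceeds the range the implementation supports
After an out-of-range operation the RTL may have maintained an "arbitrary cache line", and the ISA model cannot know which one
CU permits UNDEFINED, NOP, or operating on arbitrary cache lines, and the ISA model has no basis for choosing

Impact: HIGH (for software that uses cache maintenance by set/way)

9.4 Querying a Nonexistent Cache Level via CSSELR/CCSIDR (K1.2.12)

Spec text: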

“If the CSSELR_EL1.{Level, InD, TnD} is programmed to a cache level that is not implemented, then a read of CSSELR_EL1 returns an UNKNOWN value.”
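For a read of CCSIDR_EL1, the implementation must choose one of: NOP / UNDEFINED / return an UNKNOWN value

Why can't this be co-simulated?

The ISA model does not know which levels of the cache hierarchy are implemented
When CSSELR/CCSIDR is read, the ISA model cannot produce an UNKNOWN value that matches the RTL
The cache geometry returned by CCSIDR (line size, set count, way count) affects software behavior (for example, cache flushing loops), and the ISA model cannot supply accurate values

Impact: MEDIUM

9.5 UNKNOWN Address on Address Wraparound (K1.2.9)

Spec text: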

“If the PE executes a load or store instruction where the calculated virtual address, total access size, and alignment mean that it accesses the bytes at 0xFFFF_FFFF_FFFF_FFFF and 0x0000_0000_0000_0000, then the bytes that appear to be from 0x0000_0000_0000_0000 onwards are accessed at an UNKNOWN address. It is permitted for the UNKNOWN address used to be a fixed value such that it always generates an MMU fault.”

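Why can't this be co-simulated?

The ISA model can compute the wrapped address and produce a deterministic VA
The RTL may use an UNKNOWN address, producing an MMU fault or accessing an unexpected location
The ISA model cannot reproduce the arbitrary behavior of an UNKNOWN address

Impact: LOW (this kind of address wraparound is extremely rare in real software)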
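A test generator or comparison filter can at least recognize when an access meets the wraparound condition and exclude it from strict comparison. The sketch below assumes a 64-bit virtual address and a single contiguous access of `size` bytes; the function name is hypothetical.

```python
ADDRESS_BITS = 64


def access_wraps(va, size):
    """True if an access of `size` bytes starting at `va` covers both
    0xFFFF_FFFF_FFFF_FFFF and 0x0000_0000_0000_0000."""
    return va + size > (1 << ADDRESS_BITS)
```

For example, an 8-byte access at 0xFFFF_FFFF_FFFF_FFFC wraps, while an 8-byte access at 0xFFFF_FFFF_FFFF_FFF8 ends exactly at the top of the address space and does not.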

9.6 ESR_ELx When a CU Instruction Is Treated as UNDEFINED (K1.2.8)

Spec text:

“When a CONSTRAINED UNPREDICTABLE instruction is treated as UNDEFINED, ESR_ELx is UNKNOWN.”

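Why can't this be co-simulated?

The ISA model can emulate treating a CU instruction as UNDEFINED
However, the ESR_ELx exception syndrome value is UNKNOWN, and the ISA model cannot predict the specific syndrome the RTL produces
Consistent syndrome values cannot be guaranteed in co-simulation

Impact: MEDIUM

9.7 LDXR/STXR Pairing Violations (K1.2.16)

Spec text:

CU cases include: the STXR targets a different VA from the preceding LDXR / the transaction size differs / the number of registers accessed differs / the memory attributes differ

Why can't this be co-simulated?

The ISA model can detect these violations, but the CU behaviors include UNDEFINED, NOP, and an STXR that may either succeed or fail
The specific behavior is IMPLEMENTATION DEFINED
In particular, when the memory attributes change (because of a translation table modification), the RTL may use the attributes from a stale TLB entry while the ISA model sees the new translation tables

Impact: MEDIUM (software should not do this, but co-simulation cannot guarantee consistent behavior)

9.8 Cache/TLB State Not Restored After Speculative Execution (D24)

Spec text (as quoted in D24):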

“instruction caches, data caches and TLBs which can be altered as a result of speculation caused by a mispredicted execution, but is not restored to the state prior to the speculation when the misprediction is resolved.”

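Why can't this be co-simulated?

This is the architectural basis of Spectre-type attacks: speculative execution can permanently change cache/TLB contents
The ISA model has no cache or TLB and cannot simulate cache fills on a speculative path
Cache lines allocated on a speculative path are not evicted after the misprediction is resolved
Later non-speculative accesses may hit in the RTL (on a speculatively filled cache line), while the ISA model has to access memory

Impact: HIGH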

X. SVE First-Fault Loads and the FFR (C8 + B1 + K1)

10.1 Fault Order in Gather/Scatter Mode (C8 Pseudocode)

The element access order of SVE first-fault gather loads (such as LDFF1B with a vector index) differs between the ISA model and the RTL:

Spec text (key pseudocode fragment):

faulted = faulted || fault;
if faulted then
   ElemFFR(e, esize) = '0';
unknown = unknown || ElemFFR(e, esize) == '0';
if unknown then
   if !fault && ConstrainUnpredictableBool(Unpredictable_SVELDNFDATA) then
      result[e*:esize] = data;
   elsif ConstrainUnpredictableBool(Unpredictable_SVELDNFZERO) then
      result[e*:esize] = Zeros{esize};
   else
      result[e*:esize] = orig[e*:esize];

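The default values of ConstrainUnpredictableBool are:

Unpredictable_NONFAULT → Constraint_FALSE (an implementation may suppress a non-fault load for any reason)
Unpredictable_SVELDNFDATA → Constraint_TRUE (data may still be loaded after a fault)
Unpredictable_SVELDNFZERO → Constraint_TRUE (zero may be written)

Why can't this be co-simulated?

Fault order is decided by the microarchitecture: in a gather load, the order in which element addresses are presented to the MMU depends on the RTL memory pipeline. The ISA pseudocode processes elements by index, while the RTL may access them in a different order, so the FFR pattern differs.
Non-fault loads can be suppressed at will: Unpredictable_NONFAULT allows the RTL to suppress a load for any reason (for example, depending on cache state). The ISA model always performs the load, so the positions of the 0 bits in the FFR differ.
Post-fault data is CU: once an FFR bit becomes 0, the data for later active elements may be the real loaded value, zero, or the original value. The ISA model and the RTL may pick different options for different elements.
Contiguous loads are affected too: even when contiguous accesses are in order, the point at which the first fault is detected depends on translation table walk timing. When both pages are accessible but their walk times differ, the RTL may load data from the faster page first.

Impact: HIGH. Neither the FFR value nor the post-fault destination register data can be compared in co-simulation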
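Even when FFR values cannot be compared bit for bit, the quoted pseudocode still implies a shape constraint: once `faulted` is set, every later active element has its FFR bit cleared. The sketch below checks only that property. It assumes the lists hold the FFR bits of the active elements in element order and that the FFR was all ones before the load; the function names and result labels are hypothetical.

```python
def ffr_is_monotonic(ffr_bits):
    """True if no element is 1 after an earlier element became 0."""
    seen_zero = False
    for bit in ffr_bits:
        if bit == 0:
            seen_zero = True
        elif seen_zero:
            return False
    return True


def compare_ff_load(isa_ffr, rtl_ffr):
    """Classify an FFR comparison for one first-fault load."""
    if isa_ffr == rtl_ffr:
        return "match"
    if ffr_is_monotonic(rtl_ffr):
        return "legal_mismatch"   # differs, but has the shape the pseudocode allows
    return "illegal"
```

This is a necessary condition only. It does not check the rule that the first active element must fault normally, nor the CU choice of post-fault data.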

10.2 SVE Accesses Crossing a Page Boundary with Different Memory Attributes (K1.2.13/15)

Spec text:

“If a single load or store instruction generates multiple memory accesses, such that the total set of accesses crosses a page boundary to a memory location that has a different memory type, Normal or Device, or Shareability attribute results in CONSTRAINED UNPREDICTABLE behavior.”

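Permitted behaviors:

Each access uses the attributes associated with its own address
An Alignment fault is generated
The instruction executes as a NOP

For SVE predicated non-contiguous loads/stores, the order in which elements are accessed across page boundaries is decided by the microarchitecture. The ISA model processes elements in index order, but the RTL may reorder accesses for performance.

Why can't this be co-simulated?

When the memory types on the two sides of a page boundary differ (one Normal, one Device), the access order affects the visible side effects
In-order ISA accesses and reordered RTL accesses may produce different Device memory access patterns
Where the CU choice is NOP or an Alignment fault, the ISA model and the RTL may choose differently

Impact: MEDIUM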

XI. TLB Caching Blind Spots in Granule Protection Checks (D9)

11.1 TLB Caching of GPT Entries (D9.5)

Spec text:

“GPT entries may only be cached in a TLB if they are reachable and valid.” (RQBKYP)
“GPT information cached in a TLB is permitted to be shared across multiple PEs” (RYMRVT)

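The Granule Protection Check (GPC) is part of the Arm RME (Realm Management Extension) and checks access permissions for physical addresses through the Granule Protection Table (GPT). GPT entries may be cached in a TLB when the conditions are met.

Why can't this be co-simulated?

1. The GPT is configured, and Root software sets GPCCR_EL3.GPC=1
2. Access a PA → RTL TLB miss → GPT walk → GPT entry cached in the TLB
3. Root software modifies the GPT (for example, adjusting a granule's protection attributes) without executing TLBI *PA*
4. Access the same PA again
   RTL: TLB hit → uses the cached old GPT entry → check result follows the old protection
   ISA: no TLB → always re-reads the GPT → check result follows the new protection

Analogy with Permission fault caching:

Permission fault caching is permission caching at the translation table level (Section 1.1)
GPT/GPF caching is protection caching at the GPC level, and the mechanism is exactly parallel

Key differences:

GPT caching is maintained with TLBI *PA* instructions (rather than the usual TLBI by ASID/VMID)
GPT information can be combined with stage 1/2 translation information in a single TLB entry, which complicates invalidation
The GPCCR_EL3 configuration fields themselves may also be cached in a TLB

Verification impact analysis:

Once GPT information is cached in a TLB, modifying the GPT without executing TLBI PA leaves the old protection rules in effect in the RTL
The ISA model always uses the new GPT and may report a GPF, while the RTL uses the old GPT and the access succeeds
Or the reverse: the ISA model permits the access, but the RTL uses the old GPT and reports a GPF

Impact: HIGH. GPC is the core mechanism for security isolation, and inconsistent caching directly breaks the protection semantics

11.2 TLB Caching of Granule Protection Faults (D9.5)

Spec text: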

“Because GPT entries are permitted to be cached in a TLB if they are reachable and valid, translations that result in a GPF are permitted to be cached in a TLB.” (IJMYRB)

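A Granule Protection Fault (GPF) is the fault generated when a GPC check fails. As with Permission faults, a GPT entry that generates a GPF may also be cached in a TLB.

Why can't this be co-simulated?

Scenario: the same PA before and after a GPT modification

GPT before modification: the PA belongs to granule protection level 0 (no access restriction)
  → access succeeds, TLB fill (the cached GPT information says access to this PA is permitted)

GPT after modification: the PA changes to granule protection level 3 (accessible to Realm only)
  → TLBI *PA* is not executed

Access the PA again:
  RTL: TLB hit → cached GPT information says "access permitted" → access succeeds
  ISA: TLB miss → reads the new GPT → finds protection level 3 → the current PE has no permission → GPF

This is the GPC counterpart of Permission fault caching:

Permission fault caching leaves stale entries after translation table permission updates (Section 1.1)
GPF caching leaves stale entries after Granule Protection Table protection updates

Verification impact analysis:

In the RME security model, stale GPF caching after a GPT update can mask security vulnerabilities
After modifying the GPT, software must execute TLBI PA to invalidate the cached information
The ISA model cannot simulate GPF caching because it always re-walks the GPT

Impact: HIGH. A key verification point for the GPC security model, parallel to Permission fault caching

11.3 Caching of GPT Block/Contiguous Descriptors (D9.5)

Spec text: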

“GPT information from a level 0 GPT Block descriptor is permitted to be cached in a TLB as though the block is a contiguous region of granules, each of the size configured in GPCCR_EL3.PGS.”
“Information from a GPT Contiguous descriptor is permitted to be cached in a TLB or a table walk cache for an input address range up to the size indicated by the Contig field.”

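GPT Block and Contiguous descriptors describe protection for large address ranges efficiently. When these descriptors are cached, a TLB entry can cover a larger address range.

Why can't this be co-simulated?

The range cached for a Block/Contiguous descriptor depends on PGS (Physical Granule Size) and the Contig field
GPCCR_EL3.PGS is itself a field that is “permitted to be cached in a TLB”
If PGS is modified but TLBI PA is not executed, the TLB may interpret a Block descriptor with the wrong granule size

Impact: MEDIUM. Depends on the specific GPT configuration; the scenario is narrow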

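Summary

#	Test point	Chapters	Root cause	Verification impact
1	Fault caching asymmetry	D8.15.1	Permission faults may be cached in the TLB; other faults are not	Changing permissions without a TLBI causes divergent behavior
2	TLB conflict abort	D8.15.1.6	TLB multi-hit; the ISA model never has multiple entries	The only hardware-detectable symptom of a missing TLBI
3	TLB lockdown + TLBI	D8.16.2, D8.17.4	IMPLEMENTATION DEFINED behavior of locked entries under TLBI	The TLBI/lockdown interaction cannot be tested
4	Break-Before-Make	D8.17.1	No TLB → no window where old and new entries coexist	BBM violations are completely invisible
5	Multiple TLB entries (CU)	D8.17.1	The CU behavior space cannot be explored	Security invariants cannot be verified
6	Speculative clearing of the exclusive monitor	B2.12.1.1	Speculation may clear the monitor	STXR return values differ
7	BRBE mispredict flag	D19.1.3	Prediction outcome is μarch-defined	No reference value available
8	BRBE cycle count	D19.1.2	Cycle count = pure timing	No reference value available
9	BRBE CU scenarios	D19	CU behavior of IALL/freeze/INJ	Check that constraints are met rather than compare
10	PMU speculative/out-of-order counting	D13	Pipeline point is undefined	Large count differences between RTL and ISA
11	PMU microarchitectural events	D13/D14	Refill/Stall/TLB miss	The ISA model has no cache/TLB/pipeline
12	SPE μop sampling	D17	The sample population is micro-ops, not architectural instructions	The sample population itself is unknowable
13	SPE RNG sampling perturbation	D17	RNG is IMPLEMENTATION DEFINED	Cannot predict which instruction is sampled
14	SPE Events Packet	D18	Cache/TLB miss, branch mispredict, and data source are all μarch-dependent	Many fields cannot be co-simulated
15	SPE latency counters	D18	The definition of complete and translation overlap are both IMPLEMENTATION DEFINED	Latency reference values are unreliable
16	ETE E/N Atom generation	D4	No trace generation logic	An independent trace generation engine is needed
17	ETE speculation resolution	D4.8	Commit/Cancel depends on the pipeline	The ISA model cannot produce speculative trace
18	ETE cycle count	D4.11	Pure timing + IMPLEMENTATION DEFINED counter size	An external timing model is needed
19	ETE timestamp	D4.11	Depends on the system-level timer	UNKNOWN timestamp is the only possibility
20	ETE atom packing	D5	Protocol-layer compression	The ISA model has no notion of packets
21	TRBE buffer full	D6	No memory buffer/pointers	Overflow and data loss cannot be simulated
22	Debug watchpoint + speculation	D2.9.6.3	Which accesses are speculative depends on the RTL	Cannot determine whether a watchpoint triggers
23	Intermediate TLB structures	D8.17	PA/IPA indexing, splintering	Nested virtualization scenarios are invisible
24	Speculative AF update	D8	AF may be set to 1 during speculative translation	Visible AF differences between ISA and RTL translation tables
25	Stale TLB entries after EPDn=1	D8.15.1	ISA has no TLB → always misses → always applies the EPDn check	RTL bypasses EPDn on a TLB hit and the translation succeeds
26	Speculative corruption of non-idempotent Normal memory	K1.2.10	Speculative reads and write merging can corrupt peripherals	Software mapping errors are only exposed by hardware
27	Speculative instruction fetch from Device memory	K1.2.11	Speculative fetch from Device	The RTL may issue real reads
28	Out-of-range cache maintenance by set/way	K1.2.18	The ISA model does not know the cache geometry	CU behavior: UNDEFINED/NOP/arbitrary line
29	CSSELR selecting a nonexistent cache level	K1.2.12	The ISA model does not know the cache hierarchy	Returns an UNKNOWN value
30	Speculative cache/TLB state not restored	D24	Speculative fills persist after a mispredict	Architectural basis of Spectre-class issues
31	ESR_ELx for CU→UNDEFINED	K1.2.8	Syndrome value is UNKNOWN	ISA/RTL syndrome mismatch
32	Speculative SError (FEAT_SpecSEI)	A2 + D24	Speculative reads trigger SError	The ISA model never speculates and never triggers it
33	SVE gather/scatter fault order	C8 + B1	Fault probing order depends on the μarch pipeline	Neither the FFR pattern nor post-fault data can be compared
34	SVE page crossing with different memory attributes	K1.2.13/15	Element access order is μarch-decided	Device access patterns may differ between ISA and RTL
35	RNDR NZCV reasonable period	C6 + K12	"Reasonable period" is IMPLEMENTATION DEFINED	The RTL may return NZCV=0b0100 for lack of entropy
36	TLB caching consistency of system register fields	D8 + D24	30+ fields are "permitted to be cached in a TLB"	After a register change the TLB uses the old value and the ISA model the new one
37	NFDn NFU TLB miss	D24	ISA always misses the TLB → NFDn always applies	RTL TLB hit bypasses NFDn; NFU load data differs
38	TLB caching of GPT entries	D9.5	GPT entries may be cached in the TLB	Protection rules differ between ISA and RTL after a GPT change
39	TLB caching of GPF	D9.5	GPFs may be cached in the TLB (like Permission faults)	Stale GPF after a GPT change masks security vulnerabilities
40	Caching of GPT Block/Contiguous descriptors	D9.5	Block descriptors and PGS may be cached in the TLB	Granule size mismatch; protection ranges go wrong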

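Recommended verification strategy:

IMPLEMENTATION DEFINED / CU behavior: do not require ISA/RTL agreement; check that RTL behavior stays within the architectural constraints

Pure microarchitectural information (cycle counts, cache/TLB flags): design separate directed tests, or verify with performance counters or formal methods

Speculation-related items (exclusive monitor, watchpoints, ETE atoms): the verification framework must recognize "expected mismatches"

PMU/SPE: build a reference model (such as gem5) instead of using the ISA model as the comparison baseline

ETE/TRBE: an independent trace generation engine is needed; the ISA model supplies only part of the information, at Layer 3 (instruction-level events)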
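These recommendations amount to giving each compared field its own policy. The sketch below shows one possible shape for such a comparator; the three policy names, the function, and its `is_legal` callback are hypothetical and not taken from any existing framework.

```python
EXACT, CONSTRAINED, IGNORE = "exact", "constrained", "ignore"


def compare_field(policy, isa_value, rtl_value, is_legal=None):
    """Return True if the RTL value is acceptable under the given policy.

    EXACT:       architectural state that must match the reference model.
    CONSTRAINED: IMPLEMENTATION DEFINED / CU results; only legality is checked.
    IGNORE:      pure microarchitectural data verified by a separate flow.
    """
    if policy == EXACT:
        return isa_value == rtl_value
    if policy == CONSTRAINED:
        return is_legal(rtl_value)
    return True
```

A general-purpose register would use `EXACT`, an STXR status or RNDR flag result would use `CONSTRAINED` with a legality predicate, and a BRBE cycle count would use `IGNORE`.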

Author: 翁贞华

Article link: http://47.239.17.74/2026/05/24/arm-cosim-blindspots-analysis/

CPU verification ARM ETE ISA simulation RTL simulation Co-simulation TLB BRBE PMU SPE TRBE Debug