Formal Verification of a Realistic Compiler

English Version

Formal Verification of a Realistic Compiler
by Xavier Leroy (xavier.leroy@inria.fr)

Abstract: This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of critical software and its formal verification: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.

Note: A previous version of this paper was published in Proceedings of the 33rd Symposium on the Principles of Programming Languages. ACM, NY, 2006.

Introduction

中文版本

真实编译器的形式化验证
董渊 翻译; 王生原 郭宇 审校

摘 要: 本文讨论CompCert编译器的开发和形式化验证(语义保持性证明),该编译器将Clight(一个C语言的较大子集)翻译为PowerPC汇编代码, 文中使用Coq证明辅助工具来编写编译器并完成其正确性证明。对于关注关键软件及其形式化验证的情况下,此编译器非常有用: 编译器的验证保证源代码级证明过的安全性质能够一直保持到编译所产生的可执行代码。

注: 本文初稿发表于第33届编程语言原理研讨会(POPL 2006)论文集,ACM,纽约,2006年

1. 介绍

Can you trust your compiler? Compilers are generally assumed to be semantically transparent: the compiled code should behave as prescribed by the semantics of the source program. Yet, compilers—and especially optimizing compilers—are complex programs that perform complicated symbolic transformations. Despite intensive testing, bugs in compilers do occur, causing the compilers to crash at compile-time or—much worse—to silently generate an incorrect executable for a correct source program.
编译器可信吗?通常大家会认为编译器是语义透明的,即编译生成的代码,其行为理所当然地准确反映源代码程序的语义。 然而,编译器,特别是优化编译器,是一种实施复杂符号变换的复杂软件,即便经过了严格测试,编译器中的错误依然屡见不鲜。 这些错误可能会导致编译器的编译时崩溃,而更可怕的是不声不响地将原本正确的源程序变换为错误的可执行程序。
For low-assurance software, validated only by testing, the impact of compiler bugs is low: what is tested is the executable code produced by the compiler; rigorous testing should expose compiler-introduced errors along with errors already present in the source program. Note, however, that compiler-introduced bugs are notoriously difficult to expose and track down. The picture changes dramatically for safetycritical, high-assurance software. Here, validation by testing reaches its limits and needs to be complemented or even replaced by the use of formal methods such as model checking, static analysis, and program proof. Almost universally, these formal verification tools are applied to the source code of a program. Bugs in the compiler used to turn this formally verified source code into an executable can potentially invalidate all the guarantees so painfully obtained by the use of formal methods. In future, where formal methods are routinely applied to source programs, the compiler could appear as a weak link in the chain that goes from specifications to executables. The safety-critical software industry is aware of these issues and uses a variety of techniques to alleviate them, such as conducting manual code reviews of the generated assembly code after having turned all compiler optimizations off. These techniques do not fully address the issues, and are costly in terms of development time and program performance.
对于安全保证要求较低的软件而言,只需经过测试来确认即可,编译器的错误影响不大。 此类测试的对象是编译器所生成的可执行代码,严格的测试在发现源程序错误的同时应该也可以发现编译器引入的错误。 需要特别关注的是,编译器引入的错误通常都是那种极难发现、极难定位的错误。 而对于安全关键的高安全保证的软件,情况则大不相同。此时,单纯的测试手段已无能为力, 需要采用诸如模型检测、静态分析和程序证明等一系列形式化方法来补救甚至完全替代。 然而,几乎无一例外地,大家都只在程序的源代码级运用各种形式化验证工具, 因此,已验证的源代码很有可能由于编译器内部的小错误而变成一个有问题的可执行程序, 人们煞费苦心通过形式化方法得到的的所有特性都将随着编译器的错误而付之东流。 在不久的将来,当形式化方法在源程序级别中普遍应用的时候,编译器有可能成为从程序规范到可执行代码这个完整变换链条中最为薄弱的一环。 安全关键软件产业已经开始意识到问题严重性,并已经开始使用诸如“不进行任何优化并对编译生成的汇编代码进行人工检查”等各种技术来缓解之。 然而,即便是付出巨大的开发时间和程序性能成本,这些技术还是治标不治本,无法从根本上解决这个难题。
An obviously better approach is to apply formal methods to the compiler itself in order to gain assurance that it preserves the semantics of the source programs. For the last 5 years, we have been working on the development of a realistic, verified compiler called CompCert. By verified, we mean a compiler that is accompanied by a machine-checked proof of a semantic preservation property: the generated machine code behaves as prescribed by the semantics of the source program. By realistic, we mean a compiler that could realistically be used in the context of production of critical software. Namely, it compiles a language commonly used for critical embedded software: neither Java nor ML nor assembly code, but a large subset of the C language. It produces code for a processor commonly used in embedded systems: we chose the PowerPC because it is popular in avionics. Finally, the compiler must generate code that is efficient enough and compact enough to fit the requirements of critical embedded systems. This implies a multipass compiler that features good register allocation and some basic optimizations.
一个显而易见的好办法是应用形式化方法对编译器自身进行验证,以保证编译器能够在进行翻译变换的同时保持源代码的语义。 过去五年来,我们一直在开发一个具备这一特性的真实的、已验证的编译器,并将其命名为CompCert。 称其为已验证,是指这个编译器同时带有可机器自动检查的证明,该证明表明该编译器具有语义保持特性, 即所生成的机器代码的行为准确地反映源代码程序的语义。 称其为真实,我们是指这个编译器可以实实在在地应用于关键软件的生产过程,实际上, 该编译器处理的一个通常意义下可用于关键嵌入软件开发的语言,既不是JAVA,也不是ML,更不是汇编代码,而是一个很大的C语言子集; 而且该编译器生成的代码可运行于嵌入系统常用的处理器,我们选择的是航空工业中非常普及的PowerPC处理器; 最后,该编译器所生成的代码应该足够高效、足够紧凑,能够满足关键嵌入系统的需求。 这意味着该编译器是一个支持多遍编译、具备良好的寄存器分配算法和基本优化能力的编译器。
Proving the correctness of a compiler is by no ways a new idea: the first such proof was published in 196716 (for the compilation of arithmetic expressions down to stack machine code) and mechanically verified in 1972.17 Since then, many other proofs have been conducted, ranging from single-pass compilers for toy languages to sophisticated code optimizations.8 In the CompCert experiment, we carry this line of work all the way to end-to-end verification of a complete compilation chain from a structured imperative language down to assembly code through eight intermediate languages. While conducting the verification of CompCert, we found that many of the nonoptimizing translations performed, while often considered obvious in the compiler literature, are surprisingly tricky to formally prove correct.
证明编译器正确性这个想法由来已久,最初的证明工作发表于1967年[16] (证明的是算术表达式到栈式机代码的编译),之后的机械化验证完成于1972年[17]。 此后,人们提出各种各样的证明,其覆盖范围从针对模型语言的简单单遍编译器一直到拥有复杂代码优化的编译器[8]。 在CompCert的实践中,我们将上述工作扩展为一个复杂编译链的端到端验证方案, 该编译器从结构化命令式语言开始,经历八种中间语言,最终变换为汇编代码。 对CompCert的验证过程中,我们也发现,虽然一些非优化变换在编译文献认为是理所当然的,然而 其正确性的形式化证明竟复杂得出乎意料。
This paper gives a high-level overview of the CompCert compiler and its mechanized verification, which uses the Coq proof assistant.3, 7 This compiler, classically, consists of two parts: a front-end translating the Clight subset of C to a low-level, structured intermediate language called Cminor, and a lightly optimizing back-end generating PowerPC assembly code from Cminor. A detailed description of Clight can be found in Blazy and Leroy5; of the compiler front-end in Blazy et al.4; and of the compiler back-end in Leroy.11, 13 The complete source code of the Coq development, extensively commented, is available on the Web.12
本文高度概括地给出CompCert编译器以及使用Coq证明辅助工具给出的机械化证明。[3,7] 该编译器采用经典的模块结构,分别为: 前端将C语言的子集Clight翻译为一种称为Cminor的低级结构化中间语言; 而包含少量优化的后端将Cminor生成为PowerPc汇编代码。 关于Clight的详细描述参见文献[5],编译器前端工作参看文献[4],而后端相关的工作参看文献[11,13], 包含Coq代码和详尽注释的完整源代码可从项目网站[12]获得。
The remainder of this paper is organized as follows. Section 2 compares and formalizes several approaches to establishing trust in the results of compilation. Section 3 describes the structure of the CompCert compiler, its performance, and how the Coq proof assistant was used not only to prove its correctness but also to program most of it. By lack of space, we will not detail the formal verification of every compilation pass. However, Section 4 provides a technical overview of such a verification for one crucial pass of the compiler: register allocation. Finally, Section 5 presents preliminary conclusions and directions for future work.
本文后续部分组织如下:第二节比较几种“建立可信编译结果”的方法并给出形式化定义; 第三节介绍CompCert编译器的结构、性能、以及在使用Coq工具编写编译器代码的同时进行正确性证明的方法。 限于文章篇幅,我们无法深入介绍每一个编译遍的形式化验证,因此在第四节中介绍其中最为关键的一遍——寄存器分配——的技术要点。 最后,第五节给出初步结论和未来方向。

2. 通向可信编译器之路

2.1 语义保持的记法
Consider a source program S and a compiled program C produced by a compiler. Our aim is to prove that the semantics of S was preserved during compilation. To make this notion of semantic preservation precise, we assume given semantics for the source and target languages that associate observable behaviors B to S and C. We write S ⇓ B to mean that program S executes with observable behavior B. The behaviors we observe in CompCert include termination, divergence, and “going wrong” (invoking an undefined operation that could crash, such as accessing an array out of bounds). In all cases, behaviors also include a trace of the input–output operations (system calls) performed during the execution of the program. Behaviors therefore reflect accurately what the user of the program, or more generally the outside world the program interacts with, can observe.
考虑源程序S和编译生成程序C,我们的目标是证明编译过程中S的语义始终保持不变。 为了更精确地描述语义保持特征,假定将外部可见行为B关联到源语言程序S和目标语言程序C。 采用记号S ⇓ B来表示程序S运行时的外部可见行为是B, CompCert中可见行为包括终止、发散和执行错误 (如数组访问越界访问等可能导致系统崩溃的未定义操作调用)。 所有情况下,执行过程中出现的输入、输出(系统调用)轨迹都属于程序行为, 这些行为准确反映程序用户或者与程序有交互关系的外部世界能够观察到的所有内容。
The strongest notion of semantic preservation during compilation is that the source program S and the compiled code C have exactly the same observable behaviors:
关于编译过程中语义保持的最强的记法是源程序S和编译结果程序C具有完全等价的外部可见行为。
∀B, S ⇓ B ⇔ C ⇓ B (1)
Notion (1) is too strong to be usable. If the source language is not deterministic, compilers are allowed to select one of the possible behaviors of the source program. (For instance, C compilers choose one particular evaluation order for expressions among the several orders allowed by the C specifications.) In this case, C will have fewer behaviors than S. Additionally, compiler optimizations can optimize away “going wrong” behaviors. For example, if S can go wrong on an integer division by zero but the compiler eliminated this computation because its result is unused, C will not go wrong. To account for these degrees of freedom in the compiler, we relax definition (1) as follows:
公式(1)的要求太强,因而根本无法使用。在源语言语义非确定的情况下, 允许编译器选定源程序的可能行为之一 (例如,C编译器可以在其规范允许的几种求值顺序中选择某一特定方案), 这时,程序C的行为将比S的行为要少。 更进一步,编译器的优化算法还可能通过优化消除出错的行为, 比如,如果S在整数被0除的情况下出错, 编译器则根据该计算结果没有引用的实际情况而消除这个计算过程, 这种情况下C程序就不会出同样的错误。考虑到编译器中需要这种适度的自由,我们放松定义(1)中要求,得到下式:
S safe ⇒ (∀B, C ⇓ B ⇒ S ⇓ B) (2)
(Here, S safe means that none of the possible behaviors of S is a “going wrong” behavior.) In other words, if S does not go wrong, then neither does C; moreover, all observable behaviors of C are acceptable behaviors of S. (这里的S safe表示S中没有可能造成出错的行为)。 换言之,如果S不出错,则C也不会出错; 而且,所有C的可见行为均为S的可接受行为。
In the CompCert experiment and the remainder of this paper, we focus on source and target languages that are deterministic (programs change their behaviors only in response to different inputs but not because of internal choices) and on execution environments that are deterministic as well (the inputs given to the programs are uniquely determined by their previous outputs). Under these conditions, there exists exactly one behavior B such that S ⇓ B, and similarly for C. In this case, it is easy to prove that property (2) is equivalent to
在CompCert的实践中以及本文剩余部分,我们所关注的源和目标语言均是确定的 (程序行为的变化仅仅取决于输入的变化,而不取决于程序的内部选择), 并且执行环境也是确定的(提交给程序的输入可以由前一个输出唯一确定)。 这种情况下,存在唯一的行为B保证S ⇓ B,同时也存在唯一的行为B保证C ⇓ B, 因此可以很方便地证明性质(2)等价于
∀B ∉Wrong, S ⇓ B ⇒ C ⇓ B (3)
(Here, Wrong is the set of “going wrong” behaviors.) Property (3) is generally much easier to prove than property (2), since the proof can proceed by induction on the execution of S. This is the approach that we take in this work.
(这里的Wrong是出错行为的集合)。性质(3)通常比性质(2)更容易证明, 因为我们可以通过归纳S的执行来完成证明,这正是本文工作所采用的方法。
From a formal methods perspective, what we are really interested in is whether the compiled code satisfies the functional specifications of the application. Assume that these specifications are given as a predicate Spec(B) of the observable behavior. We say that C satisfies the specifications, and write C |= Spec, if C cannot go wrong (C safe) and all behaviors of B satisfy Spec (∀B, C ⇓ B ⇒ Spec(B) ). The expected correctness property of the compiler is that it preserves the fact that the source code S satisfies the specification, a fact that has been established separately by formal verification of S:
从形式化方法的角度来看,我们真正关心的是编译器生成代码是否满足该应用程序的功能规范。 假定该规范是通过可见行为的谓词Spec(B)给出, 如果C不出错(C Safe)并且所有行为B均满足规范(∀B, C ⇓ B ⇒ Spec(B)), 则我们认为C满足其规范并记为C |= Spec。 我们所期望的编译器正确性性质是源代码符合其规范这一事实能够在编译过程中得到保持, 而这一事实需要通过形式化验证源代码程序而单独获得:
S |= Spec ⇒ C |= Spec (4)
It is easy to show that property (2) implies property (4) for all specifications Spec. Therefore, establishing property (2) once and for all spares us from establishing property (4) for every specification of interest.
容易看出对于所有规范Spec,从性质(2)可以推导出性质(4)。 而且,建立性质(2)可以一次性解决,而不需要针对每一个感兴趣规范去逐个建立性质(4).
A special case of property (4), of considerable historical importance, is the preservation of type and memory safety, which we can summarize as “if S does not go wrong, neither does C”:
作为性质(4)的一个特例而一直备受关注的重要特性是类型和存储安全性保持, 我们记为“如果S不出错,则C也不会出错”:
S safe ⇒ C safe (5)
Combined with a separate check that S is well-typed in a sound type system, property (5) implies that C executes without memory violations. Type-preserving compilation18 obtains this guarantee by different means: under the assumption that S is well typed, C is proved to be well typed in a sound type system, ensuring that it cannot go wrong. Having proved properties (2) or (3) provides the same guarantee without having to equip the target and intermediate languages with sound type systems and to prove type preservation for the compiler.
和利用合理类型系统独立检查得到的S良类型特性组合在一起, 性质(5)可以推导出“C的运行将没有存储安全问题”这一结论。 而类型保持编译[8]通过另外的途径得到与此相同的结果: 假定S是良类型的,在合理类型系统中证明C是良类型的,则可以保证C不会出错。 证明性质(2)或(3)即可提供相同的安全保证, 既不需要为目标语言和中间语言设计合理的类型系统, 也不需要证明编译器的类型保持特性。
2.2 已验证、已确认、验证式编译器
We now discuss several approaches to establishing that a compiler preserves semantics of the compiled programs, in the sense of Section 2.1. In the following, we write S ≈ C, where S is a source program and C is compiled code, to denote one of the semantic preservation properties (1) to (5) of Section 2.1.
现在我们来讨论构建上小节(2.1)所讨论语义保持编译器的几种方法。 后面我们使用 S ≈ C 来表示上小节中(1)~(5)五个语义保持性质中的某一个, 其中S为源程序,C为编译后代码。
Verified Compilers. We model the compiler as a total function Comp from source programs to either compiled code (written Comp(S) = OK(C)) or a compile-time error (written Comp(S) = Error). Compile-time errors correspond to cases where the compiler is unable to produce code, for instance if the source program is incorrect (syntax error, type error, etc.), but also if it exceeds the capacities of the compiler. A compiler Comp is said to be verified if it is accompanied with a formal proof of the following property:
已验证编译器(Verified Compilers)。我们采用全函数Comp来刻画编译器, 该编译器要么翻译源代码为目标代码(记为Comp(S) = OK(C)), 要么出现编译时错误(记为Comp(S) = Error)。 编译时错误对应于编译器无法生成代码的情况, 比如输入不正确的源代码(语法错误、类型错误等), 或者超出编译器的处理能力。 编译器Comp,如果能够提供以下性质的形式化证明,称之为已验证编译器。
∀S, C, Comp(S) = OK (C) ⇒ S ≈ C (6)
In other words, a verified compiler either reports an error or produces code that satisfies the desired correctness property. Notice that a compiler that always fails (Comp(S) = Error for all S) is indeed verified, although useless. Whether the compiler succeeds to compile the source programs of interest is not a correctness issue, but a quality of implementation issue, which is addressed by nonformal methods such as testing. The important feature, from a formal verification standpoint, is that the compiler never silently produces incorrect code.
换而言之,已验证编译器要么报错,要么生成具备人们所期望正确性特性的代码。 值得注意的是,一个针对任何输入都输出“错误”的编译器,确实是一个已验证编译器 (对于所有的S,Comp(S) = Error),虽然这样的编译器毫无用处)。 是否能够成功编译所关心的源代码,这不是个正确性问题,但却是个实现质量问题, 这个问题可以通过测试等非形式化方法来解决。 从形式化验证的角度来看,更为重要的特征是 编译器永远不能不声不响地生成不正确的代码。
Verifying a compiler in the sense of definition (6) amounts to applying program proof technology to the compiler sources, using one of the properties defined in Section 2.1 as the high-level specification of the compiler.
基于定义(6)的含义证明一个编译器需要将程序证明技术应用于编译器源代码, 需要证明的编译器高层规范将由上小节(2.1)所讨论的某一性质给出。
Translation Validation with Verified Validators. In the translation validation approach20, 22 the compiler does not need to be verified. Instead, the compiler is complemented by a validator: a boolean-valued function Validate(S, C) that verifies the property S ≈ C a posteriori. If Comp(S) = OK(C) and Validate(S, C) = true, the compiled code C is deemed trustworthy. Validation can be performed in several ways, ranging from symbolic interpretation and static analysis of S and C to the generation of verification conditions followed by model checking or automatic theorem proving. The property S ≈ C being undecidable in general, validators are necessarily incomplete and should reply false if they cannot establish S ≈ C.
采用已验证确认器的翻译确认(Translation Validation with Verified Validators)。翻译确认方法[20,22]中不需要验证编译器, 而是采用确认器来解决:在代码生成之后用一个返回为布尔值的确认函数Validate(S, C) 来验证S ≈ C这一性质。如果Comp(S) = OK(C)且Validate(S, C) = true, 则认为编译生成代码C是可信任的。 可以采取多种方式进行确认,其范围覆盖从源、目标语言的符号化解释和静态分析一直 到基于模型检测或自动定理证明的验证条件生成。 通常S ≈ C性质是不可判定的,因此验证器必定是不完备的, 在无法得到S ≈ C时它应该回答“false”。
Translation validation generates additional confidence in the correctness of the compiled code, but by itself does not provide formal guarantees as strong as those provided by a verified compiler: the validator could itself be incorrect. To rule out this possibility, we say that a validator Validate is verified if it is accompanied with a formal proof of the following property:
翻译确认可以为所生成代码增加正确性方面的可信程度, 但是确认过程本身不像已验证编译器, 无法提供任何形式化的保证,因为确认器本身可能是不正确的。 为了排除这种可能性,我们要求确认器 Validate是已验证的, 验证过程中需要提供下面性质的形式化证明:
∀S, C, Validate(S, C) = true ⇒ S ≈ C (7)
The combination of a verified validator Validate with an unverified compiler Comp does provide formal guarantees as strong as those provided by a verified compiler. Indeed, consider the following function:
已验证的确认器Validate和未经验证的编译器Comp组合起来可以提供和已验证编译器等价的形式化保证, 事实上,考虑如下函数
Comp′(S) =
match Comp (S) with
| Error → Error
| OK (C) → if Validate (S, C) then OK(C) else Error
This function is a verified compiler in the sense of definition (6). Verification of a translation validator is therefore an attractive alternative to the verification of a compiler, provided the validator is smaller and simpler than the compiler.
这个函数就是基于定义(6)的已验证编译器, 因此如果确认器通常比编译器更为小巧、更为简单, 那么翻译确认器的验证就是编译器验证的一种颇具吸引力的替代方案。
Proof-Carrying Code and Certifying Compilers. The proof carrying code (PCC) approach1, 19 does not attempt to establish semantic preservation between a source program and some compiled code. Instead, PCC focuses on the generation of independently checkable evidence that the compiled code C satisfies a behavioral specification Spec such as type and memory safety. PCC makes use of a certifying compiler, which is a function CComp that either fails or returns both a compiled code C and a proof π of the property C |= Spec. The proof π, also called a certificate, can be checked independently by the code user; there is no need to trust the code producer, nor to formally verify the compiler itself. The only part of the infrastructure that needs to be trusted is the client- side checker: the program that checks whether π entails the property C Spec.
证明携带代码和出具证明编译器(Proof-Carrying Code and Certifying Compilers)。 证明携带代码(PPC)方法[1,19]不求建立源程序和所生成程序之间的语义保持特性, 而是着眼于生成可以独立检查的证据, 该证据表明编译所生成的代码C满足诸如类型安全和存储安全等行为规范Spec。 PPC使用了一种出具证明编译器,这种编译器可以描述为函数CComp,该函数要么失败, 要么在生成代码C的同时给出C |= Spec性质的证明 π ,这里的证明 π ,也称之为证书, 可以由代码使用方独立检查。 这种情况下,不再需要信任代码提供方, 也不再需要形式化地验证编译器自身, 唯一需要信任的东西仅仅是用户方的检查器。 该检查器是用于检查证书 π 是否满足性质C |= Spec的程序。
As in the case of translation validation, it suffices to formally verify the client-side checker to obtain guarantees as strong as those obtained from compiler verification of property (4). Symmetrically, a certifying compiler can be constructed, at least theoretically, from a verified compiler, provided that the verification was conducted in a logic that follows the “propositions as types, proofs as programs” paradigm. The construction is detailed in Leroy.11, section 2.
类似于翻译确认方法,出具证明编译方法需要形式化地验证用户方的检查器, 以获得等价于性质(4)中验证编译器的正确性保证。 对应地,如果可以遵循“命题即类型、证明即程序”对应关系而通过逻辑系统得到相应的证明, 那么至少从理论上讲利用已验证编译器可以构造出出具证明编译器, 具体构造方法详见Leroy文章[11]第2节。
2.3 编译遍的组合
Compilers are naturally decomposed into several passes that communicate through intermediate languages. It is fortunate that verified compilers can also be decomposed in this manner. Consider two verified compilers Comp1 and Comp2 from languages L1 to L2 and L2 to L3, respectively. Assume that the semantic preservation property ≈ is transitive. (This is true for properties (1) to (5) of Section 2.1.) Consider the error-propagating composition of Comp1 and Comp2:
编译器可以很自然地划分为通过中间语言而相互联系的几个遍, 值得庆幸的是,已验证编译器也可以采用同样的方式进行划分。 考虑两个已验证编译器Comp1 和 Comp2分别实现语言L1到L2和L2到L3的翻译, 假定语义保持特性 ≈ 满足传递性(2.1节中讨论的性质(1)到(5)均满足), 考虑Comp1 和 Comp2的错误传播组合:
Comp(S) = match Comp1 (S) with
| Error → Error
| OK (I) → Comp2 (I)
It is trivial to show that this function is a verified compiler from L1 to L3.
很容易看出,所得到的函数就是一个从语言L1到L3的已验证编译器。
2.4 小结
The conclusions of this discussion are simple and define the methodology we have followed to verify the CompCert compiler back-end. First, provided the target language of the compiler has deterministic semantics, an appropriate specification for the correctness proof of the compiler is the combination of definitions (3) and (6):
上述讨论的简单结论就是我们验证CompCert编译器后端所遵循的方法。 首先,假如编译器的目标语言具有确定的语义,那么适用于编译器正确性证明的规范可以是定义(3)和(6)的组合:
∀S, C, B ∉ Wrong, Comp(S) = OK(C) ∧ S ⇓ B ⇒ C ⇓ B
Second, a verified compiler can be structured as a composition of compilation passes, following common practice. However, all intermediate languages must be given appropriate formal semantics.
其次,遵循实际开发常识,已验证编译器由多个编译遍组合而成, 当然也需要为所有中间语言给定合适的形式语义。
Finally, for each pass, we have a choice between proving the code that implements this pass or performing the transformation via untrusted code, then verifying its results using a verified validator. The latter approach can reduce the amount of code that needs to be verified.
最后,对于每一个编译遍,我们可以有两个选择, 要么验证该遍的代码实现, 要么采用非可信代码实施代码翻译, 同时使用已验证确认器来验证其翻译结果。 后一种方法可以大大减少需要验证的代码数量。

3. CompCert编译器概览

3.1 源语言
The source language of the CompCert compiler, called Clight,5 is a large subset of the C programming language, comparable to the subsets commonly recommended for writing critical embedded software. It supports almost all C data types, including pointers, arrays, struct and union types; all structured control (if/then, loops, break, continue, Java-style switch); and the full power of functions, including recursive functions and function pointers. The main omissions are extended-precision arithmetic (long long and long double); the goto statement; nonstructured forms of switch such as Duff’s device; passing struct and union parameters and results by value; and functions with variable numbers of arguments. Other features of C are missing from Clight but are supported through code expansion (de-sugaring) during parsing: side effects within expressions (Clight expressions are side-effect free) and block-scoped variables (Clight has only global and function-local variables).
CompCert编译器的源语言称为Clight[5], 是一个兼容于通常编写关键嵌入式软件推荐使用的C语言子集。 该语言支持几乎全部C数据类型,包括指针、数组、结构和联合体等类型; 所有结构化控制(if/then, 循环, break, continue, Java风格的分支switch); 包含递归函数和函数指针在内的函数支持。 而忽略的主要是扩展精度类型的算术运算(long long和long double); goto 语句; 如达夫设备(Duff’s device)之类的非结构化switch转移; 结构体、联合体参数传递和结果值返回; 以及带有变长参数的函数。 其他Clight缺少的C语言特征则由语法解析过程中的代码展开(de-sugaring)来支持: 包括表达式的副作用(Clight表达式没有副作用)和块作用域的变量 (Clight只支持全局变量和具有函数作用域的局部变量)。
The semantics of Clight is formally defined in big-step operational style. The semantics is deterministic and makes precise a number of behaviors left unspecified or undefined in the ISO C standard, such as the sizes of data types, the results of signed arithmetic operations in case of overflow, and the evaluation order. Other undefined C behaviors are consistently turned into “going wrong” behaviors, such as dereferencing the null pointer or accessing arrays out of bounds. Memory is modeled as a collection of disjoint blocks, each block being accessed through byte offsets; pointer values are pairs of a block identifier and a byte offset. This way, pointer arithmetic is modeled accurately, even in the presence of casts between incompatible pointer types.
Clight语义的形式化定义采用大步操作方式给出。 其语义是确定的,同时还精确描述了ISO C标准中未加规范和未加定义的一系列行为, 比如数据类型的大小、有符号算术操作溢出情况下的结果、求值顺序等。 其他诸如去引用空指针或者数组访问越界等C中未定义的行为全部一致地表示为“出错”行为。 内存模型则采用不相交块的集合来描述,每一个块均可以通过字节偏移量来访问,指针量是由块标识符和字节偏移量组成的数据对。 即便出现不兼容指针类型的强制转换等情况,这种方式仍可以精确描述指针的算术运算。
3.2 编译遍和中间语言
The formally verified part of the CompCert compiler translates from Clight abstract syntax to PPC abstract syntax, PPC being a subset of PowerPC assembly language. As depicted in Figure 1, the compiler is composed of 14 passes that go through eight intermediate languages. Not detailed in Figure 1 are the parts of the compiler that are not verified yet: upstream, a parser, type-checker and simplifier that generates Clight abstract syntax from C source files and is based on the CIL library21; downstream, a printer for PPC abstract syntax trees in concrete assembly syntax, followed by generation of executable binary using the system’s assembler and linker.
CompCert编译器形式化验证的部分是从Clight抽象语法到PPC抽象语法的翻译实现, 其中PPC是PowerPC汇编语言子集。 如图1所示,该编译器包含14个遍,经过8种中间语言。 图1中没有详细标出的是该编译器中未验证的部分: 前面的部分是基于CIL库[21]的语法解析器、类型检查器和语法树化简器, 完成从C源代码文件到Clight抽象语法树的生成工作。 后面的部分,则是由PPC抽象语法树到具体汇编语法的输出, 随后的可执行二进制文件生成部分则采用系统自带的汇编器和链接器。


图 1: 编译遍和中间语言
The front-end of the compiler translates away C-specific features in two passes, going through the C#minor and Cminor intermediate languages. C#minor is a simplified, typeless variant of Clight where distinct arithmetic operators are provided for integers, pointers and floats, and C loops are replaced by infinite loops plus blocks and multilevel exits from enclosing blocks. The first pass translates C loops accordingly and eliminates all type-dependent behaviors: operator overloading is resolved; memory loads and stores, as well as address computations, are made explicit. The next intermediate language, Cminor, is similar to C#minor with the omission of the & (address-of) operator. Cminor function-local variables do not reside in memory, and their address cannot be taken. However, Cminor supports explicit stack allocation of data in the activation records of functions. The translation from C#minor to Cminor therefore recognizes scalar local variables whose addresses are never taken, assigning them to Cminor local variables and making them candidates for register allocation later; other local variables are stack-allocated in the activation record.
编译器前端采用C#minor和Cminor两种中间语言、经过两遍翻译去除所有的C语言特性。 C#minor是一种简化的无类型Clight语言变种, 它为整数、指针和浮点提供了不同的算术运算符,并且将C循环替换为无限循环附加语句块的形式,其中包含有多层次闭块的跳出语句。 第一遍编译据此翻译C循环并消除所有类型依赖的行为:解决运算符重载、将内存读写以及地址计算改为显示操作。 第二种中间语言为Cminor,类似于C#minor,在其基础上略去了取地址运算(&)。 Cminor函数内本地变量不在内存中存放,因而无法获得其地址。 当然,Cminor支持函数活动记录内数据的显式栈分配。 因而,从C#minor到Cminor的翻译过程中识别出那些从不使用取地址操作的标量特征的本地变量, 将其分配给Cminor本地变量并作为稍后寄存器分配的考虑对象, 而所有其他本地变量均在活动记录的栈上分配。
The compiler back-end starts with an instruction selection pass, which recognizes opportunities for using combined arithmetic instructions (add-immediate, not-and, rotate-and-mask, etc.) and addressing modes provided by the target processor. This pass proceeds by bottom-up rewriting of Cminor expressions. The target language is CminorSel, a processor-dependent variant of Cminor that offers additional operators, addressing modes, and a class of condition expressions (expressions evaluated for their truth value only).
该编译器后端开始于指令选择遍,完成识别(add-immediate, not-and, rotate-and-mask等)组合算术指令的使用时机、目标处理器提供的寻址方式等工作。 这一遍通过自底向上地重写Cminor表达式而实现。 其目标语言是CminorSel,一种机器相关的Cminor变种, 它提供更多的运算符、寻址方式和一系列条件表达式(求值时只能得到真值结果的表达式)。
The next pass translates CminorSel to RTL, a classic register transfer language where control is represented as a control-flow graph (CFG). Each node of the graph carries a machine-level instruction operating over temporaries (pseudo-registers). RTL is a convenient representation to conduct optimizations based on dataflow analyses. Two such optimizations are currently implemented: constant propagation and common subexpression elimination, the latter being performed via value numbering over extended basic blocks. A third optimization, lazy code motion, was developed separately and will be integrated soon. Unlike the other two optimizations, lazy code motion is implemented following the verified validator approach.24
接下来的遍翻译CminorSel至RTL,RTL是一种经典的寄存器传输语言,以控制流图(CFG)的形式给出, 图中的每一个节点均代表一个以临时量(伪寄存器)为操作数的机器指令。 RTL是便于开展数据流分析相关优化的中间表示,当前实现中有两种此类优化: 常数传播和公共子表达式消除。后者是通过扩展基本块上的值标号技术而实现。 此外,还单独开发惰性代码迁移作为第三种优化,本优化将在不久集成进来。 和前两种优化不同,惰性代码迁移是遵循已验证确认器方法而实现的。
After these optimizations, register allocation is performed via coloring of an interference graph.6 The output of this pass is LTL, a language similar to RTL where temporaries are replaced by hardware registers or abstract stack locations. The CFG is then “linearized,” producing a list of instructions with explicit labels, conditional and unconditional branches. Next, spills and reloads are inserted around instructions that reference temporaries that were allocated to stack locations, and moves are inserted around function calls, prologues and epilogues to enforce calling conventions. Finally, the “stacking” pass lays out the activation records of functions, assigning offsets within this record to abstract stack locations and to saved callee-save registers, and replacing references to abstract stack locations by explicit memory loads and stores relative to the stack pointer.
这些优化之后是基于冲突图着色的寄存器分配,[6] 这一遍输出结果是LTL,一种类似于RTL的语言,不同的是原来的临时量全部替换为硬件寄存器或者是抽象栈单元, 而原有的CFG也经过线性化而成为带有显示标号、条件和无条件跳转的指令序列。 接着,在引用分配到栈单元中临时量的指令前后插入溢出和重取操作,在函数调用、入口处理和出口处理处插入move指令以满足函数调用约定。 最后,栈规划遍完成函数活动记录的栈布局、在该记录中确定抽象栈单元的偏移量、确定已存储的被调函数保存(Callee-save)寄存器的偏移量、 将抽象栈单元的引用替换为相对于栈指针的显式内存读写等工作。
This brings us to the Mach intermediate language, which is semantically close to PowerPC assembly language. Instruction scheduling by list or trace scheduling can be performed at this point, following the verified validator approach again.23 The final compilation pass expands Mach instructions into canned sequences of PowerPC instructions, dealing with special registers such as the condition registers and with irregularities in the PowerPC instruction set. The target language, PPC, accurately models a large subset of PowerPC assembly language, omitting instructions and special registers that CompCert does not generate.
这时我们就得到了Mach中间语言,该语言的语义非常接近PowerPC汇编, 仍然遵循已验证确认器方法,这里可以实施基于表的指令调度或者轨迹(trace)调度。[23] 编译的最后一遍将Mach指令展开成为PowerPC指令序列, 需要特别处理条件寄存器等专用寄存器和非规整指令。 目标语言PPC是PowerPC汇编语言的一个准确的较大子集,省略了部分CompCert未使用的指令和专用寄存器。
From a compilation standpoint, CompCert is unremarkable: the various passes and intermediate representations are textbook compiler technology from the early 1990s. Perhaps the only surprise is the relatively high number of intermediate languages, but many are small variations on one another: for verification purposes, it was more convenient to identify each variation as a distinct language than as different subsets of a few, more general-purpose intermediate representations.
从纯粹编译的角度来看,CompCert确实是乏善可陈, 所有的中间表示和编译遍均出自上世纪九十年代初期的编译教课书。 也许唯一令人称奇的是拥有如此众多的中间语言,而它们之间差别却非常小。 但是从验证的角度来看,标识具有微小差异的变种为单独的语言更加方便, 而将其作为少量通用中间表示的不同子集则会带来很多麻烦。

3.3 证明编译器
The added value of CompCert lies not in the compilation technology implemented, but in the fact that each of the source, intermediate and target languages has formally defined semantics, and that each of the transformation and optimization passes is proved to preserve semantics in the sense of Section 2.4.
CompCert的价值不在于所实现的编译技术, 而在于这样一种雄辩的事实:形式化定义了所有源语言、中间语言和目标语言的语义, 以2.4节的思路形式化证明了所有翻译和优化遍的语义保持特性。
These semantic preservation proofs are mechanized using the Coq proof assistant. Coq implements the Calculus of Inductive and Coinductive Constructions, a powerful constructive, higher-order logic which supports equally well three familiar styles of writing specifications: by functions and pattern-matching, by inductive or coinductive predicates representing inference rules, and by ordinary predicates in first-order logic. All three styles are used in the CompCert development, resulting in specifications and statements of theorems that remain quite close to what can be found in programming language research papers. In particular, compilation algorithms are naturally presented as functions, and operational semantics use mostly inductive predicates (inference rules). Coq also features more advanced logical features such as higherorder logic, dependent types and an ML-style module system, which we use occasionally in our development. For example, dependent types let us attach logical invariants to data structures, and parameterized modules enable us to reuse a generic dataflow equation solver for several static analyses.
这里的语义保持证明均采用Coq辅助证明工具来实现机械化。 Coq实现了归纳与余归纳构造演算,这是一种功能强大的构造性高阶逻辑, 该逻辑同时支持三种众所周知的规范书写风格: 函数与模式匹配、表示推导规则的的归纳与余归纳谓词、 以及常规的一阶逻辑谓词。 CompCert开发中使用到所有上述三种风格, 得到的规范和定理命题基本接近通常在编程语言研究论文中的形式。 特别地,编译算法很自然地表达为函数,操作语义则大多数定义为归纳谓词(推导规则)。 开发中偶尔会用到一些Coq很多高级特性中的一部分,比如高阶逻辑、依赖类型和ML风格模块系统。 例如,依赖类型帮助我们为数据结构附加逻辑不变量, 而参数化模块则方便我们在几种静态分析之间重用通用数据流方程求解器。
Proving theorems in Coq is an interactive process: some decision procedures automate equational reasoning or Presburger arithmetic, for example, but most of the proofs consist in sequences of “tactics” (elementary proof steps) entered by the user to guide Coq in resolving proof obligations. Internally, Coq builds proof terms that are later rechecked by a small kernel verifier, thus generating very high confidence in the validity of proofs. While developed interactively, proof scripts can be rechecked a posteriori in batch mode.
在Coq中的证明定理是一个交互过程: 比如部分判定过程可以自动完成等式推论或者Presburger算法, 不过,更多的证明则是由用户输入用以指导Coq完成证明任务的tactics(基本证明步)序列。 Coq在内部建立证明条件,随后用一个微小的核心验证器来检查,这就大大增强了证明可信程度。 虽然证明需要交互开发,但是证明脚本可以批处理方式进行事后检查。
The whole Coq formalization and proof represents 42,000 lines of Coq (excluding comments and blank lines) and approximately three person-years of work. Of these 42,000 lines, 14% define the compilation algorithms implemented in CompCert, and 10% specify the semantics of the languages involved. The remaining 76% correspond to the correctness proof itself. Each compilation pass takes between 1,500 and 3,000 lines of Coq for its specification and correctness proof. Likewise, each intermediate language is specified in 300 to 600 lines of Coq, while the source language Clight requires 1,100 lines. Additional 10,000 lines correspond to infrastructure shared between all languages and passes, such as the formalization of machine integer arithmetic and of the memory model.
除注释和空行,整个Coq形式化定义和证明合计42,000行Coq代码,工作量大约3人年。 这42,000行中,14%用于定义CompCert中实现的编译算法,10%用于规范相关语言的语义, 其余76%则对应于编译器的正确性证明。 每一个编译遍的规范及其正确性证明约为1,500到3,000行Coq代码, 相应的每一种中间语言的规范约为300到600行代码,而源语言Clight则需要1,100行。 另外的10,000行Coq代码对应于所有语言和编译遍共享的底层框架,比如 机器整数运算和存储模型的形式化定义。
3.4 编写和运行编译器
We use Coq not only as a prover to conduct semantic preservation proofs, but also as a programming language to write all verified parts of the CompCert compiler. The specification language of Coq includes a small, pure functional language, featuring recursive functions operating by pattern-matching over inductive types (ML- or Haskell-style tree-shaped data types). With some ingenuity, this language suffices to write a compiler. The highly imperative algorithms found in compiler textbooks need to be rewritten in pure functional style. We use persistent data structures based on balanced trees, which support efficient updates without modifying data in-place. Likewise, a monadic programming style enables us to encode exceptions and state in a legible, compositional manner.
Coq不仅是我们构造语义保持证明的工具,同时也是所有CompCert编译器已验证部分的编程语言。 Coq的规范描述语言中包含一个使用基于归纳类型(ML或者Haskell风格的树状数据类型)模式匹配方法来处理递归函数的袖珍纯函数语言。 配合精巧的构思,这种语言足够编译器编写之需。 那些编译教科书中高度命令式的算法需要采用纯函数式风格重写。 我们采用基于平衡树的持久数据结构,用以支持无需原地修改的高效数据更新。 类似地,单质编程风格(Monadic Programming Style)帮助我们用一种清晰的、组合的方式来描述异常和状态。
The main advantage of this unconventional approach, compared with implementing the compiler in a conventional imperative language, is that we do not need a program logic (such as Hoare logic) to connect the compiler’s code with its logical specifications. The Coq functions implementing the compiler are first-class citizens of Coq’s logic and can be reasoned on directly by induction, simplifications, and equational reasoning.
和采用传统命令式语言实现编译器相比,这种非传统方法的主要优势在于, 我们不再需要一种程序逻辑(如Hoare逻辑)将编译器代码和对应的逻辑规范联系起来。 实现编译器的Coq函数是Coq逻辑的“头等公民”,可以直接采用归纳、化简和等式推导等方法证明。
To obtain an executable compiler, we rely on Coq’s extraction facility,15 which automatically generates Caml code from Coq functional specifications. Combining the extracted code with hand-written Caml implementations of the unverified parts of the compiler (such as the parser), and running all this through the Caml compiler, we obtain a compiler that has a standard, cc-style command-line interface, runs on any platform supported by Caml, and generates PowerPC code that runs under MacOS X. (Other target platforms are being worked on.)
Coq提供了一种可以从函数式规范中自动生成Caml源代码的代码提取工具支持, 我们采用这个工具来得到编译器可执行代码。 将提取得到的源代码和语法解析器等手工实现的未验证的Caml代码合而为一, 并利用Caml编译器编译之后, 我们就得到一个具有标准cc风格命令行接口的编译器, 其运行环境为所有Caml支持的平台, 所生成PowerPC代码的运行环境为MacOS X(其他运行环境的支持正在开发中)。
3.5 性能比较
To assess the quality of the code generated by CompCert, we benchmarked it against the GCC 4.0.1 compiler at optimization levels 0, 1, and 2. Since standard benchmark suites use features of C not supported by CompCert, we had to roll our own small suite, which contains some computational kernels, cryptographic primitives, text compressors, a virtual machine interpreter and a ray tracer. The tests were run on a 2 GHz PowerPC 970 “G5” processor.
为了评估CompCert生成代码的质量, 我们将其和GCC 4.0.1编译器在0, 1和2优化级别的结果进行比较。 很多标准基准测试程序集用到了CompCert不支持的C语言特性, 因此我们只能自行构造一个小的测试集, 这个集合中包含一些核心计算程序、加密原语、文本压缩、 一个虚拟机解释程序、还有一个射线追踪程序。 测试程序的运行环境是一个2GHz主频的PowerPC 970 “G5” 处理器。
As the timings in Figure 2 show, CompCert generates code that is more than twice as fast as that generated by GCC without optimizations, and competitive with GCC at optimization levels 1 and 2. On average, CompCert code is only 7% slower than gcc −01 and 12% slower than gcc −02. The test suite is too small to draw definitive conclusions, but these results strongly suggest that while CompCert is not going to win a prize in high performance computing, its performance is adequate for critical embedded code.
Compilation times of CompCert are within a factor of 2 of those of gcc−01, which is reasonable and shows that the overheads introduced to facilitate verification (many small passes, no imperative data structures, etc.) are acceptable.
时间比较结果如图2所示,CompCert所生成代码的运行效率超过不做优化的GCC所生成代码的两倍, 与GCC O1和O2优化结果可匹敌。 平均而言,CompCert代码仅比gcc O1结果慢7%, 而比gcc O2结果慢12%。 这个测试集还是太小,无法给出最终结论,但这些结果雄辩地说明, 虽然CompCert不是专门针对高性能计算领域,但是其性能满足关键嵌入程序的需求还是绰绰有余的。 CompCert的编译时间合理地限制在GCC O1编译时间的2倍之内, 对于这样一个引入易于验证特性的工具(很多短小的编译遍,没有命令式的数据结构等等)来讲, 这样的负担也是可以接受的。


图 2: 生成代码执行时间比较

4. 寄存器分配

To provide a more detailed example of a verified compilation pass, we now present the register allocation pass of CompCert and outline its correctness proof.
作为已验证编译器遍的详细实例,这节将介绍CompCert的寄存器分配遍及其正确性证明。

4.1 RTL中间语言
Register allocation is performed over the RTL intermediate representation, which represents functions as a CFG of abstract instructions, corresponding roughly to machine instructions but operating over pseudo-registers (also called “temporaries”).
Every function has an unlimited supply of pseudo-registers, and their values are preserved across function call. In the following, r ranges over pseudo-registers and l over labels of CFG nodes.
寄存器分配在RTL中间表示上实施。 RTL把函数表示为抽象指令的CFG图, 抽象指令严格对应机器指令, 但是其操作数为伪寄存器(也称为临时量)。 其中每个函数可以无数量限制地使用伪寄存器, 而这些伪寄存器的值在函数调用过程中保持不变。 下面我们用r和l分别表示伪寄存器和CFG节点标号。
Instructions,指令:
i ::= nop (l) no operation (go to l ),空操作(转标号l)
| op(op, r→, r, l) arithmetic operation,算术操作
| load (k, mode, r→, r, l) memory load,内存读取
| store(k, mode, r→, r, l) memory store,内存写入
| call(sig, (r | id), r→, r, l) function call,函数调用
| tailcall(sig,(r | id), r→) function tail call,函数尾调用
| cond(cond, r→, ltrue, lfalse) conditional branch,条件跳转
| return | return(r) function return,函数返回

Control-flow graphs,控制流图:
g ::= l → i finite map,有限映射

Internal functions,内部函数:
F ::= {name = id; sig = sig;
params = r→; parameters,参数
stacksize = n; size of stack data block,栈数据块尺寸
entrypoint = l; label of first instruction,第一条指令的标号
code = g} control-flow graph,控制流图

External functions,外部函数:
Fe ::= {name = id; sig = sig}

Each instruction takes its arguments in a list of pseudoregisters r→ and stores its result, if any, in a pseudo-register r. Additionally, it carries the labels l of its possible successors. Instructions include arithmetic operations op (with an important special case op(move, r, r′, l) representing a register-to-register copy), memory loads and stores (of a quantity κ at the address obtained by applying addressing mode mode to registers r→ ), conditional branches (with two successors), and function calls, tail-calls, and returns.
每一条指令通过伪寄存器表r→传递参数, 如果有返回参数则存放于另一个伪寄存器r中, 此外,该指令还记录着可能后继指令的标号l。 指令中包括算术运算op(含表示寄存器到寄存器复制的重要特例op(move, r, r′, l))、 内存存取(匹配寻址方式mode于寄存器r→获得内存地址,对该地址开始的k个单元进行操作)、 条件跳转(含两个后继指令)、以及函数调用、尾调用和函数返回。
An RTL program is composed of a set of named functions, either internal or external. Internal functions are defined within RTL by their CFG, entry point in the CFG, and parameter registers. External functions are not defined but merely declared: they model input/output operations and similar system calls. Functions and call instructions carry signatures sig specifying the number and register classes (int or float) of their arguments and results.
RTL表示的程序由内部和外部有名函数集合组成, 其中内部函数由其CFG、CFG的入口点和参数寄存器定义, 而外部函数只声明不定义,它们模拟输入/输出操作以及类似的系统调用。 函数和调用指令附带标记sig, 用于确定其参数和返回值的数目和寄存器类别(整型或者浮点)。
The dynamic semantics of RTL is specified in small-step operational style, as a labeled transition system. The predicate G ⊥ S →t S′ denotes one step of execution from state S to state S′. The global environment G maps function pointers and names to function definitions. The trace t records the input–output events performed by this execution step: it is empty (t = e) for all instructions except calls to external functions, in which case t records the function name, parameters, and results of the call.
RTL的动态语义采用小步操作方式,形式化描述为带有标号的转移系统。 谓词G ⊥ S →t S′表示从状态S执行一步转移到状态S‘, 全局环境G将函数指针和名称映射为函数定义, 轨迹t记录该执行步所发生的输入-输出事件: 调用外部函数的时候,t记录函数名称、参数和调用结果, 对于除此之外的所有指令,其轨迹均为空(t = e)。
Execution states S are of the form S(Σ, g, s, l, R, M) where g is the CFG of the function currently executing, l the current program point within this function, and s a memory block containing its activation record. The register state R maps pseudo-registers to their current values (discriminated union of 32-bit integers, 64-bit floats, and pointers). Likewise, the memory state M maps (pointer, memory quantity) pairs to values, taking overlap between multi-byte quantities into account.14 Finally, Σ models the call stack: it records pending function calls with their (g, s, l, R) components. Two slightly different forms of execution states, call states and return states, appear when modeling function calls and returns, but will not be described here.
执行状态S形式为S(Σ, g, s, l, R, M), 其中g是当前执行函数的CFG; l是该函数内的当前程序执行位置; s是包含其活动记录的内存块; 寄存器状态R将伪寄存器映射为其当前值(需要区别对待32位整数、64位浮点和指针); 类似地,内存状态M将指针、内存块偶对映射为值, 映射中需要考虑多字节内存块之间的相互重叠问题;[14] 最后,Σ描述调用栈,记录后续函数调用的(g, s, l, R)数据。 调用状态和返回状态用于描述函数返回和调用, 和执行状态有两个细微的差别,这里不再详述。
To give a flavor of RTL’s semantics, here are two of the rules defining the one-step transition relation, for arithmetic operations and conditional branches, respectively:
为了给出RTL语义的直观感觉, 这里分别写出算术操作和条件跳转两条指令单步转移关系规则的定义:
g (l) = op(op, r→, r, l′) eval_op(G, s, op, R(r→)) = u
-------------------------------------------------------
G⊥S(S, g, s, l, R, M) →e S(S, g, s, l′, R{r ← u}, M)

g(l) = cond(cond, r→, ltrue, lfalse)
l′ =ltrue if eval_cond(cond, R(r→)) = true
     lfalse if eval_cond(cond, R(r→)) = false
----------------------------------------------
G⊥S(S, g, s, l, R, M) →e S(S, g, s,l′, R, M)

4.2 寄存器分配算法
The goal of the register allocation pass is to replace the pseudo-registers r that appear in unbounded quantity in the original RTL code by locations l, which are either hardware registers (available in small, fixed quantity) or abstract stack slots in the activation record (available in unbounded quantity). Since accessing a hardware register is much faster than accessing a stack slot, the use of hardware registers must be maximized. Other aspects of register allocation, such as insertion of reload and spill instructions to access stack slots, are left to subsequent passes.
寄存器分配遍的目标是将 在原始RTL代码中l处出现的可以无限使用的伪寄存器r替换掉, 替换为硬件寄存器(少量固定个数)或者是活动记录中的抽象栈单元(无个数限制)。 硬件寄存器的访问速度要远远高于栈单元的访问速度, 因此必须最大限度地优先使用硬件寄存器。 寄存器分配需要考虑的的其他方面, 比如插入栈单元操作的溢出和重取指令, 则留给其他后续遍处理。
Register allocation starts with a standard liveness analysis performed by backward dataflow analysis. The dataflow equations for liveness are of the form
寄存器分配开始于采用逆向数据流分析的标准活跃分析,针对活跃信息的数据流方程形如:
LV(l) = ∪ {T (s, LV(s)) | s successor of l} (8)
The transfer function T(s, LV(s) ) computes the set of pseudo-registers live “before” a program point s as a function of the pseudo-registers LV(s) live “after” that point. For instance, if the instruction at s is op(op, r→, r, s′), the result r becomes dead because it is redefined at this point, but the arguments r→ become live. because they are used at this point: T(s, LV(s) ) = (LV(s){r}) ∪ r→. However, if r is dead “after” (r ∉ L(s) ), the instruction is dead code that will be eliminated later, so we can take T(s, LV(s) ) = LV (s) instead.
转换函数T(s, LV(s) )计算某个程序点s之前的活跃伪寄存器集合, T是该点之后活跃伪寄存器LV(s)的函数。 例如,假定s点的指令为op(op, r→, r, s′),由于r于当前点重新定值,因此r的结果将不再活跃, 而由于r→为当前点所引用,有T(s, LV(s) ) = (LV(s){r}) ∪ r→,因此参数r→开始活跃。 不过,如果之后的r非活跃(r ∉ L(s) ), 则该指令是死代码并将在稍后删除, 因此我们可以采用T(s, LV(s) ) = LV (s)来代替上式。
The dataflow equations are solved iteratively using Kildall’s worklist algorithm. CompCert provides a generic implementation of Kildall’s algorithm and of its correctness proof, which is also used for other optimization passes. The result of this algorithm is a mapping LV from program points to sets of live registers that is proved to satisfy the correctness condition LV(l) ⊇ T(s, LV(s) ) for all s successor of l. We only prove an inequation rather than the standard dataflow equation (8) because we are interested only in the correctness of the solution, not in its optimality.
采用Kildall工作表算法来迭代求解该数据流方程, CompCert给出Kildall算法的一种通用实现及其正确性证明,该实现也适用于其他相关优化遍。 该算法的结果是一个由LV形式的程序点到活跃寄存器集合的映射, 已经证明对于l的所有后继指令标号s,该映射均满足正确性条件LV(l) ⊇ T(s, LV(s) )。 我们只证明这个不等式方程,而没有证明标准数据流方程(8), 原因是我们所感兴趣的只是解的正确性而不是解的最优性。
An interference graph having pseudo-registers as nodes is then built following Chaitin’s rules,6 and proved to contain all the necessary interference edges. Typically, if two pseudo-registers r and r′ are simultaneously live at a program point, the graph must contain an edge between r and r′. Interferences are of the form “these two pseudo-registers interfere” or “this pseudo-register and this hardware register interfere,” the latter being used to ensure that pseudoregisters live across a function call are not allocated to caller-save registers. Preference edges (“these two pseudoregisters should preferably be allocated the same location” or “this pseudo-register should preferably be allocated this location”) are also recorded, although they do not affect correctness of the register allocation, just its quality.
接着采用Chaitin规则[6]来建立以伪寄存器为节点的冲突图,并证明了图中包含所有必须的冲突边。 典型情况是,如果在同一个程序点上两个伪寄存器r和r′同时活跃,则图中必定包含连接r和r′的边。 冲突的表现形式为“某两个伪寄存器冲突”或者“此伪寄存器和彼硬件寄存器冲突”, 后者用来保证跨函数调用活跃的伪寄存器不需要分配调用者保存的硬件寄存器。 虽说这些特性不影响寄存器分配的正确性,而只是影响到寄存器分配质量。 首选边(“这两个伪寄存器最好分配为同一个位置”或者“这个伪寄存器最好分配为这个位置”)也还是需要记录下来。
The central step of register allocation consists in coloring the interference graph, assigning to each node r a “color” j(r) that is either a hardware register or a stack slot, under the constraint that two nodes connected by an interference edge are assigned different colors. 寄存器分配的核心工作包含两步,一是冲突图着色, 二是基于某种约束为节点r指派一种“颜色”j(r)。 这里的颜色要么是一个硬件寄存器,要么是一个栈单元, 约束条件是必须为任何两个由冲突边连接的节点指派不同的颜色。 We use the coloring heuristic of George and Appel.9 Since this heuristic is difficult to prove correct directly, we implement it as unverified Caml code, then validate its results a posteriori using a simple verifier written and proved correct in Coq. 我们采用George和Appel[9]的启发着色方法。 该启发式方法的正确性很难直接证明, 因此我们先采用未经验证的Caml代码实现该方法, 然后采用一个由Coq编写并证明的简单验证器来对其结果进行事后确认。 Like many NP-hard problems, graph coloring is a paradigmatic example of an algorithm that is easier to validate a posteriori than to directly prove correct. The correctness conditions for the result j of the coloring are:
类似于很多NP难问题,图着色是一个事后确认易于直接给出正确性证明的算法中的一个典型例子。 着色结果j的正确性条件为:
1. j(r) ≠ j(r′) 若 r 和 r′ 冲突
2. j(r) ≠ l 若 r 和 l 冲突
3. j(r) 和 r 拥有相同的寄存器类别 (整型或浮点)
These conditions are checked by boolean-valued functions written in Coq and proved to be decision procedures for the three conditions. Compilation is aborted if the checks fail, which denotes a bug in the external graph coloring routine.
Finally, the original RTL code is rewritten. Each reference to pseudo-register r is replaced by a reference to its location j(r). Additionally, coalescing and dead code elimination are performed. A side-effect-free instruction l : op(op, r→, r, l′) or l: load(k, mode, r→, r, l′) is replaced by a no-op l: nop(l′) if the result r is not live after l (dead code elimination). Likewise, a move instruction l : op(move, rs, rd, l′) is replaced by a no-op l : nop(l′) if j(rd) = j(rs) (coalescing).
上述条件由Coq书写的布尔值函数来检查, 并且已经证明这些函数是上述条件的判定过程。 如果检查出错,则表明外部图着色函数存在错误,因此将终止编译。 最后的工作是重写原始RTL代码, 每一个伪寄存器r的引用都将替换为该寄存器位置j(r)的引用。 同时还将执行变量合并和死代码删除优化: 如果结果r在l之后不再活跃, 则无副作用的指令l : op(op, r→, r, l′) 或l: load(k, mode, r→, r, l′)将替换为空指令l: nop(l′)(死代码删除); 类似地,如果j(rd) = j(rs), 则寄存器移动指令l : op(move, rs, rd, l′)将替换为空指令l : nop(l′)(变量合并)。

4.3 证明语义保持
To prove that a program transformation preserves semantics, a standard technique used throughout the CompCert project is to show a simulation diagram: each transition in the original program must correspond to a sequence of transitions in the transformed program that have the same observable effects (same traces of input–output operations, in our case) and preserve as an invariant a given binary relation ∼ between execution states of the original and transformed programs. In the case of register allocation, each original transition corresponds to exactly one transformed transition, resulting in the following “lock-step” simulation diagram:
为了证明程序变换的语义保持特性,贯穿CompCert项目的标准技术是建立如下所示的模拟关系图: 原始程序的每一个单步转移一定对应于变换后程序的一个具有相同观察效果 (在我们这里是指具有相同的输入-输出操作轨迹)的转移序列, 同时以不变量形式保持原始程序和变换后程序的执行状态之间的一个二元关系~。 在寄存器分配的例子中,每一个原始转移步恰好对应于变换后的一个转移步, 其结果就得到如下的“固定步长(Lock-step)”模拟关系图。
S1 --- ∼ --- S1'
 |            |
t|            |t
 |            |
\/            \/
S2 --- ∼ --- S2'

(Solid lines represent hypotheses; dotted lines represent conclusions.) If, in addition, the invariant ∼ relates initial states as well as final states, such a simulation diagram implies that any execution of the original program corresponds to an execution of the transformed program that produces exactly the same trace of observable events. Semantic preservation therefore follows.
(实线为前提,而虚线为结论。) 进而,如不变量~表示初始状态和终止状态之间的关系, 则本模拟关系图表示:原始程序的任何执行都对应着变换后程序的执行,二者具有完全相同的可观察行为轨迹。 语义保持特性因而得证。
The gist of a proof by simulation is the definition of the ∼ relation. What are the conditions for two states S(Σ, g, s, l, R, M) and S(Σ′, g′, s′, l′, R′, M′) to be related? Intuitively, since register allocation preserves program structure and control flows, the control points l and l′ must be identical, and the CFG g′ must be the result of transforming g according to some register allocation j as described in Section 4.2. Likewise, since register allocation preserves memory stores and allocations, the memory states and stack pointers must be identical: M′ = M and s ′ = s.
证明模拟关系的要点是~关系的定义。 那么,两个状态S(Σ, g, s,l, R, M)和S(Σ′, g′, s′, l′, R′, M′)满足~关系的条件是什么呢? 直观来讲,由于寄存器分配过程中程序结构和控制流保持不变, 因此控制点l和l‘一定完全相等, 同时CFG g'一定是g经过4.2节所述某种寄存器分配j的变换结果。 同样地,因为寄存器分配过程中内存存储和分配保持不变, 因此内存状态和栈指针一定保持不变, 即 M′ = M 且 s ′ = s。
The nonobvious relation is between the register state R of the original program and the location state R′ of the transformed program. Given that each pseudo-register r is mapped to the location j(r), we could naively require that R(r) = R′(j(r) ) for all r. However, this requirement is much too strong, as it essentially precludes any sharing of a location between two pseudo-registers whose live ranges are disjoint. To obtain the correct requirement, we need to consider what it means, semantically, for a pseudo-register to be live or dead at a program point l. A dead pseudo-register r is such that its value at point l has no influence on the program execution, because either r is never read later, or it is always redefined before being read. Therefore, in setting up the correspondence between register and location values, we can safely ignore those registers that are dead at the current point l. It suffices to require the following condition:
原始程序的寄存器状态R和变换后程序的存储状态R′之间的关系并不那么明显, 假定每个伪寄存器r映射到分配单元j(r),我们可以简单地要求对所有r有R(r) = R′(j(r))。 不过,由于这个条件基本排除了两个活跃范围无重叠伪寄存器之间共享同一个寄存器分配单元的可能性,所以此要求太强。 为了获得正确的要求条件,我们需要认真考虑一个伪寄存器在程序点l处活跃和非活跃在语义上的意义。 对于一个非活跃伪寄存器r来说, 由于要么之后不会读取r的值,要么r会在下一次读取之前重新定值, 因此r在l处的值不会影响到程序的执行。 所以,在设定寄存器及其分配单元之间的对应关系时,我们可以安全地忽略当前点l的所有非活跃寄存器。 所要求的充分条件如下:
R(r) = R′(j(r) ) for all pseudo-registers r live at point l.
R(r) = R′(j(r) ) 对所有在l点活跃的伪寄存器r。
Once the relation between states is set up, proving the simulation diagram above is a routine case inspection on the various transition rules of the RTL semantics. In doing so, one comes to the pleasant realization that the dataflow inequations defining liveness, as well as Chaitin’s rules for constructing the interference graph, are the minimal sufficient conditions for the invariant between register states R, R′ to be preserved in all cases.
一旦建立状态之间的模拟关系, 证明述模拟关系图就成为一种检查各种RTL语义转移规则的常规工作。 而在检查过程中, 人们将会惊喜地发现定义活跃性的数据流不等式和构建冲突图所遵循的Chaitin规则, 是任意情况下寄存器状态R、R‘之间不变量保持的最小充分条件。

5. 结论与展望

The CompCert experiment described in this paper is still ongoing, and much work remains to be done: handle a larger subset of C (e.g. including goto); deploy and prove correct more optimizations; target other processors beyond PowerPC; extend the semantic preservation proofs to shared-memory concurrency, etc. However, the preliminary results obtained so far provide strong evidence that the initial goal of formally verifying a realistic compiler can be achieved, within the limitations of today’s proof assistants, and using only elementary semantic and algorithmic approaches. The techniques and tools we used are very far from perfect—more proof automation, higher-level semantics and more modern intermediate representations all have the potential to significantly reduce the proof effort— but good enough to achieve the goal.
本文所述的CompCert实践仍在进行中,仍有很多急待解决的问题: 比如处理包括goto在内的更大的C语言子集、加入并证明更多的优化方法、重定向到PowerPC之外的其他处理器、 扩展语言保持证明以支持共享存储式的并发等等。 尽管如此,我们的初步结论雄辩地证明, 在今天有限证明工具的条件下,在仅仅采用基本的语义和算法的情况下, 形式化验证真实可用编译器这一最初目标是完全可以实现的。 我们所采用的技术和工具远非完善,但已经足够实现本项目的目标, 而更好的自动证明工具、更高级的语义以及更为现代的中间表示必将极大地减少证明的工作量。
Looking back at the results obtained, we did not completely rule out all uncertainty concerning the correctness of the compiler, but reduced the problem of trusting the whole compiler down to trusting the following parts:
1. The formal semantics for the source (Clight) and target (PPC) languages.
2. The parts of the compiler that are not verified yet: the CIL-based parser, the assembler, and the linker.
3. The compilation chain used to produce the executable for the compiler: Coq’s extraction facility and the Caml compiler and run-time system. (A bug in this compilation chain could invalidate the guarantees obtained by the correctness proof.)
4. The Coq proof assistant itself. (A bug in Coq’s implementation or an inconsistency in Coq’s logic could falsify the proof.)
回首本项目的成果可以发现,我们并没有彻底解决所有编译器正确性相关的问题, 而仅仅是将从原来需要信任整个编译器的问题简化为目前只要信任以下的几个小的部分:
1. 源(Clight)和目标(PPC)语言的形式语义。
2. 编译器的有些组成部分还没有验证,包括基于CIL的解析器、汇编器和链接器。
3. 构造编译器执行代码的编译工具链,包括:从Coq中提取代码的工具、Caml语言编译器和运行时系统 (这个工具链中的任何错误都将彻底破坏正确性证明带来的特性)。
4. Coq辅助证明工具本身(Coq实现的错误或者是Coq逻辑的不一致同样将会毁掉所有证明)。
Issue (4) is probably the least concern: as Hales argues,10 proofs mechanically checked by a proof assistant that generates proof terms are orders of magnitude more trustworthy than even carefully hand-checked mathematical proofs.
问题(4)可能是最不需要担心的,正如Hales所言[10], 证明辅助工具机械化检查过的证明所生成证明项的可信程度, 要比哪怕是最仔细的手工检查数学证明都要高好几个数量级。
To address concern (3), ongoing work within the CompCert project studies the feasibility of formally verifying Coq’s extraction mechanism as well as a compiler from Mini-ML (the simple functional language targeted by this extraction) to Cminor. Composed with the CompCert back-end, these efforts could eventually result in a trusted execution path for programs written and verified in Coq, like CompCert itself, therefore increasing confidence further through a form of bootstrapping.
针对第(3)个问题,CompCert项目中正在进行的工作将研究 “从Coq中提取代码的工具和Mini-ML语言(专门用于代码提取的简单函数式语言)到Cminor编译器的形式化验证”的可行性。 毫无疑问地,这些工作和CompCert后端一起,将构成Coq中编码和验证的程序的可信任执行路径, 类似于CompCert自身,进一步的自展技术将大大提升其可信程度。
Issue (2) with the unverified components of CompCert can obviously be addressed by reimplementing and proving the corresponding passes. Semantic preservation for a parser is difficult to define, let alone prove: what is the semantics of the concrete syntax of a program, if not the semantics of the abstract syntax tree produced by parsing? However, several of the post-parsing elaboration steps performed by CIL are amenable to formal proof. Likewise, proving the correctness of an assembler and linker is feasible, if unexciting.
问题(2)中未经验证的CompCert模块,毫无疑问可以通过重新实现和证明相关编译遍的方式来解决。 解析器的语义保持特性很难定义,更可用说去证明: 如果不是解析所生成的抽象语法树的语义,那么一个程序具体词法的语义是什么呢? 不过,在解析遍之后,CIL所精心设计的几个修改步骤是适合于形式化证明的。 同理,虽说不是那么令人振奋,汇编器和链接器的正确性证明也是可行的。
Perhaps the most delicate issue is (1): how can we make sure that a formal semantics agrees with language standards and common programming practice? Since the semantics in question are small relative to the whole compiler, manual reviews by experts, as well as testing conducted on executable forms of the semantics, could provide reasonable (but not formal) confidence. Another approach is to prove connections with alternate formal semantics independently developed, such as the axiomatic semantics that underline tools for deductive verification of programs (see Appel and Blazy2 for an example). Additionally, this approach constitutes a first step towards a more ambitious, long-term goal: the certification, using formal methods, of the verification tools, code generators, compilers and runtime systems that participate in the development, validation and execution of critical software.
也许最令人头疼的是问题(1),如何才能确保形式化语义符合语言标准和编程常规? 由于上述问题中的语义和完整编译器的关系并不大, 所以专家人工审查 以及按照语义的可执行形态实施测试将能够提供合乎情理(而非形式化)的信心。 另外一种方式是证明与其他独立开发的、 诸如程序证明推导工具所基于的公理语义(如Appel and Blazy[2])等形式语义的关系。 更进一步,本方法仅仅是一个起点,将通向更为长远、宏大的目标, 这个目标就是形式化地验证用于开发、确认和运行关键软件相关的,包括验证工具本身、 代码生成器、编译器和运行时系统在内的整个环境。

致谢

The author thanks S. Blazy, Z. Dargaye, D. Doligez, B. Grégoire, T. Moniot, L. Rideau, and B. Serpette for their contributions to the CompCert development, and A. Appel, Y. Bertot, E. Ledinot, P. Letouzey, and G. Necula for their suggestions, feedback, and help. This work was supported by Agence Nationale de la Recherche, grant number ANR- 05-SSIA-0019.
作者感谢S. Blazy, Z. Dargaye, D. Doligez, B. Grégoire, T. Moniot, L. Rideau, 和 B. Serpette等人为CompCert开发所作出的贡献, 感谢A. Appel, Y. Bertot, E. Ledinot, P. Letouzey, 和 G. Necula等人的建议、反馈和帮助。 本文工作得到Agence Nationale de la Recherche, 编号ANR-05-SSIA-0019的资助。

名词索引 Index

错误,Bug
确认器,Validator
验证式编译器, A verifying compiler uses mathematical and logical reasoning to check the correctness of the programs that it compiles. (TONY HOARE, The Verifying Compiler: A Grand Challenge for Computing Research, Journal of the ACM, Vol. 50, No. 1, January 2003, pp. 63–69.)
出具证明编译器, A certifying compiler generates the independently checkable evidence that the compiled code C satisfies a behavioral specification Spec such as type and memory safety. (Necula, G.C. Proof-carrying code. In 24th Symposium on the Principles of Programming Languages (1997), ACM, 106–119.)
翻译确认, Translation validation.
已验证编译器, A verified compiler either reports an error or produces code that satisfies the desired correctness property. Leroy, X. The CompCert verified compiler, software and commented proof. Available at http://compcert. inria.fr/, Aug. 2008.
已确认编译器, A validated compiler is complemented by a validator: a boolean-valued function that verifies the property S ≈ C a posteriori. If Comp(S) = OK(C) and Validate(S, C) = true, the compiled code C is deemed trustworthy. (Necula, G.C. Translation validation for an optimizing compiler. In Programming Language Design and Implementation 2000 (2000), ACM, 83–95.)

References 参考文献

1. Appel, A.W. Foundational proofcarrying code. In Logic in Computer Science 2001 (2001), IEEE, 247–258.
2. Appel, A.W., Blazy, S. Separation logic for small-step Cminor. In Theorem Proving in Higher Order Logics, TPHOLs 2007, volume 4732 of LNCS (2007), Springer, 5–21.
3. Bertot, Y., Castéran, P. Interactive Theorem Proving and Program Development—Coq’Art: The Calculus of Inductive Constructions (2004), Springer.
4. Blazy, S., Dargaye, Z., Leroy, X. Formal verification of a C compiler front-end. In FM 2006: International Symposium on Formal Methods, volume 4085 of LNCS (2006), Springer, 460–475.
5. Blazy, S., Leroy, X. Mechanized semantics for the Clight subset of the C language. J. Autom. Reasoning (2009). Accepted for publication, to appear.
6. Chaitin, G.J. Register allocation and spilling via graph coloring. In 1982 SIGPLAN Symposium on Compiler Construction (1982), ACM, 98–105.
7. Coq development team. The Coq proof assistant. Available at http:// coq.inria.fr/, 1989–2009.
8. Dave, M.A. Compiler verification: a bibliography. SIGSOFT Softw. Eng. Notes 28, 6 (2003), 2.
9. George, L., Appel, A.W. Iterated register coalescing. ACM Trans. Prog. Lang. Syst. 18, 3 (1996), 300–324.
10. H ales, T.C. Formal proof. Notices AMS 55, 11 (2008), 1370–1380.
11. Leroy, X. Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In 33rd Symposium on the Principles of Programming Languages (2006), ACM, 42–54.
12. Leroy, X. The CompCert verified compiler, software and commented proof. Available at http://compcert. inria.fr/, Aug. 2008.
13. Leroy, X. A formally verified compiler back-end. arXiv:0902.2137 [cs]. Submitted, July 2008.
14. Leroy, X., Blazy, S. Formal verification of a C-like memory model and its uses for verifying program transformations. J. Autom. Reasoning 41, 1 (2008), 1–31.
15. Letouzey, P. Extraction in Coq: An overview. In Logic and Theory of Algorithms, Computability in Europe, CiE 2008, volume 5028 of LNCS (2008), Springer, 359–369.
16. McCarthy, J., Painter, J. Correctness of a compiler for arithmetical expressions. In Mathematical Aspects of Computer Science, volume 19 of Proceedings of Symposia in Applied Mathematics (1967), AMS, 33–41.
17. Milner, R., Weyhrauch, R. Proving compiler correctness in a mechanized logic. In Proceedings of 7th Annual Machine Intelligence Workshop, volume 7 of Machine Intelligence (1972), Edinburgh University Press, 51–72.
18. Morrisett, G., Walker, D., Crary, K., Glew, N. From System F to typed assembly language. ACM Trans. Prog. Lang. Syst. 21, 3 (1999), 528–569.
19. Necula, G.C. Proof-carrying code. In 24th Symposium on the Principles of Programming Languages (1997), ACM, 106–119.
20. Necula, G.C. Translation validation for an optimizing compiler. In Programming Language Design and Implementation 2000 (2000), ACM, 83–95.
21. Necula, G.C., McPeak, S., Rahul, S.P., Weimer, W. CIL: Intermediate language and tools for analysis and transformation of C programs. In Compiler Construction, volume 2304 of LNCS (2002), Springer, 213–228.
22. Pnueli, A., Siegel, M., Singerman, E. Translation validation. In Tools and Algorithms for Construction and Analysis of Systems, TACAS ‘98, volume 1384 of LNCS (1998), Springer, 151–166.
23. Tristan, J.-B., Leroy, X. Formal verification of translation validators: A case study on instruction scheduling optimizations. In 35th Symposium of the Principles of Programming Languages (2008), ACM, 17–27.
24. Tristan, J.-B., Leroy, X. Verified validation of lazy code motion. In Programming Language Design and Implementation 2009 (2009), ACM. To appear.