Compcert | Formal Verification of a Realistic Compiler

English Version

Technical perspective -- A compiler's story
by Greg Morrisett

Note: Communications of the ACM, Volume 52, Number 7 (2009), Pages 106-106

技术视角 ———— 编译器的故事
董渊翻译；王生原郭宇审校

In the early 1970s, pioneers like Floyd, Dijkstra, and Hoare argued that programs should be formally specified and proven correct. But for the past 40 years, most of the computer science community has discounted this vision as an unrealistic goal. Perhaps the most important reason has been simple economics: Throughout the 1980s and 1990s, the industry tended to be more interested in factors such as time-to-market than issues involving correctness.

上世纪七十年代，Floyd、Dijkstra和Hoare等先驱者认为应该形式化地给出程序规范并证明其正确性。然而，过去四十年来，多数计算机科学团体都认为这只是一个不切实际的目标，也许最重要的原因就是简单经济学原理：纵观八十年代和九十年代的二十多年间，工业界更倾心的指标是上市时间，而不是正确性相关问题。
But the context has changed dramatically over the intervening years: Security and reliability have become key concerns in the fastest growing sectors like embedded systems where lives may be at stake. Even in non-critical domains, developers must worry about bugs, including buffer overruns and race conditions that can lead to security exploits. Researchers have developed a variety of tools, including strong static type checkers, software model checkers, and abstract interpreters, all of which can (and are) used to help enforce a range of safety properties. Consequently, formal methods are in wide use today, albeit disguised within tools.
不过，这些年争议过后情况发生了重大变化：在诸如嵌入系统等快速增长且和生命财产安全息息相关的领域中，安全性和可靠性成为了关键问题。即便在不那么关键的领域中，开发人员也必须为缓冲区溢出和竞争条件等可能造成安全攻击的程序错误而提心吊胆。研究人员已经开发出诸如强静态类型检查器、软件模型检查器、抽象解释器等各种各样的工具，所有这些工具均可以（也已经）一定程度地增强程序安全特性。因此，形式化方法尽管是隐藏在工具背后，但其应用正日益广泛。
However, these tools typically operate at the source-level (or, at best, a VM bytecode level), and not at machine code level. To scale, they must make assumptions about how a compiler will refine the source code to machine code. For example, even though the C language specification does not specify an order of evaluation for function arguments, most analysis tools assume the compiler will use a left-to-right strategy, since analyzing all possible evaluation strategies would take too much time. This mismatch of assumptions, or a bug in the compiler itself, can easily render the analysis tool useless.
不过，这些工具通常都应用于源代码层面（或者，更好的情况是在VM字节码层面），而不是在机器码层面。而为了扩大其使用范围，这些方法必须对编译器从源代码到机器码的翻译方式做出一些假定，比如，尽管C语言规范中没有规定函数参数的求值顺序，但是很多分析工具都假定编译器采用从左到右的求值策略，而其原因仅仅是分析所有的求值策略可能花费太多时间。诸如此类违反前提假定、或者是编译器自身错误等问题，均可以轻而易举地让这类分析工具失效。
In the following paper, Xavier Leroy addresses these problems by describing a compiler he built and verified using the Coq proof development system. Although he is not the first to build a verified translator, Leroy's compiler is notable and exciting for three key reasons: First, it maps a useful source language (a significant subset of C) to PowerPC assembly, making it directly usable for a range of embedded applications. Second, his compiler includes a number of analyses and optimizations, such as liveness analysis and graph-coloring register allocation, that makes the resulting code competitive with gcc -0. Third, the proof of correctness is mechanically checked, yielding the highest level of assurance we can hope to achieve. In short, developers can be assured that, in spite of optimization, the output of Leroy's compiler will behave the same as the source code input.
下面这篇文章中，Xavier Leroy采用一个经Coq证明辅助系统开发并验证的编译器来专门解决这类问题。虽说他已不是第一位开发已验证程序变换工具的人，不过，Leroy所开发的编译器值得称道并令人振奋，三个关键原因是：第一，该编译器将可用的源语言（C的较大子集）翻译为PowerPC汇编代码，可直接用于一系列嵌入应用的开发；第二，该编译器包括了活跃分析和图着色寄存器分配在内的很多分析与优化过程，这使得其结果可匹敌于gcc O0产生的代码；第三，其正确性证明可以机械化检查，达到了我们所能期望的最高可信程度。简而言之，开发人员能够确信“即便是经过优化，Leroy的编译器所生成的代码仍将保持和输入源代码完全等价的行为”。
There are a number of hidden contributions to this work, beyond the construction of a fully verified, optimizing compiler: The compiler was built in a modular, pipelined fashion as a series of translations between well-specified intermediate languages, making it possible to add new optimizations or reuse components for other projects. For example, the specification for the C-subset can be reused to build new verified tools, such as a source-level analysis.
除了构造一个完全经过验证的优化编译器之外，这项工作还有一些更多贡献：采用了模块化、管道－过滤器设计模式，将整个编译器划分为一系列规范良好的中间语言之间变换的模块，这为加入新优化算法或者是在别的项目中重用这些模块提供了可能。比如，C子集的规范就可以进一步用于构建源码分析器等新的已验证工具。
The compiler also demonstrates judicious use of translation validation in lieu of full verification. With translation validation, we can use unverified components to compute something (for example, an assignment of variables to registers) and need only check the output is valid (no interfering variables are assigned to the same register). Only the checker must be verified to ensure soundness, and this is often much easier than validating a full analysis and transformation. Translation validation is one engineering technique that can drastically reduce the burden of building verified systems.
同时，这个编译器还说明了在完整验证中适时使用翻译确认方法是一个明智之选。采用这一方法，我们可以使用某些未经验证的计算模块（比如，变量的寄存器分配），而需要做的仅仅是检查其输出结果（不将相互冲突的变量分配到同一个寄存器）。要保证其可靠性，只要验证相关的检查器即可，而这种验证工作常常要比确认一个完整的分析和转换算法容易得多。可以说，翻译确认是一种能够大幅度消减构建已验证系统工作负担的工程技术。
This paper also shows how far we must go before verification becomes routinely feasible for production compilers or any other setting. Foremost is the cost of constructing and maintaining the proof as the code evolves. Leroy's proof of correctness is about five to six times as big as the compiler itself, making it difficult to significantly change the code without breaking the proof. However, the proof was constructed largely by hand, and for the most part, does not take advantage of semi-automated decision procedures or proof search, a research area that has seen tremendous progress over the past decade. Indeed, work by other researchers following Leroy's lead suggests we can potentially cut the size of the proofs by up to an order of magnitude.
本文还展示了形式化验证成为人们开发编译器或其他工具的常规方法之前尚需完成的工作，最为重要的一点是代码演进过程中证明的构造和维护开销。 Leroy的正确性证明的规模大约是编译器自身代码的5～6倍，这就使得人们很难在不改变证明的情况下对编译器进行较大规模的修改。而且，大量的证明采用手工方法构造，大多数情况下，还没有利用半自动决策过程或者证明搜索技术带来的好处，而这些相关技术恰恰处于最近十年来进步最显著的研究领域。事实上，和Leroy一起工作的其他研究人员认为，完全有可能将证明削减将近一个数量级。
Perhaps the biggest challenge we face is specification. Compilers have a fairly clean notion of "correctness" (the output code should behave the same as the input code), but most software systems do not. For example, what does it mean for an operating system or Web browser to be correct? At best we can hope to formalize some safety and security properties that these systems should obey, and be willing to adapt these properties as our understanding of failures and attacks improves. In turn, this demands a verification architecture that allows specifications to be modified and adapted almost as frequently as the code. Fortunately, verified compilers make it possible to do this sort of adaptation using high-level languages without sacrificing assurance for the generated machine code.
或许我们所面临的最大挑战是程序规范。编译器具有相对清晰的“正确性”概念（输出代码应该和输入代码具有相同的行为），然而其他大多数软件系统并不是这样。比如，操作系统或网络浏览器正确性的含义是什么？最终我们希望能形式化这些系统应该遵循的某些安全性特性，并利用这些特性来增进对程序失效和攻击机理的理解。这就需要有一个让我们可以像修改代码那样频繁地修改规范的程序验证架构。非常幸运的是，已验证编译器让我们拥有了这种可能，我们可以任意修改高级语言代码而不会牺牲对所生成机器代码的保证。
Consequently, I think we are on the verge of a new engineering paradigm for safety- and security-critical software systems, where we rely upon formal, machine-checked verification for certification, instead of human audits. Leroy's compiler is an impressive step toward this goal.
总而言之，我认为我们正处于一个崭新的起点，开发安全关键软件系统的全新工程方式的崭新起点，这种方式中，我们所信赖的证书是形式化的、可机器检查的证明，而不再是人工审计结果。 Leroy的编译器正是向着这一目标大步迈进中令人印象深刻的一步。

Author:
Greg Morrisett is the Allen B. Cutting Professor of Computer Science and associate dean for Computer Science and Engineering at Harvard University.
DOI: http://doi.acm.org/10.1145/1538788.1538813