Notes on Compiler Bootstrapping

Source: Internet  Editor: 程序博客网  Time: 2024/06/10 09:28

Material on bootstrapping

Bootstrapping

In computer science, bootstrapping is the process of writing a compiler (or assembler) in the source programming language that it is intended to compile. Applying this technique leads to a self-hosting compiler. An initial minimal core version of the compiler is written in a different language (which could be assembly language); from that point on, successively expanded versions of the compiler are written in, and built with, the minimal core of the language.

Self-hosting compilers

As with any other software, there are benefits to implementing a compiler in a high-level language. In particular, a compiler can be self-hosted, that is, written in the programming language it compiles. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be hand-written machine code, compiled by a compiler written in another language, or compiled by running the compiler in an interpreter.

The chicken and egg problem

If one needs to compile a compiler for language X (written in language X), there is the issue of how the first compiler can be compiled. The methods used in practice to solve this chicken-and-egg problem include:

Implementing an interpreter or compiler for language X in language Y. Niklaus Wirth reported that he wrote the first Pascal compiler in Fortran.[citation needed]
Another interpreter or compiler for X has already been written in another language Y; this is how Scheme is often bootstrapped.
Earlier versions of the compiler were written in a subset of X for which there existed some other compiler; this is how some supersets of Java, Haskell, and the initial Free Pascal compiler are bootstrapped.
A compiler supporting non-standard language extensions or optional language features can be written without using those extensions and features, so that it can be compiled with another compiler supporting the same base language but a different set of extensions and features. The main parts of the C++ compiler clang were written in a subset of C++ that can be compiled by both g++ and Microsoft Visual C++. Advanced features are written with some GCC extensions.
The compiler for X is cross compiled from another architecture where there exists a compiler for X; this is how compilers for C are usually ported to other platforms. This is also the method used for Free Pascal after the initial bootstrap.
Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way) and running that on the code to get an optimized compiler. Donald Knuth used this for his WEB literate programming system.
Methods for distributing compilers in source code include providing a portable bytecode version of the compiler, so as to bootstrap the process of compiling the compiler with itself. The T-diagram is a notation used to explain these compiler bootstrap techniques.[2] In some cases, the most convenient way to get a complicated compiler running on a system that has little or no software on it involves a series of ever more sophisticated assemblers and compilers.[3]
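
The T-diagram idea can be approximated in a few lines of code. The following Python sketch is only an illustration under stated assumptions: the `Translator` record, the `compile_with` helper, and the names `X`, `Y`, `A` and `B` are hypothetical and do not come from any real toolchain. It treats a translator as a triple (source language, target language, implementation language) and shows how building one translator with another changes only the implementation language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translator:
    source: str  # language the translator accepts
    target: str  # language the translator emits
    impl: str    # language the translator itself is expressed in


def compile_with(program: Translator, compiler: Translator, machine: str) -> Translator:
    """Translate `program` by running `compiler` on `machine`.

    Assumption of this toy model: a machine name doubles as the name of
    that machine's native code, so "A" means "machine code for machine A".
    """
    if compiler.impl != machine:
        raise ValueError("the compiler is not executable on this machine")
    if program.impl != compiler.source:
        raise ValueError("the compiler does not accept the program's language")
    # Only the implementation language changes; the program's job does not.
    return Translator(program.source, program.target, compiler.target)


# A compiler for X written in Y, built with an existing Y compiler on machine A.
y_compiler = Translator("Y", "A", "A")
x_in_y = Translator("X", "A", "Y")
x_stage0 = compile_with(x_in_y, y_compiler, machine="A")

# The compiler for X rewritten in X, built with the stage-0 compiler.
x_in_x = Translator("X", "A", "X")
x_self_hosted = compile_with(x_in_x, x_stage0, machine="A")

# Cross compilation: retarget the X-in-X sources to emit code for machine B.
x_to_b_in_x = Translator("X", "B", "X")
cross = compile_with(x_to_b_in_x, x_self_hosted, machine="A")  # runs on A, emits B
native_b = compile_with(x_to_b_in_x, cross, machine="A")       # runs on B, emits B
```

In this model, `native_b` is a compiler for X that runs on machine B, even though nothing was ever executed on B. That is the point of porting by cross compilation.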

Machine code begat assembly, assembly begat B, B begat C, and C begat everything

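The statement "the C compiler is developed in C" is best understood as the following process:

First, write a C compiler in assembly language, ①.exe (that is, the early C compiler);
Once ①.exe exists, use it to compile C code, producing a program ②.exe.
②.exe can itself be a program that reads text (that is, C source code) and generates the corresponding assembly code from that text.
This ②.exe is then "a C compiler developed in C".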

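Author: 郭无心
Link: https://www.zhihu.com/question/20369232/answer/64684538
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.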
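
To make "a program that reads source text and generates assembly" concrete, here is a minimal Python sketch. It is not a C compiler and is not taken from the answer quoted above. It assumes a hypothetical stack machine with `PUSH`, `ADD`, `SUB` and `MUL` instructions, accepts only integer arithmetic expressions, and borrows Python's own `ast` parser instead of implementing one.

```python
import ast

# Mapping from Python AST operator types to the hypothetical stack-machine opcodes.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def emit(node, out):
    """Append pseudo-assembly for `node` to `out`, in post-order."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        out.append(f"PUSH {node.value}")
    elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        emit(node.left, out)    # operands first ...
        emit(node.right, out)
        out.append(OPCODES[type(node.op)])  # ... then the operator
    else:
        raise ValueError(f"unsupported construct: {ast.dump(node)}")


def compile_expression(text):
    """Read source text and return a list of pseudo-assembly lines."""
    tree = ast.parse(text, mode="eval")
    out = []
    emit(tree.body, out)
    return out


listing = compile_expression("1 + 2 * 3")
# listing == ["PUSH 1", "PUSH 2", "PUSH 3", "MUL", "ADD"]
```

A real compiler differs in scale, not in kind: it is still an ordinary program mapping text to lower-level code, which is why it can be written in any language, including the one it compiles.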

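If you want to create a language V and write the V compiler in V itself, proceed as follows:
1. Write the compiler (A) in C++, and build up a large set of test cases along the way.
2. Write the compiler again in V (B), compile B with A.exe, and keep fixing it until all test cases pass.
3. Use B.exe to compile B itself, producing B2.exe, and keep fixing until B2.exe passes all test cases. This ensures that even if B still has many bugs, it at least compiles itself correctly, so you can move on to step 4.
4. Once you are confident, compile B with A.exe one more time to get B.exe. After that, neither A's source code nor A.exe is needed any longer, so delete them. From then on, keep using B.exe to compile the next version of B. The compiler is now bootstrapped.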

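Author: vczh
Link: https://www.zhihu.com/question/28513473/answer/41094452
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.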
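
The staged procedure above can be mimicked with a deliberately tiny Python model. Everything in it is an assumption made for illustration, not part of the quoted answer. The "V language" is simply Python, a "binary" is a Python function that turns source text into the namespace it defines, `compiler_a` stands in for the C++ implementation, and `B_SOURCE` stands in for the compiler rewritten in V. It cheats by delegating to `exec`, so it shows only the order of the stages and the role of the shared test cases.

```python
# Source text of compiler B, "written in V" (here: plain Python).
B_SOURCE = '''
def compile_v(source):
    namespace = {}
    exec(source, namespace)
    return namespace
'''

# Shared test cases: (V source, entry point, expected result).
TEST_CASES = [
    ("def answer():\n    return 6 * 7\n", "answer", 42),
    ("def answer():\n    return 'boot' + 'strap'\n", "answer", "bootstrap"),
]


def compiler_a(source):
    """Stage-0 compiler A, the stand-in for the C++ implementation."""
    namespace = {}
    exec(source, namespace)
    return namespace


def passes_all_tests(compiler):
    """Compile every test program with `compiler` and check its result."""
    return all(
        compiler(source)[entry]() == expected
        for source, entry, expected in TEST_CASES
    )


def bootstrap():
    # Step 1: A exists and passes the tests.
    assert passes_all_tests(compiler_a)
    # Step 2: A.exe compiles B's source, giving B.exe.
    b_exe = compiler_a(B_SOURCE)["compile_v"]
    assert passes_all_tests(b_exe)
    # Step 3: B.exe compiles B's source, giving B2.exe.
    b2_exe = b_exe(B_SOURCE)["compile_v"]
    assert passes_all_tests(b2_exe)
    # Step 4: from here on, compiler_a is no longer needed.
    return b_exe, b2_exe


b_exe, b2_exe = bootstrap()
```

In a real project each failed assertion would mean going back to fix B's source and rebuilding, which is the "keep fixing until all test cases pass" loop in steps 2 and 3.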
