Monday 7 September 2015

A Smaller, Better compiler suite.

You should be able to get a C compiler, assembler, linker and libc for any supported target in less than 30 seconds just by typing make. Or at least that's my plan.

I have started work on a simple but powerful BSD-licensed C compiler suite here: https://github.com/andrewchambers/c (a C port and continuation of my now-frozen Go-based C compiler). After a few months of work in my free time, the compiler builds some non-trivial test cases on amd64 Ubuntu, but no real software yet.

I encourage you to clone it and have a play around.

Some general goals I have in mind are:
  • Compile times that are 2 to 5 times faster than gcc or clang. TCC is 10 times faster, but it has neither text assembly output nor an AST.
  • Be one to two orders of magnitude smaller than gcc and clang/llvm. For every million lines of gcc code, we could have ten thousand lines of code.
  • Emit assembly that performs at least as well as the code tcc generates. This is a deliberately modest goal so that we don't prematurely focus on performance over compatibility.
  • Have the whole system build from source in less than 30 seconds (probably much less) on a modest desktop machine or even on low-end ARM systems.
  • Be zero-config compatible with the excellent musl libc on Linux.

As for why I would start a new compiler suite from scratch, perhaps the following will resonate with you.


GCC and Clang:


GCC is large, complicated and non-standard. Porting it is generally difficult and out of reach of hobbyists. Building these compilers from source takes anywhere from 20 minutes to many hours. LLVM and Clang suffer from the same issues, and they have added CMake to the list of things I can't get behind.

For most of my use cases I question the need for hundreds of thousands of lines of optimizer code. I think the 30-second build of the Google Go toolchain plus its standard library demonstrates this nicely. I would prefer a simple C compiler written in C to a complicated C++ compiler written in C++ that supports all of C++ with C on the side.

Bootstrapping these cross compilers with working libcs is so complicated and arcane that there are dedicated tools like buildroot and crosstool-ng just to manage the complexity.

Both of these compilers also seem to require more RAM and CPU to self-host than modest hardware or emulators like QEMU can provide. This is a serious barrier when trying to work with many platforms.

TCC:


TCC is extremely fast and small. I generally use tcc as my primary C compiler when I don't want to deal with GCC. I have two issues with this compiler.

I don't think I am alone in saying the code style is terse and hard to understand. Perhaps it was written with speed alone in mind, perhaps the lack of an AST has allowed some ugly hacks into the code base, or perhaps my taste is just different. I would encourage you to make this judgement call for yourself by comparing the code.

The major limitation, however, is that because TCC emits binary directly with no text assembly, it is much harder to use with hobby systems that already have their own assemblers. This was the main deal breaker for me.

PCC and 8CC:


PCC is old and mature, generates good code, and can build real programs. 8cc is simple and self-hosting, with a small and pleasant code base.

These are the best candidates so far to meet my goals. All I can really say is that I think we can take the best ideas from these projects, and I have no problem sharing code and design in order to create the best system possible.


2 comments:

  1. You might want to check out cparser/Firm: http://pp.ipd.kit.edu/firm/

    1. The parser alone is 40k lines of code in cparser. This strikes me as a bit odd considering that tcc, including 7 or 8 ports, assemblers and some linker code, is less than 40k lines of code.

      libfirm itself is nearly 200k lines of code. To me that is crazy even for an SSA backend. Consider the Go toolchain, which has just added an SSA backend for a language more complicated than C: that backend is 20k lines of code, or 10 times smaller.
