Andrew Chambers on Compilers: September 2015

Sunday, 20 September 2015

Compiler Warnings Considered Harmful

In this post I would like to make the argument that compiler warnings are bad. This may seem like a crazy thing to say, but bear with me while I make the case.

A compiler is used at two different times: first when the developer is writing the code, and second when an end user or packager is building the package from source. For popular software the second case happens far more often, but it is seriously underserved by compiler writers. For packagers, compiler warnings do a few things:

Break the build when -Werror is enabled and different compilers add new diagnostics.
Train people to ignore messages from the compilers.
Waste CPU time by checking things the person has no ability to fix anyway.
Look ugly.
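To make the first point concrete, here is a minimal sketch of a perfectly valid C function that still trips a common diagnostic. The name `answer` is made up for illustration, and the flags mentioned are the usual GCC/Clang spellings; the exact message text varies by compiler and version.

```c
/* A valid C function: the unused local changes nothing about its behaviour. */
int answer(void)
{
    int scratch = 7; /* never read: GCC and Clang flag this as -Wunused-variable under -Wall */
    return 42;
}
```

Built with a plain `cc -c`, a file like this typically compiles without complaint. With `-Wall -Werror` the same file is rejected, even though nothing about the program is invalid and the person building the package cannot fix it.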

For the programmer, warnings are actually useful, but misplaced. The compiler's job is to turn code into assembly/bytecode/whatever, and nothing else. The compiler should only stop on an invalid program. We actually already have tools designed to warn the programmer of bugs, and those are called linters and static analysers.

My proposal is simple. Shift all warnings out of compilers and into code analysis tools that are as easy to run as the compiler itself. That way programmers get good warnings and our compilers can be faster and less annoying for everyone else. The Go programming language designers already realized this, with a compiler that emits no warnings and excellent tools for catching bugs (https://golang.org/cmd/vet/, https://github.com/golang/lint).

Whatever happened to "do one thing, and one thing well"? Compiler warnings are bad design.

Saturday, 12 September 2015

Calling conventions are hard - Fuzz them!

I am busy implementing the C AMD64 calling conventions in my C compiler suite and have a topic worthy of a post. It is about testing the C ABI (how C programs lay out structs and perform function calls).

The old Linux C x86 ABI was relatively simple: to call a function you pushed the arguments onto the stack in reverse order and you were done with it. Unfortunately for me, most people now use AMD64 processors, so that is what I need to target first. The AMD64 ABI designers apparently didn't like simple or well specified things (presumably because it would make software engineering too easy), so they created this document to describe the way C structs/arguments are laid out in memory and registers, among other things.
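To give a flavour of what the AMD64 rules have to decide, here is a small sketch of three struct arguments and how I understand the System V AMD64 ABI to classify them. The type and function names are invented for illustration, the sizes assume the usual Linux AMD64 data model, and the comments are my reading of the specification rather than anything authoritative.

```c
/* 16 bytes, two eightbytes: the first is class INTEGER, the second class SSE,
   so the value travels in one general purpose register and one xmm register. */
struct mixed { long id; double weight; };

/* 8 bytes, a single eightbyte holding only floats: class SSE. */
struct point { float x, y; };

/* 24 bytes: larger than two eightbytes, so class MEMORY (passed on the stack). */
struct big { long a, b, c; };

double use_mixed(struct mixed m) { return (double)m.id + m.weight; }
float  use_point(struct point p) { return p.x + p.y; }
long   use_big(struct big b)     { return b.a + b.b + b.c; }
```

Under the old x86 convention all three would simply have been copied onto the stack. Here the compiler has to inspect every field before it knows where the argument goes.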

I have a few problems with the document, such as a lack of examples, lack of pseudo code for the classification algorithm, and underspecified edge cases. However, regardless of whether my complaints are valid or not, I still need to implement the thing correctly before my compiler can self host. I need a good way to test my implementation...

Enter ABIFUZZ

We have a few C compilers like gcc and clang we can test against, but hand writing interesting test cases is a chore, so I decided to automate it. The general steps are quite simple:

Decide how many arguments you want.
Decide the types of those arguments.
Generate values for the arguments.
Decide the return type.
Generate a return value.
Generate code to do the call and check the values.
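The steps above could produce a test case shaped roughly like the following. This is a hand-written sketch, not output from the tool: the types, values and the names `callee` and `run_case` are all made up, and a real generator would pick them at random.

```c
/* Argument and return types a generator might have picked. */
struct arg1 { char tag; double scale; };
struct ret  { int code; float ratio; };

/* Counts every value that did not survive the call intact. */
static int mismatches;

/* Callee: check each argument against the value the generator chose,
   then hand back the chosen return value. */
struct ret callee(int a0, struct arg1 a1, long long a2)
{
    struct ret r = { -17, 0.25f };
    if (a0 != 1234)           mismatches++;
    if (a1.tag != 'x')        mismatches++;
    if (a1.scale != 3.5)      mismatches++;
    if (a2 != -9000000000LL)  mismatches++;
    return r;
}

/* Caller: perform the call and check the return value.
   Returns the number of mismatches, so 0 means the case passed. */
int run_case(void)
{
    struct arg1 a1 = { 'x', 3.5 };
    struct ret r;

    mismatches = 0;
    r = callee(1234, a1, -9000000000LL);
    if (r.code != -17)    mismatches++;
    if (r.ratio != 0.25f) mismatches++;
    return mismatches;
}
```

Returning `run_case()` from `main` gives a pass/fail exit code. The floating point values are chosen to be exactly representable, so comparing them with `!=` is safe here.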

The tool is located here and took an afternoon to write. Here's the end result:

The final step is to write a script that splits the caller and callee into two files, to test interop when each is compiled by a different C compiler.

Bugs found:

http://savannah.nongnu.org/bugs/index.php?45950
https://github.com/andrewchambers/c/issues/14
https://github.com/andrewchambers/c/issues/13

Monday, 7 September 2015

A Smaller, Better compiler suite.

You should be able to get a C compiler, assembler, linker and libc for any supported target in less than 30 seconds just by typing make... Or at least that's my plan.

I have started work on a simple but powerful BSD licensed C compiler suite here https://github.com/andrewchambers/c (a C port/continuation of my now frozen Go based C compiler). After a few months of work in my free time the compiler is building some non-trivial test cases on amd64 Ubuntu, but no real software.

I encourage you to clone it and have a play around.

Some general goals I have in mind are:

Compile times that are 2 - 5 times faster than gcc or clang. TCC is 10 times faster, but does not have text assembly or an AST.
Be one to two orders of magnitude smaller than gcc and clang/llvm. For every million lines of gcc code, we could have ten thousand lines of code.
Emit assembly that has performance at least equal to tcc. This is a modest performance goal so we don't focus prematurely on this over compatibility.
Have the whole system build from source in less than 30 seconds (probably much less) on a modest desktop machine or even low end arm systems.
Be zero config compatible with the excellent Musl libc on Linux.

To answer why I would start a new compiler suite from scratch, perhaps the following will resonate with you.

GCC and Clang:

GCC is large, complicated and non-standard. Generally porting it is difficult and out of reach of hobbyists. Building these compilers from source takes anywhere from 20 minutes to many hours. LLVM and Clang suffer from the same issues, and they have added CMake to the list of things I can't get behind.

For most of my use cases I question the need for hundreds of thousands of lines of optimizer code. I think the Google Go toolchain + stdlib's 30 second build proves this nicely. I would prefer a simple C compiler written in C, to a complicated C++ compiler written in C++ supporting all of C++ with C on the side.

Bootstrapping these cross compilers with working libc's is so complicated/arcane there are dedicated tools like buildroot and crosstool-ng just to manage the complexity.

Both of these compilers also seem to require more RAM and CPU to self host than modest hardware or emulators like qemu can provide. This is actually a serious barrier to overcome when trying to work with many platforms.

TCC:

TCC is extremely fast and small. I generally use tcc as my primary C compiler when I don't want to deal with GCC. I have two issues with this compiler.

I don't think I am alone in saying the code style is terse and hard to understand. Perhaps it was written with speed alone in mind, perhaps the lack of an AST has allowed some ugly hacks into the code base, or perhaps my taste is just different. I would encourage you to make this judgement call for yourself by comparing code.

The major limitation however, is that because TCC emits binary directly with no text assembly, it is much harder to use with some hobby systems which have existing assemblers. This was the main deal breaker for me.
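As a rough illustration of what emitting text assembly means, here is a minimal sketch of a backend step that writes out a function returning a constant. The name `emit_const_func` is hypothetical and the output assumes AT&T syntax for an x86 assembler such as GNU as; it is not taken from any of the compilers discussed here.

```c
#include <stdio.h>

/* Write assembly text for a function `name` that returns `value`.
   Returns what snprintf returns: the length the full text needs,
   which is >= cap if the buffer was too small. */
int emit_const_func(char *out, size_t cap, const char *name, int value)
{
    return snprintf(out, cap,
                    "\t.text\n"
                    "\t.globl\t%s\n"
                    "%s:\n"
                    "\tmovl\t$%d, %%eax\n"
                    "\tret\n",
                    name, name, value);
}
```

Because the result is plain text, it can be handed to whatever assembler the target system already has, which is exactly what a compiler that writes binary directly cannot offer.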

PCC and 8CC:

PCC is old and mature, generates good code, and can build real programs. 8cc is simple and self hosting, with a small and nice code base.

These are the best candidates so far to meet my goals. All I can really say is I think we can take the best ideas from these projects, and have no problem sharing code/design in order to create the best system possible.