Wednesday, May 10, 2006

Writing a pure functional lazy call-by-name compiler

There are lots of minimal functional language/lambda expression evaluators out there, so instead I thought I'd write one that actually compiled to standalone C. But by accident it seems to have grown and I now have a 'typeless' Haskell, or at least a language with ML type grammar, lazy evaluation and call-by-name, primitive monad support and only two types - integers and lists. I'm now at the stage where I can compile pieces of code like this:

return a s = a : s.
bind x f s = let vs = x s in seq vs ((f (head vs) (tail vs))).
bind' x f s = bind x (\a b -> f b) s.

contains x = if x
(if (head x==81) 1 (contains (tail x)))

getLine = do {
c <- getChar;
if (c==10)
(return 0)
do {
l <- getLine;
return (c : l)


putLine l = if l
do {
putChar (head l);
putLine (tail l)
(return 0).

prog = do {
a <- getLine;
if (contains a) ( do { putLine a; putChar 10 }) (return 0);

start = prog 666.

This piece of code does the same as "grep Q" and amazingly doesn't leak memory and could probably outrun a snail (just). When I tidy it up and fix the bugs I'll release the compiler on my web page.

It's been one hell of a learning experience. The main thing I've learnt is that those compiler writers are damn clever people. But I also learnt other things like: even the most innocent looking expressions can gobble up vast amounts of memory in a lazy language, it's really heard to debug lazy programs even when you have complete freedom to make your compiler emit whatever 'instrumentation' you want, garbage collection is more subtle than I thought and Haskell has a weird grammar. On the other hand, some things were easier than I expected. Dealing with lambda expressions looks hard at first - but lambda lifting is very easy to implement so that once you have a way of compiling equations, lambdas come for free. And ultimately it's incredible how much stuff you can get for nothing. The core compiler is miniscule - it knows how to compile one operation - function application. The rest is syntactic sugar or part of the library that the C code links to.

My ultimate goal is to get my compiler to emit self-contained C programs that are as small as possible. I'd like to take these programs and make them run on a microcontroller like the one in this robot I built last year. It has 1K of RAM, 8K of program space and runs at 8MHz, which may be enough for a simple light following algorithm. (In assembler it'd be about 100 bytes long and use 2 or 3 bytes of RAM.) I don't want to actually achieve anything useful - I'm too much of a mathematician to want to do that. I just want to construct an example of what I see as an extreme form of computational perversity: a pure language with no side effects having actual side effects in the physical world.


Kim-Ee Yeoh said...

I can relate to some of your experiences developing a compiler. One of the better kept secrets for learning Haskell is Peyton-Jones and Lester's Implementing functional languages: a tutorial, which guides you through several ways of executing Haskell, including the G-machine at the heart of GHC.

Best of luck on the embedded project. I'd really be keen to know how it turns out.

sigfpe said...


My code is an amalgamation of various ideas I picked up by skimming a bunch of documents like that tutorial. I find that it's better to try to implement something first and then read the papers - that way you have a better appreciation for what the problems are. In fact, many statements I'd read about lazy functional compiler writing used to make no sense to me at all, but now make perfect sense. So I'll probably read that tutorial properly some time soon.

Blog Archive