Monday, March 27, 2006

The General Theory of Self-Reproducing Programs

I'm guessing that anyone reading this is already familiar with the idea of programs that output themselves. If you're not there's a great list of such programs here. But what I was surprised to discover at the weekend was that there is in fact a bit of general theory about this. In particular, what do we need in a computer language to guarantee we can write a self-replicator?

Consider the following in Haskell:

let p x = x ++ show x in putStrLn $ p"let p x = x ++ show x in putStrLn $ p"

Evaluate this expression in an interactive Haskell session and it prints itself out. But there's a nice little cheat that made this easy: the Haskell 'show' function conveniently wraps a string in quotation marks. So we simply have two copies of once piece of code: one without quotes followed by one in quotes. In C, on the other hand, there is a bit of a gotcha. You need to explicitly write code to print those extra quotation marks. And of course, just like in Haskell, this code needs to appear twice, once out of quotes and once in. But the version in quotes needs the quotation marks to be 'escaped' using backslash so it's notactually the same as the first version. And that means we can't use exactly the same method as with Haskell. The standard workaround is not to represent the quotation marks directly in the strings, but instead to use the ASCII code for this character and use C's convenient %c mechanism to print at. For example:


main(a){printf(a="main(a){printf(a=%c%s%c,34,a,34);}",34,a,34);}


Again we were lucky, C provides this great %c mechanism. What do you need in a language to be sure you can write a self-replicator?

It turns out there is a very general approach to writing self-replicators that's described in Vicious Circles. What follows is essentially from there except that I've simplified the proofs by reducing generality.

We'll use capital letters to represent programs. Typically these mean 'inert' strings of characters. I'll use square brackets to indicate the function that the program evaluates. So if P is a program to compute the mathematical function p, we write [P](x) = p(x). P is a program and [P] is a function. We'll consider both programs that take arguments like the P I just mentioned, and also programs, R, that take no arguments, so [R] is simply the output or return value of the program R.

Now we come to an important operation. We've defined [P](x) to be the result of running P with input x. Now we define P(x) to be the program P modified so that it no longer takes an argument or input but instead substitutes the 'hard-coded' value of x instead. In other words [P(x)] = [P](x). P(x) is, of course, another program. There are also many ways of implementing P(x). We could simply evaluate [P](x) and write a program to simply print this out or return it. On the other hand, we could do the absolute minimum and write a new piece of code that simply calls P and supplies it with a hard-coded argument. Whatever we choose is irrelevant to the following discussion. So here's the demand that we make of our programming language: that it's powerful enough for us to write a program that can compute P(x) from inputs P and x. This might not be a trivial program to write, but it's not conceptually hard either. It doesn't have gotchas like the quotation mark issue above. Typically we can compute P(x) by some kind of textual substitution on P.

With that assumption in mind, here's a theorem: any program P that takes one argument or input has a fixed point, X, in the sense that running P with input X gives the same result as just running X. Given an input X, P acts just like an interpreter for the programming language as it outputs the same thing as an
interpreter would given input X.

So here's a proof:
Define the function f(Q) = [P](Q(Q)). We've assumed that we can write a program that computes P(x) from P and x so we know we can write a program to compute Q(Q) for any Q. We can then feed this as an input to [P]. So f is obviously computable by some program which we call Q0. So [Q0](Q) = [P](Q(Q)).

Now the fun starts:

[P](Q0(Q0)) = [Q0](Q0) (by definition of Q0)
= [Q0(Q0)] (by definition of P(x))


In other words Q0(Q0) is our fixed point.

So now take P to compute the identity function. Then [Q0(Q0)] = [P](Q0(Q0)) = Q0(Q0). So Q0(Q0) outputs itself when run! What's more, this also tells us how to do other fun stuff like write a program to print itself out backwards. And it tells us how to do this in any reasonably powerful programming language. We don't need to worry about having to work around problems like 'escaping' quotation marks - we can always find a way to replicate the escape mechanism too.

So does it work in practice? Well it does for Haskell - I derived the Haskell fragment above by applying this theorem directly, and then simplifying a bit. For C++, however, it might give you a piece of code that is longer than you want. In fact, you can go one step further and write a program that automatically generates a self-replicator. Check out Samuel Moelius's kpp. It is a preprocessor that converts an ordinary C++ program into one that can access its own source code by including the code to generate its own source within it.

Another example of an application of these methods is Futamura's theorem which states that there exists a program that can take as input an interpreter for a language and output a compiler. I personally think this is a little bogus.

9 comments:

Kenny said...

"So here's the demand that we make of our programming language: that it's powerful enough for us to write a program that can compute P(x) from inputs P and x."

Do you mean "a program that can compute [P(x)] from inputs P and x."? This is always where the different types of quotation marks and evaluation notations get tricky. I've only ever learned about this stuff in the theory of truth and philosophy of language, but I've thought that computer programming would be the best way to really learn it.

I tried to write something similar (on the philosophical side) here:

http://www.ocf.berkeley.edu/~easwaran/blog/2006/02/quantifying_into_sentence_posi.html

Kenny said...

Does this link work?

sigfpe said...

I really do mean compute P(x). I mean that you need to be able to write a program that can take as input another program P, and some data x, and output another program that acts like P except with x hard-coded into it. This is the crux of the matter - you need to be able to do this kind of textual processing on the source code, P.

An example should make things clear: in Haskell, "p" is an expression representing the function called "p". "p x" means the expression that you get from applying p to x. So in Haskell, to a good approximation, you can compute P(x) simply by concatenating the code for x after P, and that's what ++ does in my code. You also need use "show x" instead of "x" to deal with quotes. And strictly speaking it should put parentheses around the x, but I cheated a bit because I knew that the one time I applied this function I wasn't going to have operator precedence issues.

That kpp program I link to is an example of a program that computes P(x) from P and x, but you have to interpret 'argument' or 'input' every so slightly differently.

Incidentally, a piece of code that computes [P(x)], instead of P(x), from P and x is an example of what the book calls an "interpreter" as it interprets the program P. And there's a computable function that can automatically turn one of these into a "compiler".

I can follow your second link. Haven't read it yet. Though I can say that this is essentially the same as the key step in Tarski's proof of the undefinability of truth, and also Godel's theorem.

Kenny said...

Ok, I think I see what's going on. When you give as one possibility for a way to get P(x), "a new piece of code that simply calls P and supplies it with a hard-coded argument", this shows that if you have an interpreter in your language, then you have sufficient power to get from P and x to P(x)? If that's right, am I correct in saying that you're interpreting "program" in a rather extensional way, so that P(x) is just any program that always has the same output as P does on input x? That is, as opposed to saying that there could be a distinct function that gives the same outputs, but P(x) is one that has the code for x substituted in a very particular way in the code for P?

sigfpe said...

P(x) is just any program that always has the same output as P does on input x?

Yes. P(x) is any program with the property [P(x)] = [P](x). In order to do the self-reproducing trick this way we use a program S such that P(x) = [S](P,x), ie. S makes P(x) out of P and x. A heavy handed approach to implementing S might be to actually build S from an interpreter that computes [P](x) and then outputs a program that outputs this value.

Vicious Circles defines a whole family of these S programs and defines a language to be "Turing Complete" if it can compute computable functions and has such programs S. The latter condition was new to me.

I didn't fully understand the posting that you linked to (though I think I get the gist of it). It looks like what I'm saying (repeating from the book, actually) does address one of the issues: you really don't have to worry about escaping quotes as there's always guaranteed to be a way for a program to quote the required part of itself. Many of the examples here are attempts to work around self-imposed restrictions making it harder and harder to do the escaping - disallowing ASCII codes. disallowing printf and so on. As long as you don't break Turing completeness you're guaranteed to find a way.

augustss said...

In what sense do you think that the Futamura projections are bogus?
They have been implemented and there has been a lot of work on the im the field of partial evaluation.

sigfpe said...

augustss,

According to the definition in Vicious Circles, there is a program that can turn an interpreter into a compiler. Unpacking the definitions you find there is a trivial way to do this: just write a program that takes as input code written in language L and outpus an interpreter for L bundled with with that source code. Behaviourally it looks like a compiler - it outputs standalone pieces of code generated from source code in L. But this doesn't give you anything more than an interpreter because internally it is an interpreter. You can do this because there is a trivial way to 'cheat' an implemention of partial evaluation. There doesn't seem to be anything about the theorem, as stated in VC, that forces partial evaluation to mean anything non-trivial.

augustss said...

Yes, a trivial implementation of partial evaluation is by some form of currying. But that implementation is uninteresting.
You have to be very careful to define what it means to have a "real" partial evaluator, but it can be done. (If I remember right Neil Jones et al have done it by a complexity measure that includes constant factors.)

What I find amazing is that the third Futamura projection can be implemented so that it actually does something interesting, i.e., suitable applied it does specialize an interpreter into a compiler.
And it has done some non-trivial work.

sigfpe said...

augustss,

Actually I've had to deal with the difference between trivial and non-trivial partial evaluation recently. I was writing Haskell code to evaluate expressions in a Newton-Raphson style solver. This expression evaluator was a kind of interpreter which meant it was slow - it had to reparse the expression every time the functions were evaluated. After a lot of fiddling about I eventually managed to convert it, by hand, into a 'compiler' which returned a partially evaluated function that no longer needed to reparse the original expressions. In effect I was ensuring that the expression parsing was done by partial evaluation leaving the remainder of the computation, the actual expression computation, uncompleted and ready for numerical input. What was interesting was just how similar the interpretation and compilation code was.

Blog Archive