Saturday, January 26, 2008

The Type that Should Not Be

> import Prelude hiding ((^))

I've been pondering a type that's so mind-bogglingly weird that I can no longer think about it without frying my neurons. So instead I'll just dump the contents of my brain here so you can fry your neurons too.

Here's an interesting type to warm up with. In Haskell:

> data X = Tree [X]

At first look it seems not be well-founded in the sense that a value of type X is a list of X's and we haven't yet defined what X's are. But we don't need an X to make an empty list of X's. So we can build elements inductively starting with Tree []. If you think of Tree [] as being a leaf, you can see why I called this type Tree (though they are trees whose limbs can have infinite sequences of children).

Even without the empty set forming a 'foundation' we can still build circular elements like

> x = Tree [x,x]

Now lists are ordered and can have repeated elements. What about if we eliminate these two properties. The way to do that would be with a set type. So how do we represent sets in Haskell. Well certainly not with Haskell's Data.Set because that only represents finite sets. On approach to supporting infinite sets is to identify a set with a predicate for telling whether an element is a member of a set. In other words, the type X→Bool makes a tolerably good set of X's.

So now we're ready for an interesting type:

> data U = U { f :: U -> Bool }

(BTW U stands for universe).

The first thing to think about is what this might mean mathematically. Well it looks like a solution to the equation X=2X, in other words, a set that equals its own power set. We know that in set theory this gives rise to Russell's paradox. One way to embed recursive types into set theory is to tweak set theory, for example by using non-well-founded set theory. But Russell's paradox is so bad that it breaks just about any attempt at devising a set theory where a set can equal its power set.

And yet we can build this type in Haskell. How come? The explanation is simple, Russell's paradox tells us that there must be some elements whose evaluation doesn't terminate. In fact, we can replay Russell's paradox directly in Haskell. But before that we need some definitions.

Firstly, are we sure we can even construct an element of U? In order to define an element of U we need to define a boolean predicate on U but we can't even compare two elements of U for equality. But we can define the 'empty' element phi, along with a membership test:

> phi = U $ const False
> x `e` y = f y x

Unlike conventional set theory, we can define negation of elements. We can also define union and intersection:

> neg a = U $ \x -> not (x `e` a)
> a \/ b = U $ \x -> x `e` a || x `e` b
> a /\ b = U $ \x -> x `e` a && x `e` b

We're ready for the Russell paradox. Define the 'set' (I give in, let's informally call these things sets) of all sets that don't contain themselves.

> nonSelfContaining = U $ \a -> not (a `e` a)

Now try to compute nonSelfContaining `e` nonSelfContaining. It doesn't terminate. It's not just a matter of having implemented it badly, it can't possibly terminate as there is no answer that could make sense. So if you're designing a language that's intended to be total, you'd better rule out types like U.

So we now have two elements of U, phi and neg phi. Can we make any more? Well here's one way:

> c a = U $ \x -> a `e` x

Intuitively, c a is the set of all sets that contain a. But there are two potential problems that come to mind with this. One is, we've only made two sets so far, so "the set of all sets that contain a" might not be interesting. And secondly, can we be sure c a is distinct from a?

The first issue is easily dismissed. neg phi is obviously in c phi, but phi isn't. So we can distinguish phi and c phi. c phi is also different from neg phi (check to see if phi is an element of each). In fact, define functional power by

> f ^ 0 = \x -> x
> f ^ n = \x -> (f^(n-1)) (f x)

and it can be shown that c^m phi is distinct from c^n phi for m≠n. (Hint: Try checking to see if c^m (neg phi) is an element of c^n phi for various m and n.) So we have constructed an infinite number of elements of U.

But, we know that we can write code for elements of U that doesn't terminate. Why not restrict ourselves to just the computable elements of U? But there's a catch. How do we define a computable element of U when elements of U are functions. One approach is this: define x to be computable if y `e` x is computable for all y. But that won't do because it won't terminate if y isn't computable. So how about defining x to be computable if y `e` x is computable for all computable y. But that's circular, and not in a nice inductive way. U is so perverse that just attempting to define computability for it gives a circular definition. But that's not necessarily a problem. Define a 'universe' to be a set, C, of computations of type U such that C = the set of x such that y`e` x terminates for all y in C. The question now is this: is there a unique set C satisfying this 'equation'. We can rule out the empty set as phi must be in any such C. There's also another thing we can prove. If we have two such sets, C and C', we can prove that the intersection of C and C' is a universe. So we can take the intersection of all universes to form the smallest universe. So we can define a something in U to be computable if it is in this smallest set.

So here's my first open question: Are there larger 'universes' than the smallest one, or is it unique? It's not hard to see that a universe is closed under (non-recursive applications of) negation, finite union, finite intersection and the function c above. My second open question: What other computable elements are there?

Time to stop this and start cooking dinner.

Labels: , ,

Sunday, January 06, 2008

What does topology have to do with computability?

I've been playing with code for exact real arithmetic. Unfortunately I've been too busy to get it polished but I do have time to mention two interesting and counterintuitive facts about real arithmetic on computers. It's also a good opportunity to say a tiny bit more about exhaustive search.
Firstly, what does exact real arithmetic mean? It's a widely held belief that computers (at least idealised Turing machines with as much memory as you need) can't represent real numbers precisely. This is a myth. For example, a common way to represent real numbers is as lists of binary digits. Any programming language that supports the notion of an infinite stream, like Haskell, can therefore represent an infinite stream of digits. If you're unhappy with the notion of an infinite stream, think functionally instead. Define a stream to be a function that takes as argument an integer n and returns the nth element. For example

third n | even n = 1
| odd n = 0

can be interpreted as exactly representing the real number 1/3=0.010101..2. This is a perfectly well defined object that sits in a finite amount of memory. So we don't need infinite computers to represent real numbers as sequences of binary digits. (Though you may notice that we can't represent all real numbers, for example Chaitin's much overrated constant.) Exact real arithmetic is simply arithmetic with real numbers represented with some representation like this.

So now I'm ready to mention the first counterintuitive result. If we represent real numbers as streams of binary digits like above (maybe stored with an exponent or a separate integer part) then it's impossible to implement addition correctly.

The argument is straightforward. Suppose our addition function receives these two inputs: 0.000000... and 0.01111111... What should it output as the first digit after the "binary" point? If the first stream continues as repeating zeroes forever, and the second stream eventually turns into an infinite stream of zeroes, then this digit should be 0. But if both streams eventually turn into infinite streams of ones, then this digit should be 1, as a result of the carry that will happen. But there's no way an addition function can know what will eventually happen to these streams. In order to compute the first digit of the result, it needs to see all of the digits of the inputs. But no computable function can view all of the input digits. And hence addition is impossible to implement.

But that's no reason to give up. The problem isn't fundamental to exact real arithmetic, it's an issue with the representation being used. So what representation should we be using? One place to look is at how mathematicians define the real numbers. A popular approach is via the Cauchy sequence. A real number can be defined as a sequence of rational numbers a1, a2, a3... with the property that for any ε, there is a point in the sequence beyond which the difference between any two elements is less than ε. For example the sequence 3,31/10,314/100,3141/1000,... might represent π. So we might try to copy this and represent real numbers as streams of rationals. In a way it works. The resulting type will have all the properties of real numbers that you know and love. But unfortunately it's not a very useful representation because you can't answer even the most basic queries about real numbers with it. For example, there's no way of extracting a rational approximation with given accuracy from such a sequence. You know that given any accuracy, if you walk down the sequence far enough you'll find a suitable rational, but you don't know how far down the sequence to walk. So we need to modify this representation so that not only is each term in the sequence an approximation to the desired real, but that there is a way of finding approximations to a desired accuracy. One simple way is this: choose the rational numbers in the sequence so that the nth rational is within 2-n of our desired real. When I say 'Cauchy sequence' below I'll mean a sequence with a constraint like this.

Note that this representation is highly non-unique. There are many sequences that could be used to represent any real number. But this isn't a problem. Even the standard decimal representation is non-unique because 1 and 0.999... recurring both represent the same real. This means we can write functions on our real representation that depend on the specific sequence we've chosen to represent a real number and hence can't be interpreted as functions of the real numbers. (Eg. consider the trivial function that just returns the first element of he sequence.) But it's still possible to write functions are well-defined in the sense that if two different representations of the same real are passed in as inputs, then the returned sequences represent the same real, even though the actual sequences may be different. For example, we can write a function to halve a real number simply by halving all of the rationals in the Cauchy sequence. Exercise: give a well-defined implementation of addition using this representation. (Hint: It's a short one-liner in Haskell, but it's not completely trivial.)

Now I can state the second counterintuitive result: all well-defined computable functions are continuous.

Before proving that, I want to say something more about exhaustive search over streams. You may remember that I rewrote the key proof using more elementary language because the original work was phrased in the language of topology. Explaining that properly would require me to set up quite a bit of mathematical machinery. However, I can give a hint as to why topology is connected to the computability of certain functions. The key reason is this: by and large, Topology is the study of continuous functions.

Consider Cauchy's definition of continuity. The function f is continuous at x if for all ε, there exists a δ such that whenever |y-x|<δ, |f(y)-f(x)|<ε. There's another way of phrasing that. f is continuous at f if for any ε there is a δ such that even if the argument, x, to f has an error of δ, f(x) still has an accuracy of ε. So now think about exact reals defined by Cauchy sequences. Suppose f is a well-defined function, as above. Then to find f(x) to within 2-m we just need to find the mth term in the Cauchy sequence f returns. If f is a computable function, it obviously can only examine a finite number of elements of the sequence of x. Suppose the latest element in the sequence it looks at is the nth one. Then if f is implemented correctly it's managing to compute f(x) to an accuracy of 2-m even though its input was only known to an accuracy of 2-n. In other words, f is continuous. (Choose m such that 2-m<ε and δ<2-n.)

We can take a slightly different perspective. For any x, x is a stream of rationals and f(x) is a stream of rationals. For f to be a proper computable function then to compute n terms of f(x) we need m terms of x, where m is some finite function of n. In other words, if we fire off a loop to compute the terms of f(x) one by one the code should keep producing terms one after another without ever been caught in a non-productive infinite loop. Now compare this with how I defined total functions for streams. So the condition that f be continuous is more or less the condition for f to continue to be productive. And that means we can talk about the computability of such functions in terms of continuity and use the language of topology to talk about computability. I find this quite amazing. Cauchy's centuries old definition of continuity contains within it the seeds you need to define the notion of computability for codata.

(One cool thing about this is that topology has a concept called compactness. Compact spaces share many properties with finite sets. In particular, the interval [0,1]={x|0≤x≤1} is compact. And that suggests that maybe it can be searched in a finite time. That suggestion is correct and is the reason why it's possible to exhaustively search the infinite space of streams in finite time.)

I wish I could give some references for further reading on this but I pulled it together from little clues scattered over many papers and presentations on the web. If someone knows the canonical reference on computability and topology, maybe they could post it in the comments.

Update: Don't miss this excellent post by Andrej Bauer.