Wednesday, May 21, 2008

Life in a Lazy Universe

I've seen lots of articles on simulated reality recently. Supposing for a bit that this is a sensible hypothesis to entertain and not just a science fictional conceit, what might be the consequences of supposing that such a 'reality' is running on a machine with lazy evaluation?

If your task were to simulate a universe well enough for its inhabitants to be convinced it was seamless, there is an obvious optimisation you could make: simulate just those parts that the inhabitants can perceive. The catch is that if an inhabitant were to explore a new region, the simulation would be required to fill that region in. Just creating a blank new area wouldn't do: its state would need to be consistent with a plausible history, and so the simulation would have to fill in not just the region's present but also its past. This is precisely what lazy evaluation provides - demand driven evaluation that takes place only when results are observed. It seems natural that such a simulation should make use of laziness.

But such optimisation doesn't need to be limited to lazily computing 'new regions', where by 'region' we mean a spatially localised volume. We could also imagine implementing level of detail. If we don't look closely at an object in our grasp, there's no need to compute every single detail of it. We need only provide detail at the resolution at which it can be perceived. We'd like this built into every aspect of the simulation so that anything within it is computed only to a desired accuracy. I've spoken a few times about such a data structure in posts like this. This is precisely what the real numbers give us. Computationally, a real number is an object which, when given a degree of accuracy, returns a rational number (or some other finitely manipulable structure) representing the real to that accuracy. The Wikipedia article suggests that 'the use of continua in physics constitutes a possible argument against the simulation of a physical universe'. This is diametrically opposed to what I might argue: in fact the presence of continua suggests the existence of an efficient demand driven simulation with level of detail. The problem is that the people thinking about these issues have been thinking in terms of traditional imperative style programming, where a real number is typically represented to some fixed finite precision. But in fact there are infinitely many real numbers that can be represented exactly if we think in terms of lazy data structures.
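As a toy illustration, a computable real can be modelled as a function from a requested accuracy to a rational approximation, with no work done until an approximation is demanded. This is only a sketch; the names `CReal` and `sqrt2` are my own, not from any standard library:

```haskell
-- A toy computable real: given an accuracy n, return a rational
-- within 2^(-n) of the true value. Evaluation is demand driven:
-- no approximation is computed until someone asks for one.
type CReal = Int -> Rational

-- The square root of 2 by Newton iteration, refined only as far
-- as the requested accuracy demands. (The iterates stay above
-- sqrt 2, so |x^2-2| < 2^(-n) guarantees |x - sqrt 2| < 2^(-n).)
sqrt2 :: CReal
sqrt2 n = go 2
  where
    tol = 1 / 2 ^ n
    go x
      | abs (x * x - 2) < tol = x
      | otherwise             = go ((x + 2 / x) / 2)
```

Asking for more digits simply triggers more iterations; asking twice for the same accuracy does no new conceptual work.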

But this is all idle speculation if we don't make predictions. So here is one. In a computer simulated universe, all physics must obviously be computable. But all computable functions are continuous. So in a simulated universe we'd expect the physical laws involving real numbers to make use of them only in a continuous way. Sound familiar?

The next section is an aside and I need to give a spoiler alert as I'll be mentioning some aspects of Greg Egan's Permutation City.

What exactly is meant by 'demand driven' above? If we have implemented a simulated universe, I expect we would be interested in looking in on it from time to time, and this is what is usually meant by 'demand': whenever the program performs I/O to show us what is going on, it triggers the evaluation of the lazy thunks that had been sitting around. But consider how the universe might appear to its inhabitants. Whether or not we look in on it is surely irrelevant to the inhabitants, assuming we just look and don't try to communicate. And if no I/O is performed, functional programmers don't make much of a distinction between an unevaluated thunk and an evaluated one: they are observationally equivalent. So do we in fact need to run the simulation at all for its inhabitants to consider themselves to exist? Anyway, I won't say anything more about this because Permutation City deals with this issue at length. I'm just rephrasing it as a lazy evaluation issue.

My next post will be on life in a monadic universe.

Update: I'm going to add some detail to the above. Suppose we represent real numbers as infinite sequences of digits, and suppose we have some module in a simulation that, say, resolves collisions between particles. You might ask it "what is the velocity, to 20 digits, of particle A after the collision?". It must then look at all of the relevant inputs, decide how accurately it needs them, and propagate new requests: "if I need the velocity of A to 20 digits, I need the mass of particle B to 25 digits and the velocity of particle C to 26 digits..." and so on. The idea of demand driven modules that produce outputs to a given accuracy by making requests of their inputs to a given accuracy exactly parallels the subject of analysis in mathematics. There you ask questions like "if I need to know f(x,y) with an error of no more than δ, what error ε can I tolerate in x and y?". The functions that allow this kind of analysis are precisely the continuous ones. In other words, a simulated universe built from continuous functions lends itself to a demand driven implementation. One might argue that the fields of analysis and topology are, in their entirety, the study of functions that can be evaluated in this way. And of course, in our universe, the functions we study in physics are typically continuous. So contrary to the claim that the existence of continua argues against the idea that the universe is a simulation, I'd point out that continua might actually make our universe more convenient to simulate. Indeed, we rely on this all the time in ordinary everyday physics simulations, because we know we only need to work to a certain accuracy to get accurate results.
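Here is how that request propagation might look in code. This is a sketch under my own naming (`Approx`, `exactly`, `addA` are illustrative, not standard): each quantity is a function from a requested accuracy to an approximation, and addition asks its inputs for one extra bit of accuracy so that the two input errors sum to within the output tolerance, which is exactly the ε-δ reasoning of analysis run backwards:

```haskell
-- An approximable quantity: accuracy n maps to a rational within 2^(-n).
type Approx = Int -> Rational

-- An exact value can be served at any accuracy for free.
exactly :: Rational -> Approx
exactly q _ = q

-- To deliver x + y within 2^(-n), request each input within
-- 2^(-(n+1)): the two input errors then sum to at most 2^(-n).
addA :: Approx -> Approx -> Approx
addA x y n = x (n + 1) + y (n + 1)
```

A collision-resolving module would be a bigger version of `addA`: an output accuracy comes in, input accuracy requests go out.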

The claim that this universe can be simulated, and the claim that this universe is a simulation, are philosophical claims that I don't like to waste time on. But the claim that the existence of continua blocks the notion that the universe is a simulation is a fairly precise statement of logic, mathematics and physics. I claim it is wrong, and that's the only claim I'm making here.

Saturday, May 17, 2008

The Interchange Law

I find it amazing the way you can take a bunch of abstract nonsense and turn it into concrete Haskell code for doing something mundane. Having down-to-earth examples really does help with getting your head around the abstractions.

Anyway, on page 43 of Categories for the Working Mathematician (2nd Ed.) is the 'interchange law' for horizontal and vertical composition of natural transformations. Not only can we find a nice mundane example to help in thinking about it, we can even test it using QuickCheck.

For convenience we can define a natural transformation type:

> {-# LANGUAGE RankNTypes #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}

> import Test.QuickCheck
> import Control.Monad.Writer

> type Natural f g = forall a. f a -> g a

Where f and g are intended to be instances of Functor.

If we have a natural transformation from f to g, and another from g to h, then we can compose them using the ordinary Haskell composition operator (.). In Haskell, functors are frequently containers, so we can use containers as a guiding example. If we know how to map a bag of stuff to a sack of stuff, and we know how to map a sack of stuff to a box of stuff, then we just perform the two operations in sequence and we can map a bag of stuff to a box of stuff. This is known as vertical composition.

But there is also another way to compose natural transformations known as horizontal composition. Again we can think about containers. Suppose we know how to map a bag of stuff to a sack of stuff, and we know how to map a box of stuff to a crate of stuff, then we also know how to map a bag of boxes of stuff to a sack of crates of stuff. There are two obvious ways to do this: we could convert the bag of boxes of stuff to a sack of boxes, and then convert each box to a crate. Or we could convert the boxes to crates first, and then map the resulting bag of crates to a sack of crates. We can define two binary operators `o` and `o'` to perform each of these tasks:

> o, o' :: (Functor f, Functor f', Functor g, Functor g') =>
>     Natural f' g' -> Natural f g -> (forall c. f' (f c) -> g' (g c))
> o s t x = s (fmap t x)
> o' s t x = fmap t (s x)

Intuitively we'd expect these things to be equal, and in fact they always are. (Exercise: write a quickcheck for this based on the code below.)
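Before reaching for QuickCheck, here is a concrete sanity check (self-contained, and separate from the exercise): a single natural transformation `headMay` from lists to Maybe, composed horizontally with itself both ways. The name `headMay` is mine; it plays the role of one of the transformations in the text.

```haskell
{-# LANGUAGE RankNTypes #-}

type Natural f g = forall a. f a -> g a

o, o' :: (Functor f, Functor f', Functor g, Functor g') =>
    Natural f' g' -> Natural f g -> (forall c. f' (f c) -> g' (g c))
o  s t x = s (fmap t x)     -- convert the insides first, then the outside
o' s t x = fmap t (s x)     -- convert the outside first, then the insides

-- A sample natural transformation from lists to Maybe.
headMay :: Natural [] Maybe
headMay []      = Nothing
headMay (x : _) = Just x
```

Both `(headMay `o` headMay)` and `(headMay `o'` headMay)` take `[[1,2],[3]]` to the same `Maybe (Maybe Int)` value, as the naturality of `headMay` guarantees.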

Now we need some functorial containers to play with:

> data Pair x = Pair x x deriving (Eq,Show)
> instance Functor Pair where
>   fmap f (Pair a b) = Pair (f a) (f b)

> newtype Id x = Id x deriving (Eq,Show)
> instance Functor Id where
>   fmap f (Id x) = Id (f x)

Now we can define a bunch of natural transformations mapping between these containers and some others:

> alpha :: Natural Pair []
> alpha (Pair x y) = [x,y]

> beta :: Natural [] Maybe
> beta [] = Nothing
> beta (x:xs) = Just x

> alpha' :: Natural ((,) a) Id
> alpha' (a,x) = Id x

> beta' :: Natural Id (Either b)
> beta' (Id x) = Right x

On page 43 is the interchange law. Superficially it looks a lot like abiding. For any α, β, α' and β' with compatible types, we have

(β . α) o (β' . α') = (β o β') . (α o α')

This is the identity I want to check for the special case of the types I've chosen above. So we can define the left and right hand sides:

> lhs = (beta . alpha) `o` (beta' . alpha')
> rhs = (beta `o` beta') . (alpha `o` alpha')

> type From = Pair (Float,Integer)
> type To = Maybe (Either String Integer)

And here we go:

> test1 = quickCheck (\x -> lhs (x :: From) == (rhs x :: To))

Just type test1 in ghci to hear the sweet sound of 100 tests passing.

Anyway, we're just a hair's breadth away from defining 2-categories now. But I'll leave that for another day.

Thinking about it, I gave slogans for the previous theorems, so why not give one for the interchange law? Take a deep breath. If you know how to convert a bowl of stuff into a box of stuff, and a box into a bag, and an urn into a crate, and a crate into a sack, then there are two equivalent ways to convert a bowl of urns into a bag of sacks: we can either construct methods going from bowl to bag and from urn to sack and combine them to go from a bowl of urns to a bag of sacks, OR, we can go from a bowl of urns to a box of crates to a bag of sacks. I'm sure it's all a lot clearer now. ;-)

Well that's enough abstract nonsense. Time to get away from it all and watch some youtube videos. Eh? Oh well. Just wish me luck for tomorrow when I try to beat my fastest time in Bay to Breakers. Maybe some other bayfpers will be there too.

> instance Arbitrary a => Arbitrary (Id a) where
>   arbitrary = liftM Id arbitrary

> instance Arbitrary a => Arbitrary (Pair a) where
>   arbitrary = liftM2 Pair arbitrary arbitrary

Saturday, May 10, 2008

Desingularisation and its applications

> {-# OPTIONS_GHC -fno-warn-missing-methods #-}

Here's a neat trick you can carry out with the Dual number type that I've talked about many times before. Suppose you define a function like

> sinc x = sin x/x

Clearly this function has a singularity at x=0 where the division isn't defined. But this is an example of a removable singularity because it is possible to replace this function with another that is continuous (in fact, analytic) everywhere and is well defined at x=0. We could do this simply by defining:

> sinc' 0 = 1
> sinc' x = sin x/x

but it'd be cool if we could automate the process. Here's a function that does exactly that:

> desingularise f x = re (f (D x 1))

As if by magic, we can now compute

> ex1 = desingularise sinc 0

We can try other functions too:

> ex2 = desingularise (\x -> x/(exp x-1)) 0

So how does it work? Well I've modified my usual definition of (/) so that it is able to divide one infinitesimal number by another:

> instance (Eq a, Fractional a) => Fractional (Dual a) where
>   D 0 a'/D 0 b' = D (a'/b') 0
>   D a a'/D b b' = D (a/b) ((b*a'-a*b')/(b*b))

desingularise 'perturbs' the value of its argument infinitesimally so that when we pass in an argument of zero we end up evaluating our function not at zero, but somewhere infinitesimally close. So our division by zero becomes division of infinitesimals and the above definition comes into play. The first clause of the definition can also be viewed as an application of l'Hôpital's rule.
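Tracing the evaluation makes this concrete. Below is a self-contained version of the same machinery, specialised to Double so it runs standalone (the polymorphic `Dual` appears at the end of this post); the trace in the comments shows exactly where the l'Hôpital clause fires:

```haskell
-- Dual numbers over Double: D a a' represents a + a'*eps.
data D = D Double Double deriving (Show, Eq)

instance Num D where
  fromInteger i       = D (fromInteger i) 0
  (D a a') + (D b b') = D (a + b) (a' + b')
  (D a a') - (D b b') = D (a - b) (a' - b')
  (D a a') * (D b b') = D (a * b) (a * b' + a' * b)

instance Fractional D where
  D 0 a' / D 0 b' = D (a' / b') 0                           -- l'Hôpital step
  D a a' / D b b' = D (a / b) ((b * a' - a * b') / (b * b))

instance Floating D where
  sin (D a a') = D (sin a) (cos a * a')
  exp (D a a') = D (exp a) (exp a * a')

re :: D -> Double
re (D a _) = a

desing :: (D -> D) -> Double -> Double
desing f x = re (f (D x 1))

-- desing (\x -> sin x / x) 0
--   = re (sin (D 0 1) / D 0 1)
--   = re (D 0 1 / D 0 1)        -- since sin 0 = 0 and cos 0 = 1
--   = re (D 1 0)
```

The unimplemented class methods (abs, signum, fromRational, pi and friends) only generate warnings, which is what the pragma at the top of the post suppresses.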

But it doesn't solve all of our problems. For example

> ex3 = desingularise (\x -> sin x^2/x^2) 0

doesn't quite work. You need two applications of l'Hôpital's rule. We can implement this easily with

> ex4 = desingularise (desingularise (\x -> sin x^2/x^2)) 0

That's fine if you know how bad your singularity is in advance. If you don't, you can use power series instead. None of these methods will work for this function however:

> horrible x = exp (-1/x^2)

Anyway, the fact that it doesn't work for all functions doesn't detract from its being a pretty approach to solving the problem when you have a fairly well behaved singularity.

I've not actually seen anyone else use dual numbers to smooth over singularities in this way, but a few years ago I did come across something very closely related that apparently caused a little controversy in the computational geometry world.

Suppose you want to determine whether or not a point lies within a possibly non-convex and arbitrarily complicated polygon. One algorithm is this: follow a line in any direction from the point to infinity (though just outside the bounding box of the polygon would do) and count how many times you cross the boundary of the polygon. If you count an odd number of crossings you're in the polygon, otherwise you're outside. It's beautifully simple and if nobody had thought of it before it'd be worthy of publication.
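The happy path of that algorithm is only a few lines. Here is a sketch (my own illustrative `inPolygon`, using the standard even-odd ray-crossing test with a half-open edge convention); it works for points in general position, and it is exactly the degenerate cases discussed next that it does not handle:

```haskell
-- Even-odd rule: cast a ray from p in the +x direction and count
-- how many polygon edges it crosses. The strict/non-strict pair
-- (y1 > py) /= (y2 > py) treats each edge as half-open in y, which
-- resolves *some* vertex-on-ray cases but by no means all trouble.
inPolygon :: (Double, Double) -> [(Double, Double)] -> Bool
inPolygon (px, py) verts = odd (length crossings)
  where
    edges = zip verts (tail verts ++ [head verts])
    crossings =
      [ () | ((x1, y1), (x2, y2)) <- edges
           , (y1 > py) /= (y2 > py)                      -- edge spans the ray's height
           , px < x1 + (py - y1) * (x2 - x1) / (y2 - y1) -- crossing lies to the right
           ]
```

The division is safe when the first guard passes (the edge is not horizontal there), but a ray grazing a vertex or running along an edge is precisely where this tidy code starts to lie.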

But like the methods described in many computational geometry papers I've read, this one often fails in practice. There are at least two sources of difficulty. Firstly, computational geometry algorithms are subject to numerical errors. You often find yourself having to decide whether or not a point is on one side or another of a line, say, and getting inconsistent results because two different ways of computing the same thing result in different decisions being made because of rounding. That can be fixed by using something like arbitrary precision rational arithmetic. But even if you do this, there is still a problem.

When you follow a line from the point you're testing you may find that it doesn't simply cross one of the sides of the polygon, it may travel precisely through a vertex. If your code isn't written carefully you may accidentally count that as zero, one or two crossings. It's easy to write sloppy code that gets it right most of the time, but it's hard to work out every possible case where some kind of degeneracy might happen. What happens if the outward ray is actually collinear with a polygon face? Do vertex intersections and collinearities cover everything?

An ugly kludge is this: at the first sign of trouble, restart the algorithm but jiggle all the inputs randomly a bit. For a large class of algorithms this will work tolerably well, depending on your definition of tolerable. (Compare with Sard's theorem in Morse theory.) But it really is ugly, especially if you've gone through the hassle of using arbitrary precision arithmetic. But maybe you can see what's coming...

Back in '94 Edelsbrunner and Mücke came up with the idea of deforming the geometry infinitesimally, giving their method the utterly bizarre name "Simulation of Simplicity". I think it took people a while to realise that they were actually doing the same thing as people doing automatic differentiation, and so the whole notion of an infinitesimal perturbation left people wondering what in fact they were doing. More serious was this: writing robust computational geometry code that catches all of the degenerate cases is hard. How is it that such a simple trick could be slotted into a sloppily written polymorphic algorithm and turn it into a watertight one? Was everyone in computational geometry going to be put out of work?

Luckily for the computational geometers it still takes quite a bit of work to get the details exactly right, and even when you do, the result isn't necessarily an efficient solution to the problem. I know that when I had a messy computational geometry problem to solve when I was working on MI:2 I just wrote code that turned a blind eye to degeneracies and aggressively pruned away anything that looked suspicious: like near zero length lines and near zero area faces. If a stray pixel poked through somewhere that it shouldn't, and tweaking the tolerance didn't fix it, an artist found a way to hide it. Sometimes beauty and elegance come second place to getting the job done on time.

> data Dual a = D a a deriving (Show,Eq)

> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
>   fromInteger i = D (fromInteger i) 0
>   (D a a')+(D b b') = D (a+b) (a'+b')
>   (D a a')-(D b b') = D (a-b) (a'-b')
>   (D a a')*(D b b') = D (a*b) (a*b'+a'*b)

> instance Floating a => Floating (Dual a) where
>   cos (D a a') = D (cos a) (-sin a*a')
>   sin (D a a') = D (sin a) (cos a*a')
>   exp (D a a') = D (exp a) (exp a*a')

Saturday, May 03, 2008

You Could Have Defined Natural Transformations

I'll assume you know what a functor is (and optionally a bit of Haskell):

Consider these two functions

f:Z x Z -> Z    f(x,y) = x+y
g:Z x Z -> Z    g(x,y) = x

There's a fundamental difference between them. In some sense f "makes use" of the fact that we're applying functions to integers, but even though g is acting on integers, it's not using the fact that we're acting on integers. But this is a vague idea. How can we make it precise?

Consider the same in the context of computer programs. This applies in many programming languages, but I'll use Haskell here

> f (x,y) = x+y
> g (x,y) = x

When we ask ghci for the types of these it tells us that f is of type (Num t) => (t, t) -> t but that g is of type (t, t1) -> t. ghci has spotted that f is making use of an interface exposed by integer types (the type class Num) but that g makes no assumptions at all about its arguments. So in a programming language (with static types) we can make the notion above precise by having a type signature that explicitly says something like "I'm not going to make any assumptions about my arguments". But how can we say something like this in the language of set theory, say? Set theory doesn't come with a mechanism for making such promises. (Well, ZF out of the box doesn't, anyway.)

So consider a set-theoretical function like this

h:Z x Z -> Z x Z x Z
h(a,b) = (b,a,a)

What does it mean to say that h doesn't use specific facts about its arguments? It clearly doesn't mean that h doesn't depend on its arguments because the right hand side clearly depends on a and b. We need to say that the right hand side depends on the arguments, but that it somehow doesn't make specific use of them.

Consider an example:

h(1,2) = (2,1,1).

Part of what we mean by not using specific knowledge is that if we were to replace one argument by another value we expect to perform the same substitution on the right. For example, replacing 1 by 3 should give:

h(3,2) = (2,3,3)

So for any 'substitution' function k:Z -> Z we want

h(k(1),k(2)) = (k(2),k(1),k(1))

More generally, we want:

h(k(a),k(b)) = (k(b),k(a),k(a))

Note that we've made use of the functions (a,b) -> (k(a),k(b)) and (a,b,c) -> (k(a),k(b),k(c)) to implement the idea of applying the same substitution to each of the elements. We might want to work with a wide variety of mathematical 'containers' like tuples and sequences as well as more abstract objects. Can we generalise to all of these in one go?

The function that converts a set X to X x X, or X x X x X, forms part of a functor. The other part maps functions between sets to functions between containers. So we can work with the usual ordered pair functor L with LX = X x X and Lf(x,y) = (f(x),f(y)). Similarly we can use MX = X x X x X and Mf(x,y,z) = (f(x),f(y),f(z)). We can stop talking about 'containers' now and talk about functors instead. We can rewrite our rule above as

h(Lk(a,b)) = Mk(h(a,b)).

I've been working with h acting on containers of integers, but if h truly doesn't rely on specific properties of its arguments then there is a version of h for any set X, ie. for any X we have a function

hX : X x X -> X x X x X
hX(a,b) = (b,a,a)

and when we use a function k to substitute values in the arguments there's no reason to substitute values from the same set. For example, if we let A be the set of the 26 letters of the alphabet (using your favourite way to encode letters as sets), then we could let k be the function that maps n to the nth letter (modulo 26) and still expect

hA(Lk(1,2)) = Mk(hZ(1,2))

to hold. For example

hA('a','b') = ('b','a','a')

More generally, h is a family of functions such that for any k:X -> Y we have:

hY o Lk = Mk o hX

Note that there's nothing set-specific about this, it's a statement that could hold in any category.
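In Haskell the whole story collapses into the type system: a polymorphic h cannot peek at its arguments, and the naturality square then holds automatically. Here is the square checked for our h, with `lK` and `mK` (my names) standing for the action of the functors L and M on a function k:

```haskell
-- h can't inspect its argument: the polymorphic type says so.
h :: (a, a) -> (a, a, a)
h (a, b) = (b, a, a)

-- The action of the pair functor L on a function k.
lK :: (a -> b) -> (a, a) -> (b, b)
lK k (x, y) = (k x, k y)

-- The action of the triple functor M on a function k.
mK :: (a -> b) -> (a, a, a) -> (b, b, b)
mK k (x, y, z) = (k x, k y, k z)

-- Naturality: h . lK k == mK k . h, for any k whatsoever.
```

For instance with k = (+1), both sides send (1,2) to (3,2,2): substitute first or apply h first, it makes no difference.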

We can take this as a definition. A natural transformation between functors L,M:C -> D is a family of morphisms hX, one for each object X in C, such that for any morphism k:X -> Y in C we have:

hY o Lk = Mk o hX

And there we have it. Starting with the idea of a function that doesn't 'peek inside' its arguments, we're led inexorably to the idea of substitutability of arguments and from there to a categorical definition of a natural transformation.

As the definition has been generalised from Set to any category you can have fun unpacking the definition again in other categories. For Set-like categories with functions as morphisms you'll often find that natural transformations are still functions that don't make too many assumptions about their arguments, but in each case the rule about what exactly constitutes 'too many assumptions' will be different. The category of vector spaces is a good place to start.

One last point: Category theory was more or less invented for natural transformations. And it's curious how it fits so nicely with the idea of abstraction in computer science, ie. the idea of a function that is limited to using a particular interface on an object.

One interesting result relating to natural transformations is the Yoneda lemma. Note also that the Theorems for Free are largely about similar kinds of substitutions on function arguments.