Saturday, December 01, 2007

What's all this E8 stuff about then? Part 3.

Isospin


The idea behind symmetry in physics is that there are things that you can change without making a difference to how a system evolves. I've already given the example of how rotating a pool table has no effect on the physics of the game. But here's another very different symmetry: in particle accelerator events that involve only the strong force, it makes no difference whether you're working with protons or neutrons. The strong force is unable to distinguish between these two types of particle. So we have a nice symmetry: for each proton or neutron in a system, we have a group of size two corresponding to the fact that we can swap one for the other. Now clearly protons and neutrons are different, they have different charges, but that only makes a difference to electromagnetic interactions, not strong interactions. This symmetry is called isospin symmetry.

But it gets more interesting when we consider quantum mechanics. Represent a proton by a vector (1,0) and a neutron by a vector (0,1). Quantum mechanics allows us to form superpositions by adding linear combinations of physical states to get new ones. So if (1,0) is a physical state, and (0,1) is also one, then so is (0.5,0.5). This is like the Schrödinger Cat state, it's a mixture of proton and neutron. In fact, we can form states (a,b) for any complex numbers a and b. So now we have scope for even more symmetry because not only should neutrons and protons be interchangeable, any mixture should be interchangeable with any other. What Heisenberg proposed was this: maybe the proton and neutron, considered as 2D vectors, have rotational symmetry just like the pool table. (Well, I don't think he was thinking in terms of pool tables.) Because we're dealing with complex numbers he proposed a symmetry group called SU(2), instead of SO(2), but it's a similar kind of thing, just the complex number variation of the 2D rotation group. SU(2) acts on the (complex) 2D space spanned by the proton vector (1,0) and the neutron state (0,1) and so this space forms what I previously called a Lie group representation. The proton and neutron form a 2-dimensional representation of SU(2).
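The arithmetic of these superpositions is easy to sketch. Here's a hypothetical Haskell fragment (the names `mix` and `norm2` are my own inventions, not anything from the physics literature) that represents nucleon states as pairs of complex amplitudes and applies one simple one-parameter family of SU(2) elements, checking that the total probability is preserved:

```haskell
import Data.Complex

-- A nucleon state is a pair of complex amplitudes (a, b):
-- (1, 0) is a pure proton, (0, 1) a pure neutron.
type Nucleon = (Complex Double, Complex Double)

proton, neutron :: Nucleon
proton  = (1, 0)
neutron = (0, 1)

-- One simple family of SU(2) elements (real rotations are a special
-- case of SU(2) matrices): mix proton and neutron by angle t.
mix :: Double -> Nucleon -> Nucleon
mix t (a, b) = (c*a - s*b, s*a + c*b)
  where c = cos t :+ 0
        s = sin t :+ 0

-- Total probability |a|^2 + |b|^2, which any SU(2) action preserves.
norm2 :: Nucleon -> Double
norm2 (a, b) = magnitude a ^ 2 + magnitude b ^ 2
```

Applying `mix (pi/4)` to `proton` gives an equal mixture of proton and neutron, and `norm2` of the result is still 1.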

So let's think about this more generally. If we have n particles that we suspect are connected by some symmetry group G, then they must form an n-dimensional representation of the group. Such a collection of particles is also called a multiplet. But here's the cool thing: if we have some kind of classification theorem listing the kinds of symmetry groups we expect to see in nature, and if we have a theorem classifying all of their representations, then we expect the particles of nature to be collected into families of similar particles that are organised just like those representations. So just knowing some pure mathematics should tell us a lot about particle physics.

Simple Lie Groups and E8


I've already said a little about how mathematicians go about the process of classifying representations of Lie groups by looking at maximal tori. Now I want to look at how Lie groups themselves are classified. Lie groups can often be built out of smaller pieces. This suggests the idea of finding the basic building blocks which can't be built out of simpler Lie groups. For a large class of groups, the basic building blocks are known as the simple Lie groups. (Now you know what the "Simple" in the title of Garrett Lisi's paper is about.) A priori you might imagine that you have all kinds of freedom when you try to construct new Lie groups. Incredibly you don't, and the list of simple Lie groups is quite...well...simple. The simple Lie groups fall into seven families called A, B, C, ..., G. The individual members of these families are given names like C4, where the integer is the dimension of the maximal torus. Four of the families, A to D, are fairly straightforward, and the groups form infinite sequences A1, A2, A3 and so on. Some of these are familiar. For example the B series are the rotation groups in odd-dimensional spaces. B1 is just another name for SO(3). But curiously, the other three families, E, F and G, are finite. These are known as the exceptional Lie groups. The complete list is E6, E7, E8, F4 and G2. (What's even more exceptional about these is that many other branches of mathematics have exceptional objects too, and they're closely related to these. And now you know what 'exceptional' in that title means.) E8 is the highest-dimensional of the exceptional simple Lie groups. In other words, you can think of it as the largest possible symmetry we could have that isn't simply built out of smaller pieces, or isn't just part of a straightforward sequence of symmetries. But it's not 'simple' in the ordinary sense of the word, it's 248-dimensional.

So if we could arrange the particles of nature into a family of size 248 then we'd have strong evidence that E8 was the symmetry of nature. This is what Lisi has attempted to do.

More Spontaneously Broken Symmetry


But there's a complication. Above I talked about the SU(2) symmetry of protons and neutrons. But we know that symmetry doesn't hold exactly because the electromagnetic force doesn't respect it. In fact, photons react to protons and neutrons quite differently. But there's a way out: broken symmetry. In earlier posts I've already talked about this in the case of pool. But now consider a system like a sealed container of water. Again, fundamental principles suggest that this should have SO(3) symmetry. But we know that if we tilt a container of water the water doesn't tilt with the container, the level of the surface remains aligned perpendicular to gravity. On the other hand, as we heat up the water it eventually boils. At that point it's hard to tell that gravity is at work at all because the force of gravity is so small compared to the intermolecular forces at work in a gas. The higher the temperature, the harder it is to see the local gravity. So pumping energy into a system serves to remove any kind of local accidents of history to reveal the true symmetry. The idea in particle physics is that at low energies we see a reduced symmetry, but at higher energies we should see more symmetries coming into play. In fact, this idea works incredibly well and in 1979 the Nobel prize was awarded to Glashow, Salam and Weinberg for realising that at higher energies the reduced symmetry of electromagnetism (which, by the way, is SO(2)) could be seen as part of a bigger symmetry. So what we expect to see is a chain of subgroups, one contained in the next, corresponding to physics at different energies. In practice physicists use Lie algebras instead of Lie groups most of the time because infinitesimal operations are usually much easier to work with, but the principle is the same.

So now I can finally say what Lisi has attempted to do: he has arranged all of the particles of nature to form a representation of E8. He has then found subgroups of E8 corresponding to the breaking of this symmetry. At low energies, the different forces of nature see different symmetries corresponding to these subgroups. As these subgroups correspond to less symmetry, we end up with smaller sets of particles that are interchangeable with each other, and so smaller multiplets. So the multiplets of E8 break up into smaller multiplets corresponding to representations of these subgroups. Lisi has shown how to arrange the particles of nature into multiplets, and has also shown how to break up these multiplets into smaller ones that correspond to the properties of the familiar fundamental forces we can observe in the lab. There are all kinds of problems with the way he's done this which make the work controversial. But in spirit, it's not so different from what physicists have been doing for decades.

Wednesday, November 28, 2007

The IO Monad for People who Simply Don't Care

The theoretical underpinnings of monads in Haskell can be tricky to grasp. If you simply want to read some input and write some output, and you don't want to waste time with nonsense like the "monad laws", you've come to the right place.

Getting Started



Many programming languages make a distinction between expressions and commands. For example, in BASIC you couldn't say LET A = PRINT 1 and in C you can't say x = if (a==1) { 2; } else { 3; }. Haskell is no different. Like other languages it makes the distinction, and like other languages it has its own slightly idiosyncratic notion of what the difference is. The IO monad is just a device to help make this distinction. And that's the last time I'll mention the word 'monad' for a while.

Here's an example of a Haskell function:


f x = 2*x+1


The right hand side is an expression. Expressions are made of subexpressions and the value of an expression is determined by the value of its subexpressions. There is no room for anything like a print command here because a print command doesn't return a value, it produces output as a side-effect. So in order to do I/O we need something different: the do keyword. It's easy to use: you just write do and then follow it by an indented list of commands. Here's a complete Haskell program:


main = do
  print "Hello"
  let x = 1+2+3+4+5
  print "The answer is"
  print (2*x)



Note the let. It's just like the old days of BASIC programming where you can use let to assign a name to a value. Note also that commands take arguments that can be expressions. So print (2*x) is a command, but 2*x is an expression. Again, little different from a language like Python.

But what about input? Haskell has a command called getLine that reads input from the user. So here's an interesting feature of Haskell: commands can return values. But a command that returns a value is different from an expression with a value. The difference means we have to write code like this:


main = do
  print "Hello"
  x <- getLine
  print "The input was"
  print x


We have to use <- instead of let ... = ... to get a returned value out of a command. It's a pretty simple rule, the only hassle is you have to remember what's a command and what's a function.

You can also write your own commands. Here's a simple example that returns a pair of lines entered by the user:



get2Lines = do
  line1 <- getLine
  line2 <- getLine
  return (line1,line2)

main = do
  (a,b) <- get2Lines
  print ("You typed " ++ a ++ " then " ++ b ++ ".")


Note how we used <- instead of let to get the result of getLine. We also did the same to get the result of get2Lines. But notice one extra feature: we had to use return to return a value out of a do block.

So that's pretty much it. Here are the rules:

  1. To introduce a list of commands, use do.
  2. To return a value from a command, use return.
  3. To assign a name to an expression inside a do block use let.
  4. To assign a name to the result of a command, use <-.
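As a quick illustration, here's a fragment that uses all four rules at once (the command `sumTo` is made up for this example, it's not part of any library):

```haskell
-- A made-up command that computes 1+2+...+n.
sumTo :: Int -> IO Int
sumTo n = do                 -- rule 1: do introduces a list of commands
  let total = sum [1..n]     -- rule 3: let names an expression
  return total               -- rule 2: return makes it the command's result

demo :: IO Int
demo = do
  x <- sumTo 5               -- rule 4: <- names the result of a command
  return (x + 1)
```

Running `demo` from `main` with `r <- demo` binds `r` to 16.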


So how do you know what's a command and what's an expression? If it has any chance of doing I/O it must be a command, otherwise it's probably an expression. But you can do silly things like:

f x = do
  return [x,x]

main = do
  x <- f 10
  print x


so occasionally there will be commands that don't do any IO. But not the other way round.

Types


Everything in Haskell has a type, even commands. In general, if a command returns a value of type a then the command itself is of type IO a. It just has an IO wrapped around the return type. For example, getLine is of type IO String and f above is of type a -> IO [a]. Apart from that extra IO it's like ordinary expression types. But note that if you have a bit of code like
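To make this concrete, here's a small sketch with the IO wrapper written out in explicit signatures (the names pair and double are mine, not from any library):

```haskell
-- A command returning a pair: note the IO wrapped around the return type.
pair :: IO (Int, Int)
pair = do
  return (1, 2)

-- Like f above: an ordinary argument, but an IO-wrapped result.
double :: Int -> IO [Int]
double x = do
  return [x, x]
```

Because `pair` has type IO (Int, Int) rather than (Int, Int), you can only get at the pair with `<-` inside a do block.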


main = do
  print (getLine)


you get a type error. This is because you must use <- to extract the input string from the getLine command. Because getLine has type IO String instead of type String the compiler can catch these kinds of mistakes. Giving commands and expressions different types means we can't accidentally mix them up.

At this point you might have realised that if


f x = do
  return x

main = do
  x <- f 5
  print x


is legal, then this might be legal too:


main = do
  x <- return 1
  print x


It is. return is simply a function of type a -> IO a that converts a value of type a to a command of type IO a.

You can also rewrite this


inputWithPrompt = do
  print "Hello, enter a line"
  x <- getLine
  return x


with


inputWithPrompt = do
  print "Hello, enter a line"
  getLine


In the first version we extracted x from getLine and then converted this value back into another command with return. In the second version we took a shortcut and simply ended with getLine. So we probably ought to add one more rule

    5. The type of a sequence of commands is the type of the last line.

getLine already has the type of a command that returns a string so we can just make it the last line. But use the former version if you find this confusing, and then you only need four rules.

There's one last thing that confuses people. The type of an if statement is the type of its branches. So if you want an if statement inside a do block, it needs to be a command, and so its branches need to be commands also. So it's


main = do
  if 2/=1
    then do
      print "2/=1"
      print "Cool"
    else do
      print "2==1"
      print "Not cool"


not


main = do
  if 2/=1
    then
      print "2/=1"
      print "Cool"
    else
      print "2==1"
      print "Not cool"



Oh and a little tip. If the do block has just one command, you can drop the do. So this compiles:


main = do
  if 2/=1
    then
      print "2/=1"
    else
      print "2==1"



Why make the command/expression distinction at all?


You don't care, or you wouldn't be reading this. But suppose you did. Then I'd say: Firstly, let's state clearly what this distinction is. If something is of type IO a then it's a command returning a value of type a. Otherwise it's an expression. That's the rule. So here's the important bit: following the rules above it's completely impossible to put an executed command inside an expression. As the only way to do I/O is with commands, that means you have no way to find out what Haskell is doing inside expressions. And that's the whole point! Haskell likes to keep its evaluation secret from you and doesn't want you outputting information that might give away what's going on. For example, if (print 1)+(print 2) were legal Haskell, and it printed 1, followed by 2, then you'd know that it evaluated the left argument to + first, and then the second. Haskell doesn't want to reveal this because it wants the freedom to rearrange your code internally without changing its behaviour. And this makes things easier for you too. When you see a+b you know it's exactly the same as b+a, whereas in C++, say, you have to worry about whether a and b have side effects. If the type isn't IO a, then you can sleep at night in the confidence that there are no side effects.

(Do you see why you can't sneak a command into an expression? It's quite neat. To make an expression out of a command you'd need to do something like extract a return value out of a command. But to do that you need to use <- to extract the value of the command. But you can only use <- inside a do block. So the result has to be a command. Once you're in a command, there's no way out!)

It's all lies I tell you!


One last thing. Much of what I've said above is false. But what I hope I have done is describe a subset of Haskell in which you can start some I/O programming without having a clue what a monad is. Once you're successfully bootstrapped, then you'll be ready for the truth. No more of that "monads are too hard" defense.

PS You can think of the above as using taint as a way to sequester commands from other expressions.

Back to E8 this weekend...

Friday, November 23, 2007

What's all this E8 stuff about then? Part 2.

Let's recap: I introduced the idea of symmetry groups, sets of operations that leave something unchanged. A particular type of group is the Lie group: a group where everything is smooth enough that you can do calculus with it. I also talked about Lie algebras which you can think of as elements of Lie groups that are infinitesimally close to doing nothing, or as rates of change of elements in a Lie group.

What does symmetry have to do with physics?


As my ultimate goal is to talk about physics, it's time for a paragraph on that subject, just so we don't lose sight of where we're going. So imagine a game of pool. Pool is a game of physics, the outcome of any play depends on factors such as friction, collision angles, the coefficient of restitution of the balls (ie. their bounciness) and a whole host of other factors. But despite the complexity of the factors involved, we know that they are invariant under rotation. What I mean is that if we were to rotate the table and everything on it by some random angle (eg. from a North-South orientation to a NW-SE orientation) it makes no difference to the game. (Well, maybe the floor has a slight tilt, so let's rotate the room as well if that's the case.) If everything is rotated together then you end up with a situation that's essentially identical. One of the fundamental principles of physics is that the laws of physics don't care about the absolute orientation of anything. We called the set of rotations in a plane SO(2) so we can summarise this by saying that pool has SO(2) symmetry. But does it have SO(3) symmetry? Obviously rotating the table so that it tilts will change the way the game plays out. But that's because of an 'accident' of history, we just happen to be on the surface of the Earth and there's a gravitational field directed downwards. But from the point of view of fundamental physics there's nothing special about 'down'. If we were to tilt the pool table and then tilt the entire Earth underneath it, we should still expect to see no change to the physics of pool. In fact, we get to try this experiment out every day because the entire pool/Earth system rotates through (approximately) 360° every day and we don't notice any change in the way pool tables work. So pool actually has SO(3) symmetry, but because of stuff going on in our neighbourhood (ie. having a planet under our feet), it looks like it only has SO(2) symmetry. 
SO(3) is a fundamental symmetry of physics, but the fact that on Earth gravity messes this up and leaves us with SO(2) symmetry is a phenomenon known as symmetry breaking. More of this later, back to the mathematics.

Lie Group Representations


I've talked about how groups can act on spaces. For example SO(3) acts on 3D space and SO(2) acts on 2D spaces, ie. planes. But there's some flexibility here and we can decouple the group from the space it acts on. The way SO(2) acts on a plane is given by a certain rule. We can make up new rules and study those. For example, we could allow SO(2) to act on 3D space by this rule: rotate your x and y coordinates using the same rule as in the plane, and leave z unchanged. So if a certain 2D rotation (ie. a 90 degree rotation) maps the point (x,y) to (y,-x), then according to this new rule it maps (x,y,z) to (y,-x,z). We could make up other rules. For example we could apply SO(2) to y and z, and leave x untouched. In fact, we can make infinitely many rules like this because there are infinitely many axes (by axis I just mean direction) we could choose to leave untouched. We can define a representation to be a rule that takes an element of a group and turns it into a transformation on a space. We're interested mainly in what are called linear representations (I'll give some justification for this in part 3). These are representations where the transformations map the origin to the origin, map straight lines to straight lines, and map the midpoints between pairs of points to the midpoints between pairs of points. It's not hard to see that rotations do all three of these things. In fact, as I will only talk about linear representations, I'll drop the word linear from now on.

I just showed how SO(2) has multiple representations because of the different ways it can be applied to 3D space. But we can also define alternative rules for how to apply SO(2) to the same space, eg. the plane. Here's a really simple one called the trivial representation: we simply say that elements of SO(2) do nothing. It's not very interesting, but it is a perfectly valid representation. Here's another: apply elements of SO(2) backwards. So if we have an element of SO(2) that says "rotate by 10° clockwise" the backwards rule says "apply a 10 degree rotation anticlockwise". It's a different representation. But if you look at the plane from underneath, it actually looks just like the original representation. So even though these are different representations, there's a sense in which it's equivalent to the original one.

But here's a representation of SO(2) that really is different to the original one: If the element of SO(2) says "rotate by x°" we rotate by 2x° instead. We can think of it as a double-speed rule. We apply the rotations in SO(2) twice. If we run through the elements of SO(2) starting at 0° and working our way up to 360°, then the representation rotates our space twice as fast and ends up rotating the space twice.

What about a half-speed rule? Sounds like it might work. But it fails for a simple reason. A 360 degree rotation is the same as a zero degree rotation. But the half-speed rule says that the former should rotate by 180° and the latter should rotate by 0°. These are distinct rotations, so our rule doesn't make any sense. As a result we're restricted to rules that are n-speed rules, where n is an integer. It's not hard to see that if we choose n to be a negative integer then by looking at the plane from underneath we get the same rule as using -n, looking from above. So we can discard the ones corresponding to negative integers. And we're left with one rule for each non-negative integer. In fact, these are all the representations that are possible on a 2D plane.
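Both the n-speed rules and the half-speed failure are easy to check concretely. Here's a sketch in Haskell (the names are my own): represent the rotation by angle t as the 2x2 matrix rotating by n*t, and check that a full 360° turn is represented by the identity exactly when n is a whole number:

```haskell
-- The n-speed representation sends the SO(2) element "rotate by t"
-- (t in radians) to the 2x2 matrix rotating by n*t.
type Mat = ((Double, Double), (Double, Double))

rep :: Double -> Double -> Mat
rep n t = ((cos a, -sin a), (sin a, cos a)) where a = n * t

-- Compare matrices up to floating-point error.
close :: Mat -> Mat -> Bool
close ((a,b),(c,d)) ((a',b'),(c',d')) =
  all (< 1e-9) [abs (a-a'), abs (b-b'), abs (c-c'), abs (d-d')]

-- For integer n, a full turn is represented the same way as no turn...
wholeSpeedOk :: Bool
wholeSpeedOk = rep 2 (2*pi) `close` rep 2 0

-- ...but at half speed a full turn lands on a 180 degree rotation,
-- so "rotate by 360" and "rotate by 0" would act differently: no rule.
halfSpeedFails :: Bool
halfSpeedFails = not (rep 0.5 (2*pi) `close` rep 0.5 0)
```

Here `wholeSpeedOk` is True and `halfSpeedFails` is True, which is exactly the argument in the paragraph above.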

Weights


[Optional "advanced section": Now go back to 3D again. When we apply SO(2) to 3D space we find that SO(2) always leaves some axis fixed. So given a 3D representation of SO(2), our 3D space always splits into a 1D space that's left unchanged, and a 2D space. You can think of SO(2) as using the trivial rule on the 1D space, and using any of the rules in the previous paragraph on the 2D space. Something similar happens in any dimension of space. The representation of SO(2) will split up into pieces with some axes left untouched, and others, always grouped in pairs, that transform like in the previous paragraph. In other words, the representations of SO(2) can be broken down into fundamental building blocks which we call irreducible representations. In the previous paragraph I actually classified all of the irreducible representations of SO(2).

We can use the properties of SO(2) to understand other Lie groups. The idea is that not only is SO(2) frequently a subgroup of Lie groups, it's often a subgroup in multiple ways, ie. there are multiple ways to find a copy of SO(2) in SO(3). We've already seen that SO(2) can be embedded in SO(3) by interpreting rotations around any fixed axis as elements of a sub-SO(2). But notice how if we find two different embeddings of SO(2), they are forced to "interfere" with each other. For example, let's choose an SO(2) that acts on the x-y plane as described above. Suppose we now pick another SO(2) subgroup. No matter how we choose it, it must act on either x or y. There simply isn't room to find two SO(2)'s that don't at some point overlap with each other. But imagine the group of 4-dimensional rotations, called SO(4). (It's not that scary, it's just like SO(3) except that we can make rotations that "mix up" any pair of directions, or combinations of such rotations.) We could pick one SO(2) that acts on the x-y plane and another that acts on the z-t plane (assuming the fourth axis is called 't'). But we won't be able to pick more than two for the same reason as before. Suppose we pick as many SO(2)'s as are possible. Then what we have is what's known as a maximal torus.

Here's a feeble attempt at drawing all of this. One of the SO(2)'s inside SO(4) rotates one pair of axes into each other, the other one rotates the other pair:

Now think about representations of a Lie group. A representation is a rule that tells you how each element in the group transforms our space. As the elements of our SO(2) subgroups obviously live in the group the rule must apply to these also. So a representation on a group also gives a representation of all of the SO(2)'s in it. So each SO(2) making up our maximal torus must basically act like we described above: by rotating some pairs of axes around at some "speed" and leaving other axes untouched. So if we pick a maximal torus of a Lie group, any representation splits up into a bunch of pairs of axes and each pair has an integer "speed" associated to each SO(2) in the maximal torus. The tuple of "speeds" for each pair of axes is known as a weight. By interpreting the weights as the coordinates of points, the collection of weights that arise from any particular representation can be drawn in a diagram. The dimension of the diagram is the number of non-interfering SO(2)'s in the maximal torus. And if you want to see what these diagrams look like, Garrett Lisi has some nice ones in his paper. Note that that paper also contains some "root diagrams". I don't have time to talk about those except to say that (1) they are weight diagrams for one particular special representation and (2) they tell you a lot about the geometry of all possible sets of weights for a particular Lie group.

One last thing for the "advanced" section: a similar analysis can be carried out for Lie algebras as opposed to Lie groups. When physicists draw diagrams of weights they are often talking about the weights of Lie algebras, but these things are intimately related.

There's probably one important message to take from this section: there are a lot of constraints on representations, you can't just make up any old rule. So just knowing that a given Lie group acts on some space you already know a lot of information, even if you don't know what the exact representation is. BTW One of the biggest applications of representation theory is in chemistry where you can read off information about the number of electrons allowed in atomic orbitals directly from representations of SO(3).]

Final words


That was pretty tough. Next time I'll talk about physics and it should get a bit easier. I'll explain why much of modern physics is the study of Lie group representations and I'll explain the 'exceptional' and 'simple' in the title of Garrett's paper.

(And I apologise for all of my sins of omission. For example, the analysis above only works for certain types of Lie group and really everything should be done using complex numbers, not real numbers. But I'm trying to compress a few hundred pages of mathematics into a single posting.)

Saturday, November 17, 2007

What's all this E8 stuff about then? Part 1.

With all the buzz about E8 recently I want to give an introduction to it that's pitched slightly higher than pop science books, but that doesn't assume more than some basic knowledge of geometry and vectors and a vague idea of what calculus is. There's a lot of ground to cover and so I can't possibly tell the truth exactly as it is without some simplification. But I'm going to try to be guilty of sins of omission rather than telling outright lies.

Discrete symmetry groups.

Consider a perfect cube. If you turn it upside-down it still looks just like a cube in exactly the same configuration. Rotate it through 90 degrees about any axis going through the centre of a face and it looks the same. Rotate it through 120 degrees around a diagonal axis and it also looks the same. In fact there are 24 different rotations, including simply doing nothing, that you can perform on a cube, that leave it looking how you started. (Exercise: show it's 24.)



These 24 operations form the symmetries of the cube. Note how you can take any pair of symmetries and combine them to make another symmetry simply by applying one and then another. For example, if A represents rotation by 90° about the x-axis and B represents a rotation by 180° about the same axis then we can use shorthand to write AB to mean a rotation by 270°. The convention is that we read from right to left and so AB means "do B first, then A". If we define C=AB, think about what AC means. It's a 270°+90° rotation, ie. a 360° rotation which does nothing. The convention is to call that 1. So C is the opposite or inverse of A because AC=CA=1. When you have a bunch of operations that you can combine like this, and every operation has an inverse, then the operations form what mathematicians call a group. The 24 symmetries of the cube form a group of size 24. Although you can think of a group as a bunch of operations on something, you can also think of a group as a thing in its own right, independent of the thing it acts on. For example, we can write down some rules about A, B and C above. AC=CA=1, C=AB, AAB=1, BB=1, C1=1C=C and so on. In principle we can give a single-letter name to each of the 24 symmetries and write out a complete set of rules about how to combine them. At that point we don't need the cube any more, we can just consider our 24 letters, with their rules, to be an object of interest in its own right.

We can think of the group as acting on the cube, but if we specified some suitable rules, we can make this group act on other things too. For example, we can use the coordinates (x,y,z) to specify any point in space. We used B to mean a 180 degree rotation around the x-axis. If we apply that rotation to any point in space (x,y,z) it gets mapped to (x,-y,-z). Similarly A maps the point (x,y,z) to (x,z,-y). So the 24-element cube group also acts on 3D space. This is an example of a group representation, and I'll get onto that later.
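We can check rules like AAB=1 and BB=1 mechanically. Here's a small Haskell sketch (the encoding is my own) with A and B as functions on coordinate triples, verifying B = AA, BB = 1 and AAAA = 1 on a sample point:

```haskell
type P = (Int, Int, Int)

-- A: rotate 90 degrees about the x-axis; B: rotate 180 degrees about it.
a, b :: P -> P
a (x, y, z) = (x, z, -y)
b (x, y, z) = (x, -y, -z)

-- The group rules from the text, checked on a sample point:
-- B = AA, BB = 1, and AAAA = 1.
checks :: Bool
checks = and [ (a . a) p == b p
             , (b . b) p == p
             , (a . a . a . a) p == p ]
  where p = (1, 2, 3)
```

Function composition `.` plays the role of the shorthand AB: apply the right-hand symmetry first, then the left-hand one.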

(So far I've been talking about rotations we can perform on the cube. But imagine you could also reflect the cube - as if looking at the cube in a mirror and then pulling the reflection back out into the real world. You can't do this with a real cube but you could with an imaginary one. How many symmetries do you think you get now?)

SO(3) and Lie groups

With the cube example above, all of our rotations had to be multiples of certain angles like 90° and 120°. Any kind of rotation through an angle like 37° leaves the cube in a weird orientation that doesn't look like the original. So the symmetries of the cube form a discrete group because there's only a discrete set of possibilities. But there are also continuous groups where you can smoothly slide from one symmetry to another and everything in-between is also a symmetry. For example consider a perfect 3D sphere at the origin. You can rotate it through any angle and it still looks like the same sphere. Any rotation about the origin is a symmetry of the sphere, and given two symmetries you can slide from one to the other. At every stage, the sphere will still look like the original sphere. Like in the discrete case, this set of symmetries forms a group, because you can combine and invert operations. But there are now infinitely many operations. We can rotate around any axis by any angle we choose and it's still a sphere. But even though there is an infinity of such rotations, it's still manageable. In fact, any rotation can be described by three parameters, for example the three that aeronautical engineers might use called pitch, roll and yaw. Mathematicians call the group of 3D rotations SO(3). That 3 refers to the fact that we're talking about rotations of 3-dimensional space. But because it takes three parameters to describe a rotation, SO(3) is itself 3-dimensional. This double appearance of the number 3 is a 'coincidence', as we'll see in the next section.

There's one more thing I need to add about continuous groups. Some are also differentiable. You don't need to worry in too much detail about what this means but it ultimately says that we can do calculus with these groups. For example, we can ask questions not only about the orientation of a sphere, but also about the rate of change of its orientation. So SO(3) is a differentiable group. The usual name for these is Lie groups, after the mathematician Sophus Lie.

Subgroups and SO(2)

SO(3) is the group of 3D rotations. But all 24 of the symmetries of the cube we considered are also rotations. So the symmetry group of the cube is actually a subset of SO(3). And as it's a subset of a group that is itself a group, we call it a subgroup.

Now consider the set of all rotations we could perform on a 2D space. These also form a group. In fact, given an element of this group we can apply it to some text to rotate it in 2D. As above, use 1 to mean "no rotation", use A to mean "rotate 45° anticlockwise" and B to mean "rotate 45° clockwise". We can illustrate how these act on Hello:


If you've ever played with a drawing package, like the one I just used, this will all be very familiar. And it should also be obvious that 1=11111=AB=BA=AAAAAAAA=BBBBBBBB. (Note how I'm only talking about the orientation of the word. If we were taking into account the position of the word on the page it'd be more complex. But there would still be a group describing it.)

Although the screen the text Hello is on is flat and two-dimensional, it can be considered to be embedded in the 3D space around it. And all of the 2D rotations can be considered to be 3D rotations also, just 3D rotations that happen to be oriented along the axis pointing straight into the screen.



So 2D rotations can be considered to be particular 3D rotations, and SO(2) is a subgroup of SO(3).

Note that 2D rotations can be described by just a single number, the rotation angle. So SO(2) is 1-dimensional, even though it's defined through its effect on a 2D space. The two threes that appear in connection with SO(3) above really are a coincidence. In 4 dimensions we have SO(4), which is actually a 6-dimensional group. In 4D you need 6 numbers, like pitch, roll and yaw, to define your orientation. But I'm not asking you to try to visualise that now.
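The general pattern, which isn't spelled out in the post: a rotation in n dimensions is built from independent rotations in the coordinate planes, one for each unordered pair of axes, so SO(n) has dimension n(n-1)/2. Here's a trivial sketch (the name dimSO is mine, not standard notation):

```haskell
-- Dimension of SO(n): one independent plane of rotation for each
-- unordered pair of coordinate axes.
dimSO :: Integer -> Integer
dimSO n = n * (n - 1) `div` 2
```

So dimSO 2 is 1 (a single angle), dimSO 3 is 3 (pitch, roll and yaw) and dimSO 4 is 6, matching the counts above.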

Rates of change and Lie algebras.

Imagine a sphere tumbling around. At any moment its orientation in space can be described by an element of SO(3). So we might naturally be led to ask the question "what is its rate of change of orientation?". What would such an answer look like?

Let's review basic calculus first. Consider position instead of orientation. Suppose a vehicle is driving along a road. At time t it's at distance A and at time t+1s it's at distance B. Then its average velocity over the 1s is (B-A)/1s. But that's an average over a period. What's its velocity exactly at time t? We could compute its distance, say C, at time t+0.0001s, and compute (C-A)/0.0001s. But that's still an average over a short period of time. The tool for telling us about instantaneous rates of change is calculus, and here it tells us that if the motion is continuous we can take a limit. So if at time t+d, its position is given by D, then the velocity is the limit, as d tends to zero, of (D-A)/d.

Back to the tumbling sphere. We could say "at time t it has orientation A and time t+1s it has orientation B". But as with velocity, that tells you what happened between t and t+1s, not what it was doing at any instant. We haven't said what the rate of change is precisely at time t. Imagine we take some kind of limit of the above statement as we consider the change from time t to time t+d where d becomes arbitrarily small. One thing we can say is that if the motion is continuous then the start and finish orientations, say A and D, will become arbitrarily close. The difference between them will be an infinitesimally small rotation. You can describe an infinitesimally small rotation by an infinitesimally small vector: let the direction of the vector be the axis of rotation and its length be the angle of rotation. Suppose this vector is V. Then the rate of change of orientation is V/d. This is commonly known as angular velocity and it's a vector. Note that it's not an infinitesimally small vector because the small V and small d will usually cancel to give a nice finite size angular velocity, just as the ordinary velocity calculation above gives a finite answer. In fact, we can easily interpret the angular velocity vector. Its direction is the axis of rotation and its length is the rate of change of angle around that axis (in degrees per second for example).

At this point you might say "why not represent any angle by a vector, not just an infinitesimal one". For example, you might think v=(180°,0,0) could be used to mean a rotation through 180° around the x-axis. You can do that if you like, but this doesn't respect the additive properties of vectors. For example, it'd be natural for v+v=(360°,0,0) to represent "no rotation", because a 360° rotation is no rotation at all. But "no rotation" should be represented by (0,0,0). So this isn't a very convenient way to talk about rotations. But rates of change of orientation are different. If we define w=(180°/s,0,0) then w represents rotation around the x-axis at one revolution every two seconds. w+w=(360°/s,0,0) is also a perfectly good angular velocity, one that's twice as fast. Unlike angles, angular velocities don't wrap around when you get to 360°. You can crank angular velocities up as high as you like. So vectors are a good way to represent rates of change of orientation, even if they're not good at representing orientations.

Let's do this in 2D. It's much easier. All rotations are in the plane and so we don't need to specify an axis. We just need to specify a single number to define an orientation, and a rate of change of orientation is just given by a single number also. That one number can be thought of as a 1-dimensional vector.

So what kind of thing is this rate of change object? In each case it can be thought of as a vector (in the case of SO(2) it's a 1-dimensional vector), but it's not an element of the group because an angular velocity is not the same thing as an orientation. Mathematically it's called an element of a Lie algebra. The Lie algebra of a Lie group is the set of vectors that describe rates of change of elements of the group. I mentioned above that SO(4) is 6-dimensional. This means that the Lie algebra is made of 6-dimensional vectors. You can think of the 6 components of these vectors as describing how the 6 generalised pitch, roll and yaw parameters vary over time.

There's a convention that we use lower case names for Lie algebras. So the Lie algebra of SO(3) is called so(3). That makes a sort of intuitive sense, so(3) describes really small rotations and SO(3) describes full-sized ones.
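As an aside that goes slightly beyond what this post needs: in the angular-velocity picture, an element of so(3) is just a 3D vector, and the natural "product" on a Lie algebra, the Lie bracket, turns out to be the familiar cross product. A sketch (the name bracket is mine):

```haskell
-- An element of so(3) is an angular-velocity vector; the Lie bracket
-- of two such elements is their cross product.
bracket :: Num a => (a, a, a) -> (a, a, a) -> (a, a, a)
bracket (a1, a2, a3) (b1, b2, b3) =
  (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1)
```

Swapping the two arguments flips the sign of every component, which is a characteristic property of Lie brackets.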

Closing words

That brings me close to the end of Part 1. But I can now say one very important thing. There are a number of things that mathematicians call E8, but one of them is a Lie algebra. And that's the thing that Garrett Lisi is talking about.

PS I welcome feedback from anyone. I want to make this as understandable to as many people as possible.

Sunday, November 11, 2007

A Small Combinatorial Library

I thought it'd be fun to dig up my first ever Haskell project from about 2 years ago. I wrote a little about part of it before but I didn't write about what I did with it. Apologies for some of the formatting, I haven't yet got around to updating every line of this code. But it definitely still works. (As usual with old code, I had to fix it in places. How is it that old code ceases to work? Is there some kind of bit rot that sets in if you leave code unused for more than a couple of years?)

And as usual, this post is written in literate Haskell so you can just run it interactively in ghci.


> import Data.Ratio


The idea here was to define a little DSL in Haskell for counting the size of a variety of combinatorial objects. Rather than start with the theory, I'm going to look at some examples in action first.

The idea is that I'll be considering different ways to structure the set {1,2,...,n} (with n=0 meaning the empty set). Let's start with the simplest example: the singleton set with one element. I'll denote this by the symbol z. Consider how many ways we can structure the set {1,2,...,n} as a one element set. It's pretty obvious that you can only structure this set as a one element set if it has one element. So if we use count to count the number of ways of doing things, we expect count 1 z = 1 and count n z = 0 for n equal to anything else.

In fact, let's define


> sample a = map (flip count a) [0..10]
> ex0 :: Fractional a => [a]
> ex0 = z


I haven't told you what value z has, just how it responds to the count function. You can now try evaluating sample ex0.

Now consider the next function in our DSL, list. list a gives the number of ways to form a list of a's from {1,...,n}. A good example is simply


> ex1 :: Fractional a => [a]
> ex1 = list z


ie. the number of ways to form a list of n elements. For example, for n=3 we have [1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]. Obviously count n (list z) is n!.
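We can sanity-check that n! claim by brute force, independently of the library (the helper name is mine):

```haskell
import Data.List (permutations)

-- Count the lists that can be formed from {1,...,n}: one per permutation.
listsOf :: Int -> Int
listsOf n = length (permutations [1 .. n])
```

For n = 0 to 4 this gives 1, 1, 2, 6, 24, matching n!.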

Many of the examples I'm going to look at are listed in Sloane's famous encyclopedia of integer sequences. n! is listed as A000142.

Next consider pair a which is the number of ways of forming an ordered pair of a's. For example, the pairs of lists from {1,2,3} are ([],[1,2,3]),([1],[2,3]),([2],[1,3]),([3],[1,2]),([1,2],[3]) and so on. With a little work you should count 24 of those. This is given by


> ex2 :: Fractional a => [a]
> ex2 = pair (list z)


Conversely we can consider lists of pairs. Clearly this is zero for n odd. For n=4 we get terms like [(1,2),(3,4)], [(1,3),(2,4)] and so on. Again there are 24. The code is:


> ex3 :: Fractional a => [a]
> ex3 = list (pair z)


This DSL also supports constraints. For example we can redo the pair of lists example with the proviso that the lists are both non-empty:


> ex4 :: Fractional a => [a]
> ex4 = pair (nonEmpty (list z))


More generally the // operator can provide any numerical constraint. For example:


> ex5 :: Fractional a => [a]
> ex5 = pair ((list z) // (>=3))


Gives the ways of forming pairs of lists of length >=3. Clearly this is zero for n<6 and for n=6 we just get 6!

At this point I need to point out that these examples all run pretty quickly, so we're not simply generating and counting.

The next function is ring. A ring is like a list except that cyclic permutations are considered equal. So [1,2,3], [2,3,1] and [3,1,2] are the same ring. How many rings are there on {1,2,...,n}?


> ex6 :: Fractional a => [a]
> ex6 = ring z


If you evaluate sample ex6 you may recognise that it's simply (n-1)!.
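Again a brute-force check is easy: count permutations up to cyclic rotation (helper names are mine):

```haskell
import Data.List (nub, permutations)

-- All cyclic rotations of a list.
rotations :: [Int] -> [[Int]]
rotations xs = [drop k xs ++ take k xs | k <- [0 .. length xs - 1]]

-- A ring is a list up to rotation; pick the smallest rotation as a
-- canonical representative.
canonical :: [Int] -> [Int]
canonical = minimum . rotations

-- Count the distinct rings on {1,...,n}, for n >= 1.
ringsOf :: Int -> Int
ringsOf n = length (nub (map canonical (permutations [1 .. n])))
```

For n = 1, 2, 3, 4 this gives 1, 1, 2, 6, which is (n-1)!.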

The next useful function is set. set a counts the number of sets of a's you can form from {1,...,n}. Clearly count n (set z) is 1 as there's only one set you can form from {1,...,n}. So here's a nice example:


> ex7 :: Fractional a => [a]
> ex7 = set (nonEmpty (ring z))


That's counting the ways of forming {1,...,n} into sets of non-empty rings. If you apply sample you may spot that we have n! again. There's a really neat explanation why. Every permutation of {1,...,n} is a product of disjoint cycles. The order of the product doesn't matter so we have a set (rather than a list) of disjoint cycles. A cycle is obviously non-empty and cycles are just rings. So ex7 counts permutations. And the number of permutations on {1,...,n} is n!.
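The generating-function side of that neat explanation (using the dictionary set ↔ exp and ring ↔ -log(1-z), given later in the post) is a one-liner:

```latex
\exp\bigl(-\log(1-z)\bigr) \;=\; \frac{1}{1-z} \;=\; \sum_{n\ge 0} n!\,\frac{z^n}{n!},
```

which is exactly the series whose nth count is n!.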

Here's an old classic (A000110):


> ex8 :: Fractional a => [a]
> ex8 = set (nonEmpty (set z))


This is the number of ways of partitioning {1,...,n} into sets of non-empty sets. For n=3 we get {{1,2,3}},{{1},{2,3}},{{2},{1,3}},{{3},{1,2}} and {{1},{2},{3}}. In other words, 5.
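These are the Bell numbers, and they satisfy a classical recurrence: element n+1 lives in some block, and if k of the remaining n elements avoid that block, those k can be partitioned freely. Here's a standalone sketch (not part of the library) using that recurrence:

```haskell
-- Bell numbers via B(n+1) = sum over k of C(n,k)*B(k): choose the k
-- elements that avoid the block containing element n+1, then partition
-- them freely.
bells :: [Integer]
bells = 1 : [sum [choose n k * bells !! fromIntegral k | k <- [0 .. n]] | n <- [0 ..]]
  where choose n k = product [n - k + 1 .. n] `div` product [1 .. k]
```

The first few values are 1, 1, 2, 5, 15, 52, agreeing with the n=3 count of 5 above.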

Another useful function is oneOf. oneOf a b means forming {1,...,n} into either an a or a b.

For example we can count the ways to split a set into a set of singletons and pairs. For n=3 we have {1,(2,3)},{2,(1,3)},{3,(1,2)}, the previous three with the pair flipped and {1,2,3}, in other words count 3 ex9 is 7:


> ex9 :: Fractional a => [a]
> ex9 = set (oneOf z (pair z))


By the way, that's Sloane's A047974.

Now some more examples just for fun:

Partitions into lists of size >=2. (Sloane's A052845):


> ex10 :: Fractional a => [a]
> ex10 = set ((list z) // (>=2))


Partitions into lists of cycles (Sloane's A007840):


> ex11 :: Fractional a => [a]
> ex11 = list (nonEmpty (ring z))


Partitions into an even number of non-empty sets (Sloane's A020557):


> ex12 :: Fractional a => [a]
> ex12 = set (nonEmpty (set z)) // even


Partitions into sets of non-empty sets of non-empty sets (Sloane's A000258):


> ex13 :: Fractional a => [a]
> ex13 = set (nonEmpty (set (nonEmpty (set z))))


Thirteen examples is enough for anyone. So how does it work? I haven't time to give a crash course on generating functions but I can at least sketch the idea. In particular, exponential generating functions. The idea is this: if I have a sequence a0,a1,a2,... then its exponential generating function is:

a₀ + a₁z + a₂z²/2! + a₃z³/3! + ...

So we can represent any sequence of integers as a power series. For some sequences this series doesn't converge. But that's not a problem. We can still formally manipulate the series as long as we don't try to substitute an actual real number for z. So we call these formal power series. The amazing fact is this: the sorts of operations we might perform on power series when performing ordinary arithmetic (adding them, multiplying them, finding their logarithms and so on) turn out to be exactly the kind of operations you would perform on them to solve combinatorial problems. So many combinatorial problems can be reduced to finding power series for various arithmetical combinations of other power series.
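For example (this identity isn't derived in the post, but it is the standard fact behind the union operator defined below): if f and g are the EGFs of structures F and G, with counts aₙ and bₙ, then

```latex
n!\,[z^n]\,f(z)\,g(z) \;=\; \sum_{k=0}^{n} \binom{n}{k}\, a_k\, b_{n-k},
```

i.e. choose which k of the n elements carry the F-structure and put a G-structure on the rest. So multiplying EGFs counts the ways of splitting a set into two labelled halves.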

So now we need some code to manipulate power series. This is such a popular Haskell example that I'll breeze through the code and give some links to other people's documentation. I represent power series simply as a list of coefficients.

When computing with formal power series we frequently require products of the form (a₀+a₁t+a₂t²+...)*0. We know the result is zero no matter what the aᵢ are, and by immediately returning zero we can eliminate bottomless recursions. We use the operator *! for this purpose; !* is its symmetric counterpart.


> (*!) _ 0 = 0
> (*!) a b = a*b

> (!*) 0 _ = 0
> (!*) a b = a*b


We represent formal power series over an algebraic structure a as an infinite list of values of type a.

Addition and subtraction are straightforward pointwise applications of addition and subtraction in R.


> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b


Multiplication itself is achieved through the operation of convolution:


> ~(a:as) `convolve` (b:bs) = (a *! b) :
>     ((map (a !*) bs) ^+ (as `convolve` (b:bs)))
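
To see convolution in action on a tiny example, here's a standalone fragment that restates the operators just defined and then multiplies (1+z) by itself:

```haskell
-- Truncation-friendly products, as in the post.
(*!) :: (Eq a, Num a) => a -> a -> a
(*!) _ 0 = 0
(*!) a b = a * b

(!*) :: (Eq a, Num a) => a -> a -> a
(!*) 0 _ = 0
(!*) a b = a * b

-- Pointwise sum of series.
(^+) :: Num a => [a] -> [a] -> [a]
(^+) = zipWith (+)

-- Product of series by convolution of coefficients.
convolve :: (Eq a, Num a) => [a] -> [a] -> [a]
~(a:as) `convolve` bbs@(b:bs) =
  (a *! b) : ((map (a !*) bs) ^+ (as `convolve` bbs))
```

The first three coefficients of (1:1:repeat 0) convolved with itself come out as 1, 2, 1, i.e. (1+z)² = 1 + 2z + z².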


If f(z) and g(z) are two power series, this computes f(g(z)):


> compose (f:fs) (0:gs) = f:(gs `convolve` (compose fs (0:gs)))


Note how that function fails if the series for g starts with anything other than zero. A few of my operations will be like that. They correspond to situations where the operation doesn't make sense on a power series (at least not without some additional structure or assumptions).

The (functional) inverse of a function:


> inverse (0:f:fs) = x where
>     x = map (recip f *) (0:1:g)
>     _:_:g = map negate (compose (0:0:fs) x)


Reciprocal of power series:


> invert x = r where
>     r = map (/x0) ((1:repeat 0) ^- (r `convolve` (0:xs)))
>     x0:xs = x


Division of power series is defined in terms of invert though there is probably a direct definition that is as simple. Note the special case handling of power series that both start with zero:


> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a `convolve` (invert b)


Here comes the definition of z that we've been using:

z = [0,1,0,0,...] in other words the generator of formal power series.
one = [1,0,0,...], ie. the multiplicative unit.


> z :: Fractional a => [a]
> z = 0:1:repeat 0


d computes the derivative of formal power series using
d (t^n) = n*t^(n-1)


> d (_:x) = zipWith (*) (map fromInteger [1..]) x


Integration is simply the inverse of differentiation. We assume the integral has a zero leading term.


> integrate x = 0 : zipWith (/) x (map fromInteger [1..])


Compute the square of a formal power series:


> square x = x `convolve` x

> instance (Num r) => Num [r] where
>     x+y = zipWith (+) x y
>     x-y = zipWith (-) x y
>     ~x*y = x `convolve` y
>     fromInteger x = fromInteger x:repeat 0
>     negate x = map negate x
>     signum (x:_) = signum x:repeat 0
>     abs (x:xs) = error "Can't form abs of a power series"

> instance (Fractional r) => Fractional [r] where
>     x/y = x ^/ y
>     fromRational x = fromRational x:repeat 0

> sqrt' x = 1:rs where
>     rs = map (/2) (xs ^- (rs `convolve` (0:rs)))
>     _:xs = x

> instance (Fractional r) => Floating [r] where
>     sqrt (1:x) = sqrt' (1:x)
>     sqrt _ = error "Can only find sqrt when leading term is 1"


We apply transcendental functions to formal power series by finding suitable integro-differential representations of the function. For exp consider:
d/dt exp(f(t)) = f'(t)*exp(f(t))
We can immediately write
exp(f(t)) = integral f'(t)*exp(f(t))
This gives a recursive definition of the exponential of a power series.


>     exp x = e where e = 1+integrate (e * d x)
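
As a standalone check of this fixed-point trick (restating integrate so the fragment runs on its own), the coefficients of exp(z) should come out as 1/n!:

```haskell
import Data.Ratio ((%))

integrate :: Fractional a => [a] -> [a]
integrate x = 0 : zipWith (/) x (map fromInteger [1 ..])

-- e = exp(z) satisfies e = 1 + integral of e; laziness unravels this
-- self-reference one coefficient at a time.
e :: [Rational]
e = zipWith (+) (1 : repeat 0) (integrate e)
```

The first few coefficients are 1, 1, 1/2, 1/6, as expected.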


d/dt log(f(t)) = f'(t)/f(t) which gives the following definition:


>     log x = integrate (d x/x)


We give mutually recursive definitions for sine and cosine:


>     sin x = integrate ((cos x)*(d x))
>     cos x = [1] ... negate (integrate ((sin x)*(d x)))


The inverse trigonometric functions are similarly straightforward except for the inverse cosine. The value of acos at zero is transcendental and so might not exist in our coefficient ring. On the other hand, there is no power series for acos(1+x). So we can give no formal power series for this function.


>     asin x = integrate (d x/sqrt(1-x*x))
>     atan x = integrate (d x/(1+x*x))
>     acos x = error "Unable to form power series for acos"


The hyperbolic trigonometric functions are similar to the ordinary trigonometric functions.


>     sinh x = integrate ((cosh x)*(d x))
>     cosh x = [1] ... integrate ((sinh x)*(d x))

>     asinh x = integrate (d x/sqrt(1+x*x))
>     atanh x = integrate (d x/(1-x*x))
>     acosh x = error "Unable to form power series for acosh"

>     pi = error "There is no formal power series for pi"


t is the generator of the formal power series. t' is the generator for formal power series over the rationals.


> t :: (Num a) => [a]
> t = 0:1:repeat 0

> t' :: [Rational]
> t' = t


The ... operator is used to specify a formal power series with known leading terms. If l is a list of n elements then the first n elements of l ... f are given by the elements of l. The remaining terms are those of f.


> lead [] x = x
> lead (a:as) x = a : (lead as (tail x))

> a ... x = lead a x


And now we can get back to combinatorics. The definitions of the functions I gave above are stunningly simple:


> one :: Num a => [a]
> one = 1
> list x = 1/(1-x)
> set x = exp x
> ring x = -log(1-x)
> pair x = x*x
> oneOf a b = a+b


How does that work? Just about any book on combinatorics will tell you. I quite like Generatingfunctionology though it does make a mountain out of a molehill in places.


> necklace x = -log(1-x)/2+x/2+x*x/4


union a b is the number of ways of splitting the set {1,...,n} into two so that one half is a combinatorial structure given by a and the other half is given by b. Its definition is amazingly simple:


> union a b = a*b

> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\i x -> if c i then x else 0) [0..] a
> nonEmpty a = a // (/= 0)

> factorial 0 = 1
> factorial n = n*factorial (n-1)

> count n a = numerator ((a!!(fromInteger n)) * (factorial (fromInteger n)))


Now for some more examples. A tree is, by definition, a single parent node and a set of child trees. So we have the recursive definition:


> tree x = p where p = [0] ... union (set p) x


The [0] ... is to get the recursion started. Try sample (tree z) to count the number of trees.

The last example is neat. graph is the exponential generating function of the number of ways of making a graph from {1,...,n}. (A006125) It's not too hard to see what's going on: a graph on a set S is given by a set of edges. In other words it's a subset of the set of unordered pairs of elements of S. If S has size n, there are n(n-1)/2 such pairs.


> graph = [2^((n*(n-1) `div` 2)) / product (map fromInteger [1..n]) | n <- [0..]] :: [Rational]
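
As a standalone check of that closed form (the name is mine), here are the first few values of A006125:

```haskell
-- Number of labelled graphs on n vertices: any subset of the
-- n(n-1)/2 possible edges may be present.
labelledGraphs :: Integer -> Integer
labelledGraphs n = 2 ^ (n * (n - 1) `div` 2)
```

For n = 0 to 4 this gives 1, 1, 2, 8, 64.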


exp has a really amazing property. If f is the exponential power series corresponding to counting some structure, say F, on {1,...,n} then exp(f) is the power series of the structure consisting of the union of a bunch of non-zero sized F's. We can put this into practice: all graphs can be decomposed as the union of connected graphs. So the generating function of connected graphs is given by


> connectedGraph = 1 + log graph


(The 1+ is because of the non-zero I just mentioned.)

That was Sloane's A001187.

By the way, this plays a big role in quantum field theory where we have to enumerate Feynman diagrams.

Anyway, if you didn't previously know about generating functions I hope this whets your appetite to read up on them some more. They are an almost magical part of mathematics that makes the solution of some very difficult seeming combinatorial problems almost trivial. They also have countless real world applications ranging from statistics to digital signal processing.

Saturday, November 03, 2007

A Blobby Language

As I don't want to run the risk of drawing down the ire of George Lucas's lawyers I work hard not to post anything in my blog that has anything to do with my work, that could be useful for work, or even anything that could have any useful application whatsoever. But earlier this year I presented a little bit of work publicly and the presentation is available online, so there can be no harm in linking to it.



First some background: Renderman is a "production quality" 3D renderer developed by Pixar and used by many visual effects houses, including my employer, ILM. One of its neatest features is that lighting and shading computations (ie. the calculation of the colour of every pixel once the geometry that you're looking at has been determined) are performed in a shading language called the Renderman Shading Language. It's a C-like DSL that's compiled to a bytecode that runs on a SIMD virtual machine.

Buried within the Renderman documentation is another less well known virtual machine that is used to compute "blobbies". A blobby is simply an isosurface, ie. the set of points that form the solution to f(x,y,z)=c for a function f. The virtual machine is the one on which the function f is computed. Unlike shading, however, there are bytecodes for this VM, but no programming language. So this presentation was about my solution to the problem: embedding a tiny DSL in Python that compiles down to bytecodes. You write ordinary looking Python code but arithmetic operators and other functions are overloaded to build an AST that is converted to an "opcode" stream that is sent to Renderman. From a computer science standpoint it really is basic stuff, but as the author of the blobby code (Tom Duff of Duff's Device in fact) said to me, he's been waiting for someone to write a compiler for his opcodes for years and mine was apparently the first one. So here's a link to the presentation. There isn't really much detail in the presentation, but it might give at least a tiny flavour of what I do in my day job.

Saturday, October 20, 2007

Using Thermonuclear Pennies to Embed Complex Numbers as Types

Forget about types for now.

The Game of Thermonuclear Pennies


The game of Thermonuclear Pennies is a lot like the game of Nuclear Pennies. It's played on a semi-infinite strip of cells, extending to infinity on the right, with a bunch of pennies in each cell (and a finite number in total). Instead of a single penny fissioning into two pennies, it now splits into three adjacent pennies. And conversely, three neighbouring pennies may be fused into one.

Here are examples of legal moves:

and


Again the puzzles consist of trying to get from a start position to a target position. Here's a nice example:



Just as with Nuclear Pennies we can assign numerical values to positions in this game in such a way that if there is a legal sequence of moves from A to B then the value of A equals the value of B. In this case we assign the values very slightly differently. Each cell is assigned a value as follows:

Where i is the usual square root of -1. The value of a position is simply the sum of the values of the pennies where each penny takes on the value of the cell it sits in. So

As before, we say that moving from A to B is paralegal if A and B have the same value. Because i satisfies i = 1+i+i² it should be clear that legal implies paralegal again. But here's a surprising fact: if both A and B have pennies not on the leftmost cell, then a move from A to B is legal if it is paralegal. In other words, we can tell if a sequence of moves is possible just by looking at the numerical value of the start and end points. What's more, the corresponding result also holds for Nuclear Pennies and a wide variety of related games besides. Before proving that I want to talk about the algebraic structure of these types of games. (And I just figured out how to procedurally generate diagrams with Omnigraffle using Applescript so it's an excuse to draw lots of diagrams.)

(BTW If we're allowed to have a negative number of pennies in a cell then you can simply treat a position in these games as polynomials with integer coefficients. You can then use standard theorems about polynomials to prove the result in a straightforward way. But those theorems rely on subtraction, and without negative numbers those methods fail.)

Firstly, we can add positions in Thermonuclear pennies (which I'll now call TNP). Simply add the numbers of pennies in each cell:

We can also multiply positions. We do this by making a 'multiplication table' from the original positions and then summing along the lower-left to upper-right diagonals. I hope this example makes it unambiguous:


Exercises. Convince yourself that A+B=B+A, A*(B+C) = A*B+A*C, (A+B)*C = A*C+B*C, A*(B*C) = (A*B)*C.

If you did the exercises, you've now shown that TNP positions form a commutative semiring, or rig, with the empty board serving as 0.

Now we're ready to use a proof from Objects of Categories as Complex Numbers by Fiore and Leinster. If we define

then every position is a polynomial, with non-negative integer coefficients, in x. We can also interpret the equation x = 1+x+x² as saying that fission and fusion are legal moves. More generally, we consider two positions equivalent if there is a sequence of legal moves going from one to the other where each move maps f(x)+x to f(x)+1+x² or the converse. If we define p1(x)=x and p2(x)=1+x² then equation (3) in that paper defines exactly what we mean by a sequence of legal moves. (BTW For those wondering about the order in which I wrote this, I read that definition after inventing the game :-) So now we can apply Theorem 5.1 to find

Corollary


Let q1(x) and q2(x) represent TNP positions with at least one penny somewhere other than the far left. Then if x²+1=0 ⇒ q1(x)=q2(x) ring-theoretically, there is a legal sequence of moves from q1(x) to q2(x).

"x²+1=0 ⇒ q1(x)=q2(x)" is just another way of saying q1(i)=q2(i). So we have a simple way to tell whether there is a legal way of getting from one position to another. The puzzle example I gave above is soluble because i⁵ = i.
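A small Gaussian-integer type (entirely my own illustration, not from the post) makes these checks mechanical:

```haskell
-- Gaussian integers a + b*i.
data Gaussian = G Integer Integer deriving (Eq, Show)

instance Num Gaussian where
  G a b + G c d = G (a + c) (b + d)
  G a b * G c d = G (a*c - b*d) (a*d + b*c)
  negate (G a b) = G (negate a) (negate b)
  fromInteger n  = G n 0
  abs    = error "not needed here"
  signum = error "not needed here"

i :: Gaussian
i = G 0 1
```

Both the fission rule and the puzzle check hold: 1 + i + i*i equals i, and i^5 equals i.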

Actually, the corollary isn't too hard to prove without the theorem. Here's a hint for how to do it. If we allow negative numbers of pennies the puzzle is fairly easy to solve. But we don't need negative pennies because if there is at least one penny, we can saturate as many positions as we like with as many pennies as we like simply by madly fissioning pennies all over the place in a big chain reaction. So we start by doing the chain reaction to borrow lots of pennies, then carrying out the solution using negative numbers (which won't actually ever go negative if our chain reaction was big enough) and then reversing the chain reaction to pay back what we borrowed. (It's a bit like real life. In a financial market without negative numbers there are many transactions that can't be performed. But as soon as we allow borrowing we open up many more possibilities.)

Embedding Complex Numbers as Types


So back to types. People have frequently found the need to embed the natural numbers as types. A popular scheme is something (in Haskell) like

> data Zero
> data S a = S a
> type One = S Zero
> type Two = S One
> type Three = S Two

and so on. Then we can go on to define addition and multiplication. But types already have a natural addition and multiplication: the type constructors Either and (,). The problem is that, for example, Either One Three isn't the same type as (Two,Two). We could relax things a bit and allow isomorphism instead of equality. But even then, these types aren't isomorphic. Instead we could define:

> data Zero
> data Unit = Unit
> type S a = Either Unit a
> type One = S Zero
> type Two = S One
> type Three = S Two

Now we can use Either and (,) as addition and multiplication.

But if you're content to live with isomorphism then maybe we could embed other types. Consider the type

> data Tree = Leaf | Trunk Tree | Fork Tree Tree

It's easy to write an isomorphism Tree -> Either One (Either Tree (Tree,Tree)). In other words, up to isomorphism we have Tree = 1+Tree+Tree². If you remember my earlier post it should be clear that legal sequences of moves in TNP give rise to isomorphisms of types constructed from Tree. In other words, theorems about TNP apply to Tree. Therefore given two polynomials, p1 and p2, p1(Tree)=p2(Tree) if and only if p1(i)=p2(i), as long as the pi contain non-constant terms. Looked at another way, given any Gaussian integer, a+bi, we can embed this as a type in such a way that the embedding respects Either and (,). In fact the type

a·Tree + b·Tree² + c·Tree³ + d·Tree⁴

embeds (d-b)+(a-c)i. For example, abusing Haskell notation,

> type Zero = Tree + Tree³

really does act like zero in that (Zero,p(Tree)) has an isomorphism with Zero for any non-constant polynomial p.

As far as I can see, this fact is completely and utterly useless...

NB When I say isomorphism above I mean "particularly nice isomorphism", which in this case means an isomorphism that takes time O(1). Otherwise all countable tree structures would trivially be isomorphic.

Saturday, October 06, 2007

How can it possibly be that we can exhaustively search an infinite space in a finite time?

There has been a bit of talk recently about how it is possible to exhaustively search certain infinite spaces in a finite time. Unfortunately the explanations seem to be couched in topological language making them a little inaccessible. So I thought it'd be good to attempt a rigorous, but elementary, proof so that it is possible to see exactly where the 'magic' happens.

Consider the type of infinite streams of booleans:

type Natural = Integer
type Stream = Natural -> Bool

Can we write a function search, that takes as argument a boolean predicate on Stream, and tells us whether or not there is a stream that satisfies it?

search :: (Stream -> Bool) -> Bool

This sounds like an impossible task. There are uncountably many infinite streams of booleans, so how can we hope to search them all? There's one important thing in our favour: we're restricting ourselves to predicates that are computable and total. In other words, whatever stream we pass to the predicate, they have to return a boolean in a finite time.

First, let's deal with an objection that makes search seem impossible to implement. Given a 4-tuple of integers, we can encode these integers in a stream. For example, the integers (written in binary) (11,10,1,111) could be encoded as

11110011100011001111110100000000000000000...

where we code the binary digit b as 1b, we separate the integers using 00 and terminate the sequence using 01. We could then write a predicate that checks to see if the 4-tuple gives a solution to xn+yn=zn with x,y,z>0 and n>=2. Plugging that into search we apparently have a way to decide the Fermat-Wiles theorem without doing any difficult number theory. Surely, therefore, search can't exist? But the predicate we've just described isn't total. Suppose we feed it the string 111111... as input. It's going to think that these represent an x with leading digits 111... and it's going to sit there collecting digits forever. It's never going to terminate. So the requirement that the predicate be total is such a strong condition that we can't actually implement the Fermat predicate, even though it uses nothing more than simple arithmetic. So maybe search is beginning to look a little more plausible.
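For concreteness, here's the encoding spelled out (helper names mine); it reproduces the string above exactly:

```haskell
import Data.List (intercalate)

-- Each binary digit d of an integer is coded as the pair "1d".
encodeInt :: String -> String
encodeInt = concatMap (\d -> ['1', d])

-- Integers are separated by "00" and the tuple is terminated by "01";
-- a full stream would then continue with zeros forever.
encodeTuple :: [String] -> String
encodeTuple xs = intercalate "00" (map encodeInt xs) ++ "01"
```

encodeTuple ["11","10","1","111"] yields "111100111000110011111101", the prefix of the stream above.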

Here's the key thing: any predicate on boolean streams must terminate after a finite time, meaning it only reads a finite number of booleans. So given any predicate p, there is a (mathematical) function d_p on the boolean streams with the property that d_p(s) is the index of the last bit p looks at when given s as input. d_p(s) is obviously always finite.

Now at this point you might immediately jump to the opposite conclusion: that of course search is possible, because you only need to check a finite number of bits. But even though d_p(s) is always finite, it might be unbounded as s varies over the infinite set of streams. So the finiteness of d_p on its own is not enough to bound our search. For example, consider predicates on 4-tuples of integers again. Each of these predicates reads only a finite number of bits, and yet we don't expect that a general-purpose terminating search algorithm can tell us whether or not Fermat's equation has a solution. So to prove termination in the boolean stream case we need to exploit some more structure.

Here's a first hint at what that structure might be. Suppose we try to make d_p unbounded by arranging that

d_p(00000...) = 1

d_p(10000...) = 2

d_p(01000...) = 3

d_p(11000...) = 4

and so on.

You might notice that we immediately have a contradiction. If d_p(00000...) = 1 then the predicate must stop reading bits after it has read an initial 0. So in fact d_p(01000...) is also 1, not 3. Lots of streams share prefixes in this way, so the question is: does the infinitude of the set of boolean streams win out, or does the amount of prefix sharing eventually tame the infinity?

Consider the following infinite binary tree:



Each boolean stream corresponds to a single downward path through this tree. Fix a predicate p. Suppose that this predicate, when fed the stream 011..., requires only 3 bits to be read, i.e. d_p(011...) = 3. Then we can cut off everything below that point in the tree, because the predicate never reads further than it. We've removed every infinite path passing through 011:



Every stream s has a finite cutoff after d_p(s) steps. And after we've hacked away all of the bits that our predicate never looks at, what we're left with is a finitely branching tree with no infinite paths. Now comes the magic: we invoke König's Lemma, which says that every tree with infinitely many vertices, each of finite degree, has at least one infinite simple path. Taking the contrapositive: a finitely branching tree with no infinite paths must be finite. In other words, for any predicate p we only need to search a finite tree.

The rest is just details, such as the precise method by which we determine when we've finished searching. In the case of Martín Escardó's paper the details are some of the most ingenious coding I've ever seen.
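For the curious, here's a minimal sketch of what such a search function can look like. This is a variant of Escardó's construction, not something new; streams are represented as functions from naturals to booleans, as above, and the definitions of Natural and Stream are repeated so the snippet stands alone.

```haskell
type Natural = Integer
type Stream = Natural -> Bool

-- Prepend a bit to a stream.
cons :: Bool -> Stream -> Stream
cons b s n = if n == 0 then b else s (n - 1)

-- find p lazily produces a stream satisfying p, provided any
-- stream at all satisfies p; otherwise it produces some stream.
find :: (Stream -> Bool) -> Stream
find p =
  if p (cons True (find (p . cons True)))
    then cons True  (find (p . cons True))
    else cons False (find (p . cons False))

-- search p tests the candidate that find produces. Totality of p
-- is exactly what makes this terminate: p only ever forces
-- finitely many bits of the lazily constructed candidate.
search :: (Stream -> Bool) -> Bool
search p = p (find p)
```

For example, `search (\s -> s 2 && not (s 3))` returns True, while `search (\s -> s 0 && not (s 0))` returns False.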

Update: I fixed the type of search above. Thanks to Cale for pointing this out. Unfortunately I deleted his comment by accident so I'm crediting him here.

Sunday, September 30, 2007

Arboreal Isomorphisms from Nuclear Pennies

The game of Nuclear Pennies is a game for one player played on a semi-infinite strip of sites, each of which may contain any non-negative number of pennies. The aim is to achieve a target configuration of pennies from a starting configuration by means of penny fusion and penny fission. In penny fission, a penny is split into a pair of pennies in the two neighbouring sites. In penny fusion, two pennies, separated by exactly one site, are fused into one penny in the intervening site. Obviously no fission or fusion can take place at the very end site, because there is no room for one of the fission products or fusion precursors. The following diagrams show some legal moves in Nuclear Pennies:

One using fission:



One using fusion:



Can you achieve this target configuration:



Starting from this configuration?



It's a surprisingly non-trivial task but I urge you to have a go. If you give up, here's a video solution:



So a natural question arises: which single-penny configurations can be achieved from that starting position? As every move in Nuclear Pennies is reversible, if you solved the puzzle you know we can shift a single penny 6 places left or right. (Though to move 6 places left we need one extra space on the left.) So, labelling the sites 0,1,2,..., we know that starting from site 7 we can place a penny on site 6n+1 for all non-negative n. But what about other positions?

We can borrow a technique from M. Reiss's seminal paper on peg solitaire: Beiträge zur Theorie des Solitär-Spiels, Crelle's Journal 54 (1857) 344-379. The idea is to associate an algebraic value with each position, chosen so that each legal move leaves the value unchanged.

Label the sites as follows:



We consider the value of each penny to be the value of its site, and to get the value of a position we add the values of the individual pennies. If we pick x to be one of the complex cube roots of -1 then we have x = x² + 1, because x³ + 1 = (1+x)(x² + 1 - x). We define a move to be paralegal if it doesn't change the value of a position. Given the identity x = x² + 1, it should be clear that penny fusion and fission are both paralegal. In other words

legal ⇒ paralegal

and

paraillegal ⇒ illegal


Now consider the values 1, x, x², x³ and so on. It's not hard to show that xⁿ takes the value 1 exactly when n is a multiple of 6; in fact, x is a primitive sixth root of unity. So this shows we can never shift a single coin by anything other than a multiple of 6 sites. As the video shows how to shift by 6 sites, we know we can shift a single coin by n sites if and only if n is a multiple of 6 (and there's one site available to the left of the leftmost penny). It's neat that we took an excursion through the field of complex numbers to prove something about a seemingly simple piece of discrete mathematics.
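As a quick numerical sanity check of the algebra above (just a check, not part of the game), we can pick the cube root of -1 with positive imaginary part, x = e^(iπ/3), and verify the identities with Data.Complex:

```haskell
import Data.Complex

-- The complex cube root of -1 with positive imaginary part.
x :: Complex Double
x = mkPolar 1 (pi / 3)

-- Approximate equality, since we're working in floating point.
(~=) :: Complex Double -> Complex Double -> Bool
a ~= b = magnitude (a - b) < 1e-9

checks :: [Bool]
checks = [ x ^ 3 ~= (-1)     -- x really is a cube root of -1
         , x ~= x * x + 1    -- the invariant behind fission and fusion
         , x ^ 6 ~= 1        -- x is a sixth root of unity
         ]
         -- and a primitive one: no smaller power gives 1
         ++ [ not (x ^ n ~= 1) | n <- [1 .. 5] ]
```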

But who cares about nuclear coin reactions? That's not my real motivation here. Believe it or not, each coin shuffle above corresponds to an isomorphism between certain types and solving the puzzle above actually demonstrates a really neat isomorphism between tuples of data structures.

Go back to the algebra above. We have that legal ⇒ paralegal because coin reactions correspond to legal manipulations of x: fissioning a coin, for example, corresponds to replacing x by x² + 1. But the converse isn't true. We can manipulate complex numbers in all kinds of interesting ways that don't correspond to coin reactions. For example, we can write -x², but we aren't allowed negative numbers of coins in a site. But there's another algebraic structure which corresponds exactly to coin reactions. To see it, it's time to start writing Haskell code:


> import Prelude hiding (Left,Right)
> import Control.Monad
> import Test.QuickCheck

> data T = Leaf | Fork T T deriving (Eq,Show)


T is a simple binary tree type. That declaration says that, in the algebra of types, T = T² + 1. Now, unlike with complex numbers, when you form a polynomial in the type T you can't have anything other than a non-negative integer as a coefficient. In other words, positions in the game of Nuclear Pennies correspond precisely to polynomials in T within the algebra of types.
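The equation T = T² + 1 is itself an isomorphism we can write down directly, as a warm-up for the code below. The names split and join are my own, and the declaration of T is repeated so the snippet stands alone:

```haskell
data T = Leaf | Fork T T deriving (Eq, Show)

-- T -> T^2 + 1: a tree is either a pair of trees (the T^2) or a leaf (the 1).
split :: T -> Either (T, T) ()
split Leaf       = Right ()
split (Fork a b) = Left (a, b)

-- And back again: T^2 + 1 -> T.
join :: Either (T, T) () -> T
join (Left (a, b)) = Fork a b
join (Right ())    = Leaf
```

Unlike the numerical check with complex numbers, this really is a bijection between the two types, with no wastage in either direction.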

For example, consider the first move in the video solution I gave:



It corresponds to

T⁷ -> T⁸ + T⁶

or in Haskell notation an isomorphism

(T,T,T,T,T,T,T) -> Either (T,T,T,T,T,T,T,T) (T,T,T,T,T,T)


So you know what I'm going to do next, I'm going to code up the entire solution as a Haskell program mapping seven trees to one and back again. You don't need to read all of the following, but I thought I'd put it here for completeness. So skip over the code to the end if you feel like it...

Firstly, Either is a bit tedious to write. So here's my own implementation that uses a slightly non-standard way to write a type constructor:


> data a :+ b = Left a | Right b


We know that the type algebra is commutative, so A+B=B+A. But that '=' sign is really an isomorphism, not equality, and we'll need that isomorphism explicitly:


> commute :: a :+ b -> b :+ a
> commute (Left a) = Right a
> commute (Right b) = Left b


Same goes for associativity:


> associate :: (a :+ b) :+ c -> a :+ (b :+ c)
> associate (Left (Left a)) = Left a
> associate (Left (Right b)) = Right (Left b)
> associate (Right c) = Right (Right c)


And associativity the other way. Here I'm starting a convention of using primes to represent the inverse of a function.


> associate' :: a :+ (b :+ c) -> (a :+ b) :+ c
> associate' (Left a) = Left (Left a)
> associate' (Right (Left b)) = Left (Right b)
> associate' (Right (Right c)) = Right c


For a fixed b, '+b' is a functor, so f :: a -> c induces a map a :+ b -> c :+ b. Unfortunately I don't know an easy way to make '+b', as opposed to 'b+', a functor in Haskell. So I'll have to make do with this function that lifts f to a :+ b:


> liftLeft :: (a -> c) -> a :+ b -> c :+ b
> liftLeft f (Left a) = Left (f a)
> liftLeft f (Right b) = Right b


I'm going to need all these.


> liftLeft2 = liftLeft . liftLeft
> liftLeft3 = liftLeft . liftLeft2
> liftLeft4 = liftLeft . liftLeft3
> liftLeft5 = liftLeft . liftLeft4


I'm going to represent tuples a little weirdly, though arguably this is much more natural than the usual way to write tuples. The type Tn corresponds to the power Tⁿ:


> type T0 = ()
> type T1 = (T,T0)
> type T2 = (T,T1)
> type T3 = (T,T2)
> type T4 = (T,T3)
> type T5 = (T,T4)
> type T6 = (T,T5)
> type T7 = (T,T6)
> type T8 = (T,T7)


And now we need functions to assemble and disassemble trees at the start of a tuple. These correspond to fusion and fission respectively in the game.


> assemble :: a :+ (T,(T,a)) -> (T,a)
> assemble (Left x) = (Leaf,x)
> assemble (Right (a,(b,x))) = (Fork a b,x)

> assemble' :: (T,a) -> a :+ (T,(T,a))
> assemble' (Leaf,x) = Left x
> assemble' (Fork a b,x) = Right (a,(b,x))


Here's the first step in the solution. That was easy.


> step1 :: T7 -> T6 :+ T8
> step1 = assemble'


It gets successively harder because we need to push the assemble' function down into the type additions:


> step2 :: T6 :+ T8 -> T5 :+ T7 :+ T8
> step2 = liftLeft assemble'

> step3 :: T5 :+ T7 :+ T8 -> T4 :+ T6 :+ T7 :+ T8
> step3 = liftLeft2 assemble'

> step4 :: T4 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8
> step4 = liftLeft3 assemble'

> step5 :: T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> step5 = liftLeft4 assemble'

> step6 :: T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> step6 = liftLeft5 assemble'


We're going to use a few functions whose sole purposes is to shuffle around type additions using commutativity and associativity.


> swap23 :: (a :+ b) :+ c -> (a :+ c) :+ b
> swap23 = commute . associate . liftLeft commute

> shuffle1 :: T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T4 :+ T3 :+ T5 :+ T6 :+ T7 :+ T8
> shuffle1 = liftLeft4 swap23

> shuffle2 :: (T1 :+ T4) :+ T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> shuffle2 = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23 . liftLeft4 commute


After the left-to-right sweep, here's the right-to-left sweep:


> step7 :: T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> step7 = liftLeft4 assemble

> step8 :: T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T5 :+ T7 :+ T8 :+ (T1 :+ T4)
> step8 = liftLeft3 assemble

> step9 :: T5 :+ T7 :+ T8 :+ (T1 :+ T4) -> T6 :+ T8 :+ (T1 :+ T4)
> step9 = liftLeft2 assemble

> step10 :: T6 :+ T8 :+ (T1 :+ T4) -> T7 :+ (T1 :+ T4)
> step10 = liftLeft assemble

> shuffle3 :: T7 :+ (T1 :+ T4) -> T1 :+ T4 :+ T7
> shuffle3 = commute


And now we have another left-to-right sweep followed by a right-to-left sweep:


> step11 :: T1 :+ T4 :+ T7 -> T2 :+ T0 :+ T4 :+ T7
> step11 = liftLeft2 (commute . assemble')

> step12 :: T2 :+ T0 :+ T4 :+ T7 -> T3 :+ T1 :+ T0 :+ T4 :+ T7
> step12 = liftLeft3 (commute . assemble')

> step13 :: T3 :+ T1 :+ T0 :+ T4 :+ T7 -> T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step13 = liftLeft4 (commute . assemble')

> step14 :: T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7-> T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step14 = liftLeft5 (commute . assemble')

> shuffle4 :: T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4
> shuffle4 = liftLeft5 commute . liftLeft4 swap23 . liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> shuffle5 :: T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 -> T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> shuffle5 = liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> step15 :: T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step15 = liftLeft5 (assemble . commute)

> step16 :: T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T5 :+ T3 :+ T2 :+ T1 :+ T0
> step16 = liftLeft4 (assemble . commute)

> step17 :: T5 :+ T3 :+ T2 :+ T1 :+ T0 -> T4 :+ T2 :+ T1 :+ T0
> step17 = liftLeft3 (assemble . commute)

> step18 :: T4 :+ T2 :+ T1 :+ T0 -> T3 :+ T1 :+ T0
> step18 = liftLeft2 (assemble . commute)

> step19 :: T3 :+ T1 :+ T0 -> T2 :+ T0
> step19 = liftLeft (assemble . commute)

> step20 :: T2 :+ T0 -> T1
> step20 = assemble . commute


And put it all together:


> iso :: T7 -> T1
> iso = step20 . step19 . step18 . step17 . step16 . step15 . shuffle5 . shuffle4 . step14 .
>       step13 . step12 . step11 . shuffle3 . step10 . step9 . step8 . step7 .
>       shuffle2 . shuffle1 . step6 . step5 . step4 . step3 . step2 . step1


At this point I might as well write the inverse. This was mostly derived from the forward function using vim macros:


> step20' :: T1 -> T2 :+ T0
> step20' = commute . assemble'

> step19' :: T2 :+ T0 -> T3 :+ T1 :+ T0
> step19' = liftLeft (commute . assemble')

> step18' :: T3 :+ T1 :+ T0 -> T4 :+ T2 :+ T1 :+ T0
> step18' = liftLeft2 (commute . assemble')

> step17' :: T4 :+ T2 :+ T1 :+ T0 -> T5 :+ T3 :+ T2 :+ T1 :+ T0
> step17' = liftLeft3 (commute . assemble')

> step16' :: T5 :+ T3 :+ T2 :+ T1 :+ T0 -> T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step16' = liftLeft4 (commute . assemble')

> step15' :: T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step15' = liftLeft5 (commute . assemble')

> shuffle5' :: T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4
> shuffle5' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23

> shuffle4' :: T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 -> T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> shuffle4' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23 . liftLeft4 swap23 . liftLeft5 commute

> step14' :: T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step14' = liftLeft5 (assemble . commute)

> step13' :: T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T3 :+ T1 :+ T0 :+ T4 :+ T7
> step13' = liftLeft4 (assemble .commute)

> step12' :: T3 :+ T1 :+ T0 :+ T4 :+ T7 -> T2 :+ T0 :+ T4 :+ T7
> step12' = liftLeft3 (assemble . commute)

> step11' :: T2 :+ T0 :+ T4 :+ T7 -> T1 :+ T4 :+ T7
> step11' = liftLeft2 (assemble . commute)

> shuffle3' :: T1 :+ T4 :+ T7 -> T7 :+ (T1 :+ T4)
> shuffle3' = commute

> step10' :: T7 :+ (T1 :+ T4) -> T6 :+ T8 :+ (T1 :+ T4)
> step10' = liftLeft assemble'

> step9' :: T6 :+ T8 :+ (T1 :+ T4) -> T5 :+ T7 :+ T8 :+ (T1 :+ T4)
> step9' = liftLeft2 assemble'

> step8' :: T5 :+ T7 :+ T8 :+ (T1 :+ T4) -> T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> step8' = liftLeft3 assemble'

> step7' :: T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> step7' = liftLeft4 assemble'

> shuffle2' :: T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> (T1 :+ T4) :+ T3 :+ T5 :+ T6 :+ T7 :+ T8
> shuffle2' = liftLeft4 commute . liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> shuffle1' :: T1 :+ T4 :+ T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> shuffle1' = liftLeft4 swap23

> step6' :: T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> step6' = liftLeft5 assemble

> step5' :: T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8
> step5' = liftLeft4 assemble

> step4' :: T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T4 :+ T6 :+ T7 :+ T8
> step4' = liftLeft3 assemble

> step3' :: T4 :+ T6 :+ T7 :+ T8 -> T5 :+ T7 :+ T8
> step3' = liftLeft2 assemble

> step2' :: T5 :+ T7 :+ T8 -> T6 :+ T8
> step2' = liftLeft assemble

> step1' :: T6 :+ T8 -> T7
> step1' = assemble

> iso' :: T1 -> T7
> iso' = step1' . step2' . step3' . step4' .
>        step5' . step6' . shuffle1' . shuffle2' . step7' .
>        step8' . step9' . step10' . shuffle3' . step11' .
>        step12' . step13' . step14' . shuffle4' . shuffle5' .
>        step15' . step16' . step17' . step18' . step19' . step20'


Enough already! Time to test the code. It's already pretty obvious that each of the steps above is invertible, but let's use QuickCheck anyway. And here's a small puzzle: why do I have return Leaf twice?


> instance Arbitrary T where
>     arbitrary = oneof [return Leaf,return Leaf,liftM2 Fork arbitrary arbitrary]

> main = do
>     quickCheck $ \x -> x==iso (iso' x)
>     quickCheck $ \x -> x==iso' (iso x)


All done.

So it's time to think about what exactly we have here. We have a way to pack 7-tuples of trees into a single tree exactly. It's easy to store two trees in one tree, say by making them the left and right branches of a new tree, but then we get some 'wastage': the isolated leaf doesn't correspond to any pair of trees. Here we have a perfect packing. Of course the set of trees and the set of 7-tuples of trees have the same cardinality, and it's not hard to find a bijection by enumerating both sets and laying them out side by side. But such a bijection could get arbitrarily complicated. The isomorphism above only looks at the tops of the trees and can be executed lazily. It's a particularly nice isomorphism. And from what I showed above, you can only pack (6n+1)-tuples into a single tree. Weird, eh?

And everything I say here is derived from the amazing paper Seven Trees in One by Andreas Blass. I've mentioned it a few times before but I've been meaning to implement the isomorphism explicitly for ages. Blass's paper also shows that you can, to some extent, argue that paralegal ⇒ legal, something that's far from obvious.
