Wednesday, November 28, 2007

The IO Monad for People who Simply Don't Care

The theoretical underpinnings of monads in Haskell can be tricky to grasp. If you simply want to read some input and write some output, and you don't want to waste time with nonsense like the "monad laws", you've come to the right place.

Getting Started

Many programming languages make a distinction between expressions and commands. For example, in BASIC you couldn't say LET A = PRINT 1 and in C you can't say x = if (a==1) { 2; } else { 3; }. Haskell is no different. Like other languages it makes the distinction, and like other languages it has its own slightly idiosyncratic notion of what the difference is. The IO monad is just a device to help make this distinction. And that's the last time I'll mention the word 'monad' for a while.

Here's an example of a Haskell function:

f x = 2*x+1

The right hand side is an expression. Expressions are made of subexpressions and the value of an expression is determined by the value of its subexpressions. There is no room for anything like a print command here because a print command doesn't return a value, it produces output as a side-effect. So in order to do I/O we need something different: the do keyword. It's easy to use: you just write do and then follow it by an indented list of commands. Here's a complete Haskell program:

main = do
print "Hello"
let x = 1+2+3+4+5
print "The answer is"
print (2*x)

Note the let. It's just like the old days of BASIC programming where you can use let to assign a name to a value. Note also that commands take arguments that can be expressions. So print (2*x) is a command, but 2*x is an expression. Again, little different from a language like Python.

But what about input? Haskell has a command called getLine that reads input from the user. So here's an interesting feature of Haskell: commands can return values. But a command that returns a value is different from an expression with a value. The difference means we have to write code like this:

main = do
print "Hello"
x <- getLine
print "The input was"
print x

We have to use <- instead of let ... = ... to get a returned value out of a command. It's a pretty simple rule, the only hassle is you have to remember what's a command and what's a function.

You can also write your own commands. Here's a simple example that returns a pair of lines entered by the user:

get2Lines = do
line1 <- getLine
line2 <- getLine
return (line1,line2)

main = do
(a,b) <- get2Lines
print ("You typed " ++ a ++ " then " ++ b ++ ".")

Note how we used <- instead of let to get the result of getLine. We also did the same to get the result of get2Lines. But notice one extra feature: we had to use return to return a value out of a do block.

So that's pretty much it. Here are the rules:

  1. To introduce a list of commands, use do.
  2. To return a value from a command, use return.
  3. To assign a name to an expression inside a do block use let.
  4. To assign a name to the result of a command, use <-.

So how do you know what's a command and what's an expression? If it has any chance of doing I/O it must be a command, otherwise it's probably an expression. But you can do silly things like:

f x = do
return [x,x]

main = do
x <- f 10
print x

so occasionally there will be commands that don't do any IO. But not the other way round.


Everything in Haskell has a type, even commands. In general, if a command returns a value of type a then the command itself is of type IO a. It just has an IO wrapped around the return type. For example, getLine is of type IO String and f above is of type a -> IO [a]. Apart from that extra IO it's like ordinary expression types. But note that if you have a bit of code like

main = do
print (getLine)

you get a type error. This is because you must use <- to extract the input string from the getLine command. Because getLine has type IO String instead of type String the compiler can catch these kinds of mistakes. Giving commands and expressions different types means we can't accidentally mix them up.

At this point you might have realised that if

f x = do
return x

main = do
x <- f 5
print x

is legal, then this might be legal too:

main = do
x <- return 1
print x

It is. return is simply a function of type a -> IO a that converts a value of type a to a command of type IO a.

You can also rewrite this

inputWithPrompt = do
print "Hello, enter a line"
x <- getLine
return x


inputWithPrompt = do
print "Hello, enter a line"

In the first version we extracted x from getLine and then converted this value back into another command with return. In the second version we took a shortcut and simply ended with getLine. So we probably ought to add one more rule

    5. The type of a sequence of commands is the type of the last line.

getLine already has the type of a command that returns a string so we can just make it the last line. But use the former version if you find this confusing, and then you only need four rules.

There's one last thing that confuses people. The type of an if statement is the type of its branches. So if you want an if statement inside a do block, it needs to be a command, and so its branches need to be commands also. So it's

main = do
if 2/=1
then do
print "2/=1"
print "Cool"
else do
print "2==1"
print "Not cool"


main = do
if 2/=1
print "2/=1"
print "Cool"
print "2==1"
print "Not cool"

Oh and a little tip. If the do block has just one command, you can drop the do. So this compiles:

main = do
if 2/=1
print "2/=1"
print "2==1"

Why make the command/expression distinction at all?

You don't care, or you wouldn't be reading this. But suppose you did. Then I'd say: Firstly, let's state clearly what this distinction is. If something is of type IO a then it's a command returning an value of type a. Otherwise it's an expression. That's the rule. So here's the important bit: following the rules above it's completely impossible to put an executed command inside an expression. As the only way to do I/O is with commands, that means you have no way to find out what Haskell is doing inside expressions. And that's the whole point! Haskell likes to keep its evaluation secret from you and doesn't want you outputting information that might give away what's going on. For example, if (print 1)+(print 2) were legal Haskell, and it printed 1, followed by 2, then you'd know that it evaluated the left argument to + first, and then the second. Haskell doesn't want to reveal this because it wants the freedom to rearrange your code internally without changing its behaviour. And this makes things easier for you too. When you see a+b you know its exactly the same as b+a, whereas in C++, say, you have to worry about whether a and b have side effects. If the type isn't IO a, then you can sleep at night in the confidence that there are no side effects.

(Do you see why you can't sneak a command into an expression? It's quite neat. To make an expression out of a command you'd need to do something like extract a return value out of a command. But to do that you need to use <- to extract the value of the command. But you can only use <- inside a do block. So the result has to be a command. Once you're in a command, there's no way out!)

It's all lies I tell you!

One last thing. Much of what I've said above is false. But what I hope I have done is describe a subset of Haskell in which you can start some I/O programming without having a clue what a monad is. Once you're successfully bootstrapped, then you'll be ready for the truth. No more of that "monads are too hard" defense.

PS You can think of the above as using taint as a way to sequester commands from other expressions.

Back to E8 this weekend...

Friday, November 23, 2007

What's all this E8 stuff about then? Part 2.

Let's recap: I introduced the idea of symmetry groups, sets of operations that leave something unchanged. A particular type of group is the Lie group, it's a group where everything is smooth enough that you can do calculus with them. I also talked about Lie algebras which you can think of as elements of Lie groups that are infinitesimally close to doing nothing, or as rates of change of elements in a Lie group.

What does symmetry have to do with physics?

As my ultimate goal is talk about physics, it's time for a paragraph on that subject, just so we don't lose sight of where we're going. So imagine a game of pool. Pool is a game of physics, the outcome of any play depends on factors such as friction, collision angles, the coefficient of restitution of the balls (ie. their bounciness) and a whole host of other factors. But despite the complexity of the factors involved, we know that they are invariant under rotation. What I mean is that if we were to rotate the table and everything on it by some random angle (eg. from a North-South orientation to a NW-SE orientation) it makes no difference to the game. (Well, maybe the floor has a slight tilt, so let's rotate the room as well if that's the case.) If everything is rotated together then you end up with a situation that's essentially identical. One of the fundamental principles of physics is that the laws of physics don't care about the absolute orientation of anything. We called the set of rotations in a plane SO(2) so we can summarise this by saying that pool has SO(2) symmetry. But does it have SO(3) symmetry? Obviously rotating the table so that it tilts will change the way the game plays out. But that's because of an 'accident' of history, we just happen to be on the surface of the Earth and there's a gravitational field directed downwards. But from the point of view of fundamental physics there's nothing special about 'down'. If we were to tilt the pool table and then tilt the entire Earth underneath it, we should still expect to see no change to the physics of pool. In fact, we get to try this experiment out every day because the entire pool/Earth system rotates through (approximately) 360° every day and we don't notice any change in the way pool tables work. So pool actually has SO(3) symmetry, but because of stuff going on in our neighbourhood (ie. a having a planet under our feet), it looks like it only has SO(2) symmetry. SO(3) is a fundamental symmetry of physics, but that fact that on Earth gravity messes this up and leaves us with SO(2) symmetry is a phenomenon known as symmetry breaking. More of this later, back to the mathematics.

Lie Group Representations

I've talked about how groups can act on spaces. For example SO(3) acts on 3D space and SO(2) acts on 2D spaces, ie. planes. But there's some flexibility here and we can decouple the group from the space it acts on. The way SO(2) acts on a plane is given by a certain rule. We can make up new rules and study those. For example, we could allow SO(2) to act on 3D space by this rule: rotate your x and y coordinates using the same rule as in the plane, and leave z unchanged. So if a certain 2D rotation (ie. a 90 degree rotation) maps the point (x,y) to (y,-x), then according to this new rule it maps (x,y,z) to (y,-x,z). We could make up other rules. For example we could apply SO(2) to y and z, and leave x untouched. In fact, we can make infinitely many rules like this because there are infinitely many axes (by axis I just mean direction) we could choose to leave untouched. We can define a representation to be a rule that takes an element of a group and turns it into a transformation on a space. We're interested mainly in what are called linear representations (I'll give some justification for this in part 3). These are representations where the transformations map the origin to the origin, map straight lines to straight lines, and map the midpoints between pairs of points to the midpoints between pairs of points. It's not hard to see that rotations do all three of these things. In fact, as I will only talk about linear representations, I'll drop the word linear from now on.

I just showed how SO(2) has multiple representations because of the different ways in can be applied to 3D space. But we can also define alternative rules for how to apply SO(2) to the same space, eg. the plane. Here's a really simple one called the trivial representation: we simply say that elements of SO(2) do nothing. It's not very interesting, but it is a perfectly valid representation. Here's another: apply elements of SO(2) backwards. So if we have an element of SO(2) that says "rotate by 10° clockwise" the backwards rule says "apply a 10 degree rotation anticlockwise". It's a different representation. But if you look at the plane from underneath, it actually looks just like the original representation. So even though these are different representations, there's a sense in which it's equivalent to the original one.

But here's a representation of SO(2) that really is different to the original one: If the element of SO(2) says "rotate by x°" we rotate by 2x° instead. We can think of it as a double-speed rule. We apply the rotations in SO(2) twice. If we run through the elements of SO(2) starting at 0° and working our way up to 360°, then the representation rotates our space twice as fast and ends up rotating the space twice.

What about a half-speed rule? Sounds like it might work. But it fails for a simple reason. A 360 degree rotation is the same as a zero degree rotation. But the half-speed rule says that the former should rotate by 180° and the latter should rotate by 0°. These are distinct rotations, so our rule doesn't make any sense. As a result we're restricted to rules that are n-speed rules, where n is an integer. It's not hard to see that if we choose n to be a negative integer then by looking at the plane from underneath we get the same rule as using -n, looking from above. So we can discard the ones corresponding to negative integers. And we're left with one rule for each non-negative integer. In fact, these are all the representations that are possible on a 2D plane.


[Optional "advanced section": Now go back to 3D again. When we apply SO(2) to 3D space we find that SO(2) always leaves some axis fixed. So given a 3D representation of SO(2), our 3D space always splits into a 1D space that's left unchanged, and a 2D space. You can think of SO(2) as using the trivial rule on the 1D space, and using any of the rules in the previous paragraph on the 2D space. Something similar happens in any dimension of space. The representation of SO(2) will split up into pieces with some axes left untouched, and others, always grouped in pairs, that transform like in the previous paragraph. In other words, the representations of SO(2) can be broken down into fundmental building blocks which we call irreducible representations. In the previous paragraph I actually classified all of the irreducible representations of SO(2).

We can use the properties of SO(2) to understand other Lie groups. The idea is that not only is SO(2) frequently a subgroup of Lie groups, it's often a subgroup in multiple ways, ie. there are multiple ways to find a copy of SO(2) in SO(3). We've already seen that SO(2) can be embedded in SO(3) by interpreting rotations around any fixed axis as elements of a sub-SO(2). But notice how if we find two different embeddings of SO(2), they are forced to "interfere" with each other. For example, let's choose an SO(2) that acts on the x-y plane as described above. Suppose we now pick another SO(2) subgroup. No matter how we choose it, it must act on either x or y. There simply isn't room to find two SO(2)'s that don't at some point overlap with each other. But imagine the group of 4-dimensional rotations, caled SO(4). (It's not that scary, it's just like SO(3) except that we can make rotations that "mix up" any pair of directions, or combinations of such rotations.) We could pick one SO(2) that acts on the x-y plane and another that acts on the z-t plane (assuming the fourth axis is called 't'). But we won't be able to pick more than two for the same reason as before. Suppose we pick as many SO(2)'s as are possible. Then what we have is whats known as a maximal torus.

Here's a feeble attempt at drawing all of this. One of the SO(2)'s inside SO(4) rotates one pair of axes into each other, the other one rotates the other pair:

Now think about representations of a Lie group. These are a rule that tell you how each element in the group transforms our space. As the elements of our SO(2) subspaces obviously live in the group the rule must apply to these also. So a representation on a group also gives a representation of all of the SO(2)'s in it. So each SO(2) making up our maximal torus must basicaly act like we described above: by rotating some pairs of axes around at some "speed" and leaving other axes untouched. So if we pick a maximal torus of a Lie group, any representation splits up into a bunch of pairs of axes and each pair has an integer "speed" associated to each SO(2) in the maximal torus. The tuple of "speeds" for each pair of axes is known as a weight. By interpreting the weights as the coordinates of points, the collection of weights that arise from any particular representation can be drawn in a diagram. The dimension of the diagram is the number of non-interfering SO(2)'s in the maximal torus. And if you want to see what these diagrams look like, Garrett Lisi has some nice ones in his paper. Note that that paper also contains some "root diagrams". I don't have time to talk about those except to say that (1) they are weight diagrams for one particular special representation and (2) they tell you a lot about the geometry of all possible sets of weights for a particular Lie group.

One last thing for the "advanced" section: a similar analysis can be carried out for Lie algebras as opposed to Lie groups. When physicists draw diagrams of weights they are often talking about the weights of Lie algebras, but these things are intimately related.

There's probably one important message to take from this section: there are a lot of constraints on representations, you can't just make up any old rule. So just knowing that a given Lie group acts on some space you already know a lot of information, even if you don't know what the exact representation is. BTW One of the biggest applications of representation theory is in chemistry where you can read off information about the number of electrons allowed in atomic orbitals directly from representations of SO(3).]

Final words

That was pretty tough. Next time I'll talk about physics and it should get a bit easier. I'll explain why much of modern physics is the study of Lie group representations and I'l explain the 'exceptional' and 'simple' in the title of Garrett's paper.

(And I apologise for all of my sins of omission. For example, the analysis above only works for certain types of Lie group and really everything should be done using complex numbers, not real numbers. But I'm trying to compress a few hundred pages of mathematics into a single posting.)

Saturday, November 17, 2007

What's all this E8 stuff about then? Part 1.

With all the buzz about E8 recently I want to give an introduction to it that's pitched slightly higher than pop science books, but that doesn't assume more than some basic knowledge of geometry and vectors and a vague idea of what calculus is. There's a lot of ground to cover and so I can't possibly tell the truth exactly as it is without some simplification. But I'm going to try to be guilty of sins of omission rather than telling outright lies.

Discrete symmetry groups.

Consider a perfect cube. If you turn it upside-down it still looks just like a cube in exactly the same configuration. Rotate it through 90 degrees about any axis going through the centre of a face and it looks the same. Rotate it through 120 degrees around a diagonal axis and it also looks the same. In fact there are 24 different rotations, including simply doing nothing, that you can perform on a cube, that leave it looking how you started. (Exercise: show it's 24.)

These 24 operations form the symmetries of the cube. Note how you can take any pair of symmetries and combine them to make another symmetry simply by applying one and then another. For example, if A represents rotation by 90° about the x-axis and B represents a rotation by 180° about the same axis then we can use shorthand to write AB to mean a rotation by 270°. The convention is that we read from right to left and so AB means "do B first, then A". If we define C=AB think about what AC means. It's a 270°+90° rotation, ie. a 360° rotation which does nothing. The convention is to call that 1. So C is the opposite or inverse of A because AC=CA=1. When you have a bunch of operations that you can combine like this, and every operation has an inverse, then the operations form what mathematicians call a group. The 24 symmetries of the cube form a group of size 24. Although you can think of a group as a bunch of operations on something, you can also think of a group as a thing in its own right, independent of the thing it acts on. For example, we can write down some rules about A, B and C above. AC=CA=1, C=AB, AAB=1, BB=1, C1=1C=C and so on. In principle we can give a single letter name name to each of the 24 symmetries and write out a complete set of rules about how to combine them. At that point we don't need the cube any more, we can just consider our 24 letters, with their rules, to be an object of interest in its own right.

We can think of the group as acting on the cube, but if we specified some suitable rules, we can make this group act on other things too. For example, we can use the coordinates (x,y,z) to specifiy any point in space. We used B to mean a 180 degree rotation around the x-axis. If we apply that rotation to any point in space (x,y,z) it gets mapped to (x,-y,-z). Similary A maps the point (x,y,z) to (x,z,-y). So the the 24 element cube group also acts on 3D space. This is an example of a group representation, and I'll get onto that later.

(So far I've been talking about rotations we can perform on the cube. But imagine you could also reflect the cube - as if looking at the cube in a mirror and then pulling the reflection back out into the real world. You can't do this with a real cube but you could with an imaginary one. How many symmetries do you think you get now?)

SO(3) and Lie groups

With the cube example above, all of our rotations had to be multiples of certain angles like 90° and 120°. Any kind of rotation through an angle like 37° leaves the cube in a weird orientation that doesn't look like the original. So the symmetries of the cube form a discrete group because there's only a discrete set of possibilities. But there are also continuous groups where you can smoothly slide from one symmetry to another and everything in-between is also a symmetry. For example consider a perfect 3D sphere at the origin. You can rotate it through any angle and it still looks like the same sphere. Any rotation about the origin is a symmetry of the sphere, and given two symmetries you can slide from one to the other. At every stage, the sphere will still look like the original sphere. Like in the discrete case, this set of symmetries forms a group, because you can combine and invert operations. But there are now infinitely many operations. We can rotate around any axis by any angle we choose and it's still a sphere. But even though there is an infinity of such rotations, it's still manageable. In fact, any rotation can be described by three parameters, for example the three that aeronautical engineers might use called pitch, roll and yaw. Mathematicians call the group of 3D rotations SO(3). That 3 refers to the fact that we're talking about rotations of 3-dimensional space. But because it takes three parameters to describe a rotation, SO(3) is itself 3-dimensional. This double appearance of the number 3 is a 'coincidence', as we'll see in the next section.

There's one more thing I need to add about continuous groups. Some are also differentiable. You don't need to worry in too much detail about what this means but it ultimately says that we can do calculus with these groups. For example, we can ask questions not only about the orientation of a sphere, but also about the rate of change of its orientation. So SO(3) is a differentiable group. The usual name for these is Lie groups, after the mathematician Sophus Lie.

Subgroups and SO(2)

SO(3) is the group of 3D rotations. But all of the 24 symmetries of the cube we considered are also rotations. So the cube is actually a subset of SO(3). And as it's a subset of a group that's also a group, we call it a subgroup.

Now consider the set of all rotations we could perform on a 2D space. These also form a group. In fact, given an element of this group we can apply it to some text to rotate it in 2D. As above, use 1 to mean "no rotation", use A to mean "rotate 45° anticlockwise" and B to mean "rotate 45° clockwise". We can illustrate how these act on Hello:

If you've ever playing with a drawing package, like the one I just used, this will all be very familiar. And it should also be obvious that 1=11111=AB=BA=AAAAAAAA=BBBBBBBB. (Note how I'm only talking about the orientation of the word. If we were taking into account the position of the word on the page it'd be more complex. But there would still be a group describing it.)

Although the screen the text Hello is on is flat and two-dimensional, it can be considered to be embedded in the 3D space around it. And all of the 2D rotations can be considered to be 3D rotations also, just 3D rotations that happen to be oriented along the axis pointing straight into the screen.

So 2D rotations can be considered to be particular 3D rotations and and SO(2) is a subgroup of SO(3).

Note that 2D rotations can be described by just a single number, the rotation angle. So SO(2) is 1-dimensional, even though it's defined through its effect on a 2D space. The two threes that appear in connection with SO(3) above really are a coincidence. In 4-dimensions we have SO(4) which is actually a 6-dimensional group. In 4D you need 6 numbers like pitch, roll and yaw to define your orientation. But I'm not asking you to try to visualise that now.

Rates of change and Lie algebras.

Imagine a sphere tumbling around. At any moment its orientation in space can be described by an element of SO(3). So we might naturally be led to ask the question "what is its rate of change of orientation?". What would such an answer look like?

Let's review basic calculus first. Consider position instead of orientation. Suppose a vehicle is driving along a road. At time t it's at distance A and at time t+1s it's at distance B. Then its average velocity over the 1s is (B-A)/1s. But that's an average over a period. What's its velocity exactly at time t? We could compute its distance, say C, at time t+0.0001s, and compute (C-A)/0.0001s. But that's still an average over a short period of time.The tool for telling us about instantaneous rates of change is calculus, and here it tells us that if the motion is continuous we can take a limit. So if at time t+d, its position is given by D, then the velocity is the limit, as d tends to zero, of (D-A)/d.

Back to the tumbling sphere. We could say "at time t it has orientation A and time t+1s it has orientation B". But as with velocity, that tells you what happened between t and t+1s, not what it was doing at any instant. We haven't said what the rate of change is precisely at time t. Imagine we take some kind of limit of the above statement as we consider the change from time t to time t+d where d becomes arbitrarily small. One thing we can say is that if the motion is continuous then the start and finish orientations, say A and D, will become arbitrarily close. The difference between them will be an infnitesimally small rotation. You can describe an infinitesimally small rotation by an infinitesimally small vector: let the direction of the vector be the axis of rotation and its length be the angle of rotation. Suppose this vector is V. Then the rate of change of orientation is V/d. This is commonly known as angular velocity and it's a vector. Note that it's not an infinitesimally small vector because the small V and small d will usually cancel to give a nice finite size angular velocity, just as the ordinary velocity calculation above gives a finite answer. In fact, we can easily interpret the angular velocity vector. Its direction is the axis of rotation and its length is the rate of change of angle around that axis (in degrees per second for example).

At this point you might say "why not represent any angle by a vector, not just an infinitesimal one". For example, you might think v=(180°,0,0) could be used to mean a rotation through 180° around the x-axis. You can do that if you like, but this doesn't resepct the additive properties of vectors. For example, it'd be useful if v+v=(360°,0,0) then represented "no rotation" because a 360° rotation is no rotation at all. But "no rotation" should be represented by (0,0,0). So this isn't a very convenient way to talk about rotations. But rates of change of orientation are different. If we define w=(180°/s,0,0) then w represents rotation around the x-axis at one revolution every two seconds. w+w=(360°/s,0,0) is also a perfectly good angular velocity, one that's twice as fast. Unlike angles, angular velocities don't wrap around when you get to 360°. You can crank angular velocities up as high as you like. So vectors are a good way to represent rates of change of orientation, even if they're not good at representing orientations.

Let's do this in 2D. It's much easier. All rotations are in the plane and so we don't need to specify an axis. We just need to specify a single number to define an orientation, and a rate of change of orientation is just given by a single number also. That one number can be thought of as a 1-dimensional vector.

So what kind of thing is this rate of change object? In each case it can be thought of as a vector (in the case of SO(2) it's a 1-dimensional vector), but it's not an element of the group because an angular velocity is not the same thing as an orientation. Mathematically it's called an element of a Lie algebra. The Lie algebra of a Lie group is the set of vectors that describe rates of change of elements of the group. I mentioned above that SO(4) is 6-dimensional. This means that the Lie algebra is made of 6-dimensional vectors. You can think of the 6 components of these vectors as describing how the 6 generalised pitch, roll and yaw parameters vary over time.

There's a convention that we use lower case names for Lie algebras. So the Lie algebra of SO(3) is called so(3). That makes a sort of intuitive sense, so(3) describes really small rotations and SO(3) describes full-sized ones.

Closing words

That brings me close to the end of Part 1. But I can now say one very important thing. There are a number of things that mathematicians call E8, but one of them is a Lie algebra. And that's the thing that Garrett Lisi is talking about.

PS I welcome feedback from anyone. I want to make this as understandable to as many people as possible.

Sunday, November 11, 2007

A Small Combinatorial Library

I thought it'd be fun to dig up my first ever Haskell project from about 2 years ago. I wrote a little about part of it before but I didn't write about what I did with it. Apologies for some of the formatting, I haven't yet got around to updating every line of this code. But it definitely still works. (As usual with old code, I had to fix it in places. How is it that old code ceases to work? Is there some kind of bit rot that sets in if you leave code unused for more than a couple of years?)

And as usual, this post is written in literate Haskell so you can just run it interacively in ghci.

> import Ratio

The idea here was to define a little DSL in Haskell for counting the size of a variety of combinatorial objects. Rather than start with the theory, I'm going to look at some examples in action first.

The idea is that I'll be considering different ways to structure the set {1,2,...,n} (with n=0 meaning the empty set). Let's start with the simplest example: the singleton set with one element. I'll denote this by the symbol z. Consider how many ways we can structure the set {1,2,...,n} as a one element set. It's pretty obvious that you can only structure this set as a one element set if it has one element. So if we use count to count the number of ways of doing things, we expect count 1 z = 1 and count n z = 0 for n equal to anything else.

In fact, let's define

> sample a = map (flip count a) [0..10]
> ex0 :: Fractional a => [a]
> ex0 = z

I haven't told you what value z has, just how it responds to the count function. You can now try evaluating sample ex0.

Now consider the next function in our DSL, list. list a gives the number of ways to form a list of a's from {1,...,n}. A good example is simply

> ex1 :: Fractional a => [a]
> ex1 = list z

ie. the number of ways to form a list of n For example, for n=3 we have [1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]. Obviously count n (list z) is n!.

Many of the examples I'm going to look at are listed in Sloane's famous encyclopedia of integer sequences. n! is listed as A000142.

Next consider pair a which is the number of ways of forming an ordered pair of a's. For example, the pairs of lists from {1,2,3} are ([],[1,2,3]),([1],[2,3]),([2],[1,3]),([3],[1,2]),([1,2],[3]) and so on. With a little work you should count 24 of those. This is given by

> ex2 :: Fractional a => [a]
> ex2 = pair (list z)

Conversely we can consider lists of pairs. Clearly this is zero for n odd. For n=4 we get terms like [(1,2),(3,4)], [(1,3),(2,4)] and so on. Again there are 24. The code is:

> ex3 :: Fractional a => [a]
> ex3 = list (pair z)

This DSL also supports constraints. For example we can redo the pair of lists example with the proviso that the lists are both non-empty:

> ex4 :: Fractional a => [a]
> ex4 = pair (nonEmpty (list z))

More generally the // operator can provide any numerical constraint. For example:

> ex5 :: Fractional a => [a]
> ex5 = pair ((list z) // (>=3))

Gives the ways of forming pairs of lists of length >=3. Clearly this is zero for n<6 and for n=6 we just get 6!

At this point I need to point out that these examples all run petty quickly so we're not simply generating and counting.

The next function is ring. A ring is like a list except that cyclic permutations are considered equal. So [1,2,3], [2,3,1] and [3,1,2] are the same ring. How many rings are there on {1,2,...,n}?

> ex6 :: Fractional a => [a]
> ex6 = ring z

If you evaluate sample ex6 you may recognise that it's simply (n-1)!.

The next useful function is set. set a counts the number of sets of a's you can form from {1,...,n}. Clearly count n (set z) is 1 as there's only one set you can form from {1,...,n}. So here's a nice example:

> ex7 :: Fractional a => [a]
> ex7 = set (nonEmpty (ring z))

That's counting the ways of forming {1,...,n} into sets of non-empty rings. If you apply sample you may spot that we have n! again. There's a really neat explanation why. Every permutation of {1,...,n} is a product of disjoint cycles. The order of the product doesn't matter so we have a set (rather than a list) of disjoint cycles. A cycle is obviously non-empty and cycles are just rings. So ex7 counts permutations. And the number of permutations on {1,...,n} is n!.

Here's an old classic (A000110):

> ex8 :: Fractional a => [a]
> ex8 = set (nonEmpty (set z))

This is the number of ways of partitioning {1,...,n} into sets of non-empty sets. For n=3 we get {{1,2,3}},{{1},{2,3}},{{2},{1,3}},{{3},{1,2}} and {{1},{2},{3}}. In other words, 5.

Another useful function is oneOf. oneOf a b means forming {1,...,n} into either an a or a b.

For example we can count the ways to split a set into a set of singletons and pairs. For n=3 we have {1,(2,3)},{2,(1,3)},{3,(1,2)}, the previous three with the pair flipped and {1,2,3}, in other words count 3 ex9 is 7:

> ex9 :: Fractional a => [a]
> ex9 = set (oneOf z (pair z))

By the way, that's Sloane's A047974.

Now some more examples just for fun:

Partitions into lists of size >=2. (Sloane's A052845):

> ex10 :: Fractional a => [a]
> ex10 = set ((list z) // (>=2))

Partitions into lists of cycles (Sloane's A007840):

> ex11 :: Fractional a => [a]
> ex11 = list (nonEmpty (ring z))

Partitions into an even number of non-empty sets (Sloane's A020557):

> ex12 :: Fractional a => [a]
> ex12 = set (nonEmpty (set z)) // even

Partitions into sets of non-empty sets of non-empty sets (Sloane's A000258):

> ex13 :: Fractional a => [a]
> ex13 = set (nonEmpty (set (nonEmpty (set z))))

Thirteen examples is enough for anyone. So how does it work? I haven't time to give a crash course on generating functions but I can at least sketch the idea. In particular, exponential generating functions. The idea is this: if I have a sequence a0,a1,a2,... then its exponential generating function is:


So we can represent any sequence of integers as a power series. For some sequences this series doesn't converge. But that's not a problem. We can still formally manipulate the series as long as we don't try to substitute an actual real number for z. So we call these formal power series. The amazing fact is this: the sorts of operations we might perform on power series when performing ordinary arithmetic (adding hem, multiplying them, finding their logarithms and so on) turn out to be exactly the kind of operations you would perform on them to solve combinatorial problems. So many combinatorial problems can be reduced to finding power series for various arithmetical combinations of other power series.

So now we need some code to manipulate power series. This is such a popular Haskell example that I'll breeze through the code and give some links to other people's documentation. I represent power series simply as a list of coefficients.

When computing with formal power series we frequently require products of the form (a0+a1*t+a2*t2+...)*0 We know the result is zero no matter what the ai are and by immediately returning zero we can eliminate bottomless recursions. We use the operator *! for this purpose. !* is the symmetric counterpart of *!

> (*!) _ 0 = 0
> (*!) a b = a*b

> (!*) 0 _ = 0
> (!*) a b = a*b

We represent formal power series over an algebraic structure a as infinite list of type a.

Addition and subtraction are straightforward pointwise applications of addition and subtraction in R.

> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b

Multiplication itself is achieved through the operation of convolution:

> ~(a:as) `convolve` (b:bs) = (a *! b):
> ((map (a !*) bs) ^+ (as `convolve` (b:bs)))

If f(z) and g(z) are two power series, this computes f(g(z)):

> compose (f:fs) (0:gs) = f:(gs `convolve` (compose fs (0:gs)))

Note how that function fails if the series for g starts with anything other than zero. A few of my operations will be like that. They correspond to situations where the operation doesn't make sense on a power series (at least no without some additional structure or assumptions).

The (functional) inverse of a function:

> inverse (0:f:fs) = x where x = map (recip f *) (0:1:g)
> _:_:g = map negate (compose (0:0:fs) x)

Reciprocal of power series:

> invert x = r where r = map (/x0) ((1:repeat 0) ^- (r `convolve` (0:xs)))
> x0:xs = x

Division of power series is defined in terms of invert though there is probably a direct definition that is as simple. Note the special case handling of power series that both start with zero:

> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a `convolve` (invert b)

Here comes the definition of z that we've been using:

z = [0,1,0,0,...] in other words the generator of formal power series.
one = [1,0,0,...], ie. the multiplicative unit.

> z :: Fractional a => [a]
> z = 0:1:repeat 0

d computes the derivative of formal power series using
d (t^n) = n*t^(n-1)

> d (_:x) = zipWith (*) (map fromInteger [1..]) x

Integration is simply the inverse of differentiation. We assume the integral has a zero leading term.

> integrate x = 0 : zipWith (/) x (map fromInteger [1..])

Compute the square of a formal power series:

> square x = x `convolve` x

> instance (Num r) => Num [r] where
> x+y = zipWith (+) x y
> x-y = zipWith (-) x y
> ~x*y = x `convolve` y
> fromInteger x = fromInteger x:repeat 0
> negate x = map negate x
> signum (x:_) = signum x:repeat 0
> abs (x:xs) = error "Can't form abs of a power series"

> instance (Fractional r) => Fractional [r] where
> x/y = x ^/ y
> fromRational x = fromRational x:repeat 0

> sqrt' x = 1:rs where rs = map (/2) (xs ^- (rs `convolve` (0:rs)))
> _:xs = x

> instance (Fractional r) => Floating [r] where
> sqrt (1:x) = sqrt' (1:x)
> sqrt _ = error "Can only find sqrt when leading term is 1"

We apply transcendental functions to formal power series by finding suitable integro-differential representations of the function. For exp consider:
d/dt exp(f(t)) = f'(t)*exp(f(t))
We can immediately write
exp(f(t)) = integral f'(t)*exp(f(t))
This gives a recursive definition of the exponential of a power series.

> exp x = e where e = 1+integrate (e * d x)

d/dt log(f(t)) = f'(t)/f(t) which gives the following definition:

> log x = integrate (d x/x)

We give mutually recursive definitions for sine and cosine:

> sin x = integrate ((cos x)*(d x))
> cos x = [1] ... negate (integrate ((sin x)*(d x)))

The inverse trignometric functions are similarly straightforward except for the inverse cosine. The inverse cosine at zero is transcendental and so might not exist in R. On the other hand there is no power series for acos(1+x). So we can give no formal power series for this function.

> asin x = integrate (d x/sqrt(1-x*x))
> atan x = integrate (d x/(1+x*x))
> acos x = error "Unable to form power series for acos"

The hyperbolic trigonometric functions are similar to the ordinary trigonometric functions.

> sinh x = integrate ((cosh x)*(d x))
> cosh x = [1] ... integrate ((sinh x)*(d x))

> asinh x = integrate (d x/sqrt(1+x*x))
> atanh x = integrate (d x/(1-x*x))
> acosh x = error "Unable to form power series for acosh"

> pi = error "There is no formal power series for pi"

t is the generator of the formal power series. t' is the generator for formal power series over the rationals.

> t :: (Num a) => ([a])
> t = 0:1:repeat 0

> t' :: [Rational]
> t' = t

The ... operator is used to specify a formal power series with known leading terms. If l is a list of n elements then the first n elements of l ... f are given by the elements of l. The remaining terms are those of l.

> lead [] x = x
> lead (a:as) x = a : (lead as (tail x))

> a ... x = lead a x

And now we can get back to combinatorics. The definitions of the functions I gave above are stunningly simple:

> one = t'
> list x = 1/(1-x)
> set x = exp x
> ring x = -log(1-x)
> pair x = x*x
> oneOf a b = a+b

How does that work? Just about any book on combinatorics will tell you. I quite like Generatingfunctionology though it does make a mountain out of a molehill in places.

> necklace x = -log(1-x)/2+x/2+x*x/4

union a b is the number of ways of splitting the set {1,...,n} into two so that one half is a combinatorial structure given by a and the other half is given by b. It's definition is amazingly simple:

> union a b = a*b

> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a
> nonEmpty a = a // (/= 0)

> factorial 0 = 1
> factorial n = n*factorial (n-1)

> count n a = numerator ((a!!(fromInteger n)) * (factorial (fromInteger n)))

Now for some more examples. A tree is, by definition, a single parent node and a set of child trees. So we have the recursive definition:

> tree x = p where p = [0] ... union (set p) x

The [0] ... is to get the recursion started. Try sample (tree z) to count the number of trees.

The last example is neat. graph is the exponential generating function of the number of ways of making a graph from {1,...,n}. (A006125) It's not too hard to see what's going on: a graph on a set S is given by a set of edges. In other words it's a subset of the set of unordered pairs of elements of S. If S has size n, there are n(n-1)/2 such pairs.

> graph = [2^((n*(n-1) `div` 2)) / product (map fromInteger [1..n]) | n <- [0..]] :: [Rational]

exp has a really amazing property. If f is the expnential power series corresponding to counting some structure, say F, on {1,...,n} then exp(f) is the power series of the structure consisting of the union of a bunch of non-zero sized F's. We can put this into practice: all graphs can be decomposed as the union of connected graphs. So the generating function of connected graphs is given by

> connectedGraph = 1 + log graph

(The 1+ is because of the non-zero I just mentioned.)

That was Sloane's A001187.

By the way, this plays a big role in quantum field theory where we have to enumerate Feynman diagrams.

Anyway, if you didn't previously know about generating functions I hope this whets your appetite to read up on them some more. They are an almost magical part of mathematics that makes the solution of some very difficult seeming combinatorial problems almost trivial. They also have countless real world applications ranging from statistics to digital signal processing.

Saturday, November 03, 2007

A Blobby Language

As I don't want to run the risk of drawing down the ire of George Lucas's lawyers I work hard not to post anything in my blog that has anything to do with my work, that could be useful for work, or even anything that could have any useful application whatsoever. But earlier this year I presented a little bit of work publicly and the presentation is available online, so there can be no harm in linking to it.

First some background: Renderman is a "production quality" 3D renderer developed by Pixar and used by many visual effects houses, including my employer, ILM. One of its neatest features is that lighting and shading computations (ie. the calculation of the colour of every pixel once the geometry that you're looking at has been determined) are performed in a shading language called Renderman Shader Language. It's a C-like DSL that's compiled to a bytecode that runs on a SIMD virtual machine.

Buried within the Renderman documentation is another less well known virtual machine that is used to compute "blobbies". A blobby is simply an isosurface, ie. the set of points that form the solution to f(x,y,z)=c for a function f. The virtual machine is the one on which the function f is computed. Unlike shading, however, there are bytecodes for this VM, but no programming language. So this presentation was about my solution to the problem: embedding a tiny DSL in Python that compiles down to bytecodes. You write ordinary looking Python code but arithmetic operators and other functions are overloaded to build an AST that is converted to an "opcode" stream that is sent to Renderman. From a computer science standpoint it really is basic stuff, but as the author of the blobby code (Tom Duff of Duff's Device in fact) said to me, he's been waiting for someone to write a compiler for his opcodes for years and mine was apparently the first one. So here's a link to the presentation. There isn't really much detail in the presentation, but it might give at least a tiny flavour of what I do in my day job.

Blog Archive