Sunday, April 27, 2008

Infinitesimal rotations and Lie algebras

A little while back I tried to give a rough idea of what Lie algebras were about. What I want to do now is show how with a bit of Haskell code we can directly get our hands on one. I'm going to assume you know how to make matrices for rotations, and multiply them, and some knowledge from previous posts, for which I'll give links.

Firstly a bit of Haskell administration:

> {-# OPTIONS -fno-warn-missing-methods #-}

Now we need some quick and dirty matrix code:

> data Matrix a = M [[a]] deriving (Eq,Show)

> instance Functor Matrix where
> fmap f (M a) = M $ map (map f) a

> instance Num a => Num (Matrix a) where
> M a * M b = M $ mult a b

A Lie Group


What I'm going to do is start by constructing elements of the group of 3D rotations, otherwise known as SO(3), and show how there's another algebraic structure hidden inside it. So let's make some rotation matrices:

> rx theta = M $ [[1,0,0],
> [0,cos theta,-sin theta],
> [0,sin theta,cos theta]]

> ry theta = M $ [[cos theta,0,sin theta],
> [0,1,0],
> [-sin theta,0,cos theta]]

> rz theta = M $ [[cos theta,-sin theta,0],
> [sin theta,cos theta,0],
> [0,0,1]]

These are the three rotations around the x-, y- and z-axes. It's traditional to build arbitrary rotations through the use of Euler angles:

> euler [a,b,c] = rx a*ry b*rz c

The 3D rotations form an example of a Lie group. (A Lie group is essentially a group where the operations like multiplication are differentiable.)

Any 3D rotation can be constructed from a single application of euler. But notice how there's a bit of ugliness in this function. I've made an arbitrary choice of which order to apply the rotations in. I could have defined:

> euler' [a,b,c] = rz c*ry b*rx a

And it's easy to show that eulereuler'. This is because rotations don't commute. In other words, for rotations, a*b≠b*a.

We can measure the non-commutativity. Remember that any rotation has an inverse. For example rx theta*rx (-theta) gives us the identity matrix because one of these two rotations 'undoes' the other. Given any two rotations we can construct what is known as their commutator:

> commutator a b = inverse a*inverse b*a*b

The idea is that we first perform b, then a, then undo b and then undo a. If a and b commute then this expression can be rearranged to inverse a*inverse b*b*a and then the inverses cancel leaving us with the identity matrix. If they don't commute then we end up with a non-identity matrix. So the comuutator measures the extent to which matrices don't commute.

As I'm feeling lazy, I don't feel like writing inverse. Instead, as I'm only going to work with rotations, I'll use the fact that the inverse of a rotation matrix is the transpose and define:

> inverse = transpose

Try playing with expressions like commutator (rx 1) (ry 2). Note how the numbers quickly get messy. Try to write down closed form expressions for applications of euler and you'll see how complex things can get.

A Lie Algebra


But there's another way to approach the rotation group - through 'infinitesimal' rotations. In my earlier article I just talked about infinitesimal group operations in a hand-wavey way. Now I'm going to make the notion more rigorous. We just need to introduce an infinitesimal number, d, whose square is zero. I've talked about this a lot before so I'm borrowing my earlier code and defining:

> d :: Num a => Dual a
> d = D 0 1

If you try it you'll see that d*d is zero.

Now we can try making infinitesimal rotations:

> rot1 = euler [d,2*d,3*d]

Note how when we evaluate this we get 'nice' numbers. No need to worry about those trig functions any more. And if you look closely at rot1 you'll see that it's essentially the identity matrix plus an infinitesimal part. We can pull the infinitesimal part out using fmap im rot1. You may be able to guess how to build it from the arguments to euler. But first, try evaluating rot2:

> rot2 = euler' [d,2*d,3*d]

It's the same! Try other infiitesimal rotations. When dealing with infinitesimal rotations it doesn't matter what order you apply them in, you get the same result. Working with infinitesimal rotations is looking much easier than working with full=size rotations. In fact, it gets better. Try defining

> rot3 = euler [5*d,-d,2*d]

Now look at fmap im (rot1*rot3) and compare with fmap im rot1 and fmap im rot3. We can multiply infinitesimal rotations simply by adding their infinitesimal parts. In fact, we can define

> star [a,b,c] = M $ [[0, -c, b],
> [c,0,-a],
> [-b,a,0]]

So we have:
fmap im (euler [a*d,b*d,c*d]) == star [a,b,c]

and
fmap im (euler [a*d,b*d,c*d]*euler [u*d,v*d,w*d]) == fmap im (euler [(a+u)*d,(b+v)*d,(c+w)*d])


Not a single trig expression anywhere!

So we have a simplified way of viewing rotations by looking at infinitesimal rotations. A triple [a,b,c] can be thought of as representing an infinitesimal rotation through star and instead of multiplying matrices we just add the triples elementwise. These triples, together with the binary operation of addition form an example of a Lie algebra. But there's a piece missing. We have lost all information about the non-commutativity of the rotations. It's one thing to simplify, but it's another to lose an important feature of what you're looking at.

The problem is that d is 'too small'. We need an infinitesimal that doesn't go to zero the moment you square it, but is still, in some sense, infinitesimally small. We could rewrite the Dual type. But there's a trick. Define:

> e :: Num a => Dual (Dual a)
> e = D (D 0 1) (D 1 0)

(If you think of Dual as forming a tensor product as I described here then e=1⊗d+d⊗1.)

You can check that e^2 is non-zero but e^3 is zero.

Now when we compute a commutator we get something a little different:

> comm1 = commutator (euler [e,0,0]) (euler [0,e,0])

fmap re comm1 is essentially the identity as before. But if we look at fmap im comm2 there's a non-zero infinitesimal piece which is different from what we had when we worked with d. This infinitesimal piece is in fact proportional to e^2. As (im . im) (e^2) is a half, we can extract the coefficient of e^2 from comm1 using

> im2 x = im (im x)/2

In fact, we have fmap im2 comm1 == star [0,0,1]. So by choosing infinitesimals that aren't too small, we haven't lost information about non-commutativity. In fact, a bit of experimentation may convince you that with:

> shrink u = fmap (e*) u

we have:
fmap im2 (commutator (euler (shrink u)) (euler (shrink v))) == star (u `cross` v)


So let's step back a bit and recapitulate all this in something approaching English:

Given a tiny rotation we can represent it as three Euler angles a, b, c, all of which are tiny. We can think of a, b and c as forming a vector [a,b,c]. When we do this, apart from an even smaller error, multiplication of rotations becomes ordinary addition of vectors and the order of rotations isn't significant. But if we choose not to ignore this small error we see that a rotation represented by u and a rotation represented by v don't quite commute and the order does matter. The size of this error is measured by the cross product of u and v. This is intuitively plausible, we'd expect that rotations defined by vectors in a similar direction would be closer to commuting, and this is reflected in the fact that the cross product is zero for parallel vectors.

So now I can fill in the missing piece from the description of the Lie algebra I gave above. The Lie algebra so(3) consists of the 3d vectors (representing infinitesimal rotations), addition of vectors (representing multiplcation of infinitesimal rotations) and the cross product (measuring the amount by which small rotations fail to commute). This picture doesn't just apply to the rotations, a similar one applies for any Lie group. We get the same pattern of a simplified form of multiplication and a binary operation that measures non-commutativity.

But you might still ask if there's something missing. What if we defined f so that f^4 is zero but f^3 isn't? Would we extract more information about the non-commutativity of rotations? Well the interesting fact, which I won't prove here, is that it doesn't. Almost everything you need to know about a Lie group can be extracted from its Lie algebra. In fact, from a group's Lie algebra you can almost recover the original group. (In fact, what you recover is its universal cover.) But notice how there are no trig formulae involved when talking about Lie algebras. Lie algebras give a really nice way to study Lie groups without getting your hands too dirty. But this isn't the only reason to study Lie algebras, many physical properties arising from symmetries are more naturally studied through the Lie algebra because Lie algebras arise from Lie groups as soon as you start differentiating things. In particular, Lie algebras play a major role in Noether's theorem, one of the cornerstones of modern theoretical physics.

In summary then, I hope I've given some flavour of Lie algebras. Using infinitesimals you can get your hands on them directly without the use of a large amount of mathematical machinery. There has been some difficult stuff here, but I'm hoping that the freedom you now have at the Haskell prompt to play with the things I've been talking about will make up for the inadequacies of my explanations.

One last word: in principle we could do the same with E8. But we'd need 248 by 248 matrices, and the equivalent of euler would need 248 parameters. (That's a coincidence, in general the number of parameters needed to define an element of a Lie group isn't equal to the dimension of the matrix, but it is for SO(3) and E8).


Appendix (the bits of code I left out above)



Defining the infinitesimals:

> data Dual a = D a a deriving (Show,Eq)

Extract the 'full-size' and 'infinitesimal' parts of a number:

> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
> fromInteger i = D (fromInteger i) 0
> (D a a')+(D b b') = D (a+b) (a'+b')
> (D a a')-(D b b') = D (a-b) (a'-b')
> (D a a')*(D b b') = D (a*b) (a*b'+a'*b)

> instance Floating a => Floating (Dual a) where
> cos (D a a') = D (cos a) (-sin a*a')
> sin (D a a') = D (sin a) (cos a*a')

> instance Fractional a => Fractional (Dual a)

Some useful matrix and vector operations:

> mult a ([]:_) = map (const []) a
> mult a b = zipWith (:) (map (dot (map head b)) a) (mult a (map tail b))

> transpose (M a) = M $ transpose' a where
> transpose' [] = repeat []
> transpose' (xs : xss) = zipWith (:) xs (transpose' xss)

Some simple vector operations

> dot a b = foldr (+) 0 $ zipWith (*) a b
> cross [a,b,c] [d,e,f] = [b*f-c*e,c*d-a*f,a*e-b*d]

Saturday, April 12, 2008

Negative Probabilities

I'm always interested in new ways to present the ideas of quantum mechanics to a wider audience. Unfortunately the conventional description of quantum mechanics requires knowledge of complex numbers and vector spaces making it difficult to avoid falling back on misleading analogies and metaphors. But there's another approach to quantum mechanics, due to Feynman, that I have never seen mentioned in the popular science literature. It assumes only basic knowledge of probability theory and negative numbers to get a foot on the ladder.

We start with tweaking probability theory a bit. One of the axioms of probability theory says that all probabilities must lie in the range zero to one. However, we could imagine relaxing this rule even though on the face of it it seems meaningless. For example, suppose we have a coin that has a 3/2 chance of landing heads and a -1/2 chance of landing tails. We can still reason that the chance of getting two heads in a row is 3/2×3/2=9/4 by the usual multiplication rule. But obviously no situation like this could ever arise in the real world because after tossing such a coin 10 times we'd expect to see -5 tails on average.

But what if we could contrive a system with some kind of internal state governed by negative probabilities even though we couldn't observe it directly? So consider this case: a machine produces boxes with (ordererd) pairs of bits in them, each bit viewable through its own door. Let's suppose the probability of each possible combination of two bits is given by the following table:






First bitSecond bitProbability
00-1/2
011/2
101/2
111/2

Obviously if we were able to look through both doors we'd end up with the meaningless prediction that we'd expect to see 00 a negative number of times. But suppose that things are arranged so that we can only look through one door. Maybe the boxes self-destruct if one or other door is opened but you still get enough time to see what was behind the door. Now what happens?

If you look through the first door the probability of seeing 1 is P(10)+P(11)=1. We get the same result if we look through the second door. We only get probabilities in the range zero to one. As long as we're restricted to one door we get meaningful results.

If we were to perform this experiment repeatedly with different runs of the machine, each time picking a random door to look through, we'd eventually become very confident that every box contained 11. After all, if we freely choose which door to look through, and we always see 1, there's no place 0's could be 'hiding'.

But now suppose a new feature is added to the box that allows us to compare the two bits to see if they are equal. It reveals nothing about what the bits are, just their state of equality. And of course, after telling us, it self-destructs. We now find that the probability of the two bits being different is P(01)+P(10)=1. So if we randomly chose one of the three possible observations each time the machine produced the box we'd quickly run into the strange situation that the two bits both appear to be 1, and yet are different. But note that although the situation is weird, it's not meaningless. As long as we never get to see both bits at the same time we never directly observe a paradox. If we met such boxes in the real world we'd be forced to conclude that maybe the boxes knew which bit you were going to look at and changed value as a result, or that maybe you didn't have the free will to choose door that you thought you had, or maybe, even more radically, you'd conclude that the bits generated by the machine were described by negative probabilities.

That's all very well, but obviously the world doesn't really work like this and we never see boxes like this. Except that actually it does! The EPR experiment has many similarities to the scenario I described above. The numbers aren't quite the same, and we're not talking about bits in boxes, but we do end up with a scenario involving observations of bits that simply don't add up. If we do try to explain what's going on using probability theory, we either conclude there's something weird about our assumptions of locality or causality or we end up assigning negative probabilities to the internal states of our systems. In fact, you can read the details in an article by David Schneider. Being forced to conclude that we have negative probabilities in a physical system is usually taken as a sign that we have a contradiction. In the case of Bell's theorem it shows that we can't interpret what we see in terms of probability theory and hence that the weirdness of quantum mechanics can't be explained in terms of some hidden random variable that we can't see. QM simply doesn't obey the rules you'd expect of hidden variables we can't see.

But in a paper called Negative Probability, Feynman tried taking the idea of negative probabilities seriously. He showed that you could reformulate quantum mechanics completely in terms of them so that you no longer needed to think in terms of the complex number valued 'amplitudes' that physicists normally use. This means the above isn't just an analogy, it's actually a formal system within which you can do QM, although I haven't touched on the bit that refers to the dynamics of quantum systems. So if you can get your head around the ideas I've talked about above you're well on your way to understanding some reasons why quantum mechanics seems so paradoxical.

At this point you may be wondering how nature contrives to hide these negative probabilities from direct observation. Her trick is that making one kind of observation disturbs up the state of what you've observed so that you can't make the other kind of observation on a pristine state. You have to pick one kind of observation or the other. Electrons and photons really are a lot like the boxes I just described.

So why don't physicists use this formulation? Despite the fact that negative numbers seem simpler to most people than imaginary numbers, the negative number formulation of QM is much more complicated. What's more, because it makes exactly the same predictions as regular QM there's no compelling reason to switch to it. And anyway, it's not as if directly observing negative probabilities is any more intuitive or meaningful than imaginary ones. Once you've introduced negative ones, you might as well go all the way!

This all ties in with what I said a while back. The important thing about QM is that having two ways to do something can make it less likely to happen, not more.

For a different perspective this is an interesting comment.

Footnote: We can embed QM in negative probability theory. But can we do the converse? Can every negative probability distribution be physically realised in a quantum system? I've a hunch the answer is obvious but I'm too stupid to see it.


Blog Archive