Sunday, April 27, 2008

Infinitesimal rotations and Lie algebras

A little while back I tried to give a rough idea of what Lie algebras were about. What I want to do now is show how with a bit of Haskell code we can directly get our hands on one. I'm going to assume you know how to make matrices for rotations, and multiply them, and some knowledge from previous posts, for which I'll give links.

Firstly a bit of Haskell administration:

> {-# OPTIONS -fno-warn-missing-methods #-}

Now we need some quick and dirty matrix code:

> data Matrix a = M [[a]] deriving (Eq,Show)

> instance Functor Matrix where
>     fmap f (M a) = M $ map (map f) a

> instance Num a => Num (Matrix a) where
>     M a * M b = M $ mult a b

A Lie Group


What I'm going to do is start by constructing elements of the group of 3D rotations, otherwise known as SO(3), and show how there's another algebraic structure hidden inside it. So let's make some rotation matrices:

> rx theta = M $ [[1,0,0],
>                 [0,cos theta,-sin theta],
>                 [0,sin theta,cos theta]]

> ry theta = M $ [[cos theta,0,sin theta],
>                 [0,1,0],
>                 [-sin theta,0,cos theta]]

> rz theta = M $ [[cos theta,-sin theta,0],
>                 [sin theta,cos theta,0],
>                 [0,0,1]]

These are the three rotations around the x-, y- and z-axes. It's traditional to build arbitrary rotations through the use of Euler angles:

> euler [a,b,c] = rx a*ry b*rz c

The 3D rotations form an example of a Lie group. (A Lie group is essentially a group where the operations like multiplication are differentiable.)

Any 3D rotation can be constructed from a single application of euler. But notice how there's a bit of ugliness in this function. I've made an arbitrary choice of which order to apply the rotations in. I could have defined:

> euler' [a,b,c] = rz c*ry b*rx a

And it's easy to show that euler ≠ euler'. This is because rotations don't commute. In other words, for rotations, a*b≠b*a.
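
As a quick concrete check (the definition below is mine, not the post's), picking some specific angles shows that the two orderings really do produce different matrices, so the following should evaluate to True:

> orderMatters = euler [1,2,3] /= euler' [1,2,3]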

We can measure the non-commutativity. Remember that any rotation has an inverse. For example rx theta*rx (-theta) gives us the identity matrix because one of these two rotations 'undoes' the other. Given any two rotations we can construct what is known as their commutator:

> commutator a b = inverse a*inverse b*a*b

The idea is that we first perform b, then a, then undo b and then undo a. If a and b commute then this expression can be rearranged to inverse a*inverse b*b*a and then the inverses cancel leaving us with the identity matrix. If they don't commute then we end up with a non-identity matrix. So the commutator measures the extent to which matrices don't commute.

As I'm feeling lazy, I don't feel like writing inverse. Instead, as I'm only going to work with rotations, I'll use the fact that the inverse of a rotation matrix is the transpose and define:

> inverse = transpose

Try playing with expressions like commutator (rx 1) (ry 2). Note how the numbers quickly get messy. Try to write down closed form expressions for applications of euler and you'll see how complex things can get.
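
Here is a small hedged sanity check (these definitions are mine, not the post's): two rotations about the same axis commute, so their commutator should be the identity up to floating point error, while rotations about different axes should not be:

> identityIsh (M m) = and [abs (x-y) < 1e-9 | (row,irow) <- zip m i3, (x,y) <- zip row irow]
>     where i3 = [[1,0,0],[0,1,0],[0,0,1]]

> commuteCheck    = identityIsh (commutator (rx 1) (rx 2))  -- should be True
> nonCommuteCheck = identityIsh (commutator (rx 1) (ry 2))  -- should be False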

A Lie Algebra


But there's another way to approach the rotation group - through 'infinitesimal' rotations. In my earlier article I just talked about infinitesimal group operations in a hand-wavey way. Now I'm going to make the notion more rigorous. We just need to introduce an infinitesimal number, d, whose square is zero. I've talked about this a lot before so I'm borrowing my earlier code and defining:

> d :: Num a => Dual a
> d = D 0 1

If you try it you'll see that d*d is zero.

Now we can try making infinitesimal rotations:

> rot1 = euler [d,2*d,3*d]

Note how when we evaluate this we get 'nice' numbers. No need to worry about those trig functions any more. And if you look closely at rot1 you'll see that it's essentially the identity matrix plus an infinitesimal part. We can pull the infinitesimal part out using fmap im rot1. You may be able to guess how to build it from the arguments to euler. But first, try evaluating rot2:

> rot2 = euler' [d,2*d,3*d]

It's the same! Try other infinitesimal rotations. When dealing with infinitesimal rotations it doesn't matter what order you apply them in; you get the same result. Working with infinitesimal rotations is looking much easier than working with full-size rotations. In fact, it gets better. Try defining

> rot3 = euler [5*d,-d,2*d]

Now look at fmap im (rot1*rot3) and compare with fmap im rot1 and fmap im rot3. We can multiply infinitesimal rotations simply by adding their infinitesimal parts. In fact, we can define

> star [a,b,c] = M $ [[0, -c, b],
>                     [c,0,-a],
>                     [-b,a,0]]

So we have:
fmap im (euler [a*d,b*d,c*d]) == star [a,b,c]

and
fmap im (euler [a*d,b*d,c*d]*euler [u*d,v*d,w*d]) == fmap im (euler [(a+u)*d,(b+v)*d,(c+w)*d])


Not a single trig expression anywhere!
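
Here is a quick numerical check of the first of these identities for one particular choice of a, b and c (the definition is mine, not the post's); it should evaluate to True, and the equality is exact here because d squared really is zero:

> starCheck = fmap im (euler [2*d,3*d,5*d]) == star [2,3,5]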

So we have a simplified way of viewing rotations by looking at infinitesimal rotations. A triple [a,b,c] can be thought of as representing an infinitesimal rotation through star and instead of multiplying matrices we just add the triples elementwise. These triples, together with the binary operation of addition form an example of a Lie algebra. But there's a piece missing. We have lost all information about the non-commutativity of the rotations. It's one thing to simplify, but it's another to lose an important feature of what you're looking at.

The problem is that d is 'too small'. We need an infinitesimal that doesn't go to zero the moment you square it, but is still, in some sense, infinitesimally small. We could rewrite the Dual type. But there's a trick. Define:

> e :: Num a => Dual (Dual a)
> e = D (D 0 1) (D 1 0)

(If you think of Dual as forming a tensor product as I described here then e=1⊗d+d⊗1.)

You can check that e^2 is non-zero but e^3 is zero.
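
For instance (the name below is mine):

> eCheck = (e^2 /= 0, e^3 == 0)  -- should evaluate to (True,True)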

Now when we compute a commutator we get something a little different:

> comm1 = commutator (euler [e,0,0]) (euler [0,e,0])

fmap re comm1 is essentially the identity as before. But if we look at fmap im comm1 there's a non-zero infinitesimal piece which is different from what we had when we worked with d. This infinitesimal piece is in fact proportional to e^2. As (im . im) (e^2) is two, we can extract the coefficient of e^2 from comm1 using

> im2 x = im (im x)/2

In fact, we have fmap im2 comm1 == star [0,0,1]. So by choosing infinitesimals that aren't too small, we haven't lost information about non-commutativity. In fact, a bit of experimentation may convince you that with:

> shrink u = fmap (e*) u

we have:
fmap im2 (commutator (euler (shrink u)) (euler (shrink v))) == star (u `cross` v)
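
Here's one concrete instance of that identity (the definition is mine, not the post's); it should evaluate to True:

> crossCheck = fmap im2 (commutator (euler (shrink [1,2,3])) (euler (shrink [4,5,6])))
>                  == star ([1,2,3] `cross` [4,5,6])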


So let's step back a bit and recapitulate all this in something approaching English:

Given a tiny rotation we can represent it as three Euler angles a, b, c, all of which are tiny. We can think of a, b and c as forming a vector [a,b,c]. When we do this, apart from an even smaller error, multiplication of rotations becomes ordinary addition of vectors and the order of rotations isn't significant. But if we choose not to ignore this small error we see that a rotation represented by u and a rotation represented by v don't quite commute and the order does matter. The size of this error is measured by the cross product of u and v. This is intuitively plausible: we'd expect that rotations defined by vectors in a similar direction would be closer to commuting, and this is reflected in the fact that the cross product is zero for parallel vectors.

So now I can fill in the missing piece from the description of the Lie algebra I gave above. The Lie algebra so(3) consists of the 3D vectors (representing infinitesimal rotations), addition of vectors (representing multiplication of infinitesimal rotations) and the cross product (measuring the amount by which small rotations fail to commute). This picture doesn't just apply to the rotations; a similar one applies for any Lie group. We get the same pattern of a simplified form of multiplication and a binary operation that measures non-commutativity.

But you might still ask if there's something missing. What if we defined f so that f^4 is zero but f^3 isn't? Would we extract more information about the non-commutativity of rotations? Well the interesting fact, which I won't prove here, is that we wouldn't. Almost everything you need to know about a Lie group can be extracted from its Lie algebra. In fact, from a group's Lie algebra you can almost recover the original group. (What you recover is its universal cover.) But notice how there are no trig formulae involved when talking about Lie algebras. Lie algebras give a really nice way to study Lie groups without getting your hands too dirty. But this isn't the only reason to study Lie algebras: many physical properties arising from symmetries are more naturally studied through the Lie algebra because Lie algebras arise from Lie groups as soon as you start differentiating things. In particular, Lie algebras play a major role in Noether's theorem, one of the cornerstones of modern theoretical physics.

In summary then, I hope I've given some flavour of Lie algebras. Using infinitesimals you can get your hands on them directly without the use of a large amount of mathematical machinery. There has been some difficult stuff here, but I'm hoping that the freedom you now have at the Haskell prompt to play with the things I've been talking about will make up for the inadequacies of my explanations.

One last word: in principle we could do the same with E8. But we'd need 248 by 248 matrices, and the equivalent of euler would need 248 parameters. (That's a coincidence: in general the number of parameters needed to define an element of a Lie group isn't equal to the dimension of the matrix, but it is for SO(3) and E8).


Appendix (the bits of code I left out above)



Defining the infinitesimals:

> data Dual a = D a a deriving (Show,Eq)

Extract the 'full-size' and 'infinitesimal' parts of a number:

> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
>     fromInteger i = D (fromInteger i) 0
>     (D a a')+(D b b') = D (a+b) (a'+b')
>     (D a a')-(D b b') = D (a-b) (a'-b')
>     (D a a')*(D b b') = D (a*b) (a*b'+a'*b)

> instance Floating a => Floating (Dual a) where
>     cos (D a a') = D (cos a) (-sin a*a')
>     sin (D a a') = D (sin a) (cos a*a')

> instance Fractional a => Fractional (Dual a)

Some useful matrix and vector operations:

> mult a ([]:_) = map (const []) a
> mult a b = zipWith (:) (map (dot (map head b)) a) (mult a (map tail b))

> transpose (M a) = M $ transpose' a where
>     transpose' [] = repeat []
>     transpose' (xs : xss) = zipWith (:) xs (transpose' xss)

Some simple vector operations

> dot a b = foldr (+) 0 $ zipWith (*) a b
> cross [a,b,c] [d,e,f] = [b*f-c*e,c*d-a*f,a*e-b*d]


Saturday, March 08, 2008

Comonadic Arrays

On haskell-cafe, ajb, aka Pseudonym, laments that many people don't have enough experience with comonads to recognise them. So I thought I'd mention a really simple example of a comonad that nevertheless captures the essence of a large class of comonads. It's conceptually not much different to my cellular automaton example (making this a bit of a rerun), but this should be easier to understand. And if it's too trivial, I hint at a particle physics connection towards the end.

Firstly, you can skip this paragraph if you don't want a quick bit of theoretical discussion. Consider arrays of fixed dimension. As types, they look something like X^N for some fixed integer N. From a container we construct its zipper by applying X d/dX. In this case we get X·NX^(N-1) = NX^N. In other words, the corresponding zipper is an array paired with an index into the array. We can stretch the meaning of comonad slightly to allow this to relate to arrays whose size isn't fixed.

So here's some code:


> import Data.Array


The usual definition of Comonad:


> class Functor w => Comonad w where
>     (=>>) :: w a -> (w a -> b) -> w b
>     coreturn :: w a -> a


And now a type that is a pair of an array and an index into that array:


> data Pointer i e = P i (Array i e) deriving Show


Think of it as an array with one of its elements singled out for special attention. It trivially inherits the functoriality of ordinary arrays:


> instance Ix i => Functor (Pointer i) where
>     fmap f (P i a) = P i (fmap f a)


And now comes the Comonad implementation. coreturn serves to pop out the special element from its context - in other words it gives you the special element, while throwing away the array it lived in. (=>>), on the other hand, applies a function f of type Pointer i a -> b to the entire array. The function is applied to each element in turn, making each element the special element for long enough to apply f.


> instance Ix i => Comonad (Pointer i) where
>     coreturn (P i a) = a!i
>     P i a =>> f = P i $ listArray bds (fmap (f . flip P a) (range bds))
>         where bds = bounds a


Compare with fmap for arrays. This walks through each element in turn, applying a function to each element, and returning an array of results. The computation for each element is separate from all the others. With =>>, however, the entire array may be used for the computation of each element of the result, with the index into the array serving to indicate which element it is we should be focussing on.

For example, here's an array of values:


> x = listArray (0,9) [0..9]


We want to consider this to be a circular array so that going off one end wraps around to the beginning:


> wrap i = if i<0 then i+10 else if i>9 then i-10 else i


Now here's a simple operation that 'blurs' the single ith pixel in the 1-D image represented by x:


> blur (P i a) = let
>     k = wrap (i-1)
>     j = wrap (i+1)
>   in 0.25*a!k + 0.5*a!i + 0.25*a!j


We can apply this to the entire image thusly


> test1 = P 0 x =>> blur


Note the curious way I have to use P 0 x as an input to blur. There seems to be a redundancy here: we want the resulting array and don't care what the focussed element is. But =>> wants us to give it a focal point. Curiously it's making explicit something that's familiar to C programmers, but is slightly hidden in C. In C, you refer to an array of floats using a float *. But the same type points to elements of the array as well. So when you point to an array, you are, in effect, blurring the distinction between a pointer to an array and a pointer to the first element. Comonads make that distinction explicit.
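
If you want to look at the result (the definitions below are mine, not the post's), coreturn pops out the focused element of test1 and elems lists the whole blurred array; they should come out as 2.5 and [2.5,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,6.5] respectively:

> focused1 = coreturn test1
> blurred1 = let P _ a = test1 in elems a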

Anyway, suppose you wanted to apply a sequence of operations. Adding, blurring, scaling, nonlinearly transforming and so on. You could write a pipeline like


> x ==> f = f x
> test2 = P 0 x ==> fmap (+1) =>> blur ==> fmap (*2) ==> fmap (\x -> x*x)


Note how ==> fmap f ==> fmap g = ==> fmap (g . f). If you think of fmap as farming out workloads to a SIMD processor with one thread applied to each array element, sequences of fmaps correspond to threads that can continue to work independently. The comonadic operations, however, correspond to steps where the threads must synchronise and talk to each other. I believe this last statement explains a cryptic comment in the comments to this blog entry.
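
As a quick hedged check of that equation (the definition is mine; Pointer doesn't derive Eq, so this compares the printed forms), the following should evaluate to True:

> fusionCheck = show (P 0 x ==> fmap (+1) ==> fmap (*2)) ==
>               show (P 0 x ==> fmap ((*2) . (+1)))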

One final thing going back to my optional paragraph above. If D means d/dX then we can see the operator XD as a kind of number operator. When you apply it to an array like X^N the array becomes multiplied by a type corresponding to the array index type. For ordinary arrays, these are just integers. So you can see XD as a way of counting how many elements there are in a container. Also, for any container F, we have the equation D(XF) = XDF+F, which we can write as DX=XD+1. At some point, when I have time, I'll point out how this is closely related to the Heisenberg uncertainty principle and how when we say that differentiation makes holes in a data type, it's related to the notion of a hole in solid state physics.

Oh, and I sketched this diagram but don't have time to write an explanation:


Tuesday, April 10, 2007

The curious rotational memory of the Electron, Part 2

A spinning top keeps going because it has angular momentum, and the angular momentum arises because the top is made up of mass that is distributed around the axis of rotation. The angular momentum is represented by a vector (well, a pseudovector really) directed along the axis of spin (at least when it's spinning around the top's axis). Similarly electrons have angular momentum. It's tempting to think of electrons as like microscopic spinning tops but the angular momentum of electrons isn't due to an extended mass rotating about some axis. In fact, as far as we know, electrons have no extension. Instead their angular momentum is a peculiar kind of intrinsic angular momentum that has a lot in common with ordinary angular momentum - eg. it's conserved and it's interconvertible with ordinary angular momentum. Because it's not associated with a mass distribution it's given the special name of spin, but as long as you bear in mind these provisos it's probably fair to use at least some intuitions about spinning tops when thinking about electrons.

But it's a little more complex than this. We also have quantum mechanics to contend with. The spin of an electron is a vector. But we find that when we measure one of the components of this vector this value is quantised and can only take values +hbar/2 and -hbar/2, where hbar is the reduced Planck constant. We choose units where hbar is 1 so the z-component of the spin is always measured to be +1/2 or -1/2. If we write these two states as |+> and |-> then because we are dealing with quantum mechanics, the spin state can be represented by the linear combination a|+>+b|->. This corresponds to a state in which there is a probability |a|² of measuring +1/2 and a probability |b|² of measuring -1/2. This is what might have been written as a.*return (1/2)+b.*return (-1/2) in my earlier Haskell code. But that's just one component of the spin. What about the x- and y-components? Amazingly the state a|+>+b|-> tells us everything we can possibly know about the spin of an electron and we'll call it a spin state.

Suppose we have an electron in the state ψ = a|+>+b|->. What happens if we measure the y-component of its spin? One way to answer that question is to rotate the electron through π/2 so that its y-axis is rotated to align with the z-axis and then measure the z-component of its spin. In order to do that we need to know how to rotate spin states. The rule for rotation through θ about the x-axis is this (in a suitable coordinate frame):

|+> → cos(θ/2)|+>-sin(θ/2)|->
|-> → sin(θ/2)|+>+cos(θ/2)|->

Note how choosing θ=0 gives the identity, as expected. Note also that θ=π maps a|+>+b|-> to b|+>-a|-> so that the probabilities of measuring +1/2 and -1/2 are simply swapped, exactly what you'd expect for turning a state upside down. But there's something else that you should notice - there's an ambiguity. A rotation through 2π should give the same as a rotation through 0 and yet setting θ=2π in that transformation maps a state ψ to -ψ. Now |a|² = |-a|² so the probability of observing spin up or spin down is unaffected. But as I've been showing over previous posts, flipping a sign in a state can make a big difference as soon as you start performing interference experiments. The same goes for any angle: if I rotate through π should I use θ=π or θ = 3π? So can the transformation I've given make sense?

The transformation does make sense if you consider that in any physical process that rotates an electron the transformation will evolve continuously over time. Electrons don't just instantly rotate. In other words, if a rotation is applied to an electron then it will follow a path in SO(3), not just be an instantaneous application of an element of SO(3). And that allows us to resolve the ambiguity: the rotations of electrons are described by the double cover of SO(3) known as SU(2). So a rotation through 360 degrees doesn't return you to the identity although a 720 degree rotation does. The transformation I gave above is completely unambiguous if you continuously rotate an electron around the x-axis tracking a continuous value of θ; after all, the double cover is basically just the set of continuous paths from the identity in SO(3) (with homotopic paths considered equivalent).

And that's the bizarre fact: electron rotations aren't described by SO(3), they're described by SU(2). In particular, rotating an electron through 360 degrees does not return it to its original state, but a rotation through 720 degrees does! In a sense, like Dirac's belt, electrons can remember something about the path they took to get where they are, in particular they remember how many twists there were in the path.
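
Here is a tiny standalone sketch (my code, not the post's; real amplitudes are enough for this check) that applies the rotation rule above to a spin state written as its |+> and |-> amplitudes, and confirms that a 360 degree rotation negates the state while a 720 degree rotation restores it:

> -- rotate the spin state (a,b), meaning a|+> + b|->, through theta about the x-axis
> rotX :: Double -> (Double,Double) -> (Double,Double)
> rotX theta (a,b) = (cos (theta/2)*a + sin (theta/2)*b,
>                     cos (theta/2)*b - sin (theta/2)*a)

> rot360 = rotX (2*pi) (1,0)  -- should come out very close to (-1,0)
> rot720 = rotX (4*pi) (1,0)  -- should come out very close to (1,0)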

What does this mean experimentally? The first thing to note is that this is true not just for electrons but any spin-1/2 fermion. This includes protons and neutrons. The stuff I've been talking about manifests itself in a number of ways. In particular, the spin of a particle affects how a magnetic field acts on it. For example, spin-up and spin-down particles can be separated into distinct beams using Stern-Gerlach apparatus. Also, the spin of particles precesses in a magnetic field and this is used on a regular basis in NMR. These two facts allow us to easily manipulate and measure the spin of fermions. In other words, the fact that fermions remember how many twists there are in their rotations isn't just some esoteric nonsense: it's now engineering and the theory is tested repeatedly all over the world.

Every familiar object is invariant under rotations through 360 degrees. So the fact that electrons need to be rotated through 720 degrees to return them to their original state seems like one of the most bizarre facts about the universe I know of. And yet many books that introduce spin just slip in this fact in a routine way as if it were no different to any other.

The fact that the biggest connected cover of SO(3) is the double cover puts a big constraint on the kinds of weird effects like this that can happen. We can have a 360 degree rotation multiply by -1, but not by i, because a 720 degree rotation absolutely has to return us to where we started from. But suppose the universe were 2-dimensional. If you remember what I said about SO(2) you may notice that no such constraints apply because SO(2) has an infinite cover. There is a group in which all of the rotations through 360n degrees are distinct for distinct n. This means that a physical system could have its state multiplied by any factor (of modulus 1) when rotated through 360 degrees. Particles that behave this way are called anyons. But we live in a 3D universe so we don't expect any fundamental particles to have this property. However, in quantum mechanics any kind of 'excitation' of a physical system is quantised and can be thought of as a type of particle. These are known as quasiparticles. For example, just as light is made of photons, sound is also quantised as phonons. In the right kind of solid state medium, especially one that arises from some kind of 2D lattice, it seems quite plausible that anyons might arise. This gives rise to the so called fractional quantum Hall effect. Anyons might one day play an important role in quantum computing via topological quantum computation.


Saturday, March 31, 2007

The Curious Rotational Memory of the Electron, Part 1

There's a curious and bizarre fact about the universe that is introduced in physics courses without anyone stopping to point out just how curious and bizarre it is. I think it deserves to be more widely known to non-physicists. It's a little esoteric, but not so esoteric it can't be grounded in ordinary everyday concepts. And it isn't something hypothetical but something that is fundamental to modern physics and testable in the lab. It is also the reason why I chose to use theta/2 instead of theta in my quantum computing code.

This post will be in two parts. This part will mainly be about geometry and topology, and in the second part I'll talk about the physics.

In order to explain the idea I want to introduce the concept of a universal cover. If you look at the definition of this notion anywhere it can seem pretty daunting to non-mathematicians, but in actual fact the key idea is so simple that even a child could understand it. I can say that without fear of exaggeration because universal covers used to play a role in a childhood fantasy of mine. (OK, I was a weird kid, but not that weird.)

My crazy idea as a kid was this: if there were two ways to get from A to B, maybe the Bs you reached by taking the two routes were actually slightly different, even though they looked the same. I was actually more concerned with journeys from A to A. Suppose the kitchen has a door at each end and that you start in the bedroom, go down to the kitchen at one end, come out the other end, and return by a different path to the bedroom. How do you know this is the same bedroom? (I also admit I read a lot of science fiction as a kid.)



Suppose instead you took a simpler path: you left the bedroom and went to another room and then returned by the path you went. If you had taken a piece of string and tied one end to the bedroom and walked this path playing out the string behind you as you went then when you returned to your bedroom you would have both ends of the string visible. You could then reel in the string from your journey, without letting go of the ends, and see that indeed, both ends were in fact the ends of the same string and so you had returned to the start of your journey. But if you'd taken the round trip through the double-doored kitchen you wouldn't be able to pull the string tight and so you could no longer be sure.

In what follows we consider two paths from A to B equivalent in this way: mark the two paths with strings going from A to B. If you can pull one string to line up exactly with the other (you're allowed to stretch) without breaking the string or moving the endpoints then they are considered equivalent. (The correct technical term is homotopic.) So if a path that appears to go from A to A is homotopic to the trivial zero length path, then you know that the path really is from A back to A. But if you can't shrink the path down to zero length you're faced with the possibility that the string is extended from A to a similar looking place, A'. (Of course this implies a considerable conspiracy with someone tying a distinct identical string to an identical bedroom, but if you can't see both ends, alien beings from a higher dimension could do anything...)

The best case scenario is that all of the bedrooms are the same - a conventional house. Call this house X. But what's the worst case scenario? How many different copies of the bedroom might there be? Well let's call the starting bedroom 0. If we walk anti-clockwise round the house, going through the kitchen, we end up back at a bedroom. In the worst case scenario this will be another bedroom. Let's call it 1. Walk round again and we get to another copy of the bedroom, call it 2 and so on. So we have an infinite sequence of bedrooms, (0,1,2,…). But we could have walked clockwise. In that case we'd get a sequence (-1,-2,…). So now think about the set of all points in this infinite house. Each point can be described as a pair (x,n) where x is a point in the house X and n labels which of the infinite number of paths you could have taken to get there. (Remember, if the two paths are homotopic we consider them to be the same path.)

If using a house as an example seems too far-fetched think of a multi-storey car park. It's easy to walk what you think is a closed loop but find that you've in fact walked to an almost identical looking, but different, floor.

(Incidentally, this kind of infinite house is the kind of thing that can be created in a Portal engine such as in the game Portal.)

Let's think about this more generally. Given a topological space X we can form another space X' from it by (1) picking a point p in it and (2) forming the set of all pairs (x,n) where x is a point in X and n is a path (up to homotopy) from p to x. X' is known as the universal cover of X.

Enough with houses, let's try a more purely geometric example. Consider the space of rotations in 2D, otherwise known as SO(2). A rotation can simply be described by an angle with the proviso that a rotation through an angle θ is the same as a rotation through θ+2nπ where n is an integer. The space of rotations is essentially the same thing as the circle (by circle I mean the 1-dimensional boundary of a 2-dimensional disk). Looping once round the circle brings you back to where you started. So how can we visualise the universal cover of SO(2)? There's a nice trick for this. Imagine a disk that's free to rotate around one axis. SO(2) describes the rotations of this disk. But now imagine a belt, fixed at one end, connected at the other end to the centre of this disk like this:



The rule is that we're allowed to twist the belt as much as we like as long as the centre line of the belt is along the axis. Notice how if we rotate the disk through an angle of 2nπ the disk returns to its starting point, but the belt 'remembers' how many twists we took to get there. Notice another thing: if we travel along the belt from one end to the other, the orientation of the belt at each point gives a 2D rotation. In other words, the belt itself represents a path in SO(2) with one end representing a fixed point in SO(2) and the other end matching the current orientation of the disk. So if the orientations of the disk represent elements of SO(2), the possible twistings of the belt represent elements in the universal cover of SO(2). And what the twists do is allow us to differentiate between rotations through θ and rotations through θ+2nπ. In other words, the universal cover is just the real line, but not folded down so that θ is the same as θ+2nπ.

The universal cover is the worst case scenario where all non-homotopic paths are considered to represent distinct points. But it's also possible to construct an in-between scenario where some paths that aren't homotopic to each other are still considered to lead to the same place. For example, take the real line again. If we identify 0 and 2π we get SO(2). If we consider 0 and 2nπ to be distinct for n≠0 we get the universal cover. But we could pick an N≠1 and consider 2Nπ to be the same as 0, while keeping 2nπ distinct from 0 for 0<n<N. What we now end up with is an N-fold cover of SO(2). Unfortunately I don't know a way to construct a physical example of a general N-fold cover using belts and discs. (At least for N>2. For N=2 you should easily be able to come up with a mechanism based on what's below.)

With the examples so far, the universal cover of a space has always had an infinite number of copies of the original space. That isn't always the case and now we'll meet an example where the universal cover is a double cover: the real projective plane, RP2. The easiest way to visualise this is to imagine the game of Asteroids played on a circular TV screen. In the usual Asteroids, when you go off one side of the screen you reappear on the opposite side. In this version it's much the same: points on the edge that are diametrically opposite are considered to be the same points. Such a universe is a real projective plane. Let P be the point at the centre of the screen. Suppose we attach one end of our string to P and fly off towards the edge, reappearing on the other side. Eventually we get back to P again. Can we shrink this path down to a point? Unfortunately, any path that crosses the edge of our screen will always cross the edge no matter how much we try to deform it. If we move the point at which it crosses one edge of the screen, the point at the other side always remains diametrically opposite.




This means you can never bring these points together in order to collapse down a loop. On the other hand, suppose we loop twice round the edge. The following diagram shows how this path can be shrunk down to a loop:



So given two distinct points in RP2 there are two inequivalent paths between them, and so the universal cover of RP2 is made from two copies of the space and is in fact a double cover.

Now we'll try a 3D space - the space of 3D rotations, otherwise known as SO(3). This space is 3-dimensional because it takes 3 numbers to define a rotation, for example we can specify a rotation using a vector whose direction is the axis of rotation and whose length is the angle of rotation. Now all possible rotations are rotations through an angle 0≤θ≤π for some axis. So every rotation can be represented as a point in the ball of radius π. But a rotation around the axis a through an angle of π is also the same as a rotation around the axis -a through π. In other words, SO(3) is the closed 3-dimensional ball with diametrically opposite points identified. This is very similar to the situation with RP2. In fact, almost exactly the same argument shows that the universal cover of SO(3) is the double cover. (For an alternative wording, see this posting.) There is also another way of seeing this that is analogous to the disk with belt example above. This time we have a sphere instead of a disk. The belt is fixed at one end and the other end is attached to the sphere. We are free to rotate the sphere however we want, and we're no longer confined to having the belt along one axis. It is an amazing but easily demonstrable fact that if you rotate the sphere through 2π, you cannot untwist the belt, but if you rotate it through 4π you can! This is known as the Dirac Belt Trick, and if you don't believe it, watch the Java applet written by science fiction author Greg Egan.

If you've done much computer graphics you'll have already met the double cover of SO(3). Computer graphics people sometimes like to represent rotations using unit quaternions. The unit quaternions form a 3-dimensional sphere but there are two such quaternions for each rotation. So the unit quaternions actually represent this double cover, and SO(3) is a 3-dimensional sphere with antipodal points identified.
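
Here is a small standalone sketch of that (my code, not the post's), with quaternions represented as 4-tuples (w,x,y,z): the unit quaternion for a rotation through θ about a unit axis is (cos(θ/2), sin(θ/2) times the axis), so a rotation through 2π gives -1 while doing it twice gives back 1:

> qmul (w1,x1,y1,z1) (w2,x2,y2,z2) =
>     ( w1*w2 - x1*x2 - y1*y2 - z1*z2
>     , w1*x2 + x1*w2 + y1*z2 - z1*y2
>     , w1*y2 - x1*z2 + y1*w2 + z1*x2
>     , w1*z2 + x1*y2 - y1*x2 + z1*w2 )

> rotQ theta (ax,ay,az) = (cos (theta/2), sin (theta/2)*ax, sin (theta/2)*ay, sin (theta/2)*az)

> q360 = rotQ (2*pi) (0,0,1)  -- should be very close to (-1,0,0,0)
> q720 = qmul q360 q360       -- should be very close to (1,0,0,0)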

And that's the geometry dealt with. In the next installment I'll say what this has to do with electrons and my quantum computing code.


Footnote


Balls and Spheres. The closed n-dimensional ball is the set of points (x1,...,xn) such that x1² + ... + xn² ≤ 1. The (n-1)-dimensional sphere is the set of points (x1,...,xn) such that x1² + ... + xn² = 1.


Sunday, March 18, 2007

The Shor Quantum Error Correcting Code (and a Monad for Heat)

Quantum computers promise great things, but that promise will only come to be if quantum computers can be made reliable. In order to achieve that reliability we need qubits that can maintain their state for extended periods of time. Unfortunately, just about any type of interaction between a qubit and its environment can render the qubit useless. But there is a solution - a quantum error correcting code. This is very similar to a classical error correcting code, in which a number of bits are stored, in a distributed fashion, over a larger set of bits, in such a way that if some of the bits are corrupted, the original state can be recovered. What follows is some code to simulate the implementation of one of these codes: the nine bit Shor code.

Firstly, let me include the quantum monad that I've developed over earlier weeks:


> import Prelude hiding (flip)
> import Maybe
> import List
> import Data.Map (toList,fromListWith)
> import Complex
> import Data.Bits hiding (rotate)

> infixl 1 ->-, -><, =>=
> infixl 7 .*


Remember that W b a is the type of elements of a vector space over b with a basis whose elements are labelled by elements of a. I'm using the letter W as I'm thinking of elements of b as being weights.


> data W b a = W { runW :: [(a,b)] } deriving (Eq,Show,Ord)


It's useful to be able to apply transforms to all of the weights:


> mapW f (W l) = W $ map (\(a,b) -> (a,f b)) l


W is an instance of Num, Functor and Monad. Vector spaces form monads because you can specify a function between vector spaces in terms of how it maps basis elements in the first vector space. The monad bind function lifts the function on basis elements to the entire vector space.


> instance Functor (W b) where
>     fmap f (W a) = W $ map (\(a,p) -> (f a,p)) a

> instance Num b => Monad (W b) where
>     return x = W [(x,1)]
>     l >>= f = W $ concatMap (\(W d,p) -> map (\(x,q)->(x,p*q)) d) (runW $ fmap f l)

(.*) gives us scalar multiplication:

> a .* b = mapW (a*) b

> instance (Eq a,Show a,Num b) => Num (W b a) where
>     W a + W b = W $ (a ++ b)
>     a - b = a + (-1) .* b
>     _ * _ = error "Num is annoying"
>     abs _ = error "Num is annoying"
>     signum _ = error "Num is annoying"
>     fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"


It'd be cool if the bind function for W could simplify our vectors by collecting together multiples of the same basis elements. Unfortunately this is tricky to do for reasons described in Eric Kidd's recent Set monad article. Instead I'll use an explicit function, collect, that collects our terms together. I've also used a crude method for throwing out terms that probably ought to be zero. A better solution might be to work only with rational numbers, or exact algebraic numbers.


> nonTrivial x = 10+abs x/=10
> collect :: (Ord a,Num b) => W b a -> W b a
> collect = W . filter (nonTrivial . snd) . toList . fromListWith (+) . runW


And here is the quantum monad:


> type Q a = W (Complex Double) a


There are a number of ways to apply functions to vector spaces. I'll be writing functions on the right so that I can apply operations to a quantum state reading from left to right. So here's a bunch of combinators to implement this. First there's ordinary function application:


> (->-) :: a -> (a -> b) -> b
> x ->- f = f x


Then there are functions that operate on the vector space by simply relabelling the basis:


> (=>=) :: Q a -> (a -> b) -> Q b
> x =>= f = fmap f x


Then there are functions that map from Q a to Q b but are defined by how they act on the basis elements of Q a:


> (-><) :: Q a -> (a -> Q b) -> Q b
> g ->< f = g >>= f


As I mentioned in my last post on entanglement, we can form states for joint independent systems using the tensor product:


> (<*>) :: Q a -> Q b -> Q (a,b)
> x <*> y = do
>     a <- x
>     b <- y
>     return (a,b)


The tensor product is a lot like a multiplication table. The coefficient of (a,b) in x <*> y is the product of the coefficients of a and b in x and y respectively. If two systems are unentangled then you can simulate the physics of each part separately. So it'd be nice to have a way to test whether or not a given state is unentangled, in other words, to test whether a state is a tensor product. As I'm not developing a serious application here, here's a quick and dirty algorithm that more or less does what anyone would do when faced with the task of recovering the row and column headings from a multiplication table:


> norm x = sqrt $ sum $ map (\(_,a) -> magnitude a^2) $ runW $ collect x

> untensor :: (Ord a,Ord b) => Q (a,b) -> (Q a,Q b)
> untensor (W a) =
>     let u = nub $ sort $ map (fst . fst) a
>         v = nub $ sort $ map (snd . fst) a
>         p = W $ map (\x -> (x,fromJust (lookup (x,head v) a))) u
>         q = W $ map (\x -> (x,fromJust (lookup (head u,x) a))) v
>     in ((1/norm p :+ 0) .* p,(1/norm q :+ 0) .* q)


untensor recovers x and y from x <*> y (up to multiplication by a complex number with unit norm) but it gives garbage if the argument isn't a tensor product. So we need to test whether or not the original vector is the tensor product of the recovered factors. So here's a test of how close the original is to the product of the putative factors:


> independence x = collect (x-uncurry (<*>) (untensor x))


We'll be doing lots of stuff with tensor products so it's useful to have some tools to apply functions to one or other factor:


> mapFst f (a,b) = (f a,b)
> mapSnd f (a,b) = (a,f b)
> mapFstM f (a,b) = do
>     a' <- a ->- f
>     return (a',b)
> mapSndM f (a,b) = do
>     b' <- b ->- f
>     return (a,b')


If a bit is a Bool then by rights, a qubit should be a Q Bool. But I think 0 and 1 look nicer so I'm using Q Integer instead. So here's the fundamentally quantum rotation operator:


> rotate :: Double -> Integer -> Q Integer
> rotate theta 1 = let theta' = theta :+ 0
>       in cos (theta'/2) .* return 1 - sin (theta'/2) .* return 0
> rotate theta 0 = let theta' = theta :+ 0
>       in cos (theta'/2) .* return 0 + sin (theta'/2) .* return 1


Although rotate theta is a map Integer -> Q Integer, using ->< it becomes a map Q Integer to Q Integer. Note that the Double argument isn't float data inside a quantum device, it's a hard-coded feature of the hardware itself. For example, if we choose theta to be pi/2 and -pi/2 we get the standard square root of not operation and its inverse:


> hnot = rotate (pi/2)
> unhnot = rotate (-pi/2)
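
As a quick check (mine, not the post's) that hnot really is a square root of not: applying it twice to a qubit that starts in state 0 should give state 1, up to floating point fuzz that collect filters out:

> hnotTwice = collect (return 0 ->< hnot ->< hnot)  -- the result should be (very nearly) the state return 1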


(You may be wondering about the use of theta/2 in my definition of rotate. There's an interesting story behind that involving what I think is one of the weirdest features of physics. For another day.)

I'm also going to manipulate lots of lists of qubits so here are some tools for that:


> mapAt i f x = let (h,t:ts) = splitAt i x in h ++ (f t : ts)

> mapAtM :: Monad m => Int -> (a -> m a) -> [a] -> m [a]
> mapAtM i f x = let (h,t:ts) = splitAt i x in do
>     y <- f t
>     return $ h ++ (y : ts)

> flip a = 1-a
> flipAt i = mapAt i flip


So here's a statement of the problem. We have a qubit, ie. something of type Q Integer. But quantum systems are very delicate and our qubit may wander into a different state. In particular, after a time t it will be as if rotate theta has been applied to it, except that we can't control theta. We need to find a way to recover the original qubit. In fact, it's worse than that: our qubit is in some environment which we can consider to be a list of qubits. So our state is given by an instance of Q ([Integer],Integer) and the interaction might not simply be a rotation of the qubit but an operation that entangles the qubit with the environment. This might happen if the qubit isn't perfectly isolated.

Let's forget about the full generality here and consider just one type of data corruption - a qubit that might or might not have been flipped. How can we make a device robust against such occurrences? Well this is practically identical to the problem of making classical data robust against noise. We can simply use a 3 bit repetition code to store our qubits as three qubits. So here's an implementation of a function to repair a codeword from the classical 3 bit repetition code:


> repairRepetition' [b,c,d] = (if a<3 then mapAt a flip else id) [b,c,d] where
>     a = case (b,c,d) of
>         (0,0,0) -> 3
>         (1,1,1) -> 3
>         (0,0,1) -> 2
>         (1,1,0) -> 2
>         (0,1,0) -> 1
>         (1,0,1) -> 1
>         (1,0,0) -> 0
>         (0,1,1) -> 0
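
For example (the names below are mine, not the post's), a single flipped bit gets pushed back to the nearest codeword:

> repairExample1 = repairRepetition' [0,1,0]  -- should give [0,0,0]
> repairExample2 = repairRepetition' [1,0,1]  -- should give [1,1,1]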


But there are a couple of problems with this. Firstly this is a many-to-one map because [0,0,0] and [0,0,1], say, both get mapped to [0,0,0]. In quantum mechanics, all of our maps need to be unitary, and in particular, invertible. A many-to-one map isn't invertible. Secondly, we're creating a register a, and again this is not an invertible operation. These two problems will actually cancel each other out because the apparent destruction of information that comes about from our many-to-one map can be fixed by allowing a to carry that information away. And we can ensure reversibility by using a supplied register rather than creating a new one:


> repairRepetition'' (a,[b,c,d]) = (a',(if a'<3 then mapAt a' flip else id) [b,c,d]) where
>     a' = case (b,c,d) of
>         (0,0,0) -> a `xor` 3
>         (1,1,1) -> a `xor` 3
>         (0,0,1) -> a `xor` 2
>         (1,1,0) -> a `xor` 2
>         (0,1,0) -> a `xor` 1
>         (1,0,1) -> a `xor` 1
>         (1,0,0) -> a `xor` 0
>         (0,1,1) -> a `xor` 0


It's easy to write an inverse for this function so it's a valid unitary transform. But it's actually a bit tedious to write out all those xors. So we can cheat a bit. Let's just assume there's an infinite supply of input zeroes and use them at our leisure. We no longer need to write the xors; we just note that every time we assign a value to a new register the hardware engineer building your device installs some means of importing zero qubits. The a register in the above function will never be touched again. But we can't simply throw it away. It's a kind of side effect and we'll call these bits ancillae. We'll have lots of these side effects that we need to collect up, so here's the final quantum repetition codeword repair code:


> repairRepetition (a,[b,c,d]) = (i:a,(if i<3 then mapAt i flip else id) [b,c,d]) where
>     i = case (b,c,d) of
>         (0,0,0) -> 3
>         (1,1,1) -> 3
>         (0,0,1) -> 2
>         (1,1,0) -> 2
>         (0,1,0) -> 1
>         (1,0,1) -> 1
>         (1,0,0) -> 0
>         (0,1,1) -> 0


There's one last thing we could do to this code: the pairs being handed in and out of this function are essentially just instances of the Writer monad. So we could just squirrel away our ancillae inside a monad. I won't do that here because I think it's good to see everything explicitly when figuring this stuff out for the first time. But think about the bits that we'd be writing out. They correspond to information, or entropy, that we'd ultimately like to pump out of our device as exhaust. In other words they are heat. So this monad would, in effect, be a heat monad. I found this amusing given what I wrote in reason number eight in my list from last year.

So far so good. But that only corrects one type of error: a bit flip. But note that it also protects superpositions from bit flips. So let's take a quantum state built from repetition codewords placed in an empty environment:


> state1 = return [] <*> (return [0,0,0] + return [1,1,1])


Now corrupt the last bit:


> state1' = state1 =>= mapSnd (mapAt 2 flip)


And repair it:


> state1'' = state1' =>= repairRepetition


Although we have repaired the state of our system, the state of the environment has changed. But that's fine because this state is still an unentangled tensor product of the environment and our system. You can see this just by looking at the state, but for peace of mind use independence state1'' to see that the deviation from being a tensor product is the zero vector.

But that's still only a narrow class of error that we can fix. Another type of error maps return 0 to return 0 but maps return 1 to -return 1. This is known as a phase flip. One way to view this is as a flip in a rotated basis:

> phaseFlip x = x ->- hnot ->- fmap flip ->< unhnot
> pFlipAtM i = mapAtM i phaseFlip

There's a nice strategy to protect against phase flips: rotate our data into a basis where phase flips become ordinary flips, and use the repetition code. So we'll use these two states to represent 0 and 1:


> plus,minus :: Q [Integer]
> plus = return [1,1,1] ->< mapM unhnot
> minus = return [0,0,0] ->< mapM unhnot


So let's make a state that is a superposition of these states in some environment:


> state2 = return [] <*> (plus + minus) ->- collect


Corrupt the middle bit with a phase flip:


> state2' = state2 ->< mapSndM (mapAtM 1 phaseFlip)


And repair it:


> state2'' = state2' ->< mapSndM (mapM hnot) =>= repairRepetition ->< mapSndM (mapM unhnot) ->- collect


Comparing state2 and untensor state2'' we see that the repair is successful. Unfortunately, we can use this scheme to repair phase flips, but not plain flips. We need a scheme that can repair both of these. So now we turn to Shor's ingenious solution: use both of these schemes at the same time. So we'll use the scheme that protects against phase flips with the twist that instead of using three qubits we use three triplets of qubits. We can use the three triplets to fix phase flips and we can use the three bits within a triplet to fix flips. We'll represent 0 and 1 using shor 0 and shor 1 as defined here:


> tripletPlus = (1/sqrt 2) .* return [0,0,0] + (1/sqrt 2) .* return [1,1,1]
> tripletMinus = (1/sqrt 2) .* return [0,0,0] - (1/sqrt 2) .* return [1,1,1]

> shor 1 = do
>     a <- tripletPlus
>     b <- tripletPlus
>     c <- tripletPlus
>     return $ a ++ b ++ c

> shor 0 = do
>     a <- tripletMinus
>     b <- tripletMinus
>     c <- tripletMinus
>     return $ a ++ b ++ c


Repair is a little more complex than before. First we use 6 ancilla bits to fix qubits within each triplet using the repetition code three times:


> ancilla (i,j,k) x = ((x!!i) `xor` (x!!j),(x!!j) `xor` (x!!k))

> fix (i,j,k) (e,x) =
>     let (u,v) = ancilla (i,j,k) x
>         b = case (u,v) of
>                 (0,0) -> x
>                 (0,1) -> flipAt k x
>                 (1,0) -> flipAt i x
>                 (1,1) -> flipAt j x
>     in (u:v:e,b)

> flipRepair start = start
>     =>= fix (0,1,2)
>     =>= fix (3,4,5)
>     =>= fix (6,7,8)


We also use two ancilla bits to locate phase flips between the triplets. phaseFlipFix will be applied in a rotated basis inside phaseFlipRepair:


> phaseFlipFix :: ([Integer],[Integer]) -> Q ([Integer],[Integer])
> phaseFlipFix (e,y) = do
>     let u = (y!!0) `xor` (y!!1) `xor` (y!!2) `xor` (y!!3) `xor` (y!!4) `xor` (y!!5)
>     let v = (y!!3) `xor` (y!!4) `xor` (y!!5) `xor` (y!!6) `xor` (y!!7) `xor` (y!!8)
>     let c = case (u,v) of
>                 (0,0) -> y
>                 (0,1) -> y ->- flipAt 6
>                 (1,0) -> y ->- flipAt 0
>                 (1,1) -> y ->- flipAt 3
>     return (u:v:e,c)

> phaseFlipRepair x = x
>     ->< mapSndM (mapM hnot)
>     ->- collect
>     ->< phaseFlipFix
>     ->- collect
>     ->< mapSndM (mapM unhnot)
>     ->- collect


Now we can pull all of this together into a single process to repair both kinds of error:


> shorRepair x = x ->- flipRepair ->- collect ->- phaseFlipRepair ->- collect


So now all that's needed is to try corrupting some qubits to see if we can repair them.


> state3 = return [] <*> shor 1
> state3' = state3 ->< mapSndM (pFlipAtM 0)
> state3'' = state3' ->- shorRepair


Let's try a more complex superposition with a more complex corruption:


> state4 = return [] <*> (0.6.*shor 0 + 0.8.*shor 1) ->- collect
> state4' = state4 ->< mapSndM (pFlipAtM 5) =>= mapSnd (flipAt 5) ->- collect
> state4'' = state4' ->- shorRepair ->- collect


This is all well and good. But there are more ways to corrupt a qubit beyond simple flips. Let's mess up our state using two additional kinds of 'noise' that we haven't discussed. Firstly, instead of just flipping, we'll rotate one of the bits. Secondly, we'll additionally 'leak' that bit to the environment so as to cause decoherence. We've discussed solutions to neither of these problems, so let's see what happens.


> leak i x = do
>     (e,v) <- x
>     return ((v!!i):e,v)

> state5 = state4
> state5' = state5 ->- leak 4 ->< mapSndM (mapAtM 4 (rotate (pi/10))) ->- collect
> state5'' = state5' ->- shorRepair ->- collect


A minor miracle has taken place. The Shor code is robust against just about anything you can throw at a single qubit. In fact, any code that can survive flips and phase flips of a single qubit can survive any kind of corruption, including entanglement with the environment. (Despite having read the proof, I was still pretty surprised when the code actually worked!) And that gives researchers hope that these methods might serve as ways to make quantum computing reliable.

I'll mention a couple more things in closing. Firstly, nowhere above have I mentioned wavefunction collapse, observations or superoperators. You don't need any of these things to define the Shor code and show that it can correct errors. (You probably do need some extra assumptions about quantum mechanics if you want to calculate probabilities of various outcomes in the event that the code fails - eg. due to multiple qubits being corrupted.) And secondly, despite it being my April 1st post last year, I still have the doubts I expressed here about quantum computing.

Oh...and expect those lines to take a few seconds to evaluate. When you simulate the Shor code you're performing computations in a 512-dimensional vector space!

Bibliography

  1. Peter Shor's original paper has good discussion of the issues but I found the description of the correction algorithm a little laconic.
  2. John Preskill's Lecture Notes were probably the most useful document.
  3. A student term paper by Dheera Venkatraman was also pretty useful though I was confused for a while by what might be considered errors in it.


Update
Fixed a couple of minor typos. And remember this code should just run, without modifications, in GHCi, so you can play with it yourself.


Sunday, March 11, 2007

Independence, entanglement and decoherence with the quantum monad

Before I can do anything more sophisticated with my code for probability theory and quantum mechanics I need to talk about a couple of related subjects - independence and entanglement. Again, I'm using the fact that formally, quantum mechanics is almost identical to probability theory, to kill two birds with one stone. So firstly, here's the probability/quantum code that I need from before:

> import Data.Map (toList,fromListWith)
> import Complex
> infixl 7 .*

> data W b a = W { runW :: [(a,b)] } deriving (Eq,Show,Ord)

> mapW f (W l) = W $ map (\(a,b) -> (a,f b)) l

> instance Functor (W b) where
>   fmap f (W a) = W $ map (\(a,p) -> (f a,p)) a

> instance Num b => Monad (W b) where
>   return x = W [(x,1)]
>   l >>= f = W $ concatMap (\(W d,p) -> map (\(x,q)->(x,p*q)) d) (runW $ fmap f l)

> a .* b = mapW (a*) b

> instance (Eq a,Show a,Num b) => Num (W b a) where
>    W a + W b = W $ (a ++ b)
>    a - b = a + (-1) .* b
>    _ * _ = error "Num is annoying"
>    abs _ = error "Num is annoying"
>    signum _ = error "Num is annoying"
>    fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"

> collect :: (Ord a,Num b) => W b a -> W b a
> collect = W . filter ((/= 0) . snd) . toList . fromListWith (+) . runW

> type P a = W Float a
> type Q a = W (Complex Float) a


Now consider this probability distribution:

> dice = sum [1/36 .* return (i,j) | i <- [1..6], j <- [1..6]]

This corresponds to the usual distribution you expect from rolling two dice. As we know, under normal circumstances, the outcome of one die doesn't depend on the other - in other words they are independent. We can express this independence mathematically as follows. First make the distribution for one die:

> die = sum [1/6 .* return i | i <- [1..6]]

Now we can write code to make two independent die rolls:

> two_rolls = do
>   a <- die
>   b <- die
>   return (a,b)

We find that dice=two_rolls. We can make this more generic by defining a joint distribution as follows:
> tensor dist1 dist2 = do
>   a <- dist1
>   b <- dist2
>   return (a,b)

So tensor die die=dice. (This operation is actually a tensor product.) So now we can give a definition: if we have a distribution on (a,b), then the first and second components are said to be independent if the distribution can be written as tensor dist1 dist2 for some choice of dist1 and dist2. But so far I've only been talking about the probability monad. What about the quantum monad Q? In that case the notion is almost identical, except the language is inverted. In that case, a distribution on (a,b) is said to be entangled if it can't be written as tensor state1 state2. Despite all of the mystification in the popular literature, that's all it means. So why is entanglement important? One of the key ideas is this. Suppose we have a process that acts on only one part of a combined system. In other words, consider something like this:

> apply_part f x = do
>   (a,b) <- x
>   b' <- f b
>   return (a,b')

If the two subcomponents are independent (or unentangled) so that x=tensor dist1 dist2 then apply_part f x=tensor dist1 (dist2 >>= f). In other words, if we're dealing with a distribution like tensor dist1 dist2, and we're interested in studying operations on the second component, then we can completely ignore the first component during the process.

> coin = 0.75 .* return True + 0.25 .* return False
> f x = 0.125 .* return x + 0.875 .* return (not x)
> ex1 = collect (tensor coin (coin >>= f))
> ex2 = collect (apply_part f (tensor coin coin))

ex1=ex2. Now suppose we have a combined system consisting of two parts: some subsystem, and its environment, ie. something of type (Environment,Subsystem). Then as long as the environment and the subsystem are independent, we can study the subsystem as if the environment weren't there. But now suppose information can 'leak' out from the subsystem. For example consider a process like:
> a `xor` b = a/=b
> process x = do
>   (e,s) <- x
>   let e' = e `xor` s
>   return (e',s)
Because there is some flow from s to e', there is no guarantee that process x has independent subparts. As a result, we can no longer treat s as an independent subsystem, and theorems that are true about an independent s no longer hold. For example, define:
> rotate :: Float -> Bool -> Q Bool
> rotate theta True = let theta' = theta :+ 0
>   in cos (theta'/2) .* return True - sin (theta'/2) .* return False
> rotate theta False = let theta' = theta :+ 0
>   in cos (theta'/2) .* return False + sin (theta'/2) .* return True

We know that return True >>= rotate (pi/10) >>= rotate (-pi/10) gives us back the state we started with. Similarly, with

> pure x = do
>   (a,b) <- x
>   b' <- rotate (pi/10) b
>   b'' <- rotate (-pi/10) b'
>   return (a,b'')

pure acts as the identity on the second component. In particular, pure applied to something of the form tensor e (return True) gives the same thing back, because the terms involving False cancel out. But if we have a 'leak' as in:

> impure x = do
>   (a,b) <- x
>   b' <- rotate (pi/10) b
>   a' <- return (a `xor` b') -- information flowing out
>   b'' <- rotate (-pi/10) b'
>   return (a',b'')

and consider applying impure to the same kind of state, we end up with something where the False terms don't cancel out. Compare collect $ pure $ return (False,False) with collect $ impure $ return (False,False).

We're familiar with the idea that something flowing into a system might mess it up. But here we have the situation that information flowing out of a physical system can also mess it up. Any kind of entanglement between a system and its environment is called decoherence. And what I've shown is that interactions between a system and its environment can cause a physical system to 'decohere', even if the direction of interaction appears to be only outwards.
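
To make this comparison concrete, here's a pair of definitions you can evaluate (the names check_pure and check_impure are my additions, not from the original code - they just package up the two expressions above):

> check_pure   = collect $ pure   $ return (False,False)
> check_impure = collect $ impure $ return (False,False)

check_pure comes out as (essentially) a single (False,False) term, which trivially factorises as a tensor product of environment and subsystem; check_impure has four terms that can't be factorised - the bit that leaked out has entangled the two parts.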

And that's one reason why quantum computing is so hard. Not only must nothing in the environment affect our qubits, no information about our qubits may leave the system before we're ready for it to do so.

Anyway, now I'm equipped to start writing code to simulate quantum error correction...

Update: Tried to reuse some of this code and found a bug which I've now fixed. Sorry if this pops it to the front of various feeds. It's an old post.


Tuesday, March 06, 2007

Monads, Vector Spaces and Quantum Mechanics pt. II

Back from wordpress.com:

I had originally intended to write some code to simulate quantum computers and implement some quantum algorithms. I'll probably eventually do that, but today I just want to look at quantum mechanics in its own right as a kind of generalisation of probability. This is probably going to be the most incomprehensible post I've written in this blog. On the other hand, even though I eventually talk about the philosophy of quantum mechanics, there's some Haskell code to play with at every stage, and the code gives the same results as appear in physics papers, so maybe that will help give a handle on what I'm saying.

First get some Haskell fluff out of the way:

> import Prelude hiding (repeat)
> import Data.Map (toList,fromListWith)
> import Complex
> infixl 7 .*

Now define certain types of vector spaces. The idea is that a W b a is a vector in a space whose basis elements are labelled by objects of type a and where the coefficients are of type b.

> data W b a = W { runW :: [(a,b)] } deriving (Eq,Show,Ord)

This is very similar to standard probability monads except that I've allowed the probabilities to be types other than Float. Now we need a couple of ways to operate on these vectors.

mapW allows the application of a function transforming the probabilities...

> mapW f (W l) = W $ map (\(a,b) -> (a,f b)) l

and fmap applies a function to the basis element labels.

> instance Functor (W b) where
>    fmap f (W a) = W $ map (\(a,p) -> (f a,p)) a

We want our vectors to support addition, multiplication, and actually form a monad. The definition of >>= is similar to that for other probability monads. Note how vector addition just concatenates our lists of probabilities. The problem with this is that if we have a vector like a+2a we'd like it to be reduced to 3a but in order to do that we need to be able to spot that the two terms a and 2a both contain multiples of the same vector, and to do that we need the fact that the labels are instances of Eq. Unfortunately we can't do this conveniently in Haskell because of the lack of restricted datatypes and so to collect similar terms we need to use a separate collect function:

> instance Num b => Monad (W b) where
>    return x = W [(x,1)]
>    l >>= f = W $ concatMap (\(W d,p) -> map (\(x,q)->(x,p*q)) d) (runW $ fmap f l)

> a .* b = mapW (a*) b

> instance (Eq a,Show a,Num b) => Num (W b a) where
>    W a + W b = W $ (a ++ b)
>    a - b = a + (-1) .* b
>    _ * _ = error "Num is annoying"
>    abs _ = error "Num is annoying"
>    signum _ = error "Num is annoying"
>    fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"

> collect :: (Ord a,Num b) => W b a -> W b a

> collect = W . toList . fromListWith (+) . runW

Now we can specialise to the two monads that interest us:

> type P a = W Float a

> type Q a = W (Complex Float) a

P is the (hopefully familiar if you've read Eric's recent posts) probability monad. But Q allows complex probabilities. This is because quantum mechanics is a lot like probability theory with complex numbers and many of the rules of probability theory carry over.

Suppose we have a (non-quantum macroscopic) coin that we toss. Its state might be described by:

> data Coin = Heads | Tails deriving (Eq,Show,Ord)

> coin1 = 0.5 .* return Heads + 0.5 .* return Tails :: P Coin

Suppose that if Albert sees a coin that is heads up he has a 50% chance of turning it over and if he sees a coin that is tails up he has a 25% chance of turning it over. We can describe Albert like this:

> albert Heads = 0.5 .* return Heads + 0.5 .* return Tails
> albert Tails = 0.25 .* return Heads + 0.75 .* return Tails

We can now ask what happens if Albert sees a coin originally turned up heads n times in a row:

> repeat 0 f = id
> repeat n f = repeat (n-1) f . f
> (->-) :: a -> (a -> b) -> b

> g ->- f = f g

> (-><) :: Q a -> (a -> Q b) -> Q b
> g ->< f = g >>= f

> albert1 n = return Heads ->- repeat n (->< albert) ->- collect


Let me explain those new operators. ->- is just function application written from left to right. The > in the middle is intended to suggest the direction of data flow. ->< is just >>= but I've written it this way with the final < intended to suggest the way a function a -> M b 'fans out'. Anyway, apropos of nothing else, notice how Albert approaches a steady state as n gets larger.
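
As a quick sanity check (albertSteady is a name I've made up; it isn't in the original post), note that Albert's transition probabilities have stationary distribution 1/3 Heads, 2/3 Tails, so for largish n the weights should settle near those values:

> albertSteady = albert1 (20::Int)

Evaluating albertSteady should give weights close to 0.333 for Heads and 0.667 for Tails.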

Quantum mechanics works similarly but with the following twist. When we come to observe the state of a quantum system it undergoes the following radical change:

> observe :: Ord a => Q a -> P a

> observe = W . map (\(a,w) -> (a,magnitude (w*w))) . runW . collect

Ie. the quantum state becomes an ordinary probabilistic one. This is called wavefunction collapse. Before collapse, the complex weights are called 'amplitudes' rather than probabilities. The business of physicists is largely about determining what these amplitudes are. For example, the well known Schrödinger equation is a lot like a kind of probabilistic diffusion, like a random walk, except with complex amplitudes instead of probabilities. (That's why so many physicists have been hired into finance firms in recent years - stocks follow a random walk which has formal similarities to quantum physics.)

The rules of quantum mechanics are a bit like those of probability theory. In probability theory the probabilities must sum to one. In addition, any process (like albert) must act in such a way that if the input probabilities sum to one, then so do the output probabilities. This means that probabilistic processes are stochastic. In quantum mechanics the sum of the squares of the magnitudes of the amplitudes must be one. Such a state is called 'normalised'. All processes must be such that normalised inputs go to normalised outputs. Such processes are called unitary ones.
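
Here's a little helper for checking that (my addition - it isn't part of the original post): the squared norm of a Q state, which should be 1 for a normalised state and should remain 1 after any unitary process.

> -- sum of the squared magnitudes of the amplitudes
> norm2 :: Q a -> Float
> norm2 = sum . map (\(_,w) -> magnitude w ^ 2) . runW

For example, norm2 (0.6 .* return True + 0.8 .* return False :: Q Bool) is 1 (up to floating point), and it stays 1 if you bind the state with the rotate process defined below.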

There's a curious subtlety present in quantum mechanics. In classical probability theory you need to have the sum of the probabilities of your different events to sum to one. But it's no good having events like "die turns up 1", "die turns up 2", "die turns up even" at the same time. "die turns up even" includes "die turns up 2". So you always need to work with a mutually exclusive set of events. In quantum mechanics it can be pretty tricky to figure out what the mutually exclusive events are. For example, when considering the spin of an electron, there are no more mutually exclusive events beyond "spin up" and "spin down". You might think "what about spin left?". That's just a mixture of spin up and spin down - and that fact is highly non-trivial and non-obvious. But I don't want to discuss that now and it won't affect the kinds of things I'm considering below.

So here's an example of a quantum process a bit like albert above. For any angle θ, rotate turns a boolean state into a mixture of boolean states. For θ=0 it just leaves the state unchanged and for θ=π it inverts the state so it corresponds to the function Not. But for θ=π/2 it does something really neat: it is a kind of square root of Not. Let's see it in action:

> rotate :: Float -> Bool -> Q Bool
> rotate theta True = let theta' = theta :+ 0
>   in cos (theta'/2) .* return True - sin (theta'/2) .* return False

> rotate theta False = let theta' = theta :+ 0
>   in cos (theta'/2) .* return False + sin (theta'/2) .* return True

> snot = rotate (pi/2)

> repeatM n f = repeat n (>>= f)

> snot1 n = return True ->- repeatM n snot ->- observe

We can test it by running snot1 2 to see that two applications act just like not - starting from True we're certain to observe False - but that snot1 1 gives you a 50/50 chance of finding True or False. Nothing like this is possible with classical probability theory and it can only happen because complex numbers can 'cancel each other out'. This is what is known as 'destructive interference'. In classical probability theory you only get constructive interference because probabilities are always non-negative real numbers. (Note that repeatM is just a monadic version of repeat - we could have used it to simplify albert1 above so there's nothing specifically quantum about it.)
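
To see the destructive interference explicitly, look at the amplitudes just before observe collapses them (interference is my name, not from the original post):

> interference = collect (return True >>= snot >>= snot)

The two paths that lead back to True carry opposite amplitudes and sum to (essentially) zero, while the two paths leading to False reinforce each other, giving an amplitude of magnitude (essentially) one.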

Now for two more combinators:

> (=>=) :: P a -> (a -> b) -> P b
> g =>= f = fmap f g
> (=><) :: P (Q a) -> (a -> Q b) -> P (Q b)
> g =>< f = fmap (>>= f) g

The first just uses fmap to apply the function. I'm using the = sign as a convention that the function is to be applied not at the top level but one level down within the datastructure. The second is simply a monadic version of the first. The reason we need the latter is that we're going to have systems that have both kinds of uncertainty - classical probabilistic uncertainty as well as quantum uncertainty. We'll also want to use the fact that P is a monad to convert doubly uncertain events to singly uncertain ones. That's what join does:

> join :: P (P a) -> P a
> join = (>>= id)

OK, that's enough ground work. Let's investigate a physical process that can be studied in the lab: the Quantum Zeno effect, otherwise known as the fact that a watched pot never boils. First an example related to snot1:

> zeno1 n = return True ->- repeatM n (rotate (pi/fromInteger n)) ->- collect ->- observe

The idea is that we 'rotate' our system through a total angle of π, but we do so in n stages of π/n each. The fact that we do it in n stages makes no difference, we get the same result as doing it in one go. The slight complication is this: suppose we start with a probabilistic state of type P a. If we let it evolve quantum mechanically it'll turn into something of type P (Q a). On observation we get something of type P (P a). We need join to get a single probability distribution of type P a. The join is nothing mysterious, it just combines the outcome of two successive probabilistic processes into one using the usual laws of probability.

But here's a variation on that theme. Now we carry out n stages, but after each one we observe the system causing wavefunction collapse:

> zeno2 n = return True ->- repeat n (
>   \x -> x =>= return =>< rotate (pi/fromInteger n) =>= observe ->- join
>   ) ->- collect

Notice what happens. In the former case we flipped the polarity of the input. In this case it remains closer to the original state. The higher we make n the closer it stays to its original state. (Not too high, start with small n. The code suffers from combinatorial explosion.) Here's a paper describing the actual experiment. Who needs all that messing about with sensitive equipment when you have a computer? :-)
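
Here's a side-by-side comparison you can run (zenoCompare is my name, not from the original post): the first component does the rotation in 8 uninterrupted steps and ends up flipping the state, while the second, which observes after every step, keeps most of the probability on the original True state.

> zenoCompare = (zeno1 8, zeno2 8)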

A state of the form P (Q a) is called a mixed state. Mixed states can get a bit hairy to deal with as you have this double level of uncertainty. It can get even trickier because you can sometimes observe just part of a quantum system rather than the whole system like observe does. This inevitably leads to mixed states. von Neumann came up with the notion of a density matrix to deal with this, although a P (Q a) works fine too. I also have a hunch there is an elegant way to handle them through an object of type P (Q (Q a)) that will eliminate the whole magnitude squared thing. However, I want to look at the quantum Zeno effect in another way, one that ultimately gives a different way of dealing with mixed states. Unfortunately I don't have time to explain that fully today, but we can look at the general approach.

In this version I'm going to consider a quantum system that consists of the logical state in the Zeno examples, but that also includes the state of the observer. Now standard dogma says you can't form quantum states out of observers. In other words, you can't form Q Observer where Observer is the state of the observer. It says you can only form P Observer. Whatever. I'm going to represent an experimenter using a list representing the sequence of measurements they have made. Represent the complete system by a pair of type ([Bool],Bool). The first element of the pair is the experimenter's memory and the second element is the state of the boolean variable being studied. When our experimenter makes a measurement of the boolean variable, its value is simply prepended to his or her memory:

> zeno3 n = return ([],True) ->- repeatM n (
>   \(m,s) -> do
>     s' <- rotate (pi/fromInteger n) s
>     return (s:m,s')
>   ) ->- observe =>= snd ->- collect

Note how we now delay the final observation until the end when we observe both the experimenter and the poor boolean being experimented on. We want to know the probabilities for the final boolean state so we apply snd so as to discard the state of the observer's memory. Note how we get the same result as zeno2. (Note no mixed state, just an expanded quantum state that collapses to a classical probabilistic state.)
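
Again, here's a quick check you can run (zenoCheck is my name, not from the original post); both components should give essentially the same probability distribution for the final boolean state:

> zenoCheck = (zeno2 6, zeno3 6)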

There's an interesting philosophical implication in this. If we model the environment (in this case the experimenter is part of that environment) as part of a quantum system, we don't need all the intermediate wavefunction collapses, just the final one at the end. So are the intermediate collapses real or not? The interaction with the environment is known as decoherence and some hope that wavefunction collapse can be explained away in terms of it.

Anyway, time for you to go and do something down-to-earth like gardening. Me, I'm washing the kitchen floor...


I must mention an important cheat I made above. When I model the experimenter's memory as a list I'm copying the state of the measured experiment into a list. But you can't simply copy data into a quantum register. One way to see this is that unitary processes are always invertible. Copying data into a register destroys the value that was there before and hence is not invertible. So instead, imagine that we really have an array that starts out zeroed out and that each time something is added to the list, the new result is xored into the next slot in the array. The list is just non-unitary-looking, but convenient, shorthand for this unitary process.


Sunday, February 25, 2007

Monads for vector spaces, probability and quantum mechanics pt. I

I've been enjoying Eric Kidd's articles on probability theory with Haskell. So I thought I'd follow them up with two things: (1) finding the underlying algebraic structure and (2) showing how it's general enough to do more than just probability.

Firstly, Eric found a really neat factoring of the probability monad as a 'product' of two monads: the Perhaps monad and the List monad. This factoring can be interpreted algebraically.

A monoid is defined to be a set m with a binary operator (typically written as abuttal, ie. the product of a and b is ab) that is associative, and with an element, 1, that is an identity for the binary operator. The List monad gives rise to the monoid freely generated by a set, in other words it defines the smallest monoid containing the set and with no extra relationships between the elements that don't come from the definition of a monoid. The binary operator for List is called ++ and we embed the original set in the monoid using the function \x -> [x], otherwise known as return. [] is the identity. It's not hard to see that we have associativity as (a ++ b) ++ c == a ++ (b ++ c). If we use Set instead of List then we get the free commutative monoid instead, ie. one where a+b=b+a.

(Eh? Where's my paragraph on m-sets. Don't tell me I have to format all those HTML subscripts again! I hate it when computers do that. I wrote a paragraph and it just vanished. Oh well, maybe version 2.0 will be better. But without subscripts this time...)

If m is a monoid then an m-set is a set s with an action of m on it. An action of a monoid on a set is a scheme that converts each element of the monoid, say g, into a map g*:S→S so that 1* is the identity map and g*h*=(gh)*. The type Writer m s corresponds to pairs in s×m but this also doubles as the free m-set. We simply define f*(x,g)=(x,fg). It's free because we never get f*(x,f')=g*(x',g') unless x=x', so we never get any surprising equalities that aren't inherent in the definition of an m-set. Perhaps, which is actually a pseudonym for Writer Float, defines an R-set, where R is the monoid of reals under multiplication.

In an earlier post I showed how you could layer up algebraic structures. There I showed that you could combine a commutative monoid structure with a monoid structure to get a semiring. So what happens if we layer up a commutative monoid and an m-set? We'll get something that has a commutative binary operator but that can also be acted on by elements of m. Specialise to the case when m is a field like the real or complex numbers. Does this sound like a familiar structure? I hope so, it's a description of a vector space. What Eric Kidd has shown is that we can build a vector space monad by applying an m-set monad transformer (with m a field) to a commutative monoid monad. (This isn't a completely trivial fact and it depends on the fact that the definition of PerhapsT handles distributivity correctly.) I think that's pretty neat. But I'm getting slightly ahead of myself here as I haven't shown what Eric's stuff has to do with vector spaces.

A probability distribution can be thought of as a vector with outcomes forming a basis. Any distribution attaches a probability 'weight' to each possible outcome in the same way that a vector can be written as a weighted sum of basis elements. Each time we have a probabilistic transition, we're effectively multiplying our distribution vector by a stochastic matrix. Eric (and others)'s monad allows you to write these matrix multiplications in a very natural way.

A vector space forms a monad in a straightforward way. If V(B) is the vector space generated by basis B, and V(C) is another vector space, then any function B→V(C) can be lifted to a linear map V(B)→V(C). This lift is clearly of type (B→V(C))→(V(B)→V(C)). With a little flip we get V(B)→(B→V(C))→V(C). This is the main step in showing V to be a monad. And what the probability monad allows us to do is write our stochastic matrices in terms of what happens to the individual basis elements (ie. outcomes) instead of having to write out the entire matrix.
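
Before getting to real code in the next installment, here's a minimal sketch of that lift in Haskell (entirely my own illustration - the names V and bindV aren't from Eric's code or from anything that follows):

> -- V c b: a formal linear combination of basis elements of type b
> -- with coefficients drawn from c
> newtype V c b = V [(b,c)] deriving Show

> -- the flipped lift, V(B) -> (B -> V(C)) -> V(C), ie. the shape of monadic bind:
> -- apply f to each basis element and scale the result by that element's coefficient
> bindV :: Num c => V c b -> (b -> V c b') -> V c b'
> bindV (V xs) f = V [ (y, p*q) | (x,p) <- xs, let V ys = f x, (y,q) <- ys ]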

Anyway, this is all just waffle and it really needs some code to make it more concrete. I have an ulterior motive here. It's not just probability theory that is described by vector spaces. So is quantum mechanics. And I plan to work my way up to defining quantum computers and implementing a bunch of well known quantum algorithms in Haskell. In the next installment I expect to get at least as far as writing the code to play with the square root of NOT operator.

PS This idea of layering up algebraic structures is one I haven't found in the textbooks yet. I'm guessing it's in Mac Lane, but I haven't yet justified the cost of that book to myself. But I don't actually have a clue what a category theorist would call a monad transformer and don't recall reading about them in any texts that weren't specifically about computer science. Maybe someone else can fill me in. I do know that it's absolutely standard to get free algebraic structures from monads.

Errata: As notfancy points out, I should really be talking about multisets rather than sets, otherwise we have a+a=a. As programmers often use lists to represent both sets and multisets it can sometimes get confusing.


Sunday, February 18, 2007

The Essence of Quantum Computing

A popular view of quantum computation is that its power (at least the power of certain well known quantum algorithms) derives from a form of parallel computation coming from the fact that the state of a quantum computer can be a superposition of many classical states. I want to argue that this is in fact a bit of a red herring because there are other models of computation that allow formal superpositions and yet clearly don't provide a form of parallel computation. Instead, the essence of quantum computing is derived from a kind of non-convexity property and this is the common feature of all of the well known quantum algorithms that isn't shared by classical algorithms.

Before continuing there's something important I should say. Let me quote Bennett et al. in Strengths and Weaknesses of Quantum Computing

"The power of quantum computers comes from their ability to follow a coherent superposition of computation paths."

In other words, what I am saying is in direct contradiction with what is said by some of the founding fathers of quantum computing. Actually, I think that's a good thing, it means that whether I'm right or wrong, I must be saying something non-trivial.

Consider a probabilistic computer. We can represent the probability of its being in any state by an element of a vector space whose basis is the set of possible states and where the coefficient of each basis element is the probability of the computer being in that state. Given any pair of vectors v1 and v2 that represent possible distributions of states, av1+bv2, where a,b>=0 and a+b=1, is also a possible distribution. In other words we can form superpositions exactly analogously to the way we can form superpositions of quantum computer states. The mathematics that describes the notion of a quantum computer performing computations in parallel by superposing N states is formally almost identical to the mathematics describing a probabilistic computer as a superposition of states.

So if we can superpose N states in both classical probabilistic computers and quantum computers, why does neither seem to offer an N times speedup? In the case of quantum computers one popular explanation is that even though we run N states in parallel, we can't perform I/O with those N states in parallel. When we observe a quantum computer we cause a wavefunction collapse (it doesn't matter whether you are a believer in such collapse, that is how we model quantum computers) and instead we make a single observation from a probability distribution. Mathematically this is identical to what happens in a probabilistic machine - a superposition is nondeterministically mapped into one of many definite states.

So whatever the difference is between probabilistic computers and quantum computers, it doesn't lie in the fact that you can form superpositions. Mathematically speaking, probabilistic computers and quantum computers are almost identical in this respect.

But we know that quantum algorithms can do things that no probabilistic algorithm can, so there must be some difference. There is of course. Here's an approach to seeing what it is.

Consider a probabilistic branching process:


        __ A
       /
      2
     / \__ B
    1
     \  __ A
      \ /
       3
        \__ C


Starting at state 1 we can either go to state 2 or state 3, and from there we end up at either A, B or C. Assume that the choice of branch is nondeterministic and that we can assign probabilities to each branch. Suppose also that the branch probabilities are all non-zero. The important thing to note in that diagram is that there are two ways to reach A. The probability of reaching A is the sum of the probabilities for each of the different ways to reach A. The key point I want to make here is that A is more likely because there are two ways to reach it. If I were to replace one or other of the A's with a D then the likelihood of finding our process in state A at the end would be reduced.

But in quantum mechanics we replace probabilities with complex 'amplitudes'. When we make our observation at the end, the probability of any outcome is given by the modulus of the amplitude squared. The amplitudes still add, just as in probability theory. But this has a very important consequence. If the amplitudes for the two different ways to reach A lie in opposite directions in the complex plane, then the probability of ending up in state A after observation is reduced by having two ways of reaching A. This is completely different to the probabilistic case. We can paradoxically make something less likely by adding a new way to achieve it. This is where the essence of quantum computing lies.
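
A toy calculation makes the contrast stark (this is my own illustration, not from the original post). Give each of the two routes into A a classical probability of 0.25, or alternatively a quantum amplitude of magnitude 0.5 but with opposite signs:

> probToA :: Float
> probToA = 0.25 + 0.25       -- probabilities add: two routes make A more likely

> ampToA :: Float
> ampToA = 0.5 + (-0.5)       -- opposite amplitudes add to zero

> obsProbA :: Float
> obsProbA = ampToA ^ 2       -- so the observed probability of A is zero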

Pick up just about any paper describing a quantum algorithm and you'll see that the important step is a 'conspiracy' to make the amplitudes of 'undesirable' states sum to zero (or at least something small). Probably the simplest example is counterfactual computation where it is arranged that the computation is executed along two different branches with equal and opposite amplitudes. The neatest example is probably Shor's cycle discovery algorithm that forms the basis of his factoring algorithm. By clever use of the Fourier transform he arranges that branches leading to non-cycles cancel each other out leaving only the periods of cycles with high amplitudes.

However, I think it's important to stress that given any individual quantum computer Bennett et al. and I probably don't disagree on what it actually does. This is an issue of semantics. But semantics does make a difference. The notion of a quantum computer as a super-parallel machine is one being promulgated in the popular science press and I think this causes much confusion with people expecting all kinds of algorithms to run in O(1) time.

So why do people talk about parallel computation if that's not what quantum computers do? I think it's because people aren't comfortable with quantum computers yet. This means they don't want to think of quantum computers as something fundamental and irreducible and instead want to reduce them to something classical. The easiest classical model is the parallel one and so they 'reduce' it to that.

PS This is essentially a specialisation of an old post to the context of quantum computing.
