Saturday, June 14, 2008

Categories of polynomials and comonadic plumbing

Suppose you have a Haskell program and you want to introduce a new global constant into your program. There are at least two approaches you could take:

  1. Simply introduce a new global constant. You could name it x and write something like x=1.23456 and refer to x throughout your code. This has the advantage of being easy to implement.
  2. Write all of your code in monadic style and make use of the reader monad. This is intrusive in the sense that you may have to make many changes to your code to support it. But it has the advantage that all of your functions now explicitly become functions of your global constant.

Now I’m going to roughly sketch a more categorical view of both of these approaches. So let’s restrict ourselves to the subset of Haskell that corresponds to typed lambda calculus without general recursion so that we know all of our functions will be total and correspond to the mathematical notion of a function. Then all of our functions become arrows in the category that we’ll call Hask.

Firstly consider approach (1). Suppose we want to introduce a new constant, x, of type A. Category theory talks about arrows rather than elements of objects, so instead of introducing x of type A, introduce the function x:1->A where 1 is the terminal object in Hask, normally called (). An element of A is the same thing as an element of 1->A, but in the latter case we have an arrow in the category Hask.
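To see what this looks like in Haskell notation, here's a tiny sketch (the names c and cArrow are mine, and this isn't part of the literate code below): a constant of type Double, and the same constant packaged as an arrow from ():

c :: Double
c = 1.23456

cArrow :: () -> Double      -- an arrow 1 -> A in Hask, with A = Double
cArrow () = 1.23456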

Before continuing, let me digress to talk about polynomials. Suppose we have a ring (with an identity) R. We define R[x], where x is an indeterminate, to be the ring of polynomials in x. Another way to describe that is to say that R[x] is the smallest ring containing R and an indeterminate x, that makes no assumptions about x other than those required to make R[x] a ring. For example we know that (1+x)(1-x) = 1-x², because that must hold in any ring. Given a polynomial p in R[x] we can think of it as a function f_p from R to R. f_p(a) is the value we get when substituting the value of a for x in p. So a polynomial in R[x] is the same as a function from R to R that can be written in terms of elements of R, multiplication and addition.

We can do the same with category theory. Given a category A we can ask for the smallest category extending A and containing an indeterminate arrow x:1 -> A. Just as with polynomials we have to allow all possible arrows that can be made by composing arrows of A with x. The resulting expressions for arrows will contain x as a free variable, just like the way x appears in polynomials. In fact, by analogy we can call the resulting category, A[x], the category of polynomials in x:1->A. In the special case A=Hask, you can see that Hask[x] is the category of Haskell functions extended by a new constant x:1->A, but assuming no equations other than those necessary to make Hask[x] a category. Just as an arrow in Hask is a Haskell function, an arrow in Hask[x] is a Haskell function making use of an as yet undefined constant x.

(I've glossed over some subtleties. Just as we need a suitable equivalence relation to ensure that (1+x)(1-x)=1-x² in R[x], we need suitable equivalence relations in our category. I'll be showing you where to find the missing details later.)

Here's the implementation of a function, h, making use of a constant x:

(Note that I'll be using Edward Kmett's category-extras shortly so I need some imports)


> import Control.Monad.Reader
> import Control.Comonad
> import Control.Comonad.Reader

> x = 1.23456

> f a = 2*a+x
> g a = x*a
> h a = f (g a)

> test1 = h 2


Now consider the second approach. The easiest thing is to just give an implementation of the above using the reader monad:


> f' a = do
>   x <- ask
>   return $ 2*a+x

> g' a = do
>   x <- ask
>   return $ x*a

> h' a = return a >>= g' >>= f'

> test2 = runReader (h' 2) 1.23456


Note how, as is typical in monadic code, I have to plumb f' and g' together using >>= so that 1.23456 is passed through f' and g'. Previously I've described another way to think about the composition of monadic functions. Using >>= we can compose functions of type a->m b and b->m c to make a function of type a->m c. The result is that given a monad we can form the Kleisli category of the monad. The objects are the same as in Hask, but an arrow from a to b in the Kleisli category is an arrow of type a->m b in Hask. It's not hard to show this satisfies all of the axioms of a category. When we program in the reader monad it's a bit like we've stopped using Hask and switched to the Kleisli category of the reader monad. It's not quite like that because we used functions like +. But in theory we could use lifted versions of those functions too, and then we'd be programming by composing things in the Kleisli category. If we call the reader monad R then we can call the corresponding Kleisli category Hask_R. (Strictly speaking that R needs a subscript telling us the type of the value we intend to ask for.)
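To make the Kleisli picture a little more concrete, here's a small sketch (not part of the literate program; composeK and liftedPlus are my own names, and liftM2 comes from Control.Monad): composeK is composition of Kleisli arrows a -> Reader r b, and liftedPlus shows how an ordinary function like + can be lifted so that even the arithmetic lives in the Kleisli world.

composeK :: (b -> Reader r c) -> (a -> Reader r b) -> (a -> Reader r c)
composeK f g = \a -> g a >>= f

liftedPlus :: Num a => Reader r a -> Reader r a -> Reader r a
liftedPlus = liftM2 (+)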

So here's the important point: Hask[x] is the same category as Hask_R. In both cases the arrows are things, which when supplied a value of the right type (like 1.23456), give arrows in Hask from their head object to their tail object.

But there's another way to do this. We can use the reader comonad:


> f'' a = 2*extract a+askC a
> g'' a = extract a*askC a
> h'' a = a =>> g'' =>> f''

> test3 = runCoreader (h'' (Coreader 1.23456 2))


In a similar way, we're dealing with arrows of the form w a -> b and we can compose them using =>>. These arrows form the coKleisli category of the reader comonad, S, which we can write Hask_S. So we must have
Hask[x] ≅ Hask_R ≅ Hask_S.
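If you don't have category-extras to hand, here's a self-contained sketch of the kind of thing Coreader is (the primed names are mine, and the real library may differ in detail), just to make composition of arrows of the form w a -> b concrete:

data Coreader' e a = Coreader' e a

extract' :: Coreader' e a -> a
extract' (Coreader' _ a) = a

askC' :: Coreader' e a -> e
askC' (Coreader' e _) = e

-- coKleisli composition: the environment e is threaded through both arrows
cocompose :: (Coreader' e b -> c) -> (Coreader' e a -> b) -> (Coreader' e a -> c)
cocompose f g w@(Coreader' e _) = f (Coreader' e (g w))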


Now some back story. Over 20 years ago I was intrigued by the idea that logic might form a category with logical ‘and’ and ‘or’ forming a product and coproduct. I came across the book Introduction to Higher Order Categorical Logic by Lambek and Scott for £30.00. That’s £60.00 at today's prices, or about $120.00. On a student grant? What was I thinking? And as it bore no relation to anything I was studying at the time, I barely understood a word of it. I was probably fairly applied at that point doing courses in stuff like solid state physics and electromagnetism as well as a bit of topology and algebra. I doubt I'd heard of lambda calculus though I could program in BASIC and APL. So there it sat on my bookshelf for 22 years. Periodically I’d look at it, realise that I still didn’t understand enough of the prerequisites, and put it back on the shelf. And then a month or so ago I picked it up again and realised that the first third or so of it could be interpreted as being about almost trivial Haskell programs. For example, on page 62 was

Proposition 7.1
The category A[x] of all polynomials in the indeterminate x:1->A over the cartesian or cartesian closed category A is isomorphic to the Kleisli category A_A = A_{S_A} of the cotriple (S_A, ε_A, δ_A).

The language is a little different. Lambek and Scott used the term cotriple instead of comonad and Kleisli category where I’d say coKleisli category. δ and ε are cojoin and coreturn. And Lambek and Scott's theorem applies to any cartesian closed category. But after staring at this claim for a while it dawned on me that all it was really saying was this: here are two ways to introduce new constants into a category. But there’s no way I would have seen that without having practical experience of programming with monads. Learning Haskell has finally paid off. It’s given me enough intuition about category theory for me to get some return on my £30.00 investment paid to Heffers all those years ago. I expected to take this book to my deathbed, never having read it.

Anyway, for the details I left out above, especially the correct equivalence relation on Hask[x], you'll just have to read the book yourself.

Also, note the similarity to the deduction theorem. This theorem says that if we can prove B, assuming A, then we can deduce A implies B without making any assumptions. It unifies two ways to introduce a proposition A, either as a hypothesis, or as an antecedent in an implication. In fact, the above theorem is just a categorical version of the deduction theorem.

Also note the connection with writing pointfree code. In fact, the pointfree lambdabot plugin makes good use of the reader monad to eliminate named parameters from functions.

I’m amazed by seeing a book from 1986 that describes how to use a comonad to plumb a value through some code. As far as I know, this predates the explicit use of the reader monad in a program, Wadler and Moggi’s papers on monads, and certainly Haskell. Of course monads and comonads existed in category theory well before this date, but not, as far as I know, for plumbing computer programs. I’d love to hear from anyone who knows more about the history of these ideas.


Saturday, June 30, 2007

Monads from Algebra and the Gray Code from Groups

There is a close association between algebraic structures and monads. I've mentioned this in passing before but I think it's interesting to work out some details. Among other things, it gives us a way of converting your favourite algebraic structures into domain specific languages embedded in Haskell.

First an aside. Haskell monads are in some sense only an approximation to mathematical monads. In fact, Haskell is only an approximation to mathematics. It's easy to define a Haskell function f, say, such that x == y but f x /= f y. Once you do such a thing you can no longer reason about Haskell functions with the assumption that == represents equality in the mathematical sense. (For an example, see Saizan's comment here.) So in the following I'm going to assume that we've limited ourselves to functions such that x == y implies that f x == f y, x < y and y < z imply x < z and so on. I'd love to see some way, in Haskell, to make explicit 'promises' about the properties of functions. (I guess that's what they call programming by contract.) But for the following we'll just assume it holds.

The first abstract algebraic structure that people study is usually the group, but I'm going to start with something simpler, the monoid. A monoid is a set M (called the underlying set of the monoid) with a binary operator, ·:M×M→M and identity element e such that e·x = x·e = x and x·(y·z) = (x·y)·z for all x,y,z in M. By abuse of notation we'll often use M for the name of the monoid as well as the underlying set. Monoids are easy to come by and there are lots of examples. Any time you have a sequence of operations that can be applied, or not, in any order, to some system, you have a monoid. Sequences of operations form the underlying set of the monoid and the binary operator means "do what's on the right" followed by "do what's on the left". For example the set of uniform scaling operations we can perform on a circle forms a monoid. Write a scaling by a factor of x as s(x). If we double the size of a circle, and then triple it we have s(2) followed by s(3), written s(3)·s(2). Note that we have s(m)·s(n) = s(mn) and that s(1)·s(m)=s(m)·s(1)=s(m) so s(1) plays the role of the identity, e. Also note that for all m apart from 0 we have s(m)·s(1/m)=e so every scaling apart from s(0) has an inverse. Note however that this monoid is special because for all x and y in it, x·y=y·x. This doesn't always hold in monoids and when it does hold our monoid is said to be commutative.
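Here's that scaling example written out as Haskell, just as a sketch (the names are mine and this isn't part of the literate code below):

newtype Scaling = S Double deriving Show

identityS :: Scaling                        -- the identity e = s(1)
identityS = S 1

composeS :: Scaling -> Scaling -> Scaling   -- s(x)·s(y) = s(x*y); clearly associative
composeS (S x) (S y) = S (x*y)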

If we have a type of algebraic structure then we can often form "free" versions of that structure. For monoids we get free monoids. There are many different ways of looking at "freeness" and I'm going to go through some of them informally:


  1. Given any monoid there are equations satisfied by its elements. From the above example we have that x·y=y·x and also that x·e·x=x·x. But notice how there is a big difference between these equations. The former doesn't hold in all monoids, but the latter does. A monoid is said to be free when the only equations that hold follow from equations that hold in every monoid. You can think of a free monoid as being a generic monoid. It has no special properties above and beyond being a monoid. Given a set S, the free monoid generated by S is the smallest free monoid containing the set S and with no equations relating the elements of S. Write this monoid as FS. For example, suppose S = {x,y}. Then we know that e, x and y are all in FS. We also know that x·x, x·y, y·x and y·y are all in FS. Importantly we know that all 4 of these elements are distinct because we know there can be no equations connecting them beyond those that define a monoid. In fact, it's not hard to see that the elements of FS are essentially just the (possibly zero length) strings of x's and y's.
  2. Given a set S, consider its elements to be 'labels' for unknown variables. In fact, just use the elements of S as variables. Then we can also consider the free monoid generated by S to be the set of "monoid expressions" in these unknowns. By "monoid expression" we just mean functions of these variables that can be written using the language of monoids, ie. the symbols e and ·. If S = {x,y} then examples of such expressions are e, x, y, x·y and so on. I hope it's not hard to see that this is simply another description of the same thing as in the previous paragraph.
  3. Another way to think of free monoids requires a tiny bit more algebra. Define a monoid homomorphism from one monoid, M, to another, N, to be a function f:M→N on the underlying sets such that f(e)=e and f(x·y)=f(x)·f(y). A bijective monoid homomorphism is called an isomorphism. If there is an isomorphism between two monoids then in some sense they are the same monoid. Note that e and · are being overloaded here - we have two different monoids and these symbols have different meanings in each one. Now we can define the free monoid generated by S to be the monoid, M, such that (1) there is a function i:S→M (2) given any monoid N, and any function f:S→N, then f can be factored as f' o i where f' is a monoid homomorphism. (There's a short Haskell sketch of this just after the list.)
  4. Very vaguely, the set S lives in the category of sets. But FS lives in the category of monoids. FS is the closest approximation to S in the category of monoids. The previous property gives a sense of what this means: anything that can be said about a function from S to a monoid N can be said in the language of monoids using a homomorphism from FS to N. Don't worry if this is too vague as I won't be using it below - but it may click in some people's minds.
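To make description (3) concrete, here's a sketch using lists as the free monoid FS (the names i and extendHom are mine, and mconcat comes from Data.Monoid):

import Data.Monoid

i :: s -> [s]                                      -- the function i : S -> FS
i x = [x]

extendHom :: Monoid n => (s -> n) -> ([s] -> n)    -- the homomorphism f'
extendHom f = mconcat . map f

Here extendHom f . i = f, and extendHom f really is a monoid homomorphism: it sends [] to mempty and ++ to mappend.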


The important thing here is that the operator F, ie. the "free monoid generated by" operator, forms a monad. It's tempting to think that monads got their name from monoids for this reason, but I don't think it is because just about any "free X generated by" operator forms a monad. Haskell monads give rise to DSLs via do-notation. So this means that algebraic structures give rise to monads, which may give rise to Haskell monads, and hence DSLs.

So now to explain why F forms a monad. Note that just about everything I say here works with other kinds of algebraic structures ranging from monoids through groups to vector spaces. Consider our set S={x,y} above. For convenience, let's drop the · and write the binary operator as multiplication in the usual way. Given an element of S we can easily get an element of FS. In fact, we have an embedding i:S→FS with i(x) = x. This is an abuse of notation. The x on the LHS is an element of S and the x on the RHS is a variable labeled with the symbol x, but we also write this as x because v_x, or something like that, would be a pain in the ass to keep writing.

Now think about the elements of F(FS). These are strings of elements of FS, ie. strings of monoid expressions in x and y. We could write a typical member as (xyx)(e)(xxy) where the xyx, e and xxy are each elements of FS and I'm using parentheses to make clear which bits they are. It's tempting to say that this is simply xyxxxy, but that would be wrong. The former is an element of F(FS) and the latter is an element of FS. It would be clearer that these weren't equal if we used the v notation to write v_{xyx}v_{e}v_{xxy}. But in this case the obscurity is to our advantage. Even though (xyx)(e)(xxy) doesn't equal xyxxxy, the notation strongly suggests that we define a function that maps the first to the second. In fact, we can define a function m:F(FS)→FS that essentially just erases parentheses.
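In list terms (which is where this is heading), m is just concatenation. For instance, the element (xyx)(e)(xxy) of F(FS) flattens like this:

concat [["x","y","x"], [], ["x","x","y"]]  ==  ["x","y","x","x","x","y"]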

So we have functions i:S→FS and m:F(FS)→FS. Looks suspiciously like a monad. In fact, i and m satisfy the monad laws (exercise!) and make F into a monad.

So what monad is this? It's probably not hard to guess: elements of FS are finite strings of elements of S. So it's essentially the List monad. Unfortunately, Haskell allows you to form infinite lists and so the correspondence isn't 100% precise. Nonetheless, it's good enough that if you hadn't already invented the List monad (as in Haskell monad), you probably would have if you had considered the free monoid monad (as in mathematical monad). i is just the embedding \x -> [x] and m is concat, which essentially just erases brackets. So you can think of [x,y,z] as the Haskell way to write the monoid element xyz.

In fact, if you repeat the above discussion with other algebraic structures that I and others have discussed you'll get other familiar monads (modulo details like being able to create infinite lists). Here's a table (with apologies for bad formatting caused by blogger.com):









Algebraic Structure                DSL
-------------------                ---
Monoid                             Combinatorial search
M-Set                              "Write-only" side effects
Vector Space                       Probability theory/Quantum Mechanics
Semirings                          Two player game tree search
Modules over tropical semiring*    Min-sum algorithm
Group                              ???


(Note that the above table is approximate in the sense that sometimes you may need to restrict to instances of Eq and tweak the code slightly to make the Haskell monad behave like the mathematical one.)

Notice the gap in the "group" row. A group is a monoid in which every element has a left and right inverse. The monad is the "free group generated by" monad and I'll call it G. It doesn't correspond to any of the standard Haskell monads. So firstly - if free monoids give lists, what does the underlying datatype for free groups look like? Go back to our set S = {x,y}. GS contains e, x and y as well as all of the products of x and y that appear in FS. But additionally GS contains the inverses of x and y, x⁻¹ and y⁻¹. And of course you need all strings of x, y and their inverses. But do you need more? What about (xy)⁻¹y? Well we can expand out the inverse using (xy)⁻¹ = y⁻¹x⁻¹. The net effect is that the free group contains precisely all (possibly empty) strings of x, y, x⁻¹ and y⁻¹, where substrings like xx⁻¹ and x⁻¹x are removed. We can model this in Haskell using the following type:


> import Control.Monad
> data Group a = G [Either a a] deriving Show


We're using Left x to represent x and Right x to represent x⁻¹. We implement >>= so that it uses the inverse of product formula above. It also ought to cancel out terms like xx⁻¹ but to do that requires that we restrict the monad to instances of Eq and if I do that I risk castigation from Saizan. So I'll leave out the cancellation to get the following Haskell monad:


> instance Monad Group where
>   return x = G [Left x]
>   G x >>= f = G $ concatMap g x where
>     g (Left y)  = let G u = f y in u
>     g (Right z) = let G u = f z in reverse (map (either Right Left) u)


But what purpose might this serve? Try these:


> test1 = sequence $ replicate 4 (G [Left 0,Left 1])
> test2 = sequence $ replicate 4 (G [Left 0,Right 1])


The first does something almost identical to what you might expect from the ordinary list monad. But note the ordering in the second. This is the Gray code monad! By using Left and Right we can control the direction in which future combinatorial searches are carried out. Admittedly not the most useful monad, but it's curious that we do get something recognisable from this abstract nonsense. And maybe you spotted the "beautiful" Haskell implementation of the power set function on reddit recently. Here's a cute twist on that:


> powerset1 = filterM (const $ G [Left False,Left True])
> powerset2 = filterM (const $ G [Left False,Right True])


The first gives a result similar to the original, but the second lists the subsets in such a way that each element differs from the previous one by one element.

I wonder what other algebraically motivated monads are waiting to be discovered.

And I hope to write something about deriving comonads from coalgebras, as soon as I've read this. It looks really intriguing. Problem is, I'm having trouble making sense of it.

And sorry, this is a bit incoherent today. I only get a short period of time to write this stuff up and I blew most of mine this week in traffic jams. As usual, if the text makes no sense, you can be sure the code above works as I just tested it. But ask questions and complain about errors...

* The tropical semiring is the proper name for the (R,min,+) semiring I talked about earlier.

Update: Saizan's doing a great job of keeping me honest. He noticed that I'd omitted a step. I was originally finding the inverse of a free group element by simply reversing the elements in the product. But I was failing to flip the polarities of each of the elements (so to speak) despite having spelled out in the text exactly what I needed to do. The code is now fixed.


Saturday, June 16, 2007

How to write tolerably efficient optimisation code without really trying...

...or the (freely generated module over the (min,+)-semiring)-monad. (But if I used that as a title you probably wouldn't be reading this.)

There's a large class of optimisation problems that can be considered to be about graph searching. This can include anything from finding the shortest distance between two cities, through finding the edit distance between two strings to finding the most plausible reconstruction of a signal corrupted by noise. In each case there's an obvious implementation of a solution in Haskell: use the list monad to generate all of the paths in the search space and then find the optimal path. The problem with this is that generating all paths can result in combinatorial explosion with exponential running times. So this post is about what I think is a novel monad to fix that problem. It gives a simple DSL for specifying a wide range of optimisation problems with polynomial instead of exponential complexity. In some cases it results in something close to the best published algorithm with little effort on the part of the user.

This is quite a neat and useful monad. So if the following is hard going, skip to the examples first for some motivation. I'd hate people to miss out on something that turns out to be fairly easy to use once it's set up.

Unfortunately, despite the expressiveness of Haskell making this possible, we have to fight Haskell on two fronts. One is the old annoyance of Haskell not supporting restricted monads. For a solution to that I'll refer you to Eric Kidd who used an Oleg trick. So let's get the first one out of the way before talking about graphs. Here's a preamble that replaces the Prelude's monad class with our own:


> {-# LANGUAGE MultiParamTypeClasses,
>              NoImplicitPrelude,
>              UndecidableInstances #-}

> import Debug.Trace
> import Prelude hiding ((>>=),(>>),return)
> import Data.Monoid
> import Data.Map (toList,fromListWith)

> class Monad1 m a where
>   return :: a -> m a

> class (Monad1 m a, Monad1 m b) => Monad2 m a b where
>   (>>=) :: m a -> (a -> m b) -> m b
>   (>>)  :: m a -> m b -> m b
>   l >> f = l >>= \_ -> f

> class MonadPlus m where
>   mzero :: m a
>   mplus :: m a -> m a -> m a


So here's a graph with lengths marked on the edges:

Suppose we want to find the shortest path from A to F in 3 hops. Before solving that, consider a more general problem. In the following graph

suppose we wish to find the shortest n-hop paths from A to the C_i. No matter what complexity is in the cloud, each of the paths must travel through one of the B_i. So we can solve the shortest paths problem by finding the shortest (n-1)-hop paths to the B_i and using:

dist_n(A,C_i) = min_j (dist_{n-1}(A,B_j) + d_ji)

Here's the crucial point to notice: if we consider dist_n(A,B_i) to be a vector whose indices are labelled by i, and d_ji to be a matrix, then the above formula is an ordinary matrix multiplication except that instead of addition and multiplication we use min and addition. If you've studied graph algorithms you'll recognise this as the same multiplication that appears in the Floyd-Warshall algorithm. If we extend the real numbers with infinity, then (R∪∞,min,+) forms a semiring. (See Cormen, Leiserson & Rivest for some good discussion of semirings in this context.) Just as you can form vector spaces over a field, you can form modules over this semiring. And just as vector spaces form a monad, so do these modules. But you don't need to know much about semirings because the monad itself is a fairly straightforward object.
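Here's a tiny standalone sketch of that 'matrix multiplication' over the (min,+) semiring (the name and the list-of-lists representation are mine, not the representation used below): it's the usual matrix-times-vector formula with addition replaced by min and multiplication replaced by addition.

-- d !! j !! i plays the role of d_ji and dist !! j the role of dist_{n-1}(A,B_j);
-- the i-th entry of the result is dist_n(A,C_i).
minPlusApply :: [[Double]] -> [Double] -> [Double]
minPlusApply d dist =
  [ minimum [ dist !! j + d !! j !! i | j <- [0 .. length dist - 1] ]
  | i <- [0 .. length (head d) - 1] ]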

Here's that infinity I was talking about:


> infinity :: Fractional a => a
> infinity = 1/0


I have to admit, I enjoyed writing that line of code :-) More importantly, Haskell does the right thing with infinities.

The monad is going to represent the length of a path to a graph vertex. Here's the definition:


> data M l p a = M { runM :: [(a,p,l)] } deriving (Eq,Show,Ord)


Ignore the l part for now, that's going to serve as a place to make a log, like the Writer monad. We'll need that when we want the actual shortest path, not just its length. For now we'll just use an empty list. So suppose that at some stage in our computation we've reached the node B with a shortest path of length 3 and node F with a shortest path of length 4. We represent this as
u = M [(B,3,[]),(F,4,[])] :: M [()] Float Vertex.
Suppose we now consider what happens at the next step. We can express the options from B with
v = M [(A,1,[]),(D,3,[])]

and those from F as
w = M [(D,3,[]),(E,3,[])].

To combine these with the earlier computed paths we could replace B and F in u with v and w respectively to get:
M [(M [(A,1,[]),(D,3,[])],3,[]),(M [(D,3,[]),(E,3,[])],4,[])] :: M [()] Float (M [()] Float Vertex).

We need to reduce this back down to an M [()] Float Vertex using the matrix 'multiplication' formula above to pick the appropriate shortest path each time. But note the pattern: it's the usual M (M a) -> M a monad pattern. So we can write the substitution+reduction as an implementation of >>=. Here's the implementation, along with some other monadic goodness:


> instance (Ord a,Monoid l,Num p) => Monad1 (M l p) a where
>   return a = M [(a,0,mempty)]

> instance (Ord a,Monoid l,Fractional p,Ord b,Ord p,Ord l) => Monad2 (M l p) a b where
>   M x >>= f = let
>       applyf    = [(f a,p,l) | (a,p,l) <- x]
>       rawresult = concat [[(a',(p+p',l `mappend` l')) | (a',p',l') <- a] | (M a,p,l) <- applyf]
>       reduced   = map (\(a,(p,l)) -> (a,p,l)) $ toList $ fromListWith min $ rawresult
>       filtered  = filter (\(_,p,l) -> p/=infinity) reduced
>     in M filtered

> instance MonadPlus (M l p) where
>   mzero = M mempty
>   mplus (M a) (M b) = M (a ++ b)

> guard True = return ()
> guard False = mzero

> choose x = M $ map (\(a,p) -> (a,p,mempty)) x


I've also had to provide my own implementation of MonadPlus. choose is a convenience function so we don't have to keep entering those empty lists.

So here's that graph represented in terms of a function that describes transitions from vertex to vertex:


> data Vertex = A | B | C | D | E | F deriving (Eq,Show,Ord)

> graph A = choose [(B,1),(C,2)]
> graph B = choose [(A,1),(D,3)]
> graph C = choose [(A,2),(D,4),(E,1)]
> graph D = choose [(B,3),(C,4),(E,2),(F,3)]
> graph E = choose [(C,1),(D,2),(F,3)]
> graph F = choose [(D,3),(E,3)]


And here's our first example. We compute all of the shortest 3-hop paths from A:


> ex0 :: M [()] Float Vertex
> ex0 = do
>   step1 <- return A
>   step2 <- graph step1
>   step3 <- graph step2
>   step4 <- graph step3
>   return step4


Just in case you think that's the wrong result note that these are the shortest paths to each accessible vertex that are exactly 3 hops long. If you want paths up to 3 hops long you have to add some zero length transitions from nodes to themselves. I'll leave that as a (trivial) exercise.
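For reference, here's one way to do it (a sketch; graphUpTo is my own name): give every vertex a zero-length transition to itself and walk the graph with that instead of graph.

> graphUpTo v = graph v `mplus` choose [(v,0)]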

But suppose you just want the shortest path to F. Then you can use:


> ex1 :: M [()] Float Vertex
> ex1 = do
>   step1 <- return A
>   step2 <- graph step1
>   step3 <- graph step2
>   step4 <- graph step3
>   guard $ step4==F
>   return step4


Note how simple this code is. We just write code to walk the graph. All of the optimisation machinery is hidden in the background. But there's a dark secret here because the code isn't yet doing what I claim. But before I admit what the problem is, let's modify this code so we can actually see what the shortest paths are. record is just like tell from the Writer monad. I've implemented >>= so that it does the right thing:


> record l = M [((),0,l)]

> ex2 = do
>   step1 <- choose [(A,0),(B,0)]
>   record [step1]
>   step2 <- graph step1
>   record [step2]
>   step3 <- graph step2
>   record [step3]
>   step4 <- graph step3
>   record [step4]
>   guard (step4==E || step4==F)
>   return (step1,step4)


Just for fun I made this example more complex. It finds all four shortest paths from anywhere in {A,B} to anywhere in {E,F}. As an exercise try modifying the above code so that it finds the one shortest path from {A,B} to {E,F}. It's also simple to modify it to deal with conditions like the traveller being at one of a particular set of nodes at a particular time. It's a DSL, you have considerable freedom to form whatever conditions you want.

And now it's time to reveal the secret. Although I wrote >>= so that it reduces the list of considered paths at each stage the above code still has exponential complexity as you'll rapidly discover as you increase the number of steps.

Consider the third monad law: (m >>= f) >>= g = m >>= (\x -> f x >>= g).

If we evaluate the left hand side for this monad we get a reduction after applying f and then another after applying g. But if we evaluate the right hand side we find that we get many small (and useless) reductions applied and then at the end a large reduction is applied. We don't get a big reduction at the intermediate step. The result: exponential complexity because the code ends up exploring every possible path performing stupid trivial reductions at each step. The monad law says we get the same result, but it doesn't take the same time.

And here's the really annoying thing: when you use do-notation in the usual way the Haskell compiler builds expressions like the slow right hand side. Bummer! But there's an easy, if tedious, fix. We have to use the third monad law as a rewrite rule to convert our code to the form on the left hand side. Here's ex2 rewritten:


> ex3 = do
>   step1 <- choose [(A,0),(B,0)]
>   step4 <- do
>     step3 <- do
>       step2 <- do
>         record [step1]
>         graph step1
>       record [step2]
>       graph step2
>     record [step3]
>     graph step3
>   do
>     record [step4]
>     guard (step4==E || step4==F)
>     return (step1,step4)


Not as pretty is it? It's still not too bad but it'd be nice to make this conversion automatic. I've a hunch this can be done using Haskell rules. Anyone care to volunteer some help making this work?

Anyway, I mentioned that lots of problems are in this class, some not obviously graph theoretical. Here's one: computing the edit distance between two strings:


> diag (a:as,b:bs) = choose [ ((as,bs),if a==b then 0 else 1) ]
> diag _ = mzero

> right (a:as,bs) = choose [((as,bs),1)]
> right _ = mzero

> down (as,b:bs) = choose [((as,bs),1)]
> down _ = mzero

> nop ([],[]) = return ([],[])
> nop _ = mzero

> munch :: (String,String) -> M [()] Float (String,String)
> munch x = diag x `mplus` right x `mplus` down x `mplus` nop x

> pure (M a) = length a==1
> editDistance a b = until pure (>>= munch) (munch (a,b))

> ex4 = editDistance "hypocrisy" "hippocracy"


I think this is actually the dynamic programming algorithm in disguise (and slightly reordered) with some extra log time penalties because it's using a balanced binary tree (from Data.Map) instead of arrays. But I wouldn't stake my life on it.

Anyway, here's one last example. Rather than explain it I refer you to the nicely written Wikipedia on the Viterbi algorithm. In fact, I decided to lift their example precisely. And note that I've (1) written the code twice, the second one is the one rewritten for speed, (2) used logarithms to convert the multiplications to additions and (3) 'unrolled' the loop to make absolutely explicit what's going on. If I wrote Haskell code to work on general arrays like the Python code in the Wikipedia article, it'd be too short and you might miss it if you blinked.


> data Weather = Rainy | Sunny deriving (Eq,Show,Ord)
> data Activity = Walk | Shop | Clean deriving (Eq,Show,Ord)

> transition Rainy = choose [(Rainy,-log 0.7),(Sunny,-log 0.3)]
> transition Sunny = choose [(Rainy,-log 0.4),(Sunny,-log 0.6)]

> emission Rainy = choose [(Walk,-log 0.1),(Shop,-log 0.4),(Clean,-log 0.5)]
> emission Sunny = choose [(Walk,-log 0.6),(Shop,-log 0.3),(Clean,-log 0.1)]

> ex5 = do
>   day1 <- choose [(Rainy,-log 0.6),(Sunny,-log 0.4)]
>   record [day1]
>   emission1 <- emission day1
>   guard (emission1 == Walk)
>   day2 <- transition day1
>   record [day2]
>   emission2 <- emission day2
>   guard (emission2 == Shop)
>   day3 <- transition day2
>   record [day3]
>   emission3 <- emission day3
>   guard (emission3 == Clean)
>   day4 <- transition day3
>   record [day4]

> ex6 = do
>   day4 <- do
>     day3 <- do
>       day2 <- do
>         day1 <- choose [(Rainy,-log 0.6),(Sunny,-log 0.4)]
>         do
>           record [day1]
>           do
>             emission1 <- emission day1
>             guard (emission1 == Walk)
>             transition day1
>       do
>         record [day2]
>         do
>           emission2 <- emission day2
>           guard (emission2 == Shop)
>           transition day2
>     do
>       record [day3]
>       do
>         emission3 <- emission day3
>         guard (emission3 == Clean)
>         transition day3
>   record [day4]


[Sunny,Rainy,Rainy,Rainy]. It's the Viterbi algorithm for free.

Anyway, I haven't proved a thing above. I think (hope?) I'm right about the polynomial complexity. Certainly on paper it all works, but I've always found it hard to reason about the complexity of Haskell code.

Unfortunately this code is all still slower than I'd like. I've played with trace during the development and everything seemed to run the way I expect (at least after I discovered the third monad law problem). I think that ultimately there's a big overhead for piling up HOFs, and the lookups in Data.Map probably aren't helping. Still, I don't think that invalidates the mathematics behind the approach and I think that asymptotically the slowness is really a big constant and a bunch of logs and it's still much better than combinatorial search. I ought to see how OCaml fares.

BTW There's a deep connection between this monad and the quantum mechanics one. I've no time to write about that now, though see here.

Some acknowledgements: the strange but highly readable Wikipedia article motivated me to think about the Viterbi algorithm, and I became sure there was a monad lurking in the Python code. I spent a month or so getting nowhere. And then I spoke with Eric Kidd on the phone about probability monads. He challenged me on something and in an attempt to formulate a response the whole of the above seemed to just crystallise. Thanks for asking the right question Eric!

I think there might be a comonad for working backwards through the graph...

Anyway, one last thing: please for the love of the FSM will someone put a clean way to implement restricted monads into their Haskell compiler. There is so much neat stuff that could be implemented with it and I now officially consider Haskell broken without one.


Sunday, April 29, 2007

Homeland Security Threat Level Monad

I've previously introduced the idea of a monad being a kind of one way wrapper. The Monad interface allows you to wrap things, but not unwrap them again. And it allows you to apply any function you like to a wrapped object, as long as it stays wrapped. I also indicated how this could be used to model a notion of taint. If you have some data from a tainted source - for example it's unreliable - then the trivial monad can do a nice job of keeping track of all other data that has been tainted by contact with this data. What I also want to show is that a wide variety of other monads can be interpreted in terms of taint. What this means is that despite its simplicity, the trivial monad is always worth keeping in the back of your mind because it is a prototype for many other monads.

You could imagine using this taint idea as a security model. Any data of type W a is considered secret and a threat to public safety if it is revealed. By sticking to the Monad interface we force the compiler to make sure that any data depending on secret data is itself marked secret. This makes it easy to keep secret data sequestered - at least as long as people only use the Monad interface to W.

But suppose this isn't good enough and we need to track not only which data is secret, but also the threat level to public safety should a secret be revealed. With this in mind, define the Homeland Security Threat Level Monoid:


> data Threat = LOW | GUARDED | ELEVATED | HIGH | SEVERE
>   deriving (Eq,Ord,Enum,Show)


It's a monoid because it has an associative binary operator max and has an identity LOW. For example max LOW ELEVATED = ELEVATED.
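Spelled out as a Data.Monoid instance, just to make the claim explicit (the post's code doesn't actually need this; it's only a sketch):

import Data.Monoid

instance Monoid Threat where
  mempty  = LOW
  mappend = max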

We want to tag some of our data with a threat level. In addition, we want to ensure that if we combine two tagged pieces of data, x and y, the combination must be tagged with the worse of the two threat levels. That's why we have the monoid structure that allows us to combine two threats.

So define

> data T a = T Threat a deriving Show


But to use this it appears we have to thread use of the max function all the way through our code. It'll be a nightmare to implement. But there's a solution. What we want is a monad!

Consider LOW to be the default threat level of all data and that tells us how to define return.

Now suppose we have a function f :: a -> T b and some data x :: T a. We want to be able to apply f to x using the monad interface: x >>= f. The threat level of the result should be the max of the two possible sources of threat: x itself or the result of computing f. With that in mind:


> instance Monad T where
>   return x = T LOW x
>   (T t x) >>= f = let T t' x' = f x in T (max t t') x'


Notice how we've now shown how to implement threat combination in a way that can be used by any code that uses the monad interface. So now we can take last week's definition of h and use it:



> g x y = y >>= (return . (+x))
> h x y = x >>= (\x -> g x y)



Try evaluating h (T ELEVATED 3) (T GUARDED 6). It just works, giving T ELEVATED 9, even though g and h were originally written for the trivial monad! So it's trivial to make code that's written monadically work with our new threat level system. This should give some idea of the power of monads. We've completely changed the semantics of our monadic addition operator, h, without lifting a finger to edit it.

But notice that I didn't need to talk about threats at all. I've shown how to tag data with other data that's implicitly carried along with it. All we needed was a rule for combining pairs of tags. Our tags were of type Threat. Instead we could have used strings and instead of max we could have used ++. If we had, then we'd have a way of implicitly carrying around logs with our computations. In fact, all I have done here is implement a special case of the Writer monad. But I still think that thinking in terms of data that is quarantined in some way helps with the intuition.
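For example, here's the same construction with String tags and ++ in place of Threat and max - a bare-bones logging monad, written as a sketch with my own names (it's just a special case of Writer):

data Logged a = Logged String a deriving Show

instance Monad Logged where
  return x = Logged "" x
  Logged s x >>= f = let Logged s' x' = f x in Logged (s ++ s') x'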

I hope that by starting with the trivial monad last week, and upgrading in a natural way to the Homeland Security Threat Level monad, I've given an easy way to think about monads. Our ordinary intuitions about taint, for example that if a clean object touches a tainted one it becomes tainted itself, carry over to monads. And the Writer monad isn't the only one that can be thought of in these terms. I hope at some point soon to look at the Maybe and list monads in this way too.


Saturday, April 28, 2007

The Trivial Monad

Update: I've added some solutions to exercises below.

Haskell monads form a type class. And Haskell type classes are essentially interfaces shared by some types. So the Monad type class is a common API for talking to a bunch of different types. So the question is this: what's so special about this API?

One way to grasp an API is to see concrete examples. Usually the monad API is introduced through containers or IO. But I think that the prototypical monad through which all others can be understood is much simpler - it's the trivial monad.

APIs often capture design patterns, and the design pattern here is a one-way wrapper. We can wrap things but we can't unwrap them. We still want to be able to do whatever we like to the wrapped data, but anything that depends on anything wrapped should itself be wrapped.

Without further ado, here's a wrapper type:


data W a = W a deriving Show


(The Show bit is just so we can play with these things interactively.)

Note how it doesn't add anything except some wrapping. And the first thing we need is to be able to wrap anything we like. We can do that easily with this function:


return :: a -> W a
return x = W x


(We could have just used W instead of return. But I'm heading towards a common API that will work with other types too, so it obviously can't be called W.)

And now we need one more thing - a way to manipulate wrapped data leaving it wrapped. There's an obvious idea. Given any function a -> b we write a function that converts it to a function W a -> W b. This is guaranteed to keep things under wraps. So here goes


fmap :: (a -> b) -> (W a -> W b)
fmap f (W x) = W (f x)

It seems that we've finished. For example we can wrap a number and then increment it:


a = W 1
b = fmap (+1) a


But here's something we can't do. Define a function f like this:


f :: Int -> W Int
f x = W (x+1)


It increments and returns a wrapped result. Now suppose we want to apply the underlying operation here twice, ie. we'd like to increment a number twice and return a wrapped result. We can't simply apply f twice because (1) it'll doubly wrap our result and (2) it'll attempt to add 1 to a wrapped result, something that doesn't make sense. fmap doesn't do what we want either. Try it in an interactive Haskell session. It seems we need a way to apply f, unwrap the result and apply f again. But we've already said that we don't want to allow people to unwrap these things. So we need to provide a higher order function that does the unwrapping and application for us. As long as this function always gives us back something that is wrapped, our end users will never be able to unwrap anything. Here's an idea for such a function:


bind :: (a -> W b) -> (W a -> W b)
bind f (W x) = f x


Notice how it's very similar to fmap but is even simpler. So now we can try doubly, or triply, incrementing


c = bind f (f 1)
d = bind f (bind f (f 1))


Notice how bind is more general than fmap. In fact, fmap f = bind (return . f).
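You can check that claim directly from the definitions:

  bind (return . f) (W x)
= (return . f) x
= return (f x)
= W (f x)
= fmap f (W x)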

And that's it. Using return and bind we have achieved our goal of wrapping objects and freely manipulating wrapped objects while keeping them wrapped. What's more, we can chain functions that wrap without getting bogged down in multiple layers of wrapping. And that, really, sums up what a Haskell monad is all about.

So here are a couple of exercises:

(1) define a function g :: Int -> W Int -> W Int so that g x (W y) = W (x+y). Obviously that definition won't do - the left hand side has a W y pattern so it's actually unwrapping. Rewrite this function so that the only unwrapping that happens is carried out by bind.

(2) define a function h :: W Int -> W Int -> W Int so that h (W x) (W y) = W (x+y). Again, no unwrapping.

I'm hoping that after you've done these exercises you'll see how you can still work freely with data even though it's wrapped.

In Haskell, it is more usual to use the operator >>= instead of bind where bind f x = x >>= f.

So the last question is this: why would you ever wrap data like this? In practice people tend not to use the trivial monad very much. Nonetheless, you can see how it might be used to represent tainted data. Wrapped data is considered tainted. Our API never lets us forget when data is tainted and yet it still allows us to do what we like with it. Any time we try to do anything with tainted data the result is also tainted, exactly as we might expect. What I find interesting is that almost every monad, including IO, lists and even probability, can be thought of quite naturally as variations on taint. I hope to say more about this in the near future.

Anyway, code above fails to compile because of some namespace clashes. Here's a complete definition of W that really works. Note also that fmap, which we showed we don't really need, allows us to make W an instance of Functor too:


> data W x = W x deriving Show

> instance Functor W where
>   fmap f (W x) = W (f x)

> instance Monad W where
>   return x = W x
>   W x >>= f = f x



Exercise 3: Prove the three monad laws for W. This should be almost trivial.

Exercise 4: We can't completely unwrap things using the monad API. But we can unwrap one layer from things that are wrapped twice. So here's a nice puzzle: define a function join :: W (W a) -> W a using the Monad API and no explicit unwrapping.

In conclusion, most monad tutorials show what monads are, and how to use them. I hope I've given some additional insight into why the Monad interface consists precisely of the two functions it does. That insight should become clearer when I say more about taint.

PS I've been a bit busy lately with planning for kitchen remodelling, lots of work and a new cat(!). I probably won't be posting very frequently for a few months.




Solutions to Exercises


Some people have found the exercises tricky so I'm posting some solutions. I think it's a good sign that they're tricky, it means that there is some non-trivial stuff to be learnt from playing with this monad. It ought to figure more strongly in tutorials. In a sense it allows people to learn about monadicity in its purest form.

Anyway, enough of that.

Exercise 1: We want g :: Int -> W Int -> W Int so that g x (W y) = W (x+y). Start the definition like this

g x y = ...

We want to apply (+x) to y. But y is wrapped so we must use >>= to apply it. But (+x) doesn't have the right type to be applied with >>=, it's an Int -> Int not an Int -> W Int. So we must compose with return and we get

g x y = y >>= (return . (+x))


Exercise 2: We want h :: W Int -> W Int -> W Int so that h (W x) (W y) = W (x+y). But g is already fairly close. So again write

h x y = ...

The difference from g is that x is wrapped. So we want to apply \x -> g x y to our wrapped value. This function is already of type Int -> W Int. So we can just write

h x y = x >>= (\x -> g x y)

or just h x y = x >>= flip g y.

Exercise 3: I'll just do the last one: (m >>= f) >>= g = m >>= (\x -> f x >>= g)


m is of the form W x for some x:


(W x >>= f) >>= g
= (f x) >>= g

W x >>= (\x -> f x >>= g)
= (\x -> f x >>= g) x
= f x >>= g


So the LHS equals the RHS.

Incidentally, you can get a long way with these problems by not thinking about what's going on! Instead, just write out the type signatures of all of the functions involved and try to stick them together like a jigsaw puzzle so that the final result has the right type. At each stage the arguments of each function must match the signature of the function so it really is like fitting the shapes of jigsaw pieces together. This isn't a foolproof strategy, but it often works.

Anyway, feel free to ask questions if you're still stuck. Some of your questions may be answered here.


Sunday, March 18, 2007

The Shor Quantum Error Correcting Code (and a Monad for Heat)

Quantum computers promise great things, but that promise will only come to be if quantum computers can be made reliable. In order to achieve that reliability we need qubits that can maintain their state for extended periods of time. Unfortunately, just about any type of interaction between a qubit and its environment can render the qubit useless. But there is a solution - a quantum error correcting code. This is very similar to a classical error correcting code, in which a number of bits are stored, in a distributed fashion, over a larger set of bits, in such a way that if some of the bits are corrupted, the original state can be recovered. What follows is some code to simulate the implementation of one of these codes: the nine qubit Shor code.

Firstly, let me include the quantum monad that I've developed over earlier weeks:


> import Prelude hiding (flip)
> import Maybe
> import List
> import Data.Map (toList,fromListWith)
> import Complex
> import Data.Bits hiding (rotate)

> infixl 1 ->-, -><, =>=
> infixl 7 .*


Remember that W b a is the type of elements of a vector space over b with a basis whose elements are labelled by elements of a. I'm using the letter W as I'm thinking of elements of b as being weights.


> data W b a = W { runW :: [(a,b)] } deriving (Eq,Show,Ord)


It's useful to be able to apply transforms to all of the weights:


> mapW f (W l) = W $ map (\(a,b) -> (a,f b)) l


W is an instance of Num, Functor and Monad. Vector spaces form monads because you can specify a function between vector spaces in terms of how it maps basis elements in the first vector space. The monad bind function lifts the function on basis elements to the entire vector space.


> instance Functor (W b) where
>   fmap f (W a) = W $ map (\(a,p) -> (f a,p)) a

> instance Num b => Monad (W b) where
>   return x = W [(x,1)]
>   l >>= f = W $ concatMap (\(W d,p) -> map (\(x,q)->(x,p*q)) d) (runW $ fmap f l)

(.*) gives us scalar multiplication:

> a .* b = mapW (a*) b

> instance (Eq a,Show a,Num b) => Num (W b a) where
>   W a + W b = W $ (a ++ b)
>   a - b = a + (-1) .* b
>   _ * _ = error "Num is annoying"
>   abs _ = error "Num is annoying"
>   signum _ = error "Num is annoying"
>   fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"


It'd be cool if the bind function for W could simplify our vectors by collecting together multiples of the same basis elements. Unfortunately this is tricky to do for reasons described in Eric Kidd's recent Set monad article. Instead I'll use an explicit function, collect, that collects our terms together. I've also used a crude method for throwing out terms that probably ought to be zero. A better solution might be to work only with rational numbers, or exact algebraic numbers.


> nonTrivial x = 10+abs x/=10
> collect :: (Ord a,Num b) => W b a -> W b a
> collect = W . filter (nonTrivial . snd) . toList . fromListWith (+) . runW


And here is the quantum monad:


> type Q a = W (Complex Double) a


There are a number of ways to apply functions to vector spaces. I'll be writing functions on the right so that I can apply operations to a quantum state reading from left to right. So here's a bunch of combinators to implement this. First there's ordinary function application:


> (->-) :: a -> (a -> b) -> b
> x ->- f = f x


Then there are functions that operate on the vector space by simply relabelling the basis:


> (=>=) :: Q a -> (a -> b) -> Q b
> x =>= f = fmap f x


Then there are functions that map from Q a to Q b but are defined by how they act on the basis elements of Q a:


> (-><) :: Q a -> (a -> Q b) -> Q b
> g ->< f = g >>= f


As I mentioned last post on entanglement, we can form states for joint independent systems using the tensor product:


> (<*>) :: Q a -> Q b -> Q (a,b)
> x <*> y = do
>   a <- x
>   b <- y
>   return (a,b)


The tensor product is a lot like a multiplication table. The coefficient of (a,b) in x <*> y is the product of the coefficients of a and b in x and y respectively. If two systems are unentangled then you can simulate the physics of each part separately. So it'd be nice to have a way to test whether or not a given state is unentangled, in other words, to test whether a state is a tensor product. As I'm not developing a serious application here, here's a quick and dirty algorithm that more or less does what anyone would do when faced with the task of recovering the row and column headings from a multiplication table:


> norm x = sqrt $ sum $ map (\(_,a) -> magnitude a^2) $ runW $ collect x

> untensor :: (Ord a,Ord b) => Q (a,b) -> (Q a,Q b)
> untensor (W a) =
>   let u = nub $ sort $ map (fst . fst) a
>       v = nub $ sort $ map (snd . fst) a
>       p = W $ map (\x -> (x,fromJust (lookup (x,head v) a))) u
>       q = W $ map (\x -> (x,fromJust (lookup (head u,x) a))) v
>   in ((1/norm p :+ 0) .* p,(1/norm q :+ 0) .* q)


untensor recovers x and y from x <*> y (up to multiplication by a complex number with unit norm) but it gives garbage if the argument isn't a tensor product. So we need to test whether or not the original vector is the tensor product of the inputs. So here's a test of how close the original is to the product of the putative factors:


> independence x = collect (x-uncurry (<*>) (untensor x))


We'll be doing lots of stuff with tensor products so it's useful to have some tools to apply functions to one or other factor:


> mapFst f (a,b) = (f a,b)
> mapSnd f (a,b) = (a,f b)
> mapFstM f (a,b) = do
>   a' <- a ->- f
>   return (a',b)
> mapSndM f (a,b) = do
>   b' <- b ->- f
>   return (a,b')


If a bit is a Bool then by rights, a qubit should be a Q Bool. But I think 0 and 1 look nicer so I'm using Q Integer instead. So here's the fundamentally quantum rotation operator:


> rotate :: Double -> Integer -> Q Integer
> rotate theta 1 = let theta' = theta :+ 0
>                  in cos (theta'/2) .* return 1 - sin (theta'/2) .* return 0
> rotate theta 0 = let theta' = theta :+ 0
>                  in cos (theta'/2) .* return 0 + sin (theta'/2) .* return 1


Although rotate theta is a map Integer -> Q Integer, using ->< it becomes a map Q Integer to Q Integer. Note that the Double argument isn't float data inside a quantum device, it's a hard-coded feature of the hardware itself. For example, if we choose theta to be pi/2 and -pi/2 we get the standard square root of not operation and its inverse:


> hnot = rotate (pi/2)
> unhnot = rotate (-pi/2)


(You may be wondering about the use of theta/2 in my definition of rotate. There's an interesting story behind that involving what I think is one of the weirdest features of physics. For another day.)

I'm also going to manipulate lots of lists of qubits so here are some tools for that:


> mapAt i f x = let (h,t:ts) = splitAt i x in h ++ (f t : ts)

> mapAtM :: Monad m => Int -> (a -> m a) -> [a] -> m [a]
> mapAtM i f x = let (h,t:ts) = splitAt i x in do
>   y <- f t
>   return $ h ++ (y : ts)

> flip a = 1-a
> flipAt i = mapAt i flip


So here's a statement of the problem. We have a qubit, ie. something of type Q Integer. But quantum systems are very delicate and our qubit may wander into a different state. In particular, after a time t it will be as if rotate theta has been applied to it, except that we can't control theta. We need to find a way to recover the original qubit. In fact, it's worse than that: our qubit is in some environment which we can consider to be a list of qubits. So our state is given by an instance of Q ([Integer],Integer) and the interaction might not simply be a rotation of the qubit but an operation that entangles the qubit with the environment. This might happen if the qubit isn't perfectly isolated.

Let's forget about the full generality here and consider just one type of data corruption - a qubit that might or might not have been flipped. How can we make a device robust against such occurrences? Well this is practically identical to the problem of making classical data robust against noise. We can simply use a 3 bit repetition code to store our qubits as three qubits. So here's an implementation of a function to repair a codeword from the classical 3 bit repetition code:


> repairRepetition' [b,c,d] = (if a<3 then mapAt a flip else id) [b,c,d] where
>   a = case (b,c,d) of
>         (0,0,0) -> 3
>         (1,1,1) -> 3
>         (0,0,1) -> 2
>         (1,1,0) -> 2
>         (0,1,0) -> 1
>         (1,0,1) -> 1
>         (1,0,0) -> 0
>         (0,1,1) -> 0


But there are a couple of problems with this. Firstly this is a many-to-one map because [0,0,0] and [0,0,1], say, both get mapped to [0,0,0]. In quantum mechanics, all of our maps need to be unitary, and in particular, invertible. A many-to-one map isn't invertible. Secondly, we're creating a register a, and again this is not an invertible operation. These two problems will actually cancel each other out because the apparent destruction of information that comes about from our many-to-one map can be fixed by allowing a to carry that information away. And we can ensure reversibility by using a supplied register rather than creating a new one:


> repairRepetition'' (a,[b,c,d]) = (a',(if a'<3 then mapAt a' flip else id) [b,c,d]) where
>     a' = case (b,c,d) of
>         (0,0,0) -> a `xor` 3
>         (1,1,1) -> a `xor` 3
>         (0,0,1) -> a `xor` 2
>         (1,1,0) -> a `xor` 2
>         (0,1,0) -> a `xor` 1
>         (1,0,1) -> a `xor` 1
>         (1,0,0) -> a `xor` 0
>         (0,1,1) -> a `xor` 0


It's easy to write an inverse for this function so it's a valid unitary transform. But it's actually a bit tedious to write out all those xors. So we can cheat a bit. Let's just assume there's an infinite supply of input zeroes and use them at our leisure. We no longer need to write the xors; we just note that every time we assign a value to a new register, the hardware engineer building your device installs some means of importing zero qubits. The a register in the above function will never be touched again. But we can't simply throw it away. It's a kind of side effect and we'll call these bits ancillae. We'll have lots of these side effects that we need to collect up, so here's the final quantum repetition codeword repair code:


> repairRepetition (a,[b,c,d]) = (i:a,(if i<3 then mapAt i flip else id) [b,c,d]) where
>     i = case (b,c,d) of
>         (0,0,0) -> 3
>         (1,1,1) -> 3
>         (0,0,1) -> 2
>         (1,1,0) -> 2
>         (0,1,0) -> 1
>         (1,0,1) -> 1
>         (1,0,0) -> 0
>         (0,1,1) -> 0


There's one last thing we could do to this code: the pairs being handed in and out of this function are essentially just instances of the Writer monad. So we could just squirrel away our ancillae inside a monad. I won't do that here because I think it's good to see everything explicitly when figuring this stuff out for the first time. But think about the bits that we'd be writing out. They correspond to information, or entropy, that we'd ultimately like to pump out of our device as exhaust. In other words they are heat. So this monad would, in effect, be a heat monad. I found this amusing given what I wrote in reason number eight in my list from last year.
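
If you want to see roughly what that would look like, here's a minimal sketch (the names returnHeat and bindHeat are ad hoc and aren't used anywhere else in this post): a Writer-style return and bind whose log is the list of ancilla bits, with list concatenation playing the role of accumulating heat.


> -- Ad hoc sketch of the 'heat monad' idea, not used elsewhere: the log is
> -- the list of ancilla bits pumped out as exhaust.
> returnHeat :: a -> ([Integer],a)
> returnHeat x = ([],x)

> -- Composition concatenates the ancillae, newest first, matching the
> -- i:a convention used in repairRepetition above.
> bindHeat :: ([Integer],a) -> (a -> ([Integer],b)) -> ([Integer],b)
> bindHeat (w,x) f = let (w',y) = f x in (w' ++ w,y)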

So far so good. But that only corrects one type of error: a bit flip. But note that it also protects superpositions from bit flips. So let's take a quantum state built from repetition codewords placed in an empty environment:


> state1 = return [] <*> (return [0,0,0] + return [1,1,1])


Now corrupt the last bit:


> state1' = state1 =>= mapSnd (mapAt 2 flip)


And repair it:


> state1'' = state1' =>= repairRepetition


Although we have repaired the state of our system, the state of the environment has changed. But that's fine because this state is still an unentangled tensor product of the environment and our system. You can see this just by looking at the state, but for peace of mind you can use independence state1'' to see that the deviation from being a tensor product is the zero vector.

But that's still only a narrow class of error that we can fix. Another type of error maps return 0 to return 0 but maps return 1 to -return 1. This is known as a phase flip. One way to view this is as a flip in a rotated basis:

> phaseFlip x = x ->- hnot ->- fmap flip ->< unhnot
> pFlipAtM i = mapAtM i phaseFlip
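

Here are two quick throwaway checks (ad hoc names, not used elsewhere): a phase flip should leave return 0 essentially untouched and negate return 1, up to floating-point noise.


> -- Ad hoc tests, not used elsewhere: expect roughly 1 .* return 0 and
> -- (-1) .* return 1 respectively, up to floating-point noise.
> testPhaseFlip0 = return 0 ->< phaseFlip ->- collect
> testPhaseFlip1 = return 1 ->< phaseFlip ->- collect
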

There's a nice strategy to protect against phase flips: rotate our data into a basis where phase flips become ordinary flips, and use the repetition code. So we'll use these two states to represent 0 and 1:


> plus,minus :: Q [Integer]
> plus = return [1,1,1] ->< mapM unhnot
> minus = return [0,0,0] ->< mapM unhnot


So let's make a state that is a superposition of these states in some environment:


> state2 = return [] <*> (plus + minus) ->- collect


Corrupt the middle bit with a phase flip:


> state2' = state2 ->< mapSndM (mapAtM 1 phaseFlip)


And repair it:


> state2'' = state2' ->< mapSndM (mapM hnot) =>= repairRepetition ->< mapSndM (mapM unhnot) ->- collect


Comparing state2 and untensor state2'' we see that the repair is successful. Unfortunately, we can use this scheme to repair phase flips, but not plain flips. We need a scheme that can repair both of these. So now we turn to Shor's ingenious solution: use both of these schemes at the same time. So we'll use the scheme that protects against phase flips with the twist that instead of using three qubits we use three triplets of qubits. We can use the three triplets to fix phase flips and we can use the three bits within a triplet to fix flips. We'll represent 0 and 1 using shor 0 and shor 1 as defined here:


> tripletPlus = (1/sqrt 2) .* return [0,0,0] + (1/sqrt 2) .* return [1,1,1]
> tripletMinus = (1/sqrt 2) .* return [0,0,0] - (1/sqrt 2) .* return [1,1,1]

> shor 1 = do
>     a <- tripletPlus
>     b <- tripletPlus
>     c <- tripletPlus
>     return $ a ++ b ++ c

> shor 0 = do
>     a <- tripletMinus
>     b <- tripletMinus
>     c <- tripletMinus
>     return $ a ++ b ++ c


Repair is a little more complex than before. First we use 6 ancilla bits to fix qubits within each triplet using the repetition code three times:


> ancilla (i,j,k) x = ((x!!i) `xor` (x!!j),(x!!j) `xor` (x!!k))

> fix (i,j,k) (e,x) =
>     let (u,v) = ancilla (i,j,k) x
>         b = case (u,v) of
>             (0,0) -> x
>             (0,1) -> flipAt k x
>             (1,0) -> flipAt i x
>             (1,1) -> flipAt j x
>     in (u:v:e,b)

> flipRepair start = start
>     =>= fix (0,1,2)
>     =>= fix (3,4,5)
>     =>= fix (6,7,8)


We also use two ancilla bits to locate phase flips between the triplets. phaseFlipFix will be applied in a rotated basis inside phaseFlipRepair:


> phaseFlipFix :: ([Integer],[Integer]) -> Q ([Integer],[Integer])
> phaseFlipFix (e,y) = do
>     let u = (y!!0) `xor` (y!!1) `xor` (y!!2) `xor` (y!!3) `xor` (y!!4) `xor` (y!!5)
>     let v = (y!!3) `xor` (y!!4) `xor` (y!!5) `xor` (y!!6) `xor` (y!!7) `xor` (y!!8)
>     let c = case (u,v) of
>             (0,0) -> y
>             (0,1) -> y ->- flipAt 6
>             (1,0) -> y ->- flipAt 0
>             (1,1) -> y ->- flipAt 3
>     return (u:v:e,c)

> phaseFlipRepair x = x
>     ->< mapSndM (mapM hnot)
>     ->- collect
>     ->< phaseFlipFix
>     ->- collect
>     ->< mapSndM (mapM unhnot)
>     ->- collect


Now we can pull all of this together into a single process to repair both kinds of error:


> shorRepair x = x ->- flipRepair ->- collect ->- phaseFlipRepair ->- collect


So now all that's needed is to try corrupting some qubits to see if we can repair them.


> state3 = return [] <*> shor 1
> state3' = state3 ->< mapSndM (pFlipAtM 0)
> state3'' = state3' ->- shorRepair


Let's try a more complex superposition with a more complex corruption:


> state4 = return [] <*> (0.6.*shor 0 + 0.8.*shor 1) ->- collect
> state4' = state4 ->< mapSndM (pFlipAtM 5) =>= mapSnd (flipAt 5) ->- collect
> state4'' = state4' ->- shorRepair ->- collect


This is all well and good. But there are more ways to corrupt a qubit beyond simple flips. Let's mess up our state using two additional kinds of 'noise' that we haven't discussed. Firstly, instead of just flipping, we'll rotate one of the bits. Secondly, we'll additionally 'leak' that bit to the environment so as to cause decoherence. We've discussed solutions to neither of these problems, so let's see what happens.


> leak i x = do
>     (e,v) <- x
>     return ((v!!i):e,v)

> state5 = state4
> state5' = state5 ->- leak 4 ->< mapSndM (mapAtM 4 (rotate (pi/10))) ->- collect
> state5'' = state5' ->- shorRepair ->- collect


A minor miracle has taken place. The Shor code is robust against just about anything you can throw at a single qubit. In fact, any code that can survive flips and phase flips of a single qubit can survive any kind of corruption, including entanglement with the environment. (Despite having read the proof, I was still pretty surprised when the code actually worked!) And that gives researchers hope that these methods might serve as ways to make quantum computing reliable.

I'll mention a couple more things in closing. Firstly, nowhere above have I mentioned wavefunction collapse, observations or superoperators. You don't need any of these things to define the Shor code and show that it can correct errors. (You probably do need some extra assumptions about quantum mechanics if you want to calculate probabilities of various outcomes in the event that the code fails - eg. due to multiple qubits being corrupted.) And secondly, despite it being my April 1st post last year, I still have the doubts I expressed here about quantum computing.

Oh...and expect those lines to take a few seconds to evaluate. When you simulate the Shor code you're performing computations in a 512-dimensional vector space!

Bibliography

  1. Peter Shor's original paper has good discussion of the issues but I found the description of the correction algorithm a little laconic.
  2. John Preskill's Lecture Notes were probably the most useful document.
  3. A student term paper by Dheera Venkatraman was also pretty useful though I was confused for a while by what might be considered errors in it.


Update
Fixed a couple of minor typos. And remember this code should just run, without modifications, in GHCi, so you can play with it yourself.


Tuesday, March 06, 2007

Monads, Vector Spaces and Quantum Mechanics pt. II

Back from wordpress.com:

I had originally intended to write some code to simulate quantum computers and implement some quantum algorithms. I'll probably eventually do that but today I just want to look at quantum mechanics in its own right as a kind of generalisation of probability. This is probably going to be the most incomprehensible post I've written in this blog. On the other hand, even though I eventually talk about the philosophy of quantum mechanics, there's some Haskell code to play with at every stage, and the code gives the same results as appear in physics papers, so maybe that will help give a handle on what I'm saying.

First get some Haskell fluff out of the way:

> import Prelude hiding (repeat)
> import Data.Map (toList,fromListWith)
> import Complex
> infixl 7 .*

Now define certain types of vector spaces. The idea is that a W b a is a vector in a space whose basis elements are labelled by objects of type a and where the coefficients are of type b.

> data W b a = W { runW :: [(a,b)] } deriving (Eq,Show,Ord)

This is very similar to standard probability monads except that I've allowed the probabilities to be types other than Float. Now we need a couple of ways to operate on these vectors.

mapW allows the application of a function transforming the probabilities...

> mapW f (W l) = W $ map (\(a,b) -> (a,f b)) l

and fmap applies a function to the basis element labels.

> instance Functor (W b) where
>     fmap f (W a) = W $ map (\(a,p) -> (f a,p)) a

We want our vectors to support addition, multiplication, and actually form a monad. The definition of >>= is similar to that for other probability monads. Note how vector addition just concatenates our lists of probabilities. The problem with this is that if we have a vector like a+2a we'd like it to be reduced to 3a but in order to do that we need to be able to spot that the two terms a and 2a both contain multiples of the same vector, and to do that we need the fact that the labels are instances of Eq. Unfortunately we can't do this conveniently in Haskell because of the lack of restricted datatypes and so to collect similar terms we need to use a separate collect function:

> instance Num b => Monad (W b) where
>     return x = W [(x,1)]
>     l >>= f = W $ concatMap (\(W d,p) -> map (\(x,q)->(x,p*q)) d) (runW $ fmap f l)

> a .* b = mapW (a*) b

> instance (Eq a,Show a,Num b) => Num (W b a) where
>     W a + W b = W $ (a ++ b)
>     a - b = a + (-1) .* b
>     _ * _ = error "Num is annoying"
>     abs _ = error "Num is annoying"
>     signum _ = error "Num is annoying"
>     fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"

> collect :: (Ord a,Num b) => W b a -> W b a

> collect = W . toList . fromListWith (+) . runW
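
For example, here's a small throwaway definition (ad hoc, not used elsewhere) showing collect turning a vector like a + 2a into a single term with coefficient 3:

> -- Ad hoc example, not used elsewhere: a basis vector plus twice itself
> -- collapses to one term with coefficient 3.0.
> collectExample :: W Float Integer
> collectExample = collect (return 5 + 2 .* return 5)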

Now we can specialise to the two monads that interest us:

> type P a = W Float a

> type Q a = W (Complex Float) a

P is the (hopefully familiar if you've read Eric's recent posts) probability monad. But Q allows complex probabilities. This is because quantum mechanics is a lot like probability theory with complex numbers and many of the rules of probability theory carry over.

Suppose we have a (non-quantum macroscopic) coin that we toss. Its state might be described by:

> data Coin = Heads | Tails deriving (Eq,Show,Ord)

> coin1 = 0.5 .* return Heads + 0.5 .* return Tails :: P Coin

Suppose that if Albert sees a coin that is heads up he has a 50% chance of turning it over and if he sees a coin that is tails up he has a 25% chance of turning it over. We can describe Albert like this:

> albert Heads = 0.5 .* return Heads + 0.5 .* return Tails
> albert Tails = 0.25 .* return Heads + 0.75 .* return Tails

We can now ask what happens if Albert sees a coin originally turned up heads n times in a row:

> repeat 0 f = id
> repeat n f = repeat (n-1) f . f
> (->-) :: a -> (a -> b) -> b

> g ->- f = f g

> (-><) :: Q a -> (a -> Q b) -> Q b
> g ->< f = g >>= f

> albert1 n = return Heads ->- repeat n (->< albert) ->- collect


Let me explain those new operators. ->- is just function application written from left to right. The > in the middle is intended to suggest the direction of data flow. ->< is just >>= but I've written it this way with the final < intended to suggest the way a function a -> M b 'fans out'. Anyway, apropos of nothing else, notice how Albert approaches a steady state as n gets larger.

Quantum mechanics works similarly but with the following twist. When we come to observe the state of a quantum system it undergoes the following radical change:

> observe :: Ord a => Q a -> P a

> observe = W . map (\(a,w) -> (a,magnitude (w*w))) . runW . collect

Ie. the quantum state becomes an ordinary probabilistic one. This is called wavefunction collapse. Before collapse, the complex weights are called 'amplitudes' rather than probabilities. The business of physicists is largely about determining what these amplitudes are. For example, the well known Schrödinger equation is a lot like a kind of probabilistic diffusion, like a random walk, except with complex amplitudes instead of probabilities. (That's why so many physicists have been hired into finance firms in recent years - stocks follow a random walk which has formal similarities to quantum physics.)

The rules of quantum mechanics are a bit like those of probability theory. In probability theory the sum of the probabilities must add to one. In addition, any process (like albert) must act in such a way that if the input sum of probabilities is one, then so is the output. This means that probabilistic processes are stochastic. In quantum mechanics the sum of the squares of the magnitudes of the amplitudes must be one. Such a state is called 'normalised'. All processes must be such that normalised inputs go to normalised outputs. Such processes are called unitary ones.
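
Here's a small helper (an ad hoc definition, not used elsewhere in the post) that computes the quantity that has to equal one for a normalised state - the sum of the squared magnitudes of the amplitudes. For instance, normSquared (0.6 .* return True + 0.8 .* return False) comes out as 1, up to rounding.

> -- Ad hoc helper, not used elsewhere: sum of squared magnitudes of the
> -- amplitudes; this should be 1 for a normalised state.
> normSquared :: Q a -> Float
> normSquared = sum . map (\(_,w) -> magnitude w ^ 2) . runW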

There's a curious subtlety present in quantum mechanics. In classical probability theory you need to have the sum of the probabilities of your different events to sum to one. But it's no good having events like "die turns up 1", "die turns up 2", "die turns up even" at the same time. "die turns up even" includes "die turns up 2". So you always need to work with a mutually exclusive set of events. In quantum mechanics it can be pretty tricky to figure out what the mutually exclusive events are. For example, when considering the spin of an electron, there are no more mutually exclusive events beyond "spin up" and "spin down". You might think "what about spin left?". That's just a mixture of spin up and spin down - and that fact is highly non-trivial and non-obvious. But I don't want to discuss that now and it won't affect the kinds of things I'm considering below.

So here's an example of a quantum process a bit like albert above. For any angle θ, rotate turns a boolean state into a mixture of boolean states. For θ = 0 it just leaves the state unchanged and for θ = π it inverts the state so it corresponds to the function Not. But for θ = π/2 it does something really neat: it is a kind of square root of Not. Let's see it in action:

> rotate :: Float -> Bool -> Q Bool
> rotate theta True =
>     let theta' = theta :+ 0
>     in cos (theta'/2) .* return True - sin (theta'/2) .* return False

> rotate theta False =
>     let theta' = theta :+ 0
>     in cos (theta'/2) .* return False + sin (theta'/2) .* return True

> snot = rotate (pi/2)

> repeatM n f = repeat n (>>= f)

> snot1 n = return True ->- repeatM n snot ->- observe

We can test it by running snot1 2 to see that two applications behave just like a plain not (as you'd expect from a square root of not), while snot1 1 gives you a 50/50 chance of finding True or False. Nothing like this is possible with classical probability theory and it can only happen because complex numbers can 'cancel each other out'. This is what is known as 'destructive interference'. In classical probability theory you only get constructive interference because probabilities are always positive real numbers. (Note that repeatM is just a monadic version of repeat - we could have used it to simplify albert1 above so there's nothing specifically quantum about it.)

Now for two more combinators:

> (=>=) :: P a -> (a -> b) -> P b
> g =>= f = fmap f g
> (=><) :: P (Q a) -> (a -> Q b) -> P (Q b)
> g =>< f = fmap (>>= f) g

The first just uses fmap to apply the function. I'm using the = sign as a convention that the function is to be applied not at the top level but one level down within the datastructure. The second is simply a monadic version of the first. The reason we need the latter is that we're going to have systems that have both kinds of uncertainty - classical probabilistic uncertainty as well as quantum uncertainty. We'll also want to use the fact that P is a monad to convert doubly uncertain events to singly uncertain ones. That's what join does:

> join :: P (P a) -> P a
> join = (>>= id)

OK, that's enough ground work. Let's investigate a physical process that can be studied in the lab: the Quantum Zeno effect, otherwise known as the fact that a watched pot never boils. First an example related to snot1:

> zeno1 n = return True ->- repeatM n (rotate (pi/fromInteger n)) ->- collect ->- observe

The idea is that we 'rotate' our system through a total angle of π, but we do so in n stages of π/n each. The fact that we do it in n stages makes no difference, we get the same result as doing it in one go. The slight complication is this: suppose we start with a probabilistic state of type P a. If we let it evolve quantum mechanically it'll turn into something of type P (Q a). On observation we get something of type P (P a). We need join to get a single probability distribution of type P a. The join is nothing mysterious, it just combines the outcome of two successive probabilistic processes into one using the usual laws of probability.

But here's a variation on that theme. Now we carry out n stages, but after each one we observe the system causing wavefunction collapse:

> zeno2 n = return True ->- repeat n (
>     \x -> x =>= return =>< rotate (pi/fromInteger n) =>= observe ->- join
>     ) ->- collect

Notice what happens. In the former case we flipped the polarity of the input. In this case it remains closer to the original state. The higher we make n the closer it stays to its original state. (Not too high, start with small n. The code suffers from combinatorial explosion.) Here's a paper describing the actual experiment. Who needs all that messing about with sensitive equipment when you have a computer? :-)

A state of the form P (Q a) is called a mixed state. Mixed states can get a bit hairy to deal with as you have this double level of uncertainty. It can get even trickier because you can sometimes observe just part of a quantum system rather than the whole system like observe does. This inevitably leads to mixed states. von Neumann came up with the notion of a density matrix to deal with this, although a P (Q a) works fine too. I also have a hunch there is an elegant way to handle them through an object of type P (Q (Q a)) that will eliminate the whole magnitude squared thing. However, I want to look at the quantum Zeno effect from a different angle, one that ultimately gives another way to deal with mixed states. Unfortunately I don't have time to explain this elimination today, but we can look at the general approach.

In this version I'm going to consider a quantum system that consists of the logical state in the Zeno examples, but also include the state of the observer. Now standard dogma says you can't form quantum states out of observers. In other words, you can't form Q Observer where Observer is the state of the observer. It says you can only form P Observer. Whatever. I'm going to represent an experimenter using a list representing the sequence of measurements they have made. Represent the complete system by a pair of type ([Bool],Bool). The first element of the pair is the experimenter's memory and the second element is the state of the boolean variable being studied. When our experimenter makes a measurement of the boolean variable, its value is simply prepended to his or her memory:

> zeno3 n = return ([],True) ->- repeatM n (
>     \(m,s) -> do
>         s' <- rotate (pi/fromInteger n) s
>         return (s:m,s')
>     ) ->- observe =>= snd ->- collect

Note how we now delay the final observation until the end when we observe both the experimenter and the poor boolean being experimented on. We want to know the probabilities for the final boolean state so we apply snd so as to discard the state of the observer's memory. Note how we get the same result as zeno2. (Note no mixed state, just an expanded quantum state that collapses to a classical probabilistic state.)

There's an interesting philosophical implication in this. If we model the environment (in this case the experimenter is part of that environment) as part of a quantum system, we don't need all the intermediate wavefunction collapses, just the final one at the end. So are the intermediate collapses real or not? The interaction with the environment is known as decoherence and some hope that wavefunction collapse can be explained away in terms of it.

Anyway, time for you to go and do something down-to-earth like gardening. Me, I'm washing the kitchen floor...


I must mention an important cheat I made above. When I model the experimenter's memory as a list I'm copying the state of the measured experiment into a list. But you can't simply copy data into a quantum register. One way to see this is that unitary processes are always invertible. Copying data into a register destroys the value that was there before and hence is not invertible. So instead, imagine that we really have an array that starts out zeroed out and that each time something is added to the list, the new result is xored into the next slot in the array. The list is just a non-unitary-looking, but convenient, shorthand for this unitary process.
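
Here's a tiny sketch of that point (an ad hoc definition, not used in the code above): xoring a bit into a slot is its own inverse, so 'recording' a measurement this way is reversible, whereas simply overwriting the slot is not.

> -- Ad hoc illustration, not used elsewhere: record a measured bit by
> -- xoring it into a slot (False plays the role of 0). Doing it twice
> -- with the same bit restores the slot, so the operation is invertible.
> xorInto :: Bool -> Bool -> Bool
> xorInto bit slot = bit /= slot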


Saturday, March 03, 2007

Monads in C pt. III

OK, the last C monad example:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *string;
    int value;
} WriterInt;

WriterInt returnWriter(int i)
{
    WriterInt r;
    r.string = "";
    r.value = i;
    return r;
}

WriterInt bind(WriterInt (*f)(int),WriterInt x)
{
    WriterInt y = f(x.value);
    WriterInt z;
    z.value = y.value;
    int len = strlen(x.string)+strlen(y.string);
    z.string = malloc(len+1);
    strcpy(z.string,x.string);
    strcat(z.string,y.string);
    return z;
}

WriterInt print(int i)
{
    WriterInt x;
    x.string = malloc(32);
    x.value = sprintf(x.string,"%d\n",i);
    return x;
}

WriterInt printplus_bad(int i)
{
    WriterInt x = print(i);
    return print(x.value); /* cheating! */
}

WriterInt printplus(int i)
{
    WriterInt x = print(i);
    return bind(print,x);
}


This time, instead of printing, we build up a string as a side effect. printplus() is implemented exactly as before, without knowledge of how to handle side effects, and yet it correctly handles strings returned as a side effect and concatenates them together. The magic is in bind(), which allows a function expecting an int input to be applied to a WriterInt.

I'm hoping that programmers of languages such as C can now begin to see the pattern shared by these examples.

Anyway, this is all just stalling for time while I try to get my quantum computation code working.
