A Neighborhood of Infinity: 2008

Wednesday, December 24, 2008

The Mother of all Monads

Suppose someone stole all the monads but one, which monad would you want it to be? If you're a Haskell programmer you wouldn't be too bothered, you could just roll your own monads using nothing more than functions.

But suppose someone stole do-notation leaving you with a version that only supported one type of monad. Which one would you choose? Rolling your own Haskell syntax is hard so you really want to choose wisely. Is there a universal monad that encompasses the functionality of all other monads?

I often find I learn more computer science by trying to decode random isolated sentences than from reading entire papers. About a year ago I must have skimmed this post because the line "the continuation monad is in some sense the mother of all monads" became stuck in my head. So maybe Cont is the monad we should choose. This post is my investigation of why exactly it's the best choice. Along the way I'll also try to give some insight into how you can make practical use the continuation monad. I'm deliberately going to avoid discussing the underlying mechanism that makes continuations work.

So let's start with this simple piece of code


> import Control.Monad.Cont

> ex1 = do
>   a <- return 1
>   b <- return 10
>   return $ a+b

I haven't specified the monad but in almost every case we'd expect the result to have something to do with the number 11. For the list monad we get [11], for the Maybe monad we get Just 11 and so on. For the Cont monad we get something that takes a function, and applies it to 11. Here's an example of its use:


> test1 = runCont ex1 show

ex1 is just a function that takes as argument show and applies it to 11 to give the string "11". Cont and runCont are just wrapping and unwrapping functions that we can mostly ignore.

We could have done that without continuations. So what exactly does the Cont monad give us here? Well let's make a 'hole' in the code above:

Whatever integer we place in the hole, the value of test1 will be the result of adding one and applying show. So we can think of that picture as being a function whose argument we shove in the hole. Now Haskell is a functional programming language so we expect that we can somehow reify that function and get our hands on it. That's exactly what the continuation monad Cont does. Let's call the function we're talking about by the name fred. How can we get our hands on it? It's with this piece code:


ex1 = do
  a <- return 1
  b <- Cont (\fred -> ...)
  return $ a+b

The ... is a context in which fred represents "the entire surrounding computation". Such a computaton is known as a "continuation". It's a bit hard to get your head around but the Cont monad allows you to write subexpressions that are able to "capture" the entirety of the code around them, as far as the function provided to runCont. To show that this is the case let's apply fred to the number 10:


> ex2 = do
>   a <- return 1
>   b <- Cont (\fred -> fred 10)
>   return $ a+b

> test2 = runCont ex2 show

The entire computation is applied to 10 and we get "11". Now you know what return does in this monad. But that's a convoluted way of doing things. What other advantages do we get? Well the expression for b can do whatever it wants with fred as long as it returns the same type, ie. a string. So we can write this:


> ex3 = do
>   a <- return 1
>   b <- Cont (\fred -> "escape")
>   return $ a+b

> test3 = runCont ex3 show

fred is completely ignored. The entire computation is thrown away and instead of applying show to a number, we simply return "escape". In other words, we have a mechanism for throwing values out of a computation. So continuations provide, among other things, an exception handling mechanism. But that's curious, because that's exactly what the Maybe monad provides. It looks like we might be able to simulate Maybe this way. But rather than do that, let's do something even more radical.


> ex4 = do
>   a <- return 1
>   b <- Cont (\fred -> fred 10 ++ fred 20)
>   return $ a+b

> test4 = runCont ex4 show

We've used fred twice. We've made the code around our "hole" run twice, each time executing with a different starting value. Continuations allow mere subexpressions to take complete control of the expressions within which they lie. That should remind you of something. It's just like the list monad. The above code is a lot like


> test5 = do
>   a <- return 1
>   b <- [10,20]
>   return $ a+b

So can we emulate the list monad? Well instead of converting our integer to a string at the end we want to convert it to a list. So this will work:


> ex6 = do
>   a <- return 1
>   b <- Cont (\fred -> fred 10 ++ fred 20)
>   return $ a+b

> test6 = runCont ex6 (\x -> [x])

We can avoid those ++ operators by using concat:


> ex7 = do
>   a <- return 1
>   b <- Cont (\fred -> concat [fred 10,fred 20])
>   return $ a+b

> test7 = runCont ex7 (\x -> [x])

But now you may notice we can remove almost every depepndence on the list type to get:


> ex8 = do
>   a <- return 1
>   b <- Cont (\fred -> [10,20] >>= fred)
>   return $ a+b

> test8 = runCont ex8 return

Note, we're using monad related functions, but when we do so we're not using do-notation. We can now do one last thing to tidy this up:


> i x = Cont (\fred -> x >>= fred)
> run m = runCont m return

And now we have something close to do-notation for the list monad at our disposal again:


> test9 = run $ do
>   a <- i [1,2]
>   b <- i [10,20]
>   return $ a+b

I hope you can see how this works. i x says that the continuation should be applied to x, not as an ordinary function, but with >>=. But that's just business as usual for monads. So the above should work for any monad.


> test10 = run $ do
>   i $ print "What is your name?"
>   name <- i getLine
>   i $ print $ "Merry Xmas " ++ name

The Grinch has been foiled and we see that the continuation monad really is the mother of all monads.

There are some interesting consequences of this beyond Haskell. Many languages with support for continuations should be extensible to support monads. In particular, if there is an elegant notation for continuations, there should be one for monads too. This is why I didn't want to talk about the underlying mechanism of the Cont monad. Different languages can implement continuations in different ways. An extreme example is (non-portable) C where you can reify continuations by literally flushing out all registers to memory and grabbing the stack. In fact, I've used this to implement something like the list monad for searching in C. (Just for fun, not for real work.) Scheme has call-with-current-continuation which can be used similarly. And even Python's yield does something a little like reifying a continuation and might be usable this way. (Is that's what's going on here? I haven't read that yet.).

This post was also inspired by this paper by Filinski. I haven't followed the details yet (it's tricky) but the gist is similar. I was actually looking at Filinski's paper because of something I'll mention in my next post.

Saturday, November 29, 2008

An Approach to Algorithm Parallelisation

The other day I came across the paper Parallelizing Complex Scans and Reductions lying on a colleague's desk. The first part of the paper discussed how to make a certain family of algorithms run faster on parallel machines and the second half of the paper went on to show how, with some work, the method could be stretched to a wider class of algorithm. What the authors seemed to miss was that the extra work really wasn't necessary and the methods of the first half apply, with no change, to the second half. But don't take this as a criticism! I learnt a whole new way to approach algorithm design, and the trick to making the second half easy uses methods that have become popular in more recent years. Doing a web search I found lots of papers describing something similar to what I did.

This is also a nice example of how the notion of abstraction in computing and the notion of abstraction in mathematics are exactly the same thing. But I'm getting ahead of myself.

So here a direct translation from he paper of some procedural code to find the largest sum that can be made from a subsequence of a sequence. This will be our first implementation of the problem examined in the second half of the paper:


> f1 [] (sofar,max) = (sofar,max)
> f1 (b:bs) (sofar,max) =
>     let sofar' = if sofar+b<0
>             then 0
>             else sofar+b
>         max' = if max<sofar'
>             then sofar'
>             else max
>     in f1 bs (sofar',max')

sofar is a running sum that is reset each time it dips below zero, and max keeps track of the best sum so far. We initialise sofar and sum with 0 and -infinity. Here's an example of how to ue it:


> b :: [Double]
> b = [1..5] ++ [5,4..(-10)] ++ [(-2)..6]

> infinity :: Double
> infinity = 1/0
> test1 b = snd $ f1 b (0,-infinity)

Notice how we prime the algorithm with a starting vector. The 0 corresponds to the fact that at the start we've summed over 0 elements and the -infinity corresponds to the fact that we want the first sum computed to be the highest so far at that point.

Test the code with test1 b. We'll use a similar pattern all the way through this code.

You may see the problem with making this parallelisable: we are maintaining running sums so that the final values of sofar and max all depend on what was computed earlier. It's not obvious that we can break this up into pieces. sofar computes sums of subsequences between resets but chopping the array b into pieces might split such subsequences between processors. How can we handle this cleanly?

The first step is to write version two of the above function using max instead of conditionals:


> f2 [] (sofar,max) = (sofar,max)
> f2 (b:bs) (sofar,max) =
>     let sofar' = Prelude.max (sofar+b) 0
>         max'   = Prelude.max max sofar'
>     in f2 bs (sofar',max')

> test2 b = snd $ f2 b (0,-infinity)

But that doesn't appear to make things any easier, we've just buried the conditionals inside max, it doesn't make the serial dependency go away.

So let's solve another problem instead. In f2 I'll replace max with addition and addition with multiplication. 0 is the identity for addition so we should replace it with the identity for multiplication, 1. Similarly, -infinity is the identity for max so we should replace that with 0. We get:


> f3 [] (sofar,max) = (sofar,max)
> f3 (b:bs) (sofar,max) =
>     let sofar' = sofar*b+1
>         max'   = max+sofar'
>     in f3 bs (sofar',max')

> test3 b = snd $ f3 b (1,0)

That's all very well but (1) it looks no more parallelisable and (2) it's solving the wrong problem. Let's ignore problem (2) for now.

The thing that makes f3 easier to work with is that it's almost a linear function acting on the vector (sofar,max). Linear functions have one very nice property. If f and g are linear then we can compute f(g(x)) by acting with g first, and then applying f. But we can also compose f and g without reference to x giving us another linear function. We only have to know how f and g act on basis elements and we can immediately compute how f . g acts on basis elements. This is usually expressed by writing f and g as matrices. So let's tweak f3 so it's linear in its last argument:


> f4 [] (sofar,max,i) = (sofar,max,i)
> f4 (b:bs) (sofar,max,i) =
>     let sofar' = (sofar * b) + i
>         max'   = max + sofar'
>         i'     = i
>     in f4 bs (sofar',max',i')

> test4 b = let (_,max,_) = f4 b (1,0,1) in max

So now I need to write some code to work with linear functions. I'll do it in a very direct style. Here are some tuples representing basis vectors. (I could have written some fancy vector/matrix code but I don't want to distract from the problem in hand.)


> x,y,z :: Num a => (a,a,a)
> x = (1,0,0)
> y = (0,1,0)
> z = (0,0,1)

And here's some code that computes how a function acts on a basis, in effect finding the matrix for our function with respect to the basis x,y,z:


> matrix f = (f x,f y,f z)

Some simple operations on vectors:


> (a,b,c) .+ (d,e,f) = (a + d,b + e,c + f)
> a .* (b,c,d) = (a * b,a * c,a * d)

And now a little function that, given how f acts on basis elements, can apply f to any vector:


> apply (mx,my,mz) (sofar,max,i) = (sofar .* mx) .+ (max .* my) .+ (i .* mz)

Now we can redo the calculation with f4 by first making the matrix for f4, and then applying that to our starting vector.


> test5 b = let m = matrix (f4 b)
>               (_,max,_) = apply m (1,0,1)
>           in max

Note how by time we've computed m we've done almost all of the work even though the code hasn't yet touched (1,0,1).

But now comes the first bit of magic. We can split our list b into pieces. Compute the corresponding matrix for each piece on a separate processor, and then apply the resulting matrices to our starting vector.

Let's chop our list of reals into pieces of size n:


> chop n [] = []
> chop n l = let (a,b) = splitAt n l in a : chop n b

We'll use pieces of size 10:


> test6 b = max where
>    (_,max,_) = foldr ($) (1,0,1) (reverse b_functions) where
>    b_pieces = chop 10 b

The following maps should be replaced with a parallel version. It's easy to do this.


>    b_matrices = map (matrix . f4) b_pieces
>    b_functions = map apply b_matrices

Great, we've successfully parallelised the code, but it's the wrong algorithm. How can we use this to solve the correct problem? Remember how we replaced max with addition and addition with multiplication. We just have to undo that. That's all! Everything required to prove that the above parallelisation is valid applies over any semiring. At no point did we divide or subtract, and we only used elementary properties of numbers like a*(b+c) = a*b+a*c. That property holds for max and addition. In fact a+max(b,c) = max(a+b,a+c). We don't even have to modify the code. We can just define the max-plus semiring as follows:


> newtype MaxPlus = M Double deriving (Eq,Show)

> instance Num MaxPlus where
>     fromInteger 0 = M (-infinity)
>     fromInteger 1 = M 0
>     fromInteger _ = error "no conversion from general integer"
>     M a + M b = M (max a b)
>     M a * M b = M (a+b)
>     abs _ = error "no abs"
>     signum _ = error "no signum"
>     negate _ = error "no negate"

And now all we need is


> test7 b = test6 (fmap M b)

(I wonder if ghc is smart enough to completely eliminate that fmap M, after all, on a newtype, M should do zero work.)

And that's a completely parallelised version of the original algorithm.

There is a ton of optimisation that can be performed here. In particular, matrix applies a function to a fixed basis. For the particular function we're using here there's a big win from constant folding. The same constant folding applies in any semiring.

And back to the point I made at the beginning. By abstracting from the reals to a general semiring we are able to make the same code perform multiple duties: it can work on functions linear over many semirings, not just the reals. Mathematicians don't work with abstractions just to make life hell for students - they do so because working with abstract entities allows the same words to be reused in a valid way in many different contexts. This benefits both mathematicians and computer scientists.

Here's a link you can use to find out more on this technique.

Note that in real world usage you wouldn't use lists. -|chop|- would take longer than the rest of the algorithm.

PS A curious aside. I spent ages trying to get ghc to compile this code and getting my homebrew HTML markup code to work reliably on it. But eventually I located the problem. I've just returned from Hawai`i where I wrote most of this code. Just for fun I'd put my keyboard into Hawai`ian mode and forgot about it. When I did that, the single quote key started generating the unicode symbol for the Hawai`ian glottal stop. It looks just like a regular single quote in my current terminal font so it was hard to see anything wrong with the code. But, quite reasonably, ghc and many other tools barf if you try to use one where a regular quote is expected.

Saturday, November 15, 2008

Some thoughts on reasoning and monads

Just a short note about some half-formed thoughts on the subject of monads and reasoning.

Haskell types and total functions, at least for a suitable subset of Haskell, form a category, with types and functions as the objects and arrows. Haskell code is usually full of expressions whose types don't correspond to arrows (e.g.. 1::Int). But in category theory we can only really talk of objects and arrows, not elements of objects. Many categories are made up of objects that have no reasonable notion of an element. By writing code in pointfree style we can eliminate reference to individual elements and talk only about arrows.

For example, suppose f :: A -> B and g :: B -> C. Then we can define h :: A -> C in a couple of different ways. Pointfree:


h = g . f

or pointful:


h x = let y = f x
          z = g y
      in z

For examples as simple as this it's not hard to write code that will translate between one style and the other. But the former definition has an advantage, it works in any category. We'll consider a very simple example. Let R be the category of real numbers, where there is an arrow from x to y if x≤y. We can think of f and g as arrows in this category. If we know A≤B and B≤C then the first definition above tells us how to construct an arrow from A to C and hence it proves that A≤C. The second definition makes no sense because it relies on the notion of x as an element of A and uses the functions f and g to construct elements of B and C. These words hold no meaning in the category R where the arrows aren't functions and the objects aren't containers of elements.

Except that's not quite true.

Because there is a scheme for translating the second pointful definition to pointfree form, the second definition does in fact provide a proof that A≤C. We just have to bear in mind that the proof needs to be translated into pointfree form first. In fact, we can happily spend our day using pointful style to generate proofs about R, as long as at the end of the day we translate our proofs to pointfree notation. In fact, Haskell programmers know that it's often much easier to write programs in pointful style so it seems reasonable to guess that there are many proofs that are easier to write in pointful style even though they can't be interpreted literally. Philosophically this is a bit weird. As long as we restrict ourselves to chains of reasoning that can be translated, we can use intuitions about elements of objects to make valid deductions about domains where these notions make no sense.

Part I of Lambek and Scott is about the correct pointful language to use when talking about cartesian closed categories (CCCs). They use a form of typed lambda calculus. Every arrow in a cartesian category can be written as a pointful lambda expression. Even though there isn't a meaningful way to assign meaning to the 'points' individually, every lambda abstraction gives an arrow in a cartesian closed category that can be built using the standard parts that come with a CCC, and vice versa. The pointful definition of h above is an example. Here's a (very) slightly less trivial example. Given f :: A -> C and g :: B -> D we can define h :: (A,C) -> Either B D as


f x = let y = fst x
          z = Left y
      in z

Again, that code can be translated into pointfree form using only the standard parts available in a cartesian closed category. Actually, we only need a category with products and coproducts for this example. So consider a lattice. The above code can be translated into a proof that A∩C≤B∪D. (I'm using cap and cup for join and meet.) Again, it makes no sense to talk of elements of a general lattice as containers of elements, and yet after translation to pointfree notation we have a valid proof. If you can restrict yourself to a suitable subset of set theory then your proofs involving elements of objects can carry over to many other categories.

Part II of Lambek and Scott is about taking this idea to extremes. It's about categories known as toposes. A topos is a category that is even more like the category of sets than a CCC. It's still general enough that there are many kinds of toposes, but you can use a sizable portion of first order logic and ZF to make deductions about them. Again, the literal notion of membership of the objects of a topos might make no sense, but the proofs have a translation to pointfree notation. In fact, it's possible to write entire papers in what looks like conventional set theory language, and have them be valid for other toposes. Anders Kock, for example, writes such papers. Chris Isham has been arguing that topos theory is the correct framework for physics. If you interpret your propositions as being in the category Set then you get classical physics. But those same propositions can be interpreted in other categories, such as one for quantum mechanics, giving a way to use and extend classical language to reason about quantum systems. This set theory-like language is known as the "internal language" of a topos.

Anyway, I'm interested in the notion that Haskell do-notation provides another kind of pointful language that can be used to reason about situations where points don't seem at first to make sense. Consider the lattice of subsets of a topological space, ordered by inclusion. Let Cl(U) be the closure of U. Cl satisfies these properties:

X⊂Y implies Cl(X)⊂Cl(Y)
X⊂Cl(X)
and
Cl(CL(X))⊂Cl(X)

Look familiar? If we make this lattice into a category, with arrows being inclusions, then the first property states that Cl is a functor and the next two say that Cl is a monad. In fact, monads are a kind of generalised closure. So now suppose we're given A⊂Cl(B) and B⊂Cl(C) and wish to prove that A⊂Cl(C). We can rephrase this by saying that if f :: A -> Cl B and g :: B -> Cl C we need an arrow h :: A -> Cl C. We can write one like this:


h x = do
    y <- f x
    z <- g y
    return z

Now it's tempting to interpret the inclusions as functions with y <- f x saying that y is the image of x under the inclusion. (I don’t know about you, but when I write x <- getChar I think to myself “x is the return value from calling getChar”.) But that doesn't really work because the type of f x is Cl B but y is of type B. On the other hand, we can radically reinterpret the above as something like this: when arguing about chains of inclusions of subsets of a topological space, as long as at the end of the chain of inclusions you always end up in the closure of some subset, you're allowed to cheat and nudge a generic point in the closure of a subset back into the original subset. This is exactly parallel to the way do-notation seems to allow us to extract elements out of monadic objects as long as at the end of the do-block we always return an element of monadic type. I'm sure that with a bit of work we could produce a rigorous metatheorem from this. I also expect we can also produce something similar for comonads.

Anyway, the moral is that when working with categories with monads there are may be some interesting and unusual ways to reason. The example of the lattice of subsets is fairly trivial but I'm sure there are other interesting examples. I also expect there's a nice connection with modal logic. I now think of Haskell do-notation as the "internal language" of a category with monads.

Update: I left out the crucial sentences I meant to write. It's easy to see do-notation as a kind of Haskell specific trick for making things like IO heavy code look like traditional procedural code. But comparison with the theory in Lambek and Scott, and Topos Theory in general, makes it clear that do-notation is a member of a family of related languages.

Saturday, November 08, 2008

From Monoids to Monads

Generalising Monoids

The word 'monad' is derived from the word 'monoid'. The explanation usually given is that there is an analogy between monoids and monads. On the surface, this seems a bit unlikely. The join operation in a monad is supposed to correspond to the binary operator in the monoid, but join is a completely different kind of thing, certainly not a binary operator in any usual sense.

I'm going to make this analogy precise so that it's clear that both monoids and monads are examples of the same construction. In fact, I'm going to write some Haskell code to define monoids and monads in almost exactly the same way. I was surprised to find I could do this because instances of Haskell's Monoid and Monad aren't even the same kind of thing (where I'm using 'kind' in its technical sense). But it can be done.

So let's start thinking about monoids. They are traditionally sets equipped with a special element and a binary operator so that the special element acts as an identity for the binary operator, and where the binary operator is associative. We expect type signatures something like one :: m and mult :: m -> m -> m so that, for example, m (m a b) c == m a (m b c). That's fine as it stands, but it doesn't generalise easily. In particular it'd be nice to generalise this definition to other categories. To do that we need to rephrase the definitions so that they are completely point-free and are written purely as the compositions of arrows.

Let's start by thinking about the rule that says multiplying by the identity on the left should leave a value unchanged. Ie. we want mult one x == x for all x. We already have a problem, it refers to a couple of 'points', both one, the identity, and x the unknown. We can deal with the first one easily, we just replace one with an arrow () -> m. But we also need some plumbing to provide two arguments to the mult function. Rather than belabour the point, I'll just give the full code:


> {-# OPTIONS_GHC -fglasgow-exts #-}

> import Control.Monad
> import Test.QuickCheck

> class Monoid m where
>     one :: () -> m
>     mult :: (m,m) -> m

The law for multiplication on the left is then given by law1_left == law1_middle and so on:


>     law1_left,law1_middle,law1_right :: m -> m
>     law1_left   = mult . (one <#> id) . lambda
>     law1_middle = id
>     law1_right  = mult . (id <#> one) . rho

The associativity law is then given by law2_left x = law2_right x.


>     law2_left,law2_right :: ((m,m),m) -> m
>     law2_left = mult . (mult <#> id)
>     law2_right = mult . (id <#> mult) . alpha

The left and right hand sides of the laws are now point-free. But in order to do this I've had to write some auxiliary functions:


> lambda :: a -> ((),a)
> lambda x = ((),x)

> rho :: a -> (a,())
> rho x = (x,())

> alpha :: ((a,b),c) -> (a,(b,c))
> alpha ((x,y),z) = (x,(y,z))

I've also used the fact that (,) is a bifunctor, ie. it's a functor in each of its arguments so (,) doesn't just give a way to generate a new type (a,b) from types a and b. It also combines pairs of arrows to make new arrows. I'll call the part of (,) that acts on arrows by the name <#>:


> (<#>) :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
> (f <#> g) (x,y) = (f x,g y)

Intuitively, f <#> g maps f on the left and g on the right of (a,b)

Try unpacking those definitions to see that we get the usual monoid laws. For example


law2_left ((x,y),z) == mult $ (mult <#> id) ((x,y),z) == mult (mult (x,y),z)
law2_right ((x,y),z) == mult $ (id <#> mult) $ alpha ((x,y),z) == mult (x,mult (y,z))

So we get


mult (mult (x,y),z) ==  mult (x,mult (y,z))

the usual associativity law.

Now that this definition is point-free it seems we could just carry it over to any category. In fact, we've implicitly done this because we've carried over the definition of a monoid from Set to Hask, the category of Haskell types and total functions. We're so used to treating Hask as a proxy for Set we can forget they are actually different categories. But this definition of a monoid works in both. But what about that lambda, rho and alpha? Well they're easy to define in any Cartesian closed category (CCC). But we don't need all of the features of a CCC to define a monoid, we just need lambda, rho and alpha and some kind of 'product' on the set of objects that also acts like a bifunctor. In fact, there's a bunch of 'obvious' laws that these functions satisfy in a CCC. Any category with these functions satisfying these same laws is called a monoidal category. The above definitions allow us to transfer the definition of a monoid to any such category. For the full definition, see the wikipedia entry.

Anyway, let's check to see if the type Int might be a monoid:


> instance Monoid Int where
>     one _ = 1
>     mult (a,b) = a*b

> check1 = quickCheck $ \n -> law1_left n == law1_middle (n :: Int)
> check2 = quickCheck $ \n -> law1_left n == law1_right (n :: Int)
> check3 = quickCheck $ \n -> law2_left n == (law2_right n :: Int)

Of course that's no proof, but it gives us some confidence. (Eg. what about large numbers close to 2³¹...?)

Another Category in the World of Haskell

Most people around here will be familiar with how Haskell types and functions form a category Hask in the obvious way. But it's less well known that there is another category lurking in Haskell. Consider the set of all Haskell functors. These are endofunctors, ie. functors Hask→Hask. Between any two functors is the set of natural transformations between those functors. (If f and g are functors, then the polymorphic functions f a -> g a are the natural transformations.) In addition, you can compose natural transformations and the composition is associative. In other words, Haskell functors form a category. (See the appendix for more abstract nonsense relating to this.)

We'll call the category of endofunctors on Hask by the name E. Functors can be composed like so:


> type (f :<*> g) x = f (g x)

It'd be cool if this was a product in the usual categorical sense, but it isn't. There isn't a natural way to map to both f a and g a from f (g a) with the universal property of products. Instead it's a weaker type of product which is still a bifunctor. Here's how it acts on arrows (remembering that in E, arrows are natural transformations):


> (<*>) f g = f. fmap g

(We could also have used fmap g . f.) If you think of f <#> g as making f act on the left and g act on the right, then you can think of f <*> g as making f and g act on the outer and inner containers of a container of containers.

Here's the identity element for this product, the identity functor. It plays a role similar to () in Hask:


> data Id x = Id x deriving Show
> instance Functor Id where
>     fmap f (Id x) = Id (f x)

We can now define some familiar looking natural transformations:


> lambda' :: Functor f => f a -> (Id :<*> f) a
> lambda' x = Id x

> rho' :: Functor f => f a -> (f :<*> Id) a
> rho' x = fmap Id x

> alpha' :: f (g (h a)) -> f (g (h a))
> alpha' = id

With these we have turned E into a monoidal category.

(OK, this may be confusing. We're 'multiplying' functors and we have associativity and a left- and right-identity. So functors form a monoid (modulo isomorphism). But that's a distraction, these are not the monoids that you're looking for. See the appendix for more on this.)

So now we can define monoids in this category using almost the same code as above:


> class Functor m => Monoid' m where
>     one' :: Id a -> m a
>     mult' :: (m :<*> m) a -> m a

>     law1_left',law1_middle',law1_right' :: m a -> m a
>     law1_left'   = mult' . (one' <*> id) . lambda'
>     law1_middle' = id
>     law1_right'  = mult' . (id <*> one') . rho'

>     law2_left',law2_right' :: ((m :<*> m) :<*> m) a -> m a
>     law2_left'  = mult' . (mult' <*> id)
>     law2_right' = mult' . (id <*> mult') . alpha'

And here's the punch line. That's precisely a monad, laws 'n' all.

If you want, you can unpack the definitions above and see that they correspond to the usual notion of a monad. We can write code to do the translation:


> data Monad m => TranslateMonad m a = TM { unTM :: m a } deriving (Eq,Show)

> translate :: Monad m => m a -> TranslateMonad m a
> translate x = TM x

> instance (Monad m,Functor m) => Functor (TranslateMonad m) where
>     fmap f (TM x) = TM (fmap f x)

> instance (Functor m,Monad m) => Monoid' (TranslateMonad m) where
>     one' (Id x) = TM $ return x
>     mult' (TM x) = TM $ fmap unTM x >>= id

In other words, any instance of Monad gives an instance of Monoid'. I'll let you write the reverse mapping. We can even check (not prove!) the laws are satisfied by a particular monad by using QuickCheck:


> instance Arbitrary a => Arbitrary (Id a) where
>     arbitrary = liftM Id arbitrary

> instance (Monad m,Eq (m a),Arbitrary (m a)) => Arbitrary (TranslateMonad m a) where
>     arbitrary = liftM TM arbitrary

> check4 = quickCheck $ \n -> law1_left' n == law1_middle' (n :: TranslateMonad [] Int)
> check5 = quickCheck $ \n -> law1_left' n == law1_right' (n :: TranslateMonad [] Int)
> check6 = quickCheck $ \n -> law2_left' n == (law2_right' n :: TranslateMonad [] Int)

I don't know about you, but I find this mind-blowing. We took a definition of a simple concept, the monoid. We then abstractificated its definition so that it applied to any monoidal category, and in doing so we completely changed the meaning of one and mult. And yet the resulting object, including its laws, are precisely what you need to solve a whole host of problems in algebra and Haskell programming. This is an amazing example of the unreasonable effectiveness of mathematics. The concept might be a little tricky to grasp: monads are like ordinary monoids, but with outer and inner replacing left and right. But the payoff from this is that intuitions about monoids carry over to monads. I hope to say more about this in future episodes.

Appendix

Consider the category, H, with one object, Hask, and whose arrows are the endofunctors on Hask. We've shown how the arrows on this category aren't just a set, they themselves form a category. This makes H a 2-category. A category with one object is a monoid. But this is a 2-category, so we have a 2-monoid. In fact, there are some extra details required to show we have a 2-category. For example, we need to show that composition of arrows (which now form a category, not a set) is a functor (not just a function). That follows from the Interchange Law which I've already talked about. But note that H is a monoid in a completely conventional way, its arrows form a set with a binary operator on them. This is not the structure that corresponds to monads, although it plays a part in constructing them. Also, don't confuse this monoid with the one that appears in a MonadPlus.

Confession

I'm having trouble giving <*> the correct type signature. I think it should be something like (forall x.a x -> c x) -> (forall x.b x -> d x) -> (forall x.a (b x) -> c (d x)). GHC doesn't like it. Can anyone come up with the correct thing? The code still works.

Links

Another account of the same subject, modulo some details.

Saturday, October 25, 2008

Operads and their Monads

Hardly a day goes by at the n-Category Cafe without operads being mentioned. So it's time to write some code illustrating them.


> {-# LANGUAGE FlexibleInstances #-}

> import Data.Monoid
> import Control.Monad.Writer
> import Control.Monad.State

Let's define a simple tree type:


> data Tree a = Leaf a | Tree [Tree a] deriving (Eq,Show)

Sometimes we want to apply a function to every element of the tree. That's provided by the fmap member of the Functor type class.


> instance Functor Tree where
>     fmap f (Leaf x) = Leaf (f x)
>     fmap f (Tree ts) = Tree $ map (fmap f) ts

But just as we can't use map to apply monadic functions to a list (we'd write mapM print [1,2,3]), we can't use fmap to apply them to our tree. What we need is a monadic version of fmap. Here's a suitable type class:


> class FunctorM c where
>     fmapM :: Monad m => (a -> m b) -> c a -> m (c b)

(I could have used Data.Traversable but that entails Data.Foldable and that's too much work.)

And now we can implement a monadic version of Tree's Functor instance:


> instance FunctorM Tree where
>     fmapM f (Leaf x) = do
>         y <- f x
>         return (Leaf y)
>     fmapM f (Tree ts) = do
>         ts' <- mapM (fmapM f) ts
>         return (Tree ts')

We can use fmapM to extract the list of elements of a container type:


> serialise a = runWriter $ fmapM putElement a
>               where putElement x = tell [x]

Not only does serialise suck out the elements of a container, it also spits out an empty husk in which all of the elements have been replaced by (). We can think of the latter as the 'shape' of the original structure with the original elements removed. We can formalise this as


> type Shape t = t ()

So we have:


> serialise :: FunctorM t => t a -> (Shape t,[a])

Keeping the shape around allows is to invert serialise to give:


> deserialise :: FunctorM t => Shape t -> [a] -> (t a, [a])
> deserialise t = runState (fmapM getElement t) where
>   getElement () = do
>       x:xs <- get
>       put xs
>       return x

(That's a bit like using the supply monad. This function also returns the leftovers.)

We can also write (apologies for the slightly cryptic use of the writer monad):


> size :: FunctorM t => t a -> Int
> size a = getSum $ execWriter $ fmapM incCounter a
>          where incCounter _ = tell (Sum 1)

Let's try an example. Here's an empty tree:


> tree1 = Tree [Tree [Leaf (),Leaf ()],Leaf ()]

We can pack a bunch of integers into it:


> ex1 = fst $ deserialise tree1 [1,2,3]

And get them back out again:


> ex2 = serialise ex1

serialise separates the shape from the data, something you can read lots more about at Barry Jay's web site.

Remember that trees are also monads:


> instance Monad Tree where
>     return x = Leaf x
>     t >>= f = join (fmap f t) where
>         join (Leaf t) = t
>         join (Tree ts) = Tree (fmap join ts)

The join operation for a tree grafts the elements of a tree of trees back into the tree.

Right, that's enough about trees and shapes for now.

Operads are a bit like the plumbing involved in installing a sprinkler system. Suppose you have a piece, A that splits a single pipe into two:


       ______
      /
     /   ____
____/   /
     A /
____   \
    \   \____
     \
      \______

And you have two more pipes B and C that look like this:


        ______
       /
      /   ____
     /   /
____/    \____
          
    B or C
____      ____
    \    /
     \   \____
      \
       \______

then you can 'compose' them to make a larger system like this:


                       ______
                      /
                     /   ____
                    /   /
          _________/    \____
         /            B  
        /   _______      ____
       /   /       \    /
      /   /         \   \____
     /   /           \
____/   /             \______
     A /
____   \               ______
    \   \             /
     \   \           /   ____
      \   \         /   /
       \   \_______/    \____
        \             C  
         \_________      ____
                   \    /
                    \   \____
                     \
                      \______

(Vim rectangular mode FTW!)

The important thing to note here is that as A had two outputs (or inputs, depending on your point of view) you can attach two more pieces, like B and C, directly to it.

Call the number of outlets the 'degree' of the system. If A has degree n then we can attach n more systems, A₁...A_n to it and the resulting system will have degree degree(A₁)+...+degree(A_n). We can write the result as A(A₁,...,A_n).

We also have the 'identity' pipe that looks like this:


_____________

  identity
_____________

Formally, an operad is a collection of objects, each of which has a 'degree' that's an integer n, n≥0, depending on your application), an identity element of degree 1, and a composition law:


> class Operad a where
>     degree :: a -> Int
>     identity :: a
>     o :: a -> [a] -> a

o is the composition operation. If f has degree n then we can apply it to a list of n more objects. So we only expect to evaluate f `o` fs successfully if degree f == length fs.

There are many identities we'd expect to hold. For example f `o` [identities,...,identity] == f, because adding plain sections of pipe has no effect. We also expect some associativity conditions coming from the fact that it doesn't matter what order we build a pipe assembly in, it'll still function the same way.

We can follow this pipe metaphor quite closely to define what I think of as the prototype Operad. A Fn a is a function that takes n inputs of type a and returns one of type a. As we can't easily introspect and find out how many arguments such a function expects, we also store the degree of the function with the function:


> data Fn a = F { deg::Int, fn::[a] -> a }

> instance Show a => Show (Fn a) where
>     show (F n _) = "<degree " ++ show n ++ " function>"

unconcat is a kind of inverse to concat. You give a list of integers and it chops up a list into pieces with lengths corresponding to your integers. We use this to unpack the arguments to f `o` gs into pieces suitable for the elements of gs to consume.


> unconcat [] [] = []
> unconcat (n:ns) xs = take n xs : unconcat ns (drop n xs)

> instance Operad (Fn a) where
>     degree = deg
>     f `o` gs = let n = sum (map degree gs)
>                in F n (fn f . zipWith fn gs . unconcat (map degree gs))
>     identity = F 1 head

Now compute an example, f(f₁,f₂,f₃) applied to [1,2,3]. It should give 1+1+2*3=8.


> ex3 = fn (f `o`  [f1,f2]) [1,2,3] where
>   f = F 2 (\[x,y] -> x+y)
>   f1 = F 1 (\[x] -> x+1)
>   f2 = F 2 (\[x,y] -> x*y)

(That's a lot like lambda calculus without names. Operads are a bit like n-ary combinators.)

Now I'm going to introduce a different looking operad. Think of V as representing schemes for dicing the real line. Here are some examples:


|0.25|0.25|   0.5   |

|0.1|0.1|    0.8    |

If A divides up the real line into n pieces then you could divide up each of the n pieces using their own schemes. This means that dicing schemes compose. So if we define A, B and C as:


A = |0.5|0.5|

B = |0.75|0.25|
C = |0.1|0.1|0.8|

Then A(B,C) is:


|0.375|0.125|0.05|0.05|0.4|

We could implement V as a list of real numbers, but it's more fun to generalise to any monoid and not worry about divisions summing to 1:


> data V m = V { unV :: [m] } deriving (Eq,Show)

This becomes an operad by allowing the monoid value in a 'parent' scheme multiply the values in a 'child'.


> instance Monoid m => Operad (V m) where
>     degree (V ps) = length ps
>     (V as) `o` bs = V $ op as (map unV bs) where
>         op [] [] = []
>         op (a:as) (b:bs) = map (mappend a) b ++ op as bs
>     identity = V [mempty]

For example, if d₁ cuts the real line in half, and d₂ cuts it into thirds, then d₁(d₁,d₂) will cut it into five pieces of lengths 1/4,1/4,1/6,1/6,1/6:


> ex4 = d1 `o` [d1,d2] where
>   d1 = V [Product (1/2),Product (1/2)]
>   d2 = V [Product (1/3),Product (1/3),Product (1/3)]

If the elements in a V are non-negative and sum to 1 we can think of them as probability distributions. The composition A(A₁,...,A_n) is the distribution of all possible outcomes you can get by selecting a value i in the range {1..n} using distribution A and then selecting a second
value conditionally from distribution A_i. We connect with the recent n-category post on entropy.

In fact we can compute the entropy of a distrbution as follows:


> h (V ps) = - (sum $ map (\(Product x) -> xlogx x) ps) where
>   xlogx 0 = 0
>   xlogx x = x*log x/log 2

We can now look at the 'aside' in that post. From an element of V we can produce a function that computes a corresponding linear combination (at least for Num types):


> linear (V ps) xs = sum $ zipWith (*) (map (\(Product x) -> x) ps) xs

We can now compute the entropy of a distribution in two different ways:


> (ex5,ex6) = (h (d1 `o` [d1,d2]),h d1 + linear d1 (map h [d1,d2])) where
>   d1 = V [Product 0.5,Product 0.5]
>   d2 = V [Product 0.25,Product 0.75]

Now according to this paper on operads we can build a monad from an operad. Here's the construction:


> data MonadWrapper op a = M { shape::op, value::[a] } deriving (Eq,Show)

(The field names aren't from the paper but they do give away what's actually going on...)

The idea is that an element of this construction consists of an element of the operad of degree n, and an n element list. It's a functor in an obvious way:


> instance Functor (MonadWrapper o) where
>     fmap f (M o xs) = M o (map f xs)

It's also a FunctorM:


> instance FunctorM (MonadWrapper o) where
>   fmapM f (M s c) = do
>       c' <- mapM f c
>       return $ M s c'

We can make the construction a monad as follows:


> instance Operad o => Monad (MonadWrapper o) where
>     return x = M identity [x]
>     p >>= f = join (fmap f p) where
>         join (M p xs) = M (p `o` map shape xs) (concatMap value xs)

Now for something to be a monad there are various laws that needs to be satisfied. These follow from the rules (which I haven't explicitly stated) for an operad. When I first looked at that paper I was confused - it seemed that the operad part and the list part didn't interact with each other. And then I suddenly realised what was happening. But hang on for a moment...

Tree shapes make nice operads. The composition rule just grafts child trees into the leaves of the parent:


> instance Operad (Tree ()) where
>     degree t = length (snd (serialise t))
>     identity = Leaf ()
>     t `o` ts = let (r,[]) = deserialise t ts in r >>= id

We can write that more generically so it works with more than trees:

> data OperadWrapper m = O { unO::Shape m }


> instance (FunctorM m,Monad m) => Operad (OperadWrapper m) where
>     degree (O t) = size t
>     identity = O (return ())
>     (O t) `o` ts = let (r,[]) = deserialise t (map unO ts) in O (r >>= id)

So let's use the construction above to make a monad. But what actually is this monad? Each element is a pair with (1) a tree shape of degree n and (2) an n-element list. In other words, it's just a serialised tree. We can define these isomorphisms to make that clearer:


> iso1 :: FunctorM t => t x -> MonadWrapper (t ()) x
> iso1 t = uncurry M (serialise t)

> iso2 :: FunctorM t => MonadWrapper (t ()) x -> t x
> iso2 (M shape contents) = let (tree,[]) = deserialise shape contents in tree

So, for example:


> ex7 = iso2 (iso1 tree) where
>   tree = Tree [Tree [Leaf "Birch",Leaf "Oak"],Leaf "Cypress",Leaf "Binary"]

That construction won't work for all monads, just those monads that come from operads. I'll leave you to characterise those.

And now we have it: a way to think about operads from a computational perspective. They're the shapes of certain monadic serialisable containers. Operadic composition is the just the same grafting operation used in the join operation, using values () as the graft points.

I have a few moments spare so let's actually do something with an operad. First we need the notion of a free operad. This is basically just a set of 'pipe' parts that we can stick together with no equations holding apart from those inherent in the definition of an operad. This is different from the V operad where many different ways of apply the o operator can result in the same result. We can use any set of parts, as long as we can associate an integer with each part:


> class Graded a where
>   grade :: a -> Int

The free operad structure is just a tree:


> data FreeOperad a = I | B a [FreeOperad a] deriving Show

I will be the identity, but it will also serve as a 'terminator' like the way there's always a [] at the end of a list.

An easy way to make a single part an element of an operad:


> b n = let d = grade n in B n (replicate d I)

Here's the instance:


> instance Graded a => Operad (FreeOperad a) where
>   degree I = 1
>   degree (B _ xs) = sum (map degree xs)
>   identity = I
>   I `o` [x] = x
>   B a bs `o` xs = let arities = map degree bs
>                   in B a $ zipWith o bs (unconcat arities xs)

Now I'm going to use this to make an operad and then a monad:


> instance Graded [a] where
>   grade = length

> type DecisionTree = MonadWrapper (FreeOperad [Float])

What we get is a lot like the probability monad except it doesn't give the final probabilities. Instead, it gives the actual tree of possibilities. (I must also point out this hpaste by wli.)


> test = do
>   a <- M (b [Product 0.5,Product 0.5]) [1,2]
>   b <- M (b [Product (1/3.0),Product (1/3.0),Product (1/3.0)]) [1,2,3]
>   return $ a+b

Now we can 'flattten' this tree so that the leaves have the final probabilities:


> flatten :: Monoid m => FreeOperad [m] -> V m
> flatten I = V [mempty]
> flatten (B ms fs) = V $ concat $ zipWith (map . mappend) ms (map (unV . flatten) fs)

This is a morphism of operads. (You can probably guess the definition of such a thing.) It induces a morphism of monads:


> liftOp :: (Operad a,Operad b) => (a -> b) -> MonadWrapper a x -> MonadWrapper b x
> liftOp f (M shape values) = M (f shape) values

liftOp flatten test will give you the final probabilities.

There may just be a possible application of this stuff. The point of separating shape from data is performance. You can store all of your data in flat arrays and do most of your work there. It means you can write fast tight loops and only rebuild the original datatype if needed at the end. If you're lucky you can precompute the shape of the result, allowing you to preallocate a suitable chunk of memory for your final answer to go into. What the operad does is allow you to extend this idea to monadic computations, for suitable monads. If the 'shape' of the computation is independent of the details of the computation, you can use an operad to compute that shape, and then compute the contents of the corresponding array separately. If you look at the instance for MonadWrapper you'll see that the part of the computation that deals with the data is simply a concatMap.

BTW In some papers the definition restricts the degree to ≥1. But that's less convenient for computer science applications. If it really bothers you then you can limit yourself to thinking about containers that contain at least one element.

Saturday, October 18, 2008

What's the use of a transfinite ordinal?

This post is going to talk about one player games (which I'll call 1PGs, or just games). They seem pretty stupid at first, but it turns out that they have a lot of rich and applicable theory. The idea is that at any position in a 1PG there is a set of new positions that you can move to. You lose when you have no legal moves and the goal is to avoid losing for as long as you can. For convenience, let's identify your position in a 1PG with the set of positions you could move to. A position in a 1PG is just another 1PG. So we can give the following recursive formal definition of a 1PG:

A one player game is either the losing position with no legal moves, denoted by ∅, or a set of one player games.

A sequence of legal moves in a 1PG, say x, is then a sequence of the form x ∋x₀ ∋x₁ ∋x₂ ∋....

Did you notice that I sneaked in a really big restriction on 1PGs in that definition? My definition means that 1PGs are the same thing as sets. And sets are well-founded meaning that any sequence of the form x ∋x_0 ∋x₁ ∋x₂ ∋... eventually terminates. So my definition implicitly contains the restriction that you always eventually lose. That's fine for my purposes.

Now suppose that x ∋y ∋z is a legal sequence of plays in some 1PG. You might as well assume that z is in x. The reason is that if z were a legal move from x, you wouldn't take it, because in order to delay the inevitable you're going to prefer to go via y. So we'll assume that any game x is "transitively closed", ie. if x∋y and y∋z then x∋z.

Now I want to restrict things even further to the most boring kind of 1PG of all, games where the successor moves are all totally ordered. Intuitively what I mean is that you can think of every move in the game as simply moving you along a 'line'. Given any two positions in the game, x and y, either x∋y, or y∈x, or x=y. In these games there are no branches, the only thing you can do in these games is delay the inevitable. A bit like life really. We'll also use the notation a<b to mean a∈b.

I'll call these games ordinal 1PGs. You may have noticed that this is precisely the same as the ordinary definition of an ordinal.

Let's call the game ∅ by the name 0. 0 is sudden death. Recursively define the game n={0,1,2,3,n-1}. In the game of n you have up to n steps to play before you die (though you could hurry things along if you wanted to).

But what about this game: ω={0,1,2,...}. There's nothing mysterious about it. In your first move you choose an integer, and then you get to delay the end for up to n moves. This is our first transfinite ordinal, but it just describes a 1PG that anyone can play and which is guaranteed to terminate after a finite number of moves. We can't say in advance how many moves it will take, and we can't even put an upper bound on the game length, but we do know it's finite.

Given two games, a and b, we can define the sum a+b. This is simply the game where you play through b first, and then play through a. Because of the transitivity condition above you could simply abandon b at any stage and then jump into a, but that's not a good thing to do if you're trying to stay alive for as long as possible. The first example of such a 1PG is the game ω+1. You get to make one move, then you choose an n, and then you get to play n.

What about 1+ω? Well you get to choose n, play n moves, and then play an extra move. But that's exactly like playing ω and making the move n+1. So 1+ω and ω are the same game. Clearly addition of 1PGs isn't commutative.

We can multiply two games as well. Given games a and b, ab is defined like this: we have game a and game b in front of us at the same time, a on the left and b on the right. At any turn we have a choice: make a move in the game on the left, or make a move in the game on the right, resetting the left game back to a again. It's like playing through b, but playing through a copy of a at each turn. Note how for integers a and b the game ab is just ab, where the former is game multiplication, and the latter is ordinary arithmetical multiplication.

We can also define exponentiation. We can define aⁿ, for finite integer n, in the obvious way as ω·ω·...·ω n times. Before getting onto more general exponentiation I need to classify ordinal games. There are three kinds. There's

(1) the zero game 0=∅.
(2) the games where there is one 'largest' next move in the game, the one you should take. These are games of the form a+1 and the next position in the game, if you're playing to survive, will be a. For example, the game 7 is 6+1.
(3) the games where there is no single best choice of next move. These are games where there is no largest next move, but instead an ever increasing sequence of possible next moves and you have to pick one. The first example is ω where you're forced to pick an n. These are called limit ordinals.

We can use this to recursively define a^b using the classification of b.

(1) If b=0 then by definition a^b=1 and that's that.
(2) if b=c+1, then a^b=a^c·a.
(3) if b is a limit ordinal then a move in a^b works like this: you pick a move in b, say c, and now you get to play a^c (or, by transitive closure, any smaller game).

For example, you play ω^ω like this: in your first move you pick an n, and then you play ωⁿ. Note how the state of play never becomes infinite, you're always playing with finite rules on a finite 'board'. This is an important point. Even though ω^ω3+22+ω+4 seems like it must be a very big kind of thing, it describes a finite process that always terminates.

So what use are these things?

Suppose you have an algorithm that runs one step at a time, with each step taking a finite time, and where the internal state of the program at step t is s_t. Suppose that you can find a map f from program states to the ordinals such that f(s_t+1)<f(s_t), ie. the program always decreases the ordinal associated to a state. Then the program must eventually terminate.

For example, suppose the program iterates 10 times. We can simply assign f(s_t)=10-t. Suppose instead your algorithm computes some number n at the first step, though you can't figure out what that number is (or maybe it accepts n as an input from a user), then we can use f(0)=ω and so on. We don't know in advance what n is, but that doesn't matter. Whatever n is entered, the program will terminate.

This technique is particularly useful for proving termination of rewrite rules. For example, with the rewrite rules generated by the Buchberger algorithm we can map the first step to ωⁿ, for some n. And if we don't know what n is, we can just start with ω^ω. If you take a peek at the papers of Nachum Dershowitz you'll see how he applies this technique to many other types of rewrite rule.

Transfinite ordinals are not as esoteric as they may first appear to be. Using them to prove termination goes back to Turing and Floyd but I learnt about the method from reading Dershowitz. (I think Floyd suggested using the method to prove termination of the standard rewrite rules for computing derivatives of expressions. It's tricky because differentiating a large product, say, can result in many new terms, so there's a race between the derivative 'knocking down' powers, and products causing growth.)

Probably the best example of using ordinals to prove the termination of a process is in the Hydra game. What's amazing about this game is that you need quite powerful axioms of arithmetic to prove termination.

One more thing. All of this theory generalises to two player games.

Update: I think mjd may have been about to write on the same thing. On the other hand, though he mentions program termination, he says of it "But none of that is what I was planning to discuss". So maybe what I've written can be seen as complementary.

Sunday, October 12, 2008

Untangling with Continued Fractions: part 5


> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances,GeneralizedNewtypeDeriving #-}

> module Main where

> import qualified Data.Map as M
> import Control.Monad
> import Prelude
> import Ratio hiding (numerator,denominator)

> infixl 5 .+
> infixl 6 .*

So far I've described a way to use monad notation to represent knots, links and tangles, and I've provided a way to implement a monad so that the notation gives rise to the bracket polynomial when applied to a knot. What happens when we do the same with a tangle?

A tangle corresponds to something with this type:


> type Tangle = (Bool,Bool) -> KnotMonad (Bool,Bool)

Here are two examples of tangles, ie. expressions with this type:


> infinity (i,j) = return (i,j)
> zero (i,j) = do
>   cup (i,j)
>   cap ()

These correspond to these tangles:

And the names of those expressions are the corresponding extended rational values.

Now here's a surprising but nontrivial fact that holds when T is a rational tangle: if we set A²=i (where i²=-1) then the value of the monadic expression corresponding to the tangle is of the form a<[∞]>+b<[0]> where the [r] are the rational tangles above, and <T> is the bracket polynomial (evaluated at A²=i). The rational associated to the original tangle is then given by -ia/b. I believe this discovery is, like many other things I've talked about in this blog, due to Conway.

We now have the makings of a complete algorithm. Using the polynomial monad defined in the previous installment we evaluate our tangle. We then compute the coefficients a and b, which are polynomials in A. Now compute a and b at A²=-i, and then compute -ia/b and use the operations z→z+1, z→z-1 and z→-1/z to reduce this rational to zero, interpreting these operations as twist, antitwist and rotation respectively (as I described here).

There's a shortcut I'm going to take but I don't have a proof it's guaranteed to work so it's risky. Instead of computing polynomials in A, doing polynomial division, and then evaluating at A²=-i, I'm going to do everything from the first step with the assumption that A²=-1. (The catch is that if a and b are polynomials in A, then a and b may have a common divisor that cancels when computing a/b. If we substitute A²=-1 from the beginning we get division of zero by zero. I don't know for sure if this can't happen. It's not a big deal if it does though, I just need to write a polynomial division routine.)

Now if A²=-1 then we could choose A=(1/2)^1/2+(1/2)^1/2i. That suggests using real arithmetic but we need to extract results that are rational numbers. So instead introduce the field K=Q((1/2)^1/2), ie. the field of numbers of the form u+v(1/2)^1/2. We can implement this as:


> data K = K { re2::Rational, im2::Rational } deriving (Eq,Show)

> instance Num K where
>   K a b + K a' b' = K (a+a') (b+b')
>   K a b * K a' b' = K (a*a'+2*b*b') (a*b'+a'*b)
>   negate (K a b) = K (negate a) (negate b)
>   abs _ = error ""
>   signum _ = error ""
>   fromInteger i = K (fromInteger i) 0

> instance Fractional K where
>   recip (K a b) = let r = recip (a*a-2*b) in K (r*a) (-r*b)
>   fromRational x = K x 0

We actually want to work with the field L=K(i), ie. complex numbers built from elements of K. We can implement this as:


> type L = Complex K

(Notice how it's not at all hard to build up algebraic structures like this in Haskell without a symbolic algebra package. For more powerful code in this vein, see David Amos's code).

Now we have our field, our monad is that of vector spaces over L:


> type KnotMonad = V L

Let's define some useful constants. The square root of 1/2, A, B=1/A and i:


> hr2 = K 0 (1%2)
> a = hr2 :+ hr2
> b = hr2 :+ (-hr2)
> i = 0 :+ 1

Here's the knot monad code I've presented before. (This version uses -|()|- in a way that may appear a bit useless. I'm just making explicit when we're working with a 1-dimensional vector space generated by -|()|-, which is much the same thing as the base field.)


> cup :: (Bool,Bool) -> KnotMonad ()
> cup (u,v) = case (u,v) of
>   (False,True) -> (-i * b) .* return ()
>   (True,False) -> (i * a) .* return ()
>   otherwise -> vzero

> cap :: () -> KnotMonad (Bool,Bool)
> cap () = (-i * b) .* return (False,True) .+ (i * a) .* return (True,False)

> over :: Tangle
> over (u,v) = a .* do
>   () <- cup (u,v)
>   cap ()
>   .+
>   b .* return (u,v)

> under :: Tangle
> under (u,v) = b .* do
>   () <- cup (u,v)
>   cap ()
>   .+
>   a .* return (u,v)

Here's where we work out the coefficients a and b. We treat a tangle as the function that it is and look at what happens when we apply it to two different values in (Bool,Bool). Applying to value' to infinity and zero shows that they do in fact compute a and b correctly:


> value' p =
>   let alpha = coefficient (False,False) $ p (False,False)
>       beta  = coefficient (True,False)  $ p (False,True)
>   in (alpha,beta)

From a and b we extract -ia/b as a rational:


> value :: Tangle -> Rational
> value p =
>   let (a,b) = value' p
>   in re2 $ realPart $ -i*a/b

And now we reduce to 0. This isn't the best algorithm for reducing rationals to zero. But it's what I wrote in my first attempt and I'm sticking with it for consistency with part one:


> steps :: Rational -> [String]
> steps 0 = []
> steps q | q<= -1 = "twist":steps (q+1)
>         | -1<q && q<1 = "rotate":steps (-1/q)
>         | q>=1 = "antitwist":steps (q-1)

> untangle t = steps (value t)

Here's the example from the first installment:


> example :: Tangle
> example (a,b) = do
>   (c,d) <- over (a,b)
>   (e,f) <- cap ()
>   (g,h) <- over (c,e)
>   (i,j) <- over (f,d)
>   (m,n) <- cap ()
>   (k,l) <- cap ()
>   (q,r) <- over (h,k)
>   (s,y) <- over (l,i)
>   (o,p) <- over (n,g)
>   (t,u) <- under (p,q)
>   (v,w) <- under (r,s)
>   (x,z) <- over (y,j)
>   cup (o,t)
>   cup (u,v)
>   cup (w,x)
>   return (m,z)

I'll leave you to type untangle example in ghci and refer back to my original video.

To find out more, and to read about the details I've left out, try the many papers by Kauffman, including this one.

That was hard! I didn't anticipate how much stuff I'd have to introduce to just sketch out how this works. It's one of those cases where the comments need to be about 30 or 40 times larger than the code and done properly, there's enough material to fill a good few chapters of a book. So I apologise if I skimped on some details along the way, and left out some needed explanations. But I hope that I've at least motivated you to take a peek at knot theory and appreciate how nicely tangles and monads play together.

And remember this is only code to illustrate some mathematics - it's inefficient and not how I'd really untangle big rational tangles! (For example, just using the monad associativity laws you can often rearrange the monad expressions for knots into equivalent forms that execute much quicker.)

The rest is similar to code I've used in previous installments:


> swap (x,y) = (y,x)

> class Num k => VectorSpace k v | v -> k where
>     vzero :: v
>     (.+) :: v -> v -> v
>     (.*) :: k -> v -> v
>     (.-) :: v -> v -> v
>     v1 .- v2 = v1 .+ ((-1).*v2)

> data V k a = V { unV :: [(k,a)] } deriving (Show)

> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x

> instance Num k => Functor (V k) where
>     fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as

> instance Num k => Monad (V k) where
>     return a = V [(1,a)]
>     x >>= f = join (fmap f x)
>         where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x
>               scale k1 as = map (\(k2,a) -> (k1*k2,a)) as

> instance Num r => MonadPlus (V r) where
>     mzero = V []
>     mplus (V x) (V y) = V (x++y)

> instance (Num k,Ord a) => VectorSpace k (V k a) where
>     vzero = V []
>     V x .+ V y = V (x ++ y)
>     (.*) k = (>>= (\a -> V [(k,a)]))

> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))

> data Complex a = (:+) { realPart :: a, imagPart :: a } deriving (Eq,Show)

> instance Num a => Num (Complex a) where
>   (a :+ b) + (a' :+ b') = (a+a') :+ (b+b')
>   (a :+ b) * (a' :+ b') = (a*a'-b*b') :+ (a*b'+a'*b)
>   negate (a :+ b) = (-a) :+ (-b)
>   fromInteger n = fromInteger n :+ 0
>   abs (a :+ b) = undefined
>   signum (a :+ b) = undefined

> instance Fractional a => Fractional (Complex a) where
>   recip (a :+ b) = let r = recip (a*a+b*b) in ((a*r) :+ (-b*r))
>   fromRational q = fromRational q :+ 0

Friday, September 26, 2008

On writing Python one-liners.


# I'm off to the Oregon Shakespeare Festival for the weekend so no
# time to finish my Untangling series. But I do have time to post this...

# The other day someone asked me if it was possible to implement a
# certain task in one line of Python. My response was "yes, but you
# wouldn't want to." You can think of what follows as the demonstration
# that you really don't want to.

# The problem is, there are lots of Python constructions you can't
# do on one line: assigning variables, defining functions, using
# conditionals, recursion and so on. So it appears there's a limit
# to what you can do. Amazingly, all of these problems can be solved
# with the use of lambdas.

# One place you often want one line lambdas is in the use of the map
# function. So let's stick with that as an example and warm up with
# an easy example. Lets say we want to add 1 to all of the numbers
# from 0 to 19. We can write:

def inc(n):
    return n+1

print "1.",map(inc,range(0,20))

# Lambdas enable you to write that as a one-liner:

print "1.",map(lambda x:x+1,range(0,20))

# Now what happens if we want to apply a function that uses a local
# (constant) variable. For example:

def f1(n):
    m = 2*n+1
    return m+m*m

print "2.",map(f1,range(0,20))

# We can eliminate the assigment by means of a function. m changes from
# being a local to a function argument:

def f2(n):
    def g(m):
        return m+m*m
    return g(2*n+1)

print "2.",map(f2,range(0,20))

# But that seems to make things worse because now we have a function
# definition instead of a variable assigment. But we can eliminate
# that with a lambda:

def f3(n):
    return (lambda m:m+m*m)(2*n+1)

print "2.",map(f3,range(0,20))

# And now we can write the whole thing as one big lambda:

print "2.",map(lambda n:(lambda m:m+m*m)(2*n+1),range(0,20))

# But what happens if we want to define a bunch of local functions
# inside our lambda? We can use the same technique. We (1) use
# lambdas to allow us to fake local variable definitions and
# (2) use lambdas to define the functions. So we can replace this:

def f(n):
    def g(n):
        return 2*n+1

    def h(n):
        return 3*n-2

    return g(h(g(h(n))))

print "3.",map(f,range(0,20))

# with this:

print "3.",map(lambda n:(lambda g,h,n:g(h(g(h(n)))))(lambda n:2*n+1,lambda n:3*n-2,n),range(0,20))

# We're passing in lambdas as arguments to a lambda so that they
# get bound to g and h.

# But what happens if we want a recursive definition like:

def fact(n):
    if n<=1:
        return 1
    else:
        return n*fact(n-1)

print "4.",map(fact,range(0,20))

# Because the function is bound to the name fact, we can refer to
# fact by name inside the defintion of fact. How can we do this in
# a one-line lambda? We can start by eliminating the self-reference
# in a fairly standard way. First rewrite fact so it doesn't call
# itself, but calls a function passed in as an argument:

def urfact(f,n):
    if n<=1:
        return 1
    else:
        return n*f(f,n-1)

# We can then make a factorial function by calling urfact with itself,
# allowing it to carry out the recursion:

def fact1(n):
    return urfact(urfact,n)

print "4.",map(fact1,range(0,20))

# We can use urfact directly without fact1:

print "4.",map(lambda n:urfact(urfact,n),range(0,20))

# Now all we need is a one-liner definition of urfact. It's tempting to try:

def urfact1(f,n):
    return {True:1,False:n*f(f,n-1)}[n<=1]

# But that results in non-termination because it evaluates *both* branches
# of the conditional. In a lazy language like Haskell, the branch not taken
# wouldn't be evaluated. But Python can be made lazy by using...you
# guessed it...lambda. The expression lambda:x is like a lazy version
# of the expression x. It doesn't get evaluated until you apply it as a
# function of no arguments, at which time it takes on the value x. So we
# can rewrite urfact as:

def urfact2(f,n):
    return {True:lambda:1,False:lambda:n*f(f,n-1)}[n<=1]()

# And we get

print "4.",map(lambda n:urfact2(urfact2,n),range(0,20))

# Note how we use urfact2 twice, so we need the trick above for local
# definitions to get it down to one reference:

print "4.",map(lambda n:(lambda f:f(f,n))(urfact2),range(0,20))

# And now we can replace urfact2 with a lambda

print "4.",map(lambda n:(lambda f:f(f,n))(lambda f,n:{True:lambda:1,False:lambda:n*f(f,n-1)}[n<=1]()),range(0,20))

# And that's enough for now. I expect that if we keep going we'll find 
# way to rewrite any Python program as a one liner. You could probably
# even write a program to automatically 'compile' a usable subset of
# Python into one liners.

# But you probably don't want to do that.

Saturday, September 20, 2008

Untangling with Continued Fractions: Part 4


> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances,GeneralizedNewtypeDeriving #-}

> module Main where

> import Test.QuickCheck
> import qualified Data.Map as M
> import Control.Monad
> import Ratio

> infixl 5 .+
> infixl 6 .*

As I've mentioned previously, we need to choose values for our pieces of knot or tangle so that when they're combined together we get something that tells us about the knot or tangle, not the diagram. For example, suppose our diagram contains this piece:

That piece is isotopic to:

We want any kind of evaluation of our diagram to be independent of which one of these we've chosen. In other words, among other things we need this function:


> cupcap i = do
>   (j,k) <- cap
>   cup (i,j)
>   return k

to equal this function:


> straight i = do
>   return i

I've already said how in the vector space monad, these correspond to summations, so really we want these two expressions to be equal (using the summation convention):

M^ijN_jk=δⁱ_k

(δⁱ_k is 1 if i=k and zero otherwise, ie. the identity matrix.)
The left hand side is just the rule for multiplying matrices. So we require M to be the inverse of N.

Now we can start on some real knot theory. There is a collection of 'moves' we can perform on a diagram called Reidemeister moves. Here are diagrams illustrating them:

There are three types of move that we call type I, type II and type III.

These diagrams are intended to show changes we can make to a part of a bigger knot or tangle. It's not hard to see that these are isotopies, they're just simple operations we can perform on a knot or tangle by moving the string a little bit. If you perform one of these moves on a knot or tangle then you should end up with merely a different diagram representing the same knot or tangle. Each of these corresponds to an equality of monadic expression, or an equality of summations. But the really important part is that these are all you need to consider. Every isotopy can be constructed out of these moves. So if we can satisfy the corresponding equalities of expressions then we automatically have a knot invariant that tells us about the underlying knot or tangle, not just the diagram.

Unfortunately, there's a catch. Satisfying the equality for a type I move is too onerous. So we'll just concentrate on type II and III. But the good news is that it doesn't matter, there are workarounds.

Let a be an unknown complex number. We're going to build expressions from a but they're all going to be polynomials in a and its reciprocal, like a+a^2-3/a. These are known as Laurent polynomials and I pointed out recently that I can use the Group type class to represent them.


> i = 0 :+ 1
> newtype Z = Z Int deriving (Eq,Ord,Show,Num)
> type Poly = V (Complex Rational) Z

In other words, elements of Poly are weighted linear combinations of integers. We can think of each integer n as representing aⁿ and the weights being coefficients of these powers. Amazingly the group multiplication multiplies these things out correctly.

a and ia represent a and a^-1.


> a = return (Z 1) :: Poly
> ia = return (Z (-1)) :: Poly

We can now satisfy the type II and III Reidemeister moves with these definitions:


> cup :: (Bool,Bool) -> V Poly ()
> cup (u,v) = case (u,v) of
>   (False,True) -> (-i .* ia) .* return ()
>   (True,False) -> (i .* a) .* return ()
>   otherwise -> zero

> cap :: V Poly (Bool,Bool)
> cap = (-i .* ia) .* return (False,True) .+ (i .* a) .* return (True,False)

> over :: (Bool,Bool) -> V Poly (Bool,Bool)
> over (u,v) = a .* do
>   () <- cup (u,v)
>   cap
>   .+
>   ia .* return (u,v)

> under :: (Bool,Bool) -> V Poly (Bool,Bool)
> under (u,v) = ia .* do
>   () <- cup (u,v)
>   cap
>   .+
>   a .* return (u,v)

(I'm making the V monad do double duty here. I'm using it to represent the Laurent polynomials, but I'm also using it to represent vectors over the Laurent polynomials.)

For example, here's what a type III move looks like if we try to arrange things nicely in terms of our standard 'components':

That gives rise to the two functions


> left (i,j,k) = do
>   (l,m) <- under (i,j)
>   (n,o) <- over (m,k)
>   (p,q) <- over (l,n)
>   return (p,q,o)

and


> right (i,j,k) = do
>   (l,m) <- over (j,k)
>   (n,o) <- over (i,l)
>   (p,q) <- under (o,m)
>   return (n,p,q)

whose equality can be tested with


> test1 = quickCheck $ \(x,y,z) -> left (x,y,z)==right (x,y,z)

I'll leave checking the type II rule as an exercise.

With these defintions we can take any knot diagram and compute a value for it. Note that because a not has no loose ends we have no inputs and no outputs so the result won't be a function but a value, a polynomial in a and its reciprocal. This is known as the Bracket Polynomial.

As mentioned above, the bracket polynomial fails to be a knot invariant. Two diagrams related by type I moves will have different bracket polynomials. With a little bit of sleight of hand we can apply a fudge factor to l with those extra loops and we end up with the Jones polynomial. As that's not my main goal here, I'll let you read how to use the writhe get from the bracket polynomial to the Jones polynomial at Wikipedia. But do I want to make another digression.

It's tricky to define knot invariants. The problems is that knots are topologically invariant in the sense that you can deform a knot in lots of ways and it's still the same knot. This rules out most attempts at using the geometry of a knot to define an invariant. Just about any geometrical feature you find in a particular configuration will be ruined by deforming the knot. But here's an idea: suppose you could compute some geometrical property from a particular curve representing a knot and then find the average of that value for all possible curves representing the same knot. Then we'd get a knot invariant as it wouldn't depend on a particular choice of representative curve. But there's a catch. We need to average over an infinite space, the space of all curves representing a knot. On the other hand, there is a perfectly good tool for averaging over infinite sets: integration. But there's another catch: the space of all curves representing a knot is very large, infinite dimensional in fact, and we don't have very good theory for how to do this. At least mathematicians don't. But physicists integrate over spaces of paths all the time, ever since Feynman came up with the path integral. Amazingly, Ed Witten showed that tere was already a physical model that fit the bill: Chern-Simons theory. In fact, Witten showed that the Jones polynomial, which as I've already mentioned came originally out of statistical mechanics, could be derived from this theory.

Anyway, enough of that digression. In the next installment I'll show how the above can be modified for use with tangles and give my final algorithm.

In the last installment (I think) I'll show how we can derive the rational number for a tangle using the bracket polynomial.

The following is the 'library' behind the code above. Most of it is simply built up from what I've said in recent weeks.


> swap (x,y) = (y,x)

> class Num k => VectorSpace k v | v -> k where
>     zero :: v
>     (.+) :: v -> v -> v
>     (.*) :: k -> v -> v
>     (.-) :: v -> v -> v
>     v1 .- v2 = v1 .+ ((-1).*v2)

> data V k a = V { unV :: [(k,a)] } deriving (Show)

> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x

> instance (Ord a,Num k) => Eq (V k a) where
>   V x==V y = reduce x==reduce y

> instance (Ord a,Num k,Ord k) => Ord (V k a) where
>   compare (V x) (V y) = compare (reduce x) (reduce y)

> instance Num k => Functor (V k) where
>     fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as

> instance Num k => Monad (V k) where
>     return a = V [(1,a)]
>     x >>= f = join (fmap f x)
>         where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x
>               scale k1 as = map (\(k2,a) -> (k1*k2,a)) as

> instance Num r => MonadPlus (V r) where
>     mzero = V []
>     mplus (V x) (V y) = V (x++y)

> instance (Num k,Ord a) => VectorSpace k (V k a) where
>     zero = V []
>     V x .+ V y = V (x ++ y)
>     (.*) k = (>>= (\a -> V [(k,a)]))

> e = return :: Num k => a -> V k a
> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))

> diag a = (a,a)
> both f g (a,b) = (f a,g b)

> class Group m a where
>     unit :: () -> m a
>     counit :: a -> m ()
>     mult :: (a,a) -> m a
>     comult :: a -> m (a,a)
>     anti :: a -> m a

> instance Monad m => Group m Z where
>     unit _ = return (Z 0)
>     counit _ = return ()
>     mult = return . uncurry (+)
>     comult = return . diag
>     anti = return . negate

> newtype Identity a = I a deriving (Eq,Ord,Show)

> instance Monad Identity where
>     return x = I x
>     I x >>= f = f x

> data K = K Rational Rational deriving (Eq,Show)

> instance Num K where
>   K a b + K a' b' = K (a+a') (b+b')
>   K a b * K a' b' = K (a*a'+2*b*b') (a*b'+a'*b)
>   negate (K a b) = K (negate a) (negate b)
>   abs _ = error ""
>   signum _ = error ""
>   fromInteger i = K (fromInteger i) 0

> instance Num k => Num (V k Z) where
>   a + b = a .+ b
>   a * b = do
>       u <- a
>       v <- b
>       mult (u,v)
>   negate a = (-1) .* a
>   fromInteger n = fromInteger n .* return 0
>   abs a = error ""
>   signum a = error ""

> data Complex a = (:+) { realPart :: a, imagPart :: a } deriving (Eq,Show)

> instance Num a => Num (Complex a) where
>   (a :+ b) + (a' :+ b') = (a+a') :+ (b+b')
>   (a :+ b) * (a' :+ b') = (a*a'-b*b') :+ (a*b'+a'*b)
>   negate (a :+ b) = (-a) :+ (-b)
>   fromInteger n = fromInteger n :+ 0
>   abs (a :+ b) = undefined
>   signum (a :+ b) = undefined

> instance Fractional a => Fractional (Complex a) where
>   recip (a :+ b) = let r = recip (a*a+b*b) in ((a*r) :+ (-b*r))
>   fromRational q = fromRational q :+ 0

> instance Ord (Complex Float) where
>   compare (a :+ b) (c :+ d) = compare (a,b) (c,d)

A Neighborhood of Infinity

Search This Blog