Saturday, November 29, 2008

An Approach to Algorithm Parallelisation

The other day I came across the paper Parallelizing Complex Scans and Reductions lying on a colleague's desk. The first part of the paper discussed how to make a certain family of algorithms run faster on parallel machines and the second half of the paper went on to show how, with some work, the method could be stretched to a wider class of algorithm. What the authors seemed to miss was that the extra work really wasn't necessary and the methods of the first half apply, with no change, to the second half. But don't take this as a criticism! I learnt a whole new way to approach algorithm design, and the trick to making the second half easy uses methods that have become popular in more recent years. Doing a web search I found lots of papers describing something similar to what I did.

This is also a nice example of how the notion of abstraction in computing and the notion of abstraction in mathematics are exactly the same thing. But I'm getting ahead of myself.

So here is a direct translation from the paper of some procedural code to find the largest sum that can be made from a contiguous subsequence of a sequence. This will be our first implementation of the problem examined in the second half of the paper:


> f1 [] (sofar,max) = (sofar,max)
> f1 (b:bs) (sofar,max) =
>   let sofar' = if sofar+b<0
>                then 0
>                else sofar+b
>       max' = if max<sofar'
>              then sofar'
>              else max
>   in f1 bs (sofar',max')


sofar is a running sum that is reset each time it dips below zero, and max keeps track of the best sum so far. We initialise sofar and max with 0 and -infinity respectively. Here's an example of how to use it:


> b :: [Double]
> b = [1..5] ++ [5,4..(-10)] ++ [(-2)..6]

> infinity :: Double
> infinity = 1/0
> test1 b = snd $ f1 b (0,-infinity)


Notice how we prime the algorithm with a starting vector. The 0 corresponds to the fact that at the start we've summed over 0 elements and the -infinity corresponds to the fact that we want the first sum computed to be the highest so far at that point.

Test the code with test1 b. We'll use a similar pattern all the way through this code.
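For concreteness, here's roughly what a run looks like in ghci on the sample list b above; the expected value 30.0 is my own hand calculation of the best contiguous sum (1 up to 5, then 5 back down to 1 and the 0), so treat it as a sanity check rather than gospel:

*Main> test1 b
30.0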

You may see the problem with making this parallelisable: we are maintaining running sums so that the final values of sofar and max all depend on what was computed earlier. It's not obvious that we can break this up into pieces. sofar computes sums of subsequences between resets but chopping the array b into pieces might split such subsequences between processors. How can we handle this cleanly?

The first step is to write version two of the above function using max instead of conditionals:


> f2 [] (sofar,max) = (sofar,max)
> f2 (b:bs) (sofar,max) =
>   let sofar' = Prelude.max (sofar+b) 0
>       max' = Prelude.max max sofar'
>   in f2 bs (sofar',max')

> test2 b = snd $ f2 b (0,-infinity)


But that doesn't appear to make things any easier: we've just buried the conditionals inside max, and the serial dependency hasn't gone away.

So let's solve another problem instead. In f2 I'll replace max with addition and addition with multiplication. 0 is the identity for addition so we should replace it with the identity for multiplication, 1. Similarly, -infinity is the identity for max so we should replace that with 0. We get:


> f3 [] (sofar,max) = (sofar,max)
> f3 (b:bs) (sofar,max) =
>   let sofar' = sofar*b+1
>       max' = max+sofar'
>   in f3 bs (sofar',max')

> test3 b = snd $ f3 b (1,0)


That's all very well but (1) it looks no more parallelisable and (2) it's solving the wrong problem. Let's ignore problem (2) for now.

The thing that makes f3 easier to work with is that it's almost a linear function acting on the vector (sofar,max). Linear functions have one very nice property: if f and g are linear then of course we can compute f(g(x)) by acting with g first and then applying f, but we can also compose f and g without reference to x, giving us another linear function. We only have to know how f and g act on basis elements and we can immediately compute how f . g acts on basis elements. This is usually expressed by writing f and g as matrices. So let's tweak f3 so it's linear in its last argument:


> f4 [] (sofar,max,i) = (sofar,max,i)
> f4 (b:bs) (sofar,max,i) =
>   let sofar' = (sofar * b) + i
>       max' = max + sofar'
>       i' = i
>   in f4 bs (sofar',max',i')

> test4 b = let (_,max,_) = f4 b (1,0,1) in max


So now I need to write some code to work with linear functions. I'll do it in a very direct style. Here are some tuples representing basis vectors. (I could have written some fancy vector/matrix code but I don't want to distract from the problem in hand.)


> x,y,z :: Num a => (a,a,a)
> x = (1,0,0)
> y = (0,1,0)
> z = (0,0,1)


And here's some code that computes how a function acts on a basis, in effect finding the matrix for our function with respect to the basis x,y,z:


> matrix f = (f x,f y,f z)


Some simple operations on vectors:


> (a,b,c) .+ (d,e,f) = (a + d,b + e,c + f)
> a .* (b,c,d) = (a * b,a * c,a * d)


And now a little function that, given how f acts on basis elements, can apply f to any vector:


> apply (mx,my,mz) (sofar,max,i) = (sofar .* mx) .+ (max .* my) .+ (i .* mz)


Now we can redo the calculation with f4 by first making the matrix for f4, and then applying that to our starting vector.


> test5 b =
>   let m = matrix (f4 b)
>       (_,max,_) = apply m (1,0,1)
>   in max


Note how by the time we've computed m we've done almost all of the work, even though the code hasn't yet touched (1,0,1).
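In fact these matrices compose without ever referring to the vector they will eventually act on. Here's a minimal sketch of what I mean (the helper compose is my own name, not something from the paper or from the literate code above):

compose m2 m1 = matrix (apply m2 . apply m1)

-- Because f4 bs is linear in its vector argument, for any split of the list
--   apply (compose (matrix (f4 bs2)) (matrix (f4 bs1))) v == f4 (bs1 ++ bs2) v
-- so the work for two halves of the list can be combined after the fact.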

But now comes the first bit of magic. We can split our list b into pieces, compute the corresponding matrix for each piece on a separate processor, and then apply the resulting matrices in turn to our starting vector.

Let's chop our list of reals into pieces of size n:


> chop n [] = []
> chop n l = let (a,b) = splitAt n l in a : chop n b


We'll use pieces of size 10:


> test6 b = max where
>   (_,max,_) = foldr ($) (1,0,1) (reverse b_functions) where
>       b_pieces = chop 10 b


The following maps are the ones that should be replaced with a parallel version; it's easy to do, and a sketch of one way follows the code.


>       b_matrices = map (matrix . f4) b_pieces
>       b_functions = map apply b_matrices
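
Here's a minimal sketch of what that might look like using parMap and rdeepseq from a recent version of the parallel package (with NFData instances from deepseq); it reuses chop, matrix, f4 and apply from above, it isn't part of the literate program, and the name test6' is mine:

import Control.Parallel.Strategies (parMap, rdeepseq)

-- Like test6, but the matrix for each piece is computed in parallel.
-- rdeepseq forces each matrix completely, so the real work happens in the sparks.
test6' b = max where
    (_,max,_)   = foldr ($) (1,0,1) (reverse b_functions)
    b_pieces    = chop 10 b
    b_matrices  = parMap rdeepseq (matrix . f4) b_pieces
    b_functions = map apply b_matrices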


Great, we've successfully parallelised the code, but it's the wrong algorithm. How can we use this to solve the correct problem? Remember how we replaced max with addition and addition with multiplication. We just have to undo that. That's all! Everything required to prove that the above parallelisation is valid applies over any semiring. At no point did we divide or subtract, and we only used elementary properties of numbers like a*(b+c) = a*b+a*c. That property holds for max and addition. In fact a+max(b,c) = max(a+b,a+c). We don't even have to modify the code. We can just define the max-plus semiring as follows:


> newtype MaxPlus = M Double deriving (Eq,Show)

> instance Num MaxPlus where
>   fromInteger 0 = M (-infinity)
>   fromInteger 1 = M 0
>   fromInteger _ = error "no conversion from general integer"
>   M a + M b = M (max a b)
>   M a * M b = M (a+b)
>   abs _ = error "no abs"
>   signum _ = error "no signum"
>   negate _ = error "no negate"


And now all we need is


> test7 b = test6 (fmap M b)


(I wonder if ghc is smart enough to completely eliminate that fmap M, after all, on a newtype, M should do zero work.)

And that's a completely parallelised version of the original algorithm.

There is a ton of optimisation that can be performed here. In particular, matrix applies a function to a fixed basis. For the particular function we're using here there's a big win from constant folding. The same constant folding applies in any semiring.
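To make that concrete, here's my own quick check of the single-element case; working through the definitions by hand gives a matrix full of 0s and 1s that a specialised apply could exploit:

-- A sanity check of the single-element matrix (my own working, not from the paper),
-- valid for ordinary finite b:
singleElementMatrixOK :: Double -> Bool
singleElementMatrixOK b = matrix (f4 [b]) == ((b,b,0), (0,1,0), (1,1,1))
-- A specialised apply for matrices of this shape can drop the multiplications
-- by 0 and 1, and the same shortcut is valid in any semiring.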

And back to the point I made at the beginning. By abstracting from the reals to a general semiring we are able to make the same code perform multiple duties: it can work on functions linear over many semirings, not just the reals. Mathematicians don't work with abstractions just to make life hell for students - they do so because working with abstract entities allows the same words to be reused in a valid way in many different contexts. This benefits both mathematicians and computer scientists.

Here's a link you can use to find out more on this technique.

Note that in real-world usage you wouldn't use lists: chop would take longer than the rest of the algorithm.

PS A curious aside. I spent ages trying to get ghc to compile this code and getting my homebrew HTML markup code to work reliably on it. But eventually I located the problem. I've just returned from Hawai`i where I wrote most of this code. Just for fun I'd put my keyboard into Hawai`ian mode and forgot about it. When I did that, the single quote key started generating the unicode symbol for the Hawai`ian glottal stop. It looks just like a regular single quote in my current terminal font so it was hard to see anything wrong with the code. But, quite reasonably, ghc and many other tools barf if you try to use one where a regular quote is expected.

Saturday, November 15, 2008

Some thoughts on reasoning and monads

Just a short note about some half-formed thoughts on the subject of monads and reasoning.

Haskell types and total functions, at least for a suitable subset of Haskell, form a category, with types and functions as the objects and arrows. Haskell code is usually full of expressions whose types don't correspond to arrows (e.g. 1::Int). But in category theory we can only really talk of objects and arrows, not elements of objects. Many categories are made up of objects that have no reasonable notion of an element. By writing code in pointfree style we can eliminate reference to individual elements and talk only about arrows.

For example, suppose f :: A -> B and g :: B -> C. Then we can define h :: A -> C in a couple of different ways. Pointfree:

h = g . f

or pointful:

h x = let y = f x
          z = g y
      in z

For examples as simple as this it's not hard to write code that will translate between one style and the other. But the former definition has an advantage: it works in any category. We'll consider a very simple example. Let R be the category of real numbers, where there is an arrow from x to y if x≤y. We can think of f and g as arrows in this category. If we know A≤B and B≤C then the first definition above tells us how to construct an arrow from A to C and hence it proves that A≤C. The second definition makes no sense because it relies on the notion of x as an element of A and uses the functions f and g to construct elements of B and C. These words hold no meaning in the category R, where the arrows aren't functions and the objects aren't containers of elements.

Except that's not quite true.

Because there is a scheme for translating the second pointful definition to pointfree form, the second definition does in fact provide a proof that A≤C. We just have to bear in mind that the proof needs to be translated into pointfree form first. In fact, we can happily spend our day using pointful style to generate proofs about R, as long as at the end of the day we translate our proofs to pointfree notation. Indeed, Haskell programmers know that it's often much easier to write programs in pointful style, so it seems reasonable to guess that there are many proofs that are easier to write in pointful style even though they can't be interpreted literally. Philosophically this is a bit weird. As long as we restrict ourselves to chains of reasoning that can be translated, we can use intuitions about elements of objects to make valid deductions about domains where these notions make no sense.

Part I of Lambek and Scott is about the correct pointful language to use when talking about cartesian closed categories (CCCs). They use a form of typed lambda calculus. Every arrow in a cartesian closed category can be written as a pointful lambda expression. Even though there isn't a meaningful way to assign meaning to the 'points' individually, every lambda abstraction gives an arrow in a cartesian closed category that can be built using the standard parts that come with a CCC, and vice versa. The pointful definition of h above is an example. Here's a (very) slightly less trivial example. Given f :: A -> B and g :: C -> D we can define h :: (A,C) -> Either B D as

h x = let y = fst x
          z = Left (f y)
      in z

Again, that code can be translated into pointfree form using only the standard parts available in a cartesian closed category. Actually, we only need a category with products and coproducts for this example. So consider a lattice. The above code can be translated into a proof that A∩C≤B∪D. (I'm using cap and cup for meet and join.) Again, it makes no sense to think of the elements of a general lattice as containers of elements, and yet after translation to pointfree notation we have a valid proof. If you can restrict yourself to a suitable subset of set theory then your proofs involving elements of objects can carry over to many other categories.
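To spell that translation out (this is my own working, with f :: A -> B and g :: C -> D as above):

h = Left . f . fst        -- the pointfree form of the definition above

-- Read in the lattice: fst is the inclusion A∩C ≤ A, f is A ≤ B, and
-- Left is B ≤ B∪D; composing the three gives A∩C ≤ B∪D.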

Part II of Lambek and Scott is about taking this idea to extremes. It's about categories known as toposes. A topos is a category that is even more like the category of sets than a CCC. It's still general enough that there are many kinds of toposes, but you can use a sizable portion of first order logic and ZF to make deductions about them. Again, the literal notion of membership of the objects of a topos might make no sense, but the proofs have a translation to pointfree notation. In fact, it's possible to write entire papers in what looks like conventional set theory language, and have them be valid for other toposes. Anders Kock, for example, writes such papers. Chris Isham has been arguing that topos theory is the correct framework for physics. If you interpret your propositions as being in the category Set then you get classical physics. But those same propositions can be interpreted in other categories, such as one for quantum mechanics, giving a way to use and extend classical language to reason about quantum systems. This set theory-like language is known as the "internal language" of a topos.

Anyway, I'm interested in the notion that Haskell do-notation provides another kind of pointful language that can be used to reason about situations where points don't seem at first to make sense. Consider the lattice of subsets of a topological space, ordered by inclusion. Let Cl(U) be the closure of U. Cl satisfies these properties:

X⊂Y implies Cl(X)⊂Cl(Y)
X⊂Cl(X)
and
Cl(Cl(X))⊂Cl(X)

Look familiar? If we make this lattice into a category, with arrows being inclusions, then the first property states that Cl is a functor and the next two say that Cl is a monad. In fact, monads are a kind of generalised closure. So now suppose we're given A⊂Cl(B) and B⊂Cl(C) and wish to prove that A⊂Cl(C). We can rephrase this by saying that if f :: A -> Cl B and g :: B -> Cl C we need an arrow h :: A -> Cl C. We can write one like this:

h x = do
    y <- f x
    z <- g y
    return z

Now it's tempting to interpret the inclusions as functions with y <- f x saying that y is the image of x under the inclusion. (I don’t know about you, but when I write x <- getChar I think to myself “x is the return value from calling getChar”.) But that doesn't really work because the type of f x is Cl B but y is of type B. On the other hand, we can radically reinterpret the above as something like this: when arguing about chains of inclusions of subsets of a topological space, as long as at the end of the chain of inclusions you always end up in the closure of some subset, you're allowed to cheat and nudge a generic point in the closure of a subset back into the original subset. This is exactly parallel to the way do-notation seems to allow us to extract elements out of monadic objects as long as at the end of the do-block we always return an element of monadic type. I'm sure that with a bit of work we could produce a rigorous metatheorem from this. I also expect we can produce something similar for comonads.
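As an aside, for an ordinary Haskell monad the do-block above is nothing more than Kleisli composition; this is standard library material rather than anything specific to this post:

import Control.Monad ((>=>))

-- h is the Kleisli composite of f and g; read in the closure example it is
-- exactly the arrow A -> Cl C we wanted.
kleisli :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
kleisli f g = f >=> g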

Anyway, the moral is that when working with categories with monads there may be some interesting and unusual ways to reason. The example of the lattice of subsets is fairly trivial but I'm sure there are other interesting examples. I also expect there's a nice connection with modal logic. I now think of Haskell do-notation as the "internal language" of a category with monads.

Update: I left out the crucial sentences I meant to write. It's easy to see do-notation as a kind of Haskell specific trick for making things like IO heavy code look like traditional procedural code. But comparison with the theory in Lambek and Scott, and Topos Theory in general, makes it clear that do-notation is a member of a family of related languages.

Saturday, November 08, 2008

From Monoids to Monads

Generalising Monoids



The word 'monad' is derived from the word 'monoid'. The explanation usually given is that there is an analogy between monoids and monads. On the surface, this seems a bit unlikely. The join operation in a monad is supposed to correspond to the binary operator in the monoid, but join is a completely different kind of thing, certainly not a binary operator in any usual sense.

I'm going to make this analogy precise so that it's clear that both monoids and monads are examples of the same construction. In fact, I'm going to write some Haskell code to define monoids and monads in almost exactly the same way. I was surprised to find I could do this because instances of Haskell's Monoid and Monad aren't even the same kind of thing (where I'm using 'kind' in its technical sense). But it can be done.

So let's start thinking about monoids. They are traditionally sets equipped with a special element and a binary operator so that the special element acts as an identity for the binary operator, and where the binary operator is associative. We expect type signatures something like one :: m and mult :: m -> m -> m so that, for example, mult (mult a b) c == mult a (mult b c). That's fine as it stands, but it doesn't generalise easily. In particular it'd be nice to generalise this definition to other categories. To do that we need to rephrase the definitions so that they are completely point-free and are written purely as the compositions of arrows.
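For comparison, a traditional pointful rendering would look something like this (just a sketch for orientation, not part of the literate program below; GHC's real Data.Monoid class spells these mempty and mappend):

class PointfulMonoid m where
    one  :: m
    mult :: m -> m -> m

-- Laws: mult one x == x
--       mult x one == x
--       mult (mult a b) c == mult a (mult b c)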

Let's start by thinking about the rule that says multiplying by the identity on the left should leave a value unchanged. Ie. we want mult one x == x for all x. We already have a problem: it refers to a couple of 'points', namely one, the identity, and x, the unknown. We can deal with the first easily: we just replace one with an arrow () -> m. But we also need some plumbing to provide two arguments to the mult function. Rather than belabour the point, I'll just give the full code:


> {-# OPTIONS_GHC -fglasgow-exts #-}

> import Control.Monad
> import Test.QuickCheck

> class Monoid m where
>   one :: () -> m
>   mult :: (m,m) -> m


The law for multiplication on the left is then given by law1_left == law1_middle and so on:


> law1_left,law1_middle,law1_right :: Monoid m => m -> m
> law1_left = mult . (one <#> id) . lambda
> law1_middle = id
> law1_right = mult . (id <#> one) . rho


The associativity law is then given by law2_left x = law2_right x.


> law2_left,law2_right :: Monoid m => ((m,m),m) -> m
> law2_left = mult . (mult <#> id)
> law2_right = mult . (id <#> mult) . alpha


The left and right hand sides of the laws are now point-free. But in order to do this I've had to write some auxiliary functions:


> lambda :: a -> ((),a)
> lambda x = ((),x)

> rho :: a -> (a,())
> rho x = (x,())

> alpha :: ((a,b),c) -> (a,(b,c))
> alpha ((x,y),z) = (x,(y,z))


I've also used the fact that (,) is a bifunctor, ie. it's a functor in each of its arguments so (,) doesn't just give a way to generate a new type (a,b) from types a and b. It also combines pairs of arrows to make new arrows. I'll call the part of (,) that acts on arrows by the name <#>:


> (<#>) :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
> (f <#> g) (x,y) = (f x,g y)


Intuitively, f <#> g applies f to the left component and g to the right component of a pair (a,b).

Try unpacking those definitions to see that we get the usual monoid laws. For example


law2_left ((x,y),z) == mult $ (mult <#> id) ((x,y),z) == mult (mult (x,y),z)
law2_right ((x,y),z) == mult $ (id <#> mult) $ alpha ((x,y),z) == mult (x,mult (y,z))


So we get


mult (mult (x,y),z) == mult (x,mult (y,z))


the usual associativity law.

Now that this definition is point-free it seems we could just carry it over to any category. In fact, we've implicitly done this because we've carried over the definition of a monoid from Set to Hask, the category of Haskell types and total functions. We're so used to treating Hask as a proxy for Set we can forget they are actually different categories. But this definition of a monoid works in both. But what about that lambda, rho and alpha? Well, they're easy to define in any Cartesian closed category (CCC). But we don't need all of the features of a CCC to define a monoid: we just need lambda, rho and alpha and some kind of 'product' on the set of objects that also acts like a bifunctor. In fact, there's a bunch of 'obvious' laws that these functions satisfy in a CCC. Any category with these functions satisfying these same laws is called a monoidal category. The above definitions allow us to transfer the definition of a monoid to any such category. For the full definition, see the Wikipedia entry.

Anyway, let's check to see if the type Int might be a monoid:


> instance Monoid Int where
>   one _ = 1
>   mult (a,b) = a*b

> check1 = quickCheck $ \n -> law1_left n == law1_middle (n :: Int)
> check2 = quickCheck $ \n -> law1_left n == law1_right (n :: Int)
> check3 = quickCheck $ \n -> law2_left n == (law2_right n :: Int)


Of course that's no proof, but it gives us some confidence. (Eg. what about large numbers close to 2^31...?)

Another Category in the World of Haskell



Most people around here will be familiar with how Haskell types and functions form a category Hask in the obvious way. But it's less well known that there is another category lurking in Haskell. Consider the set of all Haskell functors. These are endofunctors, ie. functors Hask→Hask. Between any two functors is the set of natural transformations between those functors. (If f and g are functors, then the polymorphic functions f a -> g a are the natural transformations.) In addition, you can compose natural transformations and the composition is associative. In other words, Haskell functors form a category. (See the appendix for more abstract nonsense relating to this.)
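For instance, here's a standard example of a natural transformation between two familiar functors (my example, not the post's):

-- A natural transformation from the list functor to the Maybe functor:
-- one polymorphic function, working uniformly at every element type a.
safeHead :: [a] -> Maybe a
safeHead []    = Nothing
safeHead (x:_) = Just x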

We'll call the category of endofunctors on Hask by the name E. Functors can be composed like so:


> type (f :<*> g) x = f (g x)


It'd be cool if this was a product in the usual categorical sense, but it isn't. There isn't a natural way to map to both f a and g a from f (g a) with the universal property of products. Instead it's a weaker type of product which is still a bifunctor. Here's how it acts on arrows (remembering that in E, arrows are natural transformations):


> (<*>) f g = f . fmap g


(We could also have used fmap g . f.) If you think of f <#> g as making f act on the left and g act on the right, then you can think of f <*> g as making f and g act on the outer and inner containers of a container of containers.
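For example, taking both arguments to be the safeHead natural transformation from earlier (again my example):

-- safeHead <*> safeHead :: [[a]] -> Maybe (Maybe a)
-- One copy acts on the outer list directly, the other acts on the inner
-- lists via fmap, e.g.
--   (safeHead <*> safeHead) [[1,2],[3]]  ==  Just (Just 1)
--   (safeHead <*> safeHead) [[],[3]]     ==  Just Nothing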

Here's the identity element for this product, the identity functor. It plays a role similar to () in Hask:


> data Id x = Id x deriving Show
> instance Functor Id where
>   fmap f (Id x) = Id (f x)


We can now define some familiar looking natural transformations:


> lambda' :: Functor f => f a -> (Id :<*> f) a
> lambda' x = Id x

> rho' :: Functor f => f a -> (f :<*> Id) a
> rho' x = fmap Id x

> alpha' :: f (g (h a)) -> f (g (h a))
> alpha' = id


With these we have turned E into a monoidal category.

(OK, this may be confusing. We're 'multiplying' functors and we have associativity and a left- and right-identity. So functors form a monoid (modulo isomorphism). But that's a distraction: these are not the monoids that you're looking for. See the appendix for more on this.)

So now we can define monoids in this category using almost the same code as above:


> class Functor m => Monoid' m where
>   one' :: Id a -> m a
>   mult' :: (m :<*> m) a -> m a

> law1_left',law1_middle',law1_right' :: Monoid' m => m a -> m a
> law1_left' = mult' . (one' <*> id) . lambda'
> law1_middle' = id
> law1_right' = mult' . (id <*> one') . rho'

> law2_left',law2_right' :: Monoid' m => ((m :<*> m) :<*> m) a -> m a
> law2_left' = mult' . (mult' <*> id)
> law2_right' = mult' . (id <*> mult') . alpha'


And here's the punch line. That's precisely a monad, laws 'n' all.

If you want, you can unpack the definitions above and see that they correspond to the usual notion of a monad. We can write code to do the translation:


> data Monad m => TranslateMonad m a = TM { unTM :: m a } deriving (Eq,Show)

> translate :: Monad m => m a -> TranslateMonad m a
> translate x = TM x

> instance (Monad m,Functor m) => Functor (TranslateMonad m) where
>   fmap f (TM x) = TM (fmap f x)

> instance (Functor m,Monad m) => Monoid' (TranslateMonad m) where
>   one' (Id x) = TM $ return x
>   mult' (TM x) = TM $ fmap unTM x >>= id


In other words, any instance of Monad gives an instance of Monoid'. I'll let you write the reverse mapping. We can even check (not prove!) the laws are satisfied by a particular monad by using QuickCheck:


> instance Arbitrary a => Arbitrary (Id a) where
>   arbitrary = liftM Id arbitrary

> instance (Monad m,Eq (m a),Arbitrary (m a)) => Arbitrary (TranslateMonad m a) where
>   arbitrary = liftM TM arbitrary

> check4 = quickCheck $ \n -> law1_left' n == law1_middle' (n :: TranslateMonad [] Int)
> check5 = quickCheck $ \n -> law1_left' n == law1_right' (n :: TranslateMonad [] Int)
> check6 = quickCheck $ \n -> law2_left' n == (law2_right' n :: TranslateMonad [] Int)
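
Going back to the reverse mapping I said I'd let you write, here's a sketch of one way it could go (the names FromMonoid', FM and unFM are new here, and this isn't part of the literate program above):

-- Any Monoid' gives a Monad: one' plays the role of return (via Id) and
-- mult' plays the role of join.
newtype FromMonoid' m a = FM { unFM :: m a }

instance Monoid' m => Monad (FromMonoid' m) where
    return x   = FM (one' (Id x))
    FM x >>= f = FM (mult' (fmap (unFM . f) x))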


I don't know about you, but I find this mind-blowing. We took a definition of a simple concept, the monoid. We then abstractificated its definition so that it applied to any monoidal category, and in doing so we completely changed the meaning of one and mult. And yet the resulting object, including its laws, is precisely what you need to solve a whole host of problems in algebra and Haskell programming. This is an amazing example of the unreasonable effectiveness of mathematics. The concept might be a little tricky to grasp: monads are like ordinary monoids, but with outer and inner replacing left and right. But the payoff from this is that intuitions about monoids carry over to monads. I hope to say more about this in future episodes.

Appendix



Consider the category, H, with one object, Hask, and whose arrows are the endofunctors on Hask. We've shown how the arrows of this category aren't just a set: they themselves form a category. This makes H a 2-category. A category with one object is a monoid. But this is a 2-category, so we have a 2-monoid. In fact, there are some extra details required to show we have a 2-category. For example, we need to show that composition of arrows (which now form a category, not a set) is a functor (not just a function). That follows from the Interchange Law which I've already talked about. But note that H is a monoid in a completely conventional way: its arrows form a set with a binary operator on them. This is not the structure that corresponds to monads, although it plays a part in constructing them. Also, don't confuse this monoid with the one that appears in a MonadPlus.

Confession



I'm having trouble giving <*> the correct type signature. I think it should be something like (forall x.a x -> c x) -> (forall x.b x -> d x) -> (forall x.a (b x) -> c (d x)). GHC doesn't like it. Can anyone come up with the correct thing? The code still works.

Links


Another account of the same subject, modulo some details.