Sunday, March 23, 2008

Transforming a comonad with a monad

I can't yet read most of this paper on Combining a Monad and a Comonad because I've no experience with 2-categories. Nonetheless, there's something useful I can extract from the bits that do make sense - the idea of combining a monad with a comonad in the way that we already combine monads using monad transformers. In my previous post I hinted at the idea that the cobind operation for a comonad was a bit like an operation on a SIMD computer that allows every processor to get access to information local to every other processor. The idea now will be to allow those processors to also exploit monads to perform operations like IO.

First let me first reintroduce the code for the array comonads I considered before so that this post again becomes a self-contained literate Haskell post:

> {-# OPTIONS -fno-monomorphism-restriction #-}
> {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,Arrows #-}

> import Data.Array
> import Data.Foldable as F
> import Control.Monad
> import Control.Arrow

> data Pointer i e = P { index::i, array::Array i e } deriving Show

> bind f x = x >>= f

> class Functor w => Comonad w where
> (=>>) :: w a -> (w a -> b) -> w b
> x =>> f = fmap f (cojoin x)
> coreturn :: w a -> a
> cojoin :: w a -> w (w a)
> cojoin x = x =>> id
> cobind :: (w a -> b) -> w a -> w b
> cobind f x = x =>> f

> instance Ix i => Functor (Pointer i) where
> fmap f (P i a) = P i (fmap f a)

> instance Ix i => Comonad (Pointer i) where
> coreturn (P i a) = a!i
> P i a =>> f = P i $ listArray bds (fmap (f . flip P a) (range bds))
> where bds = bounds a

Remember how things work with monads. If m is a monad, then given a function a -> m b we can lift it to a function m a ->m b. Similarly, with a comonad w we can lift a function w a -> b to a function w a -> w b. In fact, I wrote about all this a while back.

Just like before, we can write a simple 'blur' function, except this time it also performs some I/O.

> wrap i = if i<0 then i+5 else if i>4 then i-5 else i

> blur (P i a) = do
> let k = wrap (i-1)
> let j = wrap (i+1)
> let s = 1.0*a!k + 2.0*a!i + 1.0*a!j
> print $ "sum = " ++ show s
> return s

Notice that blur is of the form w a -> m b. So how can we compose these things?

Consider g :: w a -> m b and f :: w b -> m c. Using bind and cobind we can 'lift' the head and tail of these functions:

bind g :: w a -> w (m b) and cobind f :: m (w b) -> m c.

We can almost compose these functions. But the catch is that we need a function from w (m b) to m (w b). Conveniently, there's a prototypical example of such a thing in the ghc libraries, the function sequence which distributes any monad over []. As we can easily move back and forth between arrays and lists we can write:

> class Distributes m w where
> distribute :: w (m a) -> m (w a)

> instance (Monad m,Ix i) => Distributes m (Pointer i) where
> distribute (P i ma) = do
> let bds = bounds ma
> a <- sequence (elems ma)
> return $ P i (listArray bds a)

And now we can compose these funny dual-sided Kleisli/coKleisli arrows. But before writing a composition function, note what we have. We're looking at functions that can make a-ish things to b-ish things, and we can compose them like functions directly from a to b. This is precisely the design pattern captured by arrows. So we can write:

> data A m w a b = A { runA :: w a -> m b }

> instance (Distributes m w,Monad m,Comonad w) => Arrow (A m w) where
> A g >>> A f = A $ bind f . distribute . cobind g
> first (A f) = A $ \x -> do
> u <- f (fmap fst x)
> return $ (u,coreturn (fmap snd x))
> pure f = A $ return . f . coreturn

Now check out page 1323 of Signals and Comonads by Uustalu and Vene. Honestly, I found that paper after I wrote the above code. It's almost line for line identical!

Anyway, now we can try an example. First some data to work on:

> x = P 0 $ listArray (0,4) (map return [0.0..4.0])

This is like last post's except that as I've now used a non-normalised blur operation I can show how arrow notation makes it easy to fix that:

> g = proc a -> do
> b <- A blur -< a
> n <- A blur -< 1
> returnA -< a-b/n

That's a very simple (and inefficient) high pass filter by the way. At the end of the day, however, we still need to be able to lift our arrows to act on x:

> liftCM :: (Distributes m w, Monad m, Comonad w) => A m w a b -> w (m a) -> w (m b)
> liftCM (A f) = cobind (\x -> distribute x >>= f)

> y = liftCM g x

And convert the result to something we can look at:

> result = sequence (toList (Main.array y))

It might not be quite the result you expected because we see a lot of I/O. Each computation that is tainted with I/O will carry that I/O with it. So every bit of I/O that was computed by any value that that went into the final result remains hanging round its neck until the bitter end, with duplications if the value was duplicated.

Of course there are lots of other monad/comonad pairs to try, so maybe there are some other interesting combinations lurking around.

Exercise: liftCM lifts an arrow A m w a b to a function w (m a) -> w (m b). You can also implement a lift to m (w a) -> m (w b). Implement it and figure out what it does.

Saturday, March 08, 2008

Comonadic Arrays

On haskell-cafe, ajb, aka Pseudonym, laments that many people don't have enough experience with comonads to recognise them. So I thought I'd mention a really simple example of a comonad that nevertheless captures the essence of a large class of comonads. It's conceptually not much different to my cellular automaton example (making this a bit of a rerun), but this should be easier to understand. And if it's too trivial, I hint at a particle physics connection towards the end.

Firstly, you can skip this paragraph if you don't want a quick bit of theoretical discussion. Consider arrays of fixed dimension. As types, they look something like XN for some fixed integer N. From a container we construct its zipper by applying X d/dX. In this case we get XNXN-1=NXN. In other words, the corresponding zipper is an array paired with an index into the array. We can stretch the meaning of comonad slightly to allow this to relate to arrays whose size isn't fixed.

So here's some code:

> import Data.Array

The usual definition of Comonad:

> class Functor w => Comonad w where
> (=>>) :: w a -> (w a -> b) -> w b
> coreturn :: w a -> a

And now a type that is a pair of an array and an index into that array:

> data Pointer i e = P i (Array i e) deriving Show

Think of it as an array with one of its elements singled out for special attention. It trivially inherits the functoriality of ordinary arrays:

> instance Ix i => Functor (Pointer i) where
> fmap f (P i a) = P i (fmap f a)

And now comes the Comonad implementation. coreturn serves to pop out the special element from its context - in other words it gives you the special element, whle throwing away the array it lived in. (=>>), on the other hand, applies a function f of type P i a -> b to the entire array. The function is applied to each element in turn, making each element the special element for long enough to apply f.

> instance Ix i => Comonad (Pointer i) where
> coreturn (P i a) = a!i
> P i a =>> f = P i $ listArray bds (fmap (f . flip P a) (range bds))
> where bds = bounds a

Compare with fmap for arrays. This walks through each element in turn, applying a function to each element, and return an array of results. The computation for each element is separate from all the others. With =>> however, the entire array may be used for the computation of each element of the result, with the index into the array serving to indicate which element it is we should be focussing on.

For example, here's an array of values:

> x = listArray (0,9) [0..9]

We want to consider this to be a circular array so that going off one end wraps around to the beginning:

> wrap i = if i<0 then i+10 else if i>9 then i-10 else i

Now here's a simple operation that 'blurs' the single ith pixel in the 1-D image represented by x:

> blur (P i a) = let
> k = wrap (i-1)
> j = wrap (i+1)
> in 0.25*a!k + 0.5*a!i + 0.25*a!j

We can apply this to the entire image thusly

> test1 = P 0 x =>> blur

Note the curious way I have to use P 0 x as an input to blur. There seems to be a redundancy here, we want the resulting array and don't care what the focussed element is. But =>> wants us to give it a focal point. Curiously it's making explicit something that's familiar to C programmers, but is slightly hidden in C. In C, you refer to an array of floats using a float *. But the same type points to elements of the array as well. So when you point to an array, you are, in effect, blurring the distinction between a pointer to an array and a pointer to the first element. Comonads make that distinction explicit.

Anyway, suppose you wanted to apply a sequence of operations. Adding, blurring, scaling, nonlinearly transforming and so on. You could write a pipeline like

> x ==> f = f x
> test2 = P 0 x ==> fmap (+1) =>> blur ==> fmap (*2) ==> fmap (\x -> x*x)

Note how ==> fmap f ==> fmap g = ==> fmap (g . f). If you think of fmap as farming out workloads to a SIMD processor with one thread applied to each array element, sequences of fmaps correspond to threads that can continue to work independently. The comonadic operations, however, correspond to steps
where the threads must synchronise and talk to each other. I believe this last statement explains the a cryptic comment in the comments to this blog entry.

One final thing going back to my optional paragraph above. If D means d/dX then we can see the operator XD as a kind of number operator. When you apply it to an array like XN the array becomes multiplied by a type corresponding to the array index type. For ordinary arrays, these are just integers. So you can see XD as a way of counting how many elements there are in a container. Also, for any container F, we also have the equations D(XF) = XDF+F, which we can write as DX=XD+1. At some point, when I have time, I'll point out how this is closely related to the Heisenberg uncertainty principle and how when we say that differentiation makes holes in a data type, it's related to the notion of a hole in solid state physics.

Oh, and I sketched this diagram but don't have time to write an explanation:

Labels: , , ,

Sunday, March 02, 2008

What does topology have to do with computability? Reloaded.

Now I’ve given the briefest of possible introduction to topology, I can get back to the issue of investigating the functions from () to ().

I introduced the idea of open sets as being the sets that can be observed by an idealised type of machine. So how about turning this notion onto itself. There are two possible outcomes from using one of our machines. It can respond “yes” or fail to respond at all. We can give these states the names ⊤ and ⊥ respectively. So the space of possible states is the set S = {⊤,⊥}, with the former meaning "yes". Now suppose we have a machine to watch machines. What are the possible observable subsets of S? Well we can easily build a machine that always responds ⊤ and we can also easily build a machine that always responds ⊥. So {} and {⊤,⊥} are observable. We can also build a machine that simply relays the state of the observed machine. That gives us the observable set {⊤}. But what about {⊥}?. Well there’s no way we can build a machine that responds with ⊤ when it sees ⊥ and with ⊥ when it sees ⊤. In order to respond ⊤ when it sees ⊥ it needs no know how long to wait before giving up. But the original definition of machine that I give says that we have no guarantee of how long a machine might take to respond. So no such machine is possible. In other words, the observable sets are {}, {⊤} and {{}, ⊤}, and this makes a topology on S. (Trivial exercise: check the axioms for this topology.)

S, equipped with this topology, is known as the Sierpinski space. There’s an interesting way to think about this. In Set Theory we have the notion of set membership, and membership is a binary valued thing. So the two element set {0,1} has a special role in that it gives a way to label the answers to the question “is x in the set y?:”. (In fact, this can be formalised nicely in category theory through the notion of a subobject classifier.) The Sierpinski space plays a similar role for sets when you only have a one-sided membership test. We can make this analogy precise:

Given a set X then any subset, U, of X, can be written as f-1({1}) where f is a function f:X->{0,1}. Conversely, any such function gives rise to a subset.

Given a topological space X, then any open subset U, of X, can be written as f-1({⊤}) where f is a continuous function f:X->S. Conversely, any such function gives rise to an open subset.

Now consider the continuous functions from S to S. As S has only two elements, there are only four possibilities to consider. (Almost trivial exercise: find the continuous functions. Remember, f is continuous if and only if f-1(U) is open for all open U.) It turns out there are precisely three functions. They are essentially the same functions that I listed here for the purely lazy Haskell case. So at least for this simple case, the computable functions are precisely the continuous ones. And that is now my second answer to the question a month or two ago. By assigning the correct topology to your types you can arrange that the computable functions are precisely the continuous functions. At this point you can turn the tools of topology to computer science and you find that terms like discrete, Hausdorff and compact, which are all standard terms from topology defined inn terms of open sets, all have computational content. And of course this machinery generalises from the simple example S to any other type. (There is a proviso due to the fact that in topology, arbitrary unions of open sets must be open, but this doesn’t carry over directly without a little bit of tweaking.)

One researcher making use of such ideas is Martin Escardo. When I started writing these posts I was looking at a bunch of papers by Escardo that I found hard to decipher. Since starting, however, I discovered that this document is actually a fairly easy read. So it’s not worth me saying much more except to tell you to read that. And also check out the work of Paul Taylor.

I can now give a summary of what the last three posts have all been about:

It’s tempting to think of functions in a programming language like Haskell as being like ordinary mathematical functions and try to embed types and Haskell functions directly into Set Theory as sets and functions. After all, Haskell functions don’t have side-effects and so seem to behave much like mathematical functions. But not all functions terminate in a finite time, so they aren’t quite like ordinary functions after all. We could think of them in terms of partial functions, but partial functions don’t have a very rich structure and certainly don’t automatically rule out examples like the non-continuous function from () to (). But topology *does* provide a good framework. This idea goes back a few years, to the work of people like Smyth, Plotkin and Scott. But I’ve been programming (at least as a hobby) for 30 years, and I got my PhD in Mathematics about 16 years ago, and yet I never knew that these topics were so connected. So this work isn’t getting as much publicity as it should, even among people who know a little topology and computer science.

And I’ll finish by mentioning an application. An important property that topological spaces can have is known as compactness. I can give a rough idea of what this means quite easily: consider a finite row of pigeonholes and an infinite number of letters. There is no way of placing all of the letters in the pigeon holes without one of the pigeonholes ending up with an infinite number of letters. This isn’t the case if you have an infinite row of pigeonholes. On the other hand, consider the task of placing an infinite sequence of zero sized dots on a closed interval on the real line. You won’t be forced to put two dots on the same point, but there will be at least one point in the interval where the dots are infinitely bunched together. For example, if you place dots at 1/2,1/4,1/8,1/16,... inside the interval [-1,1] then in any small interval around zero there will be infinitely many dots. There'll be such a point whatever scheme you try. So even though [-1,1] isn’t finite, it’s the ‘next best thing’. One important thing about compact sets is that there are often strategies to exhaustively search them. And that is precisely what my earlier post about exhaustive searches was discussing. So topological ideas give a nice way to get a handle on these non-obvious search algorithms, and characterise those types for which such searches are possible.

PS In my previous post I introduced the idea of 'observability' as an intuition pump for topology. I think that idea goes back to the work of Stephen Vickers. Brits might recognise that name. Check out the wikipedia page on him if you don't.