Saturday, June 14, 2008

Categories of polynomials and comonadic plumbing

Suppose you have a Haskell program and you want to introduce a new global constant into your program. There are at least two approaches you could take:

  1. Simply introduce a new global constant. You could name it x and write something like x=1.23456 and refer to x throughout your code. This has the advantage of being easy to implement.
  2. Write all of your code in monadic style and make use of the reader monad. This is intrusive in the sense that you may have to make many changes to your code to support it. But it has the advantage that all of your functions now explicitly become functions of your global constant.

Now I’m going to roughly sketch a more categorical view of both of these approaches. So let’s restrict ourselves to the subset of Haskell that corresponds to typed lambda calculus without general recursion so that we know all of our functions will be total and correspond to the mathematical notion of a function. Then all of our functions become arrows in the category that we’ll call Hask.

Firstly consider approach (1). Suppose we want to introduce a new constant, x, of type A. Category theory talks about arrows rather than elements of objects, so instead of introducing x of type A, introduce the function x:1->A where 1 is the terminal object in Hask, normally called (). An element of A is the same thing as an element of 1->A, but in the latter case we have an arrow in the category Hask.
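Here's that correspondence as a quick sketch (this block isn't part of the literate code below, since it would clash with the x defined there, and the names are my own):

xAsArrow :: () -> Double
xAsArrow () = 1.23456    -- an arrow 1 -> A in Hask

xAsElement :: Double
xAsElement = xAsArrow () -- recover the element by applying to ()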

Before continuing, let me digress to talk about polynomials. Suppose we have a ring (with an identity) R. We define R[x], where x is an indeterminate, to be the ring of polynomials in x. Another way to describe that is to say that R[x] is the smallest ring containing R and an indeterminate x, making no assumptions about x other than those required to make R[x] a ring. For example we know that (1+x)(1-x)=1-x², because that must hold in any ring. Given a polynomial p in R[x] we can think of it as a function f_p from R to R: f_p(a) is the value we get when substituting a for x in p. So a polynomial in R[x] is the same as a function from R to R that can be written in terms of elements of R, multiplication and addition.
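As a quick Haskell sketch of that last point (again not part of the literate code below, and the representation is my own choice), a polynomial can be given by its list of coefficients and interpreted as the function f_p by evaluation:

newtype Poly r = Poly [r]  -- coefficients, constant term first

-- Interpret a polynomial p as the function f_p, using Horner's rule.
evalPoly :: Num r => Poly r -> r -> r
evalPoly (Poly cs) a = foldr (\c acc -> c + a*acc) 0 cs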

We can do the same with category theory. Given a category A we can ask for the smallest category extending A and containing an indeterminate arrow x:1 -> A, where A is some object of our category. Just as with polynomials we have to allow all possible arrows that can be made by composing arrows of A with x. The resulting expressions for arrows will contain x as a free variable, just like the way x appears in polynomials. In fact, by analogy we can call the resulting category, A[x], the category of polynomials in x:1->A. In the special case where the category is Hask, you can see that Hask[x] is the category of Haskell functions extended by a new constant x:1->A, but assuming no equations other than those necessary to make Hask[x] a category. Just as an arrow in Hask is a Haskell function, an arrow in Hask[x] is a Haskell function making use of an as yet undefined constant x.

(I've glossed over some subtleties. Just as we need a suitable equivalence relation to ensure that (1+x)(1-x)=1-x² in R[x], we need suitable equivalence relations in our category. I'll be showing you where to find the missing details later.)

Here's the implementation of a function, h, making use of a constant x:

(Note that I'll be using Edward Kmett's category-extras shortly, so I need some imports.)


> import Control.Monad.Reader
> import Control.Comonad
> import Control.Comonad.Reader

> x = 1.23456

> f a = 2*a+x
> g a = x*a
> h a = f (g a)

> test1 = h 2


Now consider the second approach. The easiest thing is to just give an implementation of the above using the reader monad:


> f' a = do
>   x <- ask
>   return $ 2*a+x

> g' a = do
>   x <- ask
>   return $ x*a

> h' a = return a >>= g' >>= f'

> test2 = runReader (h' 2) 1.23456


Note how, as is typical in monadic code, I have to plumb f' and g' together using >>= so that 1.23456 is passed through f' and g'. Previously I've described another way to think about the composition of monadic functions. Using >>= we can compose functions of type a->m b and b->m c to make a function of type a->m c. The result is that given a monad we can form the Kleisli category of the monad. The objects are the same as in Hask, but an arrow from a to b in the Kleisli category is an arrow of type a->m b in Hask. It's not hard to show this satisfies all of the axioms of a category. When we program in the reader monad it's a bit like we've stopped using Hask and switched to the Kleisli category of the reader monad. It's not quite like that because we used functions like +. But in theory we could use lifted versions of those functions too, and then we'd be programming by composing things in the Kleisli category. If we call the reader monad R then we can call the corresponding Kleisli category Hask_R. (Strictly speaking, that R needs a subscript telling us the type of the value we intend to ask for.)
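Here's that composition spelled out (a sketch; composeK is my name for it, and it's essentially Control.Monad's (<=<)):


> composeK :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
> composeK f g a = g a >>= f

> h2 :: Double -> Reader Double Double
> h2 = composeK f' g'

> test2' = runReader (h2 2) 1.23456 -- should agree with test2
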

So here's the important point: Hask[x] is the same category as Hask_R. In both cases the arrows are things which, when supplied with a value of the right type (like 1.23456), give arrows in Hask from their source object to their target object.
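To see the isomorphism concretely: supplying the constant 1.23456 turns the Hask_R arrow h' back into a plain arrow of Hask, which should agree with h (hPlain and testIso are my names):


> hPlain :: Double -> Double
> hPlain a = runReader (h' a) 1.23456

> testIso = hPlain 2 -- should equal test1
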

But there's another way to do this. We can use the reader comonad:


> f'' a = 2*extract a+askC a
> g'' a = extract a*askC a
> h'' a = a =>> g'' =>> f''

> test3 = runCoreader (h'' (Coreader 1.23456 2))


In a similar way, we're dealing with arrows of the form w a -> b and we can compose them using =>>. These arrows form the coKleisli category of the reader comonad, S, which we can write Hask_S. So we must have
Hask[x] ≅ Hask_R ≅ Hask_S.
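And here's coKleisli composition spelled out, dual to composeK above (again a sketch, with composeC and test3' my own names):


> composeC :: Comonad w => (w b -> c) -> (w a -> b) -> (w a -> c)
> composeC f g a = f (a =>> g)

> test3' = composeC f'' g'' (Coreader 1.23456 2) -- should agree with test3
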


Now some back story. Over 20 years ago I was intrigued by the idea that logic might form a category with logical ‘and’ and ‘or’ forming a product and coproduct. I came across the book Introduction to Higher Order Categorical Logic by Lambek and Scott for £30.00. That’s £60.00 at today's prices, or about $120.00. On a student grant? What was I thinking? And as it bore no relation to anything I was studying at the time, I barely understood a word of it. I was probably fairly applied at that point doing courses in stuff like solid state physics and electromagnetism as well as a bit of topology and algebra. I doubt I'd heard of lambda calculus though I could program in BASIC and APL. So there it sat on my bookshelf for 22 years. Periodically I’d look at it, realise that I still didn’t understand enough of the prerequisites, and put it back on the shelf. And then a month or so ago I picked it up again and realised that the first third or so of it could be interpreted as being about almost trivial Haskell programs. For example, on page 62 was

Proposition 7.1
The category A[x] of all polynomials in the indeterminate x:1->A over the cartesian or cartesian closed category A is isomorphic to the Kleisli category A_{S_A} of the cotriple (S_A, ε_A, δ_A).

The language is a little different. Lambek and Scott used the term cotriple instead of comonad, and Kleisli category where I’d say coKleisli category. δ and ε are cojoin and coreturn. And Lambek and Scott's theorem applies to any cartesian closed category. But after staring at this claim for a while it dawned on me that all it was really saying was this: here are two ways to introduce new constants into a category. But there’s no way I would have seen that without having practical experience of programming with monads. Learning Haskell has finally paid off. It’s given me enough intuition about category theory for me to get some return on my £30.00 investment paid to Heffers all those years ago. I expected to take this book to my deathbed, never having read it.

Anyway, for the details I left out above, especially the correct equivalence relation on Hask[x], you'll just have to read the book yourself.

Also, note the similarity to the deduction theorem. This theorem says that if we can prove B, assuming A, then we can deduce A implies B without making any assumptions. It unifies two ways to introduce a proposition A: either as a hypothesis, or as an antecedent in an implication. In fact, the above theorem is just a categorical version of the deduction theorem.
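In Haskell terms the deduction theorem is just lambda abstraction: a term of type B with a free variable of type A becomes a closed term of type A -> B. For instance, h above assumed the constant x; discharging that assumption gives something like this (a sketch, with a name of my own):


> hDischarged :: Double -> (Double -> Double)
> hDischarged x' a = 2*(x'*a) + x' -- h, with the hypothesis x now an explicit argument
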

Also note the connection with writing pointfree code. In fact, the pointfree lambdabot plugin makes good use of the reader monad to eliminate named parameters from functions.

I’m amazed by seeing a book from 1986 that describes how to use a comonad to plumb a value through some code. As far as I know, this predates the explicit use of the reader monad in a program, Wadler and Moggi’s papers on monads, and certainly Haskell. Of course monads and comonads existed in category theory well before this date, but not, as far as I know, for plumbing computer programs. I’d love to hear from anyone who knows more about the history of these ideas.


Saturday, March 08, 2008

Comonadic Arrays

On haskell-cafe, ajb, aka Pseudonym, laments that many people don't have enough experience with comonads to recognise them. So I thought I'd mention a really simple example of a comonad that nevertheless captures the essence of a large class of comonads. It's conceptually not much different to my cellular automaton example (making this a bit of a rerun), but this should be easier to understand. And if it's too trivial, I hint at a particle physics connection towards the end.

Firstly, you can skip this paragraph if you don't want a quick bit of theoretical discussion. Consider arrays of fixed dimension. As types, they look something like X^N for some fixed integer N. From a container we construct its zipper by applying X d/dX. In this case we get X·d/dX(X^N) = X·N·X^(N-1) = N·X^N. In other words, the corresponding zipper is an array paired with an index into the array. We can stretch the meaning of comonad slightly to allow this to relate to arrays whose size isn't fixed.

So here's some code:


> import Data.Array


The usual definition of Comonad:


> class Functor w => Comonad w where
>   (=>>) :: w a -> (w a -> b) -> w b
>   coreturn :: w a -> a


And now a type that is a pair of an array and an index into that array:


> data Pointer i e = P i (Array i e) deriving Show


Think of it as an array with one of its elements singled out for special attention. It trivially inherits the functoriality of ordinary arrays:


> instance Ix i => Functor (Pointer i) where
>   fmap f (P i a) = P i (fmap f a)


And now comes the Comonad implementation. coreturn serves to pop out the special element from its context - in other words, it gives you the special element while throwing away the array it lived in. (=>>), on the other hand, applies a function f of type Pointer i a -> b to the entire array. The function is applied to each element in turn, making each element the special element for long enough to apply f.


> instance Ix i => Comonad (Pointer i) where
>   coreturn (P i a) = a!i
>   P i a =>> f = P i $ listArray bds (fmap (f . flip P a) (range bds))
>     where bds = bounds a


Compare with fmap for arrays. This walks through each element in turn, applying a function to each element, and returns an array of results. The computation for each element is separate from all the others. With =>>, however, the entire array may be used for the computation of each element of the result, with the index into the array serving to indicate which element it is we should be focussing on.

For example, here's an array of values:


> x = listArray (0,9) [0..9]


We want to consider this to be a circular array so that going off one end wraps around to the beginning:


> wrap i = if i<0 then i+10 else if i>9 then i-10 else i


Now here's a simple operation that 'blurs' the single ith pixel in the 1-D image represented by x:


> blur (P i a) =
>   let k = wrap (i-1)
>       j = wrap (i+1)
>   in 0.25*a!k + 0.5*a!i + 0.25*a!j


We can apply this to the entire image thusly:


> test1 = P 0 x =>> blur


Note the curious way I have to use P 0 x as an input to blur. There seems to be a redundancy here: we want the resulting array and don't care what the focussed element is. But =>> wants us to give it a focal point. Curiously, it's making explicit something that's familiar to C programmers but slightly hidden in C. In C, you refer to an array of floats using a float *. But the same type points to elements of the array as well. So when you point to an array you are, in effect, blurring the distinction between a pointer to an array and a pointer to its first element. Comonads make that distinction explicit.

Anyway, suppose you wanted to apply a sequence of operations: adding, blurring, scaling, nonlinearly transforming and so on. You could write a pipeline like this:


> x ==> f = f x
> test2 = P 0 x ==> fmap (+1) =>> blur ==> fmap (*2) ==> fmap (\x -> x*x)


Note how ==> fmap f ==> fmap g = ==> fmap (g . f). If you think of fmap as farming out workloads to a SIMD processor with one thread applied to each array element, sequences of fmaps correspond to threads that can continue to work independently. The comonadic operations, however, correspond to steps where the threads must synchronise and talk to each other. I believe this last statement explains a cryptic comment in the comments to this blog entry.
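That fusion claim is easy to check (a sketch; test4 and test5 are my names, and the two should produce equal arrays):


> test4 = P 0 x ==> fmap (+1) ==> fmap (*2)
> test5 = P 0 x ==> fmap ((*2) . (+1)) -- same result in a single traversal
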

One final thing, going back to my optional paragraph above. If D means d/dX then we can see the operator XD as a kind of number operator. When you apply it to an array type like X^N, the array becomes multiplied by a type corresponding to the array index type. For ordinary arrays, these are just integers. So you can see XD as a way of counting how many elements there are in a container. Also, for any container F, we have the equation D(XF) = X·DF + F, which we can write as DX = XD+1. At some point, when I have time, I'll point out how this is closely related to the Heisenberg uncertainty principle and how, when we say that differentiation makes holes in a data type, it's related to the notion of a hole in solid state physics.
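As a quick sanity check of that identity, take F = X^2: DX applied to X^2 gives D(X^3) = 3X^2, while (XD+1) applied to X^2 gives X·(2X) + X^2 = 3X^2, as expected.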

Oh, and I sketched this diagram but don't have time to write an explanation:
