# A Neighborhood of Infinity

## Friday, May 23, 2014

### Cofree meets Free

> {-# LANGUAGE RankNTypes, MultiParamTypeClasses, TypeOperators #-}


Introduction

After I spoke at BayHac 2014 about free monads I was asked about cofree comonads. So this is intended as a sequel to that talk. Not only am I going to try to explain what cofree comonads are. I'm also going to point out a very close relationship between cofree comonads and free monads.

At the beginning of the talk the Google Hangout software seems to have switched to the laptop camera so you can't see the slides in the video. However the slides are here.

Cothings as machines

I often think of coalgebraic things as machines. They have some internal state and you can press buttons to change that internal state. For example here is a type class for a machine with two buttons that's related to a magma:

> class TwoButton a where
>   press :: a -> (a, a)


The idea is that the state of the machine is given by some type a and you could press either the left button or the right button. The result of pressing one or other button is given by these two functions:

> pressLeft, pressRight :: TwoButton a => a -> a
> pressLeft = fst . press
> pressRight = snd . press


(As with many metaphors used to explain Haskell type classes your mileage may vary. Sometimes you'll have to stretch your imagination to see what the set of buttons is for a particular cothing.)

Just as monads are a kind of generalised algebraic structure (for example see my talk), comonads are a generalised kind of machine. The idea is that for any state of the machine there is a bunch of buttons we could press. But we don't have two buttons, or any fixed number of buttons. We instead have a functorful of buttons (if you think of functors by analogy with containers). We also don't get to directly see the internal state of the machine but instead we get to make observations.

Here's the type class:

> class Comonad w where
>   extract :: w a -> a
>   duplicate :: w a -> w (w a)


The state of the machine is given by w a. We observe the state using the extract function. And when we come to press a button, we have a functorful of new states that it could end up in. The duplicate function gives the container of those new states.

For example, various kinds of zipper give rise to comonads. Zippers allow you to "focus" on a part of a data structure. The extract operation allows you to observe the point that currently has focus. There is one button for every position in the structure where the focus could be. Pressing the corresponding button moves the focus to that point. Similarly the Store comonad has one button for each value you can store in the field it represents. Press the button and the value gets stored in the field.

Cofreeness as a way to memoise

Cofree coalgebras can be thought of as memoised forms of elements of coalgebras. For example, the TwoButton machine above has a function, press, as part of its definition. Memoising an element of such a thing means tabulating everything that could possibly happen if you pressed the buttons so we no longer need the press function. One approach is to try something like this:

data CofreeTwoButton = Memo CofreeTwoButton CofreeTwoButton


The structure contains two CofreeTwoButtons, each giving the result of pressing one of the two buttons. Any element of CofreeTwoButton may now be memoised like so:

memoiseTwoButton :: TwoButton m => m -> CofreeTwoButton
memoiseTwoButton m = Memo (memoiseTwoButton (pressLeft m)) (memoiseTwoButton (pressRight m))


It definitely tabulates the result of pressing buttons. But it has a major flaw. We have no way of seeing what's stored in the table! To make this useful we want to also store some data in the table that we can peek at. So here is a better definition:

> data CofreeTwoButton a = Memo a (CofreeTwoButton a) (CofreeTwoButton a)
> memoiseTwoButton :: TwoButton m => (m -> a) -> m -> CofreeTwoButton a
> memoiseTwoButton f m = Memo (f m) (memoiseTwoButton f (pressLeft m)) (memoiseTwoButton f (pressRight m))


The first argument to memoiseTwoButton says what we want to store in the table and then memoiseTwoButton goes ahead and stores it. We can use the identity function if we want to store the original elements.

Note how this is like foldMap:

foldMap :: Monoid m => (a -> m) -> t a -> m


if we replace t by the list functor and remember that lists are free monoids. The main difference is that arrows have been reversed. Where foldMap takes an element of a free monoid and interprets it as an element of another monoid, memoiseTwoButton packs an element of a TwoButton into a cofree structure. The "interpretation" and "packing" here are both homomorphisms for their respective structures. Homomorphisms respect equations so if an equation holds between elements of a free monoid we expect it to also hold when interpreted in another monoid. But any element of a free monoid can be interpreted in any other monoid meaning that any equation that holds between elements of a free monoid must hold in any monoid. That's why free monoids are designed so that the only equations that hold between elements are those that follow from the monoid laws.

With the TwoButton we have a dualised version of the above. Every element of every TwoButton can be packed into the CofreeTwoButton. So every equation in the original structure will still hold after the packing. So every equation that holds in some TwoButton must have some solution in CofreeTwoButton. That gives an idea of what a CofreeTwoButton is by analogy with the free monoid.

A cofree comonad is basically a memoised comonad. So the data structure is:

> data Cofree f a = Cofree a (f (Cofree f a))


At each point in the "table" we store some observable value of type a. And we have a functorful of buttons, so we expect to have a functorful of new states we could transition to. The Functor instance looks like:

> instance Functor f => Functor (Cofree f) where
>   fmap f (Cofree a fs) = Cofree (f a) (fmap (fmap f) fs)


We apply f to the observable value and then push the fmap f down to the child nodes.

The duplicate function takes a memoised state and replaces the observable stored at each position with the memoised state that gives rise to the observable.

> instance Functor f => Comonad (Cofree f) where
>   extract (Cofree a _) = a
>   duplicate c@(Cofree _ fs) = Cofree c (fmap duplicate fs)


Now by analogy with memoiseTwoButton we can memoise comonads.

> memoiseComonad :: (Comonad w, Functor f) =>
>                   (forall x.w x -> f x) -> (forall x.w x -> Cofree f x)
> memoiseComonad f w = Cofree (extract w) (fmap (memoiseComonad f) (f (duplicate w)))


So that's what a cofree comonad is: it's a type that can be used to memoise all of the states that are accessible from a state in a comonad by pressing its buttons.

But that's not all. There is a close relationship between cofree comonads and free monads. So to get going, here's a free monad type:

> data Free f a = Id a | Free (f (Free f a))

> join' :: Functor f => Free f (Free f a) -> Free f a
> join' (Id x) = x
> join' (Free fa) = Free (fmap join' fa)

> instance Functor f => Functor (Free f) where
>   fmap f (Id x) = Id (f x)
>   fmap f (Free fa) = Free (fmap (fmap f) fa)

> instance Functor f => Monad (Free f) where
>   return = Id
>   m >>= f = join' (fmap f m)


Now I'll define a kind of pairing between functors. Given a way to combine two kinds of element, the pairing gives a way to combine a pair of containers of those elements.

> class (Functor f, Functor g) => Pairing f g where
>   pair :: (a -> b -> r) -> f a -> g b -> r

> data Identity a = Identity a
> instance Functor Identity where
>   fmap f (Identity x) = Identity (f x)

> instance Pairing Identity Identity where
>   pair f (Identity a) (Identity b) = f a b

> data (f :+: g) x = LeftF (f x) | RightF (g x)
> instance (Functor f, Functor g) => Functor (f :+: g) where
>   fmap f (LeftF x) = LeftF (fmap f x)
>   fmap f (RightF x) = RightF (fmap f x)

> data (f :*: g) x = f x :*: g x
> instance (Functor f, Functor g) => Functor (f :*: g) where
>   fmap f (x :*: y) = fmap f x :*: fmap f y

> instance (Pairing f f', Pairing g g') => Pairing (f :+: g) (f' :*: g') where
>   pair p (LeftF x) (a :*: _) = pair p x a
>   pair p (RightF x) (_ :*: b) = pair p x b

> instance (Pairing f f', Pairing g g') => Pairing (f :*: g) (f' :+: g') where
>   pair p (a :*: _) (LeftF x) = pair p a x
>   pair p (_ :*: b) (RightF x) = pair p b x

> instance Pairing ((->) a) ((,) a) where
>   pair p f = uncurry (p . f)


Given a pairing between f and g we get one between Cofree f and Free g.

> instance Pairing f g => Pairing (Cofree f) (Free g) where
>   pair p (Cofree a _) (Id x) = p a x
>   pair p (Cofree _ fs) (Free gs) = pair (pair p) fs gs


An element of Free g can be thought of as an expression written in a DSL. So this pairing gives a way to apply a monadic expression to a memoised comonad. In other words, if you think of comonads as machines, monads give a language that can be used to compute something based on the output of the machine.

Here's an almost trivial example just so you can see everything working together. A reasonable definition of a comagma structure on the type a is a -> UpDown a with UpDown defined as:

> data UpDown a = Up a | Down a

> instance Functor UpDown where
>   fmap f (Up a) = Up (f a)
>   fmap f (Down a) = Down (f a)

> type CofreeComagma a = Cofree UpDown a


A well known comagma structure on the positive integers is given by the famous Collatz conjecture:

> collatz :: Integer -> UpDown Integer
> collatz n = if even n then Down (n div 2) else Up (3*n+1)


We can memoise this as a cofree comonad:

> memoisedCollatz :: Integer -> CofreeComagma Integer
> memoisedCollatz n = Cofree n (fmap memoisedCollatz (collatz n))


Here's a picture of memoisedCollatz 12:

Now let's make the dual functor in readiness for building the dual monad:

> data Two a = Two a a
> instance Functor Two where
>   fmap f (Two a b) = Two (f a) (f b)


And here we set up a pairing:

> instance Pairing UpDown Two where
>   pair f (Up a) (Two b _) = f a b
>   pair f (Down a) (Two _ c) = f a c

> execute :: Cofree UpDown x -> Free Two (x -> r) -> r
> execute w m = pair (flip (\$)) w m


This gives rise to a free monad isomorphic to the one in my talk:

> data Direction = WentUp | WentDown deriving Show

> choose :: Free Two Direction
> choose = Free (Two (return WentUp) (return WentDown))


And here's an example of some code written in the corresponding DSL:

> ex1 :: Free Two (Integer -> String)
> ex1 = do
>   x <- choose
>   y <- choose
>   case (x, y) of
>       (WentDown, WentDown) -> return (\z -> "Decreased twice " ++ show z)
>       _ -> return show


It can be represented as:

And here's what happens when they meet:

> go1 :: String
> go1 = execute (memoisedCollatz 12) ex1


This can be understood through the combined picture:

References

On getting monads from comonads more generally see Monads from Comonads. For more on memoising and how it's really all about the Yoneda lemma see Memoizing Polymorphic Functions. I'm waiting for Tom Leinster to publish some related work. The pairing above gives a way for elements of free monads to pick out elements of cofree comonads and is a special case of what I'm talking about here. But I think Tom has some unpublished work that goes further.

If you think of a comonad as a compressed object that is decompressed by a monadic decision tree, then you'd expect some form of information theoretical description to apply. That makes me think of Convex spaces and an operadic approach to entropy.

## Introduction

There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. The latter can be harder to recognise but there is one example that just about everyone has known since infancy.

## Subproblems

Consider sorting algorithms. A large class of sorting algorithms, including quicksort, break a sequence of values into two pieces. The two pieces are smaller so they are easier to sort. We sort those pieces and then combine them, using some kind of merge operation, to give an ordered version of the original sequence. Breaking things down into subproblems is ubiquitous and is useful far outside of mathematics and computing: in cooking, in finding our path from A to B, in learning the contents of a book. So I don’t need to say much more here.

## Quotient problems

The term quotient is a technical term from mathematics. But I want to use the term loosely to mean something like this: a quotient problem is what a problem looks like if you wear a certain kind of filter over your eyes. The filter hides some aspect of the problem that simplifies it. You solve the simplified problem and then take off the filter. You now ‘lift’ the solution of the simplified problem to a solution to the full problem. The catch is that your filter needs to match your problem so I’ll start by giving an example where the filter doesn’t work.

Suppose we want to add a list of integers, say: 123, 423, 934, 114. We can try simplifying this problem by wearing a filter that makes numbers fuzzy so we can’t distinguish numbers that differ by less than 10. When we wear this filter 123 looks like 120, 423 looks like 420, 934 looks like 930 and 114 looks like 110. So we can try adding 120+420+930+110. This is a simplified problem and in fact this is a common technique to get approximate answers via mental arithmetic. We get 1580. We might hope that when wearing our filters, 1580 looks like the correct answer. But it doesn’t. The correct answer is 1594. This filter doesn’t respect addition in the sense that if a looks like a’ and b looks like b’ it doesn’t follow that a+b looks like a’+b’.

To solve a problem via quotient problems we usually need to find a filter that does respect the original problem. So let’s wear a different filter that allows us just to see the last digit of a number. Our original problem now looks like summing the list 3, 3, 4, 4. We get 4. This is the correct last digit. If we now try a filter that allows us to see just the last two digits we see that summing 23, 23, 34, 14 does in fact give the correct last two digits. This is why the standard elementary school algorithms for addition and multiplication work through the digits from right to left: at each stage we’re solving a quotient problem but the filter only respects the original problem if it allows us to see the digits to the right of some point, not digits to the left. This filter does respect addition in the sense that if a looks like a’ and b looks like b’ then a+b looks like a’+b’.

Another example of the quotient approach is to look at the knight’s tour problem in the case where two opposite corners have been removed from the chessboard. A knight’s tour is a sequence of knight’s moves that visit each square on a board exactly once. If we remove opposite corners of the chessboard, there is no knight’s tour of the remaining 62 squares. How can we prove this? If you don’t see the trick you can get get caught up in all kinds of complicated reasoning. So now put on a filter that removes your ability to see the spatial relationships between the squares so you can only see the colours of the squares. This respects the original problem in the sense that a knight’s move goes from a black square to a white square, or from a white square to a black square. The filter doesn’t stop us seeing this. But now it’s easier to see that there are two more squares of one colour than the other and so no knight’s tour is possible. We didn’t need to be able to see the spatial relationships at all.

(Note that this is the same trick as we use for arithmetic, though it’s not immediately obvious. If we think of the spatial position of a square as being given by a pair of integers (x, y), then the colour is given by x+y modulo 2. In other words, by the last digit of x+y written in binary. So it’s just the see-only-digits-on-the-right filter at work again.)

## Wearing filters while programming

So now think about developing some code in a dynamic language like Python. Suppose we execute the line:

a = 1

The Python interpreter doesn’t just store the integer 1 somewhere in memory. It also stores a tag indicating that the data is to be interpreted as an integer. When you come to execute the line:

b = a+1

it will first examine the tag in a indicating its type, in this case int, and use that to determine what the type for b should be.

Now suppose we wear a filter that allows us to see the tag indicating the type of some data, but not the data itself. Can we still reason about what our program does?

In many cases we can. For example we can, in principle, deduce the type of

a+b*(c+1)/(2+d)

if we know the types of a, b, c, d. (As I’ve said once before, it’s hard to make any reliable statement about a bit of Python code so let's suppose that a, b, c and d are all either of type int or type float.) We can read and understand quite a bit of Python code wearing this filter. But it’s easy to go wrong. For example consider

if a>1 then:
return 1.0
else:
return 1

The type of the result depends on the value of the variable a. So if we’re wearing the filter that hides the data, then we can’t predict what this snippet of code does. When we run it, it might return an int sometimes and a float other times, and we won’t be able to see what made the difference.

In a statically typed language you can predict the type of an expression knowing the type of its parts. This means you can reason reliably about code while wearing the hide-the-value filter. This means that almost any programming problem can be split into two parts: a quotient problem where you forget about the values, and then problem of lifting a solution to the quotient problem to a solution to the full problem. Or to put that in more conventional language: designing your data and function types, and then implementing the code that fits those types.

I chose to make the contrast between dynamic and static languages just to make the ideas clear but actually you can happily use similar reasoning for both types of language. Compilers for statically typed languages, give you a lot of assistance if you choose to solve your programming problems this way.

A good example of this at work is given in Haskell. If you're writing a compiler, say, you might want to represent a piece of code as an abstract syntax tree, and implement algorithms that recurse through the tree. In Haskell the type system is strong enough that once you’ve defined the tree type the form of the recursion algorithms is often more or less given. In fact, it can be tricky to implement tree recursion incorrectly and have the code compile without errors. Solving the quotient problem of getting the types right gets you much of the way towards solving the full problem.

And that’s my main point: types aren’t simply a restriction mechanism to help you avoid making mistakes. Instead they are a way to reduce some complex programming problems to simpler ones. But the simpler problem isn’t a subproblem, it’s a quotient problem.

## Dependent types

Dependently typed languages give you even more flexibility with what filters you wear. They allow you to mix up values and types. For example both C++ and Agda (to pick an unlikely pair) allow you to wear filters that hide the values of elements in your arrays while allowing you to see the length of your arrays. This makes it easier to concentrate on some aspects of your problem while completely ignoring others.

## Notes

I wrote the first draft of this a couple of years ago but never published it. I was motivated to post by a discussion kicked off by Voevodsky on the TYPES mailing list http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html