A Neighborhood of Infinity

Friday, May 23, 2014

Cofree meets Free

> {-# LANGUAGE RankNTypes, MultiParamTypeClasses, TypeOperators #-}


After I spoke at BayHac 2014 about free monads I was asked about cofree comonads. So this is intended as a sequel to that talk. Not only am I going to try to explain what cofree comonads are. I'm also going to point out a very close relationship between cofree comonads and free monads.

At the beginning of the talk the Google Hangout software seems to have switched to the laptop camera so you can't see the slides in the video. However the slides are here.

Cothings as machines

I often think of coalgebraic things as machines. They have some internal state and you can press buttons to change that internal state. For example here is a type class for a machine with two buttons that's related to a magma:

> class TwoButton a where
>   press :: a -> (a, a)

The idea is that the state of the machine is given by some type a and you could press either the left button or the right button. The result of pressing one or other button is given by these two functions:

> pressLeft, pressRight :: TwoButton a => a -> a
> pressLeft = fst . press
> pressRight = snd . press

(As with many metaphors used to explain Haskell type classes your mileage may vary. Sometimes you'll have to stretch your imagination to see what the set of buttons is for a particular cothing.)


Just as monads are a kind of generalised algebraic structure (for example see my talk), comonads are a generalised kind of machine. The idea is that for any state of the machine there is a bunch of buttons we could press. But we don't have two buttons, or any fixed number of buttons. We instead have a functorful of buttons (if you think of functors by analogy with containers). We also don't get to directly see the internal state of the machine but instead we get to make observations.

Here's the type class:

> class Comonad w where
>   extract :: w a -> a
>   duplicate :: w a -> w (w a)

The state of the machine is given by w a. We observe the state using the extract function. And when we come to press a button, we have a functorful of new states that it could end up in. The duplicate function gives the container of those new states.

For example, various kinds of zipper give rise to comonads. Zippers allow you to "focus" on a part of a data structure. The extract operation allows you to observe the point that currently has focus. There is one button for every position in the structure where the focus could be. Pressing the corresponding button moves the focus to that point. Similarly the Store comonad has one button for each value you can store in the field it represents. Press the button and the value gets stored in the field.

Cofreeness as a way to memoise

Cofree coalgebras can be thought of as memoised forms of elements of coalgebras. For example, the TwoButton machine above has a function, press, as part of its definition. Memoising an element of such a thing means tabulating everything that could possibly happen if you pressed the buttons so we no longer need the press function. One approach is to try something like this:

data CofreeTwoButton = Memo CofreeTwoButton CofreeTwoButton

The structure contains two CofreeTwoButtons, each giving the result of pressing one of the two buttons. Any element of CofreeTwoButton may now be memoised like so:

memoiseTwoButton :: TwoButton m => m -> CofreeTwoButton
memoiseTwoButton m = Memo (memoiseTwoButton (pressLeft m)) (memoiseTwoButton (pressRight m))

It definitely tabulates the result of pressing buttons. But it has a major flaw. We have no way of seeing what's stored in the table! To make this useful we want to also store some data in the table that we can peek at. So here is a better definition:

> data CofreeTwoButton a = Memo a (CofreeTwoButton a) (CofreeTwoButton a)
> memoiseTwoButton :: TwoButton m => (m -> a) -> m -> CofreeTwoButton a
> memoiseTwoButton f m = Memo (f m) (memoiseTwoButton f (pressLeft m)) (memoiseTwoButton f (pressRight m))

The first argument to memoiseTwoButton says what we want to store in the table and then memoiseTwoButton goes ahead and stores it. We can use the identity function if we want to store the original elements.

Note how this is like foldMap:

foldMap :: Monoid m => (a -> m) -> t a -> m

if we replace t by the list functor and remember that lists are free monoids. The main difference is that arrows have been reversed. Where foldMap takes an element of a free monoid and interprets it as an element of another monoid, memoiseTwoButton packs an element of a TwoButton into a cofree structure. The "interpretation" and "packing" here are both homomorphisms for their respective structures. Homomorphisms respect equations so if an equation holds between elements of a free monoid we expect it to also hold when interpreted in another monoid. But any element of a free monoid can be interpreted in any other monoid meaning that any equation that holds between elements of a free monoid must hold in any monoid. That's why free monoids are designed so that the only equations that hold between elements are those that follow from the monoid laws.

With the TwoButton we have a dualised version of the above. Every element of every TwoButton can be packed into the CofreeTwoButton. So every equation in the original structure will still hold after the packing. So every equation that holds in some TwoButton must have some solution in CofreeTwoButton. That gives an idea of what a CofreeTwoButton is by analogy with the free monoid.

Cofree comonads

A cofree comonad is basically a memoised comonad. So the data structure is:

> data Cofree f a = Cofree a (f (Cofree f a))

At each point in the "table" we store some observable value of type a. And we have a functorful of buttons, so we expect to have a functorful of new states we could transition to. The Functor instance looks like:

> instance Functor f => Functor (Cofree f) where
>   fmap f (Cofree a fs) = Cofree (f a) (fmap (fmap f) fs)

We apply f to the observable value and then push the fmap f down to the child nodes.

The duplicate function takes a memoised state and replaces the observable stored at each position with the memoised state that gives rise to the observable.

> instance Functor f => Comonad (Cofree f) where
>   extract (Cofree a _) = a
>   duplicate c@(Cofree _ fs) = Cofree c (fmap duplicate fs)

Now by analogy with memoiseTwoButton we can memoise comonads.

> memoiseComonad :: (Comonad w, Functor f) =>
>                   (forall x.w x -> f x) -> (forall x.w x -> Cofree f x)
> memoiseComonad f w = Cofree (extract w) (fmap (memoiseComonad f) (f (duplicate w)))

So that's what a cofree comonad is: it's a type that can be used to memoise all of the states that are accessible from a state in a comonad by pressing its buttons.

Cofree comonad meets free monad

But that's not all. There is a close relationship between cofree comonads and free monads. So to get going, here's a free monad type:

> data Free f a = Id a | Free (f (Free f a))

> join' :: Functor f => Free f (Free f a) -> Free f a > join' (Id x) = x > join' (Free fa) = Free (fmap join' fa)

> instance Functor f => Functor (Free f) where > fmap f (Id x) = Id (f x) > fmap f (Free fa) = Free (fmap (fmap f) fa)

> instance Functor f => Monad (Free f) where > return = Id > m >>= f = join' (fmap f m)

Now I'll define a kind of pairing between functors. Given a way to combine two kinds of element, the pairing gives a way to combine a pair of containers of those elements.

> class (Functor f, Functor g) => Pairing f g where
>   pair :: (a -> b -> r) -> f a -> g b -> r

> data Identity a = Identity a > instance Functor Identity where > fmap f (Identity x) = Identity (f x)

> instance Pairing Identity Identity where > pair f (Identity a) (Identity b) = f a b

> data (f :+: g) x = LeftF (f x) | RightF (g x) > instance (Functor f, Functor g) => Functor (f :+: g) where > fmap f (LeftF x) = LeftF (fmap f x) > fmap f (RightF x) = RightF (fmap f x)

> data (f :*: g) x = f x :*: g x > instance (Functor f, Functor g) => Functor (f :*: g) where > fmap f (x :*: y) = fmap f x :*: fmap f y

> instance (Pairing f f', Pairing g g') => Pairing (f :+: g) (f' :*: g') where > pair p (LeftF x) (a :*: _) = pair p x a > pair p (RightF x) (_ :*: b) = pair p x b

> instance (Pairing f f', Pairing g g') => Pairing (f :*: g) (f' :+: g') where > pair p (a :*: _) (LeftF x) = pair p a x > pair p (_ :*: b) (RightF x) = pair p b x

> instance Pairing ((->) a) ((,) a) where > pair p f = uncurry (p . f)

Given a pairing between f and g we get one between Cofree f and Free g.

> instance Pairing f g => Pairing (Cofree f) (Free g) where
>   pair p (Cofree a _) (Id x) = p a x
>   pair p (Cofree _ fs) (Free gs) = pair (pair p) fs gs

An element of Free g can be thought of as an expression written in a DSL. So this pairing gives a way to apply a monadic expression to a memoised comonad. In other words, if you think of comonads as machines, monads give a language that can be used to compute something based on the output of the machine.

Here's an almost trivial example just so you can see everything working together. A reasonable definition of a comagma structure on the type a is a -> UpDown a with UpDown defined as:

> data UpDown a = Up a | Down a

> instance Functor UpDown where > fmap f (Up a) = Up (f a) > fmap f (Down a) = Down (f a)

> type CofreeComagma a = Cofree UpDown a

A well known comagma structure on the positive integers is given by the famous Collatz conjecture:

> collatz :: Integer -> UpDown Integer
> collatz n = if even n then Down (n `div` 2) else Up (3*n+1)

We can memoise this as a cofree comonad:

> memoisedCollatz :: Integer -> CofreeComagma Integer
> memoisedCollatz n = Cofree n (fmap memoisedCollatz (collatz n))

Here's a picture of memoisedCollatz 12:

Now let's make the dual functor in readiness for building the dual monad:

> data Two a = Two a a
> instance Functor Two where
>   fmap f (Two a b) = Two (f a) (f b)

And here we set up a pairing:

> instance Pairing UpDown Two where
>   pair f (Up a) (Two b _) = f a b
>   pair f (Down a) (Two _ c) = f a c

> execute :: Cofree UpDown x -> Free Two (x -> r) -> r > execute w m = pair (flip ($)) w m

This gives rise to a free monad isomorphic to the one in my talk:

> data Direction = WentUp | WentDown deriving Show

> choose :: Free Two Direction > choose = Free (Two (return WentUp) (return WentDown))

And here's an example of some code written in the corresponding DSL:

> ex1 :: Free Two (Integer -> String)
> ex1 = do
>   x <- choose
>   y <- choose
>   case (x, y) of
>       (WentDown, WentDown) -> return (\z -> "Decreased twice " ++ show z)
>       _ -> return show

It can be represented as:

And here's what happens when they meet:

> go1 :: String
> go1 = execute (memoisedCollatz 12) ex1

This can be understood through the combined picture:


On getting monads from comonads more generally see Monads from Comonads. For more on memoising and how it's really all about the Yoneda lemma see Memoizing Polymorphic Functions. I'm waiting for Tom Leinster to publish some related work. The pairing above gives a way for elements of free monads to pick out elements of cofree comonads and is a special case of what I'm talking about here. But I think Tom has some unpublished work that goes further.

If you think of a comonad as a compressed object that is decompressed by a monadic decision tree, then you'd expect some form of information theoretical description to apply. That makes me think of Convex spaces and an operadic approach to entropy.

Saturday, May 17, 2014

Types, and two approaches to problem solving


There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. The latter can be harder to recognise but there is one example that just about everyone has known since infancy.


Consider sorting algorithms. A large class of sorting algorithms, including quicksort, break a sequence of values into two pieces. The two pieces are smaller so they are easier to sort. We sort those pieces and then combine them, using some kind of merge operation, to give an ordered version of the original sequence. Breaking things down into subproblems is ubiquitous and is useful far outside of mathematics and computing: in cooking, in finding our path from A to B, in learning the contents of a book. So I don’t need to say much more here.

Quotient problems

The term quotient is a technical term from mathematics. But I want to use the term loosely to mean something like this: a quotient problem is what a problem looks like if you wear a certain kind of filter over your eyes. The filter hides some aspect of the problem that simplifies it. You solve the simplified problem and then take off the filter. You now ‘lift’ the solution of the simplified problem to a solution to the full problem. The catch is that your filter needs to match your problem so I’ll start by giving an example where the filter doesn’t work.

Suppose we want to add a list of integers, say: 123, 423, 934, 114. We can try simplifying this problem by wearing a filter that makes numbers fuzzy so we can’t distinguish numbers that differ by less than 10. When we wear this filter 123 looks like 120, 423 looks like 420, 934 looks like 930 and 114 looks like 110. So we can try adding 120+420+930+110. This is a simplified problem and in fact this is a common technique to get approximate answers via mental arithmetic. We get 1580. We might hope that when wearing our filters, 1580 looks like the correct answer. But it doesn’t. The correct answer is 1594. This filter doesn’t respect addition in the sense that if a looks like a’ and b looks like b’ it doesn’t follow that a+b looks like a’+b’.

To solve a problem via quotient problems we usually need to find a filter that does respect the original problem. So let’s wear a different filter that allows us just to see the last digit of a number. Our original problem now looks like summing the list 3, 3, 4, 4. We get 4. This is the correct last digit. If we now try a filter that allows us to see just the last two digits we see that summing 23, 23, 34, 14 does in fact give the correct last two digits. This is why the standard elementary school algorithms for addition and multiplication work through the digits from right to left: at each stage we’re solving a quotient problem but the filter only respects the original problem if it allows us to see the digits to the right of some point, not digits to the left. This filter does respect addition in the sense that if a looks like a’ and b looks like b’ then a+b looks like a’+b’.

Another example of the quotient approach is to look at the knight’s tour problem in the case where two opposite corners have been removed from the chessboard. A knight’s tour is a sequence of knight’s moves that visit each square on a board exactly once. If we remove opposite corners of the chessboard, there is no knight’s tour of the remaining 62 squares. How can we prove this? If you don’t see the trick you can get get caught up in all kinds of complicated reasoning. So now put on a filter that removes your ability to see the spatial relationships between the squares so you can only see the colours of the squares. This respects the original problem in the sense that a knight’s move goes from a black square to a white square, or from a white square to a black square. The filter doesn’t stop us seeing this. But now it’s easier to see that there are two more squares of one colour than the other and so no knight’s tour is possible. We didn’t need to be able to see the spatial relationships at all.

(Note that this is the same trick as we use for arithmetic, though it’s not immediately obvious. If we think of the spatial position of a square as being given by a pair of integers (x, y), then the colour is given by x+y modulo 2. In other words, by the last digit of x+y written in binary. So it’s just the see-only-digits-on-the-right filter at work again.)

Wearing filters while programming

So now think about developing some code in a dynamic language like Python. Suppose we execute the line:

a = 1

The Python interpreter doesn’t just store the integer 1 somewhere in memory. It also stores a tag indicating that the data is to be interpreted as an integer. When you come to execute the line:

b = a+1

it will first examine the tag in a indicating its type, in this case int, and use that to determine what the type for b should be.

Now suppose we wear a filter that allows us to see the tag indicating the type of some data, but not the data itself. Can we still reason about what our program does?

In many cases we can. For example we can, in principle, deduce the type of


if we know the types of a, b, c, d. (As I’ve said once before, it’s hard to make any reliable statement about a bit of Python code so let's suppose that a, b, c and d are all either of type int or type float.) We can read and understand quite a bit of Python code wearing this filter. But it’s easy to go wrong. For example consider

if a>1 then:
return 1.0
return 1

The type of the result depends on the value of the variable a. So if we’re wearing the filter that hides the data, then we can’t predict what this snippet of code does. When we run it, it might return an int sometimes and a float other times, and we won’t be able to see what made the difference.

In a statically typed language you can predict the type of an expression knowing the type of its parts. This means you can reason reliably about code while wearing the hide-the-value filter. This means that almost any programming problem can be split into two parts: a quotient problem where you forget about the values, and then problem of lifting a solution to the quotient problem to a solution to the full problem. Or to put that in more conventional language: designing your data and function types, and then implementing the code that fits those types.

I chose to make the contrast between dynamic and static languages just to make the ideas clear but actually you can happily use similar reasoning for both types of language. Compilers for statically typed languages, give you a lot of assistance if you choose to solve your programming problems this way.

A good example of this at work is given in Haskell. If you're writing a compiler, say, you might want to represent a piece of code as an abstract syntax tree, and implement algorithms that recurse through the tree. In Haskell the type system is strong enough that once you’ve defined the tree type the form of the recursion algorithms is often more or less given. In fact, it can be tricky to implement tree recursion incorrectly and have the code compile without errors. Solving the quotient problem of getting the types right gets you much of the way towards solving the full problem.

And that’s my main point: types aren’t simply a restriction mechanism to help you avoid making mistakes. Instead they are a way to reduce some complex programming problems to simpler ones. But the simpler problem isn’t a subproblem, it’s a quotient problem.

Dependent types

Dependently typed languages give you even more flexibility with what filters you wear. They allow you to mix up values and types. For example both C++ and Agda (to pick an unlikely pair) allow you to wear filters that hide the values of elements in your arrays while allowing you to see the length of your arrays. This makes it easier to concentrate on some aspects of your problem while completely ignoring others.


I wrote the first draft of this a couple of years ago but never published it. I was motivated to post by a discussion kicked off by Voevodsky on the TYPES mailing list http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html

This article isn’t a piece of rigorous mathematics and I’m using mathematical terms as analogies.

The notion of a subproblem isn’t completely distinct from a quotient problem. Some problems are both, and in fact some problems can be solved by transforming them so they become both.

More generally, looking at computer programs through different filters is one approach to abstract interpretation http://en.wikipedia.org/wiki/Abstract_interpretation. The intuition section there (http://en.wikipedia.org/wiki/Abstract_interpretation#Intuition) has much in common with what I’m saying.

Friday, April 25, 2014

The Monad called Free


As Dan Doel points out here, the gadget Free that turns a functor into a monad is itself a kind of monad, though not the usual kind of monad we find in Haskell. I'll call it a higher order monad and you can find a type class corresponding to this in various places including an old version of Ed Kmett's category-extras. I'll borrow some code from there. I hunted around and couldn't find an implementation of Free as an instance of this class so I thought I'd plug the gap.

> {-# LANGUAGE RankNTypes, FlexibleContexts, InstanceSigs, ScopedTypeVariables #-}

> import Control.Monad > import Data.Monoid

To make things unambiguous I'll implement free monads in the usual way here:

> data Free f a = Pure a | Free (f (Free f a))

> instance Functor f => Functor (Free f) where > fmap f (Pure a) = Pure (f a) > fmap f (Free a) = Free (fmap (fmap f) a)

> instance Functor f => Monad (Free f) where > return = Pure > Pure a >>= f = f a > Free a >>= f = Free (fmap (>>= f) a)

The usual Haskell typeclass Monad corresponds to monads in the category of types and functions, Hask. We're going to want monads in the category of endomorphisms of Hask which I'll call Endo.

The objects in Endo correspond to Haskell's Functor. The arrows in Endo are the natural transformations between these functors:

> type Natural f g = (Functor f, Functor g) => forall a. f a -> g a

So now we are led to consider functors in Endo.

> class HFunctor f where

A functor in Endo must map functors in Hask to functors in Hask. So if f is a functor in Endo and g is a functor in Hask, then f g must be another functor in Hask. So there must be an fmap associated with this new functor. There's an associated fmap for every g and we collect them all into one big happy natural family:

>     ffmap :: Functor g => (a -> b) -> f g a -> f g b

But note also that by virtue of being a functor itself, f must have its own fmap type function associated with it. The arrows in Endo are natural transformations in Hask so the fmap for HFunctor must take arrows in Endo to arrows in Endo like so:

>     hfmap :: (Functor g, Functor h) => Natural g h -> Natural (f g) (f h)

Many constructions in the category Hask carry over to Endo. In Hask we can form a product of type types a and b as (a, b). In Endo we form the product of two functors f and g as

> data Product f g a = Product (f (g a))

Note that this product isn't commutative. We don't necessarily have an isomorphism from Product f g to Product g f. (This breaks many attempts to transfer constructions from Hask to Endo.) We also won't explicitly use Product because we can simply use the usual Haskell composition of functors inline.

We can implement some functions that act on product types in both senses of the word "product":

> left :: (a -> c) -> (a, b) -> (c, b)
> left f (a, b) = (f a, b)

> right :: (b -> c) -> (a, b) -> (a, c) > right f (a, b) = (a, f b)

> hleft :: (Functor a, Functor b, Functor c) => Natural a c -> a (b x) -> c (b x) > hleft f = f

> hright :: (Functor a, Functor b, Functor c) => Natural b c -> a (b x) -> a (c x) > hright f = fmap f

(Compare with what I wrote here.)

We have something in Endo a bit like the type with one element in Hask, namely the identity functor. The product of a type a with the one element type in Hask gives you something isomorphic to a. In Endo the product is composition for which the identity functor is the identity. (Two different meanings of the word "identity" there.)

We also have sums. For example, if we define a functor like so

> data F a = A a | B a a

we can think of F as a sum of two functors: one with a single constructor A and another with constructor B.

We can now think about reproducing an Endo flavoured version of lists. The usual definition is isomorphic to:

> data List a = Nil | Cons a (List a)

And it has a Monoid instance:

> instance Monoid (List a) where
>   mempty = Nil
>   mappend Nil as = as
>   mappend (Cons a as) bs = Cons a (mappend as bs)

We can try to translate that into Endo. The Nil part can be thought of as being an element of a type with one element so it should become the identity functor. The Cons a (List a) part is a product of a and List a so that should get replaced by a composition. So we expect to see something vaguely like:

List' a = Nil' | Cons' (a (List' a))

That's not quite right because List' a is a functor, not a type, and so acts on types. So a better definition would be:

List' a b = Nil' b | Cons' (a (List' a b))

That's just the definition of Free. So free monads are lists in Endo. As everyone knows :-) monads are just monoids in the category of endofunctors. Free monads are also just free monoids in the category of endofunctors.

So now we can expect many constructions associated with monoids and lists to carry over to monads and free monads.

An obvious one is the generalization of the singleton map a -> List a:

> singleton :: a -> List a
> singleton a = Cons a Nil

> hsingleton :: Natural f (Free f) > hsingleton f = Free (fmap Pure f)

Another is the generalization of foldMap. This can be found under a variety of names in the various free monad libraries out there but this implementation is designed to highlight the similarity between monoids and monads:

> foldMap :: Monoid m => (a -> m) -> List a -> m
> foldMap _ Nil = mempty
> foldMap f (Cons a as) = uncurry mappend $ left f $ right (foldMap f) (a, as)

> fold :: Monoid m => List m -> m > fold = foldMap id

> hFoldMap :: (Functor f, Functor m, Monad m) => Natural f m -> Natural (Free f) m > hFoldMap _ (Pure x) = return x > hFoldMap f (Free x) = join $ hleft f $ hright (hFoldMap f) x

> hFold :: Monad f => Natural (Free f) f > hFold = hFoldMap id

The similarity here isn't simply formal. If you think of a list as a sequence of instructions then foldMap interprets the sequence of instructions like a computer program. Similarly hFoldMap can be used to interpret programs for which the free monad provides an abstract syntax tree.

You'll find some of these functions here by different names.

Now we can consider Free. It's easy to show this is a HFunctor by copying a suitable definition for List:

> instance Functor List where
>   fmap f = foldMap (singleton . f)

> instance HFunctor Free where > ffmap = fmap > hfmap f = hFoldMap (hsingleton . f)

We can define HMonad as follows:

> class HMonad m where
>     hreturn :: Functor f => f a -> m f a
>     hbind :: (Functor f, Functor g) => m f a -> Natural f (m g) -> m g a

Before making Free an instance, let's look at how we'd make List an instance of Monad

> instance Monad List where
>     return = singleton
>     m >>= f = fold (fmap f m)

And now the instance I promised at the beginning.

> instance HMonad Free where
>     hreturn = hsingleton
>     hbind m f = hFold (hfmap f m)

I've skipped the proofs that the monad laws hold and that hreturn and hbind are actually natural transformations in Endo. Maybe I'll leave those as exercises for the reader.


After writing this I tried googling for "instance HMonad Free" and I found this by haasn. There's some other good stuff in there too.

Saturday, February 01, 2014

Reinversion Revisited


A while back I talked about the idea of reinversion of control using the continuation monad to wrest control back from an interface that only wants to call you, but doesn't want you to call them back. I want to return to that problem with a slightly different solution. The idea is that we build an interpreter for an imperative language that's an embedded Haskell DSL. You arrange that the DSL does the work of waiting to be called by the interface, but from the point of view of the user of the DSL it looks like you're calling the shots. To do this I'm going to pull together a bunch of techniques I've talked about before. This approach is largely an application of what apfelmus described here.

The code

We'll start with some administrative stuff before getting down to the real code:

> {-# LANGUAGE TemplateHaskell #-}

> import Control.Lens > import Control.Monad > import Control.Monad.Loops

We'll make our DSL an imperative wrapper around Gloss:

> import Graphics.Gloss.Interface.Pure.Game

We'll define a structure that can be used to represent the abstract syntax tree (AST) of our DSL. Our DSL will support the reading of inputs, adding pictures to the current picture, and clearing the screen.

First we'll need a wrapper that allows us to represent ordinary Haskell values in our DSL:

> data Basic a = Return a

Now we want an expression that represents events given to us by Gloss. Internally we'll represent this by a function that says what our program does if it's given an event. It says what our program does by returning another AST saying what happens when the input is received. (I've previously talked about these kinds of expression trees here).

>              | Input (Event -> Basic a)

We have a command to render some graphics. It appends a new Picture to the current picture. Again, part of the AST muct be another AST saying what happens after the picture is rendered:

>              | Render Picture (Basic a)

And lastly here's the AST for a clear screen command:

>              | Cls (Basic a)

Our AST will form a monad. This will allow us to build ASTs using ordinary Haskell do-notation. This technique is what I described previously here.

> instance Monad Basic where
>     return = Return
>     Return a >>= f = f a
>     Input handler >>= f = Input (\e -> handler e >>= f)
>     Render p a >>= f = Render p (a >>= f)
>     Cls a >>= f = Cls (a >>= f)

You can think of the expression x >>= f as x with the tree f a grafted in to replace any occurrence of Return a in it. This is exactly what Return a >>= f does. But applying >>= f to the other ASTs simply digs down "inside" the ASTs to find other occurrences of Return a.

It's convenient to uses lenses to view Gloss's game world:

> data World = World { _program :: Basic (), _picture :: Picture }
> $(makeLenses ''World)

And now we have some wrappers around the interpreter's commands. The return () provides the convenient place where we can graft subtrees into our AST.

> input = Input return
> render p = Render p (return ())
> cls = Cls (return ())

Now we can start coding. Here's a test to see if a Gloss event is a key down event:

> keydown (EventKey (Char key)            Down _ _) = True
> keydown (EventKey (SpecialKey KeySpace) Down _ _) = True
> keydown _ = False

And now here's a complete program using our DSL. It's deliberately very imperative. It simply iterates over a nested pair of loops, collecting keystrokes and displaying them. It reads a lot like an ordinary program written in a language like Python or Basic:

> mainProgram = do
>     render (Color white $ Scale 0.2 0.2 $ Text "Type some text")

> forM_ [780, 760..] $ \ypos -> do > forM_ [0, 20..980] $ \xpos -> do

> event <- iterateUntil keydown $ input

> let key = case event of > EventKey (Char key) Down _ _ -> key > EventKey (SpecialKey KeySpace) Down _ _ -> ' '

> when (ypos == 780 && xpos == 0) $ cls > render $ Color white $ Translate (xpos-500) (ypos-400) $ Scale 0.2 0.2 $ Text $ [key]

Here is where we launch everything, placing our program and starting Blank picture into the World.

> main = play (InWindow "Basic" (1000, 800) (10, 10))
>  black 
>  60
>  (World mainProgram Blank)
>  (^. picture)
>  handleEvent
>  (const id)

So now we need just one more ingredient, an actual interpreter for our AST. It's the event handler:

> handleEvent :: Event -> World -> World

The Return command is purely a place to graft in subtrees. It should never be interpreted.

> handleEvent _ (World (Return a) _) = error "error!"

After receiving some input, I want the interpreter to keep interpreting commands such as Cls that don't need any more input. I'm going to do this by using a null event EventMotion (0,0). But when an input really is desired, I want this null event to be ignored.

> handleEvent (EventMotion (0, 0)) state@(World (Input handler) _) = state

We render something by mappending it to the current picture stored in the World. But the rendering is carried out by the event handler. We update the state so that at the next event, the subtree of the AST is executed. This means that after updating the picture, the event still needs to be handed back to the event handler:

> handleEvent event state@(World (Render p cont) _) = state & (picture <>~ p)    & (program .~ cont) & handleEvent event

Clearing the screen is similar:

> handleEvent event state@(World (Cls cont)      _) = state & (picture .~ Blank) & (program .~ cont) & handleEvent event

And now we need to handle inputs. We do this by applying the "what happens when the input is received" function to the event. The result is put back in the state indicating that this is what we want to happen at the next event. So the interpreter doesn't stop here, waiting for the next event, the interpreter sends itself a null event.

> handleEvent event state@(World (Input handler) _) = state & (program .~ handler event) & handleEvent (EventMotion (0, 0))

And that's it!

There are many changes that can be made. We can easily add more commands and make the state more complex. But you might also notice that we create the AST only to tear it apart again in the interpreter. We can actually elide the AST creation, but that will eventually bring us back to something like what I originally posted. This shouldn't be a big surprise, I've already shown how any monad can be replaced with the continuation monad here. By the way, it's pretty easy to add a Fork command. You can replace the _program :: Basic() field with _program :: [Basic ()] and interpret this as a list of threads using a scheduler of your choice.


I was prompted to write this (a little late, I know) after reading this article and Tekmo's post on reddit. I think ultimately continuations may perform better than using ASTs. But sometimes it's nice to build an AST because they give you an object that can easily be reasoned about and manipulated by code. Much as I love trickery with continuations, I find ASTs are much easier to think about.


My real motivation was that I was thinking about games. The rules of games are often given in imperative style: first player 1 does this. Then they do this. If this happens they do that. And then it's player two's turn. I wanted my Haskell code to reflect that style.


Added 'null' event to keep interpreter going when it makes sense to do so, but there's no event pending.

Friday, October 25, 2013

Distributed computing with alien technology


Suppose we are given a function of boolean arguments that returns a boolean result. Alice has bits, and Bob has another bits . Alice and Bob are widely separated and don't know each other's bits. What is the total number of bits that Alice has to send to Bob and that Bob has to send to Alice so that between them they can compute ? Think about how complex might get. The and might each describe half of a "voxelised" region of space and might answer a question about a computational fluid dynamics (CFD) simulation running in that space. CFD simulations can be chaotic and so we might expect that in the worst case many bits have to be transferred back and forth between Alice and Bob. In the worst case we might expect that Alice has to send Bob all of her bits, or vice versa.

But in fact Alice needs to send Bob just one bit.

A loophole

To get the communication requirements down to one bit we need to use a loophole. But I hope to (1) justify the cheat to some extent and (2) justify that it's even worthwhile to think about cheats.

Alice and Bob have access to some Ancient technology. They each have one of a pair of boxes. At prearranged times, Alice puts a bit into her box, and Bob puts a bit into his box. A bit pops back out of Alice's box and a bit pops back out of Bob's box. Whatever the input, both Alice and Box have a 0.5 chance of seeing a one or zero pop out of their respective boxes. But when the two outputs are XORed together the result is the logical AND of the two inputs. With such boxes, Alice can compute after Bob sends a single bit down a conventional communication channel.

"But this is a total cheat!" you complain before I even start to explain their technique. It seems Alice receives a bit that depends on what Bob input, and so Bob is communicating with Alice. But look closely and you'll see that the boxes don't allow any communication. No matter what Bob inputs, Alice has a 0.5 chance of getting zero or one. There is no way Bob can use this to communicate anything. It's like intercepting a message encrypted with a one time pad. Without the pad, the message is basically a sequence of random bits. Nonetheless, it is true that the outputs that Alice and Bob see are correlated.

I hope I've convinced you that Alice and Bob can't send any bits with these boxes. Despite this, it is pretty clear that the behaviour of the boxes is non-local. We'll call any kind of boxes that allow instantaneous long range correlations that can't be explained by purely local behaviour non-local boxes. Boxes that can't be used for message sending are called non-signalling local boxes. And the particular non-local box I describe above is called a PR box (eg. see here).

(BTW As an aside note that as the box results in widely separated outputs that are correlated, but doesn't allow communication, it's an example of how non-locality doesn't imply communication. Usually when people want to give examples of such a thing they talk about quantum mechanics. But there's no need to mention quantum mechanics to explain the behaviour of these particular non-local boxes.)

The method

Any single bit boolean function of a finite sequence of bits can be written as a polynomial modulo 2. Each monomial in the polynomial can be written as a product of terms involing just the and terms involving just the , ie.

where depends only on the , depends only on the and is drawn from some finite set. Alice can compute the and Bob can compute the . Now Alice and Bob, in parallel, feed and respectively into their PR boxes. We know that we could evaluate each term in the sum we want by adding Alice's output to Bob's output. But that would require sending one one-bit message for each . But we don't need each term one by one; we just want the sum. So Alice and Bob can individually sum their separate outputs knowing that adding Alice's output and Bob's output modulo 2 will be the correct sum. So Bob sends his sum to Alice. Alice adds that number to her own (modulo 2) and that's the value we want. Only one one-bit message was sent.

But what about reality?

Non-local boxes don't exist, do they? So why are we talking about them?

Actually, non-local boxes exist both theoretically and in the lab. Non-local correlations in quantum mechanics allow them to be constructed. But for this article I wanted to abstract from quantum mechanics and talk about the behaviour of a non-local box without getting my hands dirty with the details of quantum mechanics. Having said that, although non-local boxes do exist, the special case of the PR box can't in fact be constructed with quantum mechanics. In some sense it allows correlations that are "too strong". An article I wrote a while back describes the closest you can get to building a PR box with quantum correlations. Curiously, if you restrict yourself to the kind of non-local box quantum mechanics allows you to build you find that some functions can still be computed with less communication than you'd need if non-local correlations are disallowed. Nonetheless, the worst case scenario with QM still requires the sending of bits.

Going further there's an interesting conjecture. It says that any non-local box that is even marginally better (in some sense) than what quantum mechanics allows is powerful enough to allow the computation of any with only a single bit of communication. It suggests that quantum mechanics is right at the edge of the space of possible physics that make life difficult for us. If quantum mechanics were to be tweaked the tiniest amount to make correlations any stronger, large numbers of difficult distributed computing problems would suddenly collapse to become trivial. If the conjecture is true it means that nature looks a bit like a conspiracy to keep computer scientists in work. (It's possible the conjecture has been decided one way or the other by now.)

Final words

There are a couple of papers about universes where PR boxes can be built; so called boxworlds. There is a lot of interesting theoretical work in characterising quantum mechanics. In particular there are a number of theorems and conjectures that describe QM in the form "the most X theory that doesn't allow Y" where X is an interesting property and Y is something you'd like to do.


I learnt all of this from the paper Implausible Consequences of Superstrong Nonlocality by Wim van Dam.

Saturday, October 12, 2013

What stops us defining Truth?

Recall the standard cartoon sketch of the proof of Gödel's first incompleteness theorem. We start by defining a predicate, , that is true if and only if its argument is provable. (Or more accurately, is true if is the Gödel number of a provable proposition.) With some quining we can use this to construct the proposition which says . The proposition asserts its own unprovability.
Suppose instead we define a predicate which holds if its argument is true. We can use this to construct the proposition which says . Then if is true it must also be false and if it's false then it must be true. We seem to have a paradox. The loophole is that we assumed the existence of the predicate . So this argument demonstrates that there is actually no such predicate. This is Tarski's undefinability theorem.
But what exactly stops us defining ? What goes wrong if we attempt to define a predicate that analyses the parts of a proposition to tell us whether or not it is true?

This article is written in English. But as is standard in much of mathematics, unless I state otherwise, I'm using English largely as shorthand for an argument that could, in principle, be written in the formal language of Set Theory. So I will allow myself to use all of the usual reasoning methods that are available in ZF, even when talking about other formal systems such as Peano Arithmetic.

Defining Truth for Propositional Calculus
Suppose we're given a proposition from propositional calculus like . We can use a syntactic approach to determining whether or not it is true. We determine whether or not is true, then whether or not is true, and then the whole proposition is true if both and are true. Similarly is true if either or is true. Of course and might themselves be compound propositions using , and . But that's fine, that simply means that to define truth for such propositions we need to employ recursion. In fact, we can straightforwardly turn such a definition into a recursive computer program.
(Ultimately with propositional calculus we hit the leaves which are atomic propositions like . Typically when we ask about the truth of a proposition in propositional calculus we've already made an assignment of truth values to the atomic propositions. So the base case for the recursion is straightforward.)
We can illustrate the process with a diagram:

The truth value of a node in the tree is determined by the truth of the propositions hanging underneath it. We have a parent-child relation between a proposition and its subexpressions. Recursion allows us to make a definition by defining what happens on the leaves of such a tree, and by saying how the definition at a node is built from that of its children.

Defining truth for Peano Arithmetic
We can go further and attempt this approach with Peano Arithmetic (PA). The catch is that we need to consider quantifiers. For example, consider this proposition from Peano arithmetic: . This proposition is true if and only if is true whatever number we substitute for in the expression.

The proposition at the top of the tree above is true if all of the immediate children are true and their truth is in turn determined by the truth of the propositions immediately below them. With some work this eventually leads to a perfectly good definition of truth for propositions in PA. Because we have nodes with infinitely many children we don't get an algorithm guaranteed to terminate, but that's not a problem for a definition in ZF. Note that we don't literally prove the infinitely many child propositions one at a time. Instead what happens is that to define the truth of we define it in terms of the truth of some infinite family of propositions all based on . ZF is perfectly good at dealing with such definitions without us having to list every element of our family explicitly.
Note how in this case the tree isn't the parse tree of the proposition. It's much bigger with nodes that have infinite branching. But that's fine, there's nothing about infinite branching that prevents us making a recursive definition. So we can ultimately extend the idea for defining truth in propositional calculus to include quanifiers and then all of Peano arithmetic.

Defining truth for ZF
But the approach used for PA looks like it might work perfectly well for ZF as well. For example, our definition of truth would say that is true if is true whatever set we substitute for . In ZF there is no difficulty in defining a predicate that uses quantification over all sets. So it seems we can define for ZF in ZF, contradicting Tarski's theorem.

What went wrong?
Recursive definitions typically rely on the parent-child relation I mentioned above. To recursively define something we (1) define it for all leaves and then (2) specify how the definition at a parent is given in terms of the value for all of its children. We then invoke a recursion theorem of some sort to show how this uniquely defines our object for everything in our universe. For example, one form of recursion in Peano arithmetic has as its leaf and the only child of is . The induction axiom for PA can be used to show that definitions using this parent-child relation are valid.
Similarly in ZF we have the empty set as leaf and the children of a set are simply its elements. But now we need to look closely at the recursion principle we need. For ZF we need to invoke the Transfinite Recursion Theorem. Transfinite recursion is very powerful. It's not just limited to induction over sets. It can also be used for induction over classes. For example if you need to recursively define a function on the class of all sets it can allow this. (Strictly speaking it'll be a function class rather than a function.) But now comes the catch. If you take a look at the Wikipedia article it mentions that the parent-child relation, , needs to be set-like (though as the article is currently written it's almost an afterthought). For this theorem to apply we need the collection of children of a proposition to form a set. But to prove the truth of a proposition with a quantifier at the front we need to prove something is true for all children where there is one child for each set. This means the children don't form a set. So we can't use transfinite recursion. And this means the informal definition of truth I gave above can't be turned into a rigorous definition.

I think this issue is quite subtle. It's really easy to say in English "this thing is true if that thing is true for all sets". Such a sentence in isolation can often be turned into a rigorous proposition in ZF. But if that sentence is part of a collection of sentences that refer to each other forming an attempt at a mutually recursive definition, you need to check precisely what parent-child relation you're using.

Thanks to Sridar Ramesh for making clear to me why the attempted definition of truth in ZF doesn't work. But I've probably made some mistakes above and they have nothing to do with Sridar.

Blog Archive