A Neighborhood of Infinity

Saturday, August 06, 2016

Dimensionful Matrices


Programming languages and libraries for numerical work tend not to place a lot of emphasis on the types of their data. For example Matlab, R, Octave, Fortran, and Numpy (but not the now defunct Fortress) all tend to treat their data as plain numbers meaning that any time you have a temperature and a mass, say, there is nothing to prevent you adding them.

I've been wondering how much dimensions (in the sense of dimensional analysis) and units could help with numerical programming. As I pointed out on G+ recently (which is where I post shorter stuff these days), you don't have to limit dimensions to the standard ones of length, mass, time, dollars and so on. Any scale invariance in the equations you're working with can be exploited as a dimension giving you a property that can be statically checked by a compiler.

There are quite a few libraries to statically check dimensions and units now. For example Boost.Units for C++, units for Haskell and even quantities for Idris.

A matrix that breaks things

Even if a language supports dimensions, it's typical to define objects like vectors and matrices as homogeneous containers of quantities. But have a look at the Wikipedia page on the metric tensor. There is a matrix

which has the curious property that 3 entries on the diagonal seem to be dimensionless while the first entry is a squared velocity with dimension . This will break many libraries that support units. An obvious workaround is to switch to use natural units, which is much the same as abandoning the usefulness of dimensions. But there's another way, even if it may be tricky to set up with existing languages.

Heterogeneous vectors and matrices

According to a common convention in physics, a 4-vector has dimensions where I'm using the convention that we can represent the units of a vector or matrix simply as a vector or matrix of dimensions, and here is time and is length. The metric tensor is used like this: (where I'm using the Einstein summation convention so the 's and 's are summed over). If we think of having units of length squared (it is a pseudo-Riemannian metric after all) then it makes sense to think of having dimensions given by

We can write this more succinctly as

where is the usual outer product.

I'll use the notation to mean is of type . So, for example, . I'll also use pointwise notation for types such as and .

Now I can give some general rules. If is a matrix, and are vectors, and is a scalar, then only makes sense if . Similarly the "inner product" only makes sense if .

Generic vectors and matrices

Although these kinds of types might be useful if you're dealing with the kind of heterogeneous matrices that appear in relativity, there's another reason they might be useful. If you write code (in the imaginary language that supports these structures and understands dimensions and units) to be as generic as possible in the types of the vector and matrix entries, failures to type check will point out parts of the code where there are hidden assumptions, or even errors, about scaling. For example, consider a routine to find the inverse of a 3 by 3 matrix. Writing this generically as possible means we should write it to operate on a matrix of type , say. The result should have type . If this type checks when used with a suitably powerful type checker then it means that if we replace the units for type A, say, with units twice as large, it should have no effect on the result, taking into account those units. In this case, it means that if we multiply the numbers of the first row of the input by 0.5 then the numbers of the first column of the output should get multiplied by 2. In fact this is a basic property of matrix inverses. In other words, this mathematical property of matrix inverses is guaranteed by a type system that can handle units and heterogeneous matrices. It would be impossible to write a matrix inverter that type checks and fails to have this property. Unfortunately it's still possible to write a matrix inverter that type checks and is incorrect some other way. Nonetheless this kind of type system would put a very big constraint on the code and is likely to eliminate many sources of error.

An example, briefly sketched

I thought I'd look at an actual example of a matrix inverter to see what would happen if I used a type checker like the one I've described. I looked at the conjugate gradient method. At the Wikipedia page, note the line

This would immediately fail to type check because if is of generic vector type then isn't the same type as so they can't be added. I won't go into any of the details but the easiest way to patch up this code to make it type check is to introduce a new matrix of type and besides using it to make this inner product work (replacing the numerator by ) we also use anywhere in the code we need to convert a vector of type to a vector of type . If you try to do this as sparingly as possible you'll end up with a modified algorithm. But at first this seems weird. Why should this matrix inverse routine rely on someone passing in a second matrix to make it type check? And what is this new algorithm anyway? Well scroll down the Wikipedia page and you get to the preconditioned conjugate gradient algorithm. The extra matrix we need to pass in is the preconditioner. This second algorithm would type check. Preconditioned conjugate gradient, with a suitable preconditioner, generally performs better than pure conjugate gradient. So in this case we're getting slightly more than a check on our code's correctness. The type checker for our imaginary language would give a hint on how to make the code perform better. There's a reason for this. The original conjugate gradient algorithm is implicitly making a choice of units that sets scales along the axes. These determine the course taken by the algorithm. It's not at all clear that picking these scalings randomly (which is in effect what you're doing if you throw a random problem at the algorithm) is any good. It's better to pick a preconditioner adapted to the scale of the problem and the type checker is hinting (or would be if it existed) that you need to do this. Compare with the gradient descent algorithm whose scaling problems are better known.

But which language?

I guess both Agda and Idris could be made to implement what I've described. However, I've a hunch it might not be easy to use in practice.

Friday, May 23, 2014

Cofree meets Free

> {-# LANGUAGE RankNTypes, MultiParamTypeClasses, TypeOperators #-}


After I spoke at BayHac 2014 about free monads I was asked about cofree comonads. So this is intended as a sequel to that talk. Not only am I going to try to explain what cofree comonads are. I'm also going to point out a very close relationship between cofree comonads and free monads.

At the beginning of the talk the Google Hangout software seems to have switched to the laptop camera so you can't see the slides in the video. However the slides are here.

Cothings as machines

I often think of coalgebraic things as machines. They have some internal state and you can press buttons to change that internal state. For example here is a type class for a machine with two buttons that's related to a magma:

> class TwoButton a where
>   press :: a -> (a, a)

The idea is that the state of the machine is given by some type a and you could press either the left button or the right button. The result of pressing one or other button is given by these two functions:

> pressLeft, pressRight :: TwoButton a => a -> a
> pressLeft = fst . press
> pressRight = snd . press

(As with many metaphors used to explain Haskell type classes your mileage may vary. Sometimes you'll have to stretch your imagination to see what the set of buttons is for a particular cothing.)


Just as monads are a kind of generalised algebraic structure (for example see my talk), comonads are a generalised kind of machine. The idea is that for any state of the machine there is a bunch of buttons we could press. But we don't have two buttons, or any fixed number of buttons. We instead have a functorful of buttons (if you think of functors by analogy with containers). We also don't get to directly see the internal state of the machine but instead we get to make observations.

Here's the type class:

> class Comonad w where
>   extract :: w a -> a
>   duplicate :: w a -> w (w a)

The state of the machine is given by w a. We observe the state using the extract function. And when we come to press a button, we have a functorful of new states that it could end up in. The duplicate function gives the container of those new states.

For example, various kinds of zipper give rise to comonads. Zippers allow you to "focus" on a part of a data structure. The extract operation allows you to observe the point that currently has focus. There is one button for every position in the structure where the focus could be. Pressing the corresponding button moves the focus to that point. Similarly the Store comonad has one button for each value you can store in the field it represents. Press the button and the value gets stored in the field.

Cofreeness as a way to memoise

Cofree coalgebras can be thought of as memoised forms of elements of coalgebras. For example, the TwoButton machine above has a function, press, as part of its definition. Memoising an element of such a thing means tabulating everything that could possibly happen if you pressed the buttons so we no longer need the press function. One approach is to try something like this:

data CofreeTwoButton = Memo CofreeTwoButton CofreeTwoButton

The structure contains two CofreeTwoButtons, each giving the result of pressing one of the two buttons. Any element of CofreeTwoButton may now be memoised like so:

memoiseTwoButton :: TwoButton m => m -> CofreeTwoButton
memoiseTwoButton m = Memo (memoiseTwoButton (pressLeft m)) (memoiseTwoButton (pressRight m))

It definitely tabulates the result of pressing buttons. But it has a major flaw. We have no way of seeing what's stored in the table! To make this useful we want to also store some data in the table that we can peek at. So here is a better definition:

> data CofreeTwoButton a = Memo a (CofreeTwoButton a) (CofreeTwoButton a)
> memoiseTwoButton :: TwoButton m => (m -> a) -> m -> CofreeTwoButton a
> memoiseTwoButton f m = Memo (f m) (memoiseTwoButton f (pressLeft m)) (memoiseTwoButton f (pressRight m))

The first argument to memoiseTwoButton says what we want to store in the table and then memoiseTwoButton goes ahead and stores it. We can use the identity function if we want to store the original elements.

Note how this is like foldMap:

foldMap :: Monoid m => (a -> m) -> t a -> m

if we replace t by the list functor and remember that lists are free monoids. The main difference is that arrows have been reversed. Where foldMap takes an element of a free monoid and interprets it as an element of another monoid, memoiseTwoButton packs an element of a TwoButton into a cofree structure. The "interpretation" and "packing" here are both homomorphisms for their respective structures. Homomorphisms respect equations so if an equation holds between elements of a free monoid we expect it to also hold when interpreted in another monoid. But any element of a free monoid can be interpreted in any other monoid meaning that any equation that holds between elements of a free monoid must hold in any monoid. That's why free monoids are designed so that the only equations that hold between elements are those that follow from the monoid laws.

With the TwoButton we have a dualised version of the above. Every element of every TwoButton can be packed into the CofreeTwoButton. So every equation in the original structure will still hold after the packing. So every equation that holds in some TwoButton must have some solution in CofreeTwoButton. That gives an idea of what a CofreeTwoButton is by analogy with the free monoid.

Cofree comonads

A cofree comonad is basically a memoised comonad. So the data structure is:

> data Cofree f a = Cofree a (f (Cofree f a))

At each point in the "table" we store some observable value of type a. And we have a functorful of buttons, so we expect to have a functorful of new states we could transition to. The Functor instance looks like:

> instance Functor f => Functor (Cofree f) where
>   fmap f (Cofree a fs) = Cofree (f a) (fmap (fmap f) fs)

We apply f to the observable value and then push the fmap f down to the child nodes.

The duplicate function takes a memoised state and replaces the observable stored at each position with the memoised state that gives rise to the observable.

> instance Functor f => Comonad (Cofree f) where
>   extract (Cofree a _) = a
>   duplicate c@(Cofree _ fs) = Cofree c (fmap duplicate fs)

Now by analogy with memoiseTwoButton we can memoise comonads.

> memoiseComonad :: (Comonad w, Functor f) =>
>                   (forall x.w x -> f x) -> (forall x.w x -> Cofree f x)
> memoiseComonad f w = Cofree (extract w) (fmap (memoiseComonad f) (f (duplicate w)))

So that's what a cofree comonad is: it's a type that can be used to memoise all of the states that are accessible from a state in a comonad by pressing its buttons.

Cofree comonad meets free monad

But that's not all. There is a close relationship between cofree comonads and free monads. So to get going, here's a free monad type:

> data Free f a = Id a | Free (f (Free f a))

> join' :: Functor f => Free f (Free f a) -> Free f a > join' (Id x) = x > join' (Free fa) = Free (fmap join' fa)

> instance Functor f => Functor (Free f) where > fmap f (Id x) = Id (f x) > fmap f (Free fa) = Free (fmap (fmap f) fa)

> instance Functor f => Monad (Free f) where > return = Id > m >>= f = join' (fmap f m)

Now I'll define a kind of pairing between functors. Given a way to combine two kinds of element, the pairing gives a way to combine a pair of containers of those elements.

> class (Functor f, Functor g) => Pairing f g where
>   pair :: (a -> b -> r) -> f a -> g b -> r

> data Identity a = Identity a > instance Functor Identity where > fmap f (Identity x) = Identity (f x)

> instance Pairing Identity Identity where > pair f (Identity a) (Identity b) = f a b

> data (f :+: g) x = LeftF (f x) | RightF (g x) > instance (Functor f, Functor g) => Functor (f :+: g) where > fmap f (LeftF x) = LeftF (fmap f x) > fmap f (RightF x) = RightF (fmap f x)

> data (f :*: g) x = f x :*: g x > instance (Functor f, Functor g) => Functor (f :*: g) where > fmap f (x :*: y) = fmap f x :*: fmap f y

> instance (Pairing f f', Pairing g g') => Pairing (f :+: g) (f' :*: g') where > pair p (LeftF x) (a :*: _) = pair p x a > pair p (RightF x) (_ :*: b) = pair p x b

> instance (Pairing f f', Pairing g g') => Pairing (f :*: g) (f' :+: g') where > pair p (a :*: _) (LeftF x) = pair p a x > pair p (_ :*: b) (RightF x) = pair p b x

> instance Pairing ((->) a) ((,) a) where > pair p f = uncurry (p . f)

Given a pairing between f and g we get one between Cofree f and Free g.

> instance Pairing f g => Pairing (Cofree f) (Free g) where
>   pair p (Cofree a _) (Id x) = p a x
>   pair p (Cofree _ fs) (Free gs) = pair (pair p) fs gs

An element of Free g can be thought of as an expression written in a DSL. So this pairing gives a way to apply a monadic expression to a memoised comonad. In other words, if you think of comonads as machines, monads give a language that can be used to compute something based on the output of the machine.

Here's an almost trivial example just so you can see everything working together. A reasonable definition of a comagma structure on the type a is a -> UpDown a with UpDown defined as:

> data UpDown a = Up a | Down a

> instance Functor UpDown where > fmap f (Up a) = Up (f a) > fmap f (Down a) = Down (f a)

> type CofreeComagma a = Cofree UpDown a

A well known comagma structure on the positive integers is given by the famous Collatz conjecture:

> collatz :: Integer -> UpDown Integer
> collatz n = if even n then Down (n `div` 2) else Up (3*n+1)

We can memoise this as a cofree comonad:

> memoisedCollatz :: Integer -> CofreeComagma Integer
> memoisedCollatz n = Cofree n (fmap memoisedCollatz (collatz n))

Here's a picture of memoisedCollatz 12:

Now let's make the dual functor in readiness for building the dual monad:

> data Two a = Two a a
> instance Functor Two where
>   fmap f (Two a b) = Two (f a) (f b)

And here we set up a pairing:

> instance Pairing UpDown Two where
>   pair f (Up a) (Two b _) = f a b
>   pair f (Down a) (Two _ c) = f a c

> execute :: Cofree UpDown x -> Free Two (x -> r) -> r > execute w m = pair (flip ($)) w m

This gives rise to a free monad isomorphic to the one in my talk:

> data Direction = WentUp | WentDown deriving Show

> choose :: Free Two Direction > choose = Free (Two (return WentUp) (return WentDown))

And here's an example of some code written in the corresponding DSL:

> ex1 :: Free Two (Integer -> String)
> ex1 = do
>   x <- choose
>   y <- choose
>   case (x, y) of
>       (WentDown, WentDown) -> return (\z -> "Decreased twice " ++ show z)
>       _ -> return show

It can be represented as:

And here's what happens when they meet:

> go1 :: String
> go1 = execute (memoisedCollatz 12) ex1

This can be understood through the combined picture:


On getting monads from comonads more generally see Monads from Comonads. For more on memoising and how it's really all about the Yoneda lemma see Memoizing Polymorphic Functions. I'm waiting for Tom Leinster to publish some related work. The pairing above gives a way for elements of free monads to pick out elements of cofree comonads and is a special case of what I'm talking about here. But I think Tom has some unpublished work that goes further.

If you think of a comonad as a compressed object that is decompressed by a monadic decision tree, then you'd expect some form of information theoretical description to apply. That makes me think of Convex spaces and an operadic approach to entropy.

Saturday, May 17, 2014

Types, and two approaches to problem solving


There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. The latter can be harder to recognise but there is one example that just about everyone has known since infancy.


Consider sorting algorithms. A large class of sorting algorithms, including quicksort, break a sequence of values into two pieces. The two pieces are smaller so they are easier to sort. We sort those pieces and then combine them, using some kind of merge operation, to give an ordered version of the original sequence. Breaking things down into subproblems is ubiquitous and is useful far outside of mathematics and computing: in cooking, in finding our path from A to B, in learning the contents of a book. So I don’t need to say much more here.

Quotient problems

The term quotient is a technical term from mathematics. But I want to use the term loosely to mean something like this: a quotient problem is what a problem looks like if you wear a certain kind of filter over your eyes. The filter hides some aspect of the problem that simplifies it. You solve the simplified problem and then take off the filter. You now ‘lift’ the solution of the simplified problem to a solution to the full problem. The catch is that your filter needs to match your problem so I’ll start by giving an example where the filter doesn’t work.

Suppose we want to add a list of integers, say: 123, 423, 934, 114. We can try simplifying this problem by wearing a filter that makes numbers fuzzy so we can’t distinguish numbers that differ by less than 10. When we wear this filter 123 looks like 120, 423 looks like 420, 934 looks like 930 and 114 looks like 110. So we can try adding 120+420+930+110. This is a simplified problem and in fact this is a common technique to get approximate answers via mental arithmetic. We get 1580. We might hope that when wearing our filters, 1580 looks like the correct answer. But it doesn’t. The correct answer is 1594. This filter doesn’t respect addition in the sense that if a looks like a’ and b looks like b’ it doesn’t follow that a+b looks like a’+b’.

To solve a problem via quotient problems we usually need to find a filter that does respect the original problem. So let’s wear a different filter that allows us just to see the last digit of a number. Our original problem now looks like summing the list 3, 3, 4, 4. We get 4. This is the correct last digit. If we now try a filter that allows us to see just the last two digits we see that summing 23, 23, 34, 14 does in fact give the correct last two digits. This is why the standard elementary school algorithms for addition and multiplication work through the digits from right to left: at each stage we’re solving a quotient problem but the filter only respects the original problem if it allows us to see the digits to the right of some point, not digits to the left. This filter does respect addition in the sense that if a looks like a’ and b looks like b’ then a+b looks like a’+b’.

Another example of the quotient approach is to look at the knight’s tour problem in the case where two opposite corners have been removed from the chessboard. A knight’s tour is a sequence of knight’s moves that visit each square on a board exactly once. If we remove opposite corners of the chessboard, there is no knight’s tour of the remaining 62 squares. How can we prove this? If you don’t see the trick you can get get caught up in all kinds of complicated reasoning. So now put on a filter that removes your ability to see the spatial relationships between the squares so you can only see the colours of the squares. This respects the original problem in the sense that a knight’s move goes from a black square to a white square, or from a white square to a black square. The filter doesn’t stop us seeing this. But now it’s easier to see that there are two more squares of one colour than the other and so no knight’s tour is possible. We didn’t need to be able to see the spatial relationships at all.

(Note that this is the same trick as we use for arithmetic, though it’s not immediately obvious. If we think of the spatial position of a square as being given by a pair of integers (x, y), then the colour is given by x+y modulo 2. In other words, by the last digit of x+y written in binary. So it’s just the see-only-digits-on-the-right filter at work again.)

Wearing filters while programming

So now think about developing some code in a dynamic language like Python. Suppose we execute the line:

a = 1

The Python interpreter doesn’t just store the integer 1 somewhere in memory. It also stores a tag indicating that the data is to be interpreted as an integer. When you come to execute the line:

b = a+1

it will first examine the tag in a indicating its type, in this case int, and use that to determine what the type for b should be.

Now suppose we wear a filter that allows us to see the tag indicating the type of some data, but not the data itself. Can we still reason about what our program does?

In many cases we can. For example we can, in principle, deduce the type of


if we know the types of a, b, c, d. (As I’ve said once before, it’s hard to make any reliable statement about a bit of Python code so let's suppose that a, b, c and d are all either of type int or type float.) We can read and understand quite a bit of Python code wearing this filter. But it’s easy to go wrong. For example consider

if a>1 then:
return 1.0
return 1

The type of the result depends on the value of the variable a. So if we’re wearing the filter that hides the data, then we can’t predict what this snippet of code does. When we run it, it might return an int sometimes and a float other times, and we won’t be able to see what made the difference.

In a statically typed language you can predict the type of an expression knowing the type of its parts. This means you can reason reliably about code while wearing the hide-the-value filter. This means that almost any programming problem can be split into two parts: a quotient problem where you forget about the values, and then problem of lifting a solution to the quotient problem to a solution to the full problem. Or to put that in more conventional language: designing your data and function types, and then implementing the code that fits those types.

I chose to make the contrast between dynamic and static languages just to make the ideas clear but actually you can happily use similar reasoning for both types of language. Compilers for statically typed languages, give you a lot of assistance if you choose to solve your programming problems this way.

A good example of this at work is given in Haskell. If you're writing a compiler, say, you might want to represent a piece of code as an abstract syntax tree, and implement algorithms that recurse through the tree. In Haskell the type system is strong enough that once you’ve defined the tree type the form of the recursion algorithms is often more or less given. In fact, it can be tricky to implement tree recursion incorrectly and have the code compile without errors. Solving the quotient problem of getting the types right gets you much of the way towards solving the full problem.

And that’s my main point: types aren’t simply a restriction mechanism to help you avoid making mistakes. Instead they are a way to reduce some complex programming problems to simpler ones. But the simpler problem isn’t a subproblem, it’s a quotient problem.

Dependent types

Dependently typed languages give you even more flexibility with what filters you wear. They allow you to mix up values and types. For example both C++ and Agda (to pick an unlikely pair) allow you to wear filters that hide the values of elements in your arrays while allowing you to see the length of your arrays. This makes it easier to concentrate on some aspects of your problem while completely ignoring others.


I wrote the first draft of this a couple of years ago but never published it. I was motivated to post by a discussion kicked off by Voevodsky on the TYPES mailing list http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html

This article isn’t a piece of rigorous mathematics and I’m using mathematical terms as analogies.

The notion of a subproblem isn’t completely distinct from a quotient problem. Some problems are both, and in fact some problems can be solved by transforming them so they become both.

More generally, looking at computer programs through different filters is one approach to abstract interpretation http://en.wikipedia.org/wiki/Abstract_interpretation. The intuition section there (http://en.wikipedia.org/wiki/Abstract_interpretation#Intuition) has much in common with what I’m saying.

Friday, April 25, 2014

The Monad called Free


As Dan Doel points out here, the gadget Free that turns a functor into a monad is itself a kind of monad, though not the usual kind of monad we find in Haskell. I'll call it a higher order monad and you can find a type class corresponding to this in various places including an old version of Ed Kmett's category-extras. I'll borrow some code from there. I hunted around and couldn't find an implementation of Free as an instance of this class so I thought I'd plug the gap.

> {-# LANGUAGE RankNTypes, FlexibleContexts, InstanceSigs, ScopedTypeVariables #-}

> import Control.Monad > import Data.Monoid

To make things unambiguous I'll implement free monads in the usual way here:

> data Free f a = Pure a | Free (f (Free f a))

> instance Functor f => Functor (Free f) where > fmap f (Pure a) = Pure (f a) > fmap f (Free a) = Free (fmap (fmap f) a)

> instance Functor f => Monad (Free f) where > return = Pure > Pure a >>= f = f a > Free a >>= f = Free (fmap (>>= f) a)

The usual Haskell typeclass Monad corresponds to monads in the category of types and functions, Hask. We're going to want monads in the category of endomorphisms of Hask which I'll call Endo.

The objects in Endo correspond to Haskell's Functor. The arrows in Endo are the natural transformations between these functors:

> type Natural f g = (Functor f, Functor g) => forall a. f a -> g a

So now we are led to consider functors in Endo.

> class HFunctor f where

A functor in Endo must map functors in Hask to functors in Hask. So if f is a functor in Endo and g is a functor in Hask, then f g must be another functor in Hask. So there must be an fmap associated with this new functor. There's an associated fmap for every g and we collect them all into one big happy natural family:

>     ffmap :: Functor g => (a -> b) -> f g a -> f g b

But note also that by virtue of being a functor itself, f must have its own fmap type function associated with it. The arrows in Endo are natural transformations in Hask so the fmap for HFunctor must take arrows in Endo to arrows in Endo like so:

>     hfmap :: (Functor g, Functor h) => Natural g h -> Natural (f g) (f h)

Many constructions in the category Hask carry over to Endo. In Hask we can form a product of type types a and b as (a, b). In Endo we form the product of two functors f and g as

> data Product f g a = Product (f (g a))

Note that this product isn't commutative. We don't necessarily have an isomorphism from Product f g to Product g f. (This breaks many attempts to transfer constructions from Hask to Endo.) We also won't explicitly use Product because we can simply use the usual Haskell composition of functors inline.

We can implement some functions that act on product types in both senses of the word "product":

> left :: (a -> c) -> (a, b) -> (c, b)
> left f (a, b) = (f a, b)

> right :: (b -> c) -> (a, b) -> (a, c) > right f (a, b) = (a, f b)

> hleft :: (Functor a, Functor b, Functor c) => Natural a c -> a (b x) -> c (b x) > hleft f = f

> hright :: (Functor a, Functor b, Functor c) => Natural b c -> a (b x) -> a (c x) > hright f = fmap f

(Compare with what I wrote here.)

We have something in Endo a bit like the type with one element in Hask, namely the identity functor. The product of a type a with the one element type in Hask gives you something isomorphic to a. In Endo the product is composition for which the identity functor is the identity. (Two different meanings of the word "identity" there.)

We also have sums. For example, if we define a functor like so

> data F a = A a | B a a

we can think of F as a sum of two functors: one with a single constructor A and another with constructor B.

We can now think about reproducing an Endo flavoured version of lists. The usual definition is isomorphic to:

> data List a = Nil | Cons a (List a)

And it has a Monoid instance:

> instance Monoid (List a) where
>   mempty = Nil
>   mappend Nil as = as
>   mappend (Cons a as) bs = Cons a (mappend as bs)

We can try to translate that into Endo. The Nil part can be thought of as being an element of a type with one element so it should become the identity functor. The Cons a (List a) part is a product of a and List a so that should get replaced by a composition. So we expect to see something vaguely like:

List' a = Nil' | Cons' (a (List' a))

That's not quite right because List' a is a functor, not a type, and so acts on types. So a better definition would be:

List' a b = Nil' b | Cons' (a (List' a b))

That's just the definition of Free. So free monads are lists in Endo. As everyone knows :-) monads are just monoids in the category of endofunctors. Free monads are also just free monoids in the category of endofunctors.

So now we can expect many constructions associated with monoids and lists to carry over to monads and free monads.

An obvious one is the generalization of the singleton map a -> List a:

> singleton :: a -> List a
> singleton a = Cons a Nil

> hsingleton :: Natural f (Free f) > hsingleton f = Free (fmap Pure f)

Another is the generalization of foldMap. This can be found under a variety of names in the various free monad libraries out there but this implementation is designed to highlight the similarity between monoids and monads:

> foldMap :: Monoid m => (a -> m) -> List a -> m
> foldMap _ Nil = mempty
> foldMap f (Cons a as) = uncurry mappend $ left f $ right (foldMap f) (a, as)

> fold :: Monoid m => List m -> m > fold = foldMap id

> hFoldMap :: (Functor f, Functor m, Monad m) => Natural f m -> Natural (Free f) m > hFoldMap _ (Pure x) = return x > hFoldMap f (Free x) = join $ hleft f $ hright (hFoldMap f) x

> hFold :: Monad f => Natural (Free f) f > hFold = hFoldMap id

The similarity here isn't simply formal. If you think of a list as a sequence of instructions then foldMap interprets the sequence of instructions like a computer program. Similarly hFoldMap can be used to interpret programs for which the free monad provides an abstract syntax tree.

You'll find some of these functions here by different names.

Now we can consider Free. It's easy to show this is a HFunctor by copying a suitable definition for List:

> instance Functor List where
>   fmap f = foldMap (singleton . f)

> instance HFunctor Free where > ffmap = fmap > hfmap f = hFoldMap (hsingleton . f)

We can define HMonad as follows:

> class HMonad m where
>     hreturn :: Functor f => f a -> m f a
>     hbind :: (Functor f, Functor g) => m f a -> Natural f (m g) -> m g a

Before making Free an instance, let's look at how we'd make List an instance of Monad

> instance Monad List where
>     return = singleton
>     m >>= f = fold (fmap f m)

And now the instance I promised at the beginning.

> instance HMonad Free where
>     hreturn = hsingleton
>     hbind m f = hFold (hfmap f m)

I've skipped the proofs that the monad laws hold and that hreturn and hbind are actually natural transformations in Endo. Maybe I'll leave those as exercises for the reader.


After writing this I tried googling for "instance HMonad Free" and I found this by haasn. There's some other good stuff in there too.

Saturday, February 01, 2014

Reinversion Revisited


A while back I talked about the idea of reinversion of control using the continuation monad to wrest control back from an interface that only wants to call you, but doesn't want you to call them back. I want to return to that problem with a slightly different solution. The idea is that we build an interpreter for an imperative language that's an embedded Haskell DSL. You arrange that the DSL does the work of waiting to be called by the interface, but from the point of view of the user of the DSL it looks like you're calling the shots. To do this I'm going to pull together a bunch of techniques I've talked about before. This approach is largely an application of what apfelmus described here.

The code

We'll start with some administrative stuff before getting down to the real code:

> {-# LANGUAGE TemplateHaskell #-}

> import Control.Lens > import Control.Monad > import Control.Monad.Loops

We'll make our DSL an imperative wrapper around Gloss:

> import Graphics.Gloss.Interface.Pure.Game

We'll define a structure that can be used to represent the abstract syntax tree (AST) of our DSL. Our DSL will support the reading of inputs, adding pictures to the current picture, and clearing the screen.

First we'll need a wrapper that allows us to represent ordinary Haskell values in our DSL:

> data Basic a = Return a

Now we want an expression that represents events given to us by Gloss. Internally we'll represent this by a function that says what our program does if it's given an event. It says what our program does by returning another AST saying what happens when the input is received. (I've previously talked about these kinds of expression trees here).

>              | Input (Event -> Basic a)

We have a command to render some graphics. It appends a new Picture to the current picture. Again, part of the AST muct be another AST saying what happens after the picture is rendered:

>              | Render Picture (Basic a)

And lastly here's the AST for a clear screen command:

>              | Cls (Basic a)

Our AST will form a monad. This will allow us to build ASTs using ordinary Haskell do-notation. This technique is what I described previously here.

> instance Monad Basic where
>     return = Return
>     Return a >>= f = f a
>     Input handler >>= f = Input (\e -> handler e >>= f)
>     Render p a >>= f = Render p (a >>= f)
>     Cls a >>= f = Cls (a >>= f)

You can think of the expression x >>= f as x with the tree f a grafted in to replace any occurrence of Return a in it. This is exactly what Return a >>= f does. But applying >>= f to the other ASTs simply digs down "inside" the ASTs to find other occurrences of Return a.

It's convenient to uses lenses to view Gloss's game world:

> data World = World { _program :: Basic (), _picture :: Picture }
> $(makeLenses ''World)

And now we have some wrappers around the interpreter's commands. The return () provides the convenient place where we can graft subtrees into our AST.

> input = Input return
> render p = Render p (return ())
> cls = Cls (return ())

Now we can start coding. Here's a test to see if a Gloss event is a key down event:

> keydown (EventKey (Char key)            Down _ _) = True
> keydown (EventKey (SpecialKey KeySpace) Down _ _) = True
> keydown _ = False

And now here's a complete program using our DSL. It's deliberately very imperative. It simply iterates over a nested pair of loops, collecting keystrokes and displaying them. It reads a lot like an ordinary program written in a language like Python or Basic:

> mainProgram = do
>     render (Color white $ Scale 0.2 0.2 $ Text "Type some text")

> forM_ [780, 760..] $ \ypos -> do > forM_ [0, 20..980] $ \xpos -> do

> event <- iterateUntil keydown $ input

> let key = case event of > EventKey (Char key) Down _ _ -> key > EventKey (SpecialKey KeySpace) Down _ _ -> ' '

> when (ypos == 780 && xpos == 0) $ cls > render $ Color white $ Translate (xpos-500) (ypos-400) $ Scale 0.2 0.2 $ Text $ [key]

Here is where we launch everything, placing our program and starting Blank picture into the World.

> main = play (InWindow "Basic" (1000, 800) (10, 10))
>  black 
>  60
>  (World mainProgram Blank)
>  (^. picture)
>  handleEvent
>  (const id)

So now we need just one more ingredient, an actual interpreter for our AST. It's the event handler:

> handleEvent :: Event -> World -> World

The Return command is purely a place to graft in subtrees. It should never be interpreted.

> handleEvent _ (World (Return a) _) = error "error!"

After receiving some input, I want the interpreter to keep interpreting commands such as Cls that don't need any more input. I'm going to do this by using a null event EventMotion (0,0). But when an input really is desired, I want this null event to be ignored.

> handleEvent (EventMotion (0, 0)) state@(World (Input handler) _) = state

We render something by mappending it to the current picture stored in the World. But the rendering is carried out by the event handler. We update the state so that at the next event, the subtree of the AST is executed. This means that after updating the picture, the event still needs to be handed back to the event handler:

> handleEvent event state@(World (Render p cont) _) = state & (picture <>~ p)    & (program .~ cont) & handleEvent event

Clearing the screen is similar:

> handleEvent event state@(World (Cls cont)      _) = state & (picture .~ Blank) & (program .~ cont) & handleEvent event

And now we need to handle inputs. We do this by applying the "what happens when the input is received" function to the event. The result is put back in the state indicating that this is what we want to happen at the next event. So the interpreter doesn't stop here, waiting for the next event, the interpreter sends itself a null event.

> handleEvent event state@(World (Input handler) _) = state & (program .~ handler event) & handleEvent (EventMotion (0, 0))

And that's it!

There are many changes that can be made. We can easily add more commands and make the state more complex. But you might also notice that we create the AST only to tear it apart again in the interpreter. We can actually elide the AST creation, but that will eventually bring us back to something like what I originally posted. This shouldn't be a big surprise, I've already shown how any monad can be replaced with the continuation monad here. By the way, it's pretty easy to add a Fork command. You can replace the _program :: Basic() field with _program :: [Basic ()] and interpret this as a list of threads using a scheduler of your choice.


I was prompted to write this (a little late, I know) after reading this article and Tekmo's post on reddit. I think ultimately continuations may perform better than using ASTs. But sometimes it's nice to build an AST because they give you an object that can easily be reasoned about and manipulated by code. Much as I love trickery with continuations, I find ASTs are much easier to think about.


My real motivation was that I was thinking about games. The rules of games are often given in imperative style: first player 1 does this. Then they do this. If this happens they do that. And then it's player two's turn. I wanted my Haskell code to reflect that style.


Added 'null' event to keep interpreter going when it makes sense to do so, but there's no event pending.

Friday, October 25, 2013

Distributed computing with alien technology


Suppose we are given a function of boolean arguments that returns a boolean result. Alice has bits, and Bob has another bits . Alice and Bob are widely separated and don't know each other's bits. What is the total number of bits that Alice has to send to Bob and that Bob has to send to Alice so that between them they can compute ? Think about how complex might get. The and might each describe half of a "voxelised" region of space and might answer a question about a computational fluid dynamics (CFD) simulation running in that space. CFD simulations can be chaotic and so we might expect that in the worst case many bits have to be transferred back and forth between Alice and Bob. In the worst case we might expect that Alice has to send Bob all of her bits, or vice versa.

But in fact Alice needs to send Bob just one bit.

A loophole

To get the communication requirements down to one bit we need to use a loophole. But I hope to (1) justify the cheat to some extent and (2) justify that it's even worthwhile to think about cheats.

Alice and Bob have access to some Ancient technology. They each have one of a pair of boxes. At prearranged times, Alice puts a bit into her box, and Bob puts a bit into his box. A bit pops back out of Alice's box and a bit pops back out of Bob's box. Whatever the input, both Alice and Box have a 0.5 chance of seeing a one or zero pop out of their respective boxes. But when the two outputs are XORed together the result is the logical AND of the two inputs. With such boxes, Alice can compute after Bob sends a single bit down a conventional communication channel.

"But this is a total cheat!" you complain before I even start to explain their technique. It seems Alice receives a bit that depends on what Bob input, and so Bob is communicating with Alice. But look closely and you'll see that the boxes don't allow any communication. No matter what Bob inputs, Alice has a 0.5 chance of getting zero or one. There is no way Bob can use this to communicate anything. It's like intercepting a message encrypted with a one time pad. Without the pad, the message is basically a sequence of random bits. Nonetheless, it is true that the outputs that Alice and Bob see are correlated.

I hope I've convinced you that Alice and Bob can't send any bits with these boxes. Despite this, it is pretty clear that the behaviour of the boxes is non-local. We'll call any kind of boxes that allow instantaneous long range correlations that can't be explained by purely local behaviour non-local boxes. Boxes that can't be used for message sending are called non-signalling local boxes. And the particular non-local box I describe above is called a PR box (eg. see here).

(BTW As an aside note that as the box results in widely separated outputs that are correlated, but doesn't allow communication, it's an example of how non-locality doesn't imply communication. Usually when people want to give examples of such a thing they talk about quantum mechanics. But there's no need to mention quantum mechanics to explain the behaviour of these particular non-local boxes.)

The method

Any single bit boolean function of a finite sequence of bits can be written as a polynomial modulo 2. Each monomial in the polynomial can be written as a product of terms involing just the and terms involving just the , ie.

where depends only on the , depends only on the and is drawn from some finite set. Alice can compute the and Bob can compute the . Now Alice and Bob, in parallel, feed and respectively into their PR boxes. We know that we could evaluate each term in the sum we want by adding Alice's output to Bob's output. But that would require sending one one-bit message for each . But we don't need each term one by one; we just want the sum. So Alice and Bob can individually sum their separate outputs knowing that adding Alice's output and Bob's output modulo 2 will be the correct sum. So Bob sends his sum to Alice. Alice adds that number to her own (modulo 2) and that's the value we want. Only one one-bit message was sent.

But what about reality?

Non-local boxes don't exist, do they? So why are we talking about them?

Actually, non-local boxes exist both theoretically and in the lab. Non-local correlations in quantum mechanics allow them to be constructed. But for this article I wanted to abstract from quantum mechanics and talk about the behaviour of a non-local box without getting my hands dirty with the details of quantum mechanics. Having said that, although non-local boxes do exist, the special case of the PR box can't in fact be constructed with quantum mechanics. In some sense it allows correlations that are "too strong". An article I wrote a while back describes the closest you can get to building a PR box with quantum correlations. Curiously, if you restrict yourself to the kind of non-local box quantum mechanics allows you to build you find that some functions can still be computed with less communication than you'd need if non-local correlations are disallowed. Nonetheless, the worst case scenario with QM still requires the sending of bits.

Going further there's an interesting conjecture. It says that any non-local box that is even marginally better (in some sense) than what quantum mechanics allows is powerful enough to allow the computation of any with only a single bit of communication. It suggests that quantum mechanics is right at the edge of the space of possible physics that make life difficult for us. If quantum mechanics were to be tweaked the tiniest amount to make correlations any stronger, large numbers of difficult distributed computing problems would suddenly collapse to become trivial. If the conjecture is true it means that nature looks a bit like a conspiracy to keep computer scientists in work. (It's possible the conjecture has been decided one way or the other by now.)

Final words

There are a couple of papers about universes where PR boxes can be built; so called boxworlds. There is a lot of interesting theoretical work in characterising quantum mechanics. In particular there are a number of theorems and conjectures that describe QM in the form "the most X theory that doesn't allow Y" where X is an interesting property and Y is something you'd like to do.


I learnt all of this from the paper Implausible Consequences of Superstrong Nonlocality by Wim van Dam.

Blog Archive