Thursday, November 26, 2009

Programming with impossible functions, or how to get along without monads.

A recent paper by Hyland and Power speculates about an alternative universe in which languages like Haskell develop along a path in which monads don't play a starring role. This post will be about making tentative steps in that direction to see what developer-friendly alternatives might exist to solve problems like writing code using I/O in an elegant and readable manner. I think all of what I'll say was worked out 20 years or more ago but there may be some good reasons to think about this stuff again.

We'll need some Haskell language extensions


> {-# LANGUAGE GADTs, RankNTypes, TypeOperators #-}

> import Prelude hiding (read, IO)
> import Control.Monad.Reader hiding (Reader)


Introduction
Let's start with this problem: We'd like to write pure functional code to read input values of type i sequentially from some source. In an imperative language we could define a function taking no arguments and returning a value of type i. It might look like read :: () -> i, or even just read :: i. But in Haskell such a thing is impossible. Unlike in imperative languages, a Haskell value read :: i is a constant, so it can't give a different value each time it is used. How could we implement something like the impossible function read in Haskell?

Here's a general sketch of how I plan to proceed: In some sense we want to add a new language feature to Haskell. So we need a new language that is Haskell plus read. Rather than write a whole new language, we can just define an embedded DSL using Haskell as our host language. When we want to perform read operations we build an abstract syntax tree (AST) for our code in the new language and pass it to an interpreter. Sounds complicated, but the last step will be to show how we can eliminate all of these steps and be left with simple and efficient Haskell code that does exactly what we want. I'll show you what the final code will look like for our read example:


> read_example1 =
>   read $ \x ->
>   read $ \y ->
>   let z = x+y in
>   read $ \w ->
>   rvalue (z*w)


The same methodology will also work for other non-functional features.

A Reader
We start by defining a type to represent our AST in our read-extended language:


> data Reader i x where


I'm using GADTs because it's useful to make explicit the type of the constructors. The reason for i and x will be explained by the signatures of the constructors. Firstly, we need to represent the value of computations of any type that don't read anything. So we need a constructor like:


>   RValue :: x -> Reader i x


Now how should we represent a computation that reads something? Well suppose we're reading something of type i (for input) from some source. We'd like the AST to tell us how such a value would be used once it is read. We express such an idea by writing a function that takes an input value and maps it to an AST. In other words:


>   Read :: (i -> Reader i x) -> Reader i x


The argument to Read is an example of a continuation. It describes how to continue after an input is received.

That's it. Now we can write code like


> read_example2 = Read $ \x ->
>   Read $ \y ->
>   RValue (x+y)


There's just one thing wrong. It doesn't do anything. We have the syntax but no semantics. So we need to write an interpreter. We have some choice here, but I'll do something easy. I'll make our imaginary read function pick off values one by one from a list of inputs. Here is the implementation. We interpret an RValue by simply unwrapping it and ignoring the list of inputs:


> read_interpreter (RValue x) is = x


We handle Read by picking off the first input, applying our continuation to it, and then carrying on by interpreting the result:


> read_interpreter (Read cont) (i:is) = read_interpreter (cont i) is


Now we're ready to go:


> go1 = read_interpreter read_example2 [1,2,3,4,5]
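> -- go1 evaluates to 3: the interpreter reads the inputs 1 and 2 and returns their sum.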


One obvious advantage of leaving things as an AST is that we can apply different interpreters to read from different sources. But if we know what our source is to be we can optimise a bit by completely eliminating the interpreter. We can do this using a completely generic method that I described here. The Reader type is a pretty standard recursive type. In our examples we build a type, but when we interpret we immediately tear it apart again in a reduction operation. The reduction is an example of a generalised fold. Following what I described in that earlier post, the reduction operation looks like this:


> rfold :: (x -> w) -> ((c -> w) -> w) -> Reader c x -> w
> rfold rvalue read (RValue x) = rvalue x
> rfold rvalue read (Read cont) = read (rfold rvalue read . cont)


If you look at read_interpreter you can see there are two parts. There's the part that does the work of picking off inputs, and there's the part that recursively applies read_interpreter. rfold provides the latter so we only have to implement the former. So we now get:


> read_interpreter' = rfold rvalue read where
>   rvalue x is = x
>   read cont (i:is) = cont i is

> go2 = read_interpreter' read_example2 [1,2,3,4,5]


Now comes the magic step described in that earlier post. Rather than separately build a tree, and then fold it, we can fuse the two operations together by simply replacing the constructors with the arguments to rfold:


> read cont (i:is) = cont i is
> rvalue x is = x


And now we can test the first example:


> go3 = read_example1 [1,2,3,4,5]
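> -- go3 evaluates to 9: x = 1 and y = 2 give z = 3, then w = 3, so z*w = 9.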


All that remains is to work out the details of some more examples.

Writer
An obvious one is a writer. In an imperative language we'd expect a signature like write :: o -> (). Our AST will need a node to represent a write operation. It'll need to contain both the written value, and the continuation. In this case the continuation doesn't actually depend on a hypothetical input, it's just another AST. So we get


> data Writer o x where
>   Write :: o -> Writer o x -> Writer o x
>   WValue :: x -> Writer o x


Again we need to interpret. We can do this a bit like the Writer monad. The final value will be a pair consisting of a value and a list of outputs.


> write_interpreter (Write o x) = let (v, os) = write_interpreter x in (v, o:os)
> write_interpreter (WValue x) = (x, [])


And now we can implement things like:


> write_example1 from to =
>   if from<=to
>     then Write from $ write_example1 (from+1) to
>     else WValue ()

> go4 = write_interpreter (write_example1 10 20)


Again we can construct a fold for this type:


> wfold :: (x -> w) -> (o -> w -> w) -> Writer o x -> w
> wfold value write (WValue x) = value x
> wfold value write (Write o x) = write o (wfold value write x)

> write_interpreter' = wfold value write where
>   value x = (x, [])
>   write o (x, os) = (x, o:os)


And eliminating the fold gives us:


> wvalue x = (x, [])
> write o (x, os) = (x, o:os)


You might have guessed an implementation like that without going through the AST. But it's good to step back and look at the pattern emerging here. For an imperative function of type v1 -> v2 -> ... -> vn -> w we want an AST constructor of type v1 -> v2 -> ... -> vn -> (w -> AST) -> AST, where AST is the type of our tree. Figuring that out gives us a half-way point, making the job of writing our interpreter slightly easier. You can sort of read the signature as saying "there's no function mapping v1 -> v2 -> ... -> vn -> w that does what I want, but imagine I had one anyway, what would I do with the result it gave me?".

But, if you look closely at the definition of Writer you might notice that it is, in fact, a type of list structure. It's basically a list of elements of type o terminating in an element of type x. All the interpreter is doing is converting it to a conventional list structure. And if you go back to the definition of Reader you'll see that it too is exactly the right type for the job. It's a sort of dual to a list, one that consumes elements as you work along its length rather than yielding them. We only need an interpreter because we want to apply this object to a specific datatype, a list of inputs.
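
Just to make that concrete, here is the direction of that correspondence that write_interpreter doesn't already give us: rebuilding a Writer from a value and its list of outputs. It's a quick sketch with a name of my own choosing, shown without the literate > marks so it stays out of the module above:

write_uninterpreter :: (x, [o]) -> Writer o x
write_uninterpreter (x, [])   = WValue x
write_uninterpreter (x, o:os) = Write o (write_uninterpreter (x, os))

Together with write_interpreter this exhibits Writer o x as isomorphic to (x, [o]).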

The fact that we can derive these types automatically from signatures is a really wonderful and deep point that I'm going to have to return to in another post. Usually, when you implement a monad, you have to come up with a type that can express the semantics you want. But that type can (in some sense) be derived automatically from the signatures of the operations you want, and some equations specifying how these operations interact.
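
For instance, here is a sketch of the monad lurking behind Reader: return wraps a value with RValue and bind grafts a continuation onto every leaf of the tree. (Plain code rather than literate code, so it stays out of the module above; the Functor and Applicative instances are only there to keep modern GHC happy.)

instance Functor (Reader i) where
  fmap f (RValue x) = RValue (f x)
  fmap f (Read k)   = Read (fmap f . k)

instance Applicative (Reader i) where
  pure = RValue
  RValue f <*> r = fmap f r
  Read k   <*> r = Read (\i -> k i <*> r)

instance Monad (Reader i) where
  return = RValue
  RValue x >>= f = f x
  Read k   >>= f = Read (\i -> k i >>= f)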

IO
We can now try to combine the writer and reader to handle simultaneous input and output:


> data IO i o x where
>   IValue :: x -> IO i o x
>   IRead :: (i -> IO i o x) -> IO i o x
>   IWrite :: o -> IO i o x -> IO i o x


The interpreter is very similar to each of the individual interpreters. It applies our AST to a list of inputs and gives us back a pair consisting of a return value and a list of outputs:


> io_interpreter (IValue x) is = (x, [])
> io_interpreter (IRead cont) (i:is) = io_interpreter (cont i) is
> io_interpreter (IWrite o cont) is = let (x, os) = io_interpreter cont is in (x, o:os)

> io_example1 =
>   IRead $ \x ->
>   IRead $ \y ->
>   IWrite (x+y) $
>   IValue ()

> go5 = io_interpreter io_example1 [1..10]


The corresponding fold function is an amalgam of the previous ones:


> ifold :: (x -> w) -> ((i -> w) -> w) -> (o -> w -> w) -> IO i o x -> w
> ifold value read write (IRead cont) = read (ifold value read write . cont)
> ifold value read write (IWrite o cont) = write o (ifold value read write cont)
> ifold value read write (IValue x) = value x

> io_interpreter' = ifold value read write where
>   value x is = (x, [])
>   read cont (i:is) = cont i is
>   write o cont is = let (x, os) = cont is in (x, o:os)

> go6 = io_interpreter' io_example1 [1..10]

> ivalue x is = (x, [])
> iread cont (i:is) = cont i is
> iwrite o cont is = let (x, os) = cont is in (x, o:os)

> io_example2 =
>   iread $ \x ->
>   iread $ \y ->
>   iwrite (x+y) $
>   ivalue ()

> go7 = io_example2 [1..10]


Playing with Control Flow
But we're not just limited to these kinds of side-effects. We can also play with control flow. For example, consider an imperative function of type throw :: e -> () that throws a value out of our computation to the top level. If we followed the description above we'd be tempted to replace that with something of type e -> Exception e x -> Exception e x, just like with write. But that type uses a continuation to say how to proceed after writing. We don't want there to be any continuation, we just want to bomb out with the thrown value. So the type we need is actually e -> Exception e x.


> data Exception e x where
>   Throw :: e -> Exception e x
>   EValue :: x -> Exception e x


You might recognise that. It's just Either. But we can go through the motions anyway:


> exception_interpret (Throw e) = Left e
> exception_interpret (EValue x) = Right x

> safediv x y = if y==0 then Throw "SIGFPE" else EValue (x `div` y)
> throw_example1 = safediv 1 0
> go8 = exception_interpret throw_example1

> efold :: (e -> w) -> (x -> w) -> Exception e x -> w
> efold throw evalue (Throw e) = throw e
> efold throw evalue (EValue x) = evalue x

> exception_interpret' = efold Left Right

> throw = Left
> evalue = Right


On its own that's not a very useful example. Still, it serves as a warmup for a much more interesting example that plays with control flow. Suppose we want to implement the imperative function fork :: [u] -> u that forks one process for each element of the list, with each fork getting that element as the return value. So let x = fork [1,2,3] would result in three threads, one with x=1, one with x=2 and one with x=3.


> data ManyWorlds x where
>   MValue :: x -> ManyWorlds x


The following line follows directly by applying the principle I described earlier to fork :: [u] -> u:


>   Fork :: [u] -> (u -> ManyWorlds x) -> ManyWorlds x


Now we need an interpreter. The state of the system at any time is a list of values, one for each thread. So we can use


> manyworlds_interpret (MValue x) = [x]


When we fork, we want to apply the continuation to each element of the list. We'll then have a list of lists of threads which we collect together into a single list using concat.


> manyworlds_interpret (Fork as cont) = concat (map manyworlds_interpret (map cont as))

> fork_example1 =
>   Fork [1,2,3] $ \x ->
>   Fork [10,20,30] $ \y ->
>   MValue (x+y)

> go9 = manyworlds_interpret fork_example1


Again we can write this as a fold:


> mfold :: (x -> w) -> (forall u. [u] -> (u -> w) -> w) -> ManyWorlds x -> w
> mfold value fork (MValue x) = value x
> mfold value fork (Fork xs u) = fork xs (mfold value fork . u)

> manyworlds_interpret' = mfold value fork where
>   value u = [u]
>   fork us cont = concatMap cont us


And the interpreter-free version is:


> fork xs cont = concatMap cont xs
> mvalue u = [u]


I hope that looks familiar!

Here's a recursive example to generate the cartesian product of a bunch of lists:


> prod [] = [[]]
> prod (x:xs) = fork x $ \u ->
>               fork (prod xs) $ \v ->
>               mvalue (u:v)

> fork_example2 = prod [[1,2],[3,4],[10,20]]
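> -- fork_example2 is the 8-element cartesian product, running from [1,3,10] to [2,4,20].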


Abstract Nonsense
This section is just to give a taste of some deeper stuff going on.

The problem to solve is this: given two side-effectful "extensions" like those above, how do you combine them and still keep functional purity?

As above, given just the signatures of the "functions" in our extensions to Haskell we can build a datatype. These signatures form part of the description of an algebraic theory. The other part is a set of equations, something I skipped above. For example, if I had talked about State then we might have an equation stating that if we do a put, followed by two gets, then the gets would both "get" the same value. With a bunch of signatures and equations, the corresponding type, and even a monad, follow automatically. (You can probably see the monads lurking behind all of my examples above.)
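
To illustrate the recipe (this is my own sketch, not a construction from the paper), here is the type you'd read off from the signatures get :: () -> s and put :: s -> (), together with the obvious interpreter. It's shown as plain code, outside the literate module:

data State s x where
  Get    :: (s -> State s x) -> State s x   -- from get :: () -> s
  Put    :: s -> State s x -> State s x     -- from put :: s -> ()
  SValue :: x -> State s x

state_interpreter :: State s x -> s -> (x, s)
state_interpreter (SValue x) s = (x, s)
state_interpreter (Get k)    s = state_interpreter (k s) s
state_interpreter (Put s' k) _ = state_interpreter k s'

Under this interpreter a Put followed by two Gets really does see the same value twice, which is exactly the kind of equation the theory would record.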

Now, when we have two monads, there is no automatic way to combine them. Instead we have to write monad transformers, and there is no theory for how to construct them automatically.

On the other hand, there are ways to automatically combine algebraic theories (and hence the monads derived from them). So the latter approach seems like it could be much more powerful. The theory is the primary thing, the monad is just something derived from it.
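
For a taste of what combining signatures can look like in code (my own aside, and only a shadow of the real construction because it ignores the equations), the coproduct of two signature functors is just a sum of their constructors, in the spirit of Swierstra's "Data types à la carte". It uses TypeOperators, which is already in the LANGUAGE pragma at the top:

-- An operation from the theory f, or an operation from the theory g.
data (f :+: g) x = InL (f x) | InR (g x)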

But the catch is that when I said "automatically" I meant in the sense that there is a precise mathematical description. It's not clear to me yet how to turn the entire process into something that a compiler could perform for you.

And the goal isn't really to eliminate monads. In fact, do-notation is a great way to work and gives nicer looking code than my examples. It's just that maybe we need to be thinking about the theories first and monads second.


You are a protocol droid, are you not?
Anyway, on a tangent, here's a bit of code I spun off from the above that runs as a self-contained example in its own .hs file. I tweaked read and write for the IO case by using the constructor (,) to construct heterogeneous lists instead of using (:) to construct homogeneous lists. I then feed the outputs of a pair of IO expressions into each other as inputs. The result is a type-safe way to implement simple imperative looking coroutines in a couple of lines of code. Good for playing with protocols as opposed to algorithms:


import Prelude hiding (read)

read f (c,cs) = f c cs
write o r is = let (v, os) = r is in (v, (o,os))
value x is = (x, ())

-- Diffie-Hellman key exchange

p = 23
g = 5

alice =
  write "Hello" $
  let a = 6 in
  write (g^a `mod` p) $
  read $ \b ->
  let s = b^a `mod` p in
  value s

bob =
  read $ \message ->
  let b = 15 in
  write (g^b `mod` p) $
  read $ \a ->
  let s = a^b `mod` p in
  value s

(alice_value, alice_output) = alice bob_output
(bob_value, bob_output ) = bob alice_output

go = print (bob_value,alice_value)
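-- go prints (2,2): with p = 23, g = 5, a = 6 and b = 15 both parties derive the shared secret 2.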




Wednesday, November 04, 2009

Memoizing Polymorphic Functions with High School Algebra and Quantifiers

A little while back Conal Elliott asked about the memoization of polymorphic types. I thought it'd be fun to describe how to memoize such functions in the same spirit as Ralph Hinze's use of tries to memoize non-polymorphic functions. Along the way I'll try to give a brief introduction to quantified types in Haskell as well as showing some applications of the Yoneda lemma at work.

You can think of a generalized trie for a function type T as a type that's isomorphic to T but doesn't have an arrow '->' anywhere in its definition. It's something that contains all the same information as a function, but as a data structure rather than as a function. Hinze showed how to construct these by using the high school algebra axioms on non-polymorphic types. Polymorphic types are types involving quantification. So to make suitable tries for these we need to add some rules for handling quantifiers to high school algebra.

At first it seems unlikely that we could memoize polymorphic types. When Hinze demonstrated how to construct generalized tries he showed how to make a tree structure that was tailored to the specific types in hand. With polymorphic functions we don't know what types we'll be dealing with, so we need a one-size-fits-all type. That sounds impossible, but it's not.

The first thing we need to look at is universal quantification. Suppose F(a) is a type expression involving the symbol a. Then the type ∀a.F(a) can be thought of as being a bit like the product of F(a) for all possible values of a. So ∀a.F(a) is a bit like the imaginary infinite tuple

data Forall f a = (f Bool, f Int, f Char, f String, f [Bool], f (IO (Int -> Char)), ...)

One reason you can think of it this way is that all of the projections exist. So for any type you choose, say B, there is a function (∀a.F(a)) -> F(B). In Haskell the ∀ is written as forall and probably the best known example is the Haskell id function of type forall a. a -> a. For any concrete type B, id gives us a function of type B -> B. Note that we usually write the signature simply as a -> a. Haskell implicitly prepends a forall for every free variable in a type. We have to use the following line of code if we want to be able to use forall explicitly (among other things):


> {-# LANGUAGE RankNTypes, ExistentialQuantification, EmptyDataDecls #-}
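
As a tiny illustration of the claim above that all the projections out of a universally quantified type exist, here is the projection of id's type at Int. It's a sketch of my own (plain code, outside the literate module), and it only needs the RankNTypes extension we just switched on:

projectAtInt :: (forall a. a -> a) -> (Int -> Int)
projectAtInt f = f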


I'll approach the tries through a series of propositions. So here's our first one:

Proposition 1
∀a. a = 0

0 is the type with no elements. ∀a. a is a type that can give us an object of type B for any B. There is no way to do this. How could such a function manufacture an element of B for any B with nothing to work from? It would have to work even for types that haven't been defined yet. (By the way, do you notice a similarity with the axiom of choice?) So ∀a. a is the type with no elements. Here's the usual way to write the type with no elements:


> data Void


We also have:
Proposition 2
∀a. a^a = 1

If we have a function of type forall a. a -> a then for any element of type a you give it, it can give you back an element of type a. There's only one way to do this - it must give you back what you gave it. It can't transform that element in any way because there is no uniform transformation you could write that works for all values of a. So ∀a. a^a has one element, id.

A slightly more interesting proposition is this:
Proposition 3
∀a. a^(a.a) = 2

A function of type (a,a) -> a gives you an a when you give it a pair of a's. As we don't know in advance what type a will be we can't write code that examines a in any way. So a function of this type must return one of the pair, and which one it returns can't depend on the value of the argument. So there are only two functions of this type, fst and snd.

We can rewrite the last proposition as ∀a. a^(a^2) = 2. That suggests that maybe ∀a. a^(a^n) = n for any type n. We can go one better. Here's another proposition:

Proposition 4
For any functor F and type n, ∀a. F(a)^(a^n) = F(n)

I've already talked about that result. Here's an implementation of the isomorphisms:


> yoneda :: (forall b . (a -> b) -> f b) -> f a
> yoneda t = t id

> yoneda' :: Functor f => f a -> (forall b . (a -> b) -> f b)
> yoneda' a f = fmap f a


Throughout this article I'll use the convention that if f is an isomorphism, f' is its inverse.

Now it's time to look at a kind of dual of the above propositions. Instead of universal quantification we'll consider existential quantification. The type ∃a.F(a) is a kind of infinite sum of all types of the form F(a). We can imagine it being a bit like the following definition:

data Exist f a = ABool (f Bool) | AnInt (f Int) | AChar (f Char) | AString (f String) | AListOfBool (f [Bool]) ...

The important point is that given an element of F(a), for any type a at all, we can turn it into an element of ∃a.F(a). You'd think that we could write this in Haskell as exists a. F(a) but unfortunately Haskell does things differently. The idea behind the notation is this: we can put anything of type F(a) into it, so if X = ∃a.F(a) then we have a function F(a) -> X for any a. In other words, we have a function of type ∀a. F(a) -> X. So although this type is existentially quantified, its constructor is universally quantified. We tell Haskell to make a type existentially quantified by telling it the constructor is universally quantified:


> data Exist f a = forall b. Exist (f b)


You can think of Exist as not being a single constructor, but an infinite family of constructors, like ABool, AnInt, etc. above.

If you have an element of an ordinary non-polymorphic algebraic sum type then the only way you can do stuff to it is to apply case analysis. To do something with an existential type means you have to perform a kind of infinite case analysis. So to do something with an element of ∃a. F(a) you need to provide an infinite family of functions, one for each possible type. In other words, you need to apply a function of type ∀a. F(a) → B to it.
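
Here's a small made-up example of that idea: length works uniformly at every element type, so it's one of the few things we can do to an existentially wrapped list. (The names, and the () used to pin down the phantom parameter, are my own; plain code, outside the literate module.)

lengthOfExist :: Exist [] n -> Int
lengthOfExist (Exist xs) = length xs

existExample :: Int
existExample = lengthOfExist (Exist [True, False, True] :: Exist [] ())  -- 3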

Time for another proposition:

Proposition 5
∃a. a = 1

It seems weird at first that the sum of all types is 1. But once you put something into this type, you can no longer get any information about it back out again. If you try doing case analysis you have to provide a polymorphic function that works for an argument of absolutely any type, which is as good as saying you can't do any case analysis at all. Proposition 5 is actually a special case of the following:

Proposition 6
For any functor f, ∃a. (n^a, f(a)) = f(n)

Briefly, the reason is that the only thing you can do with a matching pair of n^a and f(a) is apply the former to the latter using fmap. This is a kind of dual to the Yoneda lemma and I say more about it here.
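
For completeness, here is a sketch of the isomorphism in Proposition 6 in the same style as the yoneda and yoneda' functions above, packaging the pair (n^a, f(a)) as an existential. The CoYoneda name and the code are my own, and it uses the ExistentialQuantification extension that's already switched on:

data CoYoneda f n = forall a. CoYoneda (a -> n) (f a)

prop6 :: Functor f => CoYoneda f n -> f n
prop6 (CoYoneda g x) = fmap g x

prop6' :: f n -> CoYoneda f n
prop6' = CoYoneda id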

We already know from high school algebra that this is true:

Proposition 7
x^(y+z) = x^y.x^z

We can write the isomorphisms explicitly:


> prop7 :: (Either a b -> c) -> (a -> c, b -> c)
> prop7 f = (f . Left, f . Right)

> prop7' :: (a -> c, b -> c) -> Either a b -> c
> prop7' = uncurry either


It should be no surprise that the following 'infinite' version is true as well:

Proposition 8
x^(∃a. f(a)) = ∀a. x^(f(a))

We can write the isomorphism directly:


> prop8 :: (Exist f a -> x) -> forall b. f b -> x
> prop8 g x = g (Exist x)
> prop8' :: (forall b. f b -> x) -> Exist f a -> x
> prop8' g (Exist x) = g x


We're now equipped to start constructing generalized tries for polymorphic functions. So let's consider memoizing the type forall a. [a] -> f a, for f a functor. At first this looks hard. We have to memoize a function that can take as argument a list of any type. How can we build a trie if we don't know anything in advance about the type of a? The solution is straightforward. We follow Hinze in applying a bit of high school algebra along with some of the propositions from above. By definition, L(a) = [a] is a solution to the equation L(a) = 1+a.L(a). So we want to simplify ∀a. f(a)^(L(a)). We have

f(a)^(L(a)) = f(a)^(1+a.L(a)) = f(a).f(a)^(a.L(a)) = f(a).f(a)^(a+a^2.L(a)) = f(a).f(a)^a.f(a)^(a^2.L(a))

I hope you can see a bit of a pattern forming. Let's define T(n) = f(a)^(a^n.L(a)). Then

T(n) = f(a)^(a^n.(1+a.L(a))) = f(a)^(a^n).T(n+1) = f(n).T(n+1), where the last step is Proposition 4, with the ∀a understood throughout.

That's it! We can translate this definition directly into Haskell.


> data T f n = T (f n) (T f (Maybe n))


I'm using the fact that Maybe n is standard Haskell for the type n+1. (But note that this equality is only valid when we think of the list type as data, not codata. So like with Hinze's original tries, values at infinite lists don't get memoized.)

To build the isomorphism we need to trace through the steps in the derivation. At one point we used a^n + a^(1+n).L(a) = a^n.L(a), which we can implement as the pair of isomorphisms:


> lemma :: Either (n -> a) (Maybe n -> a, [a]) -> (n -> a, [a])
> lemma (Left f) = (f, [])
> lemma (Right (f, xs)) = (\n -> f (Just n),f Nothing : xs)

> lemma' :: (n -> a, [a]) -> Either (n -> a) (Maybe n -> a, [a])
> lemma' (f, []) = Left f
> lemma' (f, x:xs) = Right (maybe x f, xs)


We can put the other steps together with this to give:


> memoize :: Functor f => (forall a. (n -> a, [a]) -> f a) -> T f n
> memoize f = let x = prop7 (f . lemma)
>             in T (yoneda (fst x)) (memoize (snd x))

> memoize' :: Functor f => T f n -> forall a. (n -> a, [a]) -> f a
> memoize' (T a b) = let y = (yoneda' a, memoize' b)
>                    in prop7' y . lemma'


Let's try a few examples. I'll use the identity functor for the first example.


> data I a = I a deriving Show
> instance Functor I where
>   fmap f (I a) = I (f a)


Here's our first test function and some data to try it on:


> test1 (f, xs) = I $ if length xs>0 then head xs else f ()
> data1 = (const 1,[2,3,4])


In data1 we're using a function to represent a kind of 'head' before the main list. For the next example we're leaving the first element of the pair undefined so that data2 is effectively of list type:


> test2 (f, xs) = reverse xs
> data2 = (undefined,[1..10])


We can test them by building the memoized versions of these functions.


> memo1 = memoize test1 :: T I ()
> memo2 = memoize test2 :: T [] Void


and then apply them


> ex1 = memoize' memo1 data1
> ex2 = memoize' memo2 data2


It appears to work!

So what's actually going on? We have

T(0) = f(0).T(1) = f(0).f(1).T(2) = ... = f(0).f(1).f(2).f(3)...


Now consider a function g :: [a] -> f a applied to a list. If the list isn't an infinite stream then it must have a certain length, say n. From these elements it builds something of type f a. However this f a is constructed, each of the elements of type a in it must be constructed from one of the n elements in the list. So if we apply g to the list [0,1,2,...,n-1] it will construct an element of f a where each a in it contains a label saying which position in the list it came from. (Compare with Neel's comment here). If we use integers we don't get a perfect trie because there are more elements of type f Integer than there are ways to indicate source positions. What we need is that for each length of list, n, we have a type with precisely n elements. And that's what the type n gives us.

We can memoize many different functions this way, though if the functor f is a function type you'll need to use some of Hinze's techniques to eliminate them. And you'll notice I haven't used all of the propositions above. I've tried to give some extra tools to allow people to memoize more types than just my two examples.

One last thing: I wouldn't use the type above to memoize in a real world application. But the methods above could be used to derive approximate tries that are efficient. One obvious example of an approximation would be to use Int instead of the finite types.
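
To spell out one reading of that Int suggestion (an assumption on my part rather than something worked out above): replacing each of the finite index types n, n+1, n+2, ... by Int collapses the tower of types into a single one, at the cost of the trie no longer being perfect:

data TApprox f = TApprox (f Int) (TApprox f)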


Update: I forgot to provide one very important link. This post was inspired by Thorsten Altenkirch's post here.