Memoizing Polymorphic Functions with High School Algebra and Quantifiers
A little while back Conal Elliott asked about the memoization of polymorphic types. I thought it'd be fun to describe how to memoize such functions in the same spirit as Ralph Hinze's use of tries to memoize non-polymorphic functions. Along the way I'll try to give a brief introduction to quantified types in Haskell as well as showing some applications of the Yoneda lemma at work.
You can think of a generalized trie for a function type T as a type that's isomorphic to T but doesn't have an arrow '->' anywhere in its definition. It's something that contains all the same information as a function, but as a data structure rather than as a function. Hinze showed how to construct these by using the high school algebra axioms on non-polymorphic types. Polymorphic types are types involving quantification. So to make suitable tries for these we need to add some rules for handling quantifiers to high school algebra.
At first it seems unlikely that we could memoize polymorphic types. When Hinze demonstrated how to construct generalized tries he showed how to make a tree structure that was tailored to the specific types in hand. With polymorphic functions we don't know what types we'll be dealing with, so we need a one-size-fits-all type. That sounds impossible, but it's not.
The first thing we need to look at is universal quantification. Suppose F(a) is a type expression involving the symbol a. Then the type ∀a.F(a) can be thought of as being a bit like the product of F(a) for all possible values of a. So ∀a.F(a) is a bit like the imaginary infinite tuple
data Forall f a = (f Bool, f Int, f Char, f String, f [Bool], f (IO (Int -> Char)), ...)
One reason you can think of it this way is that all of the projections exist. So for any type you choose, say B, there is a function (∀a.F(a)) -> F(B). In Haskell the ∀ is written as forall and probably the best known example is the Haskell id function of type forall a. a -> a. For any concrete type B, id gives us a function of type B -> B. Note that we usually write the signature simply as a -> a. Haskell implicitly prepends a forall for every free type variable in a type. We have to use the following line of code if we want to be able to use forall explicitly (among other things):
> {-# LANGUAGE RankNTypes, ExistentialQuantification, EmptyDataDecls #-}
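For example, instantiating a universally quantified value at a specific type is just using it at that type. Here's the projection at Bool made explicit (a trivial sketch; the name projectAtBool is mine):

> projectAtBool :: (forall a. f a) -> f Bool
> projectAtBool x = x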
I'll approach the tries through a series of propositions. So here's our first one:
Proposition 1
∀a. a = 0
0 is the type with no elements. ∀a. a is a type that can give us an object of type B for any B. There is no way to do this. How could it manufacture an element of B for any B with nothing to work from? It would have to work even for types that haven't been defined yet. (By the way, do you notice a similarity with the axiom of choice?) So ∀a. a is the type with no elements. Here's the usual way to write the type with no elements:
> data Void
We also have:
Proposition 2
∀a. a^a = 1
If we have a function of type forall a. a -> a then for any element of type a you give it, it can give you back an element of type a. There's only one way to do this - it must give you back what you gave it. It can't transform that element in any way because there is no uniform transformation you could write that works for all values of a. So ∀a. a^a has one element, id.
A slightly more interesting proposition is this:
Proposition 3
∀a. a^(a.a) = 2
A function of type (a,a) -> a gives you an a when you give it a pair of a's. As we don't know in advance what type a will be we can't write code that examines a in any way. So a function of this type must return one of the pair, and which one it returns can't depend on the value of the argument. So there are only two functions of this type, fst and snd.
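Here they are written out, just to be concrete (the names proj1 and proj2 are mine):

> proj1, proj2 :: forall a. (a, a) -> a
> proj1 = fst
> proj2 = snd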
We can rewrite the last proposition as ∀a. a^(a^2) = 2. That suggests that maybe ∀a. a^(a^n) = n for any type n. We can go one better. Here's another proposition:
Proposition 4
For any functor F and type n, ∀a. F(a)^(a^n) = F(n)
I've already talked about that result. Here's an implementation of the isomorphisms:
> yoneda :: (forall b . (a -> b) -> f b) -> f a
> yoneda t = t id
> yoneda' :: Functor f => f a -> (forall b . (a -> b) -> f b)
> yoneda' a f = fmap f a
Throughout this article I'll use the convention that if f is an isomorphism, f' is its inverse.
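As a quick sanity check of Proposition 4, here's a round trip through yoneda and yoneda' (the name yonedaCheck is mine; both components should evaluate to [1,2,3]):

> yonedaCheck :: ([Int], [Int])
> yonedaCheck = (yoneda (\f -> map f [1,2,3]), yoneda' [1,2,3] id)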
Now it's time to look at a kind of dual of the above propositions. Instead of universal quantification we'll consider existential quantification. The type ∃a.F(a) is a kind of infinite sum of all types of the form F(a). We can imagine it being a bit like the following definition:
data Exist f a = ABool (f Bool) | AnInt (f Int) | AChar (f Char) | AString (f String) | AListOfBool (f [Bool]) ...
The important point is that given an element of F(b), for any type b, we can turn it into an element of ∃a.F(a). You'd think that we could write this in Haskell as exists a. F(a) but unfortunately Haskell does things differently. The idea behind the notation is this: if X = ∃a.F(a) then we can put anything of type F(a) into it, so we have a function F(a) -> X for any a. In other words, we have a function of type ∀a. F(a) -> X. So although this type is existentially quantified, its constructor is universally quantified. We tell Haskell to make a type existentially quantified by telling it the constructor is universally quantified:
> data Exist f a = forall a. Exist (f a)
You can think of Exist as not being a single constructor, but an infinite family of constructors, like ABool, AnInt, etc. above.
If you have an element of an ordinary non-polymorphic algebraic sum type then the only way you can do stuff to it is to apply case analysis. To do something with an existential type means you have to perform a kind of infinite case analysis. So to do something with an element of ∃a. F(a) you need to provide an infinite family of functions, one for each possible type. In other words, you need to apply a function of type ∀a. F(a) -> B to it.
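For example, length works on any list regardless of element type, so it gives a perfectly good way to consume an Exist [] (a minimal sketch; the name existLength is mine):

> existLength :: Exist [] b -> Int
> existLength (Exist xs) = length xs

Note that existLength can count the elements but can never look inside them.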
Time for another proposition:
Proposition 5
∃a. a = 1
It seems weird at first that the sum of all types is 1. But once you put something into this type, you can no longer get any information about it back out again. If you try doing case analysis you have to provide a polymorphic function that accepts an argument of type ∀a. a, which is as good as saying you can't do any case analysis. Proposition 5 is actually a special case of the following:
Proposition 6
For any functor f, ∃a. (n^a, f(a)) = f(n)
Briefly, the reason is that the only thing you can do with a matching pair of n^a and f(a) is apply the former to the latter using fmap. This is a kind of dual to the Yoneda lemma and I say more about it here.
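Here's one way to render Proposition 6 in Haskell (the CoYoneda type and the names prop6 and prop6' are mine): the existential packs an f a together with a function a -> n, and fmap is the only way to combine them.

> data CoYoneda f n = forall a. CoYoneda (a -> n) (f a)

> prop6 :: Functor f => CoYoneda f n -> f n
> prop6 (CoYoneda g x) = fmap g x

> prop6' :: f n -> CoYoneda f n
> prop6' = CoYoneda id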
We already know from high school algebra that this is true:
Proposition 7
x^(y+z) = x^y.x^z.
We can write the isomorphisms explicitly:
> prop7 :: (Either a b -> c) -> (a -> c, b -> c)
> prop7 f = (f . Left, f . Right)
> prop7' :: (a -> c, b -> c) -> Either a b -> c
> prop7' = uncurry either
It should be no surprise that the following 'infinite' version is true as well:
Proposition 8
x^(∃a. f(a)) = ∀a. x^(f(a)).
We can write the isomorphism directly:
> prop8 :: (Exist f a -> x) -> forall a. f a -> x
> prop8 g x = g (Exist x)
> prop8' :: (forall a. f a -> x) -> Exist f a -> x
> prop8' g (Exist x) = g x
We're now equipped to start constructing generalized tries for polymorphic functions. So let's consider memoizing the type forall a. [a] -> f a, for f a functor. At first this looks hard. We have to memoize a function that can take as argument a list of any type. How can we build a trie if we don't know anything in advance about the type of a? The solution is straightforward. We follow Hinze in applying a bit of high school algebra along with some of the propositions from above. By definition, L(a) = [a] is a solution to the equation L(a) = 1+a.L(a). So we want to simplify ∀a. f(a)^(L(a)). We have
f(a)^(L(a)) = f(a)^(1+a.L(a)) = f(a).f(a)^(a.L(a)) = f(a).f(a)^(a+a^2.L(a)) = f(a).f(a)^a.f(a)^(a^2.L(a))
I hope you can see a bit of a pattern forming. Let's define T(n) = ∀a. f(a)^(a^n.L(a)). Then
T(n) = ∀a. f(a)^(a^n.(1+a.L(a))) = (∀a. f(a)^(a^n)).T(n+1) = f(n).T(n+1)
The middle step splits the ∀ across the product, and the final step is Proposition 4 applied to the first factor.
That's it! We can translate this definition directly into Haskell.
> data T f n = T (f n) (T f (Maybe n))
I'm using the fact that Maybe n is standard Haskell for the type n+1. (But note that this equality is only valid when we think of the list type as data, not codata. So like with Hinze's original tries, values at infinite lists don't get memoized.)
To build the isomorphism we need to trace through the steps in the derivation. At one point we used a^n + a^(1+n).L(a) = a^n.L(a) which we can implement as the pair of isomorphisms:
> lemma :: Either (n -> a) (Maybe n -> a, [a]) -> (n -> a, [a])
> lemma (Left f) = (f, [])
> lemma (Right (f, xs)) = (\n -> f (Just n),f Nothing : xs)
> lemma' :: (n -> a, [a]) -> Either (n -> a) (Maybe n -> a, [a])
> lemma' (f, []) = Left f
> lemma' (f, x:xs) = Right (maybe x f, xs)
We can put the other steps together with this to give:
> memoize :: Functor f => (forall a. (n -> a, [a]) -> f a) -> T f n
> memoize f = let x = prop7 (f . lemma)
> in T (yoneda (fst x)) (memoize (snd x))
> memoize' :: Functor f => T f n -> forall a. (n -> a, [a]) -> f a
> memoize' (T a b) = let y = (yoneda' a, memoize' b)
> in prop7' y . lemma'
Let's try a few examples. I'll use the identity functor for the first example.
> data I a = I a deriving Show
> instance Functor I where
> fmap f (I a) = I (f a)
Here's our first test function and some data to try it on:
> test1 (f, xs) = I $ if length xs>0 then head xs else f ()
> data1 = (const 1,[2,3,4])
In data1 we're using a function to represent a kind of 'head' before the main list. For the next example we're leaving the first element of the pair undefined so that data2 is effectively of list type:
> test2 (f, xs) = reverse xs
> data2 = (undefined,[1..10])
We can test them by building the memoized versions of these functions.
> memo1 = memoize test1 :: T I ()
> memo2 = memoize test2 :: T [] Void
and then apply them
> ex1 = memoize' memo1 data1
> ex2 = memoize' memo2 data2
It appears to work!
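To make 'works' a little more precise, here are two checks comparing the memoized functions against the originals (check1 and check2 are my names; assuming the isomorphisms round-trip, both should be True):

> check1, check2 :: Bool
> check1 = show ex1 == show (test1 data1)  -- via show, as I only derives Show; both sides should be I 2
> check2 = ex2 == test2 data2              -- both sides should be [10,9..1]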
So what's actually going on? We have
T(0) = f(0).T(1) = f(0).f(1).T(2) = ... = f(0).f(1).f(2).f(3)...
Now consider a function g :: [a] -> f a applied to a list. If the list isn't an infinite stream then it must have a certain length, say n. From these elements it builds something of type f a. However this f a is constructed, each of the elements of type a in it must be constructed from one of the n elements in the list. So if we apply g to the list [0,1,2,...,n-1] it will construct an element of f a where each a in it contains a label saying which position in the list it came from. (Compare with Neel's comment here). If we use integers we don't get a perfect trie because there are more elements of type f Integer than there are ways to indicate source positions. What we need is that for each length of list, n, we have a type with precisely n elements. And that's what the type n gives us.
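Here is that observation as code (the name labels is mine): feed a polymorphic g the list of positions [0..n-1] and each a in the result carries the position it came from.

> labels :: (forall a. [a] -> f a) -> Int -> f Int
> labels g n = g [0 .. n-1]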
We can memoize many different functions this way, though if the functor f is a function type you'll need to use some of Hinze's techniques to eliminate them. And you'll notice I haven't used all of the propositions above. I've tried to give some extra tools to allow people to memoize more types than just my two examples.
One last thing: I wouldn't use the type above to memoize in a real world application. But the methods above could be used to derive approximate tries that are efficient. One obvious example of an approximation would be to use Int instead of the finite types.
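For example, that approximation might look like the following sketch (TApprox is mine, not part of the derivation): each level stores f Int instead of f applied to a finite type, trading the perfect-trie property for simplicity.

> data TApprox f = TApprox (f Int) (TApprox f)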
Update: I forgot to provide one very important link. This post was inspired by Thorsten Altenkirch's post here.
9 Comments:
Great work! I will still need some time to fully digest this but...
Just a quick remark, in your definition of Exist:
data Exist f a = forall a. Exist (f a)
The type parameter 'a' is useless and misleading since it is different from the forall-bound one. I suggest:
data Exist f = forall a. Exist (f a)
What I don't understand is that you want to memoize forall a. [a] -> f a, and end up with memoization for forall a. (n -> a, [a]) -> f a
Sjoerd,
Maybe I should have said it more clearly, but for n=0, (n->a,[a]) -> f a is the same as [a] -> f a. I could add a line to exhibit that isomorphism.
Or maybe you meant the more philosophical question of "why when looking to memoize [a] -> f a are we led to a whole family of types?". Which I'm not sure how to answer.
Is there a reason why you don't include the (Functor f =>) constraint in the type of yoneda? It would make yoneda less general, but on the other hand would make it more symmetric with its inverse yoneda'.
Shin no Noir,
The constraint is missing only because I could accidentally get away with it without the compiler complaining.
I also think that an n=0 version would help in understanding the solution better.
> type T0 f = T f Void
> voidElim ∷ Void → ∀ a· a
> voidElim _ = error "voidElim: impossible"
> memoize0 ∷ Functor f ⇒ (∀ a· [a] → f a) → T0 f
> memoize0 f = memoize (λ(_ , xs) → f xs)
> memoize0' ∷ Functor f ⇒ T0 f → ∀ a· [a] → f a
> memoize0' t xs = memoize' t (voidElim, xs)
ertai,
Thanks for that.
It makes me grumpy when the Haskell compiler forces me to implement a function of type voidElim ∷ Void → ∀ a· a when it seems obvious to me that a complete implementation should take zero lines of code. :-)
This was great and challenging. I followed the link back to your old post about data and codata and a few more lights turned on upstairs. Kind of like skiing a blue run after a black. Thanks for continuing this blog!