Saturday, September 08, 2007

Tries and their Derivatives

This was going to be part two of three, but I'm now squashing two parts down to this one article. So this is the conclusion to what I started in my previous installment.

Previously I talked about certain things you could do with types built up recursively using addition and multiplication of types - the so-called regular tree types. But I didn't talk about types that can be built using ->, i.e. function types. These types seem somehow different. The type Either a b contains data pertaining to an a or a b, and (a,b) contains data about an a and a b. But a -> b is something completely different: a function rather than a datastructure. What I want to show is that this distinction isn't so clear-cut, and that in fact functions can often be replaced by an isomorphic datastructure. There's an obvious application for this - memoisation - and most of what I say can be found in a paper by Ralf Hinze on that subject. However, I have an ulterior motive - computing generalised antidiagonals - and my notation will be biased accordingly.

Firstly, I'll be using conventional mathematical exponentiation notation for function types, so I'll write A^B to mean the same thing as B -> A, i.e. functions mapping from B to A. This will help to make some of the algebra seem more natural.

So let's start with a type like A -> (). We could also write this as 1^A. From ordinary algebra we might expect this to equal 1. Sure enough, there is (up to pointwise equality) only one function of type A -> (), and it's called const ().

Now consider A^(B+C). Algebraically this is just A^B × A^C. In fact, the isomorphism is given by the standard Haskell either function. So far we can take any type B, built from singletons and addition, and reduce A^B to an isomorphic type with no exponentiation. Our goal will be to generalise this - for any type B (well... as many as possible anyway) to find a type constructor T such that T[A] = A^B, and in such a way that T is defined without exponentiation. Such a T is called a (generalised) trie. Like we did for the antidiagonal, we can express this through a multiparameter type class:


> {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

> -- Test.QuickCheck is only needed by the appendix at the end.
> import Test.QuickCheck

> class Trie a t | a -> t where
>     tabulate :: (a -> b) -> t b
>     apply :: t b -> (a -> b)


tabulate converts a function into a datastructure and apply performs the opposite.
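
As a warm-up, here's the simplest possible instance, for the one-point type () (my own example, not from the original post): a function () -> b is determined by a single value, reflecting the algebraic fact that A^1 = A.

> -- The trie of the unit type just stores the function's only value.
> newtype UnitTrie b = UnitTrie b

> instance Trie () UnitTrie where
>     tabulate f = UnitTrie (f ())
>     apply (UnitTrie x) () = x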

Again, we're going to run into the same issues as we did with the antidiagonal. Haskell can't build tries for us automatically, but various generic forms of Haskell can. I'll be using plain Haskell, so we'll be doing a bit of unnecessary labour. (I did try Hinze's Generics for the Masses approach but couldn't get it to work in this context. I suspect it can be made to work with more effort.)

So let's use the same example as in my previous installment, Bool.


> data BoolTrie a = BoolTrie a a deriving Show

> instance Trie Bool BoolTrie where
>     tabulate f = BoolTrie (f False) (f True)
>     apply (BoolTrie f _) False = f
>     apply (BoolTrie _ t) True = t


It's an easy exercise to show that apply and tabulate are mutual inverses. The BoolTrie stores the two possible values that a function Bool -> a could take. But here's the cool thing: given an expensive-to-compute function f of type Bool -> X, tabulate f is a datastructure that tells us everything we need to know about f. apply (tabulate f) gives us back our function again, but note that it never needs to call f more than once for each argument. In other words, we can define


> memo :: Trie a t => (a -> b) -> (a -> b)
> memo = apply . tabulate


which automatically converts functions into a memoised form.
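
Here's a toy demonstration of my own (slowNot is a made-up example, artificially slowed down by a pointless sum):

> -- An expensive function from Bool...
> slowNot :: Bool -> Bool
> slowNot b = sum [1..10000000::Integer] `seq` not b

> -- ...and its memoised version. Because fastNot is a top-level value,
> -- the BoolTrie built by tabulate is shared between calls, so slowNot
> -- is evaluated at most once per argument.
> fastNot :: Bool -> Bool
> fastNot = memo slowNot

In GHCi, the first evaluation of fastNot True should take a noticeable pause; subsequent evaluations should be immediate.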

Again, mimicking antidiagonals, let's implement the trie of Either a b:


> data EitherTrie u v a = EitherTrie (u a) (v a)

> instance (Trie a u,Trie b v) => Trie (Either a b) (EitherTrie u v) where
>     apply (EitherTrie f _) (Left x) = apply f x
>     apply (EitherTrie _ g) (Right y) = apply g y
>     tabulate f = EitherTrie (tabulate (f . Left)) (tabulate (f . Right))



Next consider products. This is slightly subtler, but only slightly. We use A^(B×C) = (A^C)^B. If we define U[X] = X^B and V[X] = X^C, then A^(B×C) = U[V[A]].


> data PairTrie u v a = PairTrie (u (v a))

> instance (Trie a u,Trie b v) => Trie (a,b) (PairTrie u v) where
>     apply (PairTrie f) (b,c) = apply (apply f b) c
>     tabulate f = PairTrie $ tabulate (\a -> tabulate (\b -> f (a,b)))


This all applies recursively. So let's try tackling boolean lists again. We have L = 1+2L: a list is either empty, or one of two possible booleans followed by another list. Define T_L[X] = X^L. Then T_L[X] = X^(1+2L) = X^1 × X^(2L) = X × (X^L)^2 = X × (T_L[X])^2. This gives a nice bona fide definition of T_L. Just as with antidiagonals, it's annoying to have to write this by hand. A good generic programming language should be able to build a trie like the following BoolListTrie from [Bool] automatically:


> data BoolListTrie a = BoolListTrie a (BoolListTrie a) (BoolListTrie a)

> instance Trie [Bool] BoolListTrie where
>     apply (BoolListTrie n _ _) [] = n
>     apply (BoolListTrie _ f _) (False:bs) = apply f bs
>     apply (BoolListTrie _ _ t) (True:bs) = apply t bs
>     tabulate f = BoolListTrie (f []) (tabulate (f . (False:))) (tabulate (f . (True:)))


Here's a silly example putting this in action on a binary version of the Ackermann function, with natural numbers represented as little-endian lists of bits:


> inc (False : bs) = True : bs
> inc (True : bs) = False : inc bs
> inc [] = [True]

> dec [True] = []
> dec (True : bs) = False : bs
> dec (False : bs) = True : dec bs

> ack [] n = inc n
> ack m [] | not (null m) = ack (dec m) [True]
> ack m n = ack (dec m) (ack m (dec n))

> ack' = curry (memo (uncurry ack))


Note how once you've used ack' on some arguments, it runs instantly when you reuse it on the same arguments. (The recursive calls inside ack go through the unmemoised ack, so only the top-level calls get cached; to memoise the inner calls too, ack would have to be written in terms of ack'.)
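
If you'd rather not write binary numerals by hand, here are two small helpers of my own (toBits and fromBits are hypothetical names, not from the post) for converting to and from the little-endian representation used above:

> -- Little-endian binary expansion of a non-negative integer.
> toBits :: Integer -> [Bool]
> toBits 0 = []
> toBits n = odd n : toBits (n `div` 2)

> -- The inverse: rebuild the integer from its bits.
> fromBits :: [Bool] -> Integer
> fromBits = foldr (\b n -> (if b then 1 else 0) + 2*n) 0

For example, fromBits (ack' (toBits 2) (toBits 3)) evaluates the memoised Ackermann function at (2,3).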


Part 2



Tries are containers. So suppose T is the trie corresponding to the type X, with T[A] isomorphic to A^X. What is T[1]? Well, it's just 1^X, which from basic algebra equals 1. Or, to look at it another way, it's the type of all functions taking values in the type 1. There's only one such function, so there is a unique object of type T[1], and T[1] must be isomorphic to 1. An element of 1 can be thought of as a value with no information in it. So if we have a container with elements of 1 in it, it's as if the slots for those elements have simply been closed off. In other words, T[1] is a container with all of its slots closed - which, for tries, must be equivalent to the trivial type 1.
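
For instance (my example), the only inhabitant of BoolTrie (), up to isomorphism, is the trie with both slots closed off:

> -- A BoolTrie with every slot 'closed': it carries no information.
> closedBoolTrie :: BoolTrie ()
> closedBoolTrie = tabulate (const ())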

At this point, derivatives of containers have become mainstream. So I don't need to talk about this and can direct you to the Haskell wikibook if you need to brush up.

So what do we get if we differentiate the trie T? T is a container that holds one value for each element of X. The derivative, T', is a trie with a 'hole' in it. In other words, it's a container that holds a value for every element of X but one. So what's T'[1]? All of the slots in the container have been stuffed with 1 and so are effectively 'closed'. But the hole is still there, and the hole hasn't been 'closed'. So a T'[1] is a trie where one slot has been singled out as a hole. But the slots are in one-to-one correspondence with the elements of X, and so T'[1] = X. There's another way to see this. Define the function on the reals t(y) = y^x. Then t'(y) = x×y^(x-1), so t'(1) = x. So it should come as no surprise that T'[1] = X. See the Container Types blog for more discussion of this. They call F'[1] by the name log F. Their Y can be seen as the operator that maps a type to its trie.
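
To make this concrete, here's the derivative of BoolTrie worked out by hand (my sketch, not code from the post). BoolTrie is T[X] = X^2, so T'[X] = 2×X: the hole sits in one of the two slots, and the other slot keeps its value.

> -- T'[X] = 2*X for the trie of Bool: a hole in one slot, a value in the other.
> data BoolTrie' a = HoleAtFalse a | HoleAtTrue a

> -- Stuffing the remaining slot with () leaves only the hole's position,
> -- and positions correspond exactly to elements of Bool: T'[1] = Bool.
> position :: BoolTrie' () -> Bool
> position (HoleAtFalse ()) = False
> position (HoleAtTrue ()) = True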

Now, let's go back to the antidiagonal again. I showed how to compute the antidiagonal X^2, the type of pairs of distinct elements of X, but you can see that it's tricky to use the same approach to extend this to n-tuples of distinct elements for arbitrary naturals n. Instead, consider differentiating T' to make another hole in the trie. T''[X] must be a trie of X's with two holes, but the second hole obviously can't be where the first hole was. So T''[1] is a datastructure that consists of nothing but two distinct holes in a T[1]. As if by magic, T''[1] must be the antidiagonal X^2. There's another way to see this. If t(y) = y^x, then t''(y) = x(x-1)×y^(x-2), so t''(1) = x(x-1) - the falling factorial, counting ordered pairs of distinct elements. More generally, T^(n)[1] is the type of n-tuples of distinct elements of X, just as t^(n)(1) = x(x-1)...(x-n+1). And that's the solution!
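
Continuing the hand-worked BoolTrie example from above (again my sketch): differentiating T'[X] = 2×X gives T''[X] = 2, and sure enough the two inhabitants of T''[1] correspond exactly to the two ordered pairs of distinct booleans.

> -- T''[X] = 2: two holes can be punched into a BoolTrie in just two ways.
> data BoolTrie'' a = FirstHoleAtFalse | FirstHoleAtTrue

> distinctBools :: BoolTrie'' () -> (Bool,Bool)
> distinctBools FirstHoleAtFalse = (False,True)
> distinctBools FirstHoleAtTrue = (True,False)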

So a programming language powerful enough to differentiate types automatically, and powerful enough to build tries automatically (and Type-Indexed Data Types describes implementations of both of these in Generic Haskell) allows us to automatically construct the type of n-tuples where all of the elements are distinct. I'll leave the code to you as an exercise. :-)

Let me at least do the algebra for simple binary trees.

B = 1 + B^2

Let T be the trie of B, so T[X] is isomorphic to X^B.

X^B = X^(1+B^2) = X × (X^B)^B

T[X] = X × T[T[X]]

(Exercise: try to draw a picture of this kind of tree. Because of the nesting of T's it's not a 'regular' type and you'll find yourself rapidly needing a lot of paper!).
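
In code, the trie comes out as a nested (non-regular) datatype. This is my transcription of T[X] = X × T[T[X]], not code from the post; note that tabulateTree needs polymorphic recursion, so its type signature is mandatory:

> data Tree = Leaf | Branch Tree Tree

> -- T[X] = X * T[T[X]]: a value for Leaf and, for Branch l r, a trie
> -- indexed by l whose entries are tries indexed by r.
> data TreeTrie x = TreeTrie x (TreeTrie (TreeTrie x))

> applyTree :: TreeTrie x -> Tree -> x
> applyTree (TreeTrie l _) Leaf = l
> applyTree (TreeTrie _ b) (Branch l r) = applyTree (applyTree b l) r

> tabulateTree :: (Tree -> x) -> TreeTrie x
> tabulateTree f =
>     TreeTrie (f Leaf)
>              (tabulateTree (\l -> tabulateTree (\r -> f (Branch l r))))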

So now we get

T'[X] = T[T[X]] + X × T'[T[X]] × T'[X]
T'[1] = T[1] + T'[1] × T'[1] = 1 + (T'[1])^2

Ie. T'[1] = B, since T'[1] satisfies the same equation as B.

T''[X] = T'[T[X]] × T'[X] + T'[T[X]] × T'[X] + X × T''[T[X]] × (T'[X])^2 + X × T'[T[X]] × T''[X]
T''[1] = 2B^2 + B^2 × T''[1] + B × T''[1]

And that's a perfectly respectable recursive definition for T''[1]. I'll leave you to write the code. (It does work - this is actually the point I started from...)
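
For what it's worth, here's a direct transcription of that recursion into a datatype (my sketch; the constructor names just label the three summands, and I'll still leave tabulate and apply to you):

> -- T''[1] = 2B^2 + B^2*T''[1] + B*T''[1], one constructor per summand,
> -- with 2 represented as Bool and B as the Tree type from the sketch above.
> data TreePair = Summand1 Bool Tree Tree
>               | Summand2 Tree Tree TreePair
>               | Summand3 Tree TreePair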


Some random thoughts



You can think of differentiation as annihilating an element of a container, leaving only a hole as the 'trace' that it was there before. So it's pretty weird that annihilating a 1 from a T[1] actually creates an X. It's almost as if T[1] is some kind of vacuum and X is an anti-particle. Weirder still, when we do this twice we get two distinct X's. So it's as if T[1] is some kind of vacuum for fermionic X's. This isn't completely insane. In quantum field theory we use differential operators to create and annihilate particles.

I haven't written the code, but it ought to be possible to go further and define X^n/n! for instances of Ord. This is the type of n-tuples where the elements are distinct and in order. I expect you can construct this using a similar recursive technique to the way I built up the antidiagonal.

I don't know how to get an overline with HTML so I can't use Knuth's notation for this. But I expect that for instances of Ord there's a similar method that can be used to define X(X+1)...(X+n-1)/n!. This is the type of non-decreasing n-tuples or, equivalently, n-element multisets. I'm not sure, but I think this may conflict with what Abbott et al. say in their paper on Derivatives of Containers, where they want to identify exp(X) with multisets. I think it should be the sum of X(X+1)...(X+n-1)/n!, not the sum of X^n/n!. But as that's just a sketch in the final paragraph of the paper, maybe I shouldn't get too worried about it. (If you're still reading Conor...)

Also, many identities satisfied by the falling factorial should correspond to isomorphisms implementable in Haskell. And I'm sure there must be an Umbral Calculus connection.

And an important final note: X -> X^2, the map sending a type to its antidiagonal, is not a functor (a non-injective function can send a pair of distinct elements to a pair of equal ones). So don't even think about differentiating it.


Appendix


Changed my mind. Here's some code for implementing X^2/2!.

Call the set of pairs (a,a') with a < a' the subdiagonal. An element of (a,a) either has a lower first element, or a lower second element, or the two elements are equal. This is expressed in the type Pair below.

I'll sketch some algebra. Write s(X) for the subdiagonal of X. So s(X) is a little like X(X-1)/2. It's not hard to see that

s(1) = 0
s(a+b) = s(a) + a×b + s(b)
s(a×b) = s(a)×b^2 + a×s(b)

If L[X] is a list of X's, then

L[X] = 1 + X×L[X]
s(L[X]) = X×L[X] + s(X)×L[X]^2 + X×s(L[X])



> data Pair a u = Lower u | Diagonal a | Upper u deriving Show

> class Ord a => SubDiagonal a u | a -> u where
>     twine :: (a,a) -> Pair a u
>     untwine :: Pair a u -> (a,a)
>     twine' :: (a,a) -> u
>     untwine' :: u -> (a,a)
>     twine (a,b) = case compare a b of
>                       LT -> Lower (twine' (a,b))
>                       EQ -> Diagonal a
>                       GT -> Upper (twine' (b,a))
>     untwine (Lower u) = untwine' u
>     untwine (Diagonal a) = (a,a)
>     untwine (Upper u) = uncurry (flip (,)) $ untwine' u

> data SubEither a b u v = BothLeft u | Diff a b | BothRight v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (Either a b) (SubEither a b u v) where
>     twine' (Left a,Left a') = BothLeft (twine' (a,a'))
>     twine' (Left a,Right b) = Diff a b
>     twine' (Right b,Right b') = BothRight (twine' (b,b'))
>     untwine' (BothLeft u) = let (a,a') = untwine' u in (Left a,Left a')
>     untwine' (Diff a b) = (Left a,Right b)
>     untwine' (BothRight u) = let (b,b') = untwine' u in (Right b,Right b')

> data SubPair a b u v = LeftDiffers u b b | LeftSame a v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (a,b) (SubPair a b u v) where
>     twine' ((a,b),(a',b')) | a/=a' = LeftDiffers (twine' (a,a')) b b'
>                            | otherwise = LeftSame a (twine' (b,b'))
>     untwine' (LeftDiffers u b b') = let (a,a') = untwine' u in ((a,b),(a',b'))
>     untwine' (LeftSame a v) = let (b,b') = untwine' v in ((a,b),(a,b'))

> instance SubDiagonal Bool () where
>     twine' (False,True) = ()
>     untwine' () = (False,True)

> data SubList a u = FirstDiffers u (Pair [a] (SubList a u)) | FirstSame a (SubList a u) | LeftNil a [a] deriving Show

> instance (SubDiagonal a u) => SubDiagonal [a] (SubList a u) where
>     twine' ([],(a:as)) = LeftNil a as
>     twine' ((a:as),(b:bs)) | a/=b = FirstDiffers (twine' (a,b)) (twine (as,bs))
>                            | otherwise = FirstSame a (twine' (as,bs))
>     untwine' (FirstDiffers u v) = let (a,a') = untwine' u in let (as,bs) = untwine v in (a:as,a':bs)
>     untwine' (FirstSame a u) = let (bs,bs') = untwine' u in (a:bs,a:bs')
>     untwine' (LeftNil b bs) = ([],b:bs)

> type Natural = Integer

> instance SubDiagonal Natural (Natural,Natural) where
>     twine' (a,b) = (a,b-a)
>     untwine' (a,b) = (a,a+b)

> main = do
>     quickCheck (\(x,y) -> untwine (twine (x,y)) == (x::Bool,y))
>     quickCheck (\(x,y) -> untwine (twine (x,y)) == (x::[Bool],y))
>     quickCheck (\(x,y) -> untwine (twine (x,y)) == (x::(Bool,[Bool]),y))
>     quickCheck (\(x,y) -> untwine (twine (x,y)) == (x::([Bool],Natural),y))
>     quickCheck (\(x,y) -> untwine (twine (x,y)) == (x::[[(Bool,Natural)]],y))


It's interesting to play with this a little and see what actually gets constructed. For example


*Main> twine ([1..10],[1,2,3,6,5,6,6,8,9,10,11::Integer])
Lower (FirstSame 1 (FirstSame 2 (FirstSame 3 (FirstDiffers (4,2) (Upper
(FirstSame 5 (FirstSame 6 (FirstDiffers (6,1) (Upper (FirstSame 8
(FirstSame 9 (FirstSame 10 (LeftNil 11 [])))))))))))))


Spans of values that are the same in both lists are represented with just one copy of the values. Individual differences are flagged as such. And of course the whole thing works recursively if you have lists of lists, as in the last quickCheck example above.


A Closing Puzzle


Can you define the type of unordered binary trees? An unordered binary tree is either the empty tree, or an unordered pair of unordered binary trees. I'd like to say T = 1 + T(T+1)/2, i.e. 2T = 2 + T + T^2, but those can't be turned directly into a valid definition in Haskell.

I don't know if there's a solution.

Comments:

  1. I have some code that implements a generic trie using Hinze's Generics for the Masses technique.

    I've posted a variation that supports apply/tabulate. (The original wouldn't work because it's defined for partial maps, i.e. apply would have type Trie a t => t b -> a -> Maybe b.)

  2. zednenem,

    Cool! When I get some time I'll try to adapt it to the anti- or sub-diagonal.

  3. To obtain overlines in HTML:
    <span style="text-decoration:overline">your-text-here</span>

  4. Can you do something even simpler...unordered pairs (over a set; that is, not a recursive thing)?

    I have trouble seeing this in terms of an algebra (er..I can't), but it is certainly 'computable' (one can rank and unrank these kinds of pairs).

    Of course, once you can do arbitrary pairs you can use it recursively.

    Really, I think the best you can do is say:

    X = 1 + unorderedpair(X)

    where 'unordered pair' is a blackbox that (somehow) computes the set of unordered pairs of X. google for Flajolet, Salvy, and van Cutsem, or look at the maple package for combinatorial structures.

  5. I'm sorry for commenting on this so late in the "game" - what can I say, I'm a slow reader.

    Anyway, isn't that 'tabulate' function in the same form as Yoneda's Lemma? Or am I missing something (not surprising)?

  6. BlackMeph: actually, to reply even LATER in the game:

    Another way to look at the tabulate/apply functions is that they indicate that a Trie is a representable functor.

    http://hackage.haskell.org/packages/archive/category-extras/0.53.5/doc/html/Control-Functor-Representable.html

    http://en.wikipedia.org/wiki/Representable_functor
