A Neighborhood of Infinity: September 2007

Sunday, September 30, 2007

Arboreal Isomorphisms from Nuclear Pennies

The game of Nuclear Pennies is a game for one player played on a semi-infinite strip of sites, each of which may contain any non-negative number of pennies. The aim is to achieve a target configuration of pennies from a starting configuration by means of penny fusion and penny fission. In penny fission, a penny is split into a pair of pennies in the two neighbouring sites. In penny fusion, two pennies, separated by exactly one site, are fused into one penny in the intervening site. Obviously no fission or fusion take place at the very end site because there is no room for one of the fission products or fusion precursors. The following diagrams show some legal moves in Nuclear Pennies:

One using fission:

One using fusion:

Can you achieve this target configuration:

Starting from this configuration?

It's a surprisingly non-trivial task but I urge you to have a go. If you give up, here's a video solution:

So a natural question arises: what single penny configurations can be achieved from that starting position? As every move in Nuclear Pennies is reversible, if you solved the puzzle you know we can shift a single penny 6 places left or right. (But to move 6 spaces left we need one extra space on the left.) So labelling positions 0,1,2,... we know that starting from 7 we can place a penny on 6n+1 for all non-negative n. But what about other positions?

We can borrow a technique from M Reiss's seminal paper on peg solitaire: Beiträge zur Theorie des Solitär-Spiels, Crelles Journal 54 (1857) 344-379. The idea is to associate an algebraic value with each position and then ensure that each this is chosen so that each legal move leaves the value unchanged.

Label the sites as follows:

We consider the value of each penny to be the value of its site and to get the value of a position, add the values of the individual pennies. If we pick x to be one of the complex cube roots of -1 then we have x=x²+1, because x³+1=(1+x)(x²+1-x). We define a move to be paralegal if it doesn't change the value of a position. It should be clear that penny fusion and fission are both paralegal. In other words

legal ⇒ paralegal

and

paraillegal ⇒ illegal

Now consider the values of 1,x,x²,... and so on. It's not hard to show that the only time an element in this sequence takes the value 1 is when the exponent is 6. In fact, x is a sixth root of unity. So this shows we can never shift a single coin by anything other than a multiple of 6 sites. As the video shows how to shift by 6 sites we know we can shift a single coin by n sites if and only if n is a multiple of 6 (and there's one site available to the left of the leftmost penny). It's neat that we took an excursion through the field of complex numbers to prove something about a seemingly simple piece of discrete mathematics.

But who cares about nuclear coin reactions? That's not my real motivation here. Believe it or not, each coin shuffle above corresponds to an isomorphism between certain types and solving the puzzle above actually demonstrates a really neat isomorphism between tuples of data structures.

Go back to the algebra above. We have that legal ⇒ paralegal because coin reactions correspond to legal manipulations of x. Fissioning a coin represents replacing x by 1+x², for example. But the converse isn't true. We can manipulate complex numbers in all kinds of interesting ways that don't correspond to coin reactions. For example, we can write -x² but we aren't allowed negative numbers of coins in a site. But there's another algebraic structure which corresponds exactly to coin reactions. To see that, it's time to start writing Haskell code:



> import Prelude hiding (Left,Right)
> import Control.Monad
> import Test.QuickCheck

> data T = Leaf | Fork T T deriving (Eq,Show)

T is a simple binary tree type. That declaration simply says that in the algebra of types, T=T²+1. Now unlike complex numbers, when you form a polynomial in the type T you can't have anything other than a non-negative integer as a coefficient. In other words, positions in the game of Nuclear Pennies correspond precisely to polynomials, in the algebra of types, in T.

For example, consider the first move in the video solution I gave:

It corresponds to

T⁷->T⁸+T⁶

or in Haskell notation an isomorphism

(T,T,T,T,T,T,T) -> Either (T,T,T,T,T,T,T,T) (T,T,T,T,T,T)

So you know what I'm going to do next, I'm going to code up the entire solution as a Haskell program mapping seven trees to one and back again. You don't need to read all of the following, but I thought I'd put it here for completeness. So skip over the code to the end if you feel like it...

Firstly, Either is a bit tedious to write. So here's my own implementation that uses a slightly non-standard way to write a type constructor:



> data a :+ b = Left a | Right b

We know that the type algebra is commutative, so A+B=B+A. But that '=' sign is really an isomorphism, not equality, and we'll need that isomorphism explicitly:



> commute :: a :+ b -> b :+ a
> commute (Left a) = Right a
> commute (Right b) = Left b

Same goes for associativity:



> associate :: (a :+ b) :+ c -> a :+ (b :+ c)
> associate (Left (Left a)) = Left a
> associate (Left (Right b)) = Right (Left b)
> associate (Right c) = Right (Right c)

And associativity the other way. Here I'm starting a convention of using primes to represent the inverse of a function.



> associate' :: a :+ (b :+ c) -> (a :+ b) :+ c
> associate' (Left a) = Left (Left a)
> associate' (Right (Left b)) = Left (Right b)
> associate' (Right (Right c)) = Right c

For a fixed b, '+b' is a functor so f :: a -> b induces a map a+b -> c+b. Unfortunately I don't know an easy way to make '+b', as opposed to 'b+', a functor in Haskell. So I'll have to make do with this function that lifts f to a+b:



> liftLeft  :: (a -> c) -> a :+ b -> c :+ b
> liftLeft f (Left a) = Left (f a)
> liftLeft f (Right b) = Right b

I'm going to need all these.



> liftLeft2 = liftLeft . liftLeft
> liftLeft3 = liftLeft . liftLeft2
> liftLeft4 = liftLeft . liftLeft3
> liftLeft5 = liftLeft . liftLeft4

I'm going to represent tuples a little weirdly, though arguably this is much more natural than the usual way to write tuples. Tn corresponds to Tⁿ:



> type T0 = ()
> type T1 = (T,T0)
> type T2 = (T,T1)
> type T3 = (T,T2)
> type T4 = (T,T3)
> type T5 = (T,T4)
> type T6 = (T,T5)
> type T7 = (T,T6)
> type T8 = (T,T7)

And now we need functions to assemble and disassemble trees at the start of a tuple. These correspond to fusion and fission respectively in the game.



> assemble (Left x) = (Leaf,x)
> assemble (Right (a,(b,x))) = (Fork a b,x)

> assemble' (Leaf,x) = Left x
> assemble' (Fork a b,x) = Right (a,(b,x))

Here's the first step in the solution. That was easy.



> step1 :: T7 -> T6 :+ T8
> step1 = assemble'

It gets successively harder because we need to push the assemble' function down into the type additions:



> step2 :: T6 :+ T8 -> T5 :+ T7 :+ T8
> step2 = liftLeft assemble'

> step3 :: T5 :+ T7 :+ T8 -> T4 :+ T6 :+ T7 :+ T8
> step3 = liftLeft2 assemble'

> step4 :: T4 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8
> step4 = liftLeft3 assemble'

> step5 :: T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> step5 = liftLeft4 assemble'

> step6 :: T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8
> step6 = liftLeft5 assemble'

We're going to use a few functions whose sole purposes is to shuffle around type additions using commutativity and associativity.



> swap23 :: (a :+ b) :+ c -> (a :+ c) :+ b
> swap23 = commute . associate . liftLeft commute

> shuffle1 :: T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T4 :+ T3 :+ T5 :+ T6 :+ T7 :+ T8
> shuffle1 = liftLeft4 swap23

> shuffle2 :: (T1 :+ T4) :+ T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> shuffle2 = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23 . liftLeft4 commute

After the left-to-right sweep, here's the right-to-left sweep:



> step7 :: T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4)
> step7 = liftLeft4 assemble

> step8 :: T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T5 :+ T7 :+ T8 :+ (T1 :+ T4)
> step8 = liftLeft3 assemble

> step9 :: T5 :+ T7 :+ T8 :+ (T1 :+ T4) -> T6 :+ T8 :+ (T1 :+ T4)
> step9 = liftLeft2 assemble

> step10 :: T6 :+ T8 :+ (T1 :+ T4) -> T7 :+ (T1 :+ T4)
> step10 = liftLeft assemble

> shuffle3 :: T7 :+ (T1 :+ T4) -> T1 :+ T4 :+ T7
> shuffle3 = commute

And now we have another left-to-right sweep followed by a right-to-left sweep:



> step11 :: T1 :+ T4 :+ T7 -> T2 :+ T0 :+ T4 :+ T7
> step11 = liftLeft2 (commute . assemble')

> step12 :: T2 :+ T0 :+ T4 :+ T7 -> T3 :+ T1 :+ T0 :+ T4 :+ T7
> step12 = liftLeft3 (commute . assemble')

> step13 :: T3 :+ T1 :+ T0 :+ T4 :+ T7 -> T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step13 = liftLeft4 (commute . assemble')

> step14 :: T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7-> T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step14 = liftLeft5 (commute . assemble')

> shuffle4 :: T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4
> shuffle4 = liftLeft5 commute . liftLeft4 swap23 . liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> shuffle5 :: T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 -> T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> shuffle5 = liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> step15 :: T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step15 = liftLeft5 (assemble . commute)

> step16 :: T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T5 :+ T3 :+ T2 :+ T1 :+ T0
> step16 = liftLeft4 (assemble . commute)

> step17 :: T5 :+ T3 :+ T2 :+ T1 :+ T0 -> T4 :+ T2 :+ T1 :+ T0
> step17 = liftLeft3 (assemble . commute)

> step18 :: T4 :+ T2 :+ T1 :+ T0 -> T3 :+ T1 :+ T0
> step18 = liftLeft2 (assemble . commute)

> step19 :: T3 :+ T1 :+ T0 -> T2 :+ T0
> step19 = liftLeft (assemble . commute)

> step20 :: T2 :+ T0 -> T1
> step20 = assemble . commute

And put it all together:



> iso :: T7 -> T1
> iso = step20 . step19 . step18 . step17 . step16 . step15 . shuffle5 . shuffle4 . step14 .
>     step13 . step12 . step11 . shuffle3 . step10 . step9 . step8 . step7 .
>     shuffle2 . shuffle1 . step6 . step5 . step4 . step3 . step2 . step1

At this point I might as well write the inverse. This was mostly derived from the forward function using vim macros:



> step20' :: T1 -> T2 :+ T0
> step20' = commute . assemble'

> step19' :: T2 :+ T0 -> T3 :+ T1 :+ T0
> step19' = liftLeft (commute . assemble')

> step18' :: T3 :+ T1 :+ T0 -> T4 :+ T2 :+ T1 :+ T0
> step18' = liftLeft2 (commute . assemble')

> step17' :: T4 :+ T2 :+ T1 :+ T0 -> T5 :+ T3 :+ T2 :+ T1 :+ T0
> step17' = liftLeft3 (commute . assemble')

> step16' :: T5 :+ T3 :+ T2 :+ T1 :+ T0 -> T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step16' = liftLeft4 (commute . assemble')

> step15' :: T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
> step15' = liftLeft5 (commute . assemble')

> shuffle5' :: T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4
> shuffle5' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23

> shuffle4' :: T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 -> T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> shuffle4' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23 . liftLeft4 swap23 . liftLeft5 commute

> step14' :: T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
> step14' = liftLeft5 (assemble . commute)

> step13' :: T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T3 :+ T1 :+ T0 :+ T4 :+ T7 
> step13' = liftLeft4 (assemble .commute)

> step12' :: T3 :+ T1 :+ T0 :+ T4 :+ T7 -> T2 :+ T0 :+ T4 :+ T7 
> step12' = liftLeft3 (assemble . commute)

> step11' :: T2 :+ T0 :+ T4 :+ T7 -> T1 :+ T4 :+ T7 
> step11' = liftLeft2 (assemble . commute)

> shuffle3' :: T1 :+ T4 :+ T7 -> T7 :+ (T1 :+ T4) 
> shuffle3' = commute

> step10' :: T7 :+ (T1 :+ T4) -> T6 :+ T8 :+ (T1 :+ T4) 
> step10' = liftLeft assemble'

> step9' :: T6 :+ T8 :+ (T1 :+ T4) -> T5 :+ T7 :+ T8 :+ (T1 :+ T4) 
> step9' = liftLeft2 assemble'

> step8' :: T5 :+ T7 :+ T8 :+ (T1 :+ T4) -> T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) 
> step8' = liftLeft3 assemble'

> step7' :: T4 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) 
> step7' = liftLeft4 assemble'

> shuffle2' :: T3 :+ T5 :+ T6 :+ T7 :+ T8 :+ (T1 :+ T4) -> (T1 :+ T4) :+ T3 :+ T5 :+ T6 :+ T7 :+ T8 
> shuffle2' = liftLeft4 commute . liftLeft3 swap23 . liftLeft2 swap23 . liftLeft swap23 . swap23

> shuffle1' :: T1 :+ T4 :+ T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 
> shuffle1' = liftLeft4 swap23

> step6' :: T1 :+ T3 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 
> step6' = liftLeft5 assemble

> step5' :: T2 :+ T4 :+ T5 :+ T6 :+ T7 :+ T8 -> T3 :+ T5 :+ T6 :+ T7 :+ T8 
> step5' = liftLeft4 assemble

> step4' :: T3 :+ T5 :+ T6 :+ T7 :+ T8 -> T4 :+ T6 :+ T7 :+ T8 
> step4' = liftLeft3 assemble

> step3' :: T4 :+ T6 :+ T7 :+ T8 -> T5 :+ T7 :+ T8
> step3' = liftLeft2 assemble

> step2' :: T5 :+ T7 :+ T8 -> T6 :+ T8
> step2' = liftLeft assemble

> step1' :: T6 :+ T8 -> T7
> step1' = assemble

> iso' :: T1 -> T7
> iso' = step1' . step2' . step3' . step4' .
>     step5' . step6' . shuffle1' . shuffle2' . step7' .
>     step8' . step9' . step10' . shuffle3' . step11' .
>     step12' . step13' . step14' . shuffle4' . shuffle5' .
>     step15' . step16' . step17' . step18' . step19' . step20'

Enough already! Time to test the code. It's already pretty obvious that each of the steps above is invertible but let's use QuickCheck anyway. And here's a small puzzle: why to I have return Leaf twice?



> instance Arbitrary T where
>     arbitrary = oneof [return Leaf,return Leaf,liftM2 Fork arbitrary arbitrary]

> main = do
>     quickCheck $ \x -> x==iso (iso' x)
>     quickCheck $ \x -> x==iso' (iso x)

All done.

So time to think about what exactly we have here. We have a way to pack 7-tuples of trees into a single tree exactly. It's easy to store two trees in a tree say, just by making them the left and right branch of another tree. But then we get some 'wastage' because the isolated leaf doesn't correspond to any pair of trees. We have a perfect packing. Of course the set of trees and the set of 7-tuples of trees have the same cardinality, and it's not hard to find a bijection by enumerating these sets and laying them out side by side. But such a bijection could get arbitrarily complicated. But the isomorphism above only looks at the tops of the trees and can be executed lazily. It's a particularly nice isomorphism. And from what I showed above, you can only pack (6n+1)-tuples into a single tree. Weird eh?

And everything I say here is derived from the amazing paper Seven Trees in One by Andreas Blass. I've mentioned it a few times before but I've been meaning to implement the isomorphism explicitly for ages. Blass's paper also shows that you can, to some extent, argue paralegal ⇒ legal - something that's far from obvious.

Saturday, September 08, 2007

Tries and their Derivatives

This was going to be part two of three, but I'm now squashing two parts down to this one article. So this is the conclusion to what I started in my previous installment.

Previously I talked about certain things you could do with types built up recursively using addition and multiplication of types - the so-called regular tree types. But I didn't talk about types that could be built using ->, ie. function types. These types seem somehow different. The type Either a b contains data pertaining to an a or a b, and (a,b) contains data about an a and a b. But a -> b is something completely different, a function rather than a datastructure. What I want to show is that this distinction isn't so clear, and that in fact functions can often be replaced by an isomorphic datastructure. There's an obvious application for this: memoisation, and most of what I say can be found in a paper by Ralf Hinze on that subject. However, I have an ulterior motive - computing generalised antidiagonals, and my notation will be biased accordingly.

Firstly, I'll be using conventional mathematical exponentiation notation for function types, so I'll use A^B to mean the same thing as B -> A, ie. functions mapping from B to A. This will help to make some of the algebra seem more natural.

So let's start with a type like A -> (). We could also write this as 1^A. From ordinary algebra we might expect this to equal 1. Sure enough, there is (up to pointwise equality) only one function of type A -> (), and it's called const ().

Now consider A^B+C. Algebraically this is just A^BA^C. In fact, the isomorphism is given by the standard Haskell either function. So far we can take any type B, built from singletons and addition, and reduce A^B to an isomorphic type with no exponentiation. Our goal will be to generalise this - for any type B (well...as many as possible anyway) to find a type constructor T such that T[A] = A^B, and in such a way that T is defined without exponentiation. Such a T is called a (generalised) trie. Like we did for the antidiagonal we can express this through a multiparameter type class:



> class Trie a t | a -> t where
>   tabulate :: (a -> b) -> t b
>   apply :: t b -> (a -> b)

tabulate converts a function into a datastructure and apply performs the opposite.

Again we're going to have the same issues with the antidiagonal. Haskell can't build tries for us automatically, but various generic forms of Haskell can. I'll be using plain Haskell so we'll be doing a bit of unnecessary labour. (I did try Hinze's Generics for the Masses approach but couldn't get it to work in this context. I suspect it can be made to work with more effort.)

So let's use the same example as in my previous installment, Bool.



> data BoolTrie a = BoolTrie a a deriving Show

> instance Trie Bool BoolTrie where
>   tabulate f = BoolTrie (f False) (f True)
>   apply (BoolTrie f _) False = f
>   apply (BoolTrie _ t) True = t

It's an easy exercise to show that apply and tabulate are mutual inverses. The BoolTrie stores the two possible values that a function Bool -> a could take. But here's the cool thing: given an expensive to compute function, f, of type Bool -> X, tabulate f is a datastructure that tells us everything we need to know about f. apply (tabulate f) gives us back our function again, but note how it never needs to call f more than once for each argument. In other words, we can define



> memo :: Trie a t => (a -> b) -> (a -> b)
> memo = apply . tabulate

which automatically converts functions into a memoised form.

Again, mimicking antidiagonals, let's implement the trie of Either a b:



> data EitherTrie u v a = EitherTrie (u a) (v a)

> instance (Trie a u,Trie b v) => Trie (Either a b) (EitherTrie u v) where
>   apply (EitherTrie f _) (Left x) = apply f x
>   apply (EitherTrie _ g) (Right y) = apply g y
>   tabulate f = EitherTrie (tabulate (f . Left)) (tabulate (f . Right))

Next consider products. This is slightly subtler, but only slightly. We use A^BC=(A^C)^B. If we define U[X]=X^B and V[X]=X^C then A^BC=U[V[A]].



> data PairTrie u v a = PairTrie (u (v a))

> instance (Trie a u,Trie b v) => Trie (a,b) (PairTrie u v) where
>   apply (PairTrie f) (b,c) = apply (apply f b) c
>   tabulate f = PairTrie $ tabulate (\a -> tabulate (\b -> f (a,b)))

This all applies recursively. So let's try tackling boolean lists again. We have L = 1+2L. Define T_L[X]=X^L. So T_L[X]=X¹X^2L=X(X^L)²=X(T_L[X])². This gives a nice bona fide definition of T_L. Just as with antidiagonals, it's annoying to have to write this by hand. A good generic programming language should be able to build a BoolTrie from a [Bool] automatically:



> data BoolListTrie a = BoolListTrie a (BoolListTrie a) (BoolListTrie a)

> instance Trie [Bool] BoolListTrie where
>   apply (BoolListTrie n _ _) [] = n
>   apply (BoolListTrie _ f _) (False:bs) = apply f bs
>   apply (BoolListTrie _ _ t) (True:bs) = apply t bs
>   tabulate f = BoolListTrie (f []) (tabulate (f . (False:))) (tabulate (f. (True:)))

Here's a silly example putting this in action on a binary version of the Ackermann function:



> inc (False : bs) = True : bs
> inc (True : bs) = False : inc bs
> inc [] = [True]

> dec [True] = []
> dec (True : bs) = False : bs
> dec (False : bs) = True : dec bs

> ack [] n = inc n
> ack m [] | not (null m) = ack (dec m) [True]
> ack m n = ack (dec m) (ack m (dec n))

> ack' = curry (memo (uncurry ack))

Note how once you've used ack' on some arguments, it runs instantly when you reuse it on the same or smaller arguments.

Part 2

Tries are containers. So suppose T is the trie corresponding to the type X with T[A] isomorphic to A^X. What is T[1]? Well it's just 1^X which from basic algebra equals 1. Or do look at it another way, it's the type of all functions taking values in the type 1. There's only one such function and so clearly there is a unique object of type T[1] and T[1] must be isomorphic to 1. An element of '1' can be thought of as a type with no information in it. So if we have a container with elements of 1 in it, it's as if the slots for those elements have simply been closed off. So another way to look at T[1] is that it is a container with all of its slots closed. So for tries, T[1] must be equivalent to the trivial type 1.

At this point, derivatives of containers have become mainstream. So I don't need to talk about this and can direct you to the Haskell wikibook if you need to brush up.

So what do we get if we differentiate the trie T? T is a container that holds one value for each element of X. The derivative, T', is a trie with a 'hole' in it. In other words, it's a container that contains a value for all elements of X but one. So what's T'[1]? All of the slots in the container have been stuffed with 1 and so are effectively 'closed'. But the hole is still there and the whole hasn't been 'closed'. So a T'[1] is a trie where one slot has been singled out as a hole. But the slots are in one-to-one correspondence with elements of X, and so T'[1]=X. There's another way to see this. Define the function on the reals t(y)=y^x. Then t'(y)=xy^x-1 so t'(1)=x. So it should come as no surprise that T'[1]=X. See the Container Types blog for more dicussion of this. They call F'[1] by the name log F. Their Y can be seen as the operator that maps a type to its trie.

Now, let's go back to the antidiagonal again. I showed how to compute X², but you can see that it's tricky to use the same approach to extend this to Xⁿ, for arbitrary naturals n. Instead, consider differentiating T' to make another hole in the trie. T''[X] must be a trie of X's with two holes, but the second hole obviously can't be where the first hole was. So T''[1] is a datastructure that consists of nothing but two distinct holes in a T[1]. As if by magic, T''[1] must be X². There's another way to see this. If t(y)=y^x, then t''(y)=x(x-1)y^x-2 so t''(1)=x². And more generally, T⁽ⁿ⁾[1]=Xⁿ. And that's the solution!

So a programming language powerful enough to differentiate types automatically, and powerful enough to build tries automatically (and Type-Indexed Data Types describes implementations of both of these in Generic Haskell) allows us to automatically construct the type of n-tuples where all of the elements are distinct. I'll leave the code to you as an exercise. :-)

Let me at least do the algebra for simple binary trees.

B = 1+B²

Let T be the trie of B so T[X] is isomorphic to X^B.

X^B = X(X^B)^B

T[X] = XT[T[B]]

(Exercise: try to draw a picture of this kind of tree. Because of the nesting of T's it's not a 'regular' type and you'll find yourself rapidly needing a lot of paper!).

So now we get

T'[X] = T[T[X]]+XT'[T[X]]T'[X]
T'[1] = T[1]+XT'[1]T'[1] = 1+X(T'[1])²

Ie. T'[1]=B

T''[X] = T'[T[X]]T'[X]+T'[T[X]]T'[X]+XT''[T[X]]T'[X]²
+XT'[T[X]]T''[X]
T''[1] = 2B²+B²T''[1]+BT''[1]

And that's a perfectly respectable recursive definition for T''[1]. I'll leave you to write the code. (It does work - this is actually the point which I started from...)

Some random thoughts

You can think of differentiation as annihilating an element of a container, leaving only a hole as the 'trace' that it was there before. So it's pretty weird that annihilating a 1 from a T[1] actually creates an X. It's almost as if T[1] is some kind of vacuum and X is an anti-particle. Weirder still, when we do this twice we get two distinct X's. So it's as if T[1] is some kind of vacuum for fermionic X's. This isn't completely insane. In quantum field theory we use differential operators to create and annihilate particles.

I haven't written the code, but it ought to be possible to go further and define Xⁿ/n! for instances of Ord. This is the type of n-tuples where the elements are distinct and in order. I expect you can construct this using a similar recursive technique to the way I built up the antidiagonal.

I don't know how to get an overline with HTML so I can't use Knuth's notation for this. But I expect that for instances of Ord there's a similar method that can be used to define X(X+1)...(X+n-1)/n!. This is the type of non-decreasing n-tuples, or, equivalently, n-element multisets. I'm not sure, but I think this may conflict with what Abbott et al. say in their paper on Derivatives of Containers where they want to identify exp (X) with multisets. I think it should be the sum of X(X+1)...(X+n-1)/n!, not the sum of Xⁿ/n!. But as that's just a sketch in the final paragraph of the paper, maybe I shouldn't get too worried about it. (If you're still reading Conor...)

Also, many identities satisfied by falling factorial should correspond to isomorphisms implementable in Haskell. And I'm sure there must be an Umbral Calculus connection.

And an important final note: X -> X² is not a functor. So don't even think about differentiating it.

Appendix

Changed my mind. Here's some code for implementing X²/2!.

Call the set of pairs (a,a') with a<a' the subdiagonal. An element of (a,a) either has a lower first element, a lower second element, or the two elements are equal. This is expressed in the type Pair below.

I'll sketch some algebra. Write s(X) for the subdiagonal. So s(X) is a little like X(X-1)/2. It's not hard to see that

s(1) = 0
s(a+b) = s(a)+ab+s(b)

s(ab) = s(a)b²+as(b)

If L[X] is a list of X's, then

L[X]=1+X L[X]

s(L[X]) = XL[X]+s(X)L[X]+Xs(L[X])



> import Test.QuickCheck

> data Pair a u = Lower u | Diagonal a | Upper u deriving Show

> class Ord a => SubDiagonal a u | a -> u where
>     twine :: (a,a) -> Pair a u
>     untwine :: Pair a u -> (a,a)
>     twine' :: (a,a) -> u
>     untwine' :: u -> (a,a)
>     twine (a,b) = case compare a b of
>         LT -> Lower (twine' (a,b))
>         EQ -> Diagonal a
>         GT -> Upper (twine' (b,a))
>     untwine (Lower u) = untwine' u
>     untwine (Diagonal a) = (a,a)
>     untwine (Upper u) = uncurry (flip (,)) $ untwine' u

> data SubEither a b u v = BothLeft u | Diff a b | BothRight v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (Either a b) (SubEither a b u v) where
>     twine' (Left a,Left a') = BothLeft (twine' (a,a'))
>     twine' (Left a,Right b) = Diff a b
>     twine' (Right b,Right b') = BothRight (twine' (b,b'))
>     untwine' (BothLeft u) = let (a,a') = untwine' u in (Left a,Left a')
>     untwine' (Diff a b) = (Left a,Right b)
>     untwine' (BothRight u) = let (b,b') = untwine' u in (Right b,Right b')

> data SubPair a b u v = LeftDiffers u b b | LeftSame a v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (a,b) (SubPair a b u v) where
>     twine' ((a,b),(a',b')) | a/=a'      = LeftDiffers (twine' (a,a')) b b'
>                            | otherwise  = LeftSame a (twine' (b,b'))
>     untwine' (LeftDiffers u b b') = let (a,a') = untwine' u in ((a,b),(a',b'))
>     untwine' (LeftSame a v) = let (b,b') = untwine' v in ((a,b),(a,b'))

> instance SubDiagonal Bool () where
>     twine' (False,True) = ()
>     untwine' () = (False,True)

> data SubList a u = FirstDiffers u (Pair [a] (SubList a u))| FirstSame a (SubList a u) | LeftNil a [a] deriving Show

> instance (SubDiagonal a u) => SubDiagonal [a] (SubList a u) where
>     twine' ([],(a:as)) = LeftNil a as
>     twine' ((a:as),(b:bs)) | a/=b       = FirstDiffers (twine' (a,b)) (twine (as,bs))
>                            | otherwise  = FirstSame a (twine' (as,bs))
>     untwine' (FirstDiffers u v) = let (a,a') = untwine' u in let (as,bs) = untwine v in (a:as,a':bs)
>     untwine' (FirstSame a u) = let (bs,bs') = untwine' u in (a:bs,a:bs')
>     untwine' (LeftNil b bs) = ([],b:bs)

> type Natural = Integer

> instance SubDiagonal Natural (Natural,Natural) where
>     twine' (a,b) = (a,b-a)
>     untwine' (a,b) = (a,a+b)

> main = do
>     quickCheck (\(x,y) -> untwine (twine (x,y))==(x::Bool,y))
>     quickCheck (\(x,y) -> untwine (twine (x,y))==(x::[Bool],y))
>     quickCheck (\(x,y) -> untwine (twine (x,y))==(x::(Bool,[Bool]),y))
>     quickCheck (\(x,y) -> untwine (twine (x,y))==(x::([Bool],Natural),y))
>     quickCheck (\(x,y) -> untwine (twine (x,y))==(x::[[(Bool,Natural)]],y))

It's interesting to play with this a little and see what actually gets constructed. For example



*Main> twine ([1..10],[1,2,3,6,5,6,6,8,9,10,11::Integer])
Lower (FirstSame 1 (FirstSame 2 (FirstSame 3 (FirstDiffers (4,2) (Upper
(FirstSame 5 (FirstSame 6 (FirstDiffers (6,1) (Upper (FirstSame 8
(FirstSame 9 (FirstSame 10 (LeftNil 11 [])))))))))))))

Spans of values that are the same in both lists are represented with just one set of values. Individual differences are flagged as such. And of course the whole thing works recursively if you have lists of lists as in the last quickCheck example above.

A Closing Puzzle

Can you define the type of unordered binary trees? An unordered binary tree is either the empty tree, or an unordered pair of unordered binary trees. I'd like to say T=1+T(T+1)/2 = 1+T+T², but those can't be turned directly into a valid definition is Haskell.

I don't know if there's a solution.

Sunday, September 02, 2007

The Antidiagonal

In a programming language with constrained types we can construct a type like "the type of pairs of X's where the two X's are distinct". But can we make such a type in Haskell? Answering this question will take us on a long journey which I think I'm going to split over three parts. And as usual, just put this blog post into a file called main.lhs and you can run it with ghci -fglasgow-exts -fallow-undecidable-instances.

We can write our question a little more formally. Given a type X, can we form a type U with the property that

X² = U+X

The idea is that the = sign is an isomorphism with the property that the diagonal in X², ie. elements of the form (x,x), get mapped to the right component of U+X. When we are able to do this, we'll call U the antidiagonal of X, and say that X is splittable.

We can express the relationship between U and X through a multiparameter type class



> diagonal x = (x,x)

> class Eq x => AntiDiagonal u x | x -> u where
>   twine :: (x,x) -> Either x u
>   untwine :: Either x u -> (x,x)

>   twine' :: (x,x) -> u
>   untwine' :: u -> (x,x)

>   twine (x,y) | x==y      = Left x
>               | otherwise = Right $ twine' (x,y)

>   untwine (Left x)  = (x,x)
>   untwine (Right y) = untwine' y

The isomorphism between X² and X+U is given by twine and untwine. But to save writing similar code over and over again, and to ensure we really are mapping the diagonal of X² correctly, we define these in terms of twine' and untwine'. (Note that twine' is partial, it's only guaranteed to take a value off the diagonal.)

Just to get into the swing of things, here's a simple example:



> instance AntiDiagonal Bool Bool where
>   twine' (a,b) = a
>   untwine' a = (a,not a)

It's not hard to check that twine and untwine are mutual inverses and that twine . diagonal=Left. You can view this as a special case of 2²=2+2 as 2 is essentially a synonym for Bool.

It looks a lot like what we're trying to do is subtract types by forming X²-X or X(X-1). Much as I've enjoyed trying to find interpretations of conventional algebra and calculus in the context of the algebra of types, subtraction of types, in general, really doesn't make much sense. Consider what we might mean by X-1. If X=Bool then we could simply define



data Bool' = False'

This certainly has some of the properties you might expect of Bool-1, such as having only one instance. But it's not very natural. How should we embed this type back in Bool? There are two obvious ways of doing it and neither stands out as better than the other, and neither is a natural choice. But what we've shown above is that there is a natural way to subtract X from X² because a copy of X appears naturally in X² as the diagonal. So the question is, can we extend this notion to types beyond Bool?

How about a theorem. (I'll give a more intuitive explanation below.)

Theorem

In any commutative semiring fix a and b. If the equation a²=u+a has a solution, and the equation b²=v+b has a solution, then the equations (a+b)²=x+(a+b) and (ab)²=y+ab also have solutions.

Proof

Simply define

x = u+v+2ab

and

y = au+bv+uv.

QED

So if some types are splittable, so are their sums and products. As types form a commutative semiring we see that this theorem, twined with 1²=1+0 allows us to form "X(X-1)" for any type X built purely using non-recursive Haskell data declarations. In fact, we can use the above theorem to define "X(X-1)" for types, and use the notation X² for this. (I hope your browser shows that the exponent here is underlined.) There's a reason I use this notation which I'll get to later. So (a+b)²=a²+b²+2ab and (ab)²=ab²+ba²+a²b².

Here's a more intuitive explanation of that theorem above. Suppose u and v are both in Either a b. Then there are several ways they could be different to each other. For example, they could both be of the form Left _ in which case u = Left u' and v = Left v' and u' and v' must be distinct. They could be of the form Left u' and Right v' in which case it doesn't matter what u' and v' are. A little thought shows there are four distinct cases in total. These can be written in algebraic notation as u+v+ab+ba=u+v+2ab. To make this clearer, here's an implementation:



> data Either' a b u v = BothLeft u | BothRight v | LeftAndRight Bool a b

> instance (AntiDiagonal u a,AntiDiagonal v b) => AntiDiagonal (Either' a b u v) (Either a b) where
>   twine' (Left x,Left y)   = BothLeft (twine' (x,y))
>   twine' (Right x,Right y) = BothRight (twine' (x,y))
>   twine' (Left x,Right y)  = LeftAndRight False x y
>   twine' (Right x,Left y)  = LeftAndRight True y x

>   untwine' (BothLeft u) = let (a,b) = untwine' u in (Left a,Left b)
>   untwine' (BothRight v) = let (a,b) = untwine' v in (Right a,Right b)
>   untwine' (LeftAndRight False x y) = (Left x,Right y)
>   untwine' (LeftAndRight True x y) = (Right y,Left x)

A similar argument can be carried through for (a,b) leading to:



> data Pair' a b u v = LeftSame a v | RightSame u b | BothDiffer u v deriving (Eq,Show)

> instance (AntiDiagonal u a,AntiDiagonal v b) => AntiDiagonal (Pair' a b u v) (a,b) where
>   twine' ((a,b),(a',b')) | a==a'     = LeftSame a (twine' (b,b'))
>                          | b==b'     = RightSame (twine' (a,a')) b
>                          | otherwise = BothDiffer (twine' (a,a')) (twine' (b,b'))

>   untwine' (LeftSame a v)   = let (b,b') = untwine' v in ((a,b),(a,b'))
>   untwine' (RightSame u b)  = let (a,a') = untwine' u in ((a,b),(a',b))
>   untwine' (BothDiffer u v) = let (a,a') = untwine' u
>                                   (b,b') = untwine' v
>                                   in ((a,b),(a,b'))

This all works very well, but at this point it becomes clear that Haskell has some weaknesses. The type Either () () is isomorphic to Bool so we should be able to use the above code to construct the antidiagonal of Bool automatically. But Haskell doesn't give us access to that information. We can't ask, at runtime, if Bool is the sum of more primitive types. There are a number of solutions to this problem - we can use various more generic types of Haskell, or use Template Haskell. But I'm just going to stick with Haskell and manually construct the antidiagonal. (I wonder what Coq has to offer here.)

So I've solved the problem for 'finite' types built from singles, addition and multiplication. But what about recursive types. Before doing that, let's consider an approach to forming the antidiagonal of the naturals. Haskell has no natural type but let's pretend anyway



> type Natural = Integer

There's an obvious packing of a distinct pair of naturals as a pair of naturals:



> instance AntiDiagonal (Bool,Natural,Natural) Natural where
>   twine' (a,b) | compare a b == GT = (False,a-b,b)
>                | compare a b == LT = (True,b-a,a)
>   untwine' (False,d,b) = (b+d,b)
>   untwine' (True,d,a) = (a,a+d)

(I had to use compare to work around a blogger.com HTML bug!) It'd be cool if the code above could have been derived automatically, but alas it's not to be. But we can get something formally similar. Define the natural numbers like this:



> data N = Zero | S N deriving (Eq,Show)

A natural is zero or the successor of a natural. Algebraically this is just N=1+N. Now we wish to find N². Using 1²=0 and the earlier theorem we get M=1²+N²+2N, ie. M=M+2N. Let's code this up:



> data M = Loop M | Finish Bool N deriving Show

> instance AntiDiagonal M N where
>   twine' (Zero,S x) = Finish False x
>   twine' (S x,Zero) = Finish True x
>   twine' (S x,S y) = Loop (twine' (x,y))
>   untwine' (Finish False x) = (Zero,S x)
>   untwine' (Finish True x) = (S x,Zero)
>   untwine' (Loop m) = let (a,b) = untwine' m in (S a,S b)

Note that I've more or less just coded up the same thing as what I did for Either above. Can you see that that this is a disguised version of the code for the antidiagonal of Natural above? Think about the type N=1+N. We can view this as the set of paths through this finite state diagram, starting at N and ending at 1:

Essentially there's just one path for each natural number, and this number counts how many times you loop around. Now consider the same sort of thing with the type M=M+2N, starting at M and ending at 1:

We can describe such paths by the number of loops taken at M, the number of loops taken at N, and a Bool specifying whether we took path 0 or 1 from state 2 to state N. In other words, M is much the same thing as (Bool,Natural,Natural) above! M is a kind of 'compressed' version of a pair of N's. Suppose we want to twine S (S (S Zero)) and S (S (S (S Zero))). Both of these share a S (S (S _)) part. What the type M does is allow you to factor out this part (that's the part that goes into Loop) and the remainder is stored in the Finish part, with a boolean specifying whether it was the first or second natural that terminated first.

Let's step back or a second. Earlier I showed how for any type X, built from addition, multiplication and 1, we could form X². Now we've gone better, we can now form X² even for recursive types. (At least for data, not codata.) We haven't defined subtraction in general, but we have shown how to form X(X-1) in a meaningful way.

Let's try lists of booleans next, [Bool]. We can write this type as L=1+2L. Let's do the algebra above to find P = L²:

P = L²
= (1+2L)²
= 1²+(2L)²+4L
= (2L²+2²L+2²L²)+4L
= 2P+2P+2L+4L
= 4P+6L

In other words P = 4P+6L. The easiest thing is to code this up. Remember that each '=' sign in the above derivation corresponds to an isomorphism defined by twine and untwine so after lots of unpacking, and rewriting 4P+6L as 2L+2L+2P+2L+2P we get



> data SharedList = LeftNil Bool [Bool] | RightNil Bool [Bool]
>                   | HeadSame Bool SharedList | TailSame Bool [Bool]
>                   | Diff Bool SharedList deriving Show

> instance AntiDiagonal SharedList [Bool] where
>   twine' ([],b:t) = LeftNil b t
>   twine' (b:t,[]) = RightNil b t
>   twine' (a:b,a':b') | a==a' = HeadSame a (twine' (b,b'))
>                   | b==b' = TailSame a b
>                   | otherwise = Diff a (twine' (b,b'))

>   untwine' (LeftNil b t)  = ([],b:t)
>   untwine' (RightNil b t) = (b:t,[])
>   untwine' (HeadSame a b) = let (t1,t2) = untwine' b in (a:t1,a:t2)
>   untwine' (TailSame a b) = (a:b,not a:b)
>   untwine' (Diff a b)     = let (t1,t2) = untwine' b in (a:t1,not a:t2)

This looks pretty hairy, but it's really just a slight extension of the M=M+2N example. What's happening is that if two lists have the same prefix, then SharedList makes that sharing explicit. In other words this type implements a form of compression by factoring out shared prefixes. Unfortunately it took a bit of work to code that up. However, if we were programming in generic Haskell, the above would come absolutely for free once we'd defined how to handle addition and multiplication of types. What's more, it doesn't stop with lists. If you try it with trees you automatically get factoring of common subtrees and it works with any other datatype you can build from a Haskell data declaration (that doesn't use builtin primitive types like Int or Double).

So now I can say why I used the underlined superscript notation. It's the falling factorial. More generally Xⁿ=X(X-1)...(X-n+1). You may be able to guess what this actually means - it's an n-tuple with n distinct elements. Unfortunately, you can probably see that generalising the above theorem to n from 2 gets a bit messy. But (and this is what I really want to talk about) there's an amazing bit of calculus that allows you to define a generating function (more like generating functor) that gives you all of the Xⁿ in one go. But before I can talk about that I need to write a blog about generalised tries...

Here are some exercises.

(1) Can you code up the antidiagonal of binary boolean trees, T = 2+T²:



data BoolTree = Leaf Bool | Fork BoolTree BoolTree

(2) There are more efficient ways to define the naturals than through the successor function. Can you come up with a more efficient binary scheme and then code up its antidiagonal?

(3) The antidiagonal of the integers can be approximated by (Integer,Integer). This seems a bit useless - after all, the whole point of what I've written above is to split this up. But we can use this approximation to construct approximations of other types where you do get a payoff. Implement an approximation to [Integer]² this way so that you still get the benefit of prefix sharing. This looks a lot like traditional tries.

A Neighborhood of Infinity

Search This Blog