Friday, January 26, 2007

The Monads Hidden Behind Every Zipper

Uustalu points out that behind every zipper lies a comonad. I used this in my cellular automaton example. What he doesn't mention is that there is also a monad lurking behind every zipper and that it arises in a natural way. The catch is that it's a monad in the wrong category making it tricky to express in Haskell.

Firstly, what does "wrong category" mean? In the category of vector spaces, every object is equipped with a commutative binary + operator. We can make no such assumption in Haskell. What's more, Haskell doesn't easily allow us to restrict type constructors to instances of Num say. So we can't easily implement monads in the category of vector spaces as instances of Haskell's Monad. Nonetheless, we can always write our own bind and return constrained by Num, and that's what I'll do here. I believe there are workarounds that don't involve GADT abuse but really I just want to illustrate an idea, not write a complete working application.

As I've mentioned on #haskell, convolution is comonadic. A convolution (more specifically, a discrete convolution) is a function that maps a (doubly infinite) sequence of real numbers to another such sequence of real numbers in such a way that each number in the new sequence is a weighted sum of numbers in the old sequence and the weights depend only on the relative positions of the numbers. For example: replacing each number in the sequence with the average of it and its neighbours is a convolution.

Here's a simple example. Consider the sequence ...0,0,1,2,3,0,0,... We'll replace each number with the sum of its neighbours and twice itself. So this maps to ...,0,0+0+1,0+2*1+2,1+2*2+3,2+2*3+0,3+2*0+0,0,... The structure of this computation is very similar to the structure of cellular automaton evaluation and we can reuse that code here:

> data Zipper a = Zipper [a] [a] deriving (Eq,Show)

> left (Zipper (a:as) bs) = Zipper as (a:bs)
> right (Zipper as (b:bs)) = Zipper (b:as) bs

> class Comonad w where
> coreturn :: w a -> a
> cobind :: (w a -> b) -> w a -> w b

> iterate1 f x = tail $ iterate f x

> instance Comonad Zipper where
> cobind f a = fmap f $ Zipper (iterate1 left a) (iterate right a)
> coreturn (Zipper _ (b:_)) = b

> a = Zipper (repeat 0) ([0,1,2,3]++repeat 0)
> f (Zipper (a:_) (b:c:_)) = a+2*b+c

> test = let Zipper u v = cobind f a in take 5 v

When we run test we get [1,4,8,8,3].

Now think about the structure of this computation. We write a function f that gathers up the value at a cell and its neighbours and computes the weighted sum. cobind then applies this function all the way through our sequence to get our result. The important thing is that f pulls in the data it needs, distilling it down to one value, and that's why it has type Zipper a -> a.

There's another approach to convolution. We could push or scatter data. At each point in the sequence we could take the value and push one copy left, one copy right, and leave double the original value in place. Each cell will end up with three values left in it and all we have to do is sum the values deposited in a cell. What we want therefore is to take a value in a cell, of type a, and generate an entire zipper full of values pushed from it, ie. a Zipper a. We'd then like bind for the monad to sum over the returned zippers, staggering them right and left as appropriate. Here's a table illustrating what I mean:

0 -> 0 0 0 0 0 0
1 -> 0 1 2 1 0 0
2 -> 0 0 2 4 2 0
3 -> 0 0 0 3 6 3
0 -> 0 0 0 0 0 0
0 1 4 8 8 3

Unfortunately, as I pointed out above, bind for a monad can't perform summations, so I have to define bind' and return' instead.

> plus (Zipper a b) (Zipper a' b') = Zipper (plus' a a') (plus' b b') where
> plus' (a:as) (b:bs) = (a+b) : plus' as bs
> plus' a [] = a
> plus' [] a = a

Note that I'm going to assume that elements past the end of a list (so to speak) actually represent zero and define new versions of left and right that produce the required zeros on demand:

> left' (Zipper (a:as) bs) = Zipper as (a:bs)
> left' (Zipper [] bs) = Zipper [] (0:bs)
> right' (Zipper as (b:bs)) = Zipper (b:as) bs
> right' (Zipper as []) = Zipper (0:as) []

> tail' [] = []
> tail' a = tail a
> stagger f [] = []
> stagger f (x:xs) = x : map f (stagger f xs)
> stagger1 f x = tail' (stagger f x)

> instance Functor Zipper where
> fmap f (Zipper a b) = Zipper (map f a) (map f b)

Here's our fake monad:

> return' a = Zipper [] [a]
> bind' f x = let Zipper a b = fmap f x
> in foldl1 plus (stagger left' b ++ stagger1 right' a)

> a' = Zipper [] [0,1,2,3]
> f' x = Zipper [x] [2*x,x]

> test' = let Zipper u v = bind' f' a' in take 5 v

Run test' and we get the same result as before by a quite different method.

You should be able to see that in some sense comonads describe lazy computations and monads represent strict ones. In strict languages you evaluate your arguments first, and then push them into a function. In lazy languages you call the function first, and it pulls in the value of its arguments as needed. In fact, in Haskell we have a strictness monad and in ocaml you can implement a lazy comonad. By and large, it's easier to write pull algorithms in Haskell. Push algorithms are much easier to implement in a language with side effects. In fact, one of the things that Haskell beginners need to learn is how to rewrite push algorithms as pull algorithms and eliminate those side effects.

What I think is interesting here is that this dichotomy between push and pull, scatter and gather, is pervasive all the way through computing, and it's quite pretty that this deep notion it can be captured nicely by a(n almost) monad and a comonad. Just to give an illustration of how pervasive it is consider 3D rendering. We typically want to take a list of primitives and turn it into an array of pixels. If we rasterise we push the primitives, asking for each primitive, what pixels does it cover. If we ray-trace we pull on the pixels asking, for each one, what primitives cover it. Many optimisations to both of these approaches consist of making judicious decisions about which part of the pipeline should be switched from push to pull or vice versa.

BTW I have an algorithm for automatically converting some types of pull algorithm into push algorithms and vice versa. It's another one of those papers that I keep meaning to finish and submit for publication one day...

Saturday, January 20, 2007

Fox's Ubiquitous Free Derivative pt. 2

As promised earlier, here are some Haskell definitions of derivatives that can be applied to regular expressions. They all give operators that might occasionally be of practical use. But what's more interesting to me is that we now have enough of an intuition pump going between types and regular expression that we can siphon off ideas from papers on types and use them with regexps.

First define a regular expression type:

> import List
> data RE a = Symbol a | Star (RE a) | Sum [RE a] | Prod [RE a]
> deriving (Eq,Show,Ord)

I use lists instead of binary operators to represent products and sums because
addition and multiplication of regular expressions is associative.

> a <*> b = Prod [a,b]
> a <+> b = Sum [a,b]

> zero = Sum []
> one = Prod []

Here's how to make a simple regular expression from a string

> re x = Prod $ map Symbol x

Exercise: Write a proper parser for these things so we can use conventional regexp notation!

Here's a test to see whether a regular expression matches the empty string:

> acceptsEmpty (Symbol _) = False
> acceptsEmpty (Star _) = True
> acceptsEmpty (Sum l) = or $ map acceptsEmpty l
> acceptsEmpty (Prod l) = and $ map acceptsEmpty l

I've decided to name these derivatives after people. So first I define the McBride derivative which is a generalisation of the dissection operator introduced in Clowns to the left.... The key idea is that if d = mcbride c j a then d (x <*> y) = (d x <*> j y) <+> (c x <*> d y). This needs to respect the associativity of <*> and one way to achieve this is by choosing c and j that are multiplicative homomorphisms so that, for example, c(x <*> y) = c x <*> c y.

> mcbride _ _ a (Symbol a') = if a==a' then one else zero
> mcbride c j a (Sum p) = Sum $ map (mcbride c j a) p
> mcbride _ _ _ (Prod []) = zero
> mcbride c j a (Prod x) = Sum $ map f (splits x) where
> f (x,y:ys) = Prod $ map c x ++ [mcbride c j a y] ++ map j ys
> mcbride c j a (Star p) = c (Star p) <*> mcbride c j a p <*> j (Star p)

And now I can define all the Leibniz, Fox and Brzozowski derivatives as special cases of the mcbride derivative. The latter each come in left- and right-handed pairs.

> delta p = if acceptsEmpty p then one else zero

> leibniz,fox_l,fox_r,brzozowski_l :: (Ord a,Eq a) => a -> RE a -> RE a
> leibniz a p = simplify $ mcbride id id a p
> fox_l a p = simplify $ mcbride id (const (one)) a p
> fox_r a p = simplify $ mcbride (const (one)) id a p
> brzozowski_l a p = simplify $ mcbride delta id a p
> brzozowski_r a p = simplify $ mcbride id delta a p

brzozowski_l is the usual Brzozowski derivative. If r matches some language, brzozowski_l a t matches the sublanguage of strings beginning with a and with that a removed. We can define regular expression matching in the usual way in terms of it:

> matches re s = acceptsEmpty (matches' re s) where
> matches' re [] = re
> matches' re (a:as) = matches' (brzozowski_l a re) as

Now we can start matching.

ex1 would normally be written as "abc|b*c".

> ex1 = (re "abc") <+> (Star (re "b") <*> Symbol 'c')
> test1 = map (matches ex1) ["abc","bbc","bbbbc","abbbbc"]

brzozowski_r chews off a symbol from the right:

> test2 = map (matches (brzozowski_r 'c' ex1)) ["abc","bbc","ab","bbbbb"]

leibniz a allows you to leave out one a from anywhere in the regexp.

ex3 matches "abracadabra" as it might be shouted by a wizard who might lengthen the second last 'a' to make sure the spirits can hear. ex4 matches such strings with precisely one missing 'r'.

ex3 would normally be written as "abracada*bra".

> ex3 = re "abracad" <*> Star (re "a") <*> re "bra"
> test3 = map (matches ex3) ["abracadabra","abracadaaaaabra"]
> ex4 = leibniz 'r' ex3
> test4 = map (matches ex4) ["abracadabra","abacadabra","abracaaaaadaba","abacadaba"]

fox_l a makes regular expressions accept any prefix ending just before an a. Combining fox_l and fox_r allows you to search for all elements of a language just between a chosen pair of symbols.

In ex5 we build a regexp for certain types of statement in a simple language and then pull out from it the part corresponding to substrings within parentheses. The undifferentiated part of ex5 would normally be written as "x = ([0-9]+[0-9]); *".

> ex5 = simplify $ fox_r '(' $ fox_l ')' $
> let digits = Star (Sum (map Symbol "0123456789")) in
> re "x = (" <*> digits <*> re "+" <*> digits <*> re ");" <+> Star (re " ")
> test5 = map (matches ex5) ["1+2","7-4"]

Open Question: There are more derivatives that can be constructed from the parts I've given. What do they do? What other multiplicative homomorphisms can you define? What derivatives do they give? (Hint: consider the property "R matches only a finite language").

Oh...I almost forgot the whole point of this. It's now possible to see that McBride's l-operator is a version of the Brzozowski derivative. In fact, the theorems he proves about l are pretty similar to the theorems that get proved about the Brzozowski derivative such as Theorem 4.4 here.

Some helper functions:

splits x returns all the ways of splitting x into pairs u and v such that u ++ v == x and v is non-empty.

> splits [] = []
> splits (a:as) = ([],a:as) : map f (splits as) where
> f (x,y) = (a:x,y)

Some (incomplete) simplification rules for regular expressions

> simplify a = let b = simplify' a in if a==b then b else simplify b where
> simplify' (Prod [a]) = a
> simplify' (Prod a) | zero `elem` a = zero
> simplify' (Prod (a:b)) = case a of
> Prod x -> Prod $ x ++ map simplify b
> a -> Prod $ filter (/= one) $ map simplify (a:b)
> simplify' (Sum [a]) = a
> simplify' (Sum (a:b)) = case a of
> Sum x -> Sum $ x ++ map simplify b
> a -> Sum $ nub $ sort $ filter (/= zero) $ map simplify (a:b)
> simplify' (Star a) = Star (simplify a)
> simplify' a = a

Tuesday, January 16, 2007

Fox's Ubiquitous Free Derivative

A post in three acts.

1. Regular Expressions

Every regular expression defines a language - the set of strings exactly matching the expression. Given a language L made from the symbols in S, define ∂sL, where s is a symbol in S, to be the language made by taking all of the prefixes in L that end just before an s, deleting the final s. For example, if the language were L={MADAM} then ∂ML={∅,MADA}, ∂AL={M,MAD} and ∂DL={MA}. We can write the regular expressions for these languages as 1+MADA, M+MAD and MA. So in this case, applying ∂s trivially gives another regular language. So here's a problem: is the set of prefixes of a regular language L, ending at s, also regular?

Let's use ∂s to define an operation on regular expressions that computes the regular expression for the prefixes.

Consider a regular expression of the form ab, where a and b are regular expresions. A prefix for the language ending at symbol s either ands in the 'a' part, corresponding to something matching ∂sa, or it ends in the b part in which case it matches a∂sb.

But so far I've just mentioned finite languages. What about the * operator. Well a prefix in a* must consist of a sequence of strings matching a but end as a prefix of a. Ie. we need ∂s(a*)=a*∂sa.

So it's not hard to see (I hope) that

ss = 1
ss' = 0 for symbol s'≠s
s(a+b) = ∂sa+∂sb
s(ab) = ∂sa+a∂sb

works nicely and proves that the answer to the above question is yes.

2. Dissecting types

Rather than regurgitate someone else's words, take a peek at Dissecting Data Structures. Define the operator, acting on types, ∂, by ∂ F X = Δ F X 1, where Δ is my poor man's version of del with a vertical line through it. From the definition of Δ you can see it satisfies:

∂(K A) X = 0
∂Id X = 1
∂(F+G) X=∂F X+∂G X
&part(F×G) X=∂F X×G 1+F X×∂G X

If you read McBride's paper, you'll see that the ∂ operator slices a type in half, throwing away the elements in the right half, keeping just the 'shape' of the right half.

3. Fox's Free Derivatives

Check out the definition of the Fox derivative in section 4 of Free Differential Calculus. The Fox derivative is used to compute the Alexander polynomial of a knot.

But I hope you can see that we have yet another definition that looks very similar. The common theme is differentiation with a twist. When you differentiate a product fg, one term is the usual one, fD(g), but the other term isn't D(f)g but the g is replaced by a drastically simplified version (either 1, or g(1)). Both versions 1 and 2 have a notion of chopping something in half at some point and keeping the left half. (Actually, you could tweak the definition in part 2 so that g is replaced by 1 too, then you'd no longer preserve the shape of the right half of the type and you'd really have a slicing operator, and the analogies would become clearer.)

I just need to relate the * operator in part 1 to the derivative of the inverse in the free derivative. Let me do this in a hand-waving manner: Conceptually, a*=1+a+a²+… which might be considered the same as 1/(1-a). ∂(1/(1-a)) = (using 4(ii)) 1/(1-a)∂a = a*∂a.

So what does it all mean? I've no idea. I'm just pointing out that the Fox derivative, or something like it, has what I think are hitherto unnoticed applications to computer science. Anyway, some of this will become clearer when I post some Haskell code implementing no less than five different types of derivatives of regular expressions (some time in the next week when I have a spare moment), and that should partially answer my question that I posed on λtU.

On further reflection, I can say something about what this means. The way to view derivatives is as an operation that chops things in 'half', destroying something in the middle. The Leibniz rules corresponds to the notion that when you chop a product you must either chop in the left side of the product or the right side. In the usual Leibniz rule you reassemble the pieces on either side - leaving a hole. With the Fox derivative you throw away what's on the right. With the ∂ operator in part 2 you keep only the shape on the right. And McBride's Δ operator gives you a choice of what do do with the left and right hand sides.

Saturday, January 06, 2007

Oriented Fish and Hairy Balls

I have to admit, I have a tendency to write about abstract nonsense here. But I do so for entirely practical reasons - I don't want to run afoul of my employer by mentioning anything that could be construed as a trade secret. But today I thought I'd mention a small application of topology to computer graphics. The reason I can safely mention it is that I won't be revealing a technique or algorithm. Instead, I want to point out the opposite: a proof that a certain algorithm cannot possibly exist.

From time to time, anyone who works in computer graphics is likely to want to implement a function with this specification:

Implement a function f:R3R3 such that for all non-zero x, f(x) is also non-zero and is perpendicular to x.

This kind of function is ubiquitous in graphics and most math libraries geared towards graphics that I have seen have some kind of implementation of this. A typical application is as follows: suppose you're animating shoals of fish. Using a variety of methods its easy to generate nice paths that guide the motion of thousands of fish. For example you could assume that the motion of the fish is approximately described by a divergence-free vector field (so that the fish don't 'pile up' anywhere) and then integrate these fields to give streamlines that the fish can follow. Unfortunately there is a slight catch with this. If you have some 3D geometry representing the fish's shape, knowing the position of the fish isn't enough to specify how it looks. You also need to give its orientation. An obvious choice is to orient the fish so that the line from tail to nose points along the flow. But you still need to be able to describe its orientation around this axis. To do this, we need to complete this orientation vector along the fish to an orthogonal basis ie. find a set of mutually orthogonal unit vectors, e1, e2, and e3 such that e1 points from tail to nose. We can easily define e3 to be e1×e2, so if we had the function f above we could write e2=f(e1).

The function f appears in a multitude of other contexts too. It's useful when orienting 'virtual' cameras. For example the user might want to specify that the camera points at a target and have the graphics system select an orientation with this property.

However, anyone who's met these problems immediately hits an issue. Try writing code to do this for yourself if you haven't already. Chances are you'll end up with a conditional branch in your code, and when you flip from one branch to another you'll see it clearly causing ugly flips in the animation. So the above spec needs to be revised to:

Implement a continuous function f:R3R3 such that for all non-zero x, f(x) is also non-zero and is perpendicular to x.

And here's the rub: the Hairy Ball Theorem says there is no such function. So if you're having trouble writing one, it's not because you're not being ingenious enough. It just can't be done! (Though there are lots of other solutions to the animation problems I mention above.) See, topology is useful after all, it can save you wasting your time on impossible tasks.

(Funnily enough though, the best way to see that this is true is using a suitable functor...)

Actually, I think anyone who works in graphics ought to read up on the Hairy Ball Theorem, the Ham Sandwich Theorem and the Borsuk-Ulam theorem as well as learning about the double cover of SO(3) (which I'll write about here in the near future).

That was a public service announcement.

Blog Archive