## Friday, July 20, 2007

### I'll have a Buchburger with fries

`> import Data.Map as M> import Data.List as L`

Everyone else is at it, so why not me? Here's my approach to solving this problem. Let's warm up with a simpler menu. Consider:

Fries \$3
Burger \$5

What can I get that costs exactly \$13?

My approach is going to revolve around rewrite rules. I'm going to find a set of rewrite rules that converts money into fries and burgers, and does so in such a way that it always eliminates the money, if it is possible to do so.

Let's use D to represent a dollar, F to represent fries and B to represent a burger. Here's our first attempt at some rewrite rules:

DDD->F
DDDDD->B

Now we can take the string DDDDDDDDDDDDD and try to rewrite it. We could start with the first rule:
DDDDDDDDDDDDD
->DDDDDDDDDDF
->DDDDDDDFF
->DDDDFFF
->DFFFF
But now we're stuck with a D and no way to get rid of it.

Before we go much further, however, it's time to simplify our notation. Firstly, unlike normal rewrite rules, we don't care about the order of our symbols. So let's assume that we can reorder symbols any time we want in our strings and rules. So FDD=DFD=DDF and so on. Let's also use Xn to mean n copies of X in a row. Now we can write our strings as mononomials in D, F and B. That makes things much easier. We can now write our rules as

D3->F
D5->B

We've seen that just picking the first rule each time can get us stuck. It seems that the solution should be to come up with an algorithm to generate the correct order. But there's another strategy - it's to modify the rules so that the order doesn't matter. A set of rules with this property is said to be confluent. Amazingly you can turn any reasonable set of rewriting rules into a confluent one merely by adding in elements.

So let's define a monomial type:
`> newtype Monomial a i = Monomial (Map a i) deriving (Eq,Show)> unM (Monomial x) = x> divides (Monomial a) (Monomial b) = isSubmapOfBy (<=) a b> lcm (Monomial a) (Monomial b) = Monomial \$ unionWith max a b> instance (Show a, Show i, Num i,Ord a) => Num (Monomial a i) where>     Monomial a*Monomial b = Monomial \$ unionWith (+) a b> instance (Show a, Show i, Num i,Ord a) => Fractional (Monomial a i) where>     Monomial a/Monomial b = Monomial \$ M.filter (/=0) \$ unionWith (+) a (M.map negate b)`

Rewrite rules are simply pairs of monomials:
`> data Rule a i = Monomial a i :-> Monomial a i deriving (Eq,Show,Ord)> lm (a :-> _) = a`

Now we can write code to apply these rules to a string. The first function simply applies the rule to a term:
`> apply (p :-> q) r = r/p*q`

applicable searches for the first applicable rule in a list of rules:
`> applicable r rules = find (\(p :-> _) -> divides p r) rules`

And reduceStep applies it. Note that it returns the result twice in a pair. This is simply so that it works with the next function. In the event that no rule is applicable it returns Nothing.
`> reduceStep rules r = applicable r rules >>= return . dup . flip apply r>     where dup x = (x,x)`

reduce repeatedly applies reduceStep until it can be applied no more. Repeatedly doing something until Nothing is returned is best handled by unfoldr:
`> reduce rules r = last \$ (r:) \$ unfoldr (reduceStep rules) r`

Now consider our example rewrite rules. Let's find a simple example of where confluence fails to hold. Consider two different ways to rewrite D5. We could use the first rule and get D5->D2F or use the second and get D5->B. Either way we terminate - and get different results. How can we get out of this impasse? There's a really obvious solution, add in a new rule D2F->B. Now if we start with D5 and apply the first rule then we can get out of it by applying this new third rule. Either way, we get B.

Before we generate new rules, we need to say something about how we can ensure termination. If we have one rule that says X->Y and another that says Y->X then we're going to get stuck. So what we can do is insist on an ordering on our monomials. We'll restrict ourselves to rules where the left hand side is larger than the right hand side. So here's an ordering known to algebraists as lexicographic ordering. The idea is that the symbols in the monomial have an ordering on them. One monomial, a, is considered bigger than another, b, if the power of the least symbol (in the symbol ordering) in a/b is negative. (BTW See postscript for a comment on this.) What this means is that the rewrite rules will try to eliminate symbols earlier in the ordering and replace them with symbols that come later. By making our currency symbol the first in the ordering the rewrite rules will tend to act so as to eliminate the currency. And that's what the original problem asked us to do. Note that there are other orderings we could use so for generality Monomial ought to be a typeclass.
`> instance (Show a,Ord a,Ord i,Num i) => Ord (Monomial a i) where>     compare x y = let>         Monomial u = x/y>         in>         if M.null u>             then EQ>             else if snd (M.findMin u)>0>                 then LT>                 else GT`

We also use a function to ensure that any rule is in canonical form with the greater term on the right:
`> order (a :-> b) = if a>b then b :-> a else a :-> b`

If we find a complete confluent set of rules that respect the ordering in this way, how do we know the rules terminate? Suppose the complete ordered list of symbols in our rules is (a1,a2,...,an) then any string a1k1a2k2...ankn can be written as a vector (k1,k2,...,kn). Each application of a rewrite rule replaces the corresponding vector with one that is lexicographically earlier. It's pretty clear that such a process must terminate. (Aside: You can think of this as using the fact that ωn is well-ordered. More general kinds of rewrite rules make use of more complex ordinals to prove termination. This goes to show that ordinal arithmetic is actually a useful tool.)

So back to what we were doing: finding how we can add new rules to ensure confluence. Suppose we have two rules whose LHS 'overlap'. If we think in terms of monomials then we mean two monomials that have a non-trivial common factor. If we form the lowest common multiple of these monomials then we now have a 'minimal' term to which both rules can be applied. In fact, that's how I got the example above. One rule has P3 on the LHS and one has P5. The lowest common multiple of these is P5 and I investigated how the two rules apply to this one term. When we did this, we ended up with two different terms which we simply made into a new rule. Here it is in Haskell:
`> newRule pq@(p :-> _) pq'@(p' :-> _) = let>     j = Main.lcm p p'>     in order \$ apply pq j :-> apply pq' j`

Given a rule, we also simplify the left and right hand sides with normal:
`> normal rules (p :-> q) = let>     p' = reduce rules p>     q' = reduce rules q>     in if p'/=q'>         then Just \$ order \$ reduce rules p :-> reduce rules q>         else Nothing`

The problem is that when we generate a new rule there now appear a whole bunch of new pairs that could be considered because you need to consider pairs made up of the new rule and all of the old rules. Our strategy for dealing with this will be simple: take the list of all pairs of rules and generate new rules from those. Every time we generate a new rule we need to add in the new rule pairs that are generated. We maintain a list of pairs found so far, chewing pairs for newRule from the front, and adding newly generated pairs to the back.

The first line is easy:
`> complete incomplete = let`

We make all possible pairs. We don't want to consider pairs twice so we use the inherited ordering on pairs:
`>     pairs = [(x,y) | x <- incomplete, y <- incomplete, y>x]`

Here's a little heuristic. Things go faster if we use monomials that come lower in the ordering before higher ranking monomials. So the completed set of rules is going to be sorted:
`>     rules = sort incomplete`

Now here's a weird bit to explain. This isn't essential to the mathematics but is is essential to this piece of code. The local function iterate is going to return a pair that consists of the remaining list of rule pairs and the completed rules. But iterate also consumes rule pairs. So this function is actually going to consume its own output. There's an important issue with this: eventually the rate of new rule creation will peter out so that it will consume its way to the end. At this point iterate will try to consume the very element it's expected to generate and spin in an infinite loop. (The dangers of unguarded recursion!) So we also maintain another list whose length is the same as the distance between the front and back of our pair list. When this first list is empty, we know we can return [] for the second list. It doesn't matter what this list is, as long as it has the correct length. This 'phantom' list is the first argument to iterate. But for now I suggest ignoring it and coming back to it if you ever get to a second reading.

Here's the base case I mentioned:
`>     iterate [] pairs rules = ([],rules)`

And here is the guts of the loop:
`>     iterate (_:n) (p:pairs) rules = `

p is the first pair chewed off the front. We generate a new rule from it and normalise it:
`>         case normal rules \$ uncurry newRule p of`

If we get a new rule we now need to add it to our list of rules, and add in all the new pairs it generates:
`>             Just pq -> let u = [(x,pq) | x <- rules, x /= pq]`

Note the use of insert to maintain the order.
`>                            (pairs', rules') = iterate (u ++ n) pairs \$ L.insert pq rules>                            in (u ++ pairs', rules')`

Otherwise we don't need to generate anything:
`>             Nothing -> iterate n pairs rules`

Here's where we tie the knot by feeding our output back in as input:
`>     (pairs', rules') = iterate pairs (pairs ++ pairs') rules>     in rules'`

It's not obvious at this point that the algorithm terminates. It does, but you'll have to trust me.

Anyway, that's not all. We also do a bit of tidying up by eliminating redundant rules. In a confluent system, if there are two rules we can apply, it doesn't matter which one we choose. Now consider a pair of rules looking something like: AB2C -> ... and AB3C2 -> ... . If we can apply the second rule, then we could also apply the first rule. But in a confluent system, it doesn't matter which rule we choose. So we can just throw out the second rule here. So we tidy up our rules by eliminating rules made redundant by another. Turns out that at this stage we have a canonical form. For any given set of rules, any completion with the reducant elements deleted is the same as any other (modulo reordering). So our completion process is itself confluent! This isn't hard to prove but I'm not going to do it now.
`> nonredundant rules r = L.null [s | s <- rules, s/=r, lm s `divides` lm r]> tidy rules = L.filter (nonredundant rules) rules`

And now canonical puts our rules into a complete canonical form:
`> canonical rules = tidy \$ complete rules`

Back to the original problem:
`> data Menu = P >     | MixedFruit >     | FrenchFries >     | SideSalad >     | HotWings >     | MozzarellaSticks >     | SamplerPlate deriving (Eq,Ord,Show)`

Note that deriving Ord. This is so that the lexicographic ordering above can work.
`> makeRule a b = (Monomial \$ M.fromList a) :-> (Monomial \$ M.fromList b)> rules = [>         makeRule [(P,215)] [(MixedFruit,1)],>         makeRule [(P,275)] [(FrenchFries,1)],>         makeRule [(P,335)] [(SideSalad,1)],>         makeRule [(P,355)] [(HotWings,1)],>         makeRule [(P,420)] [(MozzarellaSticks,1)],>         makeRule [(P,580)] [(SamplerPlate,1)]     >     ]> crules = canonical rules> ex n = Monomial \$ fromList [(P,n)]> example0 = reduce crules (ex 1505)> main = print (reduce crules (ex 1505))`

Now, some caveats. I don't claim that this is a fast method, just a different approach. Completing a set of rules is an expensive operation and if you start with N rules, the worst case running time is something horrendous like O(exp(exp(N)). On the other hand, once you've found a set of canonical rules, reduction with them is very fast. This makes it ideal for problems involving repeated use of the same menu. Or for when you have a lot of spare cash:
`> example = print (reduce crules (ex 10000115))`

And now I have a confession to make. I've really just described the Buchberger algorithm for Gröbner basis computation. It's not quite the Buchberger algorithm because I'm only considering binomials, not polynomials - but you'll notice my code is remarkably similar to the code at the other end of that link. The rule p :-> q represents the binomial p-q. Unfortunately, the Buchberger algorithm is usually presented in its more general form - but understanding that requires a background in commutative algebra and algebraic geometry. I hope I've managed to present a form of it in a way that's digestible by people without abstract algebra experience. The binomial Buchberger algorithm has many applications, including a variety of applications in integer programming.

Note that this is also very close to the Knuth-Bendix algorithm. However, the latter algorithm is for the noncommutative case, and it isn't guaranteed to terminate.

BTW I recommend playing with the completed rules by hand - at least for simple examples. It's interesting to see what steps are taken to reach the solution.

PS My monomial ordering is the opposite of the usual lexicographic monomial ordering. This is because I want my rules sorted with what is usually considered the largest monomial at the front.

PPS Can I get rid of that weird phantom list thing? You can replace the list with an integer for its length, but it's not clear to me that this is an actual improvement.

Update: I forgot to mention this. Check out David Amos's implementation of the full Buchberger algorithm which you can use to solve many more problems than my simplistic monomial version.

Labels: ,

## Saturday, July 14, 2007

### Data and Codata

In Gödel, Escher, Bach, Hofstadter introduces the programming language Bloop. Bloop is a little like BASIC except that it forces programmers to specify in advance how many times each loop will iterate. As a result, Bloop programs have the handy property that they are guaranteed to terminate. Unfortunately this property also makes it impossible to write something like an operating system in Bloop. To write an OS we need open-ended loops that keep running until the user explicitly chooses to shut the OS down. One solution to this is to write code in Floop. Floop allows us to write unbounded loops, the equivalent of C's while (1) { ... }. The problem with that, however, is that we can write runaway infinite loops that never terminate and never give us any output. Is there some language that lies between Bloop and Floop that can give us unbounded looping when we need it, but which never allows us to hoist ourselves by our petards by writing runaway loops?

Wishing no disrepsect to OS writers, at first blush it might seem that the distinction between a runaway loop and an idle OS is too fine - if we can write an infinite loop that does something useful, then surely we can write a useless one too. But it turns out that there is a very elegant and well-principled way to distinguish between these kinds of loops, and this allows us to write open-ended interactive software in a programming language that nonetheless always produces a well-defined output, no matter what the input. In order to do this we need to distinguish between two kinds of data: data and codata. By ensuring that a function expecting codata never receives data, and vice versa, we can ensure that even programs with open-ended loops always produce a well defined output.

The concepts I want to talk about are very general and can apply to whatever programming language you use. I'm going to use some simple Haskell examples but most of these will translate to other languages. So consider something like
`sum [] = 0sum (a:as) = a + sum as`

This sums the elements of a list. Note how it's well behaved as long as we give it a finite list as input. On the other hand, consider
`sum' [] = 0sum' a = sum' (1:a) - 1`

This isn't well behaved at all. Except when you input empty lists, it never gives a result. From a mathematical perspective it's not a good definition either, there are many functions that satisfy these two properties. Is there some general principle at work here that allows us to tell immediately that one of these terminates and the other doesn't? We know from Turing that there is no procedure that guarantees we can always tell such programs apart, but in this case there is something that we can easily point to. In the first program, the right hand side of the second line uses the sum function recursively but we only apply it to a strict subpart of the input, its tail in fact. In the second example we apply sum' to something that contains the argument. The former function is using what is known as structural recursion, and it's not hard to convince yourself that structural recursion, applied to finite datastructures, always terminates.

So if we limit ourselves to structural recursion we can guarantee our programs will always terminate. But what about a definition like:
`fact 0 = 1fact n = n * fact (n-1)`

That doesn't appear to use structural recursion. But we can view it as such like this. Define the natural numbers as follows:
`data Nat = Zero | S Nat`

0 is represented as Zero, 1 is represented as S Zero and so on. We can represent every natural this way. Here's the important thing: if n>0 then n-1 is simply a subpart of n. So we can view this kind of recursion as a type of structural recursion.

(In fact, by a curious quirk of the Haskell 98 standard we can rewrite our definition to look more like a structural recursion:
`fact' 0 = 1fact' (n+1) = (n+1) * fact' n`

I'm guessing this feature is in Haskell precisely so that people can 'pretend' they are using structural recursion with the + in n+1 serving a role as a kind of pseudo-constructor.)

So we have a nice rule for ensuring our code terminates. But sum fails when applied to infinite lists. Should we simply rule out infinite datastructures? That seems a bit drastic. The example that convinced me to look into Haskell was
`fib = 1 : 1 : zipWith (+) fib (tail fib)`

We really don't want to rule out such a succinct definition of the Fibonacci numbers. But how can we allow such structures when we have functions like sum sitting around. Applying sum to fibs will obviously cause non-termination.

Let's consider another example:
`sumSoFar x [] = [x]sumSoFar x (y:ys) = x : sumSoFar (x+y) ys`

Like sum, this fails to terminate for an infinite input. But unlike sum, it's possible to make sense of it. If the inputs were 0 and the infinite list [1,1,1,1,...] then the result would be [0,1,2,3,...]. The program might not terminate, but from a mathematical perspective this is a completely well defined function. What's more, suppose the input list represented a stream of data being input at a keyboard, and that the output was displayed on a screen one element at a time, then we'd have a simple cash register. This program might not terminate, but it's completely well behaved. Note that this could only work in a lazy language. A strict language would want to evaluate the entire list before outputting anything. But in a lazy language we can start outputting the beginning of the list before the rest of it is computed.

From the point of view of using infinite lists, it's sum that's badly behaved, and sumSoFar that's well behaved. Again, can we point to a simple distinction between these two programs that explains this? Turns out we can, and in some sense it's dual to what we said above. sumSoFar is well behaved because when we recursively call sumSoFar on the right hand side we do so from inside a list constructor. (Remember that : is the list constructor.) This is known as guarded recursion and it guarantees that even though our programs don't terminate, they still define a unique mathematical function and result in a well behaved program. In the case of sumSoFar, each time we look at another element of the result, we trigger another lazy evaluation that terminates. But the right hand side won't just run off and compute endlessly until we do that triggering because the recursion is 'guarded' within a constructor. (That, by the way, was the main point of this article, so you can probably relax now.)

Note the duality: in structural recursion we 'deconstruct' the argument and then we're allowed to recurse. In guarded recursion we recurse first, and then we're allowed to use the constructor.

So we've almost achieved our goal of describing rules to allow open-ended loops because we've managed to give a rule for writing functions that are guaranteed to be well-behaved even though they manipulate infinite data structures. But we're not quite home yet - we still can't have functions like sum coexist with infinite lists. How can we ensure that an infinite list is never handed to sum?

Consider the definition of Haskell lists. It's something like this:
`data [a] = [] | a : [a]`

I.e. a list of a's is either the empty list [] or it's made from an a and a list of a's.
You can think of this as an equation in [a]. In Haskell we take this as uniquely defining what [a] is, but in reality there is more than one solution to this equation. Consider the type consisting of only finite lists. That satisfies this equation. A finite list is either an empty list, or an element followed by a finite list. Similarly a possibly infinite list is either an empty list, or an element followed by a possibly infinite list. There is an ambiguity. Finite lists form, what is in some sense, the smallest possible solution to this equation. The possibly infinite lists form the largest possible solution. Haskell takes the largest possible solution.

But suppose we were to distinguish between these two different solutions. We could use the keyword data to mean the smallest solution and codata to mean the largest solution. The former represents data, and it's always finite. The latter represents what we call codata, and it's possibly infinite. And now we can refine our rules for well-behavedness. Consider data and codata to be distinct types. In a strongly typed language this immediately allows us to restrict sum to data, not codata. The rule is: you're only allowed to use structural recursion with data and guarded recursion with codata. With that rule, we're guaranteed that our recursions will always be safe, and yet that we can still have open-ended loops in our code. Sometimes these are called recursion and corecursion respectively.

And now we can go a little crazy with language. When we want to prove that a structurally recursive program terminates we use induction. This doesn't work straightforwardly for corecursion, so instead we use a principle called coinduction. Recursive programs typically terminate. Corecursive programs don't necessarily terminate, but they're still well-behaved as long as they keep on going whenever we give them input. We can call this cotermination. And so on... I'm not going to say what coinduction is because I'd have to talk about bisimulation, and this post would get way too long.

So now a mathematical aside. The reason for all the "co"s is that data and codata have categorical descriptions and they turn out to be dual to each other. But you don't hear mathematicians talking about corecursion and coinduction much. Why not? Well one of the axioms of set theory is the Axiom of Foundation. One way of interpreting this is the statement that there is no infinite sequence

...a3∈a2∈a1∈a0.

So even though we can construct infinite lists in mathematics, we can't construct 'infinitely deep' lists. This means that in mathematics we can use a form of structural recursion. In fact, the familiar principle of induction follows from this. So for many of the things that mathematicians do, induction works fine. But if we decide to use a non-standard variation of set theory where the axiom of foundation doesn't hold we can no longer use recursion. For example the Axiom of Extension says that two sets are equal if their elements are equal. This is a recursive definition, but it's useless in the presence of a set a such that a∈a. At this point mathematicians need a principle of coinduction. And for more on that, I refer you to Vicious Circles.

Oh...time for a quick rant. Over the years I've seen a few people argue that there's something fundamentally wrong with the notion of the algorithm because it doesn't apply to the kind of open-ended loop we see in operating systems and interactive applications. Some have even gone further to suggest that somehow mathematics and computer science are fundamentally different because mathematics can't seek to describe these kinds of open-ended phenomena. As I've tried to show above, not only are there nice ways to look at open-ended computations, but from a mathematical perspective they are precisely dual, in the category theoretical sense, to terminating computations. It may be true that mathematicians sometimes spend more time with things, and computer scientists with cothings. But this really isn't such a big difference and the same langiage can be used to talk about both.

I didn't really get why anyone would want to distinguish between data and codata until I saw some clues in some slides by Altenkirch and some comments by Conor McBride. I'm still not sure anyone says it quite as clearly as this: distinguishing between data and codata means we can allow the coexistence of infinite lists, structural recursion and open-ended loops, without risk of causing bad behaviour.

Labels: ,

## Friday, July 06, 2007

### Death>Dishonour =/=> ~Dishonour>~Death

That you rate death before dishonour should not imply that you rate honour above life. Or more precisely: if A and B are (not necessarily disjoint) outcomes, define A>B to mean that the expected utility of outcome A is higher than the expected utility of outcome B. Use the symbol '~' to mean the negation of an outcome. Then Death>Dishonour does not imply that ~Dishonour>~Death, or more generally, A>B =/=> ~B>~A.

This example from the book Subjective Probability: The Real Thing by Richard Jeffrey was pointed out to me by a friend of mine yesterday. Let's draw a little diagram of possible outcomes and their utility:

 B\A death life dishonour 1 * honour 2 3

The '*' indicates an outcome with probability zero and the other outcomes are considered to be equally likely. A is the variable describing your state of liveness and B is the variable representing your honour.

We have:

E(utility|A=death) = 1.5

E(utility|B=dishonour) = 1

E(utility|B=honour) = 2.5

E(utility|A=life) = 3

With this set of utilities one does indeed rate death before dishonour but not honour before life.

This gets mixed reactions. Some people think that it's a nice counterintuitive result. Some people think it's so trivial they don't see why anyone would even mention it. If you're in the latter group, sorry for wasting your time. :-)

(I leave the Haskell program using a probability monad as an exercise...)

Labels: ,