# A Neighborhood of Infinity

## Friday, July 20, 2007

### I'll have a Buchburger with fries

> import Data.Map as M> import Data.List as L

Everyone else is at it, so why not me? Here's my approach to solving this problem. Let's warm up with a simpler menu. Consider:

Fries $3 Burger$5

What can I get that costs exactly $13? My approach is going to revolve around rewrite rules. I'm going to find a set of rewrite rules that converts money into fries and burgers, and does so in such a way that it always eliminates the money, if it is possible to do so. Let's use D to represent a dollar, F to represent fries and B to represent a burger. Here's our first attempt at some rewrite rules: DDD->F DDDDD->B Now we can take the string DDDDDDDDDDDDD and try to rewrite it. We could start with the first rule: DDDDDDDDDDDDD ->DDDDDDDDDDF ->DDDDDDDFF ->DDDDFFF ->DFFFF But now we're stuck with a D and no way to get rid of it. Before we go much further, however, it's time to simplify our notation. Firstly, unlike normal rewrite rules, we don't care about the order of our symbols. So let's assume that we can reorder symbols any time we want in our strings and rules. So FDD=DFD=DDF and so on. Let's also use Xn to mean n copies of X in a row. Now we can write our strings as mononomials in D, F and B. That makes things much easier. We can now write our rules as D3->F D5->B We've seen that just picking the first rule each time can get us stuck. It seems that the solution should be to come up with an algorithm to generate the correct order. But there's another strategy - it's to modify the rules so that the order doesn't matter. A set of rules with this property is said to be confluent. Amazingly you can turn any reasonable set of rewriting rules into a confluent one merely by adding in elements. So let's define a monomial type: > newtype Monomial a i = Monomial (Map a i) deriving (Eq,Show)> unM (Monomial x) = x> divides (Monomial a) (Monomial b) = isSubmapOfBy (<=) a b> lcm (Monomial a) (Monomial b) = Monomial$ unionWith max a b> instance (Show a, Show i, Num i,Ord a) => Num (Monomial a i) where>     Monomial a*Monomial b = Monomial $unionWith (+) a b> instance (Show a, Show i, Num i,Ord a) => Fractional (Monomial a i) where> Monomial a/Monomial b = Monomial$ M.filter (/=0) $unionWith (+) a (M.map negate b) Rewrite rules are simply pairs of monomials: > data Rule a i = Monomial a i :-> Monomial a i deriving (Eq,Show,Ord)> lm (a :-> _) = a Now we can write code to apply these rules to a string. The first function simply applies the rule to a term: > apply (p :-> q) r = r/p*q applicable searches for the first applicable rule in a list of rules: > applicable r rules = find (\(p :-> _) -> divides p r) rules And reduceStep applies it. Note that it returns the result twice in a pair. This is simply so that it works with the next function. In the event that no rule is applicable it returns Nothing. > reduceStep rules r = applicable r rules >>= return . dup . flip apply r> where dup x = (x,x) reduce repeatedly applies reduceStep until it can be applied no more. Repeatedly doing something until Nothing is returned is best handled by unfoldr: > reduce rules r = last$ (r:) $unfoldr (reduceStep rules) r Now consider our example rewrite rules. Let's find a simple example of where confluence fails to hold. Consider two different ways to rewrite D5. We could use the first rule and get D5->D2F or use the second and get D5->B. Either way we terminate - and get different results. How can we get out of this impasse? There's a really obvious solution, add in a new rule D2F->B. Now if we start with D5 and apply the first rule then we can get out of it by applying this new third rule. Either way, we get B. Before we generate new rules, we need to say something about how we can ensure termination. If we have one rule that says X->Y and another that says Y->X then we're going to get stuck. So what we can do is insist on an ordering on our monomials. We'll restrict ourselves to rules where the left hand side is larger than the right hand side. So here's an ordering known to algebraists as lexicographic ordering. The idea is that the symbols in the monomial have an ordering on them. One monomial, a, is considered bigger than another, b, if the power of the least symbol (in the symbol ordering) in a/b is negative. (BTW See postscript for a comment on this.) What this means is that the rewrite rules will try to eliminate symbols earlier in the ordering and replace them with symbols that come later. By making our currency symbol the first in the ordering the rewrite rules will tend to act so as to eliminate the currency. And that's what the original problem asked us to do. Note that there are other orderings we could use so for generality Monomial ought to be a typeclass. > instance (Show a,Ord a,Ord i,Num i) => Ord (Monomial a i) where> compare x y = let> Monomial u = x/y> in> if M.null u> then EQ> else if snd (M.findMin u)>0> then LT> else GT We also use a function to ensure that any rule is in canonical form with the greater term on the right: > order (a :-> b) = if a>b then b :-> a else a :-> b If we find a complete confluent set of rules that respect the ordering in this way, how do we know the rules terminate? Suppose the complete ordered list of symbols in our rules is (a1,a2,...,an) then any string a1k1a2k2...ankn can be written as a vector (k1,k2,...,kn). Each application of a rewrite rule replaces the corresponding vector with one that is lexicographically earlier. It's pretty clear that such a process must terminate. (Aside: You can think of this as using the fact that ωn is well-ordered. More general kinds of rewrite rules make use of more complex ordinals to prove termination. This goes to show that ordinal arithmetic is actually a useful tool.) So back to what we were doing: finding how we can add new rules to ensure confluence. Suppose we have two rules whose LHS 'overlap'. If we think in terms of monomials then we mean two monomials that have a non-trivial common factor. If we form the lowest common multiple of these monomials then we now have a 'minimal' term to which both rules can be applied. In fact, that's how I got the example above. One rule has P3 on the LHS and one has P5. The lowest common multiple of these is P5 and I investigated how the two rules apply to this one term. When we did this, we ended up with two different terms which we simply made into a new rule. Here it is in Haskell: > newRule pq@(p :-> _) pq'@(p' :-> _) = let> j = Main.lcm p p'> in order$ apply pq j :-> apply pq' j

Given a rule, we also simplify the left and right hand sides with normal:
> normal rules (p :-> q) = let>     p' = reduce rules p>     q' = reduce rules q>     in if p'/=q'>         then Just $order$ reduce rules p :-> reduce rules q>         else Nothing

The problem is that when we generate a new rule there now appear a whole bunch of new pairs that could be considered because you need to consider pairs made up of the new rule and all of the old rules. Our strategy for dealing with this will be simple: take the list of all pairs of rules and generate new rules from those. Every time we generate a new rule we need to add in the new rule pairs that are generated. We maintain a list of pairs found so far, chewing pairs for newRule from the front, and adding newly generated pairs to the back.

The first line is easy:
> complete incomplete = let

We make all possible pairs. We don't want to consider pairs twice so we use the inherited ordering on pairs:
>     pairs = [(x,y) | x <- incomplete, y <- incomplete, y>x]

Here's a little heuristic. Things go faster if we use monomials that come lower in the ordering before higher ranking monomials. So the completed set of rules is going to be sorted:
>     rules = sort incomplete

Now here's a weird bit to explain. This isn't essential to the mathematics but is is essential to this piece of code. The local function iterate is going to return a pair that consists of the remaining list of rule pairs and the completed rules. But iterate also consumes rule pairs. So this function is actually going to consume its own output. There's an important issue with this: eventually the rate of new rule creation will peter out so that it will consume its way to the end. At this point iterate will try to consume the very element it's expected to generate and spin in an infinite loop. (The dangers of unguarded recursion!) So we also maintain another list whose length is the same as the distance between the front and back of our pair list. When this first list is empty, we know we can return [] for the second list. It doesn't matter what this list is, as long as it has the correct length. This 'phantom' list is the first argument to iterate. But for now I suggest ignoring it and coming back to it if you ever get to a second reading.

Here's the base case I mentioned:
>     iterate [] pairs rules = ([],rules)

And here is the guts of the loop:
>     iterate (_:n) (p:pairs) rules =

p is the first pair chewed off the front. We generate a new rule from it and normalise it:
>         case normal rules $uncurry newRule p of If we get a new rule we now need to add it to our list of rules, and add in all the new pairs it generates: > Just pq -> let u = [(x,pq) | x <- rules, x /= pq] Note the use of insert to maintain the order. > (pairs', rules') = iterate (u ++ n) pairs$ L.insert pq rules>                            in (u ++ pairs', rules')

Otherwise we don't need to generate anything:
>             Nothing -> iterate n pairs rules

Here's where we tie the knot by feeding our output back in as input:
>     (pairs', rules') = iterate pairs (pairs ++ pairs') rules>     in rules'

It's not obvious at this point that the algorithm terminates. It does, but you'll have to trust me.

Anyway, that's not all. We also do a bit of tidying up by eliminating redundant rules. In a confluent system, if there are two rules we can apply, it doesn't matter which one we choose. Now consider a pair of rules looking something like: AB2C -> ... and AB3C2 -> ... . If we can apply the second rule, then we could also apply the first rule. But in a confluent system, it doesn't matter which rule we choose. So we can just throw out the second rule here. So we tidy up our rules by eliminating rules made redundant by another. Turns out that at this stage we have a canonical form. For any given set of rules, any completion with the reducant elements deleted is the same as any other (modulo reordering). So our completion process is itself confluent! This isn't hard to prove but I'm not going to do it now.
> nonredundant rules r = L.null [s | s <- rules, s/=r, lm s divides lm r]> tidy rules = L.filter (nonredundant rules) rules

And now canonical puts our rules into a complete canonical form:
> canonical rules = tidy $complete rules Back to the original problem: > data Menu = P > | MixedFruit > | FrenchFries > | SideSalad > | HotWings > | MozzarellaSticks > | SamplerPlate deriving (Eq,Ord,Show) Note that deriving Ord. This is so that the lexicographic ordering above can work. > makeRule a b = (Monomial$ M.fromList a) :-> (Monomial $M.fromList b)> rules = [> makeRule [(P,215)] [(MixedFruit,1)],> makeRule [(P,275)] [(FrenchFries,1)],> makeRule [(P,335)] [(SideSalad,1)],> makeRule [(P,355)] [(HotWings,1)],> makeRule [(P,420)] [(MozzarellaSticks,1)],> makeRule [(P,580)] [(SamplerPlate,1)] > ]> crules = canonical rules> ex n = Monomial$ fromList [(P,n)]> example0 = reduce crules (ex 1505)> main = print (reduce crules (ex 1505))

Now, some caveats. I don't claim that this is a fast method, just a different approach. Completing a set of rules is an expensive operation and if you start with N rules, the worst case running time is something horrendous like O(exp(exp(N)). On the other hand, once you've found a set of canonical rules, reduction with them is very fast. This makes it ideal for problems involving repeated use of the same menu. Or for when you have a lot of spare cash:
> example = print (reduce crules (ex 10000115))

And now I have a confession to make. I've really just described the Buchberger algorithm for Gröbner basis computation. It's not quite the Buchberger algorithm because I'm only considering binomials, not polynomials - but you'll notice my code is remarkably similar to the code at the other end of that link. The rule p :-> q represents the binomial p-q. Unfortunately, the Buchberger algorithm is usually presented in its more general form - but understanding that requires a background in commutative algebra and algebraic geometry. I hope I've managed to present a form of it in a way that's digestible by people without abstract algebra experience. The binomial Buchberger algorithm has many applications, including a variety of applications in integer programming.

Note that this is also very close to the Knuth-Bendix algorithm. However, the latter algorithm is for the noncommutative case, and it isn't guaranteed to terminate.

BTW I recommend playing with the completed rules by hand - at least for simple examples. It's interesting to see what steps are taken to reach the solution.

PS My monomial ordering is the opposite of the usual lexicographic monomial ordering. This is because I want my rules sorted with what is usually considered the largest monomial at the front.

PPS Can I get rid of that weird phantom list thing? You can replace the list with an integer for its length, but it's not clear to me that this is an actual improvement.

Update: I forgot to mention this. Check out David Amos's implementation of the full Buchberger algorithm which you can use to solve many more problems than my simplistic monomial version.

alpheccar said...

For a good summary of the correspondence between rewriting systems and Grobner bases : Rewriting as a Special Case of
Non commutative Grobner Basis Theory

Mikael said...

I saw recently a sketch of a research program that basically stated that Knuth-Bendix, commutative Gröbner Bases, non-commutative Gröbner Bases, Juliet Bases (?) and one or two other things are -really- just Gröbner basis like methods on the level of operads.

This is also, by the way, one of the directions I want to go find things out post-degree.

Alan Falloon said...

I really liked your post (as usual). I was wondering if you can provide the article as an lhs file as well?

I would like to download it and play with it in hat or ghci to really understand it. Right now that means saving the page and cleaning out the HTML crud or copy-pasting it into an editor.

Its not a ton of effort, but even attaching the original lhs that the article is from would lower the barrier for experimentation that much more.

Thanks again for the wonderful post.

sigfpe said...

Alan,

Copy and paste directly from the rendered HTMl should work. In fact, I tested the last couple of drafts by copying directly from my web browser into my text editor. Tell me if that doesn't work for you.

David said...

"We'll restrict ourselves to rules where the left hand side is larger than the right hand side."

I think that sentence is the wrong way round.

sigfpe said...

David,

Someone's reading closely! You're right about me making a mistake. The problem is that I've flipped the usual ordering in the textbooks because it's more convenient when making use of the Haskell prelude. And I confused myself in the process...