
Friday, December 24, 2010

Generalising Gödel's Theorem with Multiple Worlds. Part II.

Introduction
Last time we looked at a method for testing whether propositions of provability logic were valid by looking at the consequences of propositions within nested collections of worlds. This lends itself naturally to an algorithm that we can implement in Haskell. The diagrams I used last time are a variant on what are known as tableaux. But tableaux can be used in a number of ways and so we need code that is suitably generalised. In the code that follows I've ensured that the core algorithm has an interface that is suitable for carrying out the four tasks I'll demand of it. This means that the code leans more towards practicality than mathematical elegance.

And I apologise in advance: this post is mostly a bunch of implementation details. We'll get back to some mathematics next time. Nonetheless, by the end of this post you'll have working code to test the validity of propositions of provability logic.

Central to tableau algorithms is pattern matching. I'd like the Haskell pattern matcher to do much of the work, and to increase its flexibility I'll need this extension:

> {-# LANGUAGE ViewPatterns #-}
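If you haven't seen ViewPatterns before, here's a tiny standalone illustration (not part of this post's program): a function is applied inside the pattern, and we match on its result.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- With a view pattern, (f -> pat) applies f to the argument
-- and matches the result against pat.
parity :: Int -> String
parity ((`mod` 2) -> 0) = "even"
parity _                = "odd"
```

So `parity 4` gives `"even"` and `parity 7` gives `"odd"`; later we'll use this with a classifying function in place of `` (`mod` 2) ``.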

We'll need these libraries too:

> import Control.Applicative
> import Control.Arrow
> import Control.Monad
> import Data.Function
> import Data.List
> import Data.Maybe
> import Text.Show

Logical Propositions
Now we'll need a bunch of logical operators. The first three are constructors for a proposition type and the rest are sugar to make expressions look nicer:

> infixr 1 :->
> infixr 2 :\/
> infixr 3 :/\

> infixr 1 -->
> infixr 1 <--
> infixr 1 <->
> infixr 2 \/
> infixr 3 /\

> (\/)    = (:\/)
> (/\)    = (:/\)
> (-->)   = (:->)
> (<--)   = flip (:->)
> p <-> q = (p :-> q) :/\ (q :-> p)

Here's our basic proposition type:

> data Prop = Letter String
>           | Prop :\/ Prop
>           | Prop :/\ Prop
>           | Prop :-> Prop
>           | Box Prop
>           | Dia Prop
>           | F
>           | T
>           | Neg Prop deriving (Eq, Ord)

I want show to know about operator precedence for propositions:

> instance Show Prop where
>     showsPrec p (a :/\ b) = showParen (p>3) $ showsPrec 3 a . showString " /\\ " . showsPrec 3 b
>     showsPrec p (a :\/ b) = showParen (p>2) $ showsPrec 2 a . showString " \\/ " . showsPrec 2 b
>     showsPrec p (a :-> b) = showParen (p>1) $ showsPrec 1 a . showString " --> " . showsPrec 1 b
>     showsPrec p (Neg r)   = showParen (p>4) $ showString "Neg " . showsPrec 5 r
>     showsPrec p (Box r)   = showParen (p>4) $ showString "Box " . showsPrec 5 r
>     showsPrec p (Dia r)   = showParen (p>4) $ showString "Dia " . showsPrec 5 r
>     showsPrec p (Letter n)= showParen (p>5) $ showsPrec 6 n
>     showsPrec p T         = showString "T"
>     showsPrec p F         = showString "F"

Some simple rules for simplifying logical expressions:

> simplify p = let simplify' (a :\/ F) = a
>                  simplify' (F :\/ b) = b
>                  simplify' (a :/\ T) = a
>                  simplify' (T :/\ b) = b
>                  simplify' (a :\/ T) = T
>                  simplify' (T :\/ b) = T
>                  simplify' (a :/\ F) = F
>                  simplify' (F :/\ b) = F
>                  simplify' (F :-> b) = T
>                  simplify' (T :-> b) = b
>                  simplify' (a :-> F) = Neg a
>                  simplify' (a :-> T) = T
>                  simplify' (Neg T) = F
>                  simplify' (Neg F) = T
>                  simplify' (Box T) = T
>                  simplify' (Dia F) = F
>                  simplify' z = z
>    in case p of
>        a :/\ b -> let a' = simplify a
>                       b' = simplify b
>                   in simplify' (a' :/\ b')
>        a :\/ b -> let a' = simplify a
>                       b' = simplify b
>                   in simplify' (a' :\/ b')
>        a :-> b -> simplify' (simplify a :-> simplify b)
>        Box a   -> simplify' (Box (simplify a))
>        Dia a   -> simplify' (Dia (simplify a))
>        Neg (Neg a) -> simplify a
>        a           -> a
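To see the simplifier in action on its own, here's a trimmed standalone sketch: just the Prop constructors and rules needed to run the example below, with Show derived instead of the precedence-aware instance above.

```haskell
-- Trimmed standalone sketch of the simplifier (∧/∨ rules only).
data Prop = Letter String | Prop :\/ Prop | Prop :/\ Prop | F | T
  deriving (Eq, Show)

-- Simplify subterms first, then apply the local rewrite rules.
simplify :: Prop -> Prop
simplify (a :/\ b) = simplify' (simplify a :/\ simplify b)
simplify (a :\/ b) = simplify' (simplify a :\/ simplify b)
simplify p         = p

-- One step of local simplification at the root.
simplify' :: Prop -> Prop
simplify' (a :\/ F) = a
simplify' (F :\/ b) = b
simplify' (a :/\ T) = a
simplify' (T :/\ b) = b
simplify' (_ :\/ T) = T
simplify' (T :\/ _) = T
simplify' (_ :/\ F) = F
simplify' (F :/\ _) = F
simplify' z         = z

-- simplify ((T :/\ Letter "a") :\/ F)  ==  Letter "a"
```

The full version in the article does the same thing with the extra rules for implication, negation, Box and Dia.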

Kinds of Proposition
I'm actually going to use more than one proposition type. So when we do case analysis the patterns need to be abstract enough to work with multiple types. I'm going to use the PropType type to represent the ways we're going to classify logical propositions. The types are:

1. Atomic: A single letter or its negation
2. Constant: Simply T or F or a negation thereof.
3. DoubleNegation.
4. Disjunction: used to represent things like a∧b or ¬(a∨b).
5. Conjunction: used to represent things like a∨b or ¬(a∧b).
6. Provability: These are statements about provability like those starting with ◻ or ¬◊.
7. Consistency: These are statements about consistency like those starting with ◊ or ¬◻.

As this type is a simple container we can make it a functor too.

> data PropType a = Atomic a
>                 | Constant a
>                 | DoubleNegation a
>                 | Disjunction a a
>                 | Conjunction a a
>                 | Provability a
>                 | Consistency a

> instance Functor PropType where
>     fmap f (Atomic a)         = Atomic (f a)
>     fmap f (Constant a)       = Constant (f a)
>     fmap f (DoubleNegation a) = DoubleNegation (f a)
>     fmap f (Provability a)    = Provability (f a)
>     fmap f (Consistency a)    = Consistency (f a)
>     fmap f (Conjunction a b)  = Conjunction (f a) (f b)
>     fmap f (Disjunction a b)  = Disjunction (f a) (f b)
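As a standalone check: the same shape of type with the Functor instance derived by GHC. DeriveFunctor generates exactly the structure-preserving map written out by hand above.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Same shape of container type; Functor derived rather than
-- written by hand.
data PropType a = Atomic a | Constant a | DoubleNegation a
                | Disjunction a a | Conjunction a a
                | Provability a | Consistency a
                deriving (Eq, Show, Functor)

-- fmap length (Conjunction "ab" "c")  ==  Conjunction 2 1
```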

I'll introduce a typeclass that will allow us to use the PropType view to query propositions:

> class PropTypeable a where
>     propType :: a -> PropType a
>     neg      :: a -> a
>     isF      :: a -> Bool
>     negative :: a -> Bool
>     positiveComponent :: a -> Prop

And now here we have the cases that I summarised in English above:

> instance PropTypeable Prop where
>     propType (a :\/ b)        = Disjunction a b
>     propType (a :/\ b)        = Conjunction a b
>     propType (Neg (a :\/ b))  = Conjunction (Neg a) (Neg b)
>     propType (Neg (a :/\ b))  = Disjunction (Neg a) (Neg b)
>     propType (a :-> b)        = Disjunction (Neg a) b
>     propType (Neg (a :-> b))  = Conjunction a (Neg b)
>     propType (Neg (Neg a))    = DoubleNegation a
>     propType (Box a)          = Provability a
>     propType (Neg (Box a))    = Consistency (Neg a)
>     propType (Dia a)          = Consistency a
>     propType (Neg (Dia a))    = Provability (Neg a)
>     propType (Letter a)       = Atomic (Letter a)
>     propType (Neg (Letter a)) = Atomic (Neg (Letter a))
>     propType T                = Constant T
>     propType F                = Constant F
>     propType (Neg F)          = Constant T
>     propType (Neg T)          = Constant F
>     neg                       = Neg
>     isF F                     = True
>     isF (Neg T)               = True
>     isF _                     = False
>     positiveComponent (Neg a) = a
>     positiveComponent a       = a
>     negative (Neg _)          = True
>     negative _                = False

It'll be a while before we need the full generality so it's going to seem like overkill for the moment!

And some pre-packaged letters for convenience:

> [a, b, c, d, p, q, r, s, t] = map (Letter . return) "abcdpqrst"

We're going to need some operations that act on lists.

placesWhere finds all of the elements of a list for which some predicate holds. Instead of just listing the elements that match, it lists the elements paired with the rest of the list after the matching element is removed. We can think of these pairs as elements and their surrounding context:

> placesWhere p []     = []
> placesWhere p (x:xs) = let r = map (second (x:)) $ placesWhere p xs
>                        in if p x then ((x, xs) : r) else r
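For example (a standalone snippet repeating the definition so it runs on its own; second comes from Control.Arrow):

```haskell
import Control.Arrow (second)

-- Each matching element is paired with the rest of the list,
-- in original order, with that element removed.
placesWhere :: (a -> Bool) -> [a] -> [(a, [a])]
placesWhere p []     = []
placesWhere p (x:xs) = let r = map (second (x:)) (placesWhere p xs)
                       in if p x then (x, xs) : r else r

-- placesWhere even [1,2,3,4]  ==  [(2,[1,3,4]), (4,[1,2,3])]
```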

This finds something in the intersection of two sets using a given 'equality' predicate for matching. As we may be using a predicate different from == we need to see both of the (possibly different) elements that satisfy the predicate.

> findIntersection eq a b = listToMaybe [(x, y) | x <- a, y <- b, x `eq` y]
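For instance, with a predicate that matches a number against its negation (again a self-contained snippet repeating the one-line definition):

```haskell
import Data.Maybe (listToMaybe)

-- First pair (x, y), with x from the first list and y from the
-- second, related by the given predicate; Nothing if none exists.
findIntersection :: (a -> b -> Bool) -> [a] -> [b] -> Maybe (a, b)
findIntersection eq a b = listToMaybe [(x, y) | x <- a, y <- b, x `eq` y]

-- findIntersection (\x y -> x == negate y) [1,2,3] [-2,5]  ==  Just (2,-2)
```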

Sometimes we'll meet propositions that we can start to reason about using ordinary propositional calculus. These will match the propositional predicate:

> propositional (propType -> DoubleNegation _) = True
> propositional (propType -> Conjunction _ _)  = True
> propositional (propType -> Disjunction _ _)  = True
> propositional _                              = False

On the other hand the provability predicate is used to identify propositions that need rules pertaining to provability and consistency:

> provability (propType -> Provability _) = True
> provability (propType -> Consistency _) = True
> provability _                           = False

The Algorithm
And now we're almost ready to implement the tableau rules. Because we'll be using tableaux in a number of different ways I need lots of hooks into the algorithm that can perform different operations. I've collected all of these hooks into a single type. The algorithm will take a proposition of type prop (which will be Prop for the first three cases) and produce something of type result.

> data TableauRules prop result = TableauRules {

If our result corresponds to a world that is self-contradictory it is said to close. Here's how we indicate a closed (and hence not really existing) world. In the simplest case we won't actually store any information about a world, just whether or not it closes. So closes will be the identity function:

>     closes :: result -> Bool,

Occasionally we'll find a world with something obviously equivalent to F in it. It closes. Here's what we want to return in that case. The argument is the offending proposition:

>     foundF :: prop -> result,

Sometimes we'll find a pair that obviously contradict, like a and ¬a:

>     foundContradiction :: (prop, prop) -> result,

And sometimes we'll find an open world (i.e. a real, non-closed one). This function gets handed the list of propositions that have been found to hold in it:

>     open :: [prop] -> result,

Here's what we do when we find a conjunction. I hope you remember that when we meet a conjunction we can delete it and replace it with the two subpropositions. conjRule is handed the subpropositions as well as the result from proceeding with the tableau rule for conjunctions:

>     conjRule :: prop -> prop -> result -> result,

Disjunctions work a little differently. When handling a∨b, say, we need to handle two subtableaux, one with a and one with b. The first argument to disjRule rules is the disjunction itself, the next two are the left and right subpropositions, and the last two arguments are the results of continuing the two subtableaux.

>     disjRule :: prop -> prop -> prop -> result -> result -> result,

With doubleNegation we get to see propositions that have undergone double negation elimination.

>     doubleNegation :: prop -> result -> result,

When we use Dia to open new worlds we need to ensure that each of these subworlds is valid. Each subworld is processed with processWorld. For example, when we're drawing tableau diagrams we can use this hook to draw a frame around the subtableaux. We then fold together these subworlds using combineWorlds:

>     processWorld  :: prop -> result -> result,
>     combineWorlds :: result -> result -> result,

Lastly we have our driver function that kicks off the whole tableau algorithm on a list of propositions:

>     tableau :: [prop] -> result -> result
> }

We can now implement a tableau computation algorithm that supports these hooks. We'll start with the check for whether or not we have an immediate contradiction:

> simpleClosure rules ps = case find isF ps of
>    Just a  -> foundF rules a

Split the propositions into those that are negated and those that are not. We're looking for propositions in one part that directly contradict propositions in the other:

>    Nothing ->
>      let (neg, pos) = partition negative ps
>          maybePair = findIntersection ((==) `on` positiveComponent) neg pos
>      in case maybePair of
>          Just pair -> foundContradiction rules pair
>          Nothing   -> open rules ps

Double negation elimination is straightforward to apply. Delete the original proposition and replace it with the version without double negation:

> applyDNeg rules p a props = doubleNegation rules a $
>   applyPropositional rules (a : delete p props)

The rule for handling conjunctions. We delete the conjunction from the current list of propositions and replace it with the two subpropositions:

> applyConj rules p a b props = conjRule rules a b $
>   applyPropositional rules (a : b : delete p props)

Disjunctions require running two separate subtableaux:

> applyDisj rules p a b props =
>    let props' = delete p props
>        left   = applyPropositional rules (a : props')
>        right  = applyPropositional rules (b : props')
>    in disjRule rules p a b left right

Here we tie together the rules for propositional calculus. We use a bit of case analysis to decide which rule to apply, and if no rule applies we try the provability rules instead:

> applyPropositional rules props =
>     let t = simpleClosure rules props in if closes rules t
>         then t
>         else case find propositional props of
>             Nothing -> applyProvability t rules props
>             Just p  -> case p of
>                 (propType -> DoubleNegation q) -> applyDNeg rules p q props
>                 (propType -> Conjunction a b)  -> applyConj rules p a b props
>                 (propType -> Disjunction a b)  -> applyDisj rules p a b props

When we've exhausted all possible rules from propositional calculus we scan for propositions like Dia p or Neg (Box p). These may imply the existence of subworlds. We then try to instantiate these subworlds, seeded according to the rules I gave in the previous article:

> applyProvability t rules props =
>     let impliedWorlds = placesWhere consistency props

>         consistency (propType -> Consistency _) = True
>         consistency _ = False

>         testWorld (p@(propType -> Consistency q), props) =

In the following line, neg p corresponds to the application of Löb's Theorem. provabilities is the list of propositions inherited by a subworld from statements about provability in the parent:

>              let tableau = runTableau rules (q : neg p : provabilities)
>                  provabilities = do 
>                      p@(propType -> Provability q) <- props
>                      [p, q]
>              in processWorld rules p tableau

>     in foldr (combineWorlds rules) t (map testWorld impliedWorlds)

And finally, here's where we kick our algorithm off:

> runTableau rules props = tableau rules props $ applyPropositional rules props
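Before putting the hooks to work, it may help to see the same loop with all the generality stripped away. This is not code from the article: it's a propositional-only toy version (no Box/Dia, no TableauRules record) showing the control flow in isolation.

```haskell
-- Toy propositional tableau: a branch is a list of propositions.
data P = V String | P :& P | P :| P | N P deriving (Eq, Show)

-- A branch closes if it contains a literal and its negation.
closed :: [P] -> Bool
closed ps = or [ N q `elem` ps | q@(V _) <- ps ]

-- True means the branch is contradictory.
branchCloses :: [P] -> Bool
branchCloses ps
  | closed ps = True
  | otherwise =
      case filter reducible ps of
        []    -> False                       -- fully unfolded and open
        (p:_) ->
          let rest = filter (/= p) ps        -- delete p (all copies; fine for a sketch)
          in case p of
               a :& b     -> branchCloses (a : b : rest)
               N (N a)    -> branchCloses (a : rest)
               N (a :| b) -> branchCloses (N a : N b : rest)
               a :| b     -> branchCloses (a : rest) && branchCloses (b : rest)
               N (a :& b) -> branchCloses (N a : rest) && branchCloses (N b : rest)
               _          -> False
  where
    reducible (V _)     = False
    reducible (N (V _)) = False
    reducible _         = True

-- A proposition is valid iff every branch of its negation closes.
validP :: P -> Bool
validP p = branchCloses [N p]
```

So `validP (V "a" :| N (V "a"))` is True while `validP (V "a")` is False. The real runTableau above is this skeleton plus the provability rules and the result-building hooks.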

Testing Validity
We can now use a set of simple rules to test the validity of propositions. As mentioned above, the result is a Bool indicating whether a world is invalid. By and large, these rules do the trivial thing. Note how the rule for disjunction requires both of the alternatives to be invalid in order to invalidate the whole world, so we use (&&). But when considering subworlds, just one bad subworld is enough to invalidate a world:

> validRules = TableauRules {
>     closes = id,
>     open   = \_ -> False,

>     foundF             = \_ -> True,
>     foundContradiction = \_ -> True,

>     conjRule       = \_ _ t -> t,
>     disjRule       = \_ _ _ -> (&&),
>     doubleNegation = \_ t -> t,

>     combineWorlds = (||),
>     processWorld  = \_ t -> t,

>     tableau = \_ t -> t
> }

We can now write a simple validity test. We negate the proposition we're interested in and test whether the implied world closes:

> valid p = runTableau validRules [neg p]

Here's a small regression test to ensure everything works. It's just a bunch of examples that I worked out by hand or lifted from Boolos's book:

> valids = [
>         T,
>         a :-> a,
>         Box a :-> Box a,
>         Box a :-> Box (Box a),
>         Box (Box a :-> a) :-> Box a,
>         Box F <-> Box (Dia T),
>         let x = p :/\ q :-> r :-> a in Box (Box x :-> x) :-> Box x,
>         F :-> Dia p,
>         Box (Dia p) :-> Box (Box F :-> F),
>         (Box F \/ q /\ Dia (Box F /\ Neg q)) <->
>           (Dia (Box F \/ q /\ Dia (Box F /\ Neg q))
>           --> q /\ Neg (Box (Box F \/ q /\ Dia (Box F /\ Neg q)
>           --> q)))
>     ]

> invalids = [
>         F,
>         a :-> Box a,
>         Box a :-> a,
>         Box (Box a :-> a) :-> a,
>         Dia T,
>         Box (Dia T),
>         Neg (Box F),
>         (Box F \/ p /\ Dia (Box F /\ Neg q)) <->
>           (Dia (Box F \/ q /\ Dia (Box F /\ Neg q))
>           --> q /\ Neg (Box (Box F \/ q /\ Dia (Box F /\ Neg q)
>           --> q)))
>     ]

If everything is working, regress1 should give the result True.

> regress1 = do
>     print $ (and $ map valid valids) &&
>             (and $ map (not . valid) invalids)

That's enough implementation. In the next installment we'll start putting this code to work.

References
This code is an implementation of the algorithm described in Chapter 10 of The Logic of Provability. See that book if you want (1) a proof that the above algorithm always terminates and (2) that it really does correctly decide the validity of propositions of provability logic.

Saturday, December 11, 2010

Generalising Gödel's Theorem with Multiple Worlds. Part I.

Introduction
In his first incompleteness theorem, Gödel showed us that we can construct a sentence that denies its own provability. In his second incompleteness theorem he showed that an example of such a sentence is the one that asserts the inconsistency of arithmetic. If arithmetic is consistent then it can't prove its own consistency. On the other hand, if arithmetic is inconsistent then we can prove anything, and hence we can prove its consistency.

Can we generalise what Gödel did? For example, can we construct sentences that we can prove assert their own provability? What about sentences that deny that their provability is provable? Or what about sentences that assert that if they're provable then it's not provable that it's inconsistent that they imply that they're inconsistent with the rest of arithmetic?

Not only can we do these things, we can also write a computer program that generates such theorems for us. We can do so by working with the idea that a consistent set of axioms describes a world, and that a set of axioms able to talk about sets of axioms describes a world within a world, and a set of axioms that...well you can guess how it goes. A bit like Inception really.

Gödel's Theorem
Briefly, Gödel's first incompleteness theorem goes like this: we work with PA, the logical system built from Peano's Axioms. Within PA we can state and prove theorems about arithmetic like the fact that 1+2=3 or that for any prime there is always another greater prime. But even though PA is about numbers, we can talk about other things if we can encode them as numbers. In particular, Gödel came up with a scheme to encode propositions of PA as numbers. We use [P] to represent the number for the proposition P in Gödel's scheme. A proof is essentially just a list of propositions where each one is connected to some earlier ones by simple mechanical rules. These rules can be turned into arithmetical statements about the Gödel numbers for the propositions. So in the language of PA it is possible to assert that a particular number is the Gödel number of a provable proposition. Let Prov(n) denote the proposition of PA that says that n is the Gödel number of a provable proposition. Prov(n) is a proposition of PA and so is ¬Prov(n). Imagine we could find a proposition G with the property that G↔¬Prov([G]). It would assert its own unprovability. But it also appears to involve stuffing a representation of the Gödel number of G within G as well as all the rules for determining provability. Amazingly Gödel figured out how to do this (using tricks not unlike those used for quining). If G were false, G would be provable. So if we trust PA as a means of reasoning about proofs in PA, then G is true, though it can't be proved using the axioms of PA.

Provability Logic
We're going to be interested specifically in provability, so we don't need all of the power of PA. So we're going to work with a simplified domain specific logic called GL (for Gödel-Löb), otherwise known as Provability Logic. The idea is that sentences of GL will be shorthand for classes of statement in PA. GL will contain propositional calculus. So here's an example statement in GL: p∧q. The unknowns p and q represent propositions of PA and statements of GL are considered valid if they're provable whatever statements of PA they represent. For example, we could assign p="there are at least 10 primes" and q="7>2", in which case p∧q holds. But we could assign p="there are just 10 primes" in which case it's false. So p∧q isn't a valid proposition of GL as it doesn't hold for all p and q. On the other hand p→p∨q is valid because no matter what crazy propositions we assign to p and q, p→p∨q is true. (Of course, when I say "there are at least 10 primes", I mean a long and complicated sentence of PA that amounts to the same thing.)

But there's more to GL than propositional calculus. It also has the one-argument predicate ◻ which asserts that its argument is provable. More precisely, ◻p says that whatever proposition of PA p represents, let's say it's P, we have Prov([P]). It's just shorthand. Here's another example: ◻(q→◻p) says that whatever P we assign to p, and whatever Q we assign to q, Prov([Q →Prov([P])]). Or in English it says "for any propositions P and Q, it is provable that Q implies that P is provable".

We'll use ⊤ and ⊥ from propositional calculus as the always true and always false propositions in the usual way. ◻⊥ is the assertion that we can prove ⊥. In other words it's the assertion that PA is inconsistent. So now we can state Gödel's second incompleteness theorem: ¬◻⊥→¬◻¬◻⊥. If PA is consistent, we can't prove it.

We can also introduce the symbol ◊. This is just shorthand for ¬◻¬. ◊p says that it's not provable that p is false. In other words, it says that p is consistent with the rest of PA. So ◻ is provability and ◊ is consistency.

A set of assignments of propositions of PA to a bunch of letters in GL can be thought of as a world. For example, we can imagine a world in which p="2>1" and q="2<1". In this world, we have p and ¬q. The GL proposition ◻p says that we have a proof of p so it must be true in all worlds. Conversely, ◊p says that it's not true there is no world where p holds. In other words, it asserts the existence of a world where p holds. So worlds can talk about other worlds. If we have ◊◊p in some world, then it's asserting that there's another world in which ◊p holds. In other words, ◊◊p asserts there is a world in which it is asserted that there is another world where p holds. We can draw pictures to represent this. If a world has propositions that talk about another world then we draw the talked about world as a kind of subworld. Here's how we can picture ◊◊p:



Worlds are assignments of propositions of PA to letters of GL. But most of the time we won't particularly care about specific propositions themselves like "2>1". We'll be more interested in what truth assignments we can make to the propositions represented by the letters. So we can think of a world as a place where the letters of GL have been assigned truth values, true or false. And we can think of each world as containing subworlds consisting of propositions that can be proved or disproved by their parent worlds.

I'm going to spend most of my time looking at how we can explore and unfold all of the implications contained in a world.

The Rules of the Game
We can give some rules. If a world contains p∧q like this:


then it must contain both p and q as well:


I crossed out the p∧q as it became redundant. It's important to note that when I wrote p∧q we could have any propositions of GL standing in for p and q. So a world containing (p∨q)∧r must also contain p∨q and r. Other rules may apply too. If we had p∧q and r∧s we can unfold both to get p, q, r and s. This rule also kicks in if it applies to negated propositions that become conjunctions if we "push down" the negation using de Morgan's laws. For example if a world contains ¬(p→q) then it also contains p and ¬q.

If a world contains p∨q then we don't know exactly what it looks like.


There are two possibilities. We can draw both together like this:


The line in the middle means that the world either looks like what's on the left or what's on the right. Once we've indicated the two alternatives, the original proposition becomes redundant again. When we have a vertical line like this, everything above it applies in both the left and right possibilities. This saves us having to split our world into two and copy all of our formulae to both sides. As in the case of conjunctions, this rule also kicks in for other kinds of disjunctions. So a world containing p→q splits into two separate worlds headed by ¬p and q.
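Collected together, the propositional unfolding rules are:

```latex
% Conjunctive rules: unfold in place.
p \wedge q \;\Rightarrow\; p,\ q
\qquad \neg(p \vee q) \;\Rightarrow\; \neg p,\ \neg q
\qquad \neg(p \to q) \;\Rightarrow\; p,\ \neg q

% Disjunctive rules: split the world into two alternatives.
p \vee q \;\Rightarrow\; p \mid q
\qquad \neg(p \wedge q) \;\Rightarrow\; \neg p \mid \neg q
\qquad p \to q \;\Rightarrow\; \neg p \mid q
```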

If a world contains ⊥, or a contradiction, it means that it wasn't really a valid world after all. If we meet a world like this, we know that our starting point must have been self-contradictory. If a world implies the existence of a subworld that isn't valid, then it must itself be invalid:



On the other hand, if we've split a world into two possibilities because we found p∨q then even if one branch is invalid the world might still be valid if the other branch is valid.

If we have a world containing ◻p:


then we know that p must be in every subworld. Actually, we know it must also hold in any subsubworld too, all the way down. This is because if we can prove something we can then use that proof directly to form a constructive proof that we can prove it. So that means ◻p holds in every subworld too:


We treat negation as above. So if we see ¬◻p we treat it like ◊¬p.

As we have also said, if we have a world with a ◊p in it then it must contain a subworld with p in it. Note that ◊ implies the existence of subworlds, but ◻ doesn't. It just tells us what must be in them if they exist.

But there's one last rule we'll need: Löb's theorem. This is the big theorem on which everything else I say depends. It states that ◻(◻p→p)→◻p. If we can prove that proving something implies its truth, then we can prove it. I could sketch a proof here, but I highly recommend the cartoon proof here. In a way, Löb's theorem is a bit sad. The raison d'être of mathematics is that we can use proofs to be sure of things. In other words, we take for granted that ◻p→p. But if we could prove this then we could prove ◻p, even if p were false! ◻p→p is not valid!

(Philosophical digression: Mathematicians assume ◻p→p. They don't assume ◻p→p because there's a proof. They assume it because experience has shown it to work. So I claim that ◻p→p is an empirical fact that we learn by scientific induction. That's controversial because I'm basically saying that much of mathematics is empirical - at least it is if you talk about truth rather than proof. If you disagree, don't worry about it. The rest of what I say here is independent of this digression.)
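For the record, here's how the flipped form used in the next step falls out of Löb's theorem:

```latex
% Löb's theorem, instantiated at \neg p:
\Box(\Box\neg p \to \neg p) \to \Box\neg p
% Take the contrapositive:
\neg\Box\neg p \to \neg\Box(\Box\neg p \to \neg p)
% Use \Diamond = \neg\Box\neg and \neg(A \to B) \equiv A \wedge \neg B:
\Diamond p \to \Diamond(p \wedge \Box\neg p)
% Finally, \Box\neg p is \neg\Diamond p:
\Diamond p \to \Diamond(p \wedge \neg\Diamond p)
```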

Anyway, we can flip Löb's theorem around to get ◊p →◊(p ∧¬◊p). So if we have a world in which ◊p holds:


then we have a subworld in which both p and ¬◊p hold:



And that's all we'll need, apart from the obvious rule that we can remove double negation. Suppose we have a proposition of GL. We can use the rules above to extract as many implications as we can. Eventually there will come a point where there is nothing more we can do. If we can do this without hitting a contradiction then we've found a bunch of possible worlds for the proposition. If we can't find a valid world, however, then the original proposition must have been false. You might ask whether there are any other rules we need in addition to the ones above. Maybe there are other theorems like Löb's theorem that we need to use. Amazingly it can be shown that no other rules are needed. This is a sure-fire terminating algorithm for determining whether or not a proposition of GL is valid! This is a powerful tool. We can now start constructing wild propositions like those I started with and find out whether they are valid. (Note that I've not in any way proved this procedure always works. You'll need to look at Boolos's book to see why.)

Some Proofs
Let's work through some examples. First I'll prove something from propositional calculus: (p∧q)∨(p∧r)→p. We start by drawing a world with the negation:


Now ¬(a→b) is the same as a∧¬b. So we get:


Now the only way to proceed is to consider the disjunction and consider two alternatives:


On both sides of the vertical line we find p. But that contradicts the ¬p we discovered earlier. So there is no way the negation of our original proposition can hold in any world. And therefore the original proposition must be valid.

How about showing Gödel's second incompleteness theorem this way? ◊⊤ says that ⊤ is consistent with the rest of arithmetic, i.e. it expresses the consistency of arithmetic. The theorem is then ◊⊤→¬◻◊⊤, i.e. if PA is consistent, then PA can't prove it is consistent. We'll start with this:


We can deal with the negated implication like before. We can also deal straightforwardly with the double negation:


There's now no way to proceed except to use the ◊⊤ to open up a new world. Remember that when we use this we must use Löb's theorem as well as inheriting p and ◻p from the parent world's ◻p:


And we get a contradiction because we have both ◊⊤ and ¬◊⊤. So Gödel's second incompleteness theorem is indeed valid. Note that this isn't a demonstration from scratch. We've shown its validity from Löb's theorem. So this isn't really a useful way to show it's valid. But it *is* a useful way to show the validity of generalisations of Gödel's theorem. Unfortunately, I have to stop for now.

In the coming posts I'll implement all of the above as a computer program. If there's any ambiguity in what I've said I hope the source code to the program will resolve those ambiguities. What's more, our program won't just test the validity of a proposition but it will draw out a nice picture of our world with its subworlds. So you'll be able to trace through every step to make sure you understand!

But that's not all. We'll also see how to write a program to illustrate a bunch more theorems like Craig's Interpolation Lemma and Beth's Definability Theorem and then we'll finish with a program that is designed to construct self-referential propositions. In particular, given any self-referential description like "p is a proposition that is equivalent to the proposition that denies that it's provable that p's provability is consistent with p itself" it will solve to find p, even though GL doesn't allow us to directly construct self-referential propositions.

So let me recap: a world is an assignment of consistent truth values to the letters (and consequently propositions) of GL. Some of these propositions imply the existence of other subworlds with different truth values for these propositions. We draw these worlds as subworlds of the original world. For a world to be valid it mustn't contain any contradictions (unless the world contains a bunch of alternatives in which case just one alternative needs to be valid.) A proposition of GL is valid if unfolding its negation doesn't result in an invalid world.

Exercises
Which of the following are valid? If you report back any difficulties you have I can incorporate any needed revisions into the description above.
1. p→◻p
2. ◻p→◻◻p
3. ◊p→◊◻◻p
4. ◊(p→◻q) →◊(◻(◊(p∧q)))
5. ◻(p∧q∧r∧s) →◻p∧◻q∧◻r∧◻s

References
Most of this stuff is based on Chapter 10 of The Logic of Provability by Boolos. The idea of using multiple worlds to prove theorems is due to Kripke. I believe the procedure of unfolding the implications of a proposition in the tree-like way I describe above is due to Smullyan.

Update
Solutions to problems:
1. Not valid. (Diagram omitted.)
2. Valid.
3. Valid.
4. Valid.
5. Valid. (Diagram omitted.)

Friday, November 19, 2010

Beating the odds with entangled qubits

Quantum mechanics allows the possibility of "spooky action at a distance": correlations between widely separated but simultaneous random events that can't be explained by any classical local model. These events look like they secretly communicate with each other, but we also know that quantum mechanics prevents us sending messages faster than the speed of light. Nonetheless, even though we can't exploit non-locality to send messages faster than the speed of light, two cooperating parties can exploit non-locality to perform tasks better than would be possible without it. The CHSH game is one such example.

My goal here is to write code to emulate the CHSH game. It will require reusing the probability and quantum mechanics monads I've used here many times before. So I won't be explaining how these work.

> import Data.Map (toList, fromListWith)
> import Complex
> infixl 7 .*

The CHSH game is cooperative in the sense that the two players, A and B, are attempting to work together to win. The two players are widely separated. Between the two players is the game show host. The host randomly generates a pair of bits, s and t. s is sent to A and t is sent to B. Neither A nor B gets to see the message sent to their partner. A and B must now simultaneously make their moves, stating a choice of bit.

Call A's move a and B's move b. A and B win if they can arrange that a XOR b equals s AND t. So, for example, if A receives a true bit, and thinks B has also received a true bit, then A wants to make a move that differs from B's, otherwise A wants to make the same move as B. A and B are allowed to plan as much as they like beforehand but it should be pretty clear that they can't possibly guarantee a win.

We can formally write the victory condition as:

> victory a b s t = (a `xor` b) == (s && t)

Now there's a pretty good strategy A and B can adopt: three quarters of the time, s AND t will be false. In that case, A and B want their answers to match. So they could simply choose false, regardless of what message the game show host sends them. We can give their strategies as a function of the host's message:

> astrategy s = False
> bstrategy t = False

Now we can simulate our game:

> game = do

The host picks a random bit to send to each player:

>    s <- 0.5 .* return False + 0.5 .* return True
>    t <- 0.5 .* return False + 0.5 .* return True

The players now respond to each of their messages:

>    let a = astrategy s
>    let b = bstrategy t

Now we can collect the replies and score the result:

>    let score = victory a b s t
>    return score

> play1 = collect game

Running play1 gives the expected result that A and B have a 3/4 chance of winning. It's not hard to prove classically that they can do no better than this.
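We can check that classical bound by brute force. This is my addition, separate from the probability-monad code above: every deterministic classical strategy is one of four functions Bool -> Bool, and a randomized strategy is just a mixture of deterministic ones, so enumerating the 16 deterministic pairs suffices.

```haskell
-- Enumerate all 16 pairs of deterministic classical strategies and
-- count wins over the four equally likely (s, t) messages.
strategies :: [Bool -> Bool]
strategies = [const False, const True, id, not]

wins :: (Bool -> Bool) -> (Bool -> Bool) -> Int
wins fa fb = length [ () | s <- [False, True], t <- [False, True]
                         , (fa s /= fb t) == (s && t) ]

best :: Int
best = maximum [ wins fa fb | fa <- strategies, fb <- strategies ]

main :: IO ()
main = print best  -- 3, i.e. a winning probability of 3/4
```

No pair wins all four cases, so 3/4 really is the classical ceiling.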

But in a quantum universe it is possible to do better! We now allow A and B to adopt strategies that involve making measurements of a quantum system. To describe their strategies we need to use the quantum monad. Here is the previous strategy rewritten for this monad. In this case, the argument b is the state of the quantum system they observe. The first element of the pair is the move in the game, the second element is the state the physical system is left in by the player:

> aqstrategy b s = qreturn (False, b)
> bqstrategy b t = qreturn (False, b)

Now we can rewrite game to support quantum processes:

> game' aqstrategy bqstrategy bits = do
>    s <- 0.5 .* preturn False + 0.5 .* preturn True
>    t <- 0.5 .* preturn False + 0.5 .* preturn True
>    (score, _, _) <- collect $ observe $ do
>        (abit, bbit) <- bits
>        (a, abit') <- aqstrategy abit s
>        (b, bbit') <- bqstrategy bbit t
>        let score = victory a b s t

Note that we have to return abit' and bbit' because quantum processes are reversible and can't erase information about a state. Also note that abit' and bbit' can be widely separated in space.

>        qreturn (score, abit', bbit')

Probabilistic processes can erase whatever they like:

>    preturn score

The quantum version of a coin toss to generate a random bit, ie. an equal superposition of False and True:

> coin' = (1/sqrt 2) .* qreturn False + (1/sqrt 2) .* qreturn True

> bits = do
>   a <- coin'
>   b <- coin'
>   return (a, b)

So now we play exactly as in play1 except that we give each player an independent qubit, which they ignore:

> play2 = collect $ game' aqstrategy bqstrategy bits

Unsurprisingly, the probability of winning is just the same as before.

But now we can try something impossible in the classical case. We give each player half of a perfectly correlated pair of qubits. The players are now entangled and can exploit non-locality. Of course if their strategies ignore the qubits we get the same result as before:

> bell = (1/sqrt 2) .* qreturn (False, False) + (1/sqrt 2) .* qreturn (True, True)
> play3 = collect $ game' aqstrategy bqstrategy bell

Now comes the surprising bit. The players can each look at the (classical) bit given to them by the game host. Depending what it is they rotate the qubit's state through some angle in state space. (In the case of the qubit being an electron state, this is an actual physical rotation of the electron.) In these strategies, the choice of move is the same as the state the qubit is left in after observation, hence the qreturn (b', b') bit.

> aqstrategy' b False = rotate (0) b >>= \b' -> qreturn (b', b')
> aqstrategy' b True  = rotate (pi/2) b >>= \b' -> qreturn (b', b')

> bqstrategy' b False = rotate (pi/4) b >>= \b' -> qreturn (b', b')
> bqstrategy' b True  = rotate (-pi/4) b >>= \b' -> qreturn (b', b')

And now when we play, the probability of winning is greater than 3/4.

> play4 = collect $ game' aqstrategy' bqstrategy' bell
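For reference (my addition, not in the original post): assuming these rotations realise the usual optimal CHSH measurement angles, the standard analysis gives each of the four (s, t) cases the same winning probability, so play4 should report roughly

```latex
P(\text{win}) \;=\; \cos^2\frac{\pi}{8} \;=\; \frac{1}{2} + \frac{\sqrt{2}}{4} \;\approx\; 0.854 \;>\; \frac{3}{4}.
```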

All of the 'communication' took place before the game started. A and B didn't communicate s and t to each other. And yet they can beat the classical odds.

So in conclusion: Quantum mechanics gives opportunities for collusion that are impossible classically. Sadly we don't yet know how to maintain the state of separated entangled qubits for extended periods of time. But I remember seeing recently that people are managing to maintain qubit states for nanoseconds.

By the way, there are games where it is possible to achieve a 100% success rate with the help of quantum states. These give examples of what is known as quantum pseudo-telepathy. I presume the "pseudo" is because despite the 100% success rate, it still doesn't give a way to send messages instantly.

A last thought from me: one reason why humans send messages is to allow them to coordinate strategies. But quantum game theory shows that we can coordinate strategies without sending messages. In other words, even though non-locality doesn't give us faster-than-light communication, it does allow us to do things that were previously thought to require FTL. I think this may have some profound consequences.

And an example from a different domain: in biochemistry one could imagine remote parts of ligands coordinating the way they bind to receptors, something that would be completely missed by the kind of quasi-classical simulation I've seen biochemists use.

My standard quantum mechanics code:

> data W b a = W { runW :: [(a, b)] } deriving (Eq, Show, Ord)

> mapW f (W l) = W $ map (\(a, b) -> (a, f b)) l

> instance Functor (W b) where
>  fmap f (W a) = W $ map (\(a, p) -> (f a, p)) a

> instance Num b => Monad (W b) where
>  return x = W [(x, 1)]
>  l >>= f = W $ concatMap (\(W d, p) -> map (\(x, q)->(x, p*q)) d) (runW $ fmap f l)

> a .* b = mapW (a*) b

> instance (Eq a, Show a, Num b) => Num (W b a) where
>   W a + W b = W $ (a ++ b)
>   a - b = a + (-1) .* b
>   _ * _ = error "Num is annoying"
>   abs _ = error "Num is annoying"
>   signum _ = error "Num is annoying"
>   fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"

> collect :: (Ord a, Num b) => W b a -> W b a
> collect = W . filter ((/= 0) . snd) . toList . fromListWith (+) . runW

> type P a = W Double a
> type Q a = W (Complex Double) a

> a `xor` b = a/=b

> rotate :: Double -> Bool -> Q Bool
> rotate theta True = let theta' = theta :+ 0
>   in cos (theta'/2) .* return True - sin (theta'/2) .* return False
> rotate theta False = let theta' = theta :+ 0
>   in cos (theta'/2) .* return False + sin (theta'/2) .* return True

> observe :: Ord a => Q a -> P a
> observe = W . map (\(a, w) -> (a, magnitude (w*w))) . runW . collect

Some help for the compiler (and maybe humans too):

> preturn = return :: a -> P a
> qreturn = return :: a -> Q a

Tuesday, November 09, 2010

Statistical Fingertrees

I have no time to post a proper article. But I have time to post a mini-article with details in the links.

> {-# LANGUAGE MultiParamTypeClasses #-}
> import Data.FingerTree
> import Data.Monoid

I'm a bad person. I often use this method to compute the mean and variance of some samples. It's not robust. It performs badly if the variance is small compared to the size of the samples. I shouldn't use it. As penance, this is a quick article on dynamically computing variances robustly and efficiently for datasets that are frequently manipulated.

For convenience I'll talk about the unscaled variance: the sum, not the average of the square deviation from the mean. You can easily compute the variance from this if you know the dataset size.

If we know the size, mean and unscaled variance of two sets (more properly, multisets) we can find the mean and unscaled variance of their union using the formula here. This method is much more robust than the naive algorithm.
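Spelled out (this is the standard pairwise-update formula, stated here since the link doesn't survive in text form): given two multisets with sizes, means and unscaled variances (n, m, v) and (n', m', v'), and writing δ = m' − m, their union has

```latex
n'' = n + n', \qquad
m'' = \frac{n\,m + n'\,m'}{n + n'}, \qquad
v'' = v + v' + \delta^2\,\frac{n\,n'}{n + n'} .
```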

The rule for combining two datasets gives us a monoid:

> data Stats = Stats { n :: Float, mean :: Float, unscaledVariance :: Float }
>              deriving Show

> instance Monoid Stats where
>     mempty = Stats 0 0 undefined
>     Stats n m v `mappend` Stats 0 _ _ = Stats n m v
>     Stats 0 _ _ `mappend` Stats n m v = Stats n m v
>     Stats n m v `mappend` Stats n' m' v' = 
>       let delta = m' - m
>       in Stats (n + n') ((n*m+n'*m')/(n+n'))
>                (v + v' + delta*delta*n*n'/(n+n'))

Given a single sample, we can compute its stats:

> instance Measured Stats Float where
>     measure x = Stats 1 x 0

Now we need just one more line of code:

> type StatsTree = FingerTree Stats Float

We now have a data structure that allows us to freely split, join, delete elements from and add elements to sequences of samples, all the while robustly keeping track of their mean and unscaled variance (and hence their variance).

For example:

> example = fromList [1..10] :: StatsTree
> test = let (_, b) = split ((>=4) . n) example
>            (c, _) = split ((>3)  . n) b
>        in measure c

computes the stats for the 3 elements starting at the 4th element of example.

An example application might be maintaining rolling averages and variances for a sliding window.
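As a standalone sanity check (my addition; it duplicates the combination rule from the Stats monoid above so it doesn't need Data.FingerTree), we can compare the combined statistics against a direct computation:

```haskell
-- Combine rule for (size, mean, unscaled variance), as in the Stats monoid.
data Stats = Stats { size :: Double, mean :: Double, uv :: Double }

combine :: Stats -> Stats -> Stats
combine (Stats n m v) (Stats n' m' v')
  | n  == 0   = Stats n' m' v'
  | n' == 0   = Stats n m v
  | otherwise = let delta = m' - m
                in Stats (n + n') ((n*m + n'*m') / (n + n'))
                         (v + v' + delta*delta*n*n' / (n + n'))

ofSample :: Double -> Stats
ofSample x = Stats 1 x 0

-- Direct (non-incremental) computation for comparison.
direct :: [Double] -> (Double, Double)
direct xs = let m = sum xs / fromIntegral (length xs)
            in (m, sum [ (x - m) ^ 2 | x <- xs ])

main :: IO ()
main = do
  let xs = [1 .. 10]
      Stats _ m v = foldr1 combine (map ofSample xs)
      (m', v') = direct xs
  print (abs (m - m') < 1e-9 && abs (v - v') < 1e-9)  -- True
```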

This was inspired by an article by John D. Cook somewhere around here.

Saturday, September 18, 2010

On Removing Singularities from Rational Functions

Introduction
Suppose we have the function

> f x = 1/(x+x^2) - 1/(x+2*x^2)

Some basic algebraic manipulation shows that in the limit as x→0, f(x)→1. But we can't simply compute f 0 because this computation involves division by zero at intermediate stages. How can we automate the process of computing the limit without implementing symbolic algebra?

I've already described one way to remove singularities from a function. But that approach is very limited in its applicability.

This article is about a variation on the approach to formal power series that nicely showcases some advantages of lazy lists. It will allow us to form Laurent series of functions so we can keep track of the singularities.

The usual Haskell approach to power series allows you to examine the coefficients of any term in the power series of the functions you can form. These series can't be used, however, to evaluate the function. Doing so requires summing an infinite series, but we can't do so reliably because no matter how many terms in a power series we add, we can never be sure that there aren't more large terms further downstream that we haven't reached yet. And if we want to perform computations completely over the rationals, say, we don't want to be dealing with infinite sums.

I'd like to look at a way of working with power series that allows us to perform exact computations, making it possible to answer questions like "what is the sum of all the terms in this power series starting with the x^n term?" By extending to Laurent series, and implementing the ability to selectively sum over just the terms with non-negative powers, we can compute functions like f above at 0 and simply skip over the troublesome poles (http://en.wikipedia.org/wiki/Pole_%28complex_analysis%29).

Power Series
When I previously discussed power series I used code that worked with the coefficients of the power series. This time we want to work with values of the function so it makes sense to store, not the coefficients a_i, but the terms themselves, a_i x^i. So instead of a list of coefficients, Num a => [a], we need a representation that looks a little like:

> data Power a = Power (a -> [a])

where we pass x in as an argument to the function contained in a Power. But we also want to allow Laurent series so we need to also store an offset to say which (possibly negative) term our series starts with:

> data Laurent a = Laurent (a -> (Int, [a]))

But this fails us for at least two reasons:

1. We have the individual terms, but to evaluate the function requires summing all of the terms in an infinite list.
2. If we have a Laurent series, then we need to store values of a_i x^i for x=0 and i<0. We'll end up with division by zero errors.

Partial Sum Series
So here's what we'll do instead. Suppose our power series is Σ_{i≥n} a_i x^i. We'll store the partial sums s_j = Σ_{i≥j} a_i x^{i-j}. Our type will look like:

> data Partial a = Partial (a -> (Int, [a]))

> instance Eq (Partial a)
> instance Show (Partial a)

It's straightforward to add two functions in this form. We just add them term by term after first aligning them so that the x^i term in one is lined up with the x^i term in the other:

> instance Num a => Num (Partial a) where

>  Partial f + Partial g = Partial $ \x ->
>     let (m, xs) = f x
>         (n, ys) = g x
>         pad 0 _ ys = ys
>         pad n x ys = let z:zs = pad (n-1) x ys
>                      in x*z : z : zs
>         l = min m n
>     in (l, zipWith (+) (pad (m-l) x xs) (pad (n-l) x ys))

Notice the slight subtlety in the alignment routine pad. By the definition above, the jth partial sum has a factor of x^(-j) built into it. So we need to multiply by x each time we pad our list on the left.
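The identity at work here, written out (my reconstruction from the definitions above):

```latex
s_j \;=\; \sum_{i\ge j} a_i x^{\,i-j}
    \;=\; a_j + x \sum_{i\ge j+1} a_i x^{\,i-(j+1)}
    \;=\; a_j + x\,s_{j+1},
```

so padding the list on the left with a zero coefficient (a_j = 0) makes the new head x times the old head, which is exactly what pad does.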

Now we need to multiply series. We know from ordinary power series that we need some sort of convolution. But it looks like for this case we have an extra complication. We appear to need to difference our representation to get back the original terms, convolve, and then resum. Amazingly, we don't need to do this at all. We can convolve 'in place' so to speak.

Here's what an ordinary convolution looks like when we want to multiply the sequence of terms (a_i) by (b_i):


In this example, the blue diagonal corresponds to the terms that are summed to get the 4th term in the result.

However, we wish to work with the partial sums s_j = Σ_{i≥j} a_i x^{i-j} and t_j = Σ_{i≥j} b_i x^{i-j}, constructing the partial sums of the convolution of a and b from s and t. The partial sums of the convolution can be derived from the partial sums by tweaking the convolution so it looks like this:



The blue terms work just like before and need to be summed. But we also need to subtract off the red terms, weighted by a factor of x. That's it! (I'll leave that as an exercise to prove. The inclusion-exclusion principle helps.)

The neat thing is that the red terms for each sum are a subset of the blue terms needed for the next element. We don't need to perform two separate sums. We can share much of the computation between the red and blue terms. All we need to do is write an ordinary convolution routine that additionally returns not just the blue terms, but a pair containing the blue sum and the red sum.

>  Partial f * Partial g = Partial $ \x ->
>     let (m, xs) = f x
>         (n, ys) = g x
>         (outer, inner) = convolve xs ys
>         f' a b = a-x*b -- (the subtraction I mentioned above)
>     in (m+n, zipWith f' outer inner)
>  fromInteger n = let n' = fromInteger n in Partial $ \_ -> (0, n' : repeat 0)

>  negate (Partial f) = Partial $ \x -> let (m, xs) = f x
>                                 in (m, map negate xs)
>  signum = error "signum not implemented"
>  abs    = error "abs not implemented"

This is an ordinary convolution routine tweaked to return the partial sum inner.

> convolve (a0:ar@(a1:as)) ~(b0:br@(b1:bs)) =
>  let (inner, _) = convolve ar br
>      ab = map (a0 *) bs
>      ba = map (* b0) as
>  in (a0*b0 : a0*b1+a1*b0
>            : zipWith3 (\a b c -> a+b+c) inner ab ba, 0 : inner)

The code is very similar to the usual power series multiplication routine. We can also use the same method described by McIlroy to divide our series.

As our series are a munged up version of the usual power series it's pretty surprising that it's possible to divide with so little code:

> instance Fractional a => Fractional (Partial a) where
>   fromRational n = let n' = fromRational n in Partial $ \_ -> (0, n' : repeat 0)
>   recip (Partial f) = Partial $ \x ->
>      let nibble (n, x:xs) | x==0      = nibble (n+1, xs)
>                           | otherwise = (n, (x:xs))
>          (n, xs) = nibble (f x)
>      in (-n, rconvolve x xs)

In effect, rconvolve solves the equation convolve a b==1:

> rconvolve x (a0:ar@(a1:as)) =
>   let (outer, inner) = convolve ar result
>       f a b = x*b-a
>       r = -1/f a0 a1
>       result = recip a0 : (map (r *) $ zipWith f outer inner)
>   in result

Note one ugly quirk of this code. I need to 'nibble' off leading zeroes from the series. This requires our underlying type a to have computable equality. (In principle we can work around this using parallel or).

That's it. We can now write a function to compute the positive part of a rational function. (By positive part, I mean all of the terms using non-negative powers of x.)

> pos f z = let Partial g = f $ Partial $ \x -> (1, 1 : repeat 0)
>               (n, xs) = g z
>           in if n>0
>               then z^n*head xs
>               else xs!!(-n)

Here are some examples:

> test1 = let f x = (1+2*x)/(3-4*x*x)
>         in pos (\x -> 1/(f x-f 0)/x) (0::Rational)

> test2 = pos (\x -> 1/(1+4*x+3*x^2+x^3) - 1/(1+x)) (1::Rational)

The original example I started with:

> test3 = pos (\x -> 1/(x+x^2) - 1/(x+2*x^2)) (0::Rational)

No division by zero anywhere!
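We can confirm test3 by hand (my check, not in the original post): expanding both terms as Laurent series about 0,

```latex
\frac{1}{x+x^2} = \frac{1}{x} - 1 + x - \cdots, \qquad
\frac{1}{x+2x^2} = \frac{1}{x} - 2 + 4x - \cdots,
```

so the 1/x poles cancel and f(x) = 1 − 3x + ⋯, giving the expected value 1 at x = 0.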

Conclusions
The code works. But it does have limitations. As written it only supports rational functions. It's not hard to extend to square roots. (Try writing the code - it makes a nice exercise.) Unfortunately, any implementation of square root will (I think) require a division by x. This means that you'll be able to compute the positive part away from zero, but not at zero.

This method can't be extended fully to transcendental functions. But it is possible to add partial support for them. In fact, with a little work we can still compute the positive part of functions like 1/sqrt(cos x-1) away from x==0. But applying cos to an arbitrary rational function may need more complex methods. I encourage you to experiment.

Note that this code makes good use of laziness. If your function has no singularities then you might find it performs no computations beyond what is required to compute the ordinary numerical value.

Sunday, September 05, 2010

Automatic even/odd splitting

Statement of the Problem
Suppose you have a real-valued function on the reals, say f. We can split it into the sum of an even and odd part:
f(x) = f_odd(x) + f_even(x)
where
f_odd(x) = (f(x) - f(-x))/2,   f_even(x) = (f(x) + f(-x))/2
If f_odd has a power series around zero, then all of its terms must have odd powers in x. So f_odd(x)/x must have all even powers and it becomes natural to 'compress' down the terms to form f'(x) = f_odd(sqrt(x))/sqrt(x). (Take that as a definition of f' for this article.) We can implement this operation as a higher order function:

> import Prelude hiding (odd)

> odd f x = let s = sqrt x in 0.5*(f s - f (-s))/s

Here's a simple example:

> f x = x*(1+x*(2+x*(3+7*x)))

> test0 = odd f 3

But there's something not quite right about this. If f is rational, then so is odd f. But the implementation of odd involves square roots. Among other things, square roots introduce inaccuracy. As square roots don't appear in the final result, can we eliminate them from the intermediate steps of the computation too?

A Better Solution
Let's use the results of last week's article to compute odd another way. We want a linear function that maps as follows:

1 → 0
x → 1
x^2 → 0
x^3 → x
x^(2n) → 0
x^(2n+1) → x^n

Here's an automaton:


If we start at 0, end at 1, and take exactly n steps, then the product of the factors we collect up along the way is given by the second column of that table. If n is even there is no such path so we collect up 0. As we're working with polynomials over the reals, rather than types, we have x·1 = 1·x and so on. We can construct a transition matrix:

( 0  1 )
( x  0 )

We now do something like we did last time. Any time we have a function of some variable x, we replace x with the transition matrix. Our functions now take matrix values like

( a  b )
( c  d )

Any polynomial of our transition matrix always gives us equal elements along the diagonal. This is true even if we form the inverse of the transition matrix. So we don't need to store d. So now we implement a simple matrix type needing only to store three elements instead of four:

> data O a = O a a a deriving (Show, Eq)

> instance Num a => Num (O a) where
>    O a b c + O a' b' c' = O (a+a') (b+b') (c+c')
>    O a b c * O a' b' c' = O (a*a'+b*c') (a*b'+b*a') (c*a'+a*c')
>    fromInteger n = let i = fromInteger n in O i 0 0
>    negate (O a b c) = O (negate a) (negate b) (negate c)

Notice how similar this is to automatic differentiation. We can extend to reciprocals too:

> instance Fractional a => Fractional (O a) where
>    fromRational n = let i = fromRational n in O i 0 0
>    recip (O a b c) = let idet = recip (a*a-b*c) in O (idet*a) (idet*negate b) (idet*negate c)

And now we can implement a replacement for odd:

> transition x = O 0 1 x
> odd' f x = let O _ fx _ = f (transition x) in fx

> test1 = odd' f 3

Another example:

> test3 = odd' (\x -> (x+3*x*x-1/x)/(x*x)) 2
> test4 = odd  (\x -> (x+3*x*x-1/x)/(x*x)) 2

This new version has many advantages: it uses only rational functions, it's more accurate, and it's well defined at zero.
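Here is a small standalone comparison of the two implementations (my addition; it repeats the definitions above, specialised to Double, so it compiles on its own). For f(x) = x + 2x^2 + 3x^3 + 7x^4 the compressed odd part is 1 + 3x, so both versions should give 10 at x = 3:

```haskell
import Prelude hiding (odd)

-- Square-root version, as in the post.
odd :: (Double -> Double) -> Double -> Double
odd f x = let s = sqrt x in 0.5 * (f s - f (-s)) / s

-- Matrix version: O a b c stands for the 2x2 matrix [[a, b], [c, a]].
data O = O Double Double Double deriving Show

instance Num O where
  O a b c + O a' b' c' = O (a + a') (b + b') (c + c')
  O a b c * O a' b' c' = O (a*a' + b*c') (a*b' + b*a') (c*a' + a*c')
  fromInteger n        = O (fromInteger n) 0 0
  negate (O a b c)     = O (negate a) (negate b) (negate c)
  abs                  = error "abs not implemented"
  signum               = error "signum not implemented"

odd' :: (O -> O) -> Double -> Double
odd' f x = let O _ fx _ = f (O 0 1 x) in fx

main :: IO ()
main = do
  let f y = y * (1 + y * (2 + y * (3 + 7*y)))  -- x + 2x^2 + 3x^3 + 7x^4
  print (odd f 3)   -- approximately 10
  print (odd' f 3)  -- exactly 10.0
```

Note that the matrix version stays in exact (here, integer-valued) arithmetic, while the square-root version picks up floating-point noise.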

Conclusion
Automatic differentiation is just one of a family of methods that can be used to compute a wide variety of functions of real-valued functions. Essentially we're just working over real-valued matrices instead of real numbers. By using automata we can simplify the process of working out which matrices to use. (Though for the simple example above, you may have been able to guess the matrix without any other help).

(BTW I think there are hidden automata lurking in a few places in mathematics. For example, in Umbral calculus.)

Saturday, August 14, 2010

Constraining Types with Regular Expressions

Structures with Constraints
Here's a picture of an element of a binary tree type:



The leaves correspond to elements and the letters indicate the type of those elements. If we read the leaves from left to right they match the regular expression A*B*. We can define a binary tree type whose leaves always match this pattern by making use of dissections. Similarly we can construct all kinds of other structures, such as lists and more general trees, whose leaves match this expression.

But can we make structures that match other regular expressions? Here are some examples:

1. Structures that match the expression An for some fixed n. These are structures, like trees, that have a fixed size. For the case n=5 we need a type that includes elements like:



2. Structures that match (AA)*. These are structures like trees with an even number of elements. This is easy for lists. We can just use a list of pairs. But it's more complex for trees because subtrees may have odd or even size even though the overall structure is even in size. We could also generalise to ((AA)^n)* for some fixed n.

3. Structures that match A*1A*. Here's an example:


The element marked 1 is of unit type (). It's a 'hole'. So this regular expression is a derivative type. We can also construct a variety of types like those matching A*BA*CB* by using a mixture of dissection and differentiation:



4. A bytestring- or rope-like type used to represent a string that is statically guaranteed to match a given regular expression.

5. Many kinds of other constraints you could imagine on a datastructure. Like trees which are guaranteed not to have two neighbouring leaves of the same type, or whose sequence of leaves never contain a certain subsequence.

So the challenge is this: can we implement a uniform way of building any container type to match any regular expression we want?

We'll need some extensions:

> {-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
> {-# LANGUAGE TypeFamilies, EmptyDataDecls, UndecidableInstances, TypeOperators #-}

First Example: Differentiating Lists
Let l(x) be the type of lists of elements of type x. We can define a list type through

data List x = Nil | Cons x (List x)

Algebraically we can write this as
l(x) = 1+x l(x)
Let's try approaching the derivative of a list from the perspective of regular expressions. We know from Kleene's Theorem that the set of strings (ie. the language) matching a regular expression is precisely the language accepted by a finite state automaton. Let's consider the case x*1x*. This is the language we get by considering all paths from state 0 to state 1 in the following automaton:


Note that I have overloaded the numeral '1' to mean both the type with one element (when labelling an edge) and to mean the state numbered 1 (when labelling a vertex).

Define L_ij(x) to be the type of lists whose sequence of elements correspond to all possible paths from state i to state j. A list in L_ij(x) is either an empty list, or constructed from a first element and a list. If we combine an element and a list, the combination has to match a possible path from i to j. There are a number of ways we could do this. The element could correspond to a transition from i to k. But if this is the case, then the remainder of the list must correspond to a path from k to j. So we must replace Cons with something that for its first argument takes a type corresponding to a single automaton step from i to k. For its second argument it must take an element of L_kj(x). The set of paths from i to j that correspond to a single step is the (i,j)-th element of the transition matrix for the automaton. This is X_ij(x) where

X(x) = (X_ij(x)) = ( x  1 )
                   ( 0  x )

On the other hand, we also need to replace Nil with a version that respects transitions too. As Nil takes no arguments, it must correspond to paths of length zero in the automaton. The only such paths are zero length paths from a state to itself. So the matrix for such paths is:

I = ( 1  0 )
    ( 0  1 )

Let's also define the matrix L(x) = (L_ij(x)).

The words above boil down to:
L_ij(x) = I_ij + Σ_k X_ik(x) L_kj(x)
where the sum is over all the places we might visit on the first step of the journey from i to j.

We can rewrite this using standard matrix notation:
L(x) = I + X(x)L(x)
Compare with the definition of ordinary lists given above. We get the type of constrained lists by taking the original definition of a list and replacing everything with matrices. We replace 1 with I. We replace x with the transition matrix of the automaton. And we replace the structure we're trying to define with a family of types - one for each pair of start and end states for the automaton. We can describe this replacement more formally: it's a homomorphism from the set of types to the set of matrices of types. (Actually, it's a bit more subtle than that. This isn't quite the usual semiring of types. For one thing, the order of multiplication matters.) And it doesn't just apply to lists. We can apply this rule to any container type. For example, suppose we wish to repeat the above for trees. Then we know that for ordinary binary trees, t(x), we have
t(x) = x + t(x)^2
We replace this with
T(x) = X(x) + T(x)^2
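As a consistency check (my addition): the list equation L(x) = I + X(x)L(x) solves formally to L(x) = (I − X(x))^{-1}, and for the derivative automaton's transition matrix this is

```latex
L(x) \;=\; \begin{pmatrix} 1-x & -1 \\ 0 & 1-x \end{pmatrix}^{-1}
     \;=\; \begin{pmatrix} \dfrac{1}{1-x} & \dfrac{1}{(1-x)^2} \\[4pt] 0 & \dfrac{1}{1-x} \end{pmatrix}.
```

The (0,0) entry is the ordinary list type 1/(1−x) and the (0,1) entry is its derivative 1/(1−x)^2, exactly as the x*1x* pattern promises.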

We started this section by considering the specific case of the pattern x*1x* with a corresponding matrix X(x). Because X_10=0 and X_00=X_11 it's not hard to see that any type T we constrain using this regular expression will also have similar 'shape', ie. T_10=0 and T_00=T_11. So we can write

T = ( t(x)  t'(x) )
    ( 0     t(x)  )

where by definition, t(x)=T_00(x) and t'(x)=T_01(x). Suppose we have two such collections of types, (S_ij) and (T_ij). Now consider the types of pairs where the first element is of type S_ij and the second of T_jk. Then the leaves of the pair structure correspond to a path from i to k. So we have

( (st)(x)  (st)'(x) )   ( s(x)  s'(x) ) ( t(x)  t'(x) )
( 0        (st)(x)  ) = ( 0     s(x)  ) ( 0     t(x)  )

Multiply out and we find that
(st)'(x) = s(x)t'(x)+s'(x)t(x)
In other words, the usual Leibniz rule for differentiation is nothing more than a statement about transitions for the automaton I drew above. To get a transition 0→1 you either go 0→0→1 or 0→1→1.

Although I talked specifically about differentiation, much of what I said above applies for any finite state automaton whose edges are labelled by types. The best thing now is probably to put together some code to see how this all looks.

A Specification
If you haven't checked Brent Yorgey's solution to my problem last week, now is a good opportunity. My code is a generalisation of that but it may be helpful to look at Brent's specialisation first.

The goal is to be able to define a transition matrix like this. ('K' is an abbreviation for 'constant' matrix.)

> type D a = K22 (a,    ())
>                (Void, a )

And then define a functor like this:

> type ListF = I :+: (X :*: Y)

Think of ListF being a bifunctor taking arguments X and Y.

We'd like then to able to form the matrix of fixed points of Y = ListF X Y. In this case, ordinary lists should appear as the element at position (0,0) in the matrix:

> type List x = Fix (I0, I0) (D x) ListF

I'm using In to represent the integer n at the type level.

Derivatives of lists should appear at (0,1) so we want

> type List' x = Fix (I0, I1) (D x) ListF

But Fix is intended to be completely generic. So it needs to be defined in a way that also works for trees:

> type TreeF = X :+: (Y :*: Y)
> type Tree  = Fix (I0, I0) (D Int) TreeF
> type Tree' = Fix (I0, I1) (D Int) TreeF

And of course it needs to work with other transition matrices. For example x*1y*1z* has the following transition diagram and matrix:

> type E x y z = K33 (x,    (),   Void)
>                    (Void, y,    ()  )
>                    (Void, Void, z   )

So we'd expect

> type DDTree x y z = Fix (I0, I1) (E x y z) TreeF

to define the second divided difference of trees.

The Implementation

We'll need a type with no elements:

> data Void

And some type level integers:

> data Zero
> data S a
> type I0 = Zero
> type I1 = S I0
> type I2 = S I1
> type I3 = S I2

Now we'll need some type-level matrices. For any square matrix, we need a type-level function to give its dimension and another to access its (i, j)-th element:

> type family Dim x :: *
> type family (:!) x ij :: *

(Thanks to Brent's tutorial, that code is much better than it used to look.)

We can now define matrix addition through pointwise addition of the elements:

> data (:+) m n
> type instance Dim (m :+ n) = Dim m
> type instance (m :+ n) :! ij = Either (m :! ij) (n :! ij)

And similarly we can define multiplication. I'm using the type-level function Product' to perform the loop required in the definition of matrix multiplication:

> data (:*) m n
> type instance Dim (m :* n) = Dim m
> type instance (m :* n) :! ij = Product' I0 (Dim m) m n :! ij

> data Product' i k m n
> type instance Product' p I1 m n :! (i, j) = (m :! (i, p), n :! (p, j))
> type instance Product' p (S (S c)) m n :! (i, j) = Either
>    (m :! (i, p), n :! (p, j))
>    (Product' (S p) (S c) m n :! (i, j))
> 

(Weird seeing all that familiar matrix multiplication code at the type level.)
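
For comparison, here is what that loop computes at the value level. This is a standalone sketch of my own (not part of the literate program above): at the type level, Either plays the role of + and pairing plays the role of ×, while here we just use numbers.

```haskell
-- A 2x2 matrix stored as nested pairs, mirroring the K22 layout.
type M2 a = ((a, a), (a, a))

-- The same multiplication loop as Product', but with numeric (+) and (*).
mmul :: Num a => M2 a -> M2 a -> M2 a
mmul ((a, b), (c, d)) ((e, f), (g, h)) =
  ((a * e + b * g, a * f + b * h),
   (c * e + d * g, c * f + d * h))
```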

Now we need some types to represent our functors:

> data I
> data X
> data Y
> data K n
> data (f :+: g)
> data (f :*: g)
> data F m f

(I think phantom empty data types should be called ethereal types.)

To turn these into usable types we need to implement the homomorphism I described above. So here are the rules laid out formally:

> type Id = K22 ((),   Void)
>               (Void, ())

> type family   Hom self m f :: *
> type instance Hom self m I = Id
> type instance Hom self m X = m
> type instance Hom self m Y = self
> type instance Hom self m (K n) = n
> type instance Hom self m (f :+: g) = Hom self m f :+ Hom self m g
> type instance Hom self m (f :*: g) = Hom self m f :* Hom self m g
> type instance Dim (F m f) = Dim m

> data Fix ij m f = Fix (Hom (F m f) m f :! ij)
> type instance (:!) (F m f) ij = Fix ij m f

That's more or less it. We can now go ahead and try to construct some elements. We could (as Brent suggests) write some smart constructors to make our life easier. But for now I'm writing everything explicitly so you can see what's going on:

> x0 = Fix (Left ())                         :: List Int
> x1 = Fix (Right (Left (1, Fix (Left ())))) :: List Int

x0 is the empty list. x1 is the list [1]. The Left and Right get a bit tedious to write. But this is intended as a proof that the concept works rather than a user-friendly API.

We can explicitly implement the isomorphism with the more familiar list type:

> iso1 :: [x] -> List x
> iso1 []     = Fix (Left ())
> iso1 (a:as) = Fix (Right (Left (a, iso1 as)))

> iso1' :: List x -> [x]
> iso1' (Fix (Left ()))              = []
> iso1' (Fix (Right (Left (a, as)))) = a : iso1' as
> iso1' (Fix (Right (Right a)))      = error "Can't be called as a is void"

So that's it! If we can write our container as the fixed point of a polynomial functor, and if we can convert our regular expression to a finite state automaton, then Fix completely automatically builds the constrained container type.

What have we learnt?

We haven't just solved the original problem. We've shown that derivatives and dissections are special cases of a more general operation. Take a look at the definition of D x again. We can think of it as xI + Delta, where Delta is the matrix

> type Delta = K22 (Void, ()  )
>                  (Void, Void)

This matrix has the property that its square is zero. It's the 'infinitesimal type' I described here. In other words, this is type-level automatic differentiation. We've also been doing type-level automatic divided differencing.
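
The value-level counterpart of a nilpotent infinitesimal is the dual number of forward-mode automatic differentiation. Here's a minimal sketch of the analogy (the Dual type and diff function are my own illustration, not code from this article):

```haskell
-- A dual number a + b*eps with eps^2 = 0, mirroring the nilpotent Delta.
data Dual = Dual Double Double deriving Show

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' * Dual b b' = Dual (a * b) (a * b' + a' * b)  -- eps^2 term dropped
  negate (Dual a a')    = Dual (negate a) (negate a')
  fromInteger n         = Dual (fromInteger n) 0
  abs                   = undefined  -- not needed for this sketch
  signum                = undefined

-- Differentiate f at x by evaluating it at x + eps.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = let Dual _ d = f (Dual x 1) in d
```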

We can now go back and look at the matrix form of divided differences on wikipedia. I hope you can now see that the matrix T_id(x0,...,xn-1) defined there is nothing other than a transition matrix for this automaton:



In fact, we can use what we've learnt about regular expressions here to solve some numerical problems. But I won't write about that until the next article.

By the way, I think what I've described here can be viewed as an application of what Backhouse talks about in these slides.

I think that for any automaton we have a 2-category. The 0-cells are states, the 1-cells are the types associated with paths from one state to another, and the 2-cells are functions between types that respect the constraint. I haven't worked out the details however. The 2-category structure is probably important. As things stand, I've just shown how to make the types. But we don't yet have an easy way to write functions that respect these constraints. I suspect 2-categories give a language to talk about these things. But that's just speculation right now.

By the way, I couldn't write a working Show instance for Fix. Can you write one? And an implementation of arbitrary for QuickCheck?

And I hope you can now solve my problem from last week.

Leftover bits of code

K22 and K33 are constructors for 2×2 and 3×3 matrices. It would probably have been better to have used lists like Brent did.

> data K22 row0 row1
> type instance Dim (K22 row0 row1) = I2
> 
> type instance (:!) (K22 (m00, m01) row1) (I0, I0) = m00
> type instance (:!) (K22 (m00, m01) row1) (I0, I1) = m01
> type instance (:!) (K22 row0 (m10, m11)) (I1, I0) = m10
> type instance (:!) (K22 row0 (m10, m11)) (I1, I1) = m11
> 
> data K33 row0 row1 row2
> type instance Dim (K33 row0 row1 row2) = I3
> 
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I0) = m00
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I1) = m01
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I2) = m02
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I0) = m10
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I1) = m11
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I2) = m12
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I0) = m20
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I1) = m21
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I2) = m22
Update: I neglected to mention a subtlety: the same string can sometimes be produced by different walks through the automaton. I'll leave that as an exercise :-)

Saturday, August 07, 2010

Divided Differences and the Tomography of Types

Health Warning
This article assumes knowledge of Conor McBride's work on the differentiation and dissection of types, even if the early parts look deceptively simple.

Tables of Divided Differences
Given a sequence of numbers like 2, 6, 12, 20, 30 we can use the well-known method of finite differences to predict the next in the series. We write the numbers in a column. In the next column we write the differences between the numbers in the previous column and iterate like so:


2
     4
6         2
     6         0
12        2
     8         0
20        2
     10
30


On the assumption that the rightmost column is zero all the way down we can extend this table to:


2
     4
6         2
     6         0
12        2
     8         0
20        2
     10        0
30        2
     12
42

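The extension step is easy to mechanise. Here's a value-level Haskell sketch (the function names are my own): build the columns of the table, then extend the sequence by summing the last entry of each column.

```haskell
-- Successive differences of a sequence.
diffs :: [Integer] -> [Integer]
diffs xs = zipWith (-) (tail xs) xs

-- The columns of the difference table, stopping at the first all-zero column.
columns :: [Integer] -> [[Integer]]
columns = takeWhile (not . all (== 0)) . iterate diffs

-- Predict the next term by summing the last entry of each column.
next :: [Integer] -> Integer
next = sum . map last . columns
```

Applied to the sequence above this reproduces the 42 in the extended table.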

We can think of this as an exercise in polynomial fitting. We have some x-values, in this case: x0=1, x1=2,..., x4=5, and some y-values: y0=2, y1=6,..., y4=30. We hope to fit a polynomial f so that f(xi)=yi. The table of finite differences has the property that the (n+1)-th column becomes constant if a degree n polynomial fits the data.

But what happens if we want to fit data to x-values that aren't equally spaced? Then we can use the *divided* difference table instead. In this case we don't record the differences between the y-values, but the quotient between the y-differences and the x-differences. For x0,1,2,3 = 1, 2, 4, 6 and y0,1,2,3 = 3, 6, 18, 38 we get:


1    3
          (6-3)/(2-1) = 3
2    6                        (6-3)/(4-1) = 1
          (18-6)/(4-2) = 6                       (1-1)/(6-1) = 0
4    18                       (10-6)/(6-2) = 1
          (38-18)/(6-4) = 10
6    38


Again we reach zero. This is because yi = f(xi) = xi^2 + 2, so we have a quadratic again.
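
This table, too, is easy to mechanise. The following is a sketch of my own (not code from the article): ddColumns xs ys builds the successive columns, so column k holds the values f[x_i,...,x_{i+k}].

```haskell
-- Columns of the divided-difference table for points (x_i, y_i).
ddColumns :: [Double] -> [Double] -> [[Double]]
ddColumns xs ys = takeWhile (not . null) (go 0 ys)
  where
    go k c = c : go (k + 1) (step k c)
    -- divide each adjacent y-difference by the x-difference k+1 apart
    step k c = zipWith3 divide (tail c) c (zip xs (drop (k + 1) xs))
    divide yhi ylo (xlo, xhi) = (yhi - ylo) / (xhi - xlo)
```

Run on the data above it reproduces the columns of the table, ending in zero.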

Let's assume yi = f(xi) for some set of points. We'll leave x, y and f unknown and fill out the table:


x0   f(x0)
           (f(x1)-f(x0))/(x1-x0) = f[x0,x1]
x1   f(x1)                                    (f[x1,x2]-f[x0,x1])/(x2-x0) = f[x0,x1,x2]
           (f(x2)-f(x1))/(x2-x1) = f[x1,x2]                                              (f[x1,x2,x3]-f[x0,x1,x2])/(x3-x0) = f[x0,x1,x2,x3]
x2   f(x2)                                    (f[x2,x3]-f[x1,x2])/(x3-x1) = f[x1,x2,x3]
           (f(x3)-f(x2))/(x3-x2) = f[x2,x3]
x3   f(x3)


Note that for any f, this table also defines the generalised divided differences f[x,y], f[x,y,z] and so on. (Compare with the notation at Wikipedia).

Divided Differences of Types
Now suppose that f isn't a function of real numbers, but is a container type. Then as I noted here, the second column corresponds to Conor McBride's dissection types. f[x,y] is an f-container with a hole, with everything to the left of the hole containing an x, and everything to the right of the hole containing a y.

So what's the next column? Consider f(x) = x^4. Then f[x,y] = (y^4 - x^4)/(y - x) = x^3 + x^2y + xy^2 + y^3. This corresponds to the description I just gave involving a hole. Now consider f[x,y,z] = (f[y,z] - f[x,y])/(z - x) = z^2 + xz + x^2 + y^2 + yz + yx. This is the type consisting of a 4-tuple, with two holes, and with x left of the first hole, y between the first and second holes, and z to the right of the second hole. In other words, it's a trisection of a container. More generally, f[x0,...,xn] is the type corresponding to n holes and blocks of elements of xi between them. I've only shown this for the type x^4 but it works for all the same types in Conor's paper. But there's a catch: I've used a definition involving subtraction, and that makes no sense for types. Don't worry, I'll address that later.
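
To make the x^4 example concrete, here is a sketch of the dissection f[x,y] as a Haskell type (the names are my own, not from the article): one constructor per position of the hole, with x's to its left and y's to its right, plus the function that refills the hole when x = y.

```haskell
-- f(x) = x^4: a container holding exactly four elements.
data Quad a = Quad a a a a deriving (Eq, Show)

-- f[x,y] = x^3 + x^2 y + x y^2 + y^3: a Quad with one element-hole.
data Quad' x y
  = H0     y y y   -- hole in position 0
  | H1 x   y y     -- hole in position 1
  | H2 x x   y     -- hole in position 2
  | H3 x x x       -- hole in position 3
  deriving (Eq, Show)

-- When x = y, filling the hole recovers an ordinary Quad.
fill :: Quad' a a -> a -> Quad a
fill (H0 b c d) a = Quad a b c d
fill (H1 a c d) b = Quad a b c d
fill (H2 a b d) c = Quad a b c d
fill (H3 a b c) d = Quad a b c d
```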

We can give trisections and so on a computational interpretation like in Conor's paper. Dissections correspond to tail recursion elimination. They make explicit the state of a function traversing and transforming a recursive type. (I recommend Conor's description in the paper so I won't reproduce it here.) Trisections correspond to a pair of coroutines. The first transforms a recursive type. The second transforms the result of the first. The second one can (optionally) start consuming pieces of tree as soon as the first has started producing them. At any moment in time we have 3 pieces: (1) the part of the tree that is as yet untouched, (2) the part that the first coroutine has produced, but which the second hasn't seen yet, and (3) the output from the second coroutine. Make that explicit, and you get a trisection. The deal's much the same for quadrisections and so on.

Rediscovering Newton
A consequence of this is that we can now give a computational interpretation to much of the description of divided differences at Wikipedia. Among other things, when Conor derives a type theoretical analogue of the Taylor series, he's actually rediscovering a form of the Newton polynomial.

Type Matrices
One interesting property of divided differences is that they have a description in terms of matrices. In particular, Wikipedia describes a homomorphism from the ring of functions of x to the ring of matrices whose elements are functions of n unknowns. We know that types don't form a ring, but they do form a semiring. So we can form an analogous semiring homomorphism from the set of container types to the set of matrices whose entries are *types*. Matrices of types aren't something we see every day. How can we put this notion to work?

Let's use the notation from Wikipedia and consider the matrices we get when n = 1. (At this point I recommend reading all of the Wikipedia article on divided differences.) Let's define ourselves a tree container:

> data F a = Leaf a | Fork (F a) (F a)

We'll call the homomorphism T so that



T(f)(x0,x1) = [ f(x0)   f[x0,x1] ]
              [ 0       f(x1)    ]
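
Before using this, we can sanity-check the homomorphism property numerically (the helper below is my own sketch, not from the article). Multiplying two such matrices forces the product rule in the top-right entry: (fg)[x0,x1] must equal f(x0)g[x0,x1] + f[x0,x1]g(x1).

```haskell
-- First divided difference of a numeric function.
dd :: (Double -> Double) -> Double -> Double -> Double
dd f x0 x1 = (f x1 - f x0) / (x1 - x0)

-- Does the top-right entry of T(f)T(g) agree with that of T(f*g)?
homOK :: (Double -> Double) -> (Double -> Double) -> Double -> Double -> Bool
homOK f g x0 x1 =
  abs (dd (\x -> f x * g x) x0 x1
       - (f x0 * dd g x0 x1 + dd f x0 x1 * g x1)) < 1e-9
```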

Our tree type satisfies the equation f(x) = x + f(x)^2. As we have a homomorphism, we also expect this to hold:
T(f)(x0,x1) = T(i)(x0,x1) + T(f)(x0,x1)^2    (Equation 1)
where i is the identity function i(x) = x.

We can easily compute T(i)(x0,x1) directly using (y-x)/(y-x)=1. We get



T(i)(x0,x1) = [ x0   1  ]
              [ 0    x1 ]


Multiplying out the matrices in equation 1 now gives us a bunch of equations:
f(x0) = x0 + f(x0)^2
f(x1) = x1 + f(x1)^2
f[x0,x1] = 1 + f(x0)f[x0,x1] + f[x0,x1]f(x1)
The first two equations are just the definition of our tree. The third line, in Haskell, is:

> data F' x0 x1 = Empty | ForkL (F x0) (F' x0 x1) | ForkR (F' x0 x1) (F x1)

This is essentially the application of Conor's version of the Leibniz law to trees. F' is the dissected tree. And note how we don't have any subtraction. By using the matrix formulation of divided differences we only have matrix multiplications and sums to deal with.
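
Here is a self-contained sketch of those three equations (the constructor names are mine, but they play the same roles as F and F' above). When x0 = x1 the dissection is a tree with one element-hole, and we can refill it:

```haskell
-- f(x) = x + f(x)^2
data T a = Leaf a | Fork (T a) (T a) deriving (Eq, Show)

-- f[x0,x1] = 1 + f(x0) f[x0,x1] + f[x0,x1] f(x1)
data Diss x0 x1
  = Hole                       -- the hole sits at a leaf position
  | DL (T x0) (Diss x0 x1)     -- full left subtree, hole somewhere right
  | DR (Diss x0 x1) (T x1)     -- hole somewhere left, full right subtree
  deriving (Eq, Show)

-- With x0 = x1, filling the hole yields an ordinary tree.
refill :: Diss a a -> a -> T a
refill Hole     a = Leaf a
refill (DL l d) a = Fork l (refill d a)
refill (DR d r) a = Fork (refill d a) r
```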

The image of f(x) = x + f(x)^2 under the homomorphism T yields the simultaneous definitions of f and its dissection. More generally, if we had chosen to work with larger n we'd get the simultaneous definition of trees, their dissections, their trisections and so on. And it'll work for any recursive container type.

So matrices of types are a meaningful concept. They give a way to organise the mutually recursive definitions of higher-order divided differences. If you look at this hard enough you may also see that the matrix I defined above is playing the same role as the D type in my previous article. With a little Template Haskell I think we could in fact implement automatic (higher-order) divided differences at the type level.

But this is all just scratching the surface. The matrix and homomorphism defined above don't just apply to divided differences. Matrices of types have another deeper and surprising interpretation that will allow me to unify just about everything I've ever said on automatic differentiation, divided differences, and derivatives of types as well as solve a wide class of problems relating to building data types with certain constraints on them. I'll leave that for my next article.

Exercise
In preparation for the next installment, here's a problem to think about: consider the tree type above. We can easily build trees whose elements are of type A or of type B. We just need f(A+B). We can scan this tree from left to right building a list of elements of type A+B, ie. whose types are each either A or B. How can we redefine the tree so that the compiler enforces the constraint that at no point in the list, the types of four elements in a row spell the word BABA? Start with a simpler problem, like enforcing the constraint that AA never appears.
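
As a warm-up hint for the simpler problem (this is my own value-level sketch, not a solution at the type level): the "no AA" constraint is recognised by a two-state automaton that remembers whether the previous element was an A. The type-level version would encode the same transition matrix in the style of D and E above.

```haskell
data AB = A | B deriving (Eq, Show)

-- The state carried by 'go' records whether the previous element was an A.
noAA :: [AB] -> Bool
noAA = go False
  where
    go _    []       = True
    go True (A : _)  = False       -- two A's in a row: reject
    go _    (A : xs) = go True xs
    go _    (B : xs) = go False xs
```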

Apology
Sorry about the layout above. My literate Haskell->HTML program doesn't support tables yet, so there was a lot of manual HTML. This meant that I didn't write stuff out as fully as I could have and it may be a bit sketchy in places. I may have to switch to PDF for my next post. (Or use Wordpress, but I had trouble there too. Or I could use one of the Javascript TeX renderers, but I don't like the external dependency.)
