A Neighborhood of Infinity

Saturday, July 04, 2009

A Monad for Combinatorial Search with Heuristics

Haskell provides a great way to perform combinatorial searching with backtracking: the list monad. Do-notation provides a nice DSL that makes it easy to express the trying out of different possibilities. But the list monad only performs a simple-minded walk through all of the alternatives giving little opportunity to direct that walk. In particular, it's not easy to provide heuristics to say things like "try this alternative first but if it starts going badly consider this alternative too". This post contains a monad that gives a simple scheme to allow programmers to direct searches in this way.

First the Haskell administrativia...


> import Data.Char
> import Control.Monad
> import Data.Monoid
> import Data.List


When using the list monad, a list is interpreted as a list of candidates in a search. The join function for this monad takes a list of lists of candidates and flattens it into a list of candidates. This is all the list monad really does: you write code that generates new candidates from old, and the >>= function applies this code to all of the candidates it knows about and then flattens this back out to a single list of candidates. Importantly it does this in a lazy way so that you only need look at candidates as they are generated.

This new monad will keep slightly more information: each candidate will have a 'penalty' value attached to it saying how attractive a candidate it is. Candidates with score 0 will be tried first, and those with score n will be tried after those with lower scores. We can represent a collection of candidates and their scores simply as a list of lists. The first list in the list will have those with score 0, the second will have those with score 1 and so on. We'll call these lists penalty lists and the positions within those lists slots.

Here's the definiton of the penalty list type:


> data PList a = P { unO :: [[a]] } deriving (Show,Eq)


It's a functor in a straightforward way:


> instance Functor PList where
> fmap f (P xs) = P (fmap (fmap f) xs)


The rule we'll adopt is that if you're trying a combination of two candidates then the penalty associated with the combination is the sum of the penalties of the individual objects. To implement this we need an alternative version of the join operation. If we have a penalty list of penalty lists and we have an element in the mth slot in the nth penalty sublist then we want it to end up in the (m+n)th slot in the final penalty list. Within a slot we can just order the elements just like in the original list monad.


> headm :: Monoid m => [m] -> m
> headm (a:as) = a
> headm [] = mempty

> tailm :: Monoid m => [m] -> [m]
> tailm (a:as) = as
> tailm [] = []

> zipm :: Monoid m => [[m]] -> [m]
> zipm ms | all null ms = []
> zipm ms = let
> heads = map headm ms
> tails = map tailm ms
> h = mconcat heads
> t = zipm (filter (not . null) tails)
> in h : t

> instance Monad PList where
> return x = P [[x]]
> x >>= f = let P xs = (fmap (unO . f) x) in P (join xs) where
> join [] = []
> join (m:ms) = let
> part1 = zipm m
> part2 = join ms
> in headm part1 : zipm [tailm part1,part2]


Explaining how join is implemented would take many words so I hope this picture of the computation of an example will do instead. I used the Monoid class simply to avoid directly referring to one level of nesting of brackets. It is intended to be a proper implementation of a Monad satisfying the three monad laws but I haven't proved this and it's possible that it occasionally leaves trailing empty lists around - which have no impact on search results.



> instance MonadPlus PList where
> mzero = P []
> mplus (P xs) (P ys) = P (zipm [xs,ys])


We can use this much like the list monad. First it will search for possibilities with zero penalty. When these are exhausted it'll backtrack to the last place where it can start finding possibilities with penalty 1. Then it'll try penalty 2 and so on. Importantly it manages to do this lazily so that we don't explore penalty n+1 until we've finished penalty n.

So now we can start using it. We'll hunt for Pythagorean triples by simply hunting through all of the triples of integers. But we'll try to find solutions where the sum of the integers is as small as possible. So as list of candidate integers we use P [[1],[2],[3]...]. In other words, the integer n has penalty n-1. Here's the code:


> ex1 = do
> x <- P $ map (\x -> [x]) [1..]
> y <- P $ map (\y -> [y]) [1..]
> z <- P $ map (\z -> [z]) [1..]
> guard $ x*x+y*y==z*z
> return $ (x,y,z)


Of course we wouldn't really search for Pythagorean triples this way. This is just an illustration of how to use the code. But note, crucially, that the equivalent code using the regular list monad would give us back no solutions. It'd start with x=1 and y=1 and then go off to infinity finding candidates for z. So as a side effect the penalty list allows us to tame some infinite searches.

Anyway, that was a simple numerical example. But this monad can be used with much more complex kinds of search. In fact it almost serves as a drop-in replacement for the list monad. This is a really nice example of the way separation of concerns is easy in Haskell. The task of generating candidates for search can easily be separated from the task of selecting from those candidates, even though the operations are highly interleaved during execution.

So here's a more complex example: writing a parser that can tolerate errors without running into combinatorial explosion. The idea is that we associate a penalty with each error. The penalty will make the parser run on the assumption of no errors until it can no longer parse, and then it'll backtrack on the assumption of one error until that assumption is no longer tenable and so on. We can liberally sprinkle 'erroneous' parsings throughout our code confident that these branches will only be taken in the event that an error-free parsing can't be found.

Firstly, here's a penalty list that we can use to introduce a penalty of just 1.


> penalty :: PList ()
> penalty = P [[],[()]]


If we stick that in the code path then anything following acquires a penalty of 1.

Now we can write a parser. We can implement Hutton's parser in his monad parsers paper with very little modification. We simply replace the usual list with the penalty list and do away with the +++ operator to allow it to be a bit more liberal about backtracking. Here's the parser type:


> newtype Parser a = Parser (String -> PList (a,String))


We could have parameterised that with the underlying monad so that we could have parsers with a choice of search strategy.

The rest is a lot like in Hutton's paper:


> parse (Parser f) x = f x

> instance Monad Parser where
> return a = Parser (\cs -> P [[(a,cs)]])
> p >>= f = Parser (\cs -> do
> (a,cs') <- parse p cs
> parse (f a) cs')

> instance MonadPlus Parser where
> mzero = Parser (\cs -> mzero)
> p `mplus` q = Parser (\cs -> parse p cs `mplus` parse q cs)

> item :: Parser Char
> item = Parser (\cs -> case cs of
> "" -> mzero
> (c:cs) -> P [[(c,cs)]])

> sat :: (Char -> Bool) -> Parser Char
> sat p = do
> c <- item
> if p c then return c else mzero

> char :: Char -> Parser Char
> char c = sat (c ==)


Now for a simple parsing problem. We'll parse simple arithmetical expressions a lot like in Hutton's paper. But I'm going to tolerate two kinds of error:
1. The shift key doesn't always work so occasionally a shifted or unshifted version of a character may appear and
2. parentheses are occasionally left out by the clumsy user.

Now we can code up a simple grammar for this. First the mapping between shifted and unshifted characters (on a Mac US keyboard):


> lowers = "1234567890-=/"
> uppers = "!@#$%^&*()_+?"
> lower x = lookup x (zip uppers lowers)
> upper x = lookup x (zip lowers uppers)

> upperChar x = case upper x of
> Nothing -> mzero
> Just y -> char y >> return x

> lowerChar x = case lower x of
> Nothing -> mzero
> Just y -> char y >> return x


A version of penalty wrapped for the parser monad:


> avoid :: Parser ()
> avoid = Parser $ \cs -> do
> penalty
> return ((),cs)


Reading keys on the assumption that the shift key may have failed:


> keyChar x = char x `mplus` (avoid >> upperChar x) `mplus` (avoid >> lowerChar x)

> digit = do
> x <- foldl mplus mzero (map keyChar "0123456789")
> return (fromIntegral (ord x-ord '0'))

> number1 :: Integer -> Parser Integer
> number1 m = return m `mplus` do
> n <- digit
> number1 (10*m+n)

> number :: Parser Integer
> number = do
> n <- digit
> number1 n

> chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
> chainl p op a = (p `chainl1` op) `mplus` return a
> chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
> p `chainl1` op = do {a <- p; rest a}
> where
> rest a = (do
> f <- op
> b <- p
> rest (f a b)) `mplus` return a


Optional parentheses:


> shouldHave c = keyChar c `mplus` (avoid >> return c)


And the main part of the expression grammar:


> expr = term `chainl1` addop
> term = monomial `chainl1` mulop
> monomial = factor `chainl1` powop
> factor = number `mplus` do {shouldHave '('; n <- expr; shouldHave ')'; return n}
> powop = keyChar '^' >> return (^)
> addop = do {keyChar '+'; return (+)} `mplus` do {keyChar '-'; return (-)}
> mulop = do {keyChar '*'; return (*)} `mplus` do {keyChar '/'; return (div)}


Match the end of a string:


> end :: Parser ()
> end = Parser $ \cs ->
> if null cs then P [[((),"")]] else mzero


We can test it out with:


> completeExpr = do
> n <- expr
> end
> return n

> ex2 = parse completeExpr "2^(1+3"


When we run this we get no error-free parsing but we do get 3 readings with one error. One comes from reading the '(' as 9, one comes from inserting the missing ')' at the end and one comes from inserting ')' after '1'. Note that even for complex expressions we'll quickly find a 1- or 2-error parsing. For the regular list monad we might never get a parsing because there are an infinite number of ways of inserting parentheses.

Anyway, that was just a toy parsing problem. But a more complex application comes to my mind. Some written languages are tricky to parse because their orthography doesn't fully capture the phonetics of the original language, because there are few or no indicators of sentence or even word breaks, and because they have numerous optional orthographic and grammatical rules and use a script whose individual characters are occasionally hard to reliably identify. In such a situation it's good to have a parser driven by heuristics about what is likely to be intended and the penalty list monad might serve well. Here's an example of such a language.

Update: I forgot to add some connections to previous monads I've talked about:

  1. PList is a variation of the convolution monad I described here. It deals with the "wrong category" aspect so it is a true Haskell monad. Penalty lists form some kind of dual to the convolution comonad.
  2. It has much in common with this monad. That monad doesn't do anything smart about ordering searches but it does have the neat ability to 'fuse' different branches of a search so that different ways to arrive at the same place don't add to the combinatorial explosion. It's good for searches where you want to know what the minimum penalty is to get somewhere, but don't care what the best path actually is.

Also, in response to a comment on #haskell I've made the join example more complex so it's easier to generalise from it.

Saturday, June 20, 2009

Automata and the A-D-E classification.

Introduction


The A-D-E classification is a strange ubiquitous pattern that appears in many branches of mathematics. Typically it appears when you try to classify certain types of mathematical construction. If the A-D-E classification applies then you end up with two infinite sequences of cryptically named objects (A1,A2,A3,...) and (D1,D2,D3,...) as well as three leftover objects called E6, E7 and E8. Unfortunately, most of these objects and their classifications are tricky to define using only elementary mathematics. However, there is one type of object that is classified in this way that can be given a relatively straightforward computational description involving a little linear algebra and assuming you know a tiny bit about automata.

But first: why care about the A-D-E classification? Well I tried to say a little bit about how symmetries relate to nature a while back. Certain types of possible symmetry of particle physics can be classified the A-D-E way. The symmetry group corresponding to E8 is the now famous exceptional group E8. I won't be able to get to an explanation of how groups are involved. But at least I'll be able to give a hint about the bigger picture that E8 is part of.

Non-deterministic Finite State Automata


Here's a diagram representing a very simple non-deterministic finite state automaton (NDA):

It can be in one of two states. When in state A it can transition to state B and in state B it can only transition back to state B, but it can do so in two different ways.

Vector Automata


Now I'll introduce a more general kind of automaton: a vector automaton (VA). (I made that term up, it's not meant to correspond with anyone else's terminology.) Every vector automaton is built from an NDA. But each state corresponds to a finite dimensional vector space and each transition corresponds to a linear function mapping from the vector space of the source state to the vector space of the destination state. We could turn the above example into a VA by assigning a 1D vector space VA to A, a 2D vector space to VB and defining linear functions:

f : (x) -> (x,0)
g : (x,y) -> (-y,x)
h : (x,y) -> (y,-x)

A VA is just like an NDA in that it transitions from state to state according to the given transitions. But additionally it keeps track of a vector in the vector space corresponding to the current state. Each time it makes a transition the linear function corresponding to that function is applied to the vector. So in the example above, the NDA might start in state A with a scalar value x (ie. a 1D vector). When it makes its first transition its vector becomes the 2D vector (x,0) and after that each transition rotates the vector through 90 degrees clockwise or anticlockwise.

There's a lot of freedom in defining a VA given its underlying NDA. For each node you can pick any vector space you like of any finite dimension, and for each transition you can pick any linear function you like mapping between the source and target vector spaces.

Let's make this a little more formal. An NDA is a finite set of states combined with a finite set of transitions. Each transition has a source and destination, each of which is a state. That's it. You're allowed any finite number of transitions between states and a transition can have the same state as source and destination.

A VA is an NDA combined with a finite dimensional vector space attached to each state and a linear function for each transition such that the function maps from the vector space of the of the source to the vector space of the target.

Now consider this really simple NDA:

When we build a VA from this NDA, for convenience I'll call the vector spaces corresponding to A and B, A and B. And I'll call the linear function from A to B, f. How many VAs can be made from this VA? Clearly an infinite number. But a lot of them are very similar.

Suppose we assume A and B are 2-dimensional and write their elements as pairs (x,y). Suppose f(1,0)=u and f(0,1)=v and that u is not a multiple of v and both are non-zero. Then we can use u and v as a basis for B. If we write f as f' in this new basis we get f'(1,0)=(1,0) and f'(0,1)=(0,1). So by relabelling the basis of B we have actually revealed that an infinite number of choices for f reduce to the same thing apart from a change of basis in B.

Up to change of basis in A and B we find there are only three possibilities:
(1) u and v are distinct, non-zero, and not multiples of each other.
(2) u is non-zero but v is zero
(3) both u and v are zero

We started with an infinite number of possibilities for our particular choice of dimensions and ended up with just 3. We'll say that two VAs are equivalent if we can get one from the other by changing basis like this.

On the other hand, consider this NDA:

Let's choose A, B and C to be 1-dimensional vector spaces and define f, g and h as:

f(x) = x
g(x) = x
h(x) = λx

where λ is any real number. Then h(g(f(x)))=λx. We can choose any λ we like so we have an infinity of possibilities. No amount of basis change is going to change this fact. This is different from the case above because now we're comparing x and h(g(f(x))) which lie in the same vector space. So when our NDAs have loops in them, the space of possible VAs, for most choices of dimension, is infinite.

The sum of two VAs


If we have two VAs corresponding to the same NDA we can combine them together to make a single machine. The state is given by a state in the shared underlying NDA, but we now have a pair of vectors. Each time there is a transition we apply the pair of transforms to transform the two vector simultaneously. But we can encode a pair of vectors as a single vector simply by concatenating together the vector components in some basis. So this machine is just another VA. The dimension of the vector space for each state is simply the sum of the dimensions of the vector spaces in the original VAs. So, given two VAs we can sum them to get a third.

Given a VA for an NDA it may or may not be equivalent to the sum of two simpler VAs. If it isn't, it's said to be irreducible. If it is, then we can ask the same question about the simpler VAs. In this way, every VA is the sum of irreducible VAs.

Main Theorem


Now comes the main result I want to give:

If the graph underlying the NDA is one of the following list then (and only then) there is a finite number of inequivalent irreducible VAs for the NDA. All other VAs for that NDA are simply finite sums of machines from this finite set. Here's the list:

That's it! Weird huh? (Note that diagram Xn has n nodes.)

Strangely, those same diagrams (and a few more) appear (in a quite different way) when classifying the possible symmetries of fundamental particles. The symmetries are given the same names as these diagrams. And in just the same way we get those 'sporadic' symmetries leading up to E8. Those same diagrams also arise in Catastrophe Theory.

I'd like to sketch the proof, at least in one direction, but I seem to have run out of time. Another day perhaps. But note that none of these diagrams have loops for the reasons I gave above, and I've already shown that A2 gives only a finite number of choices for a certain choice of dimension.

Meanwhile, I should give the proper mathematician names for the things above. The diagrams listed above are examples of Dynkin diagrams. A non-deterministic automaton as described above is known as a quiver. A vector automaton is normally known as a representation of a quiver. And the theorem above is half of Gabriel's theorem.

Sunday, June 07, 2009

Hashing Molecules

Twenty or so years ago I worked for a pharmaceutical company that had a large database of compounds. That got me thinking about the problem of how to perform lookups based on molecular structures. If you can find a bunch of numbers that encapsulate the molecular structure then you can use them as database keys. But you need to ensure that the same molecule entered in two different ways gets mapped to the same numbers, and you'd like different molecules, such as stereoisomers, or even enantiomers to get mapped to different values.

That got me thinking and around then I came up with an idea for hashing molecules inspired by some of the mathematics I'd been doing not long before. I never got around to coding it up but twenty years later it dawned on me that I could easily do it using a similar technique to what I used for untangling tangles and translating trace diagrams. Anyway, as I've already given examples of how to translate diagrams to monadic expressions I'm going to skimp on the details in this post and just talk about things specific to molecular structures. For this post to make sense you have to understand those earlier posts.

So first comes the usual Haskell preamble:


> {-# LANGUAGE MultiParamTypeClasses,FlexibleInstances,FunctionalDependencies,GeneralizedNewtypeDeriving #-}

> module Main where

> import Data.HashTable
> import Data.List as L
> import Data.Int
> import Data.Array
> import GHC.Arr
> import qualified Data.Map as M
> import Control.Monad
> infixl 5 .+
> infixl 6 .*


I'm going to be using the vector space monad with a 4-dimensional vector space. This type labels the dimensions:


> data I = A | B | C | D deriving (Eq,Ord,Show,Ix,Enum)


n-valent atoms will be represented by functions that consume n-tuples. We'll start with a simple hash:


> c' (a,b,c,d) = hashString ("C" ++ show (a,b,c,d))


My prime motivation for using Haskell for this problem is that the code was super-easy to write and investigate. But it's inefficient. I'll talk about how to remedy this properly at the end. But for now I'm going to memoise many functions as arrays using a Memoisable type class with a memo method. So I'll be using c instead of c':

> c = memo $ \x -> symmetrise a4 c' x .* return ()

Note the use of the symmetrise function. The idea is that the 4 bonds coming out from (singly-bonded) carbon can be thought of as lying at the corners of a tetrahedron.

They have tetrahedral symmetry. So I'd like my hash to also have this symmetry so that, for example, c (i,j,k,l) == c (j,i,l,k). We can enforce this by summing over all 24 permutations of the arguments compatible with this symmetry, also known as A4. So a4 lists all of these permutations:


> a4 (i,j,k,l) = [
> (i,j,k,l),
> (j,i,l,k),
> (k,l,i,j),
> (l,k,j,i),

> (i,k,l,j),
> (i,l,j,k),

> (k,j,l,i),
> (l,j,i,k),

> (j,l,k,i),
> (l,i,k,j),

> (j,k,i,l),
> (k,i,j,l)
> ]


And symmetrise performs the summation:


> symmetrise group f x = sum (map f (group x))


We can define other molecules too. Hydrogen is easy:


> h = memo $ \a -> hashString ("H" ++ show a) .* return ()


Oxygen has S2 symmetry:


> s2 (i,j) = [ (i,j), (j,i) ]
> o' (a,b) = hashString ("O" ++ show (a,b))
> o = memo $ \x -> symmetrise s2 o' x .* return ()


You can think of a carbon atom and a hydrogen atom, say, as a pair of arrays cijkl and hi, and bonding them together as summation over a shared index. So methane would be the sum over i,j,k,l = A..D of cijklhihjhkhl. To this end, define a bond as:


> bond :: V Int32 I
> bond = return A .+ return B .+ return C .+ return D


We can make H2 like so:


> h2 = simp $ do
> i <- bond
> h ! i
> h ! i


i <- bond makes i a bond which we then attach to two hydrogen atoms. Evaluating h2 will give us the hash for hydrogen gas.

Rather than dive straight into CH4 we can construct some useful building blocks. Hydrogen with a bond already attached:


> h_ :: V Int32 I
> h_ = do
> m <- bond
> h ! m
> return m


I'm using a trailing underscore _ to indicate a free bond.

ch2_ accepts one bond and returns another. Once memoised it is, in effect, just a 4x4 matrix and can be used to rapidly build chains.


> ch2_ = memo $ \i -> simp $ do
> k <- bond
> l <- bond
> m <- bond
> h ! l
> h ! m
> c ! (i,k,l,m)
> return k


So here's code to make an alkyl chain with a free bind at the end.


> alkyl_ 0 = h_

> alkyl_ n = simp $ do
> i <- alkyl_ (n-1)
> ch2_ ! i


We can now make alkanes by attaching a hydrogen atom at the end and make methane as a special case:


> alkane n = simp $ alkyl_ n >>= (h ! )
> methane = alkane 1


Now a hydroxyl group:


> oh = memo $ \i -> simp $ do
> j <- bond
> h ! j
> o ! (i,j)


And you can have a drink on me:


> ethanol = simp $ do
> i <- alkyl_ 2
> oh ! i


Carbon double bonds turn out to be straightforward. We can simply use a pair of bonds:


> doubleBond :: V Int32 (I,I)
> doubleBond = simp $ do
> i <- bond
> j <- bond
> return (i,j)


Carbon-carbon double bonds tend not to twist and this is reflected in the hash. We'd have to apply symmetrise if we wanted twistable bonds.

We can make a pre-canned doubly bonded carbon atom pair:



> c_c = memo $ \(i,j,k,l) -> simp $ do
> (m,n) <- doubleBond
> c ! (i,j,m,n)
> c ! (m,n,k,l)


So we can build ethene like this


> ethene = simp $ do
> i <- h_
> j <- h_
> k <- h_
> l <- h_
> c_c ! (i,j,k,l)


Here's a methyl group with a free bond


> ch3_ = simp $ do
> l <- h_
> ch2_ ! l


So we can build a bunch more compounds


> propene = simp $ do
> j <- ch3_
> i <- h_
> k <- h_
> l <- h_
> c_c ! (i,j,k,l)

> cisbut2ene = simp $ do
> i <- ch3_
> j <- h_
> k <- ch3_
> l <- h_
> c_c ! (i,j,k,l)

> transbut2ene = simp $ do
> i <- h_
> j <- ch3_
> k <- ch3_
> l <- h_
> c_c ! (i,j,k,l)

> cisbut2ene' = simp $ do
> i <- h_
> j <- ch3_
> k <- h_
> l <- ch3_
> c_c ! (i,j,k,l)

> _2methylpropene = simp $ do
> i <- ch3_
> j <- ch3_
> k <- h_
> l <- h_
> c_c ! (i,j,k,l)

> _2methylpropene' = simp $ do
> i <- h_
> j <- h_
> k <- ch3_
> l <- ch3_
> c_c ! (i,j,k,l)


An interesting problem is building a benzene ring. Here's a first attempt with six free bonds:


> ring1 (p,q,r,s,t,u) = simp $ do
> i <- bond
> j <- bond
> k <- bond
> c_c ! (j,q,i,p)
> c_c ! (i,u,k,t)
> c_c ! (k,s,j,r)


The problem with that is that benzene rings are special. The electrons are 'delocalised' so that the ring has rotational symmetry. We need to sum over the two consistent ways to assign single and double bonds around the ring. For the more general case of interlocking benzene rings we must still sum over all consistent assignments.




> ring = memo $ \(p,q,r,s,t,u) -> simp $ ring1 (p,q,r,s,t,u) .+ ring1 (q,r,s,t,u,p)


And some more compunds:


> phenyl = memo $ \p -> simp $ do
> q <- h_
> r <- h_
> s <- h_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> benzene :: V Int32 ()
> benzene = simp $ do
> i <- bond
> phenyl ! i
> h ! i

> phenol :: V Int32 ()
> phenol = simp $ do
> i <- bond
> phenyl ! i
> oh ! i

> toluene :: V Int32 ()
> toluene = simp $ do
> i <- ch3_
> phenyl ! i

> toluene' :: V Int32 ()
> toluene' = simp $ do
> p <- h_
> q <- h_
> r <- h_
> s <- ch3_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> toluene'' :: V Int32 ()
> toluene'' = simp $ do
> p <- h_
> q <- h_
> r <- ch3_
> s <- h_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> _1_2_dimethylbenzene = simp $ do
> p <- ch3_
> q <- ch3_
> r <- h_
> s <- h_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> _1_3_dimethylbenzene = simp $ do
> p <- ch3_
> q <- h_
> r <- ch3_
> s <- h_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> _1_4_dimethylbenzene = simp $ do
> p <- ch3_
> q <- h_
> r <- h_
> s <- ch3_
> t <- h_
> u <- h_
> ring ! (p,q,r,s,t,u)

> _1_5_dimethylbenzene = simp $ do
> p <- ch3_
> q <- h_
> r <- h_
> s <- h_
> t <- ch3_
> u <- h_
> ring ! (p,q,r,s,t,u)


As we might hope, the three different ways to define toluene give the same result. We also discover that 1,3- and 1,5-dimethylbenzene are the same compound (or at least probably are, this isn't a perfect hash).


> main = do
> print $ "toluene = " ++ show toluene
> print $ "toluene' = " ++ show toluene'
> print $ "toluene'' = " ++ show toluene''
> print $ "_1_2_dimethylbenzene = " ++ show _1_2_dimethylbenzene
> print $ "_1_3_dimethylbenzene = " ++ show _1_3_dimethylbenzene
> print $ "_1_4_dimethylbenzene = " ++ show _1_4_dimethylbenzene
> print $ "_1_5_dimethylbenzene = " ++ show _1_5_dimethylbenzene


Now I need to say something about performance. The above code is naive and performs many unnecessary summations. For example, hashing a long chain should only take time linear in its length. But using the above code indiscriminately could give you exponential time. A good implementation might take a divide and conquer approach: slice the molecule in half through a bunch of bonds, compute partial hashes for each half and then sew the halves together in time exponential in the number of bonds you sliced through. For the types of molecules I've seen in real pharmaceutical databases (say) this is actually pretty cheap if you're smart about the slicing. The hashes in the above code could probably be computed many thousands of times faster. As it is, you'll probably need to compile the above with optimisation.

I'm willing to bet that with small changes, and with suitable choice of matrices over the reals, we can get invariants of molecules that predict physical properties. These calulations are reminiscent of algorithms for various types of counting algorithm so at the very least they probably compute things that are meaningful from a statistical mechanics perspective.

Incidentally, this approach to stitching together atoms was inspired by an old paper by R. C. Penner on fatgraphs - nothing to do with chemistry. A few days ago he put a paper online about an application to modelling proteins.

Update: Since writing this I've found a name for what I'm doing. I'm converting a chemical structure diagram into a tensor network. There seems to be lots of literature on how to evaluate these efficiently. In effect, everything I've been doing in this blog in terms of converting diagrams to code is an example of evaluating a tensor network.




Now comes the memoisation class:


> class Memoisable ix where
> memo :: (ix -> a) -> Array ix a

> instance Memoisable I where
> memo f = array (A,D) [(i,f i) | i <- [A .. D]]

> instance Memoisable (I,I) where
> memo f = array ((A,A),(D,D)) [(i,f i) |
> p <- [A .. D],
> q <- [A .. D],
> let i = (p,q) ]

> instance Memoisable (I,I,I,I) where
> memo f = array ((A,A,A,A),(D,D,D,D)) [(i,f i) |
> p <- [A .. D],
> q <- [A .. D],
> r <- [A .. D],
> s <- [A .. D],
> let i = (p,q,r,s) ]

> instance Memoisable (I,I,I,I,I,I) where
> memo f = array ((A,A,A,A,A,A),(D,D,D,D,D,D)) [(i,f i) |
> p <- [A .. D],
> q <- [A .. D],
> r <- [A .. D],
> s <- [A .. D],
> t <- [A .. D],
> u <- [A .. D],
> let i = (p,q,r,s,t,u) ]


Missing instances from Data.Array:


> instance (Ix a1, Ix a2, Ix a3, Ix a4, Ix a5, Ix a6) => Ix (a1,a2,a3,a4,a5,a6) where
> range ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) =
> [(i1,i2,i3,i4,i5,i6) | i1 <- range (l1,u1),
> i2 <- range (l2,u2),
> i3 <- range (l3,u3),
> i4 <- range (l4,u4),
> i5 <- range (l5,u5),
> i6 <- range (l6,u6)]

> unsafeIndex ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =
> unsafeIndex (l6,u6) i6 + unsafeRangeSize (l6,u6) * (
> unsafeIndex (l5,u5) i5 + unsafeRangeSize (l5,u5) * (
> unsafeIndex (l4,u4) i4 + unsafeRangeSize (l4,u4) * (
> unsafeIndex (l3,u3) i3 + unsafeRangeSize (l3,u3) * (
> unsafeIndex (l2,u2) i2 + unsafeRangeSize (l2,u2) * (
> unsafeIndex (l1,u1) i1)))))

> inRange ((l1,l2,l3,l4,l5,l6),(u1,u2,u3,u4,u5,u6)) (i1,i2,i3,i4,i5,i6) =
> inRange (l1,u1) i1 && inRange (l2,u2) i2 &&
> inRange (l3,u3) i3 && inRange (l4,u4) i4 &&
> inRange (l5,u5) i5 && inRange (l6,u6) i6


And the same vector space monad I've used many times before. Strictly speaking, it's more like a semiring module monad as Int32 isn't a field.


> swap (x,y) = (y,x)

> class Num k => VectorSpace k v | v -> k where
> zero :: v
> (.+) :: v -> v -> v
> (.*) :: k -> v -> v
> (.-) :: v -> v -> v
> v1 .- v2 = v1 .+ ((-1).*v2)

> data V k a = V { unV :: [(k,a)] } deriving (Show)

> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x
> simp (V x) = V (reduce x)

> instance (Ord a,Num k) => Eq (V k a) where
> V x==V y = reduce x==reduce y

> instance (Ord a,Num k,Ord k) => Ord (V k a) where
> compare (V x) (V y) = compare (reduce x) (reduce y)

> instance Num k => Functor (V k) where
> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as

> instance Num k => Monad (V k) where
> return a = V [(1,a)]
> x >>= f = join (fmap f x)
> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x
> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as

> instance Num r => MonadPlus (V r) where
> mzero = V []
> mplus (V x) (V y) = V (x++y)

> instance (Num k,Ord a) => VectorSpace k (V k a) where
> zero = V []
> V x .+ V y = V (x ++ y)
> (.*) k = (>>= (\a -> V [(k,a)]))

> e = return :: Num k => a -> V k a
> coefficient b (V bs) = maybe 0 id (L.lookup b (map swap (reduce bs)))

Saturday, May 16, 2009

Trace Diagrams with Monads

Knot diagrams aren't the only kind of diagram that can be translated nicely into Haskell monad notation. Other types of diagram include Penrose Spin Networks, many kinds of Feynman Diagram, Penrose Tensor Notation, birdtracks and a type of closely related diagram I hadn't met before: Trace Diagrams.

I encourage readers to check out the Wikipedia page and associated papers on trace diagrams as they give a better tutorial than I could write. My aim here is to show how those diagrams can be translated directly into working code just like with knots.

As usual, this is literate Haskell so I need these lines:


> {-# LANGUAGE MultiParamTypeClasses,FunctionalDependencies,FlexibleInstances #-}

> module Main where

> import qualified Data.Map as M
> import Control.Monad
> infixl 5 .+
> infixl 6 .*


I'll reuse my vector space monad code from before and work in a 3D space with the axes labelled X, Y and Z.

> data Space = X | Y | Z deriving (Eq,Show,Ord)


We draw vectors as little boxes with connections emerging from them:


Now recall from my knot posts that we represent a diagram with m legs at the top and n legs at the bottom as an expression that takes an m-tuple as input and returns an n-tuple "in the monad" as output.

Vectors can be represented as elements of V Float Space, for example:

> u,v,w :: V Float Space
> u = return X .- return Y
> v = return X .+ 2.* return Y
> w = return Y .- return Z

I could have emphasised that there are zero inputs at the top by using type signature () -> V Float Space instead.

Given two vectors we can form their dot product. The dot product itself is represented by a little u-shaped curve:

So the dot product of v and w is drawn as:

(The i and j are just so you can see what corresponds to what in the code below. They're not really part of the diagram.)

We can implement the dot product as

> cup :: (Space,Space) -> V Float ()
> cup (i,j) = case (i,j) of
> (X,X) -> return ()
> (Y,Y) -> return ()
> (Z,Z) -> return ()
> otherwise -> 0 .* return ()

and compute an example using

> vdotw = do
> i <- v
> j <- w
> cup (i,j)

We hook up legs of the diagram using corresponding inputs and outputs in the code.

Now consider this diagram:

If we attach another vector to the free leg then we get the dot product. So this object is a thing that maps vectors to scalars. Ie. it's a dual vector. So dual vectors are represented by diagrams with a free leg at the top. We can redraw this diagram:
In other words, turning a vector v upside down turns it into a dual vector that takes w to the dot product of v and w. Here's some code for making the dual of v.

> dual :: V Float Space -> Space -> V Float ()
> dual v i = do
> j <- v
> cup (i,j)

We can also consider cross products. These take two vectors as input and output one. So we're looking at a diagram with two legs at the top and one at the bottom. We'll use a bold dot to represent one of these:

Here's the implementation:

> cross :: (Space,Space) -> V Float Space
> cross (X,Y) = return Z
> cross (Y,Z) = return X
> cross (Z,X) = return Y

> cross (Y,X) = (-1) .* return Z
> cross (Z,Y) = (-1) .* return X
> cross (X,Z) = (-1) .* return Y

> cross _ = mzero

We can form a triple product u.(v×w) like this:

We can then abstract out the triple product bit that looks like this:

Implementing it as:

> trident :: (Space,Space,Space) -> V Float ()
> trident (i,j,k) = do
> l <- cross (i,j)
> cup (l,k)

Remember that if u, v and w give the rows of a 3x3 matrix, then u.(v×w) is the determinant of that matrix.

We can also define a dot product for dual vectors that we can draw like this:

The code looks like this:

> cap :: () -> V Float (Space,Space)
> cap () = return (X,X) .+ return (Y,Y) .+ return (Z,Z)

We can now combine the two dot products in a diagram like this:

We can write that as:

> cupcap i = do
> (j,k) <- cap ()
> cup (i,j)
> return k

We'd hope that we could pull this diagram taut and get the identity linear map. If you try applying cupcap to X, Y and Z you'll see it has exactly the same effect as return, which does indeed represent the identity.

(If you allow me to digress, I'll point out that there's something really deep going on with this almost trivial looking identity. It represents the identity map in the sense that it copies the input i to the output k. Imagine we were dealing with the trivial monad, ie. the one that just wraps values. Then no matter how cup and cap were implemented it would be impossible for k to be a copy of i. If you follow the flow of information through that code then i disappears into cup and k is read from cap without it seeing i. If we read from top to bottom we can think of cap as emitting a pair of objects and of cup as absorbing two. There is no way that any information about i can be communicated to k. But in the vector space monad, k can depend on i. As I've mentioned a few times over the years, the universe is described by quantum mechanics which can be described using the vector space monad. Amazingly the above piece of code, or at least something like it, can be physically realised in terms of particles. It describes a process that is fundamentally quantum, and not classical. In fact, Coecke shows that this is a precursor to quantum teleportation in section 3c of this paper. You could also think in terms of information about i being sent back in time through the cap. That's the idea behind this paper on Effective Quantum Time Travel.)

Now we can make a fork by bending down the tines of the cross product:


> fork () = do
> (i,j) <- cap ()
> (k,l) <- cap ()
> m <- cross (j,k)
> return (i,l,m)


We can write matrices as boxes with a leg for input and a leg for output. Here's an example matrix:


> a :: Space -> V Float Space
> a X = 2 .* return X
> a Y = return Z
> a Z = (-1) .* return Y

It rotates by 90 degrees around the X axis and scales the X axis by a factor of two.

With the help of our two dot products we can turn a matrix upside-down:

The corresponding code is:

> b :: Space -> V Float Space
> b l = do
> (i,j) <- cap ()
> k <- a j
> cup (k,l)
> return i

Turning a matrix upside down gives its transpose. You'll find that matrix B rotates in the opposite direction to A but still scales by a factor of two.

Surprisingly, 3! times the determinant of a 3x3 matrix A can be represented by this diagram:

So we can write a determinant routine as follows:

> det a = do
> (i,j,k) <- fork ()
> i' <- a i
> j' <- a j
> k' <- a k
> (1/6.0) .* trident (i',j',k')

(Again I've labelled the diagram so you can easily see what corresponds where in the code.)

I could keep going, but at this point I'll just defer to Elisha Peterson's paper. I hope that I've given you enough clues to be able to translate his diagrams into Haskell code, in effect giving semantics for his syntax. As an exercise, try writing code to compute the adjugate of a 3x3 matrix.

And a reminder: none of the above is intended as production-worthy code for working with 3-vectors. It is intended purely as a way to give a practical realisation of trace diagrams allow people to experimentally investigate their properties and make testable conjectures.

And now comes the library code needed to make the above code work:


> swap (x,y) = (y,x)

> class Num k => VectorSpace k v | v -> k where
> zero :: v
> (.+) :: v -> v -> v
> (.*) :: k -> v -> v
> (.-) :: v -> v -> v
> v1 .- v2 = v1 .+ ((-1).*v2)

> data V k a = V { unV :: [(k,a)] }
> instance (Num k,Ord a,Show a) => Show (V k a) where
> show (V x) = show (reduce x)

> reduce x = filter ((/=0) . fst) $ fmap swap $ M.toList $ M.fromListWith (+) $ fmap swap $ x

> instance (Ord a,Num k) => Eq (V k a) where
> V x==V y = reduce x==reduce y

> instance (Ord a,Num k,Ord k) => Ord (V k a) where
> compare (V x) (V y) = compare (reduce x) (reduce y)

> instance Num k => Functor (V k) where
> fmap f (V as) = V $ map (\(k,a) -> (k,f a)) as

> instance Num k => Monad (V k) where
> return a = V [(1,a)]
> x >>= f = join (fmap f x)
> where join x = V $ concat $ fmap (uncurry scale) $ unV $ fmap unV x
> scale k1 as = map (\(k2,a) -> (k1*k2,a)) as

> instance Num r => MonadPlus (V r) where
> mzero = V []
> mplus (V x) (V y) = V (x++y)

> instance (Num k,Ord a) => VectorSpace k (V k a) where
> zero = V []
> V x .+ V y = V (x ++ y)
> (.*) k = (>>= (\a -> V [(k,a)]))

> e = return :: Num k => a -> V k a
> coefficient b (V bs) = maybe 0 id (lookup b (map swap (reduce bs)))

Sunday, May 03, 2009

The Three Projections of Doctor Futamura

Introduction


The Three Projections of Futamura are a sequence of applications of a programming technique called 'partial evaluation' or 'specialisation', each one more mind-bending than the previous one. But it shouldn't be programmers who have all the fun. So I'm going to try to explain the three projections in a way that non-programmers can maybe understand too. But whether you're a programmer or not, this kind of self-referential reasoning can hurt your brain. At least it hurts mine. But it's a good pain, right?

So rather than talk about computer programs, I'll talk about machines of the mechanical variety. A bit like computer programs, these machines will have some kind of slot for inputting stuff, and some kind of slot where output will come out. But unlike computer programs, I'll be able to draw pictures of them to show what I'm talking about. I'll also assume these machines have access to an infinite supply of raw materials for manufacturing purposes and I'll also assume that these machines can replicate stuff - because in a computer we can freely make copies of data, until we run out of memory at least.

Minting coins


A really simple example of a machine is one that has a slot for inputting blanks, and outputs newly minted coins:

That's a dedicated $1 manufacturing machine. We could imagine that internally it stamps the appropriate design onto the blank and spits out the result.

It'd be more interesting if we could make a machine with another input slot that allowed us to input the description of the coin. By providing different inputs we could mint a variety of different coins with one machine. I'm going to adopt the convention that when we want to input a description we input a picture of the result we want. I'll draw pictures as rectangles with the subject inside them. Here's a general purpose machine manufacturing pound coins:

The same machine could make dollars, zlotys or yen. You could imagine this machine works by taking the description and then milling the coin CNC style. We call such a machine an interpreter. It interprets the instructions and produces its result.

The interpreter has a great advantage over the dedicated dollar mint. You make make any kind of coin. But it's going to run a lot slower. The dedicated minter can just stamp a coin in one go. The interpreter can't do this because every input might be different. It has to custom mill each coin individually. Is there a way to get the benefits of both types of machine? We could do this: take the coin description and instead of milling the coin directly we mill negative reliefs for both sides of the coin. We then build a new dedicated minting machine that uses these negatives to stamp out the coin. In other words we could make a machine that takes as input a coin description and outputs a dedicated machine to make that type of coin. This kind of machine making machine is called a compiler. It takes a set of instructions, but instead of executing them one by one, it makes a dedicated machine to perform them. Here's one in action:


So here are the two important concepts so far:

Interpreters: these take descriptions or instructions and use them to make the thing described.
Compilers: these take descriptions or instructions and use them to make a machine dedicated to making the thing described. The process of making such a machine from a set of instructions is known as compiling.

The Projections of Doctor Futamura help make clear the relationship between these kinds of things.

Specialisation


We need one more important concept: the specialiser. Suppose we have a machine that has two inputs slots, A and B. But now suppose that when we use the machine we find that we vary the kind of thing we put into slot B, but always end up putting the same thing into slot A. If we know that slot A will always get the same input then we could streamline the machine using our knowledge of the properties of A. This is similar to the minting situation - if we know we're always going to make $1 coins then we can dedicate our machine to that purpose. In fact, if we know that we're always going to input the same thing into slot A we don't even need slot A any more. We could just stick an A inside the machine and whenever the user inputs something to slot B, the machine would then replicate the A and then use it just as if it had been input.

In summary, given a machine with two slots A and B, and given some input suitable for slot A, we could redesign it as a machine with just a B slot that automatically, internally self-feeds the chosen item to A. But we can often do better than this. We don't need to self-feed stuff to slot A. We might be able to redesign the way the machine works based on the assumption that we always get the same stuff going into slot A. For example, in the minting example a dedicate $1 minter was more specialised than just a general purpose minter that interpreted the instructions for making a $1 coin. This process of customising a machine for a particular input to slot A is called specialisation or partial evaluation.

Now imagine we have a machine for automatically specialising designs for machines. It might have two slots: one for inputting a description for a two slot machine with slots A and B, and one for inputting stuff suitable for slot A. It would then print out a description for a customised machine with just a slot B. We could call it a specialisation machine. Here is one at work:

It's converting a description of a two input machine into a description of a one input machine.

The First Projection


The process of specialisation is similar to what I was talking about with dedicated minting machines. Rather than just have a similarity we can completely formalise this. Note that the interpreter above takes two inputs. So the design for an interpreter could be fed into the first input of a specialiser. Now we feed a description the coin we want into slot B. The specialiser whirrs away and eventually outputs a description of a machine that is an interpreter that is dedicated to making that one particular coin. The result will describe a machine with only one input suitable for blanks. In other words, we can use a specialiser as a compiler. This is the first of Doctor Futamura's Projections. Here's a picture of the process at work:

What this shows is that you don't need to make compilers. You can make specialisers instead. This is actually a very practical thing to do in the computing world. For example there are commercial products (I'm not connected with that product in any way) that can specialise code to run on a specific architecture like CUDA. It's entirely practical to convert an interpreter to a compiler with such a tool. By writing a specialiser, the purveyors of such tools allow third parties to develop their own compilers and so this is more useful than just writing a dedicated compiler.

The Second Projection


Time to kick it up a notch. The first input to the specialiser is a description of a two input machine. But the specialiser is itself a two input machine. Are you thinking what I'm thinking yet? We could stuff a description of a specialiser into the specialiser's own first input! In the first projection we provided an interpreter as input to the specialiser. If we know we're always going to want to use the same interpreter then we could streamline the specialiser to work specifically with this input. So we can specialise the specialiser to work with our interpreter like this:

But what is that machine whose description it has output? An interpreter takes as input a description of how to operate on some stuff, like turning blanks into coins. In effect, the output machine has the interpreter built into it. So it takes descriptions and outputs a machine for performing those instructions. In other words it's a compiler. If the specialiser is any good then the compiler will be good too. It won't just hide an interpreter in a box and feed it your description, it will make dedicated parts to ensure your compiler produces a fast dedicated machine. And that is Doctor Futamura's Second Projection.

The Third Projection


But we can go further. The specialiser can accept a description of a specialiser as its first input. That means we can specialise it specifically for this input. And to do that, we use a specialiser. In other words we can feed a descrption of a specialiser into both inputs of the specialiser! Here we go:

But what is the X machine that it outputs? In the second projection we pass in an interpreter as the second argument and get back a compiler. So the third projection gives us a dedicated machine for this task. The X machine accepts the description of an interpreter as input and outputs the description of a compiler. So the X machine is a dedicated interpreter-to-compiler converter. And that is the Third Projection of Doctor Futamura.

If we have a specialiser we never need to make a compiler again. We need only design interpreters that we can automatically convert to compilers. In general it's easier to write interpreters than compilers and so in principle this makes life easier for programmers. It also allows us to compartmentalise the building of compilers. We can separate the interpreter bit from the bit that fashions specific parts for a task. The specialiser does the latter so our would-be compiler writer can concentrate on the former. But who would have guessed that passing a specialiser to itself twice would give us something so useful?

Summary


So here are the projections:

  1. Compiling specific programs to dedicated machines.
  2. Making a dedicated machine for compilation from an interpreter.
  3. Making a machine dedicated to the task of converting interpreters into compilers.

There are lots of variations we can play with. I've just talked about descriptions of things without saying much about what those descriptions look like. In practice there are lots of different 'languages' we can use to express our descriptions. So variations on these projections can generate descriptions in different languages, possibly converting between them. We might also have lots of different specialisers that are themselves optimised for specific types of specialisation. The Futamura projections give interesting ways to combine these. And there are also variations for generating dedicating machines for other tasks related to compiling - like parsing the descriptions we might feed in as input.

If you want to read more on this subject there's a whole book online with example code. They're not easy things to design.

I think that specialisation is a killer feature that I'd like to see more of. Present day compilers (and here I'm talking about computers, not machines in general) are hard-coded black boxes for the task of compilation. They're not very good at allowing you to get in there and tweak the way compilation occurs - for example if you want to generate code according to a strategy you know. Specialisation is a nice alternative to simply bolting an API onto a compiler. It would make it easy for anyone to write optimising and optimised compilers for their own languages and combine such compilers with interpreters for interactive instead of offline compilation.

I learnt about this stuff, as well as lots of other stuff in my blog, from the excellent Vicious Circles. The theory is closely related to the theory of writing quines that I used for my three language quine.

And if you keep your ears to the ground you can hear rumours of a fabled fourth projection...

Saturday, April 04, 2009

Faster than a speeding photon

How to outrun a photon


I thought it would be fun to try to give a readable account of Unruh effect. It's a surprising phenomenon, and there isn't universal agreement over what exactly the theory predicts, let alone whether the effect has ever been observed. It has important implications for physics and philosophy and may even give a way to test some aspects of quantum gravity in the lab.

One way to start the story is consideration of this problem: if a photon is speeding towards you, can you outrun it? Let's simplify things a bit so that we're considering motion in one dimension.

If we're confined to one dimension, we can't dodge the photon, we can only hope to remain ahead of it. As the only things that can travel at the speed of light, c, are massless things like photons, it seems that there is no hope for a massive thing like a person in a spaceship to avoid it. The photon will always be faster than you, and so it'll catch you.

But in theory you can outrun a photon! Do you see the flaw in the above reasoning that made it seem impossible?

The best way to make things clear is to draw a diagram. We'll plot some graphs of position vs. time for some photons and spaceships. We'll have time going up the vertical axis and position along the horizontal axis. Here's an example:

I've chosen units so that one second on the vertical axis is drawn the same size as one light-second on the horizontal axis. The net result is that photons always travel at 45 degree angles to the axes. Massive objects, that travel slower than light, are confined to travel on courses that have angles of smaller than 45 degrees with respect to the vertical axis. The path of the photon is the diagonal black line and the path of a spaceship is in red. It starts to the right of the photon but as we move up the time axis the photon eventually catches up with it.

If the spaceship travels faster then it will follow an angle closer to 45 degrees. Here are a pair of paths corresponding to faster spaceships:

The faster the ship is, the further it gets before the photon catches up. But we're just putting off the inevitable. It seems that whatever we do, the photon will always catch up.

But there's a hidden assumption in the above. By drawing straight lines for the spaceship I was assuming it was travelling at a constant velocity. But there's no reason for that to be true. Here's a different path the spaceship could follow:


At no point does the red path of the spaceship meet the black path of the photon. And yet at no point does the red path reach 45 degrees to the vertical axis. In other words, the spaceship never travels at the speed of light, and yet the photon never catches up with it. Spaceships can outrun photons!

So what kind of path is that? It's actually a hyperbola and it corresponds to a spaceship accelerating at a constant rate. You might wonder how it can be constant acceleration when the speed of the spaceship never exceeds that of light. From an external observer's point of view, after a while it does look like the ship is travelling at a more or less constant velocity close to the speed of light. But from the point of view of someone on the spaceship it feels exactly like constant acceleration. So that is the path that would be taken by a spaceship with its thrusters firing at a constant rate.

An event horizon!


I chose that path so that the spaceship stays just in front of the photon. A photon that starts slightly to the right will eventually catch up with the ship. But photons starting further to the left of the ship will never reach it. This means that absolutely nothing starting to the left of the black photon path can ever catch the ship. That should sound familiar. It's exactly like a black hole. From the point of view of someone on the ship, the diagonal black line is exactly like the event horizon of a black hole. Nothing to the left of it can ever be seen by observers on the ship.

What does it look like if an observer in the ship watches something that crosses the event horizon? Here's another diagram:


Again, the black diagonal line is the path of a photon, which we know is a bit like an event horizon. The blue line is the path of an object at rest. In effect, it's falling over our apparent event horizon. Of course the blue object doesn't see any unusual phenomenon on approaching the event horizon because there's nothing really there - it's only something seen by observers in the ship. The blue object emits a series of photons (shown in green) at equal intervals. As long as these photons are emitted before the event horizon they eventually catch up with the red spaceship. But notice how they arrive at more and more widely spaced intervals. A photon released exactly at the event horizon never reaches the ship. So the viewers on the ship see the spacing between the photons get longer and longer. They'll never see the blue object cross the event horizon they'll just see it getting closer and closer until eventually it appears to freeze. Again, this is just like a black hole event horizon.

The universe looks pretty weird from the point of view of a constantly accelerating observer. Half of it is simply missing behind an event horizon. But that's just the start of the weirdness. When we throw Quantum Mechanics into the mix something much weirder happens.

Matter from a vacuum


It's well known that physicists expect black holes to emit particles as Hawking radiation. But our accelerating observer sees something like a black hole, so we might expect them to see something like Hawking radiation. We also know that an observer at rest sees no event horizon. Which means that we might predict that accelerating observers see particles that observers at rest don't. Can we take such a prediction seriously?

Let's look a bit more closely at this. According to a popular view of quantum mechanics, the vacuum is teeming with vacuum fluctuations - ephemeral particle-antiparticle pairs that briefly come into existence and then annihilate each other. In the diagram below I've drawn some of these events:


As we follow up the time axis, pairs of (complementary coloured) particles come into existence and then annihilate each other. These events are so fleeting that they have no effect on our particle detectors and we see a vacuum. But note that I've drawn one of these events straddling our apparent event horizon. From the point of view of an accelerating observer this looks like a pair of particles coming into existence, but because of the argument I sketched above, they seem to freeze near the event horizon. In other words, to an accelerating observer these fleeting events are no longer fleeting, they look like real particles coming into existence and sticking around forever. Accelerating observers appear to see particles in a vacuum!

What I've described above is absolutely not a rigourous argument. But amazingly, when you use the machinery of quantum field theory you end up making exactly the same prediction: accelerating observers see particles. This is known as the Unruh effect. When you do this properly you can compute a bit more detail. It turns out that the energies of the particles are random with exactly the same distribution as black body radiation. In other words, the vacuum looks like it has a glow corresponding to a particular temperature that is proportional to the acceleration. But it's not a bright glow. You need to accelerate at about 1020 m/s2 before the temperature appears to be 1K. Building a thermometer that can survive such accelerations is no mean feat. So it looks like the Unruh effect is a curiosity that might never be observed in the lab.

But it has been suggested that the Unruh effect has already been observed. There aren't many things that can survive that kind of acceleration, but an electron can, and an electron can behave like a thermometer. Electrons in circular particle accelerators routinely undergo the kinds of accelerations we're talking about. They do so because they are driven by a magnetic field. Now an electron has spin, so you can think of it as a bit like a little electric current running round in a loop. That means an electron is like a little dipole electromagnet. Magnets in magnetic fields tend to want to line up along the field - that's how a compass works. So electrons that spend long enough in a particle accelerator, eg. those in a storage ring, should eventually line up with the field. Lining up like this is known as polarisation, and in this particular case it's known as he Sokolov Ternov effect. But when we look at electrons in a storage ring it turns out they're not quite completely lined up, they're slightly depolarised. This is easily explained by Unruh radiation - they're constantly accelerating and so they feel themselves to be in a hot environment. The continual interaction with this hot environment causes the electron spins to be a bit randomised, so they don't all line up nicely.

Unfortunately this isn't definitive evidence for Unruh radiation because when we carry out the full calculation of the Sokolov-Ternov effect it turns out that it predicts partial depolarisation anyway. Now it looks like we don't have evidence for the Unruh effect. But it's not that simple. The Unruh effect isn't a new effect made up by a physicist. It's a prediction based on a new way of looking at fairly conventional physics. The Sokolov-Ternov effect is also predicted from standard physics, just in a different frame of reference. So maybe the partial depolarisation predicted by this effect is in fact the very same thing as the Unruh effect, just looked at from a different point of view.

What does this mean philosophically? We're used to the idea that looking at things from different angles changes how they look. Einstein extended this notion to spacetime so space and time seem different to moving observers. The Unruh effect goes one step further. Whether or not an individual particle exists depends on your point of view. Do particles not have any kind of existence independently of how we look at them? And how does it look to an observer at rest watching an accelerating observer fly by with a thermometer. Do they see a thermometer apparently responding to nothing? Or do they also see the particles once they've interacted with a thermometer? It's all so weird that some physicists take the view that the notion of the particle is outdated and we should only be talking about quantum mechanical wavefunctions.

There's another reason why the Unruh effect is important. The Unruh effect doesn't involve General Relativity, which is all about curved spacetime. But it does use the same mathematical machinery so it gives a way to test out that mathematics. So even if we don't have laboratory black holes to play with, we may still be able to investigate the mathematical framework that predicts phenomena like the Hawking effect. But also note that at least one paper claims it's all a mathematical error and there is no Unruh effect in reality.

In recent years there has been a dramatic increase in the number of papers on the Unruh effect. I expect this trend is going to keep going for a while

References



  1. I learnt about the Unruh effect from Wald's book Quantum field theory in curved spacetime and black hole thermodynamics.
  2. The diagram showing particle creation/annihilation events straddling the event horizon came from Susskind and Lindesay's An Introduction to Black Holes, Information and the String Theory Revolution.
  3. I tried to catch up with recent developments by reading some of the paper The Unruh effect and its applications.

Saturday, March 07, 2009

Dinatural Transformations and Coends

Abstract nonsense warning: my goal here is simply to understand what the definition of a coend means in the context of Haskell.


> {-# LANGUAGE Rank2Types,ExistentialQuantification #-}


Pick a type. Let's call it X. Now consider the type

forall a . (a -> X,a) -> X

What can we say about elements of this type? There's an obvious example of such a function, eval that applies the first element of the pair to the second.


> eval (f,x) = f x


It's also easy to construct others. What property do they all share? Well there's a nice way to answer this question. We can use Janis Voigtländer's Free Theorem Generator. After a bit of simplification it tells us that if f is in

forall a . (a -> X,a) -> X

then for any compatible g

f (x,g y) = f (x . g,y)

We can apply any function we like to the second argument and we get exactly the same result by pre-composing with the first argument.

This is a pretty strong property - it puts a big constraint on what f can do. It holds because any function of type forall a . (a -> X,a) -> X maps to a type X that makes no reference to the quantified a. It can't let any information about the type a escape. And this means that f has to somehow eliminate an element of a and and element of a->X. There's only one non-trivial way to do that: provide the former to the latter as a function argument. So f must factor as h . eval for some function h. The free theorem comes from the fact that eval (x,g y) = eval (x.g,y). Or, f could be the constant function, but that still factors in the same way.

Let's try something slightly more complex. Consider the type

forall a . (a -> X,[a]) -> X

Again we can ask the free theorem generator to give us a property. We're told that if f is of this type, then for any compatible g

f(x,map g y) = f(x . g,y)

The crucial point this time is that in order to eliminate the [a] we have to use fmap to apply the function of type a -> X. So any function of this type must factor through fmap, and that's why the free theorem follows.

Now it's time to put this in a more general framework. We can write both of the above examples as elements of type forall a . S a a -> X where S is some type constructor. In both cases we also have that S a is a functor. But it's also a cofunctor in its first argument. Intuitively this says that S a b contains or produces b's but consumes a's. Here's a class to express this:


> class Difunctor h where
> lmap :: (b -> a) -> h a c -> h b c
> rmap :: (c -> d) -> h a c -> h a d


(I made up the word Difunctor.)

For (co)functoriality we insist on the laws

lmap (f . g) = lmap g . lmap f
rmap (f . g) = rmap f . rmap g


We can make both of the examples above instances:


> data Ex1 x a b = Ex1 (a -> x) b

> instance Difunctor (Ex1 x) where
> lmap f (Ex1 g x) = Ex1 (g . f) x
> rmap f (Ex1 g x) = Ex1 g (f x)

> data Ex2 x a b = Ex2 (a -> x) [b]

> instance Difunctor (Ex2 x) where
> lmap f (Ex2 g x) = Ex2 (g . f) x
> rmap f (Ex2 g x) = Ex2 g (map f x)


Our free theorems were essentially about this type:


> type DiNatural s x = forall a . (s a a -> x)


and in both cases the free theorem could be written as

f . lmap g == f . rmap g

This is a highly non-trivial result. Look at the preconditions for this theorem: they say something about each of the arguments to s individually. And yet we deduce that the two arguments are in fact intimately connected. We can apply g using either lmap or rmap and get the same result. This property is known as dinaturality. More precisely, if for some X, f :: s a a -> X, where s is a difunctor, then f is dinatural if for all compatible g,

f . lmap g == f . rmap g


We have this theorem: if s is an instance of the Difunctor class (obeying its laws) then for any X, elements of DiNatural s X are dinatural.

(Compare with the definition of naturality: f . fmap g = fmap g . f.)

I'm tempted to call this "Dinaturality for Free" but there's already a paper by that name and I don't know what it's about. And note that I only claim it's a theorem. I don't actually know how to prove this uniformly for all difunctors, but I'd stake a beer on it. At least as long as we don't do any weird haskell coding (so our free theorems are always valid) and as long as we restrict ourselves to functions that always terminate.

In the examples I gave above the reason why this holds is related to the fact that any function of the given type can be factored as something following an evaluation type function. More generally, if s is a dinatural then if there is a single Y, and function i :: s a a -> Y, such that every function f :: s a a -> X, for any X, can be factored as f = h . i, then Y is said to be the coend of s.

There are different approaches to computing a coend. Above I've used inspection. The coend of (a -> X,a) is X. But there's also a kind of cheating approach where we can use an existential type to get the type uniformly for all dinaturals:


> data Coend s = forall a . Coend (s a a)


For an explanation of why that works, see Edward Kmett's post on Kan Extensions. For this example, this means we expect the types t and Coend (Ex1 t) to be isomorphic. Here is the isomorphism and its inverse:


> iso :: Coend (Ex1 t) -> t
> iso (Coend (Ex1 f x)) = f x

> iso' :: t -> Coend (Ex1 t)
> iso' x = Coend (Ex1 (const x) ())


I'll leave the proof that iso and iso' are mutual inverses to you.

I hope that gives some idea of what a coend is. Informally it captures the method by which a dinatural transformation of type forall a . s a a -> X is able to eliminate the quantified a. If you look through the Haskell libraries you'll find many dinaturals (or at least things that can be made dinatural through the use of uncurry).

This code and description was inspired by the discussion at nLab.

Blog Archive

About Me