tag:blogger.com,1999:blog-112951322017-06-21T19:32:16.455-07:00A Neighborhood of InfinityDan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.comBlogger293125tag:blogger.com,1999:blog-11295132.post-4318209395803516382017-06-06T20:32:00.000-07:002017-06-07T17:41:04.895-07:00A relaxation technique<BR><b>Introduction</b><p>Sometimes you want to differentiate the expected value of something. I've written about some tools that can help with this. For example you can use <a href="http://blog.sigfpe.com/2005/07/automatic-differentiation.html">Automatic Differentiation</a> for the derivative part and <a href="http://blog.sigfpe.com/2007/03/monads-vector-spaces-and-quantum.html">probability monads</a> for the expectation. But the probability monad I described in that article computes the complete probability distribution for your problem. Frequently this is intractably large. Instead people often use Monte Carlo methods. They'll compute the "something" many times, substituting pseudo-random numbers for the random variables, and then average the results. This provides an estimate of the expected value and is ubiquitous in many branches of computer science. For example it's the basis of ray-tracing and path-tracing algorithms in 3D rendering, and plays a major role in machine learning when used in the form of stochastic gradient descent. <P><BR>But there's a catch. Suppose we want to compute <img src="https://chart.googleapis.com/chart?cht=tx&chl=E%28f%28X_1%2CX_2%2C%5Cldots%2CX_n%29%29" style="vertical-align:middle"> where each of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=X_i" style="vertical-align:middle"> belong to the Bernoulli distribution <img src="https://chart.googleapis.com/chart?cht=tx&chl=B%28p%29" style="vertical-align:middle">. I.e. 
each <img src="https://chart.googleapis.com/chart?cht=tx&chl=X_i" style="vertical-align:middle"> has a probability <img src="https://chart.googleapis.com/chart?cht=tx&chl=p" style="vertical-align:middle"> of being 1 and probability <img src="https://chart.googleapis.com/chart?cht=tx&chl=1-p" style="vertical-align:middle"> of being 0. If we compute this using a Monte Carlo approach we'll repeatedly generate pseudo-random numbers for each of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=X_i" style="vertical-align:middle">. Each one will be 0 or 1. This means that our estimate depends on <img src="https://chart.googleapis.com/chart?cht=tx&chl=p" style="vertical-align:middle"> via subexpressions that can't meaningfully be differentiated with respect to <img src="https://chart.googleapis.com/chart?cht=tx&chl=p" style="vertical-align:middle">. So how can we use automatic differentiation with the Monte Carlo method? I'm proposing an approach that may or may not already be in the literature. Whether it is or not, I think it's fun to get there by combining many of the things I've previously talked about here, such as free monads, <a href="https://en.wikipedia.org/wiki/Negative_probability">negative probabilities</a> and automatic differentiation. I'm going to assume you're familiar with using dual numbers to compute derivatives as I've written about this before and <a href="https://en.wikipedia.org/wiki/Dual_number">Wikipedia</a> has the basics. <P><BR><BR><b>A probability monad</b><p><P><BR>I want to play with a number of different approaches to using monads with probability theory. Rather than define lots of monads I think that the easiest thing is to simply work with one free monad and then provide different interpreters for it. 
<P><BR>First some imports: <P><BR><pre><br />> import Control.Monad<br />> import qualified System.Random as R<br />> import qualified Data.Map.Strict as M<br /><P><BR><br /></pre>I'm going to use a minimal free monad that effectively gives us a DSL with a new function that allows us to talk about random Bernoulli variables: <P><BR><pre><br />> data Random p a = Pure a | Bernoulli p (Int -> Random p a)<br /><P><BR><br /></pre>The idea is that <tt>Pure a</tt> represents the value <tt>a</tt> and <tt>Bernoulli p f</tt> is used to say "if we had a random value <tt>x</tt>, <tt>f x</tt> is the value we're interested in". The <tt>Random</tt> type isn't going to do anything other than represent these kinds of expressions. There's no implication that we actually have a random value for <tt>x</tt> yet. <P><BR><pre><br />> instance Functor (Random p) where<br />>     fmap f (Pure a) = Pure (f a)<br />>     fmap f (Bernoulli p g) = Bernoulli p (fmap f . g)<br /><P><BR><br />> instance Applicative (Random p) where<br />>     pure = return<br />>     (<*>) = ap<br /><P><BR><br />> instance Monad (Random p) where<br />>     return = Pure<br />>     Pure a >>= f = f a<br />>     Bernoulli p g >>= f = Bernoulli p (\x -> g x >>= f)<br /><P><BR><br /></pre>We'll use <tt>bernoulli p</tt> to represent a random Bernoulli variable drawn from <img src="https://chart.googleapis.com/chart?cht=tx&chl=B%28p%29" style="vertical-align:middle">. <P><BR><pre><br />> bernoulli :: p -> Random p Int<br />> bernoulli p = Bernoulli p return<br /><P><BR><br /></pre>So let's write our first random expression: <P><BR><pre><br />> test1 :: Random Float Float<br />> test1 = do<br />>     xs <- replicateM 4 (bernoulli 0.75)<br />>     return $ fromIntegral $ sum xs<br /><P><BR><br /></pre>It sums 4 Bernoulli random variables from <img src="https://chart.googleapis.com/chart?cht=tx&chl=B%28%5Cfrac%7B3%7D%7B4%7D%29" style="vertical-align:middle"> and converts the result to a <tt>Float</tt>. The expected value is 3. 
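<P><BR>If you'd like to check that value without any of the machinery above, here's a minimal standalone sketch. The name <tt>expectedSum</tt> is made up for this illustration, and the block has no bird tracks so it isn't part of the literate program. It simply brute-forces the expectation by enumerating all 2^n outcomes and weighting each by its probability: <P><BR><pre>
import Control.Monad (replicateM)

-- Exact expectation of the sum of n independent Bernoulli(p) draws,
-- computed by enumerating every outcome in {0,1}^n.
-- (expectedSum is a hypothetical helper, not part of the post's code.)
expectedSum :: Int -> Double -> Double
expectedSum n p = sum
  [ fromIntegral (sum xs) * weight xs
  | xs <- replicateM n [0, 1 :: Int] ]
  where
    -- Probability of one particular outcome vector.
    weight xs = product [ if x == 1 then p else 1 - p | x <- xs ]

main :: IO ()
main = print (expectedSum 4 0.75)
</pre>Because 0.75 and 0.25 are exactly representable in binary floating point, this prints <tt>3.0</tt>, in agreement with linearity of expectation: 4 × 0.75 = 3. 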
<P><BR>We don't yet have a way to do anything with this expression. So let's write an interpreter that can substitute pseudo-random values for each occurrence of <tt>bernoulli p</tt>: <P><BR>It's essentially interpreting our free monad as a state monad where the state is the random number seed: <P><BR><pre><br />> interpret1 :: (Ord p, R.Random p, R.RandomGen g) => Random p a -> g -> (a, g)<br />> interpret1 (Pure a) seed = (a, seed)<br />> interpret1 (Bernoulli prob f) seed = <br />>     let (r, seed') = R.random seed<br />>         b = if r <= prob then 1 else 0<br />>     in interpret1 (f b) seed'<br /><P><BR><br /></pre>You can use the expression <tt>R.getStdRandom (interpret1 test1)</tt> if you want to generate some random samples for yourself. <P><BR>We're interested in the expected value, so here's a function to compute that: <P><BR><pre><br />> expect1 :: (Fractional p, Ord p, R.Random p, R.RandomGen g) => Random p p -> Int -> g -> (p, g)<br />> expect1 r n g = <br />>     let (x, g') = sum1 0 r n g<br />>     in (x/fromIntegral n, g')<br /><P><BR><br />> sum1 :: (Ord p, Num p, R.Random p, R.RandomGen g) => p -> Random p p -> Int -> g -> (p, g)<br />> sum1 t r 0 g = (t, g)<br />> sum1 t r n g =<br />>     let (a, g') = interpret1 r g<br />>     in sum1 (t+a) r (n-1) g'<br /><P><BR><br /></pre>You can test it out with <tt>R.getStdRandom (expect1 test1 1000)</tt>. You should get values around 3. <P><BR>We can try completely different semantics for <tt>Random</tt>. This time we compute the entire probability distribution: <P><BR><pre><br />> interpret2 :: (Num p) => Random p a -> [(a, p)]<br />> interpret2 (Pure a) = [(a, 1)]<br />> interpret2 (Bernoulli p f) =<br />>     scale p (interpret2 (f 1)) ++ scale (1-p) (interpret2 (f 0))<br /><P><BR><br />> scale :: Num p => p -> [(a, p)] -> [(a, p)]<br />> scale s = map (\(a, p) -> (a, s*p))<br /><P><BR><br /></pre>You can try it with <tt>interpret2 test1</tt>. 
<P><BR>Unfortunately, as it stands it doesn't collect together multiple occurrences of the same value. We can do that with this function: <P><BR><pre><br />> collect :: (Ord a, Num b) => [(a, b)] -> [(a, b)]<br />> collect = M.toList . M.fromListWith (+)<br /><P><BR><br /></pre>And now you can use <tt>collect (interpret2 test1)</tt>. <P><BR>Let's compute some expected values: <P><BR><pre><br />> expect2 :: (Num p) => Random p p -> p<br />> expect2 r = sum $ map (uncurry (*)) (interpret2 r)<br /><P><BR><br /></pre>The value of <tt>expect2 test1</tt> should be exactly 3. One nice thing about <tt>interpret2</tt> is that it is differentiable with respect to the Bernoulli parameter when this is meaningful. Unfortunately it has one very big catch: the value of <tt>interpret2</tt> can be a very long list. Even a small simulation can result in lists too big to store in the known universe. But <tt>interpret1</tt> doesn't produce differentiable results. Is there something in-between these two interpreters? <P><BR><BR><b>Importance sampling</b><p>Frequently in Monte Carlo sampling it isn't convenient to sample from the distribution you want. For example it might be intractably hard to do so, or you might have proven that the resulting estimate has a high variance. So instead you can sample from a different, but possibly related distribution. This is known as <a href="https://en.wikipedia.org/wiki/Importance_sampling">importance sampling</a>. Whenever you do this you must keep track of how "wrong" your probability was and patch up your expectation estimate at the end. For example, suppose a coin comes up heads 3/4 of the time. Instead of simulating a coin toss that comes up heads 3/4 of the time you could simulate one that comes up heads half of the time. Suppose at one point in the simulation it does come up heads. Then you used a probability of 1/2 when you should have used 3/4. 
So when you compute the expectation you need to scale the contribution from this sample by (3/4)/(1/2) = 3/2. You need to scale appropriately for every random variable used. A straightforward way to see this for the case of a single Bernoulli variable is to note that <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=E%28f%28X%29%29%20%3D%20pf%281%29%2B%281-p%29f%280%29%20%3D%20q%5Cfrac%7Bp%7D%7Bq%7Df%281%29%2B%281-q%29%5Cfrac%7B1-p%7D%7B1-q%7Df%280%29" style="vertical-align:middle">. </blockquote>We've replaced probabilities <img src="https://chart.googleapis.com/chart?cht=tx&chl=p" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=1-p" style="vertical-align:middle"> with <img src="https://chart.googleapis.com/chart?cht=tx&chl=q" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=1-q" style="vertical-align:middle"> but we had to scale <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> appropriately in each of the cases <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%280%29" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%281%29" style="vertical-align:middle"> to keep the final value the same. I'm going to call the scale value the <i>importance</i>. If we generate <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> random numbers in a row we need to multiply all of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> importance values that we generate. This is a perfect job for the <tt>Writer</tt> monad using the <tt>Product</tt> monoid. (See Eric Kidd's <a href="http://www.randomhacks.net/files/build-your-own-probability-monads.pdf">paper</a> for some discussion about the connection between <tt>Writer</tt> and importance sampling.) 
However I'm just going to write an explicit interpreter for our free monad to make it clear what's going where. <P><BR>This interpreter is going to take an additional argument as input. It'll be a rule saying what probability we should sample with when handling a variable drawn from <img src="https://chart.googleapis.com/chart?cht=tx&chl=B%28p%29" style="vertical-align:middle">. The probability should be a real number in the interval <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5B0%2C1%5D" style="vertical-align:middle">. <P><BR><pre><br />> interpret3 :: (Fractional p, R.RandomGen g) =><br />>               (p -> Float) -> Random p a -> g -> ((a, p), g)<br />> interpret3 rule (Pure a) g = ((a, 1), g)<br />> interpret3 rule (Bernoulli p f) g = <br />>     let (r, g') = R.random g<br />>         prob = rule p<br />>         (b, i) = if (r :: Float) <= prob<br />>             then (1, p/realToFrac prob)<br />>             else (0, (1-p)/realToFrac (1-prob))<br />>         ((a, i'), g'') = interpret3 rule (f b) g'<br />>     in ((a, i*i'), g'')<br /><P><BR><br /></pre>Here's the accompanying code for the expectation: <P><BR><pre><br />> expect3 :: (Fractional p, R.RandomGen g) =><br />>            (p -> Float) -> Random p p -> Int -> g -> (p, g)<br />> expect3 rule r n g = <br />>     let (x, g') = sum3 rule 0 r n g<br />>     in (x/fromIntegral n, g')<br /><P><BR><br />> sum3 :: (Fractional p, R.RandomGen g) =><br />>         (p -> Float) -> p -> Random p p -> Int -> g -> (p, g)<br />> sum3 rule t r 0 g = (t, g)<br />> sum3 rule t r n g =<br />>     let ((a, imp), g') = interpret3 rule r g<br />>     in sum3 rule (t+a*imp) r (n-1) g'<br /><P><BR><br /></pre>For example, you can estimate the expectation of <tt>test1</tt> using unbiased coin tosses by evaluating <tt>R.getStdRandom (expect3 (const 0.5) test1 1000)</tt>. <P><BR><BR><b>Generalising probability</b><p>Did you notice I made my code slightly more general than seems to be needed? 
Although I use probabilities of type <tt>Float</tt> to generate my Bernoulli samples, the argument to the function <tt>bernoulli</tt> can be of a more general type. This means that we can use importance sampling to compute expected values for generalised measures that take values in a more general algebraic structure than the interval [0,1]. For example, we could use negative probabilities. <a href="https://arxiv.org/abs/1401.2561">An Operational Interpretation of Negative Probabilities and No-Signalling Models</a> by Abramsky and Brandenburger gives a way to interpret expressions involving negative probabilities. We can implement it using <tt>interpret3</tt> and the rule <tt>\p -> abs p/(abs p+abs (1-p))</tt>. Note that it is guaranteed to produce values in the range [0,1] (if you start with dual numbers with real parts that are ordinary probabilities) and reproduces the usual behaviour when given ordinary probabilities. <P><BR>Here's a simple expression using a sample from "<img src="https://chart.googleapis.com/chart?cht=tx&chl=B%282%29" style="vertical-align:middle">": <P><BR><pre><br />> test2 = do<br />>     a <- bernoulli 2<br />>     return $ if a==1 then 2.0 else 1.0<br /><P><BR><br /></pre>Its expected value is 3. We can get this exactly using <tt>expect2 test2</tt>. For a Monte Carlo estimate use <P><BR><pre><br />R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)<br /><P><BR><br /></pre>Note that estimates involving negative probabilities can have quite high variances so try a few times until you get something close to 3 :-) <P><BR>We don't have to stick with real numbers. We can use this approach to estimate with complex probabilities (aka quantum mechanics) or other algebraic structures. <P><BR><BR><b>Discrete yet differentiable</b><p>And now comes the trick: automatic differentiation uses the algebra of dual numbers. 
It's not obvious at all what a probability like <img src="https://chart.googleapis.com/chart?cht=tx&chl=0.5%2B%5Cepsilon" style="vertical-align:middle"> means when <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cepsilon" style="vertical-align:middle"> is infinitesimal. However, we can use <tt>interpret3</tt> to give it meaningful semantics. <P><BR>Let's define the duals in the usual way first: <P><BR><pre><br />> data Dual a = D { real :: a, infinitesimal :: a }<br /><P><BR><br />> instance (Ord a, Num a) => Num (Dual a) where<br />>     D a b + D a' b' = D (a+a') (b+b')<br />>     D a b * D a' b' = D (a*a') (a*b'+a'*b)<br />>     negate (D a b) = D (negate a) (negate b)<br />>     abs (D a b) = if a > 0 then D a b else D (-a) (-b)<br />>     signum (D a b) = D (signum a) 0<br />>     fromInteger a = D (fromInteger a) 0<br /><P><BR><br />> instance (Ord a, Fractional a) => Fractional (Dual a) where<br />>     fromRational a = D (fromRational a) 0<br />>     recip (D a b) = let ia = 1/a in D ia (-b*ia*ia)<br /><P><BR><br />> instance Show a => Show (Dual a) where<br />>     show (D a b) = show a ++ "[" ++ show b ++ "]"<br /><P><BR><br /></pre>Now we can use the rule <tt>real</tt> to give us a real-valued probability from a dual number. The function <tt>expect3</tt> will push the infinitesimal part into the importance value so it doesn't get forgotten about. And now <tt>expect3</tt> gives us an estimate that is differentiable despite the fact that our random variables are discrete. <P><BR>Let's try an expression: <P><BR><pre><br />> test3 p = do<br />>     a <- bernoulli p<br />>     b <- bernoulli p<br />>     return $ if a == 1 && b == 1 then 1.0 else 0.0<br /><P><BR><br /></pre>The expected value is <img src="https://chart.googleapis.com/chart?cht=tx&chl=p%5E2" style="vertical-align:middle"> and the derivative is <img src="https://chart.googleapis.com/chart?cht=tx&chl=2p" style="vertical-align:middle">. 
We can evaluate at <img src="https://chart.googleapis.com/chart?cht=tx&chl=p%3D0.5" style="vertical-align:middle"> with <tt>expect2 (test3 (D 0.5 1))</tt>. And we can estimate it with <P><BR><pre><br />R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)<br /><P><BR><br /></pre>What's neat is that we can parameterise our distributions in a more complex way and we can freely mix with conventional expressions in our parameter. Here's an example: <P><BR><pre><br />> test4 p = do<br />>     a <- bernoulli p<br />>     b <- bernoulli (p*p)<br />>     return $ p*fromIntegral a*fromIntegral b<br /><P><BR><br /></pre>Try evaluating <tt>expect2 (test4 (D 0.5 1))</tt> and <pre><br />R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)<br /><P><BR><br /></pre>I've collected the above examples together here: <P><BR><pre><br />> main = do<br />>     print =<< R.getStdRandom (interpret1 test1)<br />>     print $ collect $ interpret2 test1<br />>     print =<< R.getStdRandom (expect1 test1 1000)<br />>     print (expect2 test1)<br />>     print =<< R.getStdRandom (expect3 id test1 1000)<br />>     print =<< R.getStdRandom (expect3 (const 0.5) test1 1000)<br />>     print "---"<br />>     print $ expect2 test2<br />>     print =<< R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)<br />>     print "---"<br />>     print $ expect2 (test3 (D 0.5 1))<br />>     print =<< R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)<br />>     print "---"<br />>     print $ expect2 (test4 (D 0.5 1))<br />>     print =<< R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)<br /><P><BR><br /></pre><BR><b>What just happened?</b><p>You can think of a dual number as a real number that has been infinitesimally slightly <a href="https://en.wikipedia.org/wiki/Deformation_theory">deformed</a>. To differentiate something we need to deform something. But we can't deform 0 or 1 and have them stay 0 or 1. So the trick is to embed probability sampling in something "bigger", namely importance sampling, where samples carry around an importance value. 
This bigger thing does allow infinitesimal deformations. And that allows differentiation. This process of turning something discrete into something continuously "deformable" is generally called <a href="https://en.wikipedia.org/wiki/Relaxation_%28approximation%29">relaxation</a>. <P><BR><BR><b>Implementation details</b><p>I've made no attempt to make my code fast. However I don't think there's anything about this approach that's incompatible with performance. There's no need to use a monad. Instead you can track the importance value through your code by hand and implement everything in C. Additionally, I've previously <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.159.4990&rep=rep1&type=pdf">written</a> about the fact that for any trick involving forward mode AD there is another corresponding trick you can use with reverse mode AD. So this method is perfectly compatible with back-propagation. Note also that the dual number importances always have real part 1 which means you don't actually need to store them. <P><BR>The bad news is that the derivative estimate can sometimes have a high variance. Nonetheless, I've used it successfully for some toy optimisation problems. I don't know if this approach is effective for industrial strength problems. Your mileage may vary :-) <P><BR><BR><b>Alternatives</b><p>Sometimes you may find that it is acceptable to deform the samples from your discrete distribution. In that case you can use the <a href="https://arxiv.org/abs/1611.00712">concrete relaxation</a>. <P><BR><BR><b>Continuous variables</b><p>The above method can be adapted to work with continuous variables. There is a non-trivial step which I'll leave as an exercise but I've tested it in some Python code. I think it reproduces a <a href="http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/">standard technique</a> and it gives an alternative way to think about that trick. 
That article is also useful for ways to deal with the variance issues. Note also that importance sampling is normally used itself as a variance reduction technique. So there are probably helpful ways to modify the <tt>rule</tt> argument to <tt>interpret3</tt> to simultaneously estimate derivatives and keep the variance low. <P><BR><BR><b>Personal note</b><p>I've thought about this problem a couple of times over the years. Each time I've ended up thinking "there's no easy way to extend AD to work with random variables so don't waste any more time thinking about it". So don't listen to anything I say. Also, I like that this method sort of comes "for free" once you combine methods I've described previously. <P><BR><BR><b>Acknowledgements</b><p>I think it was Eric Kidd's paper on <a href="http://www.randomhacks.net/files/build-your-own-probability-monads.pdf">building probability monads</a> that first brought to my attention that there are many kinds of semantics you can use with probability theory - i.e. there are many interpreters you can write for the <tt>Random</tt> monad. I think there is an interesting design space worth exploring here. <P><BR><BR><b>Answer to exercise</b><p>I set the continuous case as an exercise above. Here is a solution. <P><BR>Suppose you're sampling from a distribution parameterised by <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> with pdf <img src="https://chart.googleapis.com/chart?cht=tx&chl=p%28%5Ctheta%2C%20%5Ccdot%29" style="vertical-align:middle">. To compute the derivative with respect to <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> you need to consider sampling from <img src="https://chart.googleapis.com/chart?cht=tx&chl=p%28%5Ctheta%2B%5Cepsilon%2C%20%5Ccdot%29" style="vertical-align:middle"> where <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cepsilon" style="vertical-align:middle"> is an infinitesimal. 
<blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=p%28%5Ctheta%2B%5Cepsilon%2C%20x%29%20%3D%20p%28%5Ctheta%2C%20x%29%2B%5Cepsilon%5Cfrac%7B%5Cpartial%20p%28%5Ctheta%2C%20x%29%7D%7B%5Cpartial%5Ctheta%7D" style="vertical-align:middle">. </blockquote>As we don't know how to sample from a pdf with infinitesimals in it, we instead sample using <img src="https://chart.googleapis.com/chart?cht=tx&chl=p" style="vertical-align:middle"> as usual, but use an importance of <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%28p%28%5Ctheta%2C%20x%29%2B%5Cepsilon%5Cfrac%7B%5Cpartial%20p%28%5Ctheta%2C%20x%29%7D%7B%5Cpartial%5Ctheta%7D%29%2Fp%28%5Ctheta%2C%20x%29%20%3D%201%2B%5Cepsilon%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%5Ctheta%7D%5Clog%20p%28%5Ctheta%2C%20x%29" style="vertical-align:middle"></blockquote>The coefficient of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cepsilon" style="vertical-align:middle"> gives the derivative. So we need to compute the expectation, scaling each sample with this coefficient. In other words, to estimate <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%5Ctheta%7DE%28f%28X%29%29" style="vertical-align:middle"> we use <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7B1%7D%7BN%7D%5Csum_%7Bi%3D1%7D%5EN%20f%28x_i%29%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%5Ctheta%7D%5Clog%20p%28%5Ctheta%2C%20x_i%29" style="vertical-align:middle"> </blockquote>where the <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_i" style="vertical-align:middle"> are drawn from the original distribution. This is exactly what is described at <a href="http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/">Shakir Mohamed's blog</a>. <P><BR><BR><b>Final word</b><p>I managed to find the method in the literature. It's part of the <a href="http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf">REINFORCE method</a>. For example, see equation (5) there. 
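<P><BR>Here is a minimal standalone sketch of the estimator from the exercise answer. Everything in it is an assumption made for illustration: the choice of an exponential distribution with rate theta (pdf theta*exp(-theta*x)), the names <tt>lcg</tt>, <tt>uniforms</tt>, <tt>sampleExp</tt>, <tt>score</tt> and <tt>gradEstimate</tt>, and the tiny linear congruential generator, which is there only so the block needs nothing beyond <tt>base</tt>. It has no bird tracks, so it isn't part of the literate program above. <P><BR><pre>
import Data.Word (Word32)

-- A tiny linear congruential generator (Numerical Recipes constants).
-- Word32 arithmetic wraps, which gives the "mod 2^32" for free.
-- This is a stand-in for a real generator, not a recommendation.
lcg :: Word32 -> Word32
lcg s = 1664525 * s + 1013904223

-- n pseudo-random values strictly inside the open interval (0,1).
uniforms :: Int -> Word32 -> [Double]
uniforms n seed = take n (map toUnit (tail (iterate lcg seed)))
  where toUnit w = (fromIntegral w + 0.5) / 4294967296

-- Sample from Exponential(theta) by inverting the CDF.
sampleExp :: Double -> Double -> Double
sampleExp theta u = negate (log u) / theta

-- d/dtheta of log (theta * exp (-theta * x)) = 1/theta - x.
score :: Double -> Double -> Double
score theta x = 1 / theta - x

-- Estimate d/dtheta E[f X] for X ~ Exponential(theta) as the sample
-- mean of f x * score theta x, with x drawn at the unperturbed theta.
gradEstimate :: (Double -> Double) -> Double -> Int -> Word32 -> Double
gradEstimate f theta n seed =
  sum [ f x * score theta x | x <- xs ] / fromIntegral n
  where xs = map (sampleExp theta) (uniforms n seed)

main :: IO ()
main = print (gradEstimate id 2 100000 42)
</pre>With <tt>f = id</tt> the exact expectation is 1/theta, so the exact derivative is -1/theta^2, which is -0.25 at theta = 2. The printed estimate should land near that value, subject to the variance caveats mentioned earlier. 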
Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com1tag:blogger.com,1999:blog-11295132.post-82660760361962124902017-02-05T10:30:00.000-08:002017-02-16T07:10:25.685-08:00Logarithms and exponentials of functions<BR><b>Introduction</b><p>A popular question in mathematics is this: given a function <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle">, what is its "square root" <img src="https://chart.googleapis.com/chart?cht=tx&chl=g" style="vertical-align:middle"> in the sense that <img src="https://chart.googleapis.com/chart?cht=tx&chl=g%28g%28x%29%29%20%3D%20f%28x%29" style="vertical-align:middle">. There are many questions about this on <a href="http://mathoverflow.net/questions/tagged/fractional-iteration">mathoverflow</a> but it's also a popular subject in mathematics forums for non-experts. This question seems to have a certain amount of notoriety because it's easy to ask but hard to answer fully. I want to look at an approach that works nicely for formal power series, following from the Haskell code I wrote <a href="http://blog.sigfpe.com/2007/11/small-combinatorial-library.html">here</a>. There are some methods for directly finding "functional square roots" for formal power series that start as <img src="https://chart.googleapis.com/chart?cht=tx&chl=z%2Ba_2z%5E2%2Ba_3z%5E3%2B%5Cldots" style="vertical-align:middle">, but I want to approach the problem indirectly. When working with real numbers we can find square roots, say, by using <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Csqrt%7Bx%7D%3D%5Cexp%28%5Cfrac%7B1%7D%7B2%7D%5Clog%7Bx%7D%29" style="vertical-align:middle">. I want to use an analogue of this for functions. So my goal is to make sense of the idea of the logarithm and exponential of a formal power series as composable functions. Warning: the arguments are all going to be informal. 
<P><BR><BR><b>Notation</b><p>There's potential for a lot of ambiguous notation here, especially as the usual mathematical notation for <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">th powers of trig functions is so misleading. I'm going to use <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ccirc" style="vertical-align:middle"> for composition of functions and power series, and I'm going to use the notation <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5E%7B%5Ccirc%20n%7D" style="vertical-align:middle"> to mean the <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">th iterate of <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle">. So <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5E%7Bn%2B1%7D%28x%29%20%3D%20f%28x%29f%5En%28x%29" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5E%7B%5Ccirc%20n%2B1%7D%28x%29%20%3D%20f%28f%5E%7B%5Ccirc%20n%7D%28x%29%29" style="vertical-align:middle">. As I'll be working mostly in the ring of formal power series <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%5B%5C%21%5Bz%5D%5C%21%5D" style="vertical-align:middle"> for some ring <img src="https://chart.googleapis.com/chart?cht=tx&chl=R" style="vertical-align:middle">, I'll reserve the variable <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle"> to refer only to the corresponding element in this ring. I'll also use formal power series somewhat interchangeably with functions. So <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle"> can be thought of as representing the identity function. 
To make sure we're on the same page, here are some small theorems in this notation: <ol><li><img src="https://chart.googleapis.com/chart?cht=tx&chl=z%5Emz%5En%20%3D%20z%5E%7Bm%2Bn%7D" style="vertical-align:middle"></li><li><img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5E%7B%5Ccirc%20m%7D%5Ccirc%20f%5E%7B%5Ccirc%20n%7D%20%3D%20f%5E%7B%5Ccirc%20m%2Bn%7D" style="vertical-align:middle"></li><li><img src="https://chart.googleapis.com/chart?cht=tx&chl=%281%2Bz%29%5En%20%3D%20%5Csum_%7Bi%3D0%7D%5En%7Bn%5Cchoose%20i%7Dz%5Ei" style="vertical-align:middle"></li><li><img src="https://chart.googleapis.com/chart?cht=tx&chl=%281%2Bz%29%5E%7B%5Ccirc%20n%7D%3Dn%2Bz" style="vertical-align:middle">.</li></ol>That last one simply says that adding one <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> times is the same as adding <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">. <P><BR>As I'm going to have ordinary logarithms and exponentials sitting around, as well as functional logarithms and exponentials, I'm going to introduce the notation <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> for functional logarithm and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> for functional exponentiation. <P><BR><BR><b>Preliminaries</b><p>The first goal is to define a non-trivial function <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> with the fundamental property that <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28f%5E%7B%5Ccirc%20n%7D%29%3Dn%5Coperatorname%7BLOG%7D%28f%29" style="vertical-align:middle"><P><BR>First, let's note some basic algebraic facts. 
The formal power series form a commutative ring with operations <img src="https://chart.googleapis.com/chart?cht=tx&chl=%2B" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ccdot" style="vertical-align:middle"> (ordinary multiplication) and with additive identity <img src="https://chart.googleapis.com/chart?cht=tx&chl=0" style="vertical-align:middle"> and multiplicative identity <img src="https://chart.googleapis.com/chart?cht=tx&chl=1" style="vertical-align:middle">. The formal power series form a ring-like algebraic structure with operation <img src="https://chart.googleapis.com/chart?cht=tx&chl=%2B" style="vertical-align:middle"> and partial operation <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ccirc" style="vertical-align:middle"> with additive identity <img src="https://chart.googleapis.com/chart?cht=tx&chl=0" style="vertical-align:middle"> and multiplicative identity <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle">. But it's not actually a ring or even a <a href="https://en.wikipedia.org/wiki/Near-ring">near-ring</a>. Composition isn't defined for all formal power series and even when it's defined, we don't have distributivity. For example, in general <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5Ccirc%28g%2Bh%29%5Cne%20f%5Ccirc%20g%2Bf%5Ccirc%20h" style="vertical-align:middle">; after all, there's no reason to expect <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28g%28x%29%2Bh%28x%29%29" style="vertical-align:middle"> to equal <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28g%28x%29%29%2Bf%28h%28x%29%29" style="vertical-align:middle">. We do have right-distributivity however, i.e. 
<blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%28f%2Bg%29%5Ccirc%20h%20%3D%20f%5Ccirc%20h%2Bg%5Ccirc%20h" style="vertical-align:middle">, </blockquote>because <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%28f%2Bg%29%28h%28x%29%29%3Df%28h%28x%29%29%2Bg%28h%28x%29%29" style="vertical-align:middle">, </blockquote>more or less by definition of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%2B" style="vertical-align:middle">. <P><BR><BR><b>We can't use power series on our power series</b><p>There's an obvious approach: just use power series of power series. So we might tentatively suggest that <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28z%2Bf%29%20%3D%20f-%5Cfrac%7B1%7D%7B2%7Df%5E%7B%5Ccirc%202%7D%2B%5Cfrac%7B1%7D%7B3%7Df%5E%7B%5Ccirc%203%7D%2B%5Cldots" style="vertical-align:middle">. </blockquote>Note that I consider <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28z%2Bf%29" style="vertical-align:middle"> rather than <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%281%2Bf%29" style="vertical-align:middle"> because <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle"> is the multiplicative identity in our ring-like structure. <P><BR>Unfortunately this doesn't work. The reason is this: if we try to use standard reasoning to show that the resulting function has the fundamental property we seek we end up using distributivity. We don't have distributivity. <P><BR><BR><b>Sleight of hand</b><p>There's a beautiful trick I spotted on mathoverflow recently that allows us to bring back distributivity. (I can't find the trick again, but when I do I'll come back and add a link and credit here.) 
Consider the function <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29" style="vertical-align:middle"> defined by <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29%28f%29%20%3D%20f%5Ccirc%20g" style="vertical-align:middle">. In other words <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29" style="vertical-align:middle"> is right-composition by <img src="https://chart.googleapis.com/chart?cht=tx&chl=g" style="vertical-align:middle">. (Ambiguity alert, I'm using <img src="https://chart.googleapis.com/chart?cht=tx&chl=R" style="vertical-align:middle"> here to mean <i>right</i>. It has nothing to do with the ring underlying our formal power series.) Because we have right-distributivity, <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29" style="vertical-align:middle"> is a <i>bona fide</i> linear operator on the space of formal power series. If you think of formal power series as being infinitely long vectors of coefficients then <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29" style="vertical-align:middle"> can be thought of as an infinitely sized matrix. This means that as long as we have convergence, we can get away with using power series to compute <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Clog%20R%28g%29" style="vertical-align:middle"> with the property that <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Clog%28R%28g%29%5En%29%20%3D%20n%5Clog%20R%28g%29" style="vertical-align:middle">. Define: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28f%29%20%3D%20%5Clog%28R%28f%29%29z" style="vertical-align:middle">. 
</blockquote>We have: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28f%29%20%3D%20%5Clog%28R%28f%29%29z%20%3D%20%5Clog%281%2B%28R%28f%29-1%29%29z" style="vertical-align:middle"></blockquote>where I'm using <img src="https://chart.googleapis.com/chart?cht=tx&chl=1" style="vertical-align:middle"> to mean the identity linear operator. And now we have: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28f%29%20%3D%20%28R%28f%29-1%29z-%5Cfrac%7B1%7D%7B2%7D%28R%28f%29-1%29%5E2z%2B%5Cfrac%7B1%7D%7B3%7D%28R%28f%29-1%29%5E3z%2B%5Cldots" style="vertical-align:middle">. </blockquote>But does it converge? Suppose <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> is of the form <img src="https://chart.googleapis.com/chart?cht=tx&chl=z%2Ba_2z%5E2%2Ba_3z%5E3%2B%5Cldots" style="vertical-align:middle">. Then <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28R%28f%29-1%29g%20%3D%20g%5Ccirc%20f-g" style="vertical-align:middle">. The leading term in <img src="https://chart.googleapis.com/chart?cht=tx&chl=g%5Ccirc%20f" style="vertical-align:middle"> is the same as the leading term in <img src="https://chart.googleapis.com/chart?cht=tx&chl=g" style="vertical-align:middle">. So <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28f%29-1" style="vertical-align:middle"> kills the first term of whatever it is applied to, which means that when we sum the terms in <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D%28f%29" style="vertical-align:middle">, we only need <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> terms to get a power series correct to <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> coefficients. 
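<P><BR>The claim that each application of R(f)-1 kills the leading term can be checked on truncated series. The following is only a minimal sketch, separate from the code in this post: the <tt>Series</tt> type, the truncation length <tt>order</tt>, and all the helper names are assumptions made for illustration, and the choice f = z + z^2 is arbitrary.

```haskell
-- Truncated power series over the rationals, lowest-order coefficient first.
-- (Hypothetical helper type for this sketch only.)
type Series = [Rational]

-- Number of coefficients kept; an arbitrary choice.
order :: Int
order = 6

-- Pad with zeros or cut down to exactly `order` coefficients.
trunc :: Series -> Series
trunc s = take order (s ++ repeat 0)

add, sub, mul :: Series -> Series -> Series
add a b = zipWith (+) (trunc a) (trunc b)
sub a b = zipWith (-) (trunc a) (trunc b)
mul a b = [ sum [ a' !! i * b' !! (k - i) | i <- [0 .. k] ] | k <- [0 .. order - 1] ]
  where a' = trunc a
        b' = trunc b

-- g composed with f, by Horner's rule; assumes f has zero constant term.
compose :: Series -> Series -> Series
compose g f = foldr (\c acc -> add [c] (mul f acc)) (trunc []) (trunc g)

-- One application of R(f) - 1 to g, i.e. (g composed with f) minus g.
step :: Series -> Series -> Series
step f g = sub (compose g f) g

-- Index of the first non-zero coefficient.
lowestOrder :: Series -> Int
lowestOrder = length . takeWhile (== 0)

-- Lowest orders of z, (R(f)-1)z, (R(f)-1)^2 z, (R(f)-1)^3 z for f = z + z^2.
orders :: [Int]
orders = map lowestOrder (take 4 (iterate (step f) z))
  where f = [0, 1, 1]
        z = [0, 1]
```

Here <tt>orders</tt> evaluates to <tt>[1,2,3,4]</tt>: every application pushes the lowest surviving power of z up by at least one, which is why only finitely many terms of the series contribute to any given coefficient.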
Reusing my code from <a href="http://blog.sigfpe.com/2007/11/small-combinatorial-library.html">here</a>, I call <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> by the name <tt>flog</tt>. Here is its implementation: <P><BR><pre><br />> import Data.Ratio<br /><P><BR><br />> flog :: (Eq a, Fractional a) => [a] -> [a]<br />> flog f@(0 : 1 : _) =<br />> flog' 1 (repeat 0) (0 : 1 : repeat 0)<br />> where flog' n total term = take (n+1) total ++ (<br />> drop (n+1) $<br />> let pz = p term<br />> in flog' (n+1) (total-map (((-1)^n / fromIntegral n) *) pz) pz)<br />> p total = (total ○ f) - total<br /><P><BR><br /></pre>The <tt>take</tt> and <tt>drop</tt> are how I tell Haskell when the first <img src="https://chart.googleapis.com/chart?cht=tx&chl=n%2B1" style="vertical-align:middle"> coefficients have been exactly computed and so no more terms are necessary. <P><BR>Does it work? <P><BR>Here's an example using the twice iterated sin function: <P><BR><pre><br />> ex1 = do<br />> let lhs = flog (sin (sin z))<br />> let rhs = 2*flog (sin z)<br />> mapM_ print $ take 20 (lhs-rhs)<br /><P><BR><br /></pre>Works to 20 coefficients. Dare we try an inverse function? <P><BR><pre><br />> ex2 = do<br />> let lhs = flog (sin z)<br />> let rhs = flog (asin z)<br />> mapM_ print $ take 20 (lhs+rhs)<br /><P><BR><br /></pre>Seems to work! <P><BR><BR><b>Exponentials</b><p>It's no good having logarithms if we can't invert them. One way to think about the exponential function is that <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cexp%28x%29%20%3D%20%5Clim_%7Bn%5Crightarrow%20%5Cinfty%7D%281%2B%5Cfrac%7Bx%7D%7Bn%7D%29%5En" style="vertical-align:middle"></blockquote>We get better and better approximations by writing the expression inside the limit as a product of more and more terms. 
We can derive the usual power series for <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cexp" style="vertical-align:middle"> from this, but only if distributivity holds on both sides, and we only have right-distributivity. So let's try to use the above expression directly: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%20%3D%20%5Clim_%7Bn%5Crightarrow%20%5Cinfty%7D%28z%2B%5Cfrac%7Bf%7D%7Bn%7D%29%5E%7B%5Ccirc%20n%7D" style="vertical-align:middle"></blockquote>and get <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%20%3D%20%5Clim_%7Bn%5Crightarrow%20%5Cinfty%7DR%28z%2B%5Cfrac%7Bf%7D%7Bn%7D%29%5Enz" style="vertical-align:middle">. </blockquote>Unfortunately, even though <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28g%29" style="vertical-align:middle"> is linear, <img src="https://chart.googleapis.com/chart?cht=tx&chl=R" style="vertical-align:middle"> itself isn't. So it's going to take some extra work to raise <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28z%2Bf%2Fn%29" style="vertical-align:middle"> to the power of <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">. <P><BR>The good news is that we're dealing with the special case <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28z%2B%5Cepsilon%29" style="vertical-align:middle"> where <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cepsilon" style="vertical-align:middle"> is something small. We have <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28z%2B%5Cepsilon%29f%3Df%28z%2B%5Cepsilon%29%3Df%28z%29%2B%5Cepsilon%5Cfrac%7Bdf%7D%7Bdz%7D%2BO%28%5Cepsilon%5E2%29" style="vertical-align:middle">. 
</blockquote>So <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%28z%2Bf%2Fn%29" style="vertical-align:middle"> is actually <img src="https://chart.googleapis.com/chart?cht=tx&chl=1%2B%5Cfrac%7B1%7D%7Bn%7Df%5Cfrac%7Bd%7D%7Bdz%7D" style="vertical-align:middle"> modulo higher order terms. This gives us <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%20%3D%20%5Clim_%7Bn%5Crightarrow%20%5Cinfty%7D%281%2B%5Cfrac%7B1%7D%7Bn%7Df%5Cfrac%7Bd%7D%7Bdz%7D%29%5Enz%3D%5Cexp%28f%5Cfrac%7Bd%7D%7Bdz%7D%29z" style="vertical-align:middle">. </blockquote>This is something we can implement using the power series for ordinary <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cexp" style="vertical-align:middle">: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%20%3D%20z%2Bf%2B%5Cfrac%7B1%7D%7B2%21%7Df%5Cfrac%7Bdf%7D%7Bdz%7D%2B%5Cfrac%7B1%7D%7B3%21%7Df%5Cfrac%7Bd%7D%7Bdz%7D%28f%5Cfrac%7Bdf%7D%7Bdz%7D%29%2B%5Cldots" style="vertical-align:middle">. </blockquote>In code that becomes: <P><BR><pre><br />> fexp f@(0 : 0 : _) = fexp' f 0 z 1<br />> fexp' f total term n = take (n-1) total ++ drop (n-1)<br />> (fexp' f (total+term) (map (/fromIntegral n) (f*d term)) (n+1))<br /><P><BR><br /></pre>Note how when we differentiate a power series we shift the coefficients down by one place. To counter the effect of that so as to ensure convergence we need <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> to look like <img src="https://chart.googleapis.com/chart?cht=tx&chl=a_2z%5E2%2Ba_3z%5E3%2B%5Cldots" style="vertical-align:middle">. Luckily this is exactly the kind of series <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> gives us. 
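<P><BR>As a sanity check on the series for EXP, here is a minimal sketch that sums it on truncated coefficient lists. It is not the <tt>fexp</tt> above: the <tt>Series</tt> type, the cut-off <tt>order</tt> and the helper names are assumptions for illustration, and it assumes f starts at z^2 so that term k begins at z^(k+1).

```haskell
-- Truncated power series over the rationals, lowest-order coefficient first.
-- (Hypothetical helper type for this sketch only.)
type Series = [Rational]

-- Number of coefficients kept; an arbitrary choice.
order :: Int
order = 8

-- Pad with zeros or cut down to exactly `order` coefficients.
trunc :: Series -> Series
trunc s = take order (s ++ repeat 0)

add, mul :: Series -> Series -> Series
add a b = zipWith (+) (trunc a) (trunc b)
mul a b = [ sum [ a' !! i * b' !! (k - i) | i <- [0 .. k] ] | k <- [0 .. order - 1] ]
  where a' = trunc a
        b' = trunc b

-- Formal derivative: drop the constant term and multiply by the old exponent.
deriv :: Series -> Series
deriv s = trunc (zipWith (*) (map fromIntegral [1 :: Integer ..]) (tail (trunc s)))

-- EXP(f) = sum over k of (1/k!) (f d/dz)^k z, truncated to `order` coefficients.
-- Assumes f = a_2 z^2 + a_3 z^3 + ..., so `order` terms are enough.
fexpT :: Series -> Series
fexpT f = foldr add (trunc []) (take order (go (trunc [0, 1]) 1))
  where
    go :: Series -> Integer -> [Series]
    go term k = term : go (map (/ fromIntegral k) (mul f (deriv term))) (k + 1)
```

For f = z^2 each term works out to z^(k+1), so <tt>fexpT [0,0,1]</tt> gives <tt>[0,1,1,1,1,1,1,1]</tt>, the start of the series for z/(1-z).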
<P><BR>But does it successfully invert <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle">? Let's try: <P><BR><pre><br />> ex3 = do<br />> let lhs = sin z<br />> let rhs = fexp (flog (sin z))<br />> mapM_ print $ take 20 (lhs-rhs)<br /><P><BR><br /></pre>Now we can start computing fractional iterates. Square root first: <P><BR><pre><br />> ex4 = do<br />> mapM_ print $ take 20 $ fexp (flog (sin z)/2)<br /><P><BR><br /></pre>That matches the results at <a href="http://oeis.org/A048602">A048602</a> and <a href="http://oeis.org/A048603">A048603</a>. <P><BR>Cube root: <P><BR><pre><br />> ex5 = do<br />> mapM_ print $ take 20 $ fexp (flog (sin z)/3)<br /><P><BR><br /></pre>Matches <a href="http://oeis.org/A052132">A052132</a> and <a href="http://oeis.org/A052135">A052135</a>. <P><BR>And this gives an alternative to Lagrange inversion for computing power series for inverse functions: <P><BR><pre><br />> ex6 = do<br />> let lhs = fexp (-flog (sin z))<br />> let rhs = asin z<br />> mapM_ print $ take 20 (lhs-rhs)<br /><P><BR><br /></pre><BR><b>What's really going on with <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle">?</b><p>Let's approach <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> in a slightly different way. In effect, <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> is the composition of <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> lots of <img src="https://chart.googleapis.com/chart?cht=tx&chl=z%2B%5Cfrac%7Bf%7D%7Bn%7D" style="vertical-align:middle"> with <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle">. 
So let's try composing these one at a time, with one composition every <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7B1%7D%7Bn%7D" style="vertical-align:middle"> seconds. After one second we should have our final result. We can write this as: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=g%280%29%20%3D%20z" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=g%28t%2B%5Cfrac%7B1%7D%7Bn%7D%29%20%3D%20g%28t%29%2B%5Cfrac%7B1%7D%7Bn%7Df%28g%28t%29%29" style="vertical-align:middle"> to first order. </blockquote>So we're solving the differential equation: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=g%280%29%20%3D%20z" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7Bdg%7D%7Bdt%7D%20%3D%20f%28g%28t%29%29" style="vertical-align:middle"></blockquote>with <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%20%3D%20g%281%29" style="vertical-align:middle">. <P><BR>So <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> is the function that solves one of the most fundamental differential equations. This also means I can use Mathematica to solve symbolically and check my results. 
For example, Mathematica says that the solution to <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7Bdg%7D%7Bdt%7D%3D%5Csin%28g%28t%29%29%5E2" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=g%280%29%3Dz" style="vertical-align:middle"></blockquote>at <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%3D1" style="vertical-align:middle"> is <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=g%281%29%20%3D%20%5Carctan%5Cfrac%7B%5Ctan%20z%7D%7B1-%5Ctan%20z%7D" style="vertical-align:middle"></blockquote>so let's check: <P><BR><pre><br />> ex7 = do<br />> let lhs = fexp ((sin z)^2)<br />> let rhs = atan (tan z/(1-tan z))<br />> mapM_ print $ take 20 (lhs-rhs)<br /><P><BR><br /></pre>I like this example because it leads to the generalized Catalan numbers <a href="http://oeis.org/A004148">A004148</a>: <P><BR><pre><br />> ex8 = do<br />> mapM_ print $ take 20 $ fexp (z^2/(1-z^2))<br /><P><BR><br /></pre>That suggests this question: what does <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> mean combinatorially? I don't have a straightforward answer but solving this class of differential equation motivated the original introduction, by Cayley, of the abstract notion of a tree. See <a href="https://arxiv.org/abs/1512.00906">here</a>. <P><BR><BR><b>What is going on geometrically?</b><p>For those who know some differential geometry, the differential equation <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=g%280%29%20%3D%20z" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7Bdg%7D%7Bdt%7D%20%3D%20f%28g%28t%29%29" style="vertical-align:middle"></blockquote>describes a flow on the real line (or complex plane). 
You can think of <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> as being a one-dimensional vector field describing how points move from time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%2Bdt" style="vertical-align:middle">. When we solve the differential equation we get <a href="https://en.wikipedia.org/wiki/Integral_curve">integral curves</a> that these points follow and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> tells us where the points end up after one unit of time. So <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> is the <a href="https://en.wikipedia.org/wiki/Exponential_map_%28Riemannian_geometry%29">exponential map</a>. In fact, <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D%28f%29%3D%5Cexp%28f%5Cfrac%7Bd%7D%7Bdz%7D%29z" style="vertical-align:middle"> is essentially the exponential of the vector field <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5Cfrac%7Bd%7D%7Bdz%7D" style="vertical-align:middle"> where we're now using the differential geometer's notion of a vector field as a differential operator. <P><BR><BR><b>Final word</b><p>Unfortunately the power series you get from using <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle"> don't always have good convergence properties. For example, I'm not sure but I think the series for <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Csin%5E%7B%5Ccirc%201%2F2%7D%20z" style="vertical-align:middle"> has radius of convergence zero. 
If you truncate the series you get a half-decent approximation to a square root in the vicinity of the origin, but the approximation gets worse, not better, if you use more terms. <P><BR><BR><b>And the rest of the code</b><p><P><BR><pre><br />> (*!) _ 0 = 0<br />> (*!) a b = a*b<br />> (!*) 0 _ = 0<br />> (!*) a b = a*b<br />> (^+) a b = zipWith (+) a b<br />> (^-) a b = zipWith (-) a b<br /><P><BR><br />> ~(a:as) ⊗ (b:bs) = (a *! b):<br />> ((map (a !*) bs) ^+ (as ⊗ (b:bs)))<br />> (○) (f:fs) (0:gs) = f:(gs ⊗ (fs ○ (0:gs)))<br />> inverse (0:f:fs) = x where x = map (recip f *) (0:1:g)<br />> _:_:g = map negate ((0:0:fs) ○ x)<br />> invert x = r where r = map (/x0) ((1:repeat 0) ^- (r ⊗ (0:xs)))<br />> x0:xs = x <br /><P><BR><br />> (^/) (0:a) (0:b) = a ^/ b<br />> (^/) a b = a ⊗ (invert b)<br /><P><BR><br />> z :: [Rational]<br />> z = 0:1:repeat 0<br /><P><BR><br />> d (_:x) = zipWith (*) (map fromInteger [1..]) x<br /><P><BR><br />> integrate x = 0 : zipWith (/) x (map fromInteger [1..])<br /><P><BR><br />> instance (Eq r, Num r) => Num [r] where<br />> x+y = zipWith (+) x y<br />> x-y = zipWith (-) x y<br />> ~x*y = x ⊗ y<br />> fromInteger x = fromInteger x:repeat 0<br />> negate x = map negate x<br />> signum (x:_) = signum x : repeat 0<br />> abs (x:xs) = error "Can't form abs of a power series"<br /><P><BR><br />> instance (Eq r, Fractional r) => Fractional [r] where<br />> x/y = x ^/ y<br />> fromRational x = fromRational x:repeat 0<br /><P><BR><br />> sqrt' x = 1 : rs where rs = map (/2) (xs ^- (rs ⊗ (0:rs)))<br />> _ : xs = x<br />> instance (Eq r, Fractional r) => Floating [r] where<br />> sqrt (1 : x) = sqrt' (1 : x)<br />> sqrt _ = error "Can only find sqrt when leading term is 1"<br />> exp x = e where e = 1+integrate (e * d x)<br />> log x = integrate (d x/x)<br />> sin x = integrate ((cos x)*(d x))<br />> cos x = [1] ... 
negate (integrate ((sin x)*(d x)))<br />> asin x = integrate (d x/sqrt(1-x*x))<br />> atan x = integrate (d x/(1+x*x))<br />> acos x = error "Unable to form power series for acos"<br />> sinh x = integrate ((cosh x)*(d x))<br />> cosh x = [1] ... integrate ((sinh x)*(d x))<br />> asinh x = integrate (d x/sqrt(1+x*x))<br />> atanh x = integrate (d x/(1-x*x))<br />> acosh x = error "Unable to form power series for acosh"<br />> pi = error "There is no formal power series for pi"<br /><P><BR><br />> lead [] x = x<br />> lead (a:as) x = a : (lead as (tail x))<br />> a ... x = lead a x<br /><P><BR><br />> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]<br />> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a<br /><P><BR><br /></pre>A direct functional square root that doesn't use <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BLOG%7D" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Coperatorname%7BEXP%7D" style="vertical-align:middle">: <P><BR><pre><br />> fsqrt (0 : 1 : fs) =<br />> let gs = (fs-(0 : gs*((0 : delta gs gs)+((2 : gs)*(gs*g)))))/2<br />> g = 0 : 1 : gs<br />> delta (g : gs) h = let g' = delta gs h<br />> in (0 : ((1 : h) * g')) + gs<br />> in g<br /></pre>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com5tag:blogger.com,1999:blog-11295132.post-81374390082943687642017-01-09T08:33:00.000-08:002017-01-09T08:33:19.385-08:00Building free arrows from components<BR><b>Introduction</b><p>Gabriel Gonzalez has written quite a bit about the <a href="http://www.haskellforall.com/2012/07/purify-code-using-free-monads.html">practical applications of free monads</a>. And "haoformayor" wrote a <a href="http://stackoverflow.com/a/38172390">great stackoverflow post</a> on how arrows are related to strong profunctors. So I thought I'd combine these and apply them to arrows built from profunctors: free arrows. 
What you get is a way to use arrow notation to build programs, but defer the interpretation of those programs until later. <P><BR><BR><b>Heteromorphisms</b><p>Using the notation <a href="https://ncatlab.org/nlab/show/profunctor">here</a> I'm going to call an element of a type <tt>P a b</tt>, where <tt>P</tt> is a profunctor, a <i>heteromorphism</i>. <P><BR><BR><b>A product that isn't much of a product</b><p>As I described a <a href="http://blog.sigfpe.com/2011/07/profunctors-in-haskell.html">while back</a> you can compose profunctors. Take a look at the code I used, and also <a href="http://hackage.haskell.org/package/Chart-gtk-1.1/docs/Data-Profunctor-Composition.html">Data.Profunctor.Composition</a>. <P><BR><pre><br />data Compose f g d c = forall a. Compose (f d a) (g a c)<br /><P><BR><br /></pre>An element of <tt>Compose f g d c</tt> is just a pair of heteromorphisms, one from each of the profunctors, <tt>f</tt> and <tt>g</tt>, with the proviso that the "output" type of one is compatible with the "input" type of the other. As products go it's pretty weak in the sense that no composition happens beyond the two objects being stored with each other. And that's the basis of what I'm going to talk about. The <tt>Compose</tt> type is just a placeholder for pairs of heteromorphisms whose actual "multiplication" is being deferred until later. This is similar to the situation with the free monoid, otherwise known as a list. We can "multiply" two lists together using <tt>mappend</tt> but all that really does is combine the elements into a bigger list. The elements themselves aren't touched in any way. That suggests the idea of using profunctor composition in the same way that <tt>(:)</tt> is used to pair elements and lists. 
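<P><BR>To make the idea of deferred "multiplication" concrete, here is a minimal sketch for the simplest case, where both profunctors are the ordinary function arrow. The <tt>Compose</tt> type is the one above; <tt>collapse</tt> and <tt>example</tt> are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Same shape as the Compose type above: a pair whose middle type is hidden.
data Compose f g d c = forall a. Compose (f d a) (g a c)

-- Hypothetical helper: when both halves are plain functions we can finally
-- perform the composition that Compose only recorded.
collapse :: Compose (->) (->) d c -> d -> c
collapse (Compose f g) = g . f

-- Two functions stored side by side; nothing is composed until `collapse`.
example :: Compose (->) (->) Int String
example = Compose (+ 1) show
```

With these definitions <tt>collapse example 41</tt> is <tt>"42"</tt>. For other profunctors there is no such canonical way to multiply the pair, which is why the choice is left to an interpreter.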
<P><BR><BR><b>Free Arrows</b><p>Here's some code: <P><BR><pre><br />> {-# OPTIONS -W #-}<br />> {-# LANGUAGE ExistentialQuantification #-}<br />> {-# LANGUAGE Arrows #-}<br />> {-# LANGUAGE RankNTypes #-}<br />> {-# LANGUAGE TypeOperators #-}<br />> {-# LANGUAGE FlexibleInstances #-}<br /><P><BR><br />> import Prelude hiding ((.), id)<br />> import Control.Arrow<br />> import Control.Category<br />> import Data.Profunctor<br />> import Data.Monoid<br /><P><BR><br />> infixr :-<br /><P><BR><br />> data FreeA p a b = PureP (a -> b)<br />> | forall x. p a x :- FreeA p x b<br /><P><BR><br /></pre>First look at the second line of the definition of <tt>FreeA</tt>. It says that a <tt>FreeA p a b</tt> might be a pair consisting of a head heteromorphism whose output matches the input of another <tt>FreeA</tt>. There's also the <tt>PureP</tt> case which is acting like the empty list <tt>[]</tt>. The reason we use this is that for our composition, <tt>(->)</tt> acts a lot like the identity. In particular <tt>Compose (->) p a b</tt> is isomorphic to <tt>p a b</tt> (modulo all the usual stuff about non-terminating computations and so on). This is because an element of this type is a pair consisting of a function <tt>a -> x</tt> and a heteromorphism <tt>p x b</tt> for some type <tt>x</tt> we don't get to see. We can't project back out either of these items without information about the type of <tt>x</tt> escaping. So the only thing we can possibly do is use <tt>lmap</tt> to apply the function to the heteromorphism giving us an element of <tt>p a b</tt>. <P><BR>Here is a special case of <tt>PureP</tt> we'll use later: <P><BR><pre><br />> nil :: Profunctor p => FreeA p a a<br />> nil = PureP id<br /><P><BR><br /></pre>So an element of <tt>FreeA</tt> is a sequence of heteromorphisms. If heteromorphisms are thought of as operations of some sort, then an element of <tt>FreeA</tt> is a sequence of operations waiting to be composed together into a program that does something. 
And that's just like the situation with free monads. Once we've built a free monad structure we apply an interpreter to it to evaluate it. This allows us to separate the "pure" structure representing what we want to do from the code that actually does it. <P><BR>The first thing to note is our new type is also a profunctor. We can apply <tt>lmap</tt> and <tt>rmap</tt> to a <tt>PureP</tt> function straightforwardly. We apply <tt>lmap</tt> directly to the head of the list and we use recursion to apply <tt>rmap</tt> to the <tt>PureP</tt> at the end: <P><BR><pre><br />> instance Profunctor b => Profunctor (FreeA b) where<br />> lmap f (PureP g) = PureP (g . f)<br />> lmap f (g :- h) = (lmap f g) :- h<br />> rmap f (PureP g) = PureP (f . g)<br />> rmap f (g :- h) = g :- (rmap f h)<br /><P><BR><br /></pre>We also get a strong profunctor by applying <tt>first'</tt> all the way down the list: <P><BR><pre><br />> instance Strong p => Strong (FreeA p) where<br />> first' (PureP f) = PureP (first' f)<br />> first' (f :- g) = (first' f) :- (first' g)<br /><P><BR><br /></pre>We can now concatenate our lists of heteromorphisms using code that looks a lot like the typical implementation of <tt>(++)</tt>: <P><BR><pre><br />> instance Profunctor p => Category (FreeA p) where<br />> id = PureP id<br />> g . PureP f = lmap f g<br />> k . (g :- h) = g :- (k . h)<br /><P><BR><br /></pre>Note that it's slightly different to what you might have expected compared to <tt>(++)</tt> because we tend to write composition of functions "backwards". Additionally, there is another definition of <tt>FreeA</tt> we could have used that's analogous to using snoc lists instead of cons lists. <P><BR>And now we have an arrow. 
I'll leave the proofs that the arrow laws are obeyed as an exercise :-) <P><BR><pre><br />> instance (Profunctor p, Strong p) => Arrow (FreeA p) where<br />> arr = PureP<br />> first = first'<br /><P><BR><br /></pre>The important thing about free things is that we can apply interpreters to them. For lists we have folds: <P><BR><pre><br />foldr :: (a -> b -> b) -> b -> [a] -> b<br /><P><BR><br /></pre>In <tt>foldr f e</tt> we can think of <tt>f</tt> as saying how <tt>(:)</tt> should be interpreted and <tt>e</tt> as saying how <tt>[]</tt> should be interpreted. <P><BR>Analogously, in <tt>Control.Monad.Free</tt> in the <tt>free</tt> package we have: <P><BR><pre><br />foldFree :: Monad m => (forall x . f x -> m x) -> Free f a -> m a<br />foldFree _ (Pure a) = return a<br />foldFree f (Free as) = f as >>= foldFree f<br /><P><BR><br /></pre>Given a natural transformation from <tt>f</tt> to <tt>m</tt>, <tt>foldFree</tt> extends it to all of <tt>Free f</tt>. <P><BR>Now we need a fold for free arrows: <P><BR><pre><br />> foldFreeA :: (Profunctor p, Arrow a) =><br />> (forall b c.p b c -> a b c) -> FreeA p b c -> a b c<br />> foldFreeA _ (PureP g) = arr g<br />> foldFreeA f (g :- h) = foldFreeA f h . f g<br /><P><BR><br /></pre>It's a lot like an ordinary fold but uses the arrow composition law to combine the interpretation of the head with the interpretation of the tail. <P><BR><BR><b>"Electronic" components</b><p>Let me revisit the example from my previous <a href="http://blog.sigfpe.com/2017/01/addressing-pieces-of-state-with.html">article</a>. I'm going to remove things I won't need so my definition of <tt>Circuit</tt> is less general here. Free arrows are going to allow us to define individual components for a circuit, but defer exactly how those components are interpreted until later. <P><BR>I'll use four components this time: a register we can read from, one we can write to and a register incrementer, as well as a "pure" component. 
But before that, let's revisit Gabriel's article that gives some clues about how components should be built. In particular, look at the definition of <tt>TeletypeF</tt>: <P><BR><pre><br />data TeletypeF x<br /> = PutStrLn String x<br /> | GetLine (String -> x)<br /> | ExitSuccess<br /><P><BR><br /></pre>We use <tt>GetLine</tt> to read a string, and yet the type of <tt>GetLine k</tt> could be <tt>TeletypeF a</tt> for any <tt>a</tt>. The reason is that free monads work with continuations. Instead of <tt>GetLine</tt> returning a string to us, it's a holder for a function that says what we'd like to do with the string once we have it. That means we can leave open the question of where the string comes from. The function <tt>foldFree</tt> can be used to provide the actual string getter. <P><BR>Free arrows are like "two-sided" free monads. We don't just provide a continuation saying what we'd like to do to our output. We also get to say how we prepare our data for input. <P><BR>There's also some burden put on us. Free arrows need strong profunctors. Strong profunctors need to be able to convey extra data alongside the data we care about - that's what <tt>first'</tt> is all about. This means that even though <tt>Load</tt> is functionally similar to <tt>GetLine</tt>, it can't simply ignore its input. So we don't have <tt>Load (Int -> b)</tt>, and instead have <tt>Load ((a, Int) -> b)</tt>. Here is our component type: <P><BR><pre><br />> data Component a b = Load ((a, Int) -> b)<br />> | Store (a -> (b, Int))<br />> | Inc (a -> b)<br /><P><BR><br /></pre>The <tt>Component</tt> only knows about the data passing through, of type <tt>a</tt> and <tt>b</tt>. It doesn't know anything about how the data in the registers is stored. That's the part that will be deferred to later. We intend for <tt>Inc</tt> to increment a register. But as it doesn't know anything about registers nothing in the type of <tt>Inc</tt> refers to that. 
(It took a bit of experimentation for me to figure this out and there may be other ways of doing things. Often with code guided by category theory you can just "follow your nose" as there's one way that works and type checks. Here I found a certain amount of flexibility in how much you store in the <tt>Component</tt> and how much is deferred to the interpreter.) <P><BR>I could implement the strong profunctor instances using various combinators but I think it might be easiest to understand when written explicitly with lambdas: <P><BR><pre><br />> instance Profunctor Component where<br />> lmap f (Load g) = Load $ \(a, s) -> g (f a, s)<br />> lmap f (Store g) = Store (g . f)<br />> lmap f (Inc g) = Inc (g . f)<br /><P><BR><br />> rmap f (Load g) = Load (f . g)<br />> rmap f (Store g) = Store $ \a -> let (b, t) = g a<br />> in (f b, t)<br />> rmap f (Inc g) = Inc (f . g)<br /><P><BR><br />> instance Strong Component where<br />> first' (Load g) = Load $ \((a, x), s) -> (g (a, s), x)<br />> first' (Store g) = Store $ \(a, x) -> let (b, t) = g a<br />> in ((b, x), t)<br />> first' (Inc g) = Inc (first' g)<br /><P><BR><br /></pre>And now we can implement individual components. First a completely "pure" component: <P><BR><pre><br />> add :: Num a => FreeA Component (a, a) a<br />> add = PureP $ uncurry (+)<br /><P><BR><br /></pre>And now the load and store operations. 
<P><BR><pre><br />> load :: FreeA Component () Int<br />> load = Load (\(_, a) -> a) :- nil<br /><P><BR><br />> store :: FreeA Component Int ()<br />> store = Store (\a -> ((), a)) :- nil<br /><P><BR><br />> inc :: FreeA Component a a<br />> inc = Inc id :- nil<br /><P><BR><br /></pre>Finally we can tie it all together in a complete function using arrow notation: <P><BR><pre><br />> test = proc () -> do<br />> () <- inc -< ()<br />> a <- load -< ()<br />> b <- load -< ()<br />> c <- add -< (a, b)<br />> () <- store -< c<br /><P><BR><br />> returnA -< ()<br /><P><BR><br /></pre>At this point, the <tt>test</tt> object is just a list of operations waiting to be executed. Now I'll give three examples of semantics we could provide. The first uses a state arrow type similar to the previous article: <P><BR><pre><br />> newtype Circuit s a b = C { runC :: (a, s) -> (b, s) }<br /><P><BR><br />> instance Category (Circuit s) where<br />> id = C id<br />> C f . C g = C (f . g)<br /><P><BR><br />> instance Arrow (Circuit s) where<br />> arr f = C $ \(a, s) -> (f a, s)<br />> first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)<br />> in ((b, x), t)<br /><P><BR><br /></pre>Here is an interpreter that interprets each of our components as an arrow. Note that this is where, among other things, we provide the meaning of the <tt>Inc</tt> operation: <P><BR><pre><br />> exec :: Component a b -> Circuit Int a b<br />> exec (Load g) = C $ \(a, s) -> (g (a, s), s)<br />> exec (Store g) = C $ \(a, _) -> g a<br />> exec (Inc g) = C $ \(a, s) -> (g a, s+1)<br /><P><BR><br /></pre>Here's a completely different interpreter that is going to make <i>you</i> do the work of maintaining the state used by the registers. You'll be told what to do! We'll use the <tt>Kleisli IO</tt> arrow to do the I/O. 
<P><BR><pre><br />> exec' :: Component a b -> Kleisli IO a b<br />> exec' (Load g) = Kleisli $ \a -> do<br />> putStrLn "What is your number now?"<br />> s <- fmap read getLine<br />> return $ g (a, s)<br />> exec' (Store g) = Kleisli $ \a -> do<br />> let (b, t) = g a<br />> putStrLn $ "Your number is now " ++ show t ++ "."<br />> return b<br />> exec' (Inc g) = Kleisli $ \a -> do<br />> putStrLn "Increment your number."<br />> return $ g a<br /><P><BR><br /></pre>The last interpreter is simply going to sum values associated to various components. They could be costs in dollars, time to execute, or even strings representing some kind of simple execution trace. <P><BR><pre><br />> newtype Labelled m a b = Labelled { unLabelled :: m }<br /><P><BR><br />> instance Monoid m => Category (Labelled m) where<br />> id = Labelled mempty<br />> Labelled a . Labelled b = Labelled (a `mappend` b)<br /><P><BR><br />> instance Monoid m => Arrow (Labelled m) where<br />> arr _ = Labelled mempty<br />> first (Labelled m) = Labelled m<br /><P><BR><br />> exec'' (Load _) = Labelled (Sum 1)<br />> exec'' (Store _) = Labelled (Sum 1)<br />> exec'' (Inc _) = Labelled (Sum 2)<br /><P><BR><br /></pre>Note that we can't assign non-trivial values to "pure" operations. <P><BR>And now we execute all three: <P><BR><pre><br />> main = do<br />> print $ runC (foldFreeA exec test) ((), 10)<br />> putStrLn "Your number is 10." >> runKleisli (foldFreeA exec' test) ()<br />> print $ getSum $ unLabelled $ foldFreeA exec'' test<br /><P><BR><br /></pre><BR><b>Various thoughts</b><p>I don't know if free arrows are anywhere near as useful as free monads, but I hope I've successfully illustrated one application. Note that because arrow composition is essentially list concatenation it may be more efficient to use a version of <a href="http://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/lists.pdf">Hughes lists</a>. 
This is what the Cayley representation is about in the <a href="https://arxiv.org/abs/1406.4823">monoid notions paper</a>. But it's easier to see the naive list version first. Something missing from here that is essential for electronics simulation is the possibility of using loops. I haven't yet thought too much about what it means to build instances of <tt>ArrowLoop</tt> freely. <P><BR>Profunctors have been described as decategorised matrices in the sense that <tt>p a b</tt>, with <tt>p</tt> a profunctor, is similar to the matrix <img src="https://chart.googleapis.com/chart?cht=tx&chl=P_%7Bab%7D" style="vertical-align:middle">. Or, if you're working in a context where you distinguish between co- and contravariant vectors, it's similar to <img src="https://chart.googleapis.com/chart?cht=tx&chl=P_a%5Eb" style="vertical-align:middle">. The <tt>Composition</tt> operation is a lot like the definition of matrix product. From this perspective, the <tt>FreeA</tt> operation is a lot like the function on matrices that takes <img src="https://chart.googleapis.com/chart?cht=tx&chl=P" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=%281-P%29%5E%7B-1%7D" style="vertical-align:middle">. To work with <tt>ArrowLoop</tt> we need a trace-like operation. <P><BR>One nice application of free monads is in writing plugin APIs. Users can write plugins that link to a small library based on a free monad. These can then be dynamically loaded and interpreted by an application at runtime, completely insulating the plugin-writer from the details of the application. You can think of it as a Haskell version of the <a href="http://wiki.c2.com/?PimplIdiom">PIMPL idiom</a>. Free arrows might give a nice way to write plugins for dataflow applications. <P><BR>People typically think of functors as containers. So in a free monad, each element is a container of possible futures. 
In a free arrow the relationship between the current heteromorphism and its "future" (and "past") is a bit more symmetrical. For example, for some definitions of <tt>P</tt>, a heteromorphism <tt>P a b</tt> can act on some <tt>a</tt>s to give us some <tt>b</tt>s. But some definitions of <tt>P</tt> can run "backwards" and act on elements of <tt>b -> r</tt> to give us elements of <tt>a -> r</tt>. So when I use the words "input" and "output" above, you might not want to take them too literally. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com7tag:blogger.com,1999:blog-11295132.post-12746724578568533832017-01-07T13:46:00.000-08:002017-01-07T13:46:44.073-08:00Addressing Pieces of State with Profunctors<BR><b>Attempted segue</b><p>Since I first wrote about <a href="http://blog.sigfpe.com/2011/07/profunctors-in-haskell.html">profunctors</a> there has been quite a bit of activity in the area so I think it's about time I revisited them. I could just carry on from where I left off 5 years ago but there have been so many tutorials on the subject that I think I'll have to assume you've looked at them. My favourite is probably Phil Freeman's <a href="https://www.youtube.com/watch?v=OJtGECfksds">Fun with Profunctors</a>. What I intend to do here is solve a practical problem with profunctors. <P><BR><BR><b>The problem</b><p><a href="https://www.haskell.org/arrows/">Arrows</a> are a nice mechanism for building circuit-like entities in code. In fact, they're quite good for simulating electronic circuits. Many circuits are very much like pieces of functional code. 
For example an AND gate like this <pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/--Ar1nA4cFDg/WHERrmCLLrI/AAAAAAAAxqY/e7-VbCVnRkIGQfelausx_IWszy7lCYhQgCLcB/s1600/and.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/--Ar1nA4cFDg/WHERrmCLLrI/AAAAAAAAxqY/e7-VbCVnRkIGQfelausx_IWszy7lCYhQgCLcB/s320/and.png" width="320" height="119" /></a></div><br /></pre>can be nicely modelled using a pure function: <tt>c = a && b</tt>. But some components, like flip-flops, have internal state. What comes out of the outputs isn't a simple function of the inputs right now, but depends on what has happened in the past. (Alternatively you can take the view that the inputs and outputs aren't the current values but the complete history of the values.) <P><BR>We'll use (Hughes) arrows rather than simple functions. For example, one kind of arrow is the <a href="https://wiki.haskell.org/Arrow_tutorial#Kleisli_Arrows">Kleisli arrow</a>. For the case of Kleisli arrows built from the state monad, these are essentially functions of type <tt>a -> s -> (b, s)</tt> where <tt>s</tt> is our state. We can write these more symmetrically as functions of type <tt>(a, s) -> (b, s)</tt>. We can think of these as "functions" from <tt>a</tt> to <tt>b</tt> where the output is allowed to depend on some internal state <tt>s</tt>. I'll just go ahead and define arrows like this right now. <P><BR>First the extensions and imports: <P><BR><pre><br />> {-# OPTIONS -W #-}<br />> {-# LANGUAGE Arrows #-}<br />> {-# LANGUAGE RankNTypes #-}<br />> {-# LANGUAGE FlexibleInstances #-}<br /><P><BR><br />> import Prelude hiding ((.), id)<br />> import Control.Arrow<br />> import Control.Category<br />> import Data.Profunctor<br />> import Data.Tuple<br /><P><BR><br /></pre>And now I'll define our stateful circuits. 
I'm going to make these slightly more general than I described allowing circuits to change the type of their state: <P><BR><pre><br />> newtype Circuit s t a b = C { runC :: (a, s) -> (b, t) }<br /><P><BR><br />> instance Category (Circuit s s) where<br />> id = C id<br />> C f . C g = C (f . g)<br /><P><BR><br />> instance Arrow (Circuit s s) where<br />> arr f = C $ \(a, s) -> (f a, s)<br />> first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)<br />> in ((b, x), t)<br /><P><BR><br /></pre>This is just a more symmetrical rewrite of the state monad as an arrow. The <tt>first</tt> method allows us to pass through some extra state, <tt>x</tt>, untouched. <P><BR>Now for some circuit components. First the "pure" operations, a multiplier and a negater: <P><BR><pre><br />> mul :: Circuit s s (Int, Int) Int<br />> mul = C $ \((x, y), s) -> (x*y, s)<br /><P><BR><br />> neg :: Circuit s s Int Int<br />> neg = C $ \(x, s) -> (-x, s)<br /><P><BR><br /></pre>And now some "impure" ones that read and write some registers as well as an accumulator: <P><BR><pre><br />> store :: Circuit Int Int Int ()<br />> store = C $ \(x, _) -> ((), x)<br /><P><BR><br />> load :: Circuit Int Int () Int<br />> load = C $ \((), s) -> (s, s)<br /><P><BR><br />> accumulate :: Circuit Int Int Int Int<br />> accumulate = C $ \(a, s) -> (a, s+a)<br /><P><BR><br /></pre>I'd like to make a circuit that has lots of these components, each with its own state. I'd like to store all of these bits of state in a larger container. But that means that each of these components needs to have a way to address its own particular substate. That's the problem I'd like to solve. <P><BR><BR><b>Practical profunctor optics</b><p>In an alternative universe lenses were <a href="https://www.reddit.com/r/haskell/comments/1nwetz/lenses_that_work_with_arrows/ccmq2ka/?st=ixklzuz3&sh=cb40d2f3">defined using profunctors</a>. To find out more I recommend Phil Freeman's talk that I linked to above. 
Most of the next paragraph is just a reminder of what he says in that talk and I'm going to use the bare minimum to do the job I want. <P><BR>Remember that one of the things lenses allow you to do is this: suppose we have a record <tt>s</tt> containing a field of type <tt>a</tt> and another similar enough kind of record <tt>t</tt> with a field of type <tt>b</tt>. Among other things, a lens gives a way to take a rule for modifying the <tt>a</tt> field to a <tt>b</tt> field and extend it to a way to modify the <tt>s</tt> record into a <tt>t</tt> record. So we can think of lenses as giving us functions of type <tt>(a -> b) -> (s -> t)</tt>. Now if <tt>p</tt> is a profunctor then you can think of <tt>p a b</tt> as being a bit function-like. Like functions, profunctors typically (kinda, sorta) get used to consume (zero or more) objects of type <tt>a</tt> and output (zero or more) objects of type <tt>b</tt>. So it makes sense to ask our lenses to work with these more general objects too, i.e. we'd like to be able to get something of type <tt>p a b -> p s t</tt> out of a lens. A strong profunctor is one that comes pre-packed with a lens that can do this for the special case where the types <tt>s</tt> and <tt>t</tt> are 2-tuples. But you can think of simple records as being syntactic sugar for tuples of fields, so strong profunctors also automatically give us lenses for records. Again, watch Phil's talk for details. <P><BR>So here is our lens type: <P><BR><pre><br />> type Lens s t a b = forall p. Strong p => p a b -> p s t<br /><P><BR><br /></pre>Here are lenses that mimic the well known ones from <tt>Control.Lens</tt>: <P><BR><pre><br />> _1 :: Lens (a, x) (b, x) a b<br />> _1 = first'<br /><P><BR><br />> _2 :: Lens (x, a) (x, b) a b<br />> _2 = dimap swap swap . first'<br /><P><BR><br /></pre>(Remember that <tt>dimap</tt> is a function to pre- and post- compose a function with two others.) 
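<P><BR>As a quick sanity check of <tt>dimap</tt> and of the definition of <tt>_2</tt>, here is a small sketch that specialises both to plain functions, which are themselves a strong profunctor. It assumes the <tt>profunctors</tt> package is available, and the names <tt>lengthOfShown</tt> and <tt>onSecond</tt> are made up for this illustration; they aren't used anywhere else in the post.

```haskell
import Data.Profunctor (dimap, first')
import Data.Tuple (swap)

-- For plain functions, dimap pre post g is just post . g . pre.
-- Here: show the Int, take the length of the string, then add one.
lengthOfShown :: Int -> Int
lengthOfShown = dimap show (+ 1) length

-- The same recipe as _2, with the profunctor fixed to (->):
-- swap the pair, act on the first slot with first', then swap back.
onSecond :: (a -> b) -> (x, a) -> (x, b)
onSecond = dimap swap swap . first'

main :: IO ()
main = do
  print (lengthOfShown 123)          -- 4
  print (onSecond (* 2) ("k", 21))   -- ("k",42)
```

So for ordinary functions <tt>_2</tt> is just "apply a function to the second component of a pair", which is the behaviour we'd expect from the lens of the same name in <tt>Control.Lens</tt>.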
<P><BR><a href="https://hackage.haskell.org/package/profunctors-3.3.0.1/docs/Data-Profunctor.html">Arrows are profunctors</a>. So <tt>Circuit s s</tt>, when wrapped in <tt>WrappedArrow</tt>, is a profunctor. So now we can directly use the <tt>Circuit</tt> type with profunctor lenses. This is cool, but it doesn't directly solve our problem. So we're not going to use this fact. We're interested in addressing the state of type <tt>s</tt>, not the values of type <tt>a</tt> and <tt>b</tt> passed through our circuits. In other words, we're interested in the fact that <tt>Circuit s t a b</tt> is a profunctor in <tt>s</tt> and <tt>t</tt>, not <tt>a</tt> and <tt>b</tt>. To make this explicit we need a suitable way to permute the arguments to <tt>Circuit</tt>: <P><BR><pre><br />> newtype Flipped p s t a b = F { unF :: p a b s t }<br /><P><BR><br /></pre>(It was <a href="https://hackage.haskell.org/package/bifunctors-3.2.0.1/docs/Data-Bifunctor-Clown.html">tempting</a> to call that <tt>ComedyDoubleAct</tt>.) <P><BR>And now we can define: <P><BR><pre><br />> instance Profunctor (Flipped Circuit a b) where<br />> lmap f (F (C g)) = F $ C $ \(a, s) -> g (a, f s)<br />> rmap f (F (C g)) = F $ C $ \(a, s) -> let (b, t) = g (a, s)<br />> in (b, f t)<br /><P><BR><br />> instance Strong (Flipped Circuit a b) where<br />> first' (F (C g)) = F $ C $ \(a, (s, x)) -> let (b, t) = g (a, s)<br />> in (b, (t, x))<br /><P><BR><br /></pre>Any time we want to use this instance of <tt>Profunctor</tt> with a <tt>Circuit</tt> we have to wrap everything with <tt>F</tt> and <tt>unF</tt>. The function <tt>dimap</tt> gives us a convenient way to implement such wrappings. <P><BR>Let's implement an imaginary circuit with four bits of state in it. 
<pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-v8mhMGIcJmQ/WHESx2RmexI/AAAAAAAAxqg/qGLZOzp1JO4eDLVamnucTJA2Q6jSSHWTQCLcB/s1600/circuit.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-v8mhMGIcJmQ/WHESx2RmexI/AAAAAAAAxqg/qGLZOzp1JO4eDLVamnucTJA2Q6jSSHWTQCLcB/s320/circuit.png" width="480" height="166" /></a></div><br /></pre>Here is the state: <P><BR><pre><br />> data CPU = CPU { _x :: Int, _y :: Int, _z :: Int, _t :: Int } deriving Show<br /><P><BR><br /></pre>As I don't have a complete profunctor version of a library like <tt>Control.Lens</tt> with its template Haskell magic I'll set things up by hand. Here's a strong-profunctor-friendly version of the <tt>CPU</tt> and a useful isomorphism to go with it: <P><BR><pre><br />> type ExplodedCPU = (Int, (Int, (Int, Int)))<br /><P><BR><br />> explode :: CPU -> ExplodedCPU<br />> explode (CPU u v w t) = (u, (v, (w, t)))<br /><P><BR><br />> implode :: ExplodedCPU -> CPU<br />> implode (u, (v, (w, t))) = CPU u v w t<br /><P><BR><br /></pre>And now we need adapters that take lenses for an <tt>ExplodedCPU</tt> and (1) apply them to a CPU the way <tt>Control.Lens</tt> would... <P><BR><pre><br />> upgrade :: Profunctor p =><br />> (p a a -> p ExplodedCPU ExplodedCPU) -><br />> (p a a -> p CPU CPU)<br />> upgrade f = dimap explode implode . f<br /><P><BR><br />> x, y, z, t :: Flipped Circuit a b Int Int -> Flipped Circuit a b CPU CPU<br />> x = upgrade _1<br />> y = upgrade $ _2 . _1<br />> z = upgrade $ _2 . _2 . _1<br />> t = upgrade $ _2 . _2 . _2<br /><P><BR><br /></pre>...and (2) wrap them so they can be used on the flipped profunctor instance of <tt>Circuit</tt>: <P><BR><pre><br />> (!) :: p s t a b -> (Flipped p a b s t -> Flipped p a b s' t') -><br />> p s' t' a b<br />> x ! 
f = dimap F unF f x<br /><P><BR><br /></pre>After all that we can now write a short piece of code that represents our circuit. Notice how we can apply the lenses <tt>x, ..., t</tt> directly to our components to get them to use the right pieces of state: <P><BR><pre><br />> test :: Circuit CPU CPU () ()<br />> test = proc () -> do<br />> a <- load ! x -< ()<br />> b <- load ! y -< ()<br />> c <- mul -< (a, b)<br />> d <- neg -< c<br />> e <- accumulate ! t -< d<br />> () <- store ! z -< e<br /><P><BR><br />> returnA -< ()<br /><P><BR><br />> main :: IO ()<br />> main = do<br />> print $ runC test ((), CPU 2 30 400 5000)<br /><P><BR><br /></pre>Of course with a suitable profunctor lens library you can do a lot more, like work with traversable containers of components. <P><BR>Note that we could also write a version of all this code using monads instead of arrows. But it's easier to see the symmetry in <tt>Flipped Circuit</tt> when using arrows, and it also sets the scene for the next thing I want to write about... Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com0tag:blogger.com,1999:blog-11295132.post-7517128154540577622016-10-16T16:04:00.000-07:002016-10-18T06:59:25.884-07:00Expectation-Maximization with Less Arbitrariness<b>Introduction</b><p>There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are: <ol><li>Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. 
Why apply it to <i>this</i> way of writing this expression and not that one which is equal?</li><li>Substituting <img src="https://chart.googleapis.com/chart?cht=tx&chl=1%3DA%2FA" style="vertical-align:middle"> in the middle of an expression. Again, you can use <img src="https://chart.googleapis.com/chart?cht=tx&chl=1%3DA%2FA" style="vertical-align:middle"> just about anywhere. Why choose this <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align:middle"> at this time? Similarly I found derivations that insert a <img src="https://chart.googleapis.com/chart?cht=tx&chl=B-B" style="vertical-align:middle"> into an expression.</li><li>Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.</li></ol>My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a <a href="http://stats.stackexchange.com/questions/44513/the-relationship-between-expectation-maximization-and-majorization-minimization/59470#59470">stackexchange answer</a>. <P><BR><b>Generalities about EM</b><p>The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe. <P><BR>First assume no hidden variables. We assume there is a vector of parameters <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta%3D%28%5Ctheta_i%29" style="vertical-align:middle"> that defines some model. We make some observations <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%3D%28x_j%29" style="vertical-align:middle">. 
We have a probability density <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%7C%5Ctheta%29" style="vertical-align:middle"> that depends on <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle">. The likelihood of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> given the observations <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align:middle"> is <img src="https://chart.googleapis.com/chart?cht=tx&chl=l%28%5Ctheta%7Cx%29%3DP%28x%7C%5Ctheta%29" style="vertical-align:middle">. The maximum likelihood estimator for <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> is the choice of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> that maximises <img src="https://chart.googleapis.com/chart?cht=tx&chl=l%28%5Ctheta%7Cx%29" style="vertical-align:middle"> for the <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align:middle"> we have observed. <P><BR>Now suppose there are also some variables <img src="https://chart.googleapis.com/chart?cht=tx&chl=z%3D%28z_k%29" style="vertical-align:middle"> that we didn't get to observe. We assume a density <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%2Cz%7C%5Ctheta%29" style="vertical-align:middle">. We now have <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%7C%5Ctheta%29%3D%5Csum_z%20P%28x%2Cz%7C%5Ctheta%29" style="vertical-align:middle"></blockquote>where we sum over all possible values of <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle">. The MLE approach says we now need to maximise <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=l%28%5Ctheta%7Cx%29%3D%5Csum_z%20P%28x%2Cz%7C%5Ctheta%29." 
style="vertical-align:middle"></blockquote>One of the things that is a challenge here is that the components of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle"> might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta_i" style="vertical-align:middle">, then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Clog%20l%28%5Ctheta%7Cx%29%3D%5Clog%5Csum_z%20P%28x%2Cz%7C%5Ctheta%29." style="vertical-align:middle"></blockquote>Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Csum_z%20%5Clog%20P%28x%2Cz%7C%5Ctheta%29." style="vertical-align:middle"></blockquote>One reason this would be to our advantage is that <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%2Cz%7C%5Ctheta%29" style="vertical-align:middle"> often takes the form <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cexp%28f%28x%2Cz%2C%5Ctheta%29%29" style="vertical-align:middle"> where <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> is a simple function to optimise. In addition, <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> may break up as a sum of terms, each with its own block of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta_i" style="vertical-align:middle">'s. Moving the logarithm inside the sum would give us something we could easily maximise term by term. 
What's more, the <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%2Cz%7C%5Ctheta%29" style="vertical-align:middle"> for each <img src="https://chart.googleapis.com/chart?cht=tx&chl=z" style="vertical-align:middle"> is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in. <P><BR><b>Maximisation by proxy</b><p>Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> is a differentiable function in a neighbourhood of <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_0" style="vertical-align:middle">. Then around <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_0" style="vertical-align:middle"> we have <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28x%29%20%5Capprox%20f%28x_0%29%2Bf%27%28x_0%29%5Ccdot%20%28x-x_0%29." style="vertical-align:middle"></blockquote>We can try optimising <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28x_0%29%2Bf%27%28x_0%29%5Ccdot%20%28x-x_0%29" style="vertical-align:middle"> with respect to <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align:middle"> within a neighbourhood of <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_0" style="vertical-align:middle">. If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) 
For gradient descent we're choosing <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28x_0%29%2Bf%27%28x_0%29%5Ccdot%20%28x-x_0%29" style="vertical-align:middle"> because it matches both the value and derivatives of <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> at <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_0" style="vertical-align:middle">. We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration. <P><BR>We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice. <P><BR>Write <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Clog%20l%28%5Ctheta%7Cx%29%0A%3D%20%5Clog%5Csum_z%20P%28x%2Cz%7C%5Ctheta%29%0A%5Capprox%20%5Csum_z%5Cbeta_z%5Clog%20P%28x%2Cz%7C%5Ctheta%29%2B%5Cmbox%7Bconstant%7D." style="vertical-align:middle"></blockquote>The <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cbeta_z" style="vertical-align:middle"> are constants we'll determine. 
We want to match the derivatives on either side of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Capprox" style="vertical-align:middle"> at <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta%3D%5Ctheta_0" style="vertical-align:middle">: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cfrac%7B%5Cpartial%20%5Clog%20l%28%5Ctheta_0%7Cx%29%7D%7B%5Cpartial%5Ctheta_0%7D" style="vertical-align:middle"> <img src="https://chart.googleapis.com/chart?cht=tx&chl=%3D%5Cfrac%7B1%7D%7Bl%28%5Ctheta_0%7Cx%29%7D%20%5Cfrac%7B%5Cpartial%20l%28%5Ctheta_0%7Cx%29%7D%7B%5Cpartial%5Ctheta_0%7D%20%3D%5Csum_z%5Cfrac%7B1%7D%7Bl%28%5Ctheta_0%7Cx%29%7D%20%5Cfrac%7B%5Cpartial%20P%28x%2Cz%7C%5Ctheta_0%29%7D%7B%5Cpartial%5Ctheta_0%7D." style="vertical-align:middle"></blockquote>On the other hand we have <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0A%5Cfrac%7B%5Cpartial%7D%7B%5Cpartial%5Ctheta_0%7D%5Csum_z%5Cbeta_z%5Clog%20P%28x%2Cz%7C%5Ctheta_0%29%0A%3D%5Csum_z%5Cbeta_z%5Cfrac%7B1%7D%7BP%28x%2Cz%7C%5Ctheta_0%29%7D%5Cfrac%7B%5Cpartial%20P%28x%2Cz%7C%5Ctheta_0%29%7D%7B%5Cpartial%5Ctheta_0%7D%0A" style="vertical-align:middle"></blockquote><P><BR>To achieve equality we want to make these expressions match. 
We choose <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0A%5Cbeta_z%20%3D%20%5Cfrac%7BP%28x%2Cz%7C%5Ctheta_0%29%7D%7Bl%28%5Ctheta_0%7Cx%29%7D%0A%3D%20%5Cfrac%7BP%28x%2Cz%7C%5Ctheta_0%29%7D%7BP%28x%7C%5Ctheta_0%29%7D%0A%3D%20P%28z%7Cx%2C%5Ctheta_0%29.%0A" style="vertical-align:middle"></blockquote>Our desired proxy function is: <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0A%5Csum_z%20P%28z%7Cx%2C%5Ctheta_0%29%5Clog%20P%28x%2Cz%7C%5Ctheta%29%2B%5Cmbox%7Bconst.%7D%0A%3D%20E_%7BZ%7Cx%2C%5Ctheta_0%7D%28%5Clog%20P%28x%2CZ%7C%5Ctheta%29%29%2B%5Cmbox%7Bconst.%7D%0A" style="vertical-align:middle"></blockquote><P><BR>So the procedure is to take an estimated <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta_0" style="vertical-align:middle"> and obtain a new estimate by optimising this proxy function with respect to <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ctheta" style="vertical-align:middle">. This is the standard EM algorithm. <P><BR>It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise "too far" and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature. <P><BR><b>Afterword</b><p>As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. 
If your goal is to compute <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0A%5Cmbox%7Bargmax%7D_x%5Csum_i%5Cexp%28f_i%28x%29%29%0A" style="vertical-align:middle"></blockquote>you can iterate, at each step computing <blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0A%5Cmbox%7Bargmax%7D_x%5Csum_i%5Cexp%28f_i%28x_0%29%29f_i%28x%29%0A" style="vertical-align:middle"></blockquote>where <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_0" style="vertical-align:middle"> is the previous iteration. If the <img src="https://chart.googleapis.com/chart?cht=tx&chl=f_i" style="vertical-align:middle"> take a convenient form then this may turn out to be much easier. <P><BR><b>Note</b><p>This was originally written as a PDF using LaTeX. It'll be available <a href="https://t.co/hyZrCiAh6h">here</a> for a while. Some fidelity was lost when converting it to HTML. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com2tag:blogger.com,1999:blog-11295132.post-15935523875580312352016-08-06T19:23:00.000-07:002016-08-06T19:23:49.135-07:00Dimensionful Matrices<b>Introduction</b><p>Programming languages and libraries for numerical work tend not to place a lot of emphasis on the types of their data. For example Matlab, R, Octave, Fortran, and Numpy (but not the now defunct <a href="https://projectfortress.java.net">Fortress</a>) all tend to treat their data as plain numbers meaning that any time you have a temperature and a mass, say, there is nothing to prevent you adding them. <P><BR>I've been wondering how much dimensions (in the sense of <a href="https://en.wikipedia.org/wiki/Dimensional_analysis">dimensional analysis</a>) and units could help with numerical programming. As I pointed out on <a href="https://plus.google.com/+DanPiponi/posts/VNkdwwJL4LM">G+</a> recently (which is where I post shorter stuff these days), you don't have to limit dimensions to the standard ones of length, mass, time, dollars and so on. 
Any scale invariance in the equations you're working with can be exploited as a dimension giving you a property that can be statically checked by a compiler. <P><BR>There are quite a few libraries to statically check dimensions and units now. For example <a href="http://www.boost.org/doc/libs/1_61_0/doc/html/boost_units.html">Boost.Units</a> for C++, <a href="https://hackage.haskell.org/package/units">units</a> for Haskell and even <a href="https://github.com/timjb/quantities">quantities</a> for Idris. <P><BR><b>A matrix that breaks things</b><p>Even if a language supports dimensions, it's typical to define objects like vectors and matrices as homogeneous containers of quantities. But have a look at the Wikipedia page on the <a href="https://en.wikipedia.org/wiki/Metric_tensor_%28general_relativity%29">metric tensor</a>. There is a matrix <P><BR><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ceta%3D%5Cbegin%7Bpmatrix%7D-c%5E2%260%260%260%5C%5C0%261%260%260%5C%5C0%260%261%260%5C%5C0%260%260%261%5Cend%7Bpmatrix%7D" style="vertical-align:middle"><P><BR>which has the curious property that 3 entries on the diagonal seem to be dimensionless while the first entry is a squared velocity with dimension <img src="https://chart.googleapis.com/chart?cht=tx&chl=L%5E2T%5E%7B-2%7D" style="vertical-align:middle">. This will break many libraries that support units. An obvious workaround is to switch to use <a href="https://en.wikipedia.org/wiki/Natural_units">natural units</a>, which is much the same as abandoning the usefulness of dimensions. But there's another way, even if it may be tricky to set up with existing languages. 
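<P><BR>To make the breakage concrete, here is a minimal sketch of how a homogeneous container interacts with dimension-tagged numbers. Everything in it is hypothetical: <tt>Quantity</tt>, <tt>Dimensionless</tt>, <tt>VelocitySquared</tt> and <tt>Row</tt> are toy definitions invented for this illustration, not the API of any of the libraries mentioned above.

```haskell
-- A toy, hypothetical quantity type: a Double tagged with a phantom dimension.
newtype Quantity dim = Quantity Double
  deriving Show

-- Phantom tags standing in for dimensions (illustrative only).
data Dimensionless
data VelocitySquared

speedOfLight :: Double
speedOfLight = 299792458  -- metres per second

-- The time-time entry of the metric: -c^2, a squared velocity.
etaTT :: Quantity VelocitySquared
etaTT = Quantity (negate (speedOfLight * speedOfLight))

-- A spatial diagonal entry: the dimensionless number 1.
etaXX :: Quantity Dimensionless
etaXX = Quantity 1

-- A homogeneous container fixes one dimension for every entry.
type Row dim = [Quantity dim]

spatialDiagonal :: Row Dimensionless
spatialDiagonal = [etaXX, etaXX, etaXX]

-- The full diagonal mixes dimensions, so the type checker rejects it:
-- fullDiagonal :: Row Dimensionless
-- fullDiagonal = [etaTT, etaXX, etaXX, etaXX]

main :: IO ()
main = do
  print etaTT
  print (length spatialDiagonal)
```

With a container like this the only ways to store the whole diagonal are to erase the dimension tags or to give each entry its own type.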
<P><BR><b>Heterogeneous vectors and matrices</b><p>According to a common convention in physics, a 4-vector <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%3D%28t%3Dx%5E0%2Cx%5E1%2Cx%5E2%2Cx%5E3%29" style="vertical-align:middle"> has dimensions <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Bx%5D%3D%28T%2C%20L%2C%20L%2C%20L%29" style="vertical-align:middle"> where I'm using the convention that we can represent the units of a vector or matrix simply as a vector or matrix of dimensions, and here <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> is time and <img src="https://chart.googleapis.com/chart?cht=tx&chl=L" style="vertical-align:middle"> is length. The metric tensor is used like this: <img src="https://chart.googleapis.com/chart?cht=tx&chl=ds%5E2%3Dx%5Ei%5Ceta_%7Bij%7Dx%5Ej" style="vertical-align:middle"> (where I'm using the Einstein summation convention so the <img src="https://chart.googleapis.com/chart?cht=tx&chl=i" style="vertical-align:middle">'s and <img src="https://chart.googleapis.com/chart?cht=tx&chl=j" style="vertical-align:middle">'s are summed over). 
If we think of <img src="https://chart.googleapis.com/chart?cht=tx&chl=ds%5E2" style="vertical-align:middle"> having units of length squared (it is a pseudo-Riemannian <i>metric</i> after all) then it makes sense to think of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Ceta" style="vertical-align:middle"> having dimensions given by <P><BR><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5B%5Ceta%5D%3D%5Cbegin%7Bpmatrix%7DL%5E2T%5E%7B-2%7D%26LT%5E%7B-1%7D%26LT%5E%7B-1%7D%26LT%5E%7B-1%7D%5C%5CLT%5E%7B-1%7D%261%261%261%5C%5CLT%5E%7B-1%7D%261%261%261%5C%5CLT%5E%7B-1%7D%261%261%261%5Cend%7Bpmatrix%7D" style="vertical-align:middle"><P><BR>We can write this more succinctly as <P><BR><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5B%5Ceta%5D%3D%28LT%5E%7B-1%7D%2C1%2C1%2C1%29%5Cotimes%28LT%5E%7B-1%7D%2C1%2C1%2C1%29" style="vertical-align:middle"><P><BR>where <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cotimes" style="vertical-align:middle"> is the usual outer product. <P><BR>I'll use the notation <img src="https://chart.googleapis.com/chart?cht=tx&chl=a%3AA" style="vertical-align:middle"> to mean <img src="https://chart.googleapis.com/chart?cht=tx&chl=a" style="vertical-align:middle"> is of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align:middle">. So, for example, <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28t%2Cx%5E1%2Cx%5E2%2Cx%5E3%29%3A%28T%2CL%2CL%2CL%29" style="vertical-align:middle">. I'll also use pointwise notation for types such as <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28A%2CB%2CC%2CD%29%2A%28E%2CF%2CG%2CH%29%3D%28AE%2CBF%2CCG%2CDH%29" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28A%2CB%2CC%29%5E%7B-1%7D%3D%28A%5E%7B-1%7D%2CB%5E%7B-1%7D%2CC%5E%7B-1%7D%29" style="vertical-align:middle">. <P><BR>Now I can give some general rules. 
If <img src="https://chart.googleapis.com/chart?cht=tx&chl=m%3AM" style="vertical-align:middle"> is a matrix, <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%3AX" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=y%3AY" style="vertical-align:middle"> are vectors, and <img src="https://chart.googleapis.com/chart?cht=tx&chl=s%3AS" style="vertical-align:middle"> is a scalar, then <img src="https://chart.googleapis.com/chart?cht=tx&chl=y%3Dmx" style="vertical-align:middle"> only makes sense if <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%3DY%5Cotimes%20X%5E%7B-1%7D" style="vertical-align:middle">. Similarly the "inner product" <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5ETmy%3Ds" style="vertical-align:middle"> only makes sense if <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%3DX%5E%7B-1%7D%5Cotimes%20Y%5E%7B-1%7DS" style="vertical-align:middle">. <P><BR><b>Generic vectors and matrices</b><p>Although these kinds of types might be useful if you're dealing with the kind of heterogeneous matrices that appear in relativity, there's another reason they might be useful. If you write code (in the imaginary language that supports these structures and understands dimensions and units) to be as generic as possible in the types of the vector and matrix entries, failures to type check will point out parts of the code where there are hidden assumptions, or even errors, about scaling. For example, consider a routine to find the inverse of a 3 by 3 matrix. Writing this as generically as possible means we should write it to operate on a matrix of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28A%2CB%2CC%29%5Cotimes%28D%2CE%2CF%29" style="vertical-align:middle">, say. The result should have type <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28D%2CE%2CF%29%5E%7B-1%7D%5Cotimes%28A%2CB%2CC%29%5E%7B-1%7D" style="vertical-align:middle">. 
If this type checks when used with a suitably powerful type checker then it means that if we replace the units for type A, say, with units twice as large, it should have no effect on the result, taking into account those units. In this case, it means that if we multiply the numbers of the first row of the input by 0.5 then the numbers of the first column of the output should get multiplied by 2. In fact this is a basic property of matrix inverses. In other words, this mathematical property of matrix inverses is guaranteed by a type system that can handle units and heterogeneous matrices. It would be impossible to write a matrix inverter that type checks and fails to have this property. Unfortunately it's still possible to write a matrix inverter that type checks and is incorrect in some other way. Nonetheless this kind of type system would put a very big constraint on the code and is likely to eliminate many sources of error. <P><BR><b>An example, briefly sketched</b><p>I thought I'd look at an actual example of a matrix inverter to see what would happen if I used a type checker like the one I've described. I looked at the <a href="https://en.wikipedia.org/wiki/Conjugate_gradient_method">conjugate gradient method</a>. 
At the Wikipedia page, note the line <P><BR><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Calpha_k%3A%3D%5Cfrac%7B%5Cmathbf%7Br%7D_k%5ET%5Cmathbf%7Br%7D_k%7D%7B%5Cmathbf%7Bp%7D_k%5ET%5Cmathbf%7BAp%7D_k%7D" style="vertical-align:middle"><P><BR>This would immediately fail to type check because if <img src="https://chart.googleapis.com/chart?cht=tx&chl=r" style="vertical-align:middle"> is of generic vector type <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%3D%28R_1%2C%5Cldots%2CR_n%29" style="vertical-align:middle"> then <img src="https://chart.googleapis.com/chart?cht=tx&chl=R_1%5E2" style="vertical-align:middle"> isn't the same type as <img src="https://chart.googleapis.com/chart?cht=tx&chl=R_2%5E2" style="vertical-align:middle"> so they can't be added. I won't go into any of the details but the easiest way to patch up this code to make it type check is to introduce a new matrix <img src="https://chart.googleapis.com/chart?cht=tx&chl=P" style="vertical-align:middle"> of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%5E%7B-1%7D%5Cotimes%20R%5E%7B-1%7D" style="vertical-align:middle"> and besides using it to make this inner product work (replacing the numerator by <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cmathbf%7Br%7D_k%5ET%20P%5Cmathbf%7Br%7D_k" style="vertical-align:middle">) we also use <img src="https://chart.googleapis.com/chart?cht=tx&chl=P" style="vertical-align:middle"> anywhere in the code we need to convert a vector of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=R" style="vertical-align:middle"> to a vector of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=R%5E%7B-1%7D" style="vertical-align:middle">. If you try to do this as sparingly as possible you'll end up with a modified algorithm. But at first this seems weird. Why should this matrix inverse routine rely on someone passing in a second matrix to make it type check? 
And what is this new algorithm anyway? Well scroll down the Wikipedia page and you get to the <i>preconditioned</i> conjugate gradient algorithm. The extra matrix we need to pass in is the preconditioner. This second algorithm would type check. Preconditioned conjugate gradient, with a suitable preconditioner, generally performs better than pure conjugate gradient. So in this case we're getting slightly more than a check on our code's correctness. The type checker for our imaginary language would give a hint on how to make the code perform better. There's a reason for this. The original conjugate gradient algorithm is implicitly making a choice of units that sets scales along the axes. These determine the course taken by the algorithm. It's not at all clear that picking these scalings randomly (which is in effect what you're doing if you throw a random problem at the algorithm) is any good. It's better to pick a preconditioner adapted to the scale of the problem and the type checker is hinting (or would be if it existed) that you need to do this. Compare with the gradient descent algorithm whose <a href="http://www.machinedlearnings.com/2011/06/dimensional-analysis-and-gradient.html">scaling problems are better known</a>. <P><BR><b>But which language?</b><p>I guess both Agda and Idris could be made to implement what I've described. However, I've a hunch it might not be easy to use in practice. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com7tag:blogger.com,1999:blog-11295132.post-17971572217192575942014-05-23T22:21:00.000-07:002014-05-23T22:21:28.593-07:00Cofree meets Free<pre><br />> {-# LANGUAGE RankNTypes, MultiParamTypeClasses, TypeOperators #-}<br /><P><BR><br /></pre><b>Introduction</b><p>After I spoke at <a href="http://www.haskell.org/haskellwiki/BayHac2014">BayHac 2014</a> about free monads I was asked about cofree comonads. So this is intended as a sequel to that talk. 
Not only am I going to try to explain what cofree comonads are, I'm also going to point out a very close relationship between cofree comonads and free monads. <P><BR>At the beginning of the talk the Google Hangout software seems to have switched to the laptop camera so you can't see the slides in the <a href="https://www.youtube.com/watch?v=OGUuGL0AgYs">video</a>. However the slides are <a href="https://drive.google.com/file/d/0B51SFgxqMDS-NDBOX0ZDdW52dEE/edit">here</a>. <P><BR><b>Cothings as machines</b><p>I often think of coalgebraic things as machines. They have some internal state and you can press buttons to change that internal state. For example here is a type class for a machine with two buttons that's related to a magma: <P><BR><pre><br />> class TwoButton a where<br />>     press :: a -> (a, a)<br /><P><BR><br /></pre>The idea is that the state of the machine is given by a value of some type <tt>a</tt> and you could press either the left button or the right button. The result of pressing one or other button is given by these two functions: <P><BR><pre><br />> pressLeft, pressRight :: TwoButton a => a -> a<br />> pressLeft = fst . press<br />> pressRight = snd . press<br /><P><BR><br /></pre>(As with many metaphors used to explain Haskell type classes your mileage may vary. Sometimes you'll have to stretch your imagination to see what the set of buttons is for a particular cothing.) <P><BR><b>Comonads</b><p>Just as monads are a kind of generalised algebraic structure (for example see my talk), comonads are a generalised kind of machine. The idea is that for any state of the machine there is a bunch of buttons we could press. But we don't have two buttons, or any fixed number of buttons. We instead have a functorful of buttons (if you think of functors by analogy with containers). We also don't get to directly see the internal state of the machine but instead we get to make observations. 
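<P><BR>Before generalising, it may help to see the two-button machine from the previous section with an actual state. The following is only an illustrative sketch: the <tt>Counter</tt> type and the <tt>run</tt> helper are hypothetical names of mine, and the class is repeated so that the block stands alone.

```haskell
-- The two-button machine class, repeated so this sketch is self-contained.
class TwoButton a where
  press :: a -> (a, a)

pressLeft, pressRight :: TwoButton a => a -> a
pressLeft  = fst . press
pressRight = snd . press

-- Hypothetical machine: the state is a counter, the left button
-- decrements it and the right button increments it.
newtype Counter = Counter Integer deriving (Eq, Show)

instance TwoButton Counter where
  press (Counter n) = (Counter (n - 1), Counter (n + 1))

-- Press a sequence of buttons in order: False is the left button,
-- True is the right button.
run :: TwoButton a => [Bool] -> a -> a
run buttons s = foldl step s buttons
  where
    step st b = if b then pressRight st else pressLeft st
```

For example, <tt>run [True, True, False] (Counter 0)</tt> presses right, right, left and ends in the state <tt>Counter 1</tt>. A comonad generalises this picture from two buttons to a functorful of them, and from reading the state directly to making an observation of it.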
<P><BR>Here's the type class: <P><BR><pre><br />> class Comonad w where<br />>     extract :: w a -> a<br />>     duplicate :: w a -> w (w a)<br /><P><BR><br /></pre>The state of the machine is given by <tt>w a</tt>. We observe the state using the <tt>extract</tt> function. And when we come to press a button, we have a functorful of new states that it could end up in. The <tt>duplicate</tt> function gives the container of those new states. <P><BR>For example, various kinds of zipper give rise to comonads. Zippers allow you to "focus" on a part of a data structure. The <tt>extract</tt> operation allows you to observe the point that currently has focus. There is one button for every position in the structure where the focus could be. Pressing the corresponding button moves the focus to that point. Similarly the <tt>Store</tt> comonad has one button for each value you can store in the field it represents. Press the button and the value gets stored in the field. <P><BR><b>Cofreeness as a way to memoise</b><p>Cofree coalgebras can be thought of as memoised forms of elements of coalgebras. For example, the <tt>TwoButton</tt> machine above has a function, <tt>press</tt>, as part of its definition. Memoising an element of such a thing means tabulating everything that could possibly happen if you pressed the buttons so we no longer need the <tt>press</tt> function. One approach is to try something like this: <P><BR><pre><br />data CofreeTwoButton = Memo CofreeTwoButton CofreeTwoButton<br /><P><BR><br /></pre>The structure contains two <tt>CofreeTwoButton</tt>s, each giving the result of pressing one of the two buttons. Any element of a <tt>TwoButton</tt> may now be memoised like so: <P><BR><pre><br />memoiseTwoButton :: TwoButton m => m -> CofreeTwoButton<br />memoiseTwoButton m = Memo (memoiseTwoButton (pressLeft m)) (memoiseTwoButton (pressRight m))<br /><P><BR><br /></pre>It definitely tabulates the result of pressing buttons. But it has a major flaw. 
We have no way of seeing what's stored in the table! To make this useful we want to also store some data in the table that we can peek at. So here is a better definition: <P><BR><pre><br />> data CofreeTwoButton a = Memo a (CofreeTwoButton a) (CofreeTwoButton a)<br />> memoiseTwoButton :: TwoButton m => (m -> a) -> m -> CofreeTwoButton a<br />> memoiseTwoButton f m = Memo (f m) (memoiseTwoButton f (pressLeft m)) (memoiseTwoButton f (pressRight m))<br /><P><BR><br /></pre>The first argument to <tt>memoiseTwoButton</tt> says what we want to store in the table and then <tt>memoiseTwoButton</tt> goes ahead and stores it. We can use the identity function if we want to store the original elements. <P><BR>Note how this is like <tt>foldMap</tt>: <P><BR><pre><br />foldMap :: Monoid m => (a -> m) -> t a -> m<br /><P><BR><br /></pre>if we replace <tt>t</tt> by the list functor and remember that lists are free monoids. The main difference is that arrows have been reversed. Where <tt>foldMap</tt> takes an element of a free monoid and interprets it as an element of another monoid, <tt>memoiseTwoButton</tt> packs an element of a <tt>TwoButton</tt> into a cofree structure. The "interpretation" and "packing" here are both homomorphisms for their respective structures. Homomorphisms respect equations so if an equation holds between elements of a free monoid we expect it to also hold when interpreted in another monoid. But any element of a free monoid can be interpreted in any other monoid meaning that any equation that holds between elements of a free monoid must hold in any monoid. That's why free monoids are designed so that the only equations that hold between elements are those that follow from the monoid laws. <P><BR>With the <tt>TwoButton</tt> we have a dualised version of the above. Every element of every <tt>TwoButton</tt> can be packed into the <tt>CofreeTwoButton</tt>. So every equation in the original structure will still hold after the packing. 
So every equation that holds in some <tt>TwoButton</tt> must have some solution in <tt>CofreeTwoButton</tt>. That gives an idea of what a <tt>CofreeTwoButton</tt> is by analogy with the free monoid. <P><BR><b>Cofree comonads</b><p>A cofree comonad is basically a memoised comonad. So the data structure is: <P><BR><pre><br />> data Cofree f a = Cofree a (f (Cofree f a))<br /><P><BR><br /></pre>At each point in the "table" we store some observable value of type <tt>a</tt>. And we have a functorful of buttons, so we expect to have a functorful of new states we could transition to. The <tt>Functor</tt> instance looks like: <P><BR><pre><br />> instance Functor f => Functor (Cofree f) where<br />>     fmap f (Cofree a fs) = Cofree (f a) (fmap (fmap f) fs)<br /><P><BR><br /></pre>We apply <tt>f</tt> to the observable value and then push the <tt>fmap f</tt> down to the child nodes. <P><BR>The <tt>duplicate</tt> function takes a memoised state and replaces the observable stored at each position with the memoised state that gives rise to the observable. <P><BR><pre><br />> instance Functor f => Comonad (Cofree f) where<br />>     extract (Cofree a _) = a<br />>     duplicate c@(Cofree _ fs) = Cofree c (fmap duplicate fs)<br /><P><BR><br /></pre>Now by analogy with <tt>memoiseTwoButton</tt> we can memoise comonads. <P><BR><pre><br />> memoiseComonad :: (Comonad w, Functor f) =><br />>     (forall x.w x -> f x) -> (forall x.w x -> Cofree f x)<br />> memoiseComonad f w = Cofree (extract w) (fmap (memoiseComonad f) (f (duplicate w)))<br /><P><BR><br /></pre>So that's what a cofree comonad is: it's a type that can be used to memoise all of the states that are accessible from a state in a comonad by pressing its buttons. <P><BR><b>Cofree comonad meets free monad</b><p>But that's not all. There is a close relationship between cofree comonads and free monads. 
So to get going, here's a free monad type: <P><BR><pre><br />> data Free f a = Id a | Free (f (Free f a))<br /><P><BR><br />> join' :: Functor f => Free f (Free f a) -> Free f a<br />> join' (Id x) = x<br />> join' (Free fa) = Free (fmap join' fa)<br /><P><BR><br />> instance Functor f => Functor (Free f) where<br />>     fmap f (Id x) = Id (f x)<br />>     fmap f (Free fa) = Free (fmap (fmap f) fa)<br /><P><BR><br />> -- Applicative is a superclass of Monad in GHC 7.10 and later.<br />> instance Functor f => Applicative (Free f) where<br />>     pure = Id<br />>     mf <*> mx = join' (fmap (\f -> fmap f mx) mf)<br /><P><BR><br />> instance Functor f => Monad (Free f) where<br />>     return = Id<br />>     m >>= f = join' (fmap f m)<br /><P><BR><br /></pre>Now I'll define a kind of pairing between functors. Given a way to combine two kinds of element, the pairing gives a way to combine a pair of containers of those elements. <P><BR><pre><br />> class (Functor f, Functor g) => Pairing f g where<br />>     pair :: (a -> b -> r) -> f a -> g b -> r<br /><P><BR><br />> data Identity a = Identity a<br />> instance Functor Identity where<br />>     fmap f (Identity x) = Identity (f x)<br /><P><BR><br />> instance Pairing Identity Identity where<br />>     pair f (Identity a) (Identity b) = f a b<br /><P><BR><br />> data (f :+: g) x = LeftF (f x) | RightF (g x)<br />> instance (Functor f, Functor g) => Functor (f :+: g) where<br />>     fmap f (LeftF x) = LeftF (fmap f x)<br />>     fmap f (RightF x) = RightF (fmap f x)<br /><P><BR><br />> data (f :*: g) x = f x :*: g x<br />> instance (Functor f, Functor g) => Functor (f :*: g) where<br />>     fmap f (x :*: y) = fmap f x :*: fmap f y<br /><P><BR><br />> instance (Pairing f f', Pairing g g') => Pairing (f :+: g) (f' :*: g') where<br />>     pair p (LeftF x) (a :*: _) = pair p x a<br />>     pair p (RightF x) (_ :*: b) = pair p x b<br /><P><BR><br />> instance (Pairing f f', Pairing g g') => Pairing (f :*: g) (f' :+: g') where<br />>     pair p (a :*: _) (LeftF x) = pair p a x<br />>     pair p (_ :*: b) (RightF x) = pair p b x<br /><P><BR><br />> instance Pairing ((->) a) ((,) a) where<br />>     pair p f = uncurry (p . 
f)<br /><P><BR><br /></pre>Given a pairing between <tt>f</tt> and <tt>g</tt> we get one between <tt>Cofree f</tt> and <tt>Free g</tt>. <P><BR><pre><br />> instance Pairing f g => Pairing (Cofree f) (Free g) where<br />>     pair p (Cofree a _) (Id x) = p a x<br />>     pair p (Cofree _ fs) (Free gs) = pair (pair p) fs gs<br /><P><BR><br /></pre>An element of <tt>Free g</tt> can be thought of as an expression written in a DSL. So this pairing gives a way to apply a monadic expression to a memoised comonad. In other words, if you think of comonads as machines, monads give a language that can be used to compute something based on the output of the machine. <P><BR>Here's an almost trivial example just so you can see everything working together. A reasonable definition of a comagma structure on the type <tt>a</tt> is <tt>a -> UpDown a</tt> with <tt>UpDown</tt> defined as: <P><BR><pre><br />> data UpDown a = Up a | Down a<br /><P><BR><br />> instance Functor UpDown where<br />>     fmap f (Up a) = Up (f a)<br />>     fmap f (Down a) = Down (f a)<br /><P><BR><br />> type CofreeComagma a = Cofree UpDown a<br /><P><BR><br /></pre>A well known comagma structure on the positive integers is given by the famous <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz conjecture</a>: <P><BR><pre><br />> collatz :: Integer -> UpDown Integer<br />> collatz n = if even n then Down (n `div` 2) else Up (3*n+1)<br /><P><BR><br /></pre>We can memoise this as a cofree comonad: <P><BR><pre><br />> memoisedCollatz :: Integer -> CofreeComagma Integer<br />> memoisedCollatz n = Cofree n (fmap memoisedCollatz (collatz n))<br /><P><BR><br /></pre>Here's a picture of <tt>memoisedCollatz 12</tt>: <div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-48VwzL8iXk0/U39aecynoWI/AAAAAAAAIOI/KvJ4ZArwmjw/s1600/cofree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" 
src="http://3.bp.blogspot.com/-48VwzL8iXk0/U39aecynoWI/AAAAAAAAIOI/KvJ4ZArwmjw/s1600/cofree.png" /></a></div> <P><BR>Now let's make the dual functor in readiness for building the dual monad: <P><BR><pre><br />> data Two a = Two a a<br />> instance Functor Two where<br />>     fmap f (Two a b) = Two (f a) (f b)<br /><P><BR><br /></pre>And here we set up a pairing: <P><BR><pre><br />> instance Pairing UpDown Two where<br />>     pair f (Up a) (Two b _) = f a b<br />>     pair f (Down a) (Two _ c) = f a c<br /><P><BR><br />> execute :: Cofree UpDown x -> Free Two (x -> r) -> r<br />> execute w m = pair (flip ($)) w m<br /><P><BR><br /></pre>This gives rise to a free monad isomorphic to the one in my talk: <P><BR><pre><br />> data Direction = WentUp | WentDown deriving Show<br /><P><BR><br />> choose :: Free Two Direction<br />> choose = Free (Two (return WentUp) (return WentDown))<br /><P><BR><br /></pre>And here's an example of some code written in the corresponding DSL: <P><BR><pre><br />> ex1 :: Free Two (Integer -> String)<br />> ex1 = do<br />>     x <- choose<br />>     y <- choose<br />>     case (x, y) of<br />>         (WentDown, WentDown) -> return (\z -> "Decreased twice " ++ show z)<br />>         _ -> return show<br /><P><BR><br /></pre>It can be represented as: <P><BR> <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-IlPy3xTUjvk/U39aw8ENnGI/AAAAAAAAIOQ/uHkKOBo_n8o/s1600/free.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-IlPy3xTUjvk/U39aw8ENnGI/AAAAAAAAIOQ/uHkKOBo_n8o/s1600/free.png" /></a></div> <P><BR>And here's what happens when they meet: <P><BR><pre><br />> go1 :: String<br />> go1 = execute (memoisedCollatz 12) ex1<br /><P><BR><br /></pre>This can be understood through the combined picture: <P><BR> <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-r1-8N9YcvdU/U39bLwjPrTI/AAAAAAAAIOY/LVb5euAVD7Y/s1600/cofreemeetsfree.png" 
imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-r1-8N9YcvdU/U39bLwjPrTI/AAAAAAAAIOY/LVb5euAVD7Y/s1600/cofreemeetsfree.png" /></a></div> <P><BR><b>References</b><p>On getting monads from comonads more generally see <a href="http://comonad.com/reader/2011/monads-from-comonads/">Monads from Comonads</a>. For more on memoising and how it's really all about the Yoneda lemma see <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html">Memoizing Polymorphic Functions</a>. I'm waiting for Tom Leinster to publish some related work. The pairing above gives a way for elements of free monads to pick out elements of cofree comonads and is a special case of what I'm talking about <a href="http://mathoverflow.net/a/12460">here</a>. But I think Tom has some unpublished work that goes further. <P><BR>If you think of a comonad as a compressed object that is decompressed by a monadic decision tree, then you'd expect some form of information theoretical description to apply. That makes me think of <a href="http://ncatlab.org/johnbaez/show/Convex+spaces+and+an+operadic+approach+to+entropy">Convex spaces and an operadic approach to entropy</a>. 
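<P><BR>As a check on the <tt>go1</tt> example, here is a condensed, standalone sketch of just the pieces it needs. It is not the literate code of the post: it leaves out the <tt>Comonad</tt> class and the sum and product functors, writes the monadic bind directly rather than via <tt>join'</tt>, and includes the <tt>Applicative</tt> instance that recent versions of GHC require. Treat those choices as my assumptions about how one might compile it today.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

data Cofree f a = Cofree a (f (Cofree f a))
data Free f a = Id a | Free (f (Free f a))

instance Functor f => Functor (Cofree f) where
  fmap f (Cofree a fs) = Cofree (f a) (fmap (fmap f) fs)

instance Functor f => Functor (Free f) where
  fmap f (Id x)    = Id (f x)
  fmap f (Free fa) = Free (fmap (fmap f) fa)

-- Needed because Applicative is a superclass of Monad in current GHC.
instance Functor f => Applicative (Free f) where
  pure = Id
  Id f    <*> x = fmap f x
  Free ff <*> x = Free (fmap (<*> x) ff)

-- Bind substitutes a subtree at every leaf of the expression tree.
instance Functor f => Monad (Free f) where
  Id x    >>= f = f x
  Free fa >>= f = Free (fmap (>>= f) fa)

class (Functor f, Functor g) => Pairing f g where
  pair :: (a -> b -> r) -> f a -> g b -> r

-- The expression tree (Free g) steers a walk through the table (Cofree f).
instance Pairing f g => Pairing (Cofree f) (Free g) where
  pair p (Cofree a _)  (Id x)    = p a x
  pair p (Cofree _ fs) (Free gs) = pair (pair p) fs gs

data UpDown a = Up a | Down a
instance Functor UpDown where
  fmap f (Up a)   = Up (f a)
  fmap f (Down a) = Down (f a)

data Two a = Two a a
instance Functor Two where
  fmap f (Two a b) = Two (f a) (f b)

-- An Up state selects the first branch of a Two, a Down state the second.
instance Pairing UpDown Two where
  pair f (Up a)   (Two b _) = f a b
  pair f (Down a) (Two _ c) = f a c

collatz :: Integer -> UpDown Integer
collatz n = if even n then Down (n `div` 2) else Up (3 * n + 1)

memoisedCollatz :: Integer -> Cofree UpDown Integer
memoisedCollatz n = Cofree n (fmap memoisedCollatz (collatz n))

execute :: Cofree UpDown x -> Free Two (x -> r) -> r
execute w m = pair (flip ($)) w m

data Direction = WentUp | WentDown deriving Show

choose :: Free Two Direction
choose = Free (Two (return WentUp) (return WentDown))

ex1 :: Free Two (Integer -> String)
ex1 = do
  x <- choose
  y <- choose
  case (x, y) of
    (WentDown, WentDown) -> return (\z -> "Decreased twice " ++ show z)
    _                    -> return show

go1 :: String
go1 = execute (memoisedCollatz 12) ex1
```

Tracing it by hand: 12 is even, so the machine goes down to 6 and the first <tt>choose</tt> yields <tt>WentDown</tt>. 6 is even, so it goes down to 3 and the second <tt>choose</tt> also yields <tt>WentDown</tt>. The resulting function is then applied to the value observed at that point, 3, so <tt>go1</tt> evaluates to <tt>"Decreased twice 3"</tt>.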
Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com8tag:blogger.com,1999:blog-11295132.post-49990623468641023252014-05-17T08:22:00.000-07:002014-05-17T09:07:37.059-07:00Types, and two approaches to problem solving<h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Introduction</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. 
The latter can be harder to recognise but there is one example that just about everyone has known since infancy.</span></div><b id="docs-internal-guid-840c1bc0-0aa5-4d74-4ec7-038b3dfaa261" style="font-weight: normal;"><br /></b><br /><h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Subproblems</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Consider sorting algorithms. A large class of sorting algorithms, including <a href="http://en.wikipedia.org/wiki/Quicksort">quicksort</a>, break a sequence of values into two pieces. The two pieces are smaller so they are easier to sort. We sort those pieces and then combine them, using some kind of merge operation, to give an ordered version of the original sequence. Breaking things down into subproblems is ubiquitous and is useful far outside of mathematics and computing: in cooking, in finding our path from A to B, in learning the contents of a book. 
So I don’t need to say much more here.</span></div><b style="font-weight: normal;"><br /></b><br /><h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Quotient problems</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The term </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">quotient</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> is a technical term from mathematics. But I want to use the term loosely to mean something like this: a </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">quotient problem</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> is what a problem looks like if you wear a certain kind of filter over your eyes. The filter hides some aspect of the problem that simplifies it. 
You solve the simplified problem and then take off the filter. You now ‘lift’ the solution of the simplified problem to a solution to the full problem. The catch is that your filter needs to match your problem so I’ll start by giving an example where the filter doesn’t work.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Suppose we want to add a list of integers, say: 123, 423, 934, 114. We can try simplifying this problem by wearing a filter that makes numbers fuzzy so we can’t distinguish numbers that differ by less than 10. When we wear this filter 123 looks like 120, 423 looks like 420, 934 looks like 930 and 114 looks like 110. So we can try adding 120+420+930+110. This is a simplified problem and in fact this is a common technique to get approximate answers via mental arithmetic. We get 1580. We might hope that when wearing our filters, 1580 looks like the correct answer. But it doesn’t. The correct answer is 1594. 
This filter doesn’t respect addition in the sense that if </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> 
it doesn’t follow that </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">+</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">+</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">’.</span></div><b style="font-weight: normal;"><br 
/></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">To solve a problem via quotient problems we usually need to find a filter that does respect the original problem. So let’s wear a different filter that allows us just to see the last digit of a number. Our original problem now looks like summing the list 3, 3, 4, 4. We get 4. This </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">is</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> the correct last digit. If we now try a filter that allows us to see just the last two digits we see that summing 23, 23, 34, 14 does in fact give the correct last two digits. This is why the standard elementary school algorithms for addition and multiplication work through the digits from right to left: at each stage we’re solving a quotient problem but the filter only respects the original problem if it allows us to see the digits to the right of some point, not digits to the left. 
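</span></div><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">As a minimal sketch, here is one way you might check in Python whether a filter respects addition for the list above. The names round_filter, last_digits_filter and respects_addition are made up for this illustration. The check compares what the sum of the filtered numbers looks like through the filter with what the true sum looks like through the filter.</span></div>
<pre>
```python
def round_filter(n):
    """Hide the last digit, so 123 looks like 120."""
    return n - n % 10


def last_digits_filter(n, k=1):
    """Show only the last k decimal digits of n."""
    return n % 10 ** k


def respects_addition(filt, numbers):
    """Does solving the filtered problem give the filtered true answer?"""
    simplified = filt(sum(filt(n) for n in numbers))
    return simplified == filt(sum(numbers))


numbers = [123, 423, 934, 114]

print(respects_addition(round_filter, numbers))        # False: 1580 versus 1590
print(respects_addition(last_digits_filter, numbers))  # True: both look like 4
print(respects_addition(lambda n: last_digits_filter(n, 2), numbers))  # True: both look like 94
```
</pre>
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">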
This filter </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">does</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> respect addition in the sense that if </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like 
</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> then </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">+</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> looks like </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a’</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">+</span><span style="background-color: transparent; color: black; 
font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">’.</span></div><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><br /></span></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-M6YpX5gqPU8/U3d7-m612uI/AAAAAAAAIMI/foFvFkr6ui8/s1600/chessboard.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-M6YpX5gqPU8/U3d7-m612uI/AAAAAAAAIMI/foFvFkr6ui8/s1600/chessboard.png" height="320" width="320" /></a></div><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Another example of the quotient approach is to look at the <a href="http://en.wikipedia.org/wiki/Knight's_tour">knight’s tour</a> problem in the case where two opposite corners have been removed from the chessboard. A knight’s tour is a sequence of knight’s moves that visits each square on a board exactly once. If we remove opposite corners of the chessboard, there is no knight’s tour of the remaining 62 squares. How can we prove this? If you don’t see the trick you can get caught up in all kinds of complicated reasoning. 
So now put on a filter that removes your ability to see the spatial relationships between the squares so you can only see the colours of the squares. This respects the original problem in the sense that a knight’s move goes from a black square to a white square, or from a white square to a black square. The filter doesn’t stop us seeing this. But now it’s easier to see that there are two more squares of one colour than the other. A tour alternates colours, so over 62 squares it would need 31 of each, and so no knight’s tour is possible. We didn’t need to be able to see the spatial relationships at all.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">(Note that this is the same trick as we use for arithmetic, though it’s not immediately obvious. If we think of the spatial position of a square as being given by a pair of integers (x, y), then the colour is given by x+y modulo 2. In other words, by the last digit of x+y written in binary. 
So it’s just the see-only-digits-on-the-right filter at work again.)</span></div><b style="font-weight: normal;"><br /></b><br /><h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Wearing filters while programming</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">So now think about developing some code in a dynamic language like Python. Suppose we execute the line:</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;"><b>a = 1</b></span></span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The Python interpreter doesn’t just store the integer 1 somewhere in memory. It also stores a tag indicating that the data is to be interpreted as an integer. 
When you come to execute the line:</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b><span style="font-family: Courier New, Courier, monospace;">b = a+1</span></b></span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">it will first examine the tag in </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">a</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> indicating its type, in this case </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">int</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, and use that to determine what the type for </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: 
normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">b</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> should be.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now suppose we wear a filter that allows us to see the tag indicating the type of some data, but not the data itself. Can we still reason about what our program does?</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">In many cases we can. 
For example we can, in principle, deduce the type of</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b><span style="font-family: Courier New, Courier, monospace;">a+b*(c+1)/(2+d)</span></b></span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">if we know the types of </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>a</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>b</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; 
white-space: pre-wrap;"><b>c</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>d</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. (As I’ve said once before, it’s hard to make any reliable statement about a bit of Python code so let's suppose that </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>a</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>b</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: 
pre-wrap;"><b>c</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b>d</b></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> are all either of type </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">int</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> or type </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">float</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">.) We can read and understand quite a bit of Python code wearing this filter. But it’s easy to go wrong. 
For example consider</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b><span style="font-family: Courier New, Courier, monospace;">if a>1:</span></b></span></div><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><b><span style="font-family: Courier New, Courier, monospace;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span class="Apple-tab-span" style="white-space: pre;">	</span></span><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">return 1.0</span></span></b></div><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><b><span style="font-family: Courier New, Courier, monospace;">else:</span></b></span></div><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><b><span style="font-family: Courier New, Courier, monospace;"><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><span class="Apple-tab-span" style="white-space: pre;">	</span></span><span style="background-color: transparent; color: black; font-size: 15px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: 
baseline; white-space: pre-wrap;">return 1</span></span></b></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The type of the result depends on the value of the variable a. So if we’re wearing the filter that hides the data, then we can’t predict what this snippet of code does. When we run it, it might return an </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">int</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> sometimes and a </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">float</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> other times, and we won’t be able to see what made the difference.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline; white-space: pre-wrap;">In a statically typed language you </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">can</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> predict the type of an expression knowing the type of its parts. This means you can reason reliably about code while wearing the hide-the-value filter. So almost any programming problem can be split into two parts: a quotient problem where you forget about the values, and then the problem of lifting a solution to the quotient problem to a solution to the full problem. Or to put that in more conventional language: designing your data and function types, and then implementing the code that fits those types.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">I chose to make the contrast between dynamic and static languages just to make the ideas clear but actually you can happily use similar reasoning for both types of language. 
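</span></div><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">For instance, here is a rough Python 3 sketch of wearing the type filter on the arithmetic expression from earlier. The helper names (arith_type, div_type, expr_type, prediction_holds, branchy) are invented for this illustration, and I’m assuming Python 3 division, where / on ints and floats always gives a float. The sketch predicts the type of the expression from the types of a, b, c and d alone, checks that prediction against real evaluation, and shows that the branching snippet can’t be predicted the same way.</span></div>
<pre>
```python
import itertools


def arith_type(x, y):
    """Type of x + y or x * y when each operand is an int or a float."""
    return float if float in (x, y) else int


def div_type(x, y):
    """Type of x / y in Python 3, where true division always gives a float."""
    return float


def expr_type(a, b, c, d):
    """Type of a+b*(c+1)/(2+d), worked out from the operand types alone."""
    product = arith_type(b, arith_type(c, int))
    quotient = div_type(product, arith_type(int, d))
    return arith_type(a, quotient)


def prediction_holds():
    """Compare the predicted type with the real one for every mix of int and float."""
    samples = {int: 3, float: 2.5}  # arbitrary values; the prediction never looks at them
    for types in itertools.product([int, float], repeat=4):
        a, b, c, d = (samples[t] for t in types)
        if type(a + b * (c + 1) / (2 + d)) is not expr_type(*types):
            return False
    return True


def branchy(a):
    """The branching snippet from earlier: the result type depends on the value of a."""
    if a > 1:
        return 1.0
    else:
        return 1


print(prediction_holds())                      # True
print(type(branchy(2)) is type(branchy(0)))    # False: two ints in, two different types out
```
</pre>
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">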
Compilers for statically typed languages give you a lot of assistance if you choose to solve your programming problems this way.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A good example of this at work is given in Haskell. If you're writing a compiler, say, you might want to represent a piece of code as an abstract syntax tree, and implement algorithms that recurse through the tree. In Haskell the type system is strong enough that once you’ve defined the tree type the form of the recursion algorithms is often more or less given. In fact, it can be tricky to implement tree recursion </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">in</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">correctly and have the code compile without errors. 
Solving the quotient problem of getting the types right gets you much of the way towards solving the full problem.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">And that’s my main point: types aren’t simply a restriction mechanism to help you avoid making mistakes. Instead they are a way to reduce some complex programming problems to simpler ones. But the simpler problem isn’t a </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sub</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">problem, it’s a </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">quotient</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> problem.</span></div><h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; 
white-space: pre-wrap;">Dependent types</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Dependently typed languages give you even more flexibility with what filters you wear. They allow you to mix up values and types. For example both C++ and Agda (to pick an unlikely pair) allow you to wear filters that hide the values of elements in your arrays while allowing you to see the length of your arrays. This makes it easier to concentrate on some aspects of your problem while completely ignoring others.</span></div><b style="font-weight: normal;"><br /></b><br /><h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"><span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Notes</span></h2><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">I wrote the first draft of this a couple of years ago but never published it. 
I was motivated to post by a discussion kicked off by Voevodsky on the TYPES mailing list </span><span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"><a href="http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html" style="text-decoration: none;">http://lists.seas.upenn.edu/pipermail/types-list/2014/001745.html</a></span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This article isn’t a piece of rigorous mathematics and I’m using mathematical terms as analogies.</span></div><b style="font-weight: normal;"><br /></b><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The notion of a subproblem isn’t completely distinct from a quotient problem. 
Some problems are both, and in fact some problems can be solved by transforming them so they become both.</span></div><br /><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">More generally, looking at computer programs through different filters is one approach to abstract interpretation </span><a href="http://en.wikipedia.org/wiki/Abstract_interpretation" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">http://en.wikipedia.org/wiki/Abstract_interpretation</span></a><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. 
The intuition section there (</span><a href="http://en.wikipedia.org/wiki/Abstract_interpretation#Intuition" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">http://en.wikipedia.org/wiki/Abstract_interpretation#Intuition</span></a><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">) has much in common with what I’m saying.</span></div>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com4tag:blogger.com,1999:blog-11295132.post-31935670455334094142014-04-25T21:42:00.000-07:002014-04-25T21:51:11.820-07:00The Monad called Free<b>Introduction</b><p>As Dan Doel points out <a href="https://www.fpcomplete.com/user/dolio/many-roads-to-free-monads">here</a>, the gadget <tt>Free</tt> that turns a functor into a monad is itself a kind of monad, though not the usual kind of monad we find in Haskell. I'll call it a higher order monad and you can find a type class corresponding to this in various places including an old version of Ed Kmett's <a href="http://comonad.com/haskell/category-extras/dist/doc/html/category-extras/Control-Monad-HigherOrder.html">category-extras</a>. I'll borrow some code from there. I hunted around and couldn't find an implementation of <tt>Free</tt> as an instance of this class so I thought I'd plug the gap. 
<P><BR><pre><br />> {-# LANGUAGE RankNTypes, FlexibleContexts, InstanceSigs, ScopedTypeVariables #-}<br /><P><BR><br />> import Control.Monad<br />> import Data.Monoid<br /><P><BR><br /></pre>To make things unambiguous I'll implement free monads in the usual way here: <P><BR><pre><br />> data Free f a = Pure a | Free (f (Free f a))<br /><P><BR><br />> instance Functor f => Functor (Free f) where<br />> fmap f (Pure a) = Pure (f a)<br />> fmap f (Free a) = Free (fmap (fmap f) a)<br /><P><BR><br />> instance Functor f => Monad (Free f) where<br />> return = Pure<br />> Pure a >>= f = f a<br />> Free a >>= f = Free (fmap (>>= f) a)<br /><P><BR><br /></pre>The usual Haskell typeclass <tt>Monad</tt> corresponds to monads in the category of types and functions, <tt>Hask</tt>. We're going to want monads in the category of endomorphisms of <tt>Hask</tt> which I'll call <tt>Endo</tt>. <P><BR>The objects in <tt>Endo</tt> correspond to Haskell's <tt>Functor</tt>. The arrows in <tt>Endo</tt> are the natural transformations between these functors: <P><BR><pre><br />> type Natural f g = (Functor f, Functor g) => forall a. f a -> g a<br /><P><BR><br /></pre>So now we are led to consider functors in <tt>Endo</tt>. <P><BR><pre><br />> class HFunctor f where<br /><P><BR><br /></pre>A functor in <tt>Endo</tt> must map functors in <tt>Hask</tt> to functors in <tt>Hask</tt>. So if <tt>f</tt> is a functor in <tt>Endo</tt> and <tt>g</tt> is a functor in <tt>Hask</tt>, then <tt>f g</tt> must be another functor in <tt>Hask</tt>. So there must be an <tt>fmap</tt> associated with this new functor. There's an associated <tt>fmap</tt> for every <tt>g</tt> and we collect them all into one big happy natural family: <P><BR><pre><br />> ffmap :: Functor g => (a -> b) -> f g a -> f g b<br /><P><BR><br /></pre>But note also that by virtue of being a functor itself, <tt>f</tt> must have its own <tt>fmap</tt> type function associated with it. 
The arrows in <tt>Endo</tt> are natural transformations in <tt>Hask</tt> so the <tt>fmap</tt> for <tt>HFunctor</tt> must take arrows in <tt>Endo</tt> to arrows in <tt>Endo</tt> like so: <P><BR><pre><br />> hfmap :: (Functor g, Functor h) => Natural g h -> Natural (f g) (f h)<br /><P><BR><br /></pre>Many constructions in the category <tt>Hask</tt> carry over to <tt>Endo</tt>. In <tt>Hask</tt> we can form a product of two types <tt>a</tt> and <tt>b</tt> as <tt>(a, b)</tt>. In <tt>Endo</tt> we form the product of two functors <tt>f</tt> and <tt>g</tt> as <P><BR><pre><br />> data Product f g a = Product (f (g a))<br /><P><BR><br /></pre>Note that this product isn't commutative. We don't necessarily have an isomorphism from <tt>Product f g</tt> to <tt>Product g f</tt>. (This breaks many attempts to transfer constructions from <tt>Hask</tt> to <tt>Endo</tt>.) We also won't explicitly use <tt>Product</tt> because we can simply use the usual Haskell composition of functors inline. <P><BR>We can implement some functions that act on product types in both senses of the word "product": <P><BR><pre><br />> left :: (a -> c) -> (a, b) -> (c, b)<br />> left f (a, b) = (f a, b)<br /><P><BR><br />> right :: (b -> c) -> (a, b) -> (a, c)<br />> right f (a, b) = (a, f b)<br /><P><BR><br />> hleft :: (Functor a, Functor b, Functor c) => Natural a c -> a (b x) -> c (b x)<br />> hleft f = f<br /><P><BR><br />> hright :: (Functor a, Functor b, Functor c) => Natural b c -> a (b x) -> a (c x)<br />> hright f = fmap f<br /><P><BR><br /></pre>(Compare with what I wrote <a href="http://blog.sigfpe.com/2008/11/from-monoids-to-monads.html">here</a>.) <P><BR>We have something in <tt>Endo</tt> a bit like the type with one element in <tt>Hask</tt>, namely the identity functor. The product of a type <tt>a</tt> with the one element type in <tt>Hask</tt> gives you something isomorphic to <tt>a</tt>. In <tt>Endo</tt> the product is composition for which the identity functor is the identity. 
(Two different meanings of the word "identity" there.) <P><BR>We also have sums. For example, if we define a functor like so <P><BR><pre><br />> data F a = A a | B a a<br /><P><BR><br /></pre>we can think of <tt>F</tt> as a sum of two functors: one with a single constructor <tt>A</tt> and another with constructor <tt>B</tt>. <P><BR>We can now think about reproducing an <tt>Endo</tt> flavoured version of lists. The usual definition is isomorphic to: <P><BR><pre><br />> data List a = Nil | Cons a (List a)<br /><P><BR><br /></pre>And it has a <tt>Monoid</tt> instance: <P><BR><pre><br />> instance Monoid (List a) where<br />> mempty = Nil<br />> mappend Nil as = as<br />> mappend (Cons a as) bs = Cons a (mappend as bs)<br /><P><BR><br /></pre>We can try to translate that into <tt>Endo</tt>. The <tt>Nil</tt> part can be thought of as being an element of a type with one element so it should become the identity functor. The <tt>Cons a (List a)</tt> part is a product of <tt>a</tt> and <tt>List a</tt> so that should get replaced by a composition. So we expect to see something vaguely like: <P><BR><pre><br />List' a = Nil' | Cons' (a (List' a))<br /><P><BR><br /></pre>That's not quite right because <tt>List' a</tt> is a functor, not a type, and so acts on types. So a better definition would be: <P><BR><pre><br />List' a b = Nil' b | Cons' (a (List' a b))<br /><P><BR><br /></pre>That's just the definition of <tt>Free</tt>. So free monads are lists in <tt>Endo</tt>. As everyone knows :-) monads are just monoids in the category of endofunctors. Free monads are also just free monoids in the category of endofunctors. <P><BR>So now we can expect many constructions associated with monoids and lists to carry over to monads and free monads. 
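<P><BR>To make the "lists in <tt>Endo</tt>" claim a little more concrete, here is a minimal standalone sketch. It is not part of the literate program in this post (hence no bird tracks), and it redefines <tt>Free</tt> so that it can be compiled on its own. It assumes we pick the pairing functor <tt>(,) w</tt>, in which case a value of <tt>Free ((,) w) a</tt> is just an ordinary list of <tt>w</tt>s with an <tt>a</tt> sitting where <tt>Nil</tt> would be. The names <tt>toList</tt> and <tt>fromList</tt> are made up for this illustration. <P><BR><pre>
-- Standalone sketch, separate from the literate code in this post.
data Free f a = Pure a | Free (f (Free f a))

-- Flatten a free monad over the pairing functor into an ordinary
-- list together with the value stored in the final Pure.
toList :: Free ((,) w) a -> ([w], a)
toList (Pure a)         = ([], a)
toList (Free (w, rest)) = let (ws, a) = toList rest in (w : ws, a)

-- The other direction: rebuild the free monad from a list and a result.
fromList :: [w] -> a -> Free ((,) w) a
fromList ws a = foldr (\w rest -> Free (w, rest)) (Pure a) ws

main :: IO ()
main = do
  let prog = Free (1, Free (2, Free (3, Pure "done"))) :: Free ((,) Int) String
  print (toList prog)
  print (toList (fromList "abc" ()))
</pre>Running <tt>main</tt> should print <tt>([1,2,3],"done")</tt> followed by <tt>("abc",())</tt>. For other functors the "list" branches or takes inputs, but the shape of the recursion is the same. 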
<P><BR>An obvious one is the generalization of the singleton map <tt>a -> List a</tt>: <P><BR><pre><br />> singleton :: a -> List a<br />> singleton a = Cons a Nil<br /><P><BR><br />> hsingleton :: Natural f (Free f)<br />> hsingleton f = Free (fmap Pure f)<br /><P><BR><br /></pre>Another is the generalization of <tt>foldMap</tt>. This can be found under a variety of names in the various <a href="http://www.haskellforall.com/2012/06/you-could-have-invented-free-monads.html">free monad</a> libraries out there but this implementation is designed to highlight the similarity between monoids and monads: <P><BR><pre><br />> foldMap :: Monoid m => (a -> m) -> List a -> m<br />> foldMap _ Nil = mempty<br />> foldMap f (Cons a as) = uncurry mappend $ left f $ right (foldMap f) (a, as)<br /><P><BR><br />> fold :: Monoid m => List m -> m<br />> fold = foldMap id<br /><P><BR><br />> hFoldMap :: (Functor f, Functor m, Monad m) => Natural f m -> Natural (Free f) m<br />> hFoldMap _ (Pure x) = return x<br />> hFoldMap f (Free x) = join $ hleft f $ hright (hFoldMap f) x<br /><P><BR><br />> hFold :: Monad f => Natural (Free f) f<br />> hFold = hFoldMap id<br /><P><BR><br /></pre>The similarity here isn't simply formal. If you think of a list as a sequence of instructions then <tt>foldMap</tt> interprets the sequence of instructions like a computer program. Similarly <tt>hFoldMap</tt> can be used to <a href="http://debasishg.blogspot.com/2013/01/a-language-and-its-interpretation.html">interpret</a> programs for which the free monad provides an abstract syntax tree. <P><BR>You'll find some of these functions <a href="https://hackage.haskell.org/package/control-monad-free-0.5.3/docs/Control-Monad-Free.html">here</a> by different names. <P><BR>Now we can consider <tt>Free</tt>. It's easy to show this is a <tt>HFunctor</tt> by copying a suitable definition for <tt>List</tt>: <P><BR><pre><br />> instance Functor List where<br />> fmap f = foldMap (singleton . 
f)<br /><P><BR><br />> instance HFunctor Free where<br />> ffmap = fmap<br />> hfmap f = hFoldMap (hsingleton . f)<br /><P><BR><br /></pre>We can define <tt>HMonad</tt> as follows: <P><BR><pre><br />> class HMonad m where<br />> hreturn :: Functor f => f a -> m f a<br />> hbind :: (Functor f, Functor g) => m f a -> Natural f (m g) -> m g a<br /><P><BR><br /></pre>Before making <tt>Free</tt> an instance, let's look at how we'd make <tt>List</tt> an instance of <tt>Monad</tt><P><BR><pre><br />> instance Monad List where<br />> return = singleton<br />> m >>= f = fold (fmap f m)<br /><P><BR><br /></pre>And now the instance I promised at the beginning. <P><BR><pre><br />> instance HMonad Free where<br />> hreturn = hsingleton<br />> hbind m f = hFold (hfmap f m)<br /><P><BR><br /></pre>I've skipped the proofs that the monad laws hold and that <tt>hreturn</tt> and <tt>hbind</tt> are actually natural transformations in <tt>Endo</tt>. Maybe I'll leave those as exercises for the reader. <P><BR><b>Update</b><p>After writing this I tried googling for "instance HMonad Free" and I found <a href="http://lpaste.net/93806">this</a> by haasn. There's some other good stuff in there too. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com6tag:blogger.com,1999:blog-11295132.post-57154742591009961052014-02-01T17:53:00.000-08:002014-02-01T21:24:33.184-08:00Reinversion Revisited<b>Introduction</b><p>A <a href="http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-of-control.html">while back</a> I talked about the idea of reinversion of control using the continuation monad to wrest control back from an interface that only wants to call you, but doesn't want you to call them back. I want to return to that problem with a slightly different solution. The idea is that we build an interpreter for an imperative language that's an embedded Haskell DSL. 
You arrange that the DSL does the work of waiting to be called by the interface, but from the point of view of the user of the DSL it looks like you're calling the shots. To do this I'm going to pull together a bunch of techniques I've talked about before. This approach is largely an application of what apfelmus described <a href="http://apfelmus.nfshost.com/articles/operational-monad.html">here</a>. <P><BR><b>The code</b><p>We'll start with some administrative stuff before getting down to the real code: <P><BR><pre><br />> {-# LANGUAGE TemplateHaskell #-}<br /><P><BR><br />> import Control.Lens<br />> import Control.Monad<br />> import Control.Monad.Loops<br /><P><BR><br /></pre>We'll make our DSL an imperative wrapper around Gloss: <P><BR><pre><br />> import Graphics.Gloss.Interface.Pure.Game<br /><P><BR><br /></pre>We'll define a structure that can be used to represent the abstract syntax tree (AST) of our DSL. Our DSL will support the reading of inputs, adding pictures to the current picture, and clearing the screen. <P><BR>First we'll need a wrapper that allows us to represent ordinary Haskell values in our DSL: <P><BR><pre><br />> data Basic a = Return a<br /><P><BR><br /></pre>Now we want an expression that represents events given to us by Gloss. Internally we'll represent this by a function that says what our program does if it's given an event. It says what our program does by returning another AST saying what happens when the input is received. (I've previously talked about these kinds of expression trees <a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html">here</a>). <P><BR><pre><br />> | Input (Event -> Basic a)<br /><P><BR><br /></pre>We have a command to render some graphics. It appends a new <tt>Picture</tt> to the current picture. 
Again, part of the AST must be another AST saying what happens after the picture is rendered: <P><BR><pre><br />> | Render Picture (Basic a)<br /><P><BR><br /></pre>And lastly here's the AST for a clear screen command: <P><BR><pre><br />> | Cls (Basic a)<br /><P><BR><br /></pre>Our AST will form a monad. This will allow us to build ASTs using ordinary Haskell do-notation. This technique is what I described previously <a href="http://blog.sigfpe.com/2010/01/monads-are-trees-with-grafting.html">here</a>. <P><BR><pre><br />> instance Monad Basic where<br />> return = Return<br />> Return a >>= f = f a<br />> Input handler >>= f = Input (\e -> handler e >>= f)<br />> Render p a >>= f = Render p (a >>= f)<br />> Cls a >>= f = Cls (a >>= f)<br /><P><BR><br /></pre>You can think of the expression <tt>x >>= f</tt> as <tt>x</tt> with the tree <tt>f a</tt> grafted in to replace any occurrence of <tt>Return a</tt> in it. This is exactly what <tt>Return a >>= f</tt> does. But applying <tt>>>= f</tt> to the other ASTs simply digs down "inside" the ASTs to find other occurrences of <tt>Return a</tt>. <P><BR>It's convenient to use lenses to view Gloss's game world: <P><BR><pre><br />> data World = World { _program :: Basic (), _picture :: Picture }<br />> $(makeLenses ''World)<br /><P><BR><br /></pre>And now we have some wrappers around the interpreter's commands. The <tt>return ()</tt> provides the convenient place where we can graft subtrees into our AST. <P><BR><pre><br />> input = Input return<br />> render p = Render p (return ())<br />> cls = Cls (return ())<br /><P><BR><br /></pre>Now we can start coding. Here's a test to see if a Gloss event is a key down event: <P><BR><pre><br />> keydown (EventKey (Char key) Down _ _) = True<br />> keydown (EventKey (SpecialKey KeySpace) Down _ _) = True<br />> keydown _ = False<br /><P><BR><br /></pre>And now here's a complete program using our DSL. It's deliberately very imperative. 
It simply iterates over a nested pair of loops, collecting keystrokes and displaying them. It reads a lot like an ordinary program written in a language like Python or Basic: <P><BR><pre><br />> mainProgram = do<br />> render (Color white $ Scale 0.2 0.2 $ Text "Type some text")<br /><P><BR><br />> forM_ [780, 760..] $ \ypos -> do<br />> forM_ [0, 20..980] $ \xpos -> do<br /><P><BR><br />> event <- iterateUntil keydown $ input<br /><P><BR><br />> let key = case event of<br />> EventKey (Char key) Down _ _ -> key<br />> EventKey (SpecialKey KeySpace) Down _ _ -> ' '<br /><P><BR><br />> when (ypos == 780 && xpos == 0) $ cls<br />> render $ Color white $ Translate (xpos-500) (ypos-400) $ Scale 0.2 0.2 $ Text $ [key]<br /><P><BR><br /></pre>Here is where we launch everything, placing our program and starting <tt>Blank</tt> picture into the <tt>World</tt>. <P><BR><pre><br />> main = play (InWindow "Basic" (1000, 800) (10, 10))<br />> black <br />> 60<br />> (World mainProgram Blank)<br />> (^. picture)<br />> handleEvent<br />> (const id)<br /><P><BR><br /></pre>So now we need just one more ingredient, an actual interpreter for our AST. It's the event handler: <P><BR><pre><br />> handleEvent :: Event -> World -> World<br /><P><BR><br /></pre>The <tt>Return</tt> command is purely a place to graft in subtrees. It should never be interpreted. <P><BR><pre><br />> handleEvent _ (World (Return a) _) = error "error!"<br /><P><BR><br /></pre>After receiving some input, I want the interpreter to keep interpreting commands such as <tt>Cls</tt> that don't need any more input. I'm going to do this by using a null event <tt>EventMotion (0,0)</tt>. But when an input really is desired, I want this null event to be ignored. <P><BR><pre><br />> handleEvent (EventMotion (0, 0)) state@(World (Input handler) _) = state<br /><P><BR><br /></pre>We render something by <tt>mappend</tt>ing it to the current picture stored in the <tt>World</tt>. 
But the rendering is carried out by the event handler. We update the state so that at the next event, the subtree of the AST is executed. This means that after updating the picture, the event still needs to be handed back to the event handler: <P><BR><pre><br />> handleEvent event state@(World (Render p cont) _) = state & (picture <>~ p) & (program .~ cont) & handleEvent event<br /><P><BR><br /></pre>Clearing the screen is similar: <P><BR><pre><br />> handleEvent event state@(World (Cls cont) _) = state & (picture .~ Blank) & (program .~ cont) & handleEvent event<br /><P><BR><br /></pre>And now we need to handle inputs. We do this by applying the "what happens when the input is received" function to the event. The result is put back in the state indicating that this is what we want to happen at the next event. So that the interpreter doesn't stop here waiting for the next event, it sends itself a null event. <P><BR><pre><br />> handleEvent event state@(World (Input handler) _) = state & (program .~ handler event) & handleEvent (EventMotion (0, 0))<br /><P><BR><br /></pre>And that's it! <P><BR>There are many changes that can be made. We can easily add more commands and make the state more complex. But you might also notice that we create the AST only to tear it apart again in the interpreter. We can actually elide the AST creation, but that will eventually bring us back to something like what I originally <a href="http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-of-control.html">posted</a>. This shouldn't be a big surprise: I've already shown how any monad can be replaced with the continuation monad <a href="http://blog.sigfpe.com/2008/12/mother-of-all-monads.html">here</a>. By the way, it's pretty easy to add a <tt>Fork</tt> command. You can replace the <tt>_program :: Basic ()</tt> field with <tt>_program :: [Basic ()]</tt> and interpret this as a list of threads using a scheduler of your choice. 
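<P><BR>As a rough illustration of the <tt>Fork</tt> idea, here is a minimal standalone sketch. It is not the Gloss interpreter above: it assumes a cut-down AST in which a hypothetical <tt>Print</tt> command stands in for <tt>Render</tt>, there is no <tt>Input</tt>, and programs are built directly from constructors rather than with do-notation. The scheduler, called <tt>schedule</tt> here purely for illustration, is a simple round-robin over a list of threads; any other policy would slot in the same way. <P><BR><pre>
-- Standalone sketch: a cut-down AST with a Fork command.
data Basic a
  = Return a
  | Print String (Basic a)     -- emit a line, then continue
  | Fork (Basic ()) (Basic a)  -- start a child thread, then continue

-- Round-robin scheduler: run one command from the thread at the head
-- of the queue, then move what is left of that thread to the back.
-- The result is the list of lines printed, in order.
schedule :: [Basic ()] -> [String]
schedule []                  = []
schedule (Return () : ts)    = schedule ts                  -- thread finished
schedule (Print s k : ts)    = s : schedule (ts ++ [k])
schedule (Fork child k : ts) = schedule (ts ++ [k, child])  -- child joins the queue

-- A thread that prints its name followed by 1, 2, ..., n.
worker :: String -> Int -> Basic ()
worker name n = foldr (\i k -> Print (name ++ show i) k) (Return ()) [1 .. n]

main :: IO ()
main = mapM_ putStrLn (schedule [Fork (worker "b" 2) (worker "a" 2)])
</pre>With this scheduler the two threads interleave, so <tt>main</tt> should print <tt>a1</tt>, <tt>b1</tt>, <tt>a2</tt>, <tt>b2</tt> on separate lines. In the Gloss setting a thread sitting at an <tt>Input</tt> would additionally have to wait for a real event, which this sketch leaves out. 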
<P><BR><b>Acknowledgements</b><p>I was prompted to write this (a little late, I know) after reading <a href="http://www.thev.net/PaulLiu/invert-inversion.html">this article</a> and Tekmo's <a href="http://www.reddit.com/r/haskell/comments/z4inb/invert_the_inversion_of_control/c61fzrd">post on reddit</a>. I think ultimately continuations may perform better than using ASTs. But sometimes it's nice to build an AST because they give you an object that can easily be reasoned about and manipulated by code. Much as I love trickery with continuations, I find ASTs are much easier to think about. <P><BR><b>Postscript</b><p>My real motivation was that I was thinking about games. The rules of games are often given in imperative style: first player 1 does this. Then they do this. If this happens they do that. And then it's player two's turn. I wanted my Haskell code to reflect that style. <P><BR><b>Update</b><p>Added 'null' event to keep interpreter going when it makes sense to do so, but there's no event pending. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com4tag:blogger.com,1999:blog-11295132.post-28604911791596145242013-10-25T21:00:00.000-07:002013-10-29T18:50:14.446-07:00Distributed computing with alien technology<b>Introduction</b><p>Suppose we are given a function <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> of <img src="https://chart.googleapis.com/chart?cht=tx&chl=2n" style="vertical-align:middle"> boolean arguments that returns a boolean result. 
Alice has <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> bits, <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5C%7Bx_1%2C%20x_2%2C%20%5Cdots%2C%20x_n%5C%7D" style="vertical-align:middle"> and Bob has another <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> bits <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5C%7By_1%2C%20y_2%2C%20%5Cldots%2C%20y_n%5C%7D" style="vertical-align:middle">. Alice and Bob are widely separated and don't know each other's bits. What is the total number of bits that Alice has to send to Bob and that Bob has to send to Alice so that between them they can compute <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28x_1%2C%20%5Cldots%2C%20x_n%2C%20y_1%2C%20%5Cldots%2C%20y_n%29" style="vertical-align:middle">? Think about how complex <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> might get. The <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_i" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=y_i" style="vertical-align:middle"> might each describe half of a "voxelised" region of space and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> might answer a question about a computational fluid dynamics (CFD) simulation running in that space. CFD simulations can be chaotic and so we might expect that in the worst case many bits have to be transferred back and forth between Alice and Bob. In the worst case we might expect that Alice has to send Bob all of her bits, or vice versa. <P><BR>But in fact Alice needs to send Bob just one bit. <P><BR><b>A loophole</b><p>To get the communication requirements down to one bit we need to use a loophole. But I hope to (1) justify the cheat to some extent and (2) justify that it's even worthwhile to think about cheats. 
<P><BR>Alice and Bob have access to some <a href="http://en.wikipedia.org/wiki/Ancient_technology_in_Stargate">Ancient technology</a>. They each have one of a pair of boxes. At prearranged times, Alice puts a bit into her box, and Bob puts a bit into his box. A bit pops back out of Alice's box and a bit pops back out of Bob's box. Whatever the input, both Alice and Bob have a 0.5 chance of seeing a one or zero pop out of their respective boxes. But when the two outputs are <a href="http://en.wikipedia.org/wiki/Exclusive_or">XORed</a> together the result is the logical AND of the two inputs. With such boxes, Alice can compute <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> after Bob sends a single bit down a conventional communication channel. <P><BR> <center><a href="http://4.bp.blogspot.com/-W8wrhhYGSYo/Ums6t6E8B2I/AAAAAAAAEUY/38oA2Cdk2fs/s1600/nonlocal.png" imageanchor="1"><img border="0" src="http://4.bp.blogspot.com/-W8wrhhYGSYo/Ums6t6E8B2I/AAAAAAAAEUY/38oA2Cdk2fs/s400/nonlocal.png" /></a></center> <P><BR>"But this is a total cheat!" you complain before I even start to explain their technique. It seems Alice receives a bit that depends on what Bob input, and so Bob is communicating with Alice. But look closely and you'll see that the boxes don't allow any communication. No matter what Bob inputs, Alice has a 0.5 chance of getting zero or one. There is no way Bob can use this to communicate anything. It's like intercepting a message encrypted with a <a href="http://en.wikipedia.org/wiki/One-time_pad">one time pad</a>. Without the pad, the message is basically a sequence of random bits. Nonetheless, it is true that the outputs that Alice and Bob see are correlated. <P><BR>I hope I've convinced you that Alice and Bob can't send any bits with these boxes. Despite this, it is pretty clear that the behaviour of the boxes is non-local. 
We'll call any kind of boxes that allow instantaneous long range correlations that can't be explained by purely local behaviour <i>non-local boxes</i>. Boxes that can't be used for message sending are called <i>non-signalling local boxes</i>. And the particular non-local box I describe above is called a <i>PR box</i> (eg. see <a href="http://en.wikipedia.org/wiki/Quantum_nonlocality#Superquantum_nonlocality">here</a>). <P><BR>(As an aside, note that because the box results in widely separated outputs that are correlated, but doesn't allow communication, it's an example of how non-locality doesn't imply communication. Usually when people want to give examples of such a thing they talk about quantum mechanics. But there's no need to mention quantum mechanics to explain the behaviour of these particular non-local boxes.) <P><BR><b>The method</b><p>Any single bit boolean function of a finite sequence of bits can be written as a polynomial modulo 2. Each monomial in the polynomial can be written as a product of terms involving just the <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_i" style="vertical-align:middle"> and terms involving just the <img src="https://chart.googleapis.com/chart?cht=tx&chl=y_i" style="vertical-align:middle">, ie. 
<blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=f%28x_1%2C%20%5Cldots%2C%20x_n%2C%20y_1%2C%20%5Cldots%2C%20y_n%29%20%3D%20%5Csum_%7Bj%7DA_jB_j" style="vertical-align:middle"></blockquote>where <img src="https://chart.googleapis.com/chart?cht=tx&chl=A_j" style="vertical-align:middle"> depends only on the <img src="https://chart.googleapis.com/chart?cht=tx&chl=x_i" style="vertical-align:middle">, <img src="https://chart.googleapis.com/chart?cht=tx&chl=B_j" style="vertical-align:middle"> depends only on the <img src="https://chart.googleapis.com/chart?cht=tx&chl=y_i" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=j" style="vertical-align:middle"> is drawn from some finite set. Alice can compute the <img src="https://chart.googleapis.com/chart?cht=tx&chl=A_j" style="vertical-align:middle"> and Bob can compute the <img src="https://chart.googleapis.com/chart?cht=tx&chl=B_j" style="vertical-align:middle">. Now Alice and Bob, in parallel, feed <img src="https://chart.googleapis.com/chart?cht=tx&chl=A_j" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=B_j" style="vertical-align:middle"> respectively into their PR boxes. We know that we could evaluate each term in the sum we want by adding Alice's output to Bob's output. But that would require sending one one-bit message for each <img src="https://chart.googleapis.com/chart?cht=tx&chl=j" style="vertical-align:middle">. But we don't need each term one by one; we just want the sum. So Alice and Bob can individually sum their separate outputs knowing that adding Alice's output and Bob's output modulo 2 will be the correct sum. So Bob sends his sum to Alice. Alice adds that number to her own (modulo 2) and that's the value we want. Only one one-bit message was sent. <P><BR><b>But what about reality?</b><p>Non-local boxes don't exist, do they? So why are we talking about them? 
<P><BR>Actually, non-local boxes exist both theoretically and in the lab. <a href="http://en.wikipedia.org/wiki/Quantum_nonlocality">Non-local correlations in quantum mechanics</a> allow them to be constructed. But for this article I wanted to abstract from quantum mechanics and talk about the behaviour of a non-local box without getting my hands dirty with the details of quantum mechanics. Having said that, although non-local boxes do exist, the special case of the PR box <a href="http://en.wikipedia.org/wiki/Tsirelson's_bound">can't</a> in fact be constructed with quantum mechanics. In some sense it allows correlations that are "too strong". An <a href="http://blog.sigfpe.com/2010/11/beating-odds-with-entangled-qubits.html">article</a> I wrote a while back describes the closest you can get to building a PR box with quantum correlations. Curiously, if you restrict yourself to the kind of non-local box quantum mechanics allows you to build you find that some functions can still be computed with less communication than you'd need if non-local correlations are disallowed. Nonetheless, the worst case scenario with QM still requires the sending of <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle"> bits. <P><BR>Going further there's an interesting conjecture. It says that any non-local box that is even marginally better (in some sense) than what quantum mechanics allows is powerful enough to allow the computation of any <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> with only a single bit of communication. It suggests that quantum mechanics is right at the edge of the space of possible physics that make life difficult for us. If quantum mechanics were to be tweaked the tiniest amount to make correlations any stronger, large numbers of difficult distributed computing problems would suddenly collapse to become trivial. 
If the conjecture is true it means that nature looks a bit like a conspiracy to keep computer scientists in work. (It's possible the conjecture has been decided one way or the other by now.) <P><BR><b>Final words</b><p>There are a couple of papers about universes where PR boxes can be built; so called <a href="http://arxiv.org/abs/1210.0618">boxworlds</a>. There is a <a href="http://arxiv.org/abs/quant-ph/0211089">lot</a> of interesting theoretical work in characterising quantum mechanics. In particular there are a number of theorems and conjectures that describe QM in the form "the most X theory that doesn't allow Y" where X is an interesting property and Y is something you'd like to do. <P><BR><b>References</b><p>I learnt all of this from the paper <a href="http://arxiv.org/abs/quant-ph/0501159">Implausible Consequences of Superstrong Nonlocality</a> by Wim van Dam. Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com3tag:blogger.com,1999:blog-11295132.post-23255842911365007142013-10-12T09:12:00.000-07:002013-10-12T09:47:35.839-07:00What stops us defining Truth?<b>Introduction</b><br />Recall the standard cartoon sketch of the proof of Gödel's first <a href="http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems">incompleteness theorem</a>. We start by defining a predicate, <img src="https://chart.googleapis.com/chart?cht=tx&chl=Prov" style="vertical-align: middle;" />, that is true if and only if its argument is provable. (Or more accurately, <img src="https://chart.googleapis.com/chart?cht=tx&chl=Prov%28g%29" style="vertical-align: middle;" /> is true if <img src="https://chart.googleapis.com/chart?cht=tx&chl=g" style="vertical-align: middle;" /> is the Gödel number of a provable proposition.) 
With some <a href="http://en.wikipedia.org/wiki/Quine_%28computing%29">quining</a> we can use this to construct the proposition <img src="https://chart.googleapis.com/chart?cht=tx&chl=G" style="vertical-align: middle;" /> which says <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cneg%20Prov%28G%29" style="vertical-align: middle;" />. The proposition <img src="https://chart.googleapis.com/chart?cht=tx&chl=G" style="vertical-align: middle;" /> asserts its own unprovability. <br />Suppose instead we define a predicate <img src="https://chart.googleapis.com/chart?cht=tx&chl=True" style="vertical-align: middle;" /> which holds if its argument is true. We can use this to construct the proposition <img src="https://chart.googleapis.com/chart?cht=tx&chl=H" style="vertical-align: middle;" /> which says <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cneg%20True%28H%29" style="vertical-align: middle;" />. Then if <img src="https://chart.googleapis.com/chart?cht=tx&chl=H" style="vertical-align: middle;" /> is true it must also be false and if it's false then it must be true. We seem to have a paradox. The loophole is that we assumed the existence of the predicate <img src="https://chart.googleapis.com/chart?cht=tx&chl=True" style="vertical-align: middle;" />. So this argument demonstrates that there is actually no such predicate. This is <a href="http://en.wikipedia.org/wiki/Tarski's_undefinability_theorem">Tarski's undefinability theorem</a>. <br />But what exactly stops us defining <img src="https://chart.googleapis.com/chart?cht=tx&chl=True" style="vertical-align: middle;" />? What goes wrong if we attempt to define a predicate that analyses the parts of a proposition to tell us whether or not it is true? <br /><br /><b>Note</b><br />This article is written in English. 
But as is standard in much of mathematics, unless I state otherwise, I'm using English largely as shorthand for an argument that could, in principle, be written in the formal language of Set Theory. So I will allow myself to use all of the usual reasoning methods that are available in ZF, even when talking about other formal systems such as Peano Arithmetic. <br /><br /><b>Defining Truth for Propositional Calculus</b><br />Suppose we're given a proposition from propositional calculus like <img src="https://chart.googleapis.com/chart?cht=tx&chl=A%5Cwedge%20B" style="vertical-align: middle;" />. We can use a syntactic approach to determining whether or not it is true. We determine whether or not <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> is true, then whether or not <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" /> is true, and then the whole proposition is true if both <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" /> are true. Similarly <img src="https://chart.googleapis.com/chart?cht=tx&chl=A%5Cvee%20B" style="vertical-align: middle;" /> is true if either <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> or <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" /> is true. 
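<br />Here is a minimal Haskell sketch of this syntactic approach. The names <code>Prop</code>, <code>Assignment</code> and <code>truth</code> are illustrative choices for this example, not anything fixed by the argument, and the sketch already handles compound subexpressions by recursion and takes the truth values of atoms as given: <br /><pre>
-- Propositions of propositional calculus over named atoms.
data Prop
  = Atom String
  | Not Prop
  | And Prop Prop
  | Or Prop Prop

-- An assignment gives a truth value to each atomic proposition.
type Assignment = String -> Bool

-- The truth of a proposition is defined from the truth of its parts.
truth :: Assignment -> Prop -> Bool
truth v (Atom name) = v name
truth v (Not p)     = not (truth v p)
truth v (And p q)   = and [truth v p, truth v q]
truth v (Or p q)    = or [truth v p, truth v q]
</pre>For example, with the assignment that makes only the atom A true, <code>truth (\name -> name == "A") (And (Atom "A") (Or (Atom "B") (Not (Atom "B"))))</code> evaluates to <code>True</code>. <br />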
Of course <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" /> might themselves be compound propositions using <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cwedge" style="vertical-align: middle;" />, <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cvee" style="vertical-align: middle;" /> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cneg" style="vertical-align: middle;" />. But that's fine, that simply means that to define truth for such propositions we need to employ recursion. In fact, we can straightforwardly turn such a definition into a recursive computer program. <br />(Ultimately with propositional calculus we hit the leaves which are atomic propositions like <img src="https://chart.googleapis.com/chart?cht=tx&chl=P" style="vertical-align: middle;" />. Typically when we ask about the truth of a proposition in propositional calculus we've already made an assignment of truth values to the atomic propositions. So the base case for the recursion is straightforward.) <br />We can illustrate the process with a diagram: <br /><a href="http://1.bp.blogspot.com/-MtdQbPOOZWg/Ula_Cpkz76I/AAAAAAAAEQ0/VTie_xMlRNo/s1600/tree1.png" imageanchor="1"><img border="0" src="http://1.bp.blogspot.com/-MtdQbPOOZWg/Ula_Cpkz76I/AAAAAAAAEQ0/VTie_xMlRNo/s1600/tree1.png" /></a> <br />The truth value of a node in the tree is determined by the truth of the propositions hanging underneath it. We have a parent-child relation between a proposition and its subexpressions. Recursion allows us to make a definition by defining what happens on the leaves of such a tree, and by saying how the definition at a node is built from that of its children. <br /><br /><b>Defining truth for Peano Arithmetic</b><br />We can go further and attempt this approach with Peano Arithmetic (PA). 
The catch is that we need to consider quantifiers. For example, consider this proposition from Peano arithmetic: <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cforall%20x.%5Cneg%28Sx%3D0%29" style="vertical-align: middle;" />. This proposition is true if and only if <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cneg%28Sx%3D0%29" style="vertical-align: middle;" /> is true whatever number we substitute for <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align: middle;" /> in the expression. <br /><a href="http://3.bp.blogspot.com/-B_7nYmPprKo/UlbNG_Qj4EI/AAAAAAAAERE/LKs7sGwb45Q/s1600/tree2.png" imageanchor="1"><img border="0" src="http://3.bp.blogspot.com/-B_7nYmPprKo/UlbNG_Qj4EI/AAAAAAAAERE/LKs7sGwb45Q/s1600/tree2.png" /></a> <br />The proposition at the top of the tree above is true if all of the immediate children are true and their truth is in turn determined by the truth of the propositions immediately below them. With some work this eventually leads to a perfectly good definition of truth for propositions in PA. Because we have nodes with infinitely many children we don't get an algorithm guaranteed to terminate, but that's not a problem for a definition in ZF. Note that we don't literally prove the infinitely many child propositions one at a time. Instead what happens is that to define the truth of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cforall%20x.Q" style="vertical-align: middle;" /> we define it in terms of the truth of some infinite family of propositions all based on <img src="https://chart.googleapis.com/chart?cht=tx&chl=Q" style="vertical-align: middle;" />. ZF is perfectly good at dealing with such definitions without us having to list every element of our family explicitly. <br />Note how in this case the tree isn't the parse tree of the proposition. It's much bigger with nodes that have infinite branching. 
But that's fine, there's nothing about infinite branching that prevents us making a recursive definition. So we can ultimately extend the idea for defining truth in propositional calculus to include quantifiers and then all of Peano arithmetic. <br /><br /><b>Defining truth for ZF</b><br />But the approach used for PA looks like it might work perfectly well for ZF as well. For example, our definition of truth would say that <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5Cforall%20x.P%28x%29" style="vertical-align: middle;" /> is true if <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28x%29" style="vertical-align: middle;" /> is true whatever set we substitute for <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align: middle;" />. In ZF there is no difficulty in defining a predicate that uses quantification over all sets. So it seems we can define <img src="https://chart.googleapis.com/chart?cht=tx&chl=True" style="vertical-align: middle;" /> for ZF in ZF, contradicting Tarski's theorem. <br /><br /><b>What went wrong?</b><br />Recursive definitions typically rely on the parent-child relation I mentioned above. To recursively define something we (1) define it for all leaves and then (2) specify how the definition at a parent is given in terms of the value for all of its children. We then invoke a recursion theorem of some sort to show how this uniquely defines our object for everything in our universe. For example, one form of recursion in Peano arithmetic has <img src="https://chart.googleapis.com/chart?cht=tx&chl=0" style="vertical-align: middle;" /> as its leaf and the only child of <img src="https://chart.googleapis.com/chart?cht=tx&chl=S%28n%29" style="vertical-align: middle;" /> is <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align: middle;" />. The induction axiom for PA can be used to show that definitions using this parent-child relation are valid. 
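<br />As a hedged illustration of this pattern, here is a Haskell sketch of recursion over the natural numbers. The names <code>Nat</code>, <code>recNat</code>, <code>toInt</code> and <code>isEven</code> are hypothetical, chosen for this example. A definition is given by its value at the leaf 0 together with a rule for building the value at S(n) from the value at its only child n: <br /><pre>
-- Natural numbers: Z is the leaf, and the only child of S n is n.
data Nat = Z | S Nat

-- Recursion principle: give the value at the leaf and say how the
-- value at a parent is built from the value at its child.
recNat :: a -> (a -> a) -> Nat -> a
recNat leaf _    Z     = leaf
recNat leaf step (S n) = step (recNat leaf step n)

-- Two definitions made this way.
toInt :: Nat -> Integer
toInt = recNat 0 (+ 1)

isEven :: Nat -> Bool
isEven = recNat True not
</pre>This is only an analogy: Haskell's type checker accepts <code>recNat</code> without any appeal to an induction axiom, and the definition is only guaranteed to produce an answer on finite values of <code>Nat</code>. 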
<br />Similarly in ZF we have the empty set as leaf and the children of a set are simply its elements. But now we need to look closely at the recursion principle we need. For ZF we need to invoke the <a href="http://en.wikipedia.org/wiki/Transfinite_induction#Transfinite_recursion">Transfinite Recursion Theorem</a>. Transfinite recursion is very powerful. It's not just limited to induction over sets. It can also be used for induction over classes. For example if you need to recursively define a function on the class of all sets it can allow this. (Strictly speaking it'll be a <i>function class</i> rather than a function.) But now comes the catch. If you take a look at the Wikipedia article it mentions that the parent-child relation, <img src="https://chart.googleapis.com/chart?cht=tx&chl=R" style="vertical-align: middle;" />, needs to be <i>set-like</i> (though as the article is currently written it's almost an afterthought). For this theorem to apply we need the collection of children of a proposition to form a set. But to prove the truth of a proposition with a quantifier at the front we need to prove something is true for all children where there is one child for each set. This means the children don't form a set. So we can't use transfinite recursion. And this means the informal definition of truth I gave above can't be turned into a rigorous definition. <br /><br /><b>Conclusion</b><br />I think this issue is quite subtle. It's really easy to say in English "this thing is true if that thing is true for all sets". Such a sentence in isolation can often be turned into a rigorous proposition in ZF. But if that sentence is part of a collection of sentences that refer to each other forming an attempt at a mutually recursive definition, you need to check precisely what parent-child relation you're using. 
<br /><br /><b>Acknowledgement</b><br />Thanks to <a href="http://mathoverflow.net/users/3902/sridhar-ramesh">Sridhar Ramesh</a> for making clear to me why the attempted definition of truth in ZF doesn't work. But I've probably made some mistakes above and they have nothing to do with Sridhar.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com5tag:blogger.com,1999:blog-11295132.post-66458143520873744132013-04-14T18:08:00.003-07:002013-04-14T18:08:36.649-07:00Why Heisenberg can't stop atomic collapse<h4>TL;DR</h4><div>A heuristic argument to show that hydrogen atoms are stable and have a minimum energy level is wrong. I will assume undergraduate level quantum mechanics in the discussion.</div><h4>Introduction</h4>There's a popular argument used to explain why atoms are stable. It shows there is a lower bound on the energy level of an electron in the atom that makes it impossible for electrons to keep "falling" forever all the way down to the nucleus. You'll <a href="http://www.chem1.com/acad/webtut/atomic/WhyTheElectron.html">find</a> it not only in popular <a href="http://books.google.com/books?id=KsnbNL_rh04C&lpg=PP1&pg=SA2-PA6#v=onepage&q=compromise&f=false">science</a> <a href="http://books.google.com/books?id=-vhCqN2twGQC&lpg=PA634&dq=hydrogen%20energy%20level%20uncertainty%20principle&pg=PA102#v=onepage&q=uncertainty%20principle&f=false">books</a> but in <a href="http://web.njit.edu/~gary/321/Lecture5.html">courses</a> and textbooks on <a href="http://www.pha.jhu.edu/~rt19/hydro/node1.html">quantum</a> mechanics.<br /><br />A rough version of the argument goes like this: the closer an electron falls towards the nucleus the lower its potential energy gets. But the more closely bound to the nucleus it is, the more accurately we know its position and hence, by Heisenberg's uncertainty principle (HUP), the less accurately we know its momentum. Increased variance in the momentum corresponds to an increase in kinetic energy. 
Eventually the decrease in potential energy as the electron falls is balanced by an increase in kinetic energy and the electron has reached a stable state.<br /><br />The problem is, this argument is wrong. It's wrong for reasons related to the kind of heuristic reasoning about wavefunctions that I've talked about <a href="http://blog.sigfpe.com/2013/01/aliasing-and-heisenberg-uncertainty.html">before</a>.<br /><br />Before showing it's wrong, let's make the argument a bit more rigorous.<br /><h4>Bounding wavefunctions</h4><div class="separator" style="clear: both; text-align: left;">The idea is to show that for any possible normalised wavefunction ψ of an electron in a Coulomb potential, the expected energy is bounded below by some constant. So we need to show that</div><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|E|\psi\rangle" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|E|\psi\rangle" /></a></div><div class="" style="clear: both; text-align: left;">is bounded below where</div><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=E=\frac{p^2}{2m}%2D\frac{e^2}{4\pi\epsilon_0r}" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=E=\frac{p^2}{2m}%2D\frac{e^2}{4\pi\epsilon_0r}" /></a></div><div class="" style="clear: both; text-align: left;">and <i>p</i> is momentum.</div><div class="" style="clear: both; text-align: left;">Consider a wavefunction that is confined mainly around the nucleus so</div><div class="" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|x^2|\psi\rangle=a^2" imageanchor="1" style="margin-left: 1em; margin-right: 
1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|x^2|\psi\rangle=a^2" style="cursor: move;" /></a></div><br /></div><div class="" style="clear: both; text-align: left;">The first fact we need is that the <a href="http://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg uncertainty principle</a> tells us that </div><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|x^2|\psi\rangle\langle\psi|p^2|\psi\rangle\ge\hbar^2/4" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|x^2|\psi\rangle\langle\psi|p^2|\psi\rangle\ge\hbar^2/4" /></a></div>(assuming we're in a frame of reference where the expected values of <i>p</i> and <i>x</i> are zero).<br /><br />If the wavefunction is spread out with a standard deviation of <i>a</i> then the electron is mostly around a distance <i>a</i> from the nucleus. So the second fact is that we can roughly approximate the expected value of 1/<i>r</i> as 1/<i>a</i>.<br /><br />Combine these two facts and we get, roughly, that<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|E|\psi\rangle\ge\frac{\hbar^2}{8ma^2}-\frac{e^2}{4\pi\epsilon_0a}" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=\langle\psi|E|\psi\rangle\ge\frac{\hbar^2}{8ma^2}-\frac{e^2}{4\pi\epsilon_0a}" style="cursor: move;" /></a></div>I hope you can see that the right hand side, as a function of <i>a</i>, is bounded below. 
The graph of the right hand side as a function of <i>a</i> looks like:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-zoXWqteM-gM/UUJswRdpssI/AAAAAAAACFU/VAZ4W8awVN0/s1600/potential.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="206" src="http://4.bp.blogspot.com/-zoXWqteM-gM/UUJswRdpssI/AAAAAAAACFU/VAZ4W8awVN0/s320/potential.png" width="320" /></a></div>It's now an exercise in calculus to find a lower bound on the expected energy. You can find the details in countless places on the web. Here's a link to an example from <a href="http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-007-electromagnetic-energy-from-motors-to-lasers-spring-2011/lecture-notes/MIT6_007S11_lec38.pdf">MIT</a>, which may have come directly from Feynman's Lectures on Physics.<br /><br /><h4>The problem</h4><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-HV8KT3aarrg/UUJvvFtF8MI/AAAAAAAACFc/3BIZYD2AbG0/s1600/fuzz.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="http://1.bp.blogspot.com/-HV8KT3aarrg/UUJvvFtF8MI/AAAAAAAACFc/3BIZYD2AbG0/s320/fuzz.png" width="320" /></a></div>The above discussion assumes that the wavefunction is basically a single packet confined around a distance <i>a</i> from the nucleus, something like that graphed above. But if a lower energy state can be found with a different wavefunction the electron will eventually find it, or an even lower energy state. 
In fact, by using a wavefunction with multiple peaks we will find that the Heisenberg uncertainty principle doesn't give a lower bound at all.<br /><br />We'll use a wavefunction like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-98fhKE2Xkc0/UUJw1O7PikI/AAAAAAAACFk/_YSzdFb7e7c/s1600/double.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="http://4.bp.blogspot.com/-98fhKE2Xkc0/UUJw1O7PikI/AAAAAAAACFk/_YSzdFb7e7c/s320/double.png" width="320" /></a></div>It has a packet around the origin just like before but it also has a sharp peak around <i>r</i>=<i>l</i>. As I'm showing ψ as a function of <i>r</i> this means we have a shell of radius <i>l</i>.<br /><br /><div class="separator" style="clear: both;"><br class="Apple-interchange-newline" /></div>Let's say<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Cpsi%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D(%5Cpsi_1%252B%5Cpsi_2)" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Cpsi%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%7D%7D(%5Cpsi_1%252B%5Cpsi_2)" /></a></div><br />where ψ<sub>1</sub> is normalized and peaked near the origin and ψ<sub>2 </sub>is our shell of radius <i>l</i>. 
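<br /><br />To put a number on this, here is a small Haskell sketch. It is an illustration only: the names are hypothetical and the constants are standard SI values. Setting the derivative of the right hand side with respect to <i>a</i> to zero gives <i>a</i> = πε<sub>0</sub>ℏ²/(<i>me</i>²), and the code evaluates the bound there:<br /><pre>
-- Physical constants in SI units.
hbar, electronMass, electronCharge, epsilon0 :: Double
hbar           = 1.054571817e-34   -- J s
electronMass   = 9.1093837015e-31  -- kg
electronCharge = 1.602176634e-19   -- C
epsilon0       = 8.8541878128e-12  -- F/m

sq :: Double -> Double
sq x = x * x

-- Right hand side of the inequality above, in joules,
-- as a function of a in metres.
energyBound :: Double -> Double
energyBound a =
  sq hbar / (8 * electronMass * sq a)
    - sq electronCharge / (4 * pi * epsilon0 * a)

-- Stationary point of energyBound, found by setting its
-- derivative with respect to a to zero.
aMin :: Double
aMin = pi * epsilon0 * sq hbar / (electronMass * sq electronCharge)

-- Minimum of the bound, converted from joules to electron volts.
minBoundEV :: Double
minBoundEV = energyBound aMin / electronCharge
</pre>With these inputs <code>minBoundEV</code> comes out at roughly -54 eV, which is the right order of magnitude compared with the 13.6 eV binding energy of hydrogen.<br /><br />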
Assume no overlap between ψ<sub>1</sub> and ψ<sub>2</sub>.<br /><br />In this case you can see that we can make<br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Clangle%5Cpsi%7Cx%5E2%7C%5Cpsi%5Crangle%3D%5Cfrac%7B1%7D%7B2%7D(%5Clangle%5Cpsi_1%7Cx%5E2%7C%5Cpsi_1%5Crangle%252B%5Clangle%5Cpsi_2%7Cx%5E2%7C%5Cpsi_2%5Crangle)" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Clangle%5Cpsi%7Cx%5E2%7C%5Cpsi%5Crangle%3D%5Cfrac%7B1%7D%7B2%7D(%5Clangle%5Cpsi_1%7Cx%5E2%7C%5Cpsi_1%5Crangle%252B%5Clangle%5Cpsi_2%7Cx%5E2%7C%5Cpsi_2%5Crangle)" /></a></div><div class="separator" style="clear: both; text-align: left;">as large as we like by making <i>l</i> as large as we like while still leaving us free to make the central peak whatever shape we want. 
This means that the estimate of </div><div class="separator" style="clear: both; text-align: center;"><a href="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Clangle%5Cpsi%7Cp%5E2%7C%5Cpsi%5Crangle" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://images-onepick-opensocial.googleusercontent.com/gadgets/proxy?container=onepick&gadget=a&rewriteMime=image%2F*&url=http%3A%2F%2Fchart.apis.google.com%2Fchart%3Fcht%3Dtx%26chl%3D%5Clangle%5Cpsi%7Cp%5E2%7C%5Cpsi%5Crangle" /></a></div><div class="separator" style="clear: both; text-align: left;">coming from HUP can be made as small as we like while making the central peak as close to a Dirac delta as we want. Informally, HUP controls the overall spread of the wavefunction but not the spread of individual peaks within it.</div><div class="separator" style="clear: both; text-align: left;"><br /></div>For a large enough shell, ψ<sub>2 </sub>contributes little to the total expected potential energy, but ψ<sub>1</sub> can contribute an arbitrarily low amount because we can concentrate it in areas where 1/<i>r</i> is as large as we want. So we can make the total expected potential energy as low as we like. And yet we can also keep the estimate of the kinetic energy given by HUP as close to zero as we like. So contrary to the original argument, the Heisenberg uncertainty principle doesn't give us a lower bound on the energy at all. The argument is wrong.<br /><h4>But wait, we know there is a lowest energy state...</h4><div>Yes, the energy of a wavefunction in a Coulomb potential is in fact bounded below. After all, atoms are stable. But the Heisenberg uncertainty principle doesn't show it. The inequality in HUP becomes an equality when the wavefunction is a Gaussian function. It provides a good bound for functions that are roughly Gaussian, i.e. 
that form a single "lump". But it provides only weak bounds for wavefunctions with multiple peaks and in this case it's not the appropriate tool to use.</div><div><br /></div><div>The Heisenberg uncertainty principle is an inequality about ordinary functions interpreted in the context of quantum mechanics (QM). The field of functional analysis provides many such inequalities. A great paper by Lieb, <a href="http://www.pas.rochester.edu/~rajeev/phy246/lieb.pdf">The Stability of Matter</a>, gives an inequality due to Sobolev that can also be interpreted in the context of QM. Sobolev's inequality is more appropriate when considering the hydrogen atom and it gives a good lower bound, demonstrating that the hydrogen atom is stable after all.</div><h4>But wait, the Heisenberg uncertainty principle argument gives the right energy...</h4><div>Getting a correct answer doesn't always justify the methods. I can give at least two reasons why the original method appears to work.</div><div><br /></div><div>1. The HUP gives a good bound for wavefunctions that are roughly Gaussian. The lowest energy level for the hydrogen atom is given (very roughly) by such a function. So an estimate based on HUP should be roughly correct. However, HUP alone can't tell us that the lowest energy state is Gaussian. The argument is only useful if we can get this information from somewhere else.</div><div><br /></div><div>2. You can get an estimate for the lowest energy level of the hydrogen atom (assuming it exists) by dimensional analysis. Invalid physical arguments that are dimensionally correct will often give the correct result because there is only one dimensionally correct expression possible.</div><h4>But wait, it's just a heuristic argument...</h4><div>Heuristic arguments are crucial to physics. But when similar heuristic arguments give opposite results they become problematic. 
In particular, it's no good saying an argument is inexact or qualitative when it gives a bound on the energy that isn't just off by an order of magnitude, but completely fails to give a bound at all. Part of the issue here is that the Coulomb potential goes to minus infinity as <i>r</i> goes to zero and so more care is required. The HUP argument above can be adapted to give good results when the potential is bounded below, for example it gives a reasonable estimate for square wells.<br /><br />But there may be a clever way of using HUP to bound the energy that I haven't seen. If you can see it, please tell me.<br /><h4>The source</h4>Most of what I said above I learnt from the excellent paper on the <a href="http://www.pas.rochester.edu/~rajeev/phy246/lieb.pdf">Stability of Matter</a> by Lieb that I mentioned above.</div>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com10tag:blogger.com,1999:blog-11295132.post-52725081900210555172013-01-13T16:59:00.002-08:002013-01-13T16:59:59.651-08:00Aliasing and the Heisenberg uncertainty principle.<div><b>TL;DR</b><br /><b><br /></b>The <a href="http://en.wikipedia.org/wiki/Dirac_comb">Dirac comb</a> is an example of a wavefunction whose position and momentum aren't fuzzy.<br /><b><br /></b><b>Introduction</b></div><div><br /></div>The Heisenberg uncertainty principle says that if you have a particle in some state and observe either its momentum or its position then the product of the standard deviations of the distributions of the outcomes satisfies this inequality:<br /><div><div class="separator" style="clear: both; text-align: center;"><a href="http://chart.apis.google.com/chart?cht=tx&chl=\Delta_x\Delta_p\ge{\hbar\over2}" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://chart.apis.google.com/chart?cht=tx&chl=\Delta_x\Delta_p\ge{\hbar\over2}" /></a></div></div><div><br /></div><div>I think many people have a mental picture a bit like this:</div><div class="separator" 
style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-0QAKEv2RDj4/UMIToJcuihI/AAAAAAAABvs/8ZebaYdJteg/s1600/clouds.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="144" src="http://3.bp.blogspot.com/-0QAKEv2RDj4/UMIToJcuihI/AAAAAAAABvs/8ZebaYdJteg/s320/clouds.png" width="320" /></a></div><div><div class="separator" style="clear: both; text-align: center;"><br /></div>You can know the position and momentum with some degree of fuzziness and you can trade the fuzziness between the two measurements as long as the product of their sizes is larger than ℏ/2.</div><div><br /></div><div>Here's another way of thinking about that kind of picture (assuming some units I haven't specified):</div><div><br /><center><div><table><tbody><tr><td>position</td><td>=</td><td>123.4???</td></tr><tr><td>momentum</td><td>=</td><td>65?.???</td></tr></tbody></table></div></center></div><div>The idea is that the question mark represents digits we don't know well. As you move towards the right in the decimal representation our certainty in the accuracy of the digit quickly goes downhill to the point where we can't reasonably write digits.</div><div><br /></div><div>But this picture is highly misleading. For example, the following state of affairs is also compatible with the uncertainty principle, in suitably chosen units:<br /><br /><center><table><tbody><tr><td>position</td><td>=</td><td>...???.123...</td></tr><tr><td>momentum</td><td>=</td><td>...???.654...</td></tr></tbody></table></center><br />In other words, it's compatible with the uncertainty principle that we could know the digits beyond the decimal point to as much accuracy as we like as long as we don't know the digits before the point. 
It trivially satisfies Heisenberg's inequality because the variances of the position and the momentum aren't even finite quantities.<br /><br />But being compatible with Heisenberg uncertainty isn't enough for something to be realisable as a physical state. Is there a wavefunction that allows us to know the digits to the right of the decimal point as far as we want for both position and momentum measurements?<br /><br /><b>Sampling audio and graphics</b><br /><br />Maybe surprisingly, the worlds of audio and graphics can help us answer this question. Here's what a fraction of a second of music might look like when the pressure of the sound wave is plotted against time:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-sRMmpxbJ0QY/UPM-92ABHQI/AAAAAAAAB74/7hnsWP6muPQ/s1600/waveform.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="182" src="http://2.bp.blogspot.com/-sRMmpxbJ0QY/UPM-92ABHQI/AAAAAAAAB74/7hnsWP6muPQ/s320/waveform.png" width="320" /></a></div><br />But if we sample this signal at regular intervals, e.g. at 44.1 kHz for a CD, then we can graph the resulting signal as something like this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-s0DrZ4EzU98/UPM-9xCZzCI/AAAAAAAAB70/c7sE9IFyH7I/s1600/sampled.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="182" src="http://3.bp.blogspot.com/-s0DrZ4EzU98/UPM-9xCZzCI/AAAAAAAAB70/c7sE9IFyH7I/s320/sampled.png" width="320" /></a></div><br />The red curve here is just to show what the original waveform looked like. The black vertical lines correspond to regular samples and we can represent them mathematically with Dirac delta functions multiplied by the amplitude measured at the sample.<br /><br />There is a well known problem with sampling like this. 
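Here is a numerical sketch of it in Haskell, assuming samples are taken at times k/f for k = 0, 1, 2, ... (the function names are illustrative, not from any library). Two sine waves whose angular frequencies differ by a multiple of 2πf give the same samples, up to floating point rounding:<br /><pre>
-- Sample sin(omega * t) at rate f, i.e. at times t = k / f
-- for k = 0, 1, ..., count - 1.
sampleSine :: Double -> Double -> Int -> [Double]
sampleSine omega f count =
  map (\k -> sin (omega * fromIntegral k / f)) [0 .. count - 1]

-- Largest absolute difference between the samples of sin(omega t)
-- and those of sin((omega + 2 pi n f) t).
aliasGap :: Double -> Double -> Int -> Int -> Double
aliasGap omega f n count =
  maximum (zipWith (\x y -> abs (x - y)) original alias)
  where
    original = sampleSine omega f count
    alias    = sampleSine (omega + 2 * pi * fromIntegral n * f) f count
</pre>For instance <code>aliasGap (2 * pi * 440) 44100 1 64</code> compares a 440 Hz tone with a 44540 Hz tone sampled at 44.1 kHz, and the result should differ from zero only by rounding error.<br /><br />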
If you sample a signal that is a sine wave sin(ωt) at rate f then the signal sin((ω+2πnf)t) will generate exactly the same samples for any integer n. The following illustration shows what might happen:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-hFSUIh_soLI/UPNRsZrICUI/AAAAAAAAB8M/cd5wUiz2Dzw/s1600/alias.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="94" src="http://3.bp.blogspot.com/-hFSUIh_soLI/UPNRsZrICUI/AAAAAAAAB8M/cd5wUiz2Dzw/s320/alias.png" width="320" /></a></div><br />The two waveforms are sampled at the same regular intervals (shown by vertical lines) and give exactly the same amplitudes at those samples.<br /><br />This forms the basis for the famous <a href="http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem">Nyquist-Shannon sampling theorem</a>. You can reconstruct the original signal from regularly spaced samples only if it doesn't contain frequency components higher than half your sampling rate. Otherwise you get ambiguities in the form of high frequency parts of the signal masquerading as low frequency parts. This effect is known as <a href="http://en.wikipedia.org/wiki/Aliasing">aliasing</a>. As a result, the Fourier transform of a sampled function is periodic with the "repeats" corresponding to the aliasing.<br /><div class="separator" style="clear: both; text-align: center;"><br /></div>In the audio world you need to filter your sound to remove the high frequencies before you sample. This is frequently carried out with an analogue filter. In the 3D rendering world you need to do something similar. Ray tracers will send out many rays for each pixel, in effect forming a much higher resolution image than the resolution of the final result, and that high resolution image is filtered before being sampled down to the final resulting image. The "jaggies" you get from rendering polygons are an example of this phenomenon. 
It may seem that jaggies have nothing to do with the world of Fourier transforms. But if you compute the Fourier transform of a polygonal image, remove suitable high frequency components, and then take the inverse Fourier transform before sampling you'll produce an image that's much more pleasing to the eye. In practice there are shortcuts to achieving much the same effect.<br /><br /><b>The connection to physics</b><br /><br />Now consider a particle whose wavefunction takes the form of the <a href="http://en.wikipedia.org/wiki/Dirac_comb">Dirac comb</a>:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-VhZftFEIMuI/UPNS_C_zdqI/AAAAAAAAB8k/Z7hR12vVwUE/s1600/comb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="85" src="http://2.bp.blogspot.com/-VhZftFEIMuI/UPNS_C_zdqI/AAAAAAAAB8k/Z7hR12vVwUE/s320/comb.png" width="320" /></a></div><br />This is a wavefunction that is concentrated at multiples of some quantity a, ie. ∑δ(x-an) summing over n = ...,-1,0,1,2,... If the wavefunction is ψ(x) then the probability density function for the particle position is |ψ(x)|². So the particle has a zero probability of being found at points other than those where x=na. In other words, modulo a, the particle position is given precisely.<br /><br />But what about the particle momentum? Well, the wavefunction has, in some sense, been sampled onto the points na, so we expect that whatever the momentum distribution is it'll be ambiguous modulo b where ab=ℏ. In fact, if we take the Fourier transform of the Dirac comb we get <a href="http://en.wikipedia.org/wiki/Dirac_comb#Fourier_transform">another Dirac comb</a>. So in the frequency domain we get the same kind of phenomenon: the momentum is concentrated at integer multiples of b. So now we know we have a wavefunction whose uncertainty precisely fits the description I gave above. 
We know the position precisely modulo a and the momentum precisely modulo b. In some sense this isn't contrived: we know the momentum modulo b precisely <i>because</i> of the aliasing that results from knowing the position modulo a.<br /><br /><b>What this means</b><br /><br />The message from this is that position-momentum uncertainty isn't fuzziness. At least it's not fuzziness in the ordinary sense of the word.<br /><br /><b>And in reality</b><br /><br />I'm not very experienced in attaching numbers to results from theoretical physics so I'd find it hard to say how accurately we can create a Dirac comb state in reality. When we measure a position using interferometry techniques we automatically compute the position modulo a wavelength so this isn't an unusual thing to do. Also an electron in a periodic potential may take on a form that consists of a train of equally spaced lumps. Even if not described exactly by a Dirac comb, we can still know the position modulo a and the momentum modulo b much more accurately than you might expect from a naive interpretation of the Heisenberg uncertainty principle as fuzziness.<br /><br /><b>Exercises</b><br />1. Investigate approximations to the Dirac comb: eg. what happens if we sum only a finite number of Dirac deltas, or replace each delta with a finite width Gaussian, or both.<br />2. 
Investigate the "twisted" Dirac comb: ∑δ(x-an)exp(inθ) where θ is some constant.</div><div></div>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com9tag:blogger.com,1999:blog-11295132.post-36812113399542983282012-12-30T08:49:00.000-08:002012-12-30T12:21:47.001-08:00Shuffles, Bayes' theorem and continuations.<b>Introduction</b><br />Back in the 80s when I was a kid I came across a <a href="http://www.atomclub.nl/atom_plaatjes/boeken/pracprog/pp_card_trick.htm">program</a> for the BBC Micro that could tell what card you had picked from a deck of cards even though you'd buried your card within the deck wherever you wanted and had cut and shuffled the deck. I thought I'd try to implement a slightly more sophisticated version of the same trick that could handle multiple shuffles, and multiple types of shuffle. <br />The idea is that we prepare a deck of cards in some known sequence and have someone pick out a card and place it at the top of the deck. They perform some kind of randomisation procedure on the cards, eg. cut and shuffle it a couple of times, and then you get to look at the final sequence of cards. Can we tell which card was picked out? <br /><b>Some probability theory</b><br />Let's formalise this a bit. Our decks will have <img src="https://chart.googleapis.com/chart?cht=tx&chl=N" style="vertical-align: middle;" /> cards. There is a small number of initial states our deck can be in, corresponding to the known sequence with a single card moved to the top. Let's label these initial states <img src="https://chart.googleapis.com/chart?cht=tx&chl=I_i" style="vertical-align: middle;" />. There is a (usually large) number of permutations that could be applied through shuffling. We'll label these <img src="https://chart.googleapis.com/chart?cht=tx&chl=T_j" style="vertical-align: middle;" />. We'll try to arrange that this isn't simply the set of all permutations (though it's not necessarily a disaster if it is). 
<br />We want to figure out the initial state given some final state <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align: middle;" />. In other words we want <br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28I_i%7CF%29" style="vertical-align: middle;" /></blockquote>We can use Bayes' theorem to get: <br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28I_i%7CF%29%20%3D%20%7BP%28F%7CI_i%29P%28I_i%29%20%5Cover%20%5Csum_k%20P%28F%7CI_k%29P%28I_k%29%7D" style="vertical-align: middle;" /></blockquote>Now <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28F%7CI_i%29" style="vertical-align: middle;" /> is the sum of the probabilities of all ways of starting with <img src="https://chart.googleapis.com/chart?cht=tx&chl=I_i" style="vertical-align: middle;" /> and ending up with <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align: middle;" />. So, dropping the denominator, which is the same for every <img src="https://chart.googleapis.com/chart?cht=tx&chl=i" style="vertical-align: middle;" />, <br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28I_i%7CF%29%20%5Cpropto%20%5Csum_%7Bj%7DP%28T_j%29P%28I_i%29" style="vertical-align: middle;" /></blockquote>where the sum is over all <img src="https://chart.googleapis.com/chart?cht=tx&chl=j" style="vertical-align: middle;" /> such that <img src="https://chart.googleapis.com/chart?cht=tx&chl=T_j%28I_i%29%3DF" style="vertical-align: middle;" />. I'm assuming that the shuffles are independent of the initial sequence of cards. This gives us an algorithm. We do a brute force simulation of every possible shuffle that we're considering applied to each possible initial state. After each shuffle we sum the corresponding probability for those shuffles that give our known final state <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align: middle;" />. 
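<br />As a rough illustration of that brute-force sum, separate from the program developed later in this post, here is a minimal sketch. The names <tt>Cards</tt>, <tt>Shuffle</tt>, <tt>apply_shuffle</tt> and <tt>unnormalised_posterior</tt> are hypothetical, and it assumes the shuffles are handed over as an explicit list of permutations with their probabilities, which is only practical for tiny decks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch only; they are not part of the program below.
typedef std::vector<int> Cards;

struct Shuffle {
    std::vector<int> perm;  // output position i receives input card perm[i]
    double prob;            // P(T_j), assumed independent of the initial state
};

// Apply the permutation T_j to a deck.
Cards apply_shuffle(const Shuffle &t, const Cards &in) {
    Cards out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[t.perm[i]];
    }
    return out;
}

// For each initial state I_i, sum P(T_j) P(I_i) over the shuffles T_j
// with T_j(I_i) == F. The Bayes denominator is deliberately left out.
std::vector<double> unnormalised_posterior(const std::vector<Cards> &initial,
                                           const std::vector<double> &prior,
                                           const std::vector<Shuffle> &shuffles,
                                           const Cards &final_state) {
    std::vector<double> score(initial.size(), 0.0);
    for (std::size_t i = 0; i < initial.size(); ++i) {
        for (std::size_t j = 0; j < shuffles.size(); ++j) {
            if (apply_shuffle(shuffles[j], initial[i]) == final_state) {
                score[i] += shuffles[j].prob * prior[i];
            }
        }
    }
    return score;
}
```

The real program never builds an explicit list of shuffles like this. It enumerates them recursively and accumulates their probabilities as it goes, but the quantity being summed is the same.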
<br />Each shuffle <img src="https://chart.googleapis.com/chart?cht=tx&chl=T_j" style="vertical-align: middle;" /> is going to be built up as the product of a sequence of building blocks with each block randomly selected based on what happened before. Let's call the blocks names like <img src="https://chart.googleapis.com/chart?cht=tx&chl=B_k" style="vertical-align: middle;" />. So if <img src="https://chart.googleapis.com/chart?cht=tx&chl=T_j%3DB_3B_2B_1" style="vertical-align: middle;" /> then <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28T_j%29%3DP%28B_1%29P%28B_2%7CB_1%29P%28B_3%7CB_1%2CB_2%29" style="vertical-align: middle;" />. As we work through the shuffle we will accumulate the probability. After the first block we have a probability of <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28B_1%29" style="vertical-align: middle;" />. The probability after the second is <img src="https://chart.googleapis.com/chart?cht=tx&chl=P%28B_1%29P%28B_2%7CB_1%29" style="vertical-align: middle;" /> and so on. At any point we'll call the probability accumulated so far the <i>importance</i>. I've borrowed that name from the <a href="http://www.seanet.com/~myandper/importance.htm">world of rendering</a> because this algorithm has a remarkable similarity to recursive ray-tracing. <br /><b>Some computer science</b><br />I'd like to be able to chain a sequence of shuffles. But wait! There's a catch! Today's the day I finally want to get around to checking out the lambda expression support in C++. I've been putting this off for years. (I'm using gcc 4.7.) So I'm not going to have a Haskell non-determinism monad to make life easy. <br />Suppose I have two types of shuffle, type <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> and type <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" />. 
I could easily write a loop to iterate over all shuffles of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" />, and in the innermost part of the loop I could call another loop over all shuffles of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" />. But then if I want to replace <img src="https://chart.googleapis.com/chart?cht=tx&chl=B" style="vertical-align: middle;" /> with <img src="https://chart.googleapis.com/chart?cht=tx&chl=C" style="vertical-align: middle;" /> I have to change the code to replace the inner part with code for <img src="https://chart.googleapis.com/chart?cht=tx&chl=C" style="vertical-align: middle;" />. That's no good. I'd like to be able to replace the innermost part of the outer loop with any code I want without actually editing that part of the code. It's easy with lambda expressions. I write the type <img src="https://chart.googleapis.com/chart?cht=tx&chl=A" style="vertical-align: middle;" /> loop code so that it takes as argument a lambda function representing what I want done inside the loop. <br />There's another way of looking at this. You can skip this paragraph if you don't care about the connection to Haskell. But in Haskell you might do something like this by using a non-determinism monad, or even a probability monad. But as I pointed out a while back, you can fake every monad using the <a href="http://blog.sigfpe.com/2008/12/mother-of-all-monads.html">continuation monad</a>. One way to implement continuations in C++ is to use continuation passing style. And that's what I'll do. The continuations are just the lambdas that I mentioned in the previous paragraph. 
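<br />To make the idea concrete before the real code, here is a small sketch of two loops written in continuation passing style. The two "shuffles" here (<tt>each_rotation</tt> and <tt>each_adjacent_swap</tt>) are toy stand-ins invented for this illustration, not the shuffles used below, and there is no probability bookkeeping yet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "shuffle of type A": calls cont once for every rotation of the deck.
template<class Cont>
void each_rotation(const std::vector<int> &deck, Cont cont) {
    for (std::size_t r = 0; r < deck.size(); ++r) {
        std::vector<int> out;
        for (std::size_t i = 0; i < deck.size(); ++i) {
            out.push_back(deck[(i + r) % deck.size()]);
        }
        cont(out);
    }
}

// Toy "shuffle of type B": calls cont once for every swap of two neighbours.
template<class Cont>
void each_adjacent_swap(const std::vector<int> &deck, Cont cont) {
    for (std::size_t i = 0; i + 1 < deck.size(); ++i) {
        std::vector<int> out = deck;
        std::swap(out[i], out[i + 1]);
        cont(out);
    }
}

// Count the ways a rotation followed by an adjacent swap produces target.
// The inner loop is supplied as a lambda, so swapping B for some other
// shuffle means changing only the lambda, not each_rotation itself.
int count_matches(const std::vector<int> &deck, const std::vector<int> &target) {
    int count = 0;
    each_rotation(deck, [&count, &target](const std::vector<int> &d1) {
        each_adjacent_swap(d1, [&count, &target](const std::vector<int> &d2) {
            if (d2 == target) {
                ++count;
            }
        });
    });
    return count;
}
```

Neither loop knows what happens to its results. Each one just hands every result to whatever continuation it was given, which is what lets the shuffles be chained in any order.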
<br /><b>Some C++ code</b><br /><br /><br /><pre>> #include <iostream><br />> #include <cstdlib><br />> using namespace std;<br /></pre>You can bump up the deck size if you have the CPU power: <br /><pre>> const int deck_size = 13;<br /></pre>A deck of cards is represented by a simple array of integers with each card being assigned a unique integer. <br /><pre>> struct Deck {<br />> int card[deck_size];<br />> bool operator==(const Deck &other) const {<br />> for (int i = 0; i < deck_size; ++i) {<br />> if (card[i] != other.card[i]) {<br />> return false;<br />> }<br />> }<br />> return true;<br />> }<br />> };<br /></pre>The riffle shuffle works by splitting a deck into two piles and interleaving the parts onto a new destination deck. Here's a schematic diagram with the two piles coloured orange and blue: <a href="http://1.bp.blogspot.com/-0l2lmJO0xwc/UN_dyQZTuzI/AAAAAAAAB2k/wHjAx43U3Q4/s1600/riffle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-0l2lmJO0xwc/UN_dyQZTuzI/AAAAAAAAB2k/wHjAx43U3Q4/s1600/riffle.png" /></a> <br />The function <tt>riffle_helper</tt> helps loop through all possible riffles. I could assume that each card arriving at the destination is equally likely to come from the left pile or the right pile. But I observe that whenever I do a real riffle shuffle the cards seem to come in 'runs'. So if a card falls from the left pile then the next one is more likely to as well. That's just an empirical observation based on a small number of trials, so you can tweak the probabilities yourself to fit reality better. (Oh, and I got this code upside-down compared to what people really do. I need to fix it when I have a moment...) 
<br /><br /><pre>> enum Side {<br />> LEFT,<br />> RIGHT,<br />> NO_SIDE<br />> };<br /></pre>This function shuffles together cards from the locations given by <tt>left_ptr</tt> and <tt>right_ptr</tt> in <tt>src_deck</tt> into <tt>dest_deck</tt>, eventually calling <tt>cont</tt> on each result. I use a template because I don't know the type of the lambda expression I'm passing in. (If I want to know its type I think I have to mess with <tt>decltype</tt>. It's all a bit weird.) <br /><pre>> template<class Cont><br />> void riffle_helper(double importance, int split,<br />> int left_ptr, int right_ptr, int dest_ptr, Side oldside,<br />> const Deck &src_deck, Deck dest_deck, Cont cont) {<br />> if (dest_ptr == deck_size) {<br />> cont(importance, dest_deck);<br />> return;<br />> }<br /></pre>First I deal with the cases where one or other of the piles is empty so there's no choice about where the next card is coming from: <br /><pre>> if (left_ptr >= split) {<br />> dest_deck.card[dest_ptr] = src_deck.card[right_ptr];<br />> riffle_helper(importance, split, left_ptr, right_ptr+1, dest_ptr+1, RIGHT, src_deck, dest_deck, cont);<br />> return;<br />> }<br />> if (right_ptr >= deck_size) {<br />> dest_deck.card[dest_ptr] = src_deck.card[left_ptr];<br />> riffle_helper(importance, split, left_ptr+1, right_ptr, dest_ptr+1, LEFT, src_deck, dest_deck, cont);<br />> return;<br />> }<br />> double p;<br />> if (oldside == NO_SIDE) {<br />> p = 0.5;<br />> } else {<br />> p = LEFT == oldside ? 0.75 : 0.25;<br />> }<br />> double new_importance = importance*p;<br />> dest_deck.card[dest_ptr] = src_deck.card[left_ptr];<br />> riffle_helper(new_importance, split, left_ptr+1, right_ptr, dest_ptr+1, LEFT, src_deck, dest_deck, cont);<br /><br /><br />> if (oldside == NO_SIDE) {<br />> p = 0.5;<br />> } else {<br />> p = RIGHT == oldside ? 
0.75 : 0.25;<br />> }<br />> new_importance = importance*p;<br />> dest_deck.card[dest_ptr] = src_deck.card[right_ptr];<br />> riffle_helper(new_importance, split, left_ptr, right_ptr+1, dest_ptr+1, RIGHT, src_deck, dest_deck, cont);<br />> }<br /><br /><br /></pre>The function <tt>riffle</tt> iterates over all possible riffle shuffles of <tt>src_deck</tt> calling <tt>cont</tt> on each one. Note that I assume that when the deck is split into two before shuffling together, each pile has at least 3 cards. You may want to change that assumption. <br /><pre>> template<class Cont><br />> void riffle(double importance, const Deck &src_deck, Cont cont) {<br />> double new_importance = importance/(deck_size-5);<br />> for (int split = 3; split < deck_size-2; ++split) {<br />> riffle_helper(new_importance, split, 0, split, 0, NO_SIDE, src_deck, Deck(), cont);<br />> }<br />> }<br /></pre>The function <tt>cut</tt> iterates over all possible cuts of <tt>src_deck</tt> calling <tt>cont</tt> on each result. I assume the cut leaves at least 3 cards in each pile. <br /><pre>> template<class Cont><br />> void cut(double importance, const Deck &src_deck, Cont cont) {<br />> double new_importance = importance/(deck_size-5);<br />> for (int split = 3; split < deck_size-2; ++split) {<br />> Deck new_deck;<br />> for (int i = 0; i < deck_size; ++i) {<br />> if (i < deck_size-split) {<br />> new_deck.card[i] = src_deck.card[i+split];<br />> } else {<br />> new_deck.card[i] = src_deck.card[i-(deck_size-split)];<br />> }<br />> }<br />> cont(new_importance, new_deck);<br />> }<br />> }<br /></pre>The function <tt>overhand_helper</tt> overhand shuffles the remaining cards in <tt>src_deck</tt> to <tt>dest_deck</tt>. Here's an attempt to represent what an overhand shuffle does. It reverses the order of a deck that has been split into segments. The order within each segment is left unchanged. 
<a href="http://1.bp.blogspot.com/-WDBfySFVWq0/UN-wow18WmI/AAAAAAAAB2U/0XCiy6OadgE/s1600/shuffle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="179" src="http://1.bp.blogspot.com/-WDBfySFVWq0/UN-wow18WmI/AAAAAAAAB2U/0XCiy6OadgE/s400/shuffle.png" width="400" /></a> <br /><br /><pre>> template<class Cont><br />> void overhand_helper(double importance, const Deck &src_deck,<br />> int cards_left, Deck dest_deck, Cont cont) {<br />> if (cards_left <= 0) {<br />> cont(importance, dest_deck);<br />> } else {<br />> double new_importance = importance/cards_left;<br />> for (int ncards = 1; ncards <= cards_left; ++ncards) {<br />> //<br />> // Take ncards cards from the source and place them at the bottom of the<br />> // unfilled part of the destination.<br />> //<br />> for (int j = 0; j < ncards; ++j) {<br />> dest_deck.card[cards_left-ncards+j] = src_deck.card[deck_size-cards_left+j];<br />> }<br />> overhand_helper(new_importance, src_deck, cards_left-ncards, dest_deck, cont);<br />> }<br />> }<br />> }<br /></pre>The function <tt>overhand</tt> iterates over all possible overhand shuffles of cards in <tt>src_deck</tt> calling <tt>cont</tt> on each result. In practice I often find overhand shuffles result in cards mysteriously jumping segments and messing up the algorithm, whereas poorly executed riffle shuffles still work fine. I'm also assuming that each time a pile of cards is transferred the size of the pile is chosen uniformly from the set of all possible segments at that stage. <br /><pre>> template<class Cont><br />> void overhand(double importance, const Deck &src_deck, Cont cont) {<br />> overhand_helper(importance, src_deck, deck_size, Deck(), cont);<br />> }<br /></pre>The final code doesn't bother computing the denominator from Bayes' theorem. The most likely initial state is given by the one that results in the highest score. If you normalise the scores to sum to one you'll get actual probabilities. 
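<br />If you do want actual probabilities, the normalisation is a small post-processing step. Here is a minimal sketch, separate from the program in this post. The helper names <tt>normalise</tt> and <tt>most_likely</tt> are hypothetical, and it assumes you have collected the scores into a <tt>std::vector</tt> rather than printing them as the code below does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Divide each unnormalised score by the total so the results sum to one.
// If every score is zero (no shuffle reached the target) all zeros are returned.
std::vector<double> normalise(const std::vector<double> &scores) {
    double total = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        total += scores[i];
    }
    std::vector<double> result(scores.size(), 0.0);
    if (total > 0.0) {
        for (std::size_t i = 0; i < scores.size(); ++i) {
            result[i] = scores[i] / total;
        }
    }
    return result;
}

// Index of the highest score, i.e. the most likely initial state.
// Assumes scores is non-empty; ties go to the earliest index.
std::size_t most_likely(const std::vector<double> &scores) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}
```

Normalising doesn't change which state wins, so for the card trick itself the raw scores are enough.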
<br /><pre>> int main() {<br /></pre>This is the array representation of the cards in the following picture: <br /><pre>> Deck target = {{ 10, 11, 6, 12, 1, 13, 8, 2, 9, 3, 5, 4, 7 }};<br /></pre><a href="http://1.bp.blogspot.com/-1uB8ofGbcn4/UN-no9o8zdI/AAAAAAAAB2A/M3hjS6ya0eg/s1600/photo.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://1.bp.blogspot.com/-1uB8ofGbcn4/UN-no9o8zdI/AAAAAAAAB2A/M3hjS6ya0eg/s400/photo.JPG" width="400" /></a> <br /><pre>> Deck deck;<br /></pre>Our known starting sequence is just 1, 2, 3, ..., J, Q, K. We iterate over all ways to pick a card out from this sequence and place it at the top. <br /><pre>> for (int k = 0; k < deck_size; ++k) {<br />> deck.card[0] = k+1;<br />> for (int i = 1; i < deck_size; ++i) {<br />> deck.card[i] = (i > k ? i : i-1)+1;<br />> }<br />> double likelihood = 0.0;<br /></pre>Here is where I use the lambdas. For this example I'm doing an overhand shuffle followed by a riffle shuffle. (The syntax is pretty bizarre and it's also weird that I kinda sorta specify the type of my lambda but that's not really what the type of the expression is. But having manually faked and lifted lambdas many times in C++ I can see why it's the way it is.) Note how I've made <tt>likelihood</tt> mutable and have given these lambda expressions write access to it. 
<br /><pre>> overhand(1.0, deck, [&likelihood, target](double importance, Deck &deck) -> void {<br />> riffle(importance, deck, [&likelihood, target](double importance, Deck &deck) -> void {<br />> if (deck == target) {<br /></pre>We sum the probabilities for all ways of generating the target deck: <br /><pre>> likelihood += importance;<br />> }<br />> }); });<br />> cout << "If top card = " << deck.card[0] << endl;<br />> cout << "then unnormalised probability = " << likelihood << endl;<br />> cout << endl;<br />> }<br /><br /><br />> }<br /><br /><br /></pre>Run the above code and you get unnormalised probabilities such as: <pre>If top card = 4<br />then unnormalised probability = 5.7568e-12<br />If top card = 6<br />then unnormalised probability = 5.37301e-11<br />If top card = 7<br />then unnormalised probability = 1.791e-11<br /></pre>In fact, I had chosen 6. <br /><b>Some discussion</b><br />Don't expect it to work perfectly! It can only give probabilities but it's often surprisingly good. But there is a lot of room for improvement. Some work looking at how people actually shuffle could give a better probabilistic model. <br /><b>Some exercises.</b><br />1. The code can be made orders of magnitude faster. The final shuffle is performed and then the result is compared to the target sequence. But you can start comparing cards with the target before the shuffle is finished. Most times you'll only need to look at the first card of the result of a shuffle before you know you haven't matched the target. Fixing this will give a big speed up. <br />2. The continuation passing style makes it easy to incorporate other sources of knowledge. For example if you 'accidentally' peek at the bottom card after the first shuffle you can incorporate that knowledge into the algorithm. Figure out how. <br />3. Write lots more kinds of shuffles and experiment. 
I'm hoping someone good with magic will come up with a sequence of operations that looks hopelessly random but allows a good probability of recovering the chosen card. You could also combine this with other techniques such as designing shuffles that maintain various kinds of invariant. <br />4. The code can be rewritten to work backwards from the final state to the initial states. Work out how to do this. (This parallels ray-tracing where you can work from eye to light or from light to eye.) <br />5. We're doing the same work over and over again. We don't need to compute all of the shuffles for each initial state. We can compute each shuffle once and reuse it on each initial state. Try implementing it.<br /><br />Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com3tag:blogger.com,1999:blog-11295132.post-58597078005838938992012-11-18T16:10:00.003-08:002012-11-18T17:20:48.383-08:00A pictorial proof of the hairy ball theorem<div class="separator" style="clear: both; text-align: left;">The hairy-ball theorem says that there is no continuous non-zero vector field on the surface of a sphere. There are lots of popular accounts that tell you what this means, giving great examples. 
Here's a YouTube video for example:</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://3.gvt0.com/vi/B4UGZEjG02s/0.jpg" height="266" width="320"><param name="movie" value="http://www.youtube.com/v/B4UGZEjG02s&fs=1&source=uds" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="http://www.youtube.com/v/B4UGZEjG02s&fs=1&source=uds" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object></div><br /><br />My goal is to show why it's always true.<br /><br />A simply connected domain in the plane is one with the property that any loop in it can be shrunk down to a point. Here's an example of a domain D with an example loop L being shrunk down to a point P:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-9mg_veDP_28/UKlWcYqBqII/AAAAAAAABog/n8no7sctL0M/s1600/simple.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="187" src="http://4.bp.blogspot.com/-9mg_veDP_28/UKlWcYqBqII/AAAAAAAABog/n8no7sctL0M/s320/simple.png" width="320" /></a></div>Here's an example of a domain that's not simply connected. It has a hole in the middle. I've drawn a loop L around the hole. 
You can't shrink that loop to a point because the hole gets in the way:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-4OxymO5spyo/UKlWzaq4xoI/AAAAAAAABoo/cZzX-fvEKus/s1600/Hole.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="http://4.bp.blogspot.com/-4OxymO5spyo/UKlWzaq4xoI/AAAAAAAABoo/cZzX-fvEKus/s320/Hole.png" width="320" /></a></div>Here's a simply connected domain with a vector field on it:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-0Q_WOphBHaY/UKlXVYgxuVI/AAAAAAAABow/lk1z8ig_Oqw/s1600/vectors.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="http://4.bp.blogspot.com/-0Q_WOphBHaY/UKlXVYgxuVI/AAAAAAAABow/lk1z8ig_Oqw/s320/vectors.png" width="320" /></a></div>Think of the vectors as being drawn literally in the surface so that if we were to pick up the surface and stretch it like a piece of rubber the vectors would get stretched with it. Remember that a vector field is defined everywhere in the domain so the arrows are just a random sprinkling of examples to show what's going on. For this to be an accurate picture you want to imagine an infinity of arrows, one at every single point of the domain.<br /><br />Let's put a loop, starting and ending at P, in our simply-connected domain:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-qpBTzqOxnic/UKlX3TG89yI/AAAAAAAABo4/l-ppnCXrh_g/s1600/loop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="http://4.bp.blogspot.com/-qpBTzqOxnic/UKlX3TG89yI/AAAAAAAABo4/l-ppnCXrh_g/s320/loop.png" width="320" /></a></div>Now imagine travelling along the loop, starting at P and ending at P. As you move along there's an arrow at each point in your journey. 
Here's what the arrows look like as you travel from P to P anti-clockwise, plotted as a kind of graph:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-6XAYbR1UPwk/UKlYjzLekRI/AAAAAAAABpA/fJVU0IJt8nI/s1600/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="65" src="http://3.bp.blogspot.com/-6XAYbR1UPwk/UKlYjzLekRI/AAAAAAAABpA/fJVU0IJt8nI/s320/graph.png" width="320" /></a></div>The vectors start off pointing to the right. They swing anti-clockwise by about 45º and then swing back to where they started. As the journey is a loop they clearly must end where they started. A different, really swirly vector field, might have resulted in arrows that rotated around hundreds of times along your journey. But by the time you reach the end of the journey they must swing back to where they started. What's slightly less obvious is that they'd also have to rotate back to cancel out the hundreds of swings. You might think "the vectors could rotate round a hundred times but as long as they make exactly 100 turns they'll return to where they started and there's no need for them to unwind". But actually, every bit of rotation in the journey must be unwound. The total amount of rotation, measured in full turns, adding all the positive rotations, and subtracting off the negative rotations, is called the winding number for the loop. We count anti-clockwise rotation as positive and clockwise as negative. So I'm claiming that the winding number for a closed loop in a simply-connected domain is always zero.<br /><br />(Note: in most books the winding number normally refers to how many times the loop itself winds around a point. I'm using it to refer to how many times the vector winds around itself as you follow the loop. To help with your intuition: the hour hand of a working clock normally accumulates a winding number of -2 in one day. 
If it ran forward for a day, but then ran backwards for half a day, the winding number would be -1.)<br /><br />Here's why the winding number for simply connected domains must be zero: firstly - it's pretty clear that the winding number for any loop must be an integer. If the winding number was a half, say, the arrow would end up pointing 180º from where it started which makes no sense for a closed loop. Now the domain is simply connected, so the loop can be shrunk to a point. Now imagine doing the shrinking really slowly and keeping track of the winding number as the loop shrinks. As the loop shrinks, the graph of the vectors along the loop must vary slowly. The total winding number depends continuously on the vectors in the graph so the winding number must vary slowly as the loop shrinks. But the winding number is an integer. It can't change gradually; it can only jump by a whole number. So the winding number can't change at all. Every loop in a simply-connected domain must have a winding number that's the same as the winding number of a loop that is just one point ie. zero.<br /><br />On to the sphere. Here's a sphere with a vector field where all of the vectors point along lines of longitude to the north pole:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-uIbgVRNRkgs/UKlbgHPxQyI/AAAAAAAABpQ/CIzzbfBlg98/s1600/north.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://1.bp.blogspot.com/-uIbgVRNRkgs/UKlbgHPxQyI/AAAAAAAABpQ/CIzzbfBlg98/s320/north.png" width="199" /></a></div>(Sorry about my poor quality drawing but I'm sure you know what vectors pointing north look like.)<br /><br />At this point you may be tempted to say "aha! That's a continuous vector field on the sphere that's non-zero everywhere!" Alas, it's not defined everywhere. It's a vector field everywhere except at the north and south poles. 
If you're at the north pole, no non-zero vector can point north. And at the south pole every non-zero vector points north with no continuous way to pick just one.<br /><br />Given any vector field on the Earth we can imagine slicing the earth through the equator and flattening out the surfaces of the northern and southern hemispheres as two separate disks. Here's what you get if you do this with the north vector field (ignoring the problems at the poles for now):<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-T5sO_pDr2lU/UKlmO9fcXLI/AAAAAAAABqQ/F10iEtAUKt4/s1600/maps.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="278" src="http://4.bp.blogspot.com/-T5sO_pDr2lU/UKlmO9fcXLI/AAAAAAAABqQ/F10iEtAUKt4/s320/maps.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div>You reconstruct the Earth again by gluing the two disks together according to the orange arrows, and then inflating. Any vector field on the surface of the Earth gives rise to a pair of vector fields on disks like this. But there will be a constraint. The vectors around the boundary of the two disks will match. In fact, vectors at the opposite ends of the orange arrow have to match. But they won't necessarily be equal as drawn in this diagram because the disk for the southern hemisphere corresponds to a view from below.<br /><br />Suppose we start at the point P and follow a loop eastwards along the equator. That's an anti-clockwise loop round the upper disk and simultaneously a clockwise loop round the lower disk. 
Here are the graphs:<br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-EpYQy9WyaUI/UKlmtFQd-nI/AAAAAAAABqY/y1gssmh5DSM/s1600/equator2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="302" src="http://1.bp.blogspot.com/-EpYQy9WyaUI/UKlmtFQd-nI/AAAAAAAABqY/y1gssmh5DSM/s320/equator2.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div>In the upper map the loop gives rise to winding number one. But in the lower map we get winding number minus one. So here's an important lesson: the winding number makes perfect sense for a flat domain in the plane. But on the surface of 3D objects it depends on how you flatten out your map. In this case, the winding number on the upper map is 2 more than the winding number for the lower map. (Remember, these fields aren't defined at the poles so we haven't contradicted the original theorem that the winding number is zero for any vector field defined in a simply-connected domain.)<br /><br />But here's the most important thing in this proof: the winding number for the upper hemisphere loop will be two more than the winding number for the lower hemisphere loop, no matter what vector field you have. This is because if you've travelled an angle θ around the equator, the vectors at opposite ends of the orange arrows will differ by an angle of 2θ. For example, once you're 90º around the earth, the north arrow is drawn as a down-arrow in the upper graph and as an up-arrow in the lower graph. They're already 180º apart. You can see this is true for north pointing vectors literally by tracing with your fingers around the loops. It's also true for vectors pointing east. 
I'll leave that as an exercise for you, but here's a picture of some eastward vectors to get you started:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-iJfy5KqLQQI/UKlgWt-tajI/AAAAAAAABpw/HaupyQIHOvE/s1600/east.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://3.bp.blogspot.com/-iJfy5KqLQQI/UKlgWt-tajI/AAAAAAAABpw/HaupyQIHOvE/s320/east.png" width="199" /></a></div>Along the equator, every vector on the surface is a linear combination of north and east vectors. So if it's true for both the north and east vectors then it must be true for all vectors. But if the graph for one picture of the equatorial loop has vectors that are 2θ more than the vectors for another graph, the first one must complete two revolutions more than the second one. So the first has a winding number two more than the second.<br /><br />If you had a continuous vector field that really was non-zero over the entire sphere, cutting the sphere in half would give a pair of continuous vector fields defined on disks. As disks are simply-connected, the theorem we started with tells us they must both have winding number zero as you loop around them. But we've also just shown that looping round one has winding number two more than looping around the other. This is a contradiction. So there is no continuous vector field that is non-zero everywhere. ∎<br /><br />If you get stuck above I strongly recommend trying to draw some continuous non-zero vector fields on the sphere, transferring them to disks, and counting winding numbers.<br /><br />Notice how we've done more than prove the theorem. We now know that if we have a continuous vector field on a sphere we can find out whether to look for its zeros in the northern or southern hemisphere by computing the winding numbers as above. 
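<br /><br />The winding numbers quoted earlier for the north field can be checked numerically. The following is only a minimal Python sketch, assuming the equator is parametrised by the angle θ travelled eastwards from P and that both disks are drawn so that the north vectors agree at P. The names <tt>winding_number</tt>, <tt>upper_north</tt> and <tt>lower_north</tt> are hypothetical and not part of the original post.

```python
import math

def winding_number(field, samples=1000):
    """Count net anti-clockwise turns of field(theta) as theta runs 0..2*pi."""
    total = 0.0
    prev = None
    for k in range(samples + 1):
        theta = 2 * math.pi * k / samples
        vx, vy = field(theta)
        angle = math.atan2(vy, vx)
        if prev is not None:
            # Wrap each small step into [-pi, pi) so that jumps of the
            # atan2 branch cut are not counted as rotation.
            delta = (angle - prev + math.pi) % (2 * math.pi) - math.pi
            total += delta
        prev = angle
    return int(round(total / (2 * math.pi)))

def upper_north(theta):
    # Assumption: on the upper disk the equator point is (cos t, sin t)
    # and north points towards the centre (the north pole).
    return (-math.cos(theta), -math.sin(theta))

def lower_north(theta):
    # Assumption: on the lower disk (seen from below) the same point is
    # drawn at (-cos t, sin t) and north points away from the centre.
    return (-math.cos(theta), math.sin(theta))

print(winding_number(upper_north), winding_number(lower_north))
```

Under these assumptions the two drawn vectors sit at angles π+θ and π−θ, so they differ by 2θ, and the two winding numbers differ by two.<br /><br />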
At least one of the two winding numbers must be non-zero and that tells us which hemisphere we can be sure contains a zero. The fact that the two winding numbers differ by two, and not by just one, also tells us a bit about the nature of the zeros. But that's another story. That two is also related to the fact that the <a href="http://en.wikipedia.org/wiki/Euler_characteristic">Euler characteristic</a> of the sphere is two. It's also related to the Lefschetz index<br /><br />This proof is based on proofs I studied years ago relating to <a href="http://en.wikipedia.org/wiki/Chern_class">Chern classes</a>. I recently became interested in Chern classes again because they play an important role in understanding phenomena in solid state physics such as the <a href="http://en.wikipedia.org/wiki/Quantum_Hall_effect">quantum Hall effect</a>. That argument about slowly shrinking a loop leaving its winding number unchanged tells you a lot about <a href="http://en.wikipedia.org/wiki/Adiabatic_theorem">slowly changing</a> certain types of quantum system.<br /><br />It's possible I completely messed up. <a href="http://topologicalmusings.wordpress.com/2008/07/22/analyzing-the-hairy-ball-theorem/">Here</a>'s an "elementary" proof. It looks much harder than what I did. But I feel like I did faithfully capture, in pictures, an argument that's buried in <a href="http://www.amazon.com/Lectures-Riemann-surfaces-Princeton-mathematical/dp/B0007FHJES">Lectures on Riemann surfaces</a>. 
And it seems to correctly reproduce the <a href="http://en.wikipedia.org/wiki/Hairy_ball_theorem#Lefschetz_connection">Lefschetz number</a> of 2.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com5tag:blogger.com,1999:blog-11295132.post-63100529475328406082012-04-07T16:22:00.000-07:002012-04-07T16:22:02.377-07:00Generalised entropy<b>Introduction</b><br />The <a href="http://en.wikipedia.org/wiki/Entropy">entropy</a> of a probability distribution can be seen as a measure of its uncertainty or a measure of the diversity of samples taken from it. Over the years I've talked lots about how <a href="http://blog.sigfpe.com/2007/03/monads-vector-spaces-and-quantum.html">probability theory</a> gives rise to a monad. This suggests the possibility that maybe the notion of entropy can be generalised to monads other than probability. So here goes...<br /><br /><pre>> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, GeneralizedNewtypeDeriving #-}<br />> {-# LANGUAGE FunctionalDependencies, TypeSynonymInstances #-}<br /><br />> import Control.Monad<br />> import Control.Monad.Writer hiding (lift)<br /><br /><br /></pre><b>Shannon entropy</b><br />I've talked in the past about how there is some trickiness with defining the probability monad in Haskell because a good implementation requires use of the <tt>Eq</tt> typeclass, and hence restricted monads. Restricted monads are possible through a bunch of methods, but this time I don't want them.<br /><br />It's common to represent probability distributions on finite sets as lists of pairs where each pair <tt>(p, x)</tt> means <tt>x</tt> has a probability <tt>p</tt>. But I'm going to allow lists without the restriction that each <tt>x</tt> appears once and make my code work with these generalised distributions. 
When I compute the entropy, say, it will only be the usual entropy in the case that each <tt>x</tt> in the list is unique.<br /><br />So here's our type and some instances for it:<br /><br /><pre>> data P a = P [(a, Float)] deriving Show<br /><br />> instance Functor P where<br />> fmap f (P xs) = P [(f a, p) | (a, p) <- xs]<br /><br />> instance Monad P where<br />> return x = P [(x, 1)]<br />> P xss >>= f = P [(y, p*q) | (pxs, p) <- xss, let P ys = f pxs, (y, q) <- ys]<br /><br /></pre>We can easily compute the expected value of a distribution, and its entropy, like this:<br /><br /><pre>> expectation0 (P xs) = sum [x*p | (x, p) <- xs]<br />> entropy0 (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]<br /><br /></pre>An important property of entropy is known as the grouping property which can be illustrated through an example tree like this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-67NuhgTvHY4/T4BeuNAHpsI/AAAAAAAABBA/D4tg6ATA9CQ/s1600/tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="http://4.bp.blogspot.com/-67NuhgTvHY4/T4BeuNAHpsI/AAAAAAAABBA/D4tg6ATA9CQ/s320/tree.png" width="480" /></a></div><br />The entropy for the probability distribution of the final leaves is the sum of two components: (1) the entropy of the branch at the root of the tree and (2) the expected entropy of the subtrees. Here's some corresponding code. 
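<br /><br />As a language-independent sanity check of the grouping property, here is a small Python sketch that is separate from the literate Haskell in this post. It assumes the same numbers as the tree used in the Haskell tests (a 0.3/0.7 root whose branches split 0.4/0.6 and 0.2/0.8); the names <tt>shannon</tt>, <tt>root</tt> and <tt>subtrees</tt> are hypothetical.

```python
import math

def shannon(ps):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log(p, 2) for p in ps if p > 0)

root = [0.3, 0.7]                    # branch at the root of the tree
subtrees = [[0.4, 0.6], [0.2, 0.8]]  # one distribution per root branch

# Probabilities of the final leaves: multiply along each path.
leaves = [p * q for p, sub in zip(root, subtrees) for q in sub]

lhs = shannon(leaves)
rhs = shannon(root) + sum(p * shannon(sub) for p, sub in zip(root, subtrees))
print(abs(lhs - rhs) < 1e-9)
```

The two sides agree only up to floating-point rounding, which is why the comparison uses a tolerance.<br /><br />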
First simple bernoulli trials:<br /><br /><pre>> bernoulli p a b = P [(a, p), (b, 1-p)]<br /><br /></pre>Now the branch at the root of the tree:<br /><br /><pre>> root = bernoulli 0.3 False True<br /><br /></pre>We can compute the entropy for the distribution on the leaves:<br /><br /><pre>> test1 = entropy0 $ do<br />>     x <- root<br />>     if x<br />>       then bernoulli 0.2 3 4<br />>       else bernoulli 0.4 5 6<br /><br /></pre>Or the sum of the root entropy and the expected subtree entropy:<br /><br /><pre>> test2 = entropy0 root + (expectation0 $ do<br />>     x <- root<br />>     if x<br />>       then return $ entropy0 (bernoulli 0.2 3 4)<br />>       else return $ entropy0 (bernoulli 0.4 5 6))<br /><br /></pre>You can confirm for yourself that <tt>test1 == test2</tt> (up to floating-point rounding).<br /><br />We can rewrite that a little. We're drawing <tt>True</tt> or <tt>False</tt> from <tt>root</tt> only to decide which distribution to use at the next stage. But we may as well pick the distribution itself at random. So define:<br /><br /><pre>> dist = bernoulli 0.3 (bernoulli 0.4 5 6) (bernoulli 0.2 3 4)<br /><br /></pre>And now we expect the equality of <tt>test3</tt> and <tt>test4</tt>:<br /><br /><pre>> test3 = entropy0 $ do<br />>     x <- dist<br />>     x<br /><br />> test4 = entropy0 dist + (expectation0 $ do<br />>     x <- dist<br />>     return $ entropy0 x)<br /><br /></pre>There's a more elegant way of writing this. Define:<br /><br /><pre>> left0 dist = entropy0 (join dist)<br />> right0 dist = entropy0 dist+expectation0 (fmap entropy0 dist)<br /><br /></pre>Now we expect <tt>left0 dist</tt> and <tt>right0 dist</tt> to always be equal. We've almost generalised to something that makes sense in the context of monads other than probability.<br /><br /><b>The algebra of a monad</b><br />Here are a couple of important properties of <tt>expectation0</tt>:<br /><br />1. <tt>expectation0 (return d) = d</tt><br />2. 
<tt>expectation0 (join d) = expectation0 (fmap expectation0 d)</tt><br /><br />In English: the expectation of certainty is just the certain value, and the expectation of an expectation is just the expectation. But these rules are precisely the conditions that define an <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-<a href="http://en.wikipedia.org/wiki/Monad_%28category_theory%29#Algebras_for_a_monad">algebra</a>, where <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle"> is a monad.<br /><br />So let's define a type class:<br /><br /><pre>> class Algebra m a | m -> a where<br />>     expectation :: m a -> a<br /><br /></pre>We'll assume that when <tt>m</tt> is a monad, any instance satisfies the two laws above. Here's the instance for probability:<br /><br /><pre>> instance Algebra P Float where<br />>     expectation (P xs) = sum [x*p | (x, p) <- xs]<br /><br /></pre>In keeping with the notion that entropy measures diversity let's also define:<br /><br /><pre>> class Diverse m r | m -> r where<br />>     entropy :: m x -> r<br /><br /></pre>with the instance:<br /><br /><pre>> instance Diverse P Float where<br />>     entropy (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]<br /><br /></pre>It's not clear what laws we need but for now we'll assume a generalised entropy satisfies <tt>left dist == right dist</tt>:<br /><br /><pre>> left dist = entropy (join dist)<br />> right dist = entropy dist+expectation (fmap entropy dist)<br /><br /></pre>We'll call that the generalised grouping law.<br /><br /><b>Binary trees</b><br />It's not hard to find other structures that satisfy these laws if we cheat and use alternative structures to represent probabilities. 
For example We can make <tt>Tree</tt> an instance by assuming <tt>Fork</tt> represents a 50/50 chance of going one way or another:<br /><br /><pre>> data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving Show<br /><br />> instance Functor Tree where<br />> fmap f (Leaf a) = Leaf (f a)<br />> fmap f (Fork l r) = Fork (fmap f l) (fmap f r)<br /><br />> instance Monad Tree where<br />> return x = Leaf x<br />> Leaf a >>= f = f a<br />> Fork l r >>= f = Fork (l >>= f) (r >>= f)<br /><br />> instance Algebra Tree Float where<br />> expectation (Leaf a) = a<br />> expectation (Fork l r) = 0.5*expectation l+0.5*expectation r<br /><br />> instance Diverse Tree Float where<br />> entropy (Leaf a) = 0<br />> entropy (Fork l r) = 1+0.5*entropy l+0.5*entropy r<br /><br /></pre><b>Lists</b><br />We could make non-empty lists into an instance by assuming a uniform distribution on the list. But another way to measure the diversity is simply to count the elements. We subtract one so that <tt>[x]</tt> corresponds to diversity zero. This subtraction gives us a non-trivial instance:<br /><br /><pre>> newtype L a = L [a] deriving (Show, Monad, Functor)<br /><br />> instance Algebra L Int where<br />> expectation (L xs) = sum xs<br /><br />> instance Diverse L Int where<br />> entropy (L xs) = length xs-1<br /><br /></pre><b>Tsallis entropy</b><br />There are measures of diversity for probability distributions that are distinct from Shannon entropy. An example is <a href="Tsallis_entropy">Tsallis entropy</a>. At this point I'd like a family of types parametrised by reals but Haskell doesn't support dependent types. 
So I'll just fix a real number <tt>q</tt> and we can define:<br /><br /><pre>> q = 2.5<br /><br />> data T a = T [(a, Float)] deriving Show<br /><br />> instance Functor T where<br />>     fmap f (T xs) = T [(f a, p) | (a, p) <- xs]<br /><br />> instance Monad T where<br />>     return x = T [(x, 1)]<br />>     T xss >>= f = T [(y, p*q) | (pxs, p) <- xss, let T ys = f pxs, (y, q) <- ys]<br /><br />> instance Algebra T Float where<br />>     expectation (T xs) = sum [x*p**q | (x, p) <- xs]<br /><br />> instance Diverse T Float where<br />>     entropy (T xs) = (1-sum [p**q | (_, p) <- xs])/(q-1)<br /><br /></pre>And again we find our generalised grouping rule for entropy holds.<br /><br /><b>Operads</b><br />This is all derived from Tom Leinster's <a href="http://golem.ph.utexas.edu/category/2011/05/an_operadic_introduction_to_en.html">post</a> last year at the <a href="http://golem.ph.utexas.edu/category/">n-category cafe</a>. As I talked about <a href="http://blog.sigfpe.com/2008/10/operads-and-their-monads.html">here</a> there's a close relationship between monads and operads. Operads are a bit like container monads where the containers don't contain anything, but just have holes where contents could be placed. This makes operads a better place to work because you don't have the awkward issue I started with: having to disallow lists of value/probability pairs where the same value can appear more than once. Nonetheless, in (unrestricted) Haskell monads you don't have <tt>Eq</tt> available so you can't actually have definitions of <tt>return</tt> or <tt>>>=</tt> that can notice the equality of two elements. If such definitions were possible, the grouping law would no longer work as stated above.<br /><br /><b>Crossed homomorphisms</b><br />The generalised grouping law even makes sense for very different monads. For the Writer monad the law gives the definition of a <a href="http://en.wikipedia.org/wiki/Group_cohomology#H1">crossed homomorphism</a>. 
It's pretty weird seeing a notion from group cohomology emerge like this and I recommend skipping to the final section unless you care about this sort of thing. But if you do, this is related to research I did a long time ago. This is to test that the <a href="http://en.wikipedia.org/wiki/Schwarzian_derivative">Schwarzian derivative</a> really does give rise to a crossed homomorphism.<br /><br />Firstly let me set up some automatic differentiation code:<br /><br /><pre>> data D a = D { re::a, im::a } deriving (Show, Ord, Eq)<br /><br />> instance Num a => Num (D a) where<br />> fromInteger n = D (fromInteger n) 0<br />> D a a'+D b b' = D (a+b) (a'+b')<br />> D a a'*D b b' = D (a*b) (a*b'+a'*b)<br />> D a a'-D b b' = D (a-b) (a'-b')<br /><br />> instance Fractional a => Fractional (D a) where<br />> fromRational n = D (fromRational n) 0<br />> D a a'/D b b' = let q = 1/b in D (a*q) ((-a*b'+a'*b)*q*q)<br /><br />> lift x = D x 0<br /><br />> d f x = im (f (D x 1))<br /><br />> raised f = re . f . lift<br />> raised2 = raised . raised<br />> raised3 = raised2 . raised<br /><br /></pre>The <tt>Cn</tt> are the <tt>n</tt>-times (automatically) differentiable functions. 
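<br /><br />For readers who find Python easier to follow, the dual-number idea behind <tt>D</tt> can be sketched as below. This is an illustration only, written for Python 3, with hypothetical names (<tt>Dual</tt>, <tt>derivative</tt>, <tt>f</tt>). It computes first derivatives only, whereas the Haskell code nests <tt>D</tt> to reach higher derivatives.

```python
class Dual(object):
    """A number re + im*eps with eps*eps == 0, mirroring the Haskell D type."""

    def __init__(self, re, im=0.0):
        self.re = re
        self.im = im

    @staticmethod
    def lift(x):
        # Treat a plain number as a constant (zero derivative part).
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.re + o.re, self.im + o.im)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.re - o.re, self.im - o.im)

    def __mul__(self, other):
        # Product rule: (a*b)' = a*b' + a'*b
        o = Dual.lift(other)
        return Dual(self.re * o.re, self.re * o.im + self.im * o.re)
    __rmul__ = __mul__

    def __truediv__(self, other):
        # Quotient rule: (a/b)' = (a'*b - a*b') / b^2
        o = Dual.lift(other)
        q = 1.0 / o.re
        return Dual(self.re * q, (self.im * o.re - self.re * o.im) * q * q)
    __div__ = __truediv__

def derivative(f, x):
    """Evaluate f at x + eps and read off the coefficient of eps."""
    return f(Dual(x, 1.0)).im

def f(x):
    # The same rational function that appears in the tests of this post.
    return (2 + x) / (3 + x * x)

print(derivative(f, 1.0))
```

By the quotient rule the derivative of this <tt>f</tt> at 1 is (4 − 6)/16, which is what the dual-number evaluation reproduces.<br /><br />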
Unfortunately the <tt>Endo</tt> defined in <tt>Data.Monoid</tt> acts the wrong way round from what I want so I need a <tt>Dual</tt>:<br /><br /><pre>> type C1 = Dual (Endo (D Double))<br />> type C3 = Dual (Endo (D (D (D Double))))<br />> type C4 = Dual (Endo (D (D (D (D Double)))))<br /><br />> instance Eq (Endo (D Double))<br />> instance Ord (Endo (D Double))<br /><br /></pre>A silly <tt>Show</tt> instance that simply evaluates a function at a number I chose randomly: 1.234.<br /><br /><pre>> instance Show (Endo (D Double)) where<br />> show (Endo f) = show (f 1.234)<br /><br />> instance Num C1 where<br />> fromInteger n = Dual (Endo (\x -> fromInteger n))<br />> Dual (Endo f)+Dual (Endo g) = Dual (Endo (\x -> f x + g x))<br />> Dual (Endo f)-Dual (Endo g) = Dual (Endo (\x -> f x - g x))<br />> Dual (Endo f)*Dual (Endo g) = Dual (Endo (\x -> f x * g x))<br /><br />> instance Fractional C1 where<br />> fromRational n = Dual (Endo (\x -> fromRational n))<br />> Dual (Endo f)/Dual (Endo g) = Dual (Endo (\x -> f x / g x))<br /><br />> newtype Q a = Q (Writer C4 a) deriving (Monad, Functor)<br /><br /></pre>We can give <tt>Q a</tt> a geometrical interpretation. The underlying type is a pair <tt>(a, C4)</tt>. If we think of elements of <tt>C4</tt> as charts <a href="http://en.wikipedia.org/wiki/Manifold#Charts">charts</a> on a piece of Riemann surface then for any <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">, an element <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28%5cpsi%2c+z%29" style="vertical-align:middle"> of <tt>(a, C4)</tt> represents a local piece of a section of the <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">th tensor power of the canonical bundle. Ie. we can think of it as representing <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi+dz%5en" style="vertical-align:middle">. 
I'll concentrate on the case <img src="https://chart.googleapis.com/chart?cht=tx&chl=n%3d2" style="vertical-align:middle"> which gives quadratic differentials. We can think of an element of <tt>((a, C4), C4)</tt> as forms where we're composing two charts. We can collapse down to an ordinary chart by using the chain rule. Here's the code:<br /><br /><pre>> instance Algebra Q C1 where<br />>     expectation (Q ma) = let (Dual (Endo a), Dual (Endo f)) = runWriter ma<br />>                          in Dual (Endo (\x -> a (raised3 f x)*(raised2 (d f) x)^2))<br /><br /></pre>Now we can define the Schwarzian derivative:<br /><br /><pre>> schwarzian f x = let f0 = raised3 f x<br />>                      f1 = raised2 (d f) x<br />>                      f2 = raised (d $ d f) x<br />>                      f3 = (d $ d $ d f) x<br />>                  in f3/f1-1.5*(f2/f1)^2<br /><br /></pre>And somewhat bizarrely, we now have a generalised entropy:<br /><br /><pre>> instance Diverse Q C1 where<br />>     entropy (Q ma) = let (_, Dual (Endo f)) = runWriter ma<br />>                      in Dual (Endo (\x -> schwarzian f x))<br /><br /></pre>This is the construction that gives rise to the <a href="http://en.wikipedia.org/wiki/Virasoro_algebra">Virasoro algebra</a> which plays such an important role in String Theory.<br /><br /><b>Some tests</b><br />And here's a bunch of tests. 
I'd have used <tt>QuickCheck</tt> but it won't install for me today...<br /><br /><pre>> test :: (Algebra m t, Diverse m t, Num t, Functor m, Monad m) => m (m x) -> IO ()<br />> test x = do<br />> print (left x, right x)<br /><br />> main = do<br />> test $ L [L [1, 2, 3], L [2, 3, 4], L [1], L [5], L [2, 7::Int]]<br />> test $ P [(P [(0, 0.5), (1, 0.5)], 0.5), (P [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]<br />> test $ T [(T [(0, 0.5), (1, 0.5)], 0.5), (T [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]<br />> test $ Leaf (Leaf 1 `Fork` Leaf 2) `Fork` Leaf (Leaf 3 `Fork` (Leaf 4 `Fork` Leaf 5))<br />> test $ (Q (writer<br />> (Q (writer (Dual (Endo (\x -> x)),<br />> Dual (Endo (\x -> x^2+1)))),<br />> Dual (Endo (\x -> (2+x)/(3+x*x))))) :: Q (Q C3))<br /></pre>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com1tag:blogger.com,1999:blog-11295132.post-76703357330995948642012-03-17T13:30:00.001-07:002012-03-17T13:45:04.487-07:00Overloading Python list comprehension<b>Introduction</b><br />Python is very flexible in the way it allows you to overload various features of its syntax. For example most of the binary operators can be overloaded. But one part of the syntax that can't be overloaded is list comprehension ie. expressions like <tt>[f(x) for x in y]</tt>.<br /><br />What might it mean to overload this notation? Let's consider something simpler first, overloading the binary operator <tt>+</tt>. The expression <tt>a+b</tt> is interpreted as <tt>a.__add__(b)</tt> if <tt>a</tt> is of class type. So overloading <tt>+</tt> means nothing more than writing a function. So if we can rewrite list comprehensions in terms of a function (or functions) then we can overload the notation by providing alternative definitions for those functions. Python doesn't provide a facility for doing this directly, but we can at least think about what it might mean to do this. 
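<br /><br />To make the <tt>+</tt> case concrete, here is a minimal sketch showing that the operator and the method call are the same thing for a user-defined class. The class <tt>Vec2</tt> is hypothetical and exists only for this illustration.

```python
class Vec2(object):
    """A tiny 2D vector type used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called by Python when it evaluates `self + other`.
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

a = Vec2(1, 2)
b = Vec2(3, 4)
print(a + b == a.__add__(b))
```

So changing what <tt>+</tt> means for <tt>Vec2</tt> is nothing more than changing the body of one function.<br /><br />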
Later we'll see how to tweak the Python interpreter to make it possible.<br /><br /><b>map</b><br />Consider the expression<br /><pre>[a for x in y]<br /></pre>Here the single letter variables are 'metavariables' representing fragments of Python code. To a good approximation this is equal to:<br /><pre>map(lambda x: a, y)<br /></pre>(BTW Everything I say here is "to a good approximation". Python is an incredibly complex language and I'm not good enough at it to make any categorical statements about when one fragment of code is the same as another.)<br /><br />So it's tempting to see list comprehensions as syntactic sugar for <tt>map</tt>, in which case one approach to overloading comprehension is to consider interpreting it in terms of replacements for <tt>map</tt>. But this isn't a very powerful overloading. It just gives us a slightly different way to write something that's already straightforward.<br /><br /><b>concatMap</b><br />Another reason for not simply seeing list comprehension in terms of <tt>map</tt> is that nested list comprehensions need another operation. Consider<br /><pre>[(y, z) for y in [1, 2] for z in ['a', 'b']]<br /></pre>This isn't quite the same as<br /><pre>[[(y, z) for z in ['a', 'b']] for y in [1, 2]]<br /></pre>but it's close. The latter produces nested lists whereas the first gives one flat list. We can think of nested comprehensions as applying a flattening operation. 
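<br /><br />The difference between the two forms is easy to see by running them. In this sketch <tt>itertools.chain.from_iterable</tt> from the standard library stands in for the flattening operation; the variable names are hypothetical.

```python
from itertools import chain

# One comprehension with two `for` clauses: a single flat list.
flat = [(y, z) for y in [1, 2] for z in ['a', 'b']]

# A comprehension inside a comprehension: a list of lists.
nested = [[(y, z) for z in ['a', 'b']] for y in [1, 2]]

# Flattening the nested result recovers the flat one.
flattened = list(chain.from_iterable(nested))
print(flattened == flat)
```

The nested form has two inner lists of two pairs each, and flattening them in order gives exactly the four pairs of the flat form.<br /><br />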
Let's use list comprehension to implement flattening:<br /><pre>def concat(xs):<br />    return [y for x in xs for y in x]<br /></pre>We now write our nested comprehension as:<br /><pre>concat([[(y, z) for z in ['a', 'b']] for y in [1, 2]])<br /></pre>We know how to write non-nested comprehensions using <tt>map</tt> so we get:<br /><pre>concat(map(lambda y: [(y, z) for z in ['a', 'b']], [1, 2]))<br /></pre>And rewriting the inner comprehension we get:<br /><pre>concat(map(lambda y: map(lambda z: (y, z), ['a', 'b']), [1, 2]))<br /></pre>Every time we add another level of nesting we're going to need another <tt>concat</tt>. But the innermost <tt>map</tt> doesn't have a <tt>concat</tt>. Purely for reasons of symmetry we can ensure every <tt>map</tt> has a <tt>concat</tt> by enclosing the innermost element as a singleton list:<br /><pre>concat(map(lambda y: concat(map(lambda z: [(y, z)], ['a', 'b'])), [1, 2]))<br /></pre>Every <tt>map</tt> has a <tt>concat</tt> so we can simplify slightly. Let's define:<br /><pre>def concatMap(f, xs):<br />    # apply f to each element, then flatten the resulting lists<br />    return [y for x in xs for y in f(x)]<br /><br />def singleton(x):<br />    return [x]<br /></pre>Our expression becomes:<br /><pre>concatMap(lambda y: concatMap(lambda z: singleton((y, z)), ['a', 'b']), [1, 2])<br /></pre>Importantly we've completely rewritten the comprehension in terms of <tt>concatMap</tt> and <tt>singleton</tt>. By changing the meaning of these functions we can change the meaning of comprehension notation, or at least we could if the Python interpreter defined comprehension this way. It doesn't, but we can still reason about it. Although any comprehension that doesn't use <tt>if</tt>s can be rewritten to use these functions, I won't give a formal description of the procedure. Instead I'll provide code to perform the rewrite later. While I'm at it, I'll also handle the <tt>if</tt>s.<br /><br /><b>Laws</b><br />Freely redefining <tt>singleton</tt> and <tt>concatMap</tt> to redefine comprehension could get weird. 
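<br /><br />As a small illustration of what redefinition does, the sketch below runs the same rewritten expression with two different pairs of definitions. All names here (<tt>list_concat_map</tt>, <tt>set_concat_map</tt>, <tt>pairs</tt> and so on) are hypothetical; the set-based pair is just one possible alternative, chosen because it is well behaved.

```python
def list_concat_map(f, xs):
    # Apply f to each element and flatten the resulting lists.
    return [y for x in xs for y in f(x)]

def list_singleton(x):
    return [x]

def set_concat_map(f, xs):
    # Same shape, but results are merged as sets: order and duplicates vanish.
    return {y for x in xs for y in f(x)}

def set_singleton(x):
    return {x}

def pairs(concat_map, singleton):
    """The rewritten form of [(y, z) for y in [1, 2] for z in ['a', 'b']]."""
    return concat_map(
        lambda y: concat_map(lambda z: singleton((y, z)), ['a', 'b']),
        [1, 2])

as_list = pairs(list_concat_map, list_singleton)
as_set = pairs(set_concat_map, set_singleton)
print(as_list == [(y, z) for y in [1, 2] for z in ['a', 'b']])
print(type(as_set).__name__)
```

With the list definitions the rewrite agrees with the built-in comprehension; with the set definitions the very same expression now builds a set.<br /><br />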
If we're going to redefine them we should at least try to define them so that list comprehension still has some familiar properties. For example, for <tt>y</tt> a list we usually expect:<br /><pre>y == [x for x in y]<br /></pre>In other words<br /><pre>y == concatMap(lambda x: singleton(x), y)<br /></pre>At this point I could give a whole bunch more laws but it's time to own up.<br /><br /><b>Monads</b><br />A pair of functions <tt>singleton</tt> and <tt>concatMap</tt>, along with a bunch of laws, are essentially the same thing as a <a href="http://en.wikipedia.org/wiki/Monad_%28functional_programming%29">monad</a>. In Haskell, <tt>concatMap</tt> is usually called <tt>bind</tt> and <tt>singleton</tt> is called <tt>return</tt>. What I've done here is show how Wadler's <a href="http://homepages.inf.ed.ac.uk/wadler/topics/monads.html">Comprehending Monads</a> paper might look in Python. Haskell has specialised monad notation built into its grammar. But what's less well known is that so does Python! The catch is that although the grammar is right, the semantics can't be generalised beyond lists.<br /><br /><b>Monad-Python</b><br />One great thing about Python is that there seem to be libraries for working with every aspect of Python internals. So it's fairly easy to write a simple Python interpreter that rewrites list comprehensions to use <tt>singleton</tt> and <tt>concatMap</tt>. I've placed the source on <a href="https://github.com/dpiponi/Monad-Python">github</a>. Use <tt>mpython.py</tt> instead of <tt>python</tt> as your interpreter. I've tested it with Python 2.6 and 2.7.<br /><br />When using <tt>mpython</tt>, list comprehension uses whatever definitions of <tt>__concatMap__</tt> and <tt>__singleton__</tt> are currently in scope. 
By default they are the definitions I gave above so we get something close to the usual list comprehension.<br /><br />An example of the kind of code you can run with <tt>mpython.py</tt> is:<br /><pre>import math<br /><br />def __concatMap__(k, m):<br />    return lambda c:m(lambda a:k(a)(c))<br /><br />def __singleton__(x):<br />    return lambda f:f(x)<br /><br />def callCC(f):<br />    return lambda c:f(lambda a:lambda _:c(a))(c)<br /><br />def __fail__():<br />    # string exceptions are not allowed in Python 2.6+, so raise a real exception<br />    raise Exception("Failure is not an option for continuations")<br /><br />def ret(x):<br />    return __singleton__(x)<br /><br />def id(x):<br />    return x<br /><br />def solve(a, b, c):<br />    return callCC(lambda throw: [((-b-d)/(2*a), (-b+d)/(2*a))<br />        for a0 in (throw("Not quadratic") if a==0 else ret(a))<br />        for d2 in ret(b*b-4*a*c)<br />        for d in (ret(math.sqrt(d2)) if d2>=0 else throw("No roots"))<br />        ])<br /><br />print solve(1, 0, -9)(id)<br />print solve(1, 1, 9)(id)<br />print solve(0, 1, 9)(id)<br /></pre>I have defined our functions so that comprehension syntax gives us the continuation monad. This makes continuation passing style relatively painless in Python. (At least easier than chaining many <tt>lambda</tt>s.) I have then defined <a href="http://en.wikibooks.org/wiki/Haskell/Continuation_passing_style#callCC">callCC</a> to be similar to its definition in Haskell. There are many uses for <tt>callCC</tt> including the implementation of <tt>goto</tt>. Above I use it in a trivial way to throw exceptions.<br /><br /><b>Conclusion</b><br />My script <tt>mpython.py</tt> is a long way from an industrial strength interpreter and I'm not proposing the above as an extension to Python. My goal was simply to show how Haskell-style monads are not as alien to Python as you might think. In fact, it's reasonable to say that Python already supports one flavour of specialised monad syntax. 
Most users don't realise it as such because it has been hard-wired to work with just one monad, lists.<br /><br />BTW if you attempt to implement all of the other Haskell monads you'll find that Haskell behaves a little differently because of its laziness. You can recover some of that laziness by careful use of continuations in Python. But I've no time to go into that now.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com8tag:blogger.com,1999:blog-11295132.post-54985672461748303472012-02-11T07:47:00.000-08:002012-02-11T07:47:08.597-08:00Using Lawvere theories to combine effects<pre>> {-# LANGUAGE MultiParamTypeClasses, ExplicitForAll, RankNTypes, FlexibleInstances, FlexibleContexts, TypeSynonymInstances #-}<br /><br />> import Data.Monoid<br />> import Data.Functor.Identity<br />> import Control.Monad.Writer<br /><br /></pre>In an <a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html">earlier post</a> I talked about how monads arise from free algebras. Let me recap a bit.<br /><br />In <a href="http://blog.sigfpe.com/2012/02/lawvere-theories-made-bit-easier.html">Part 1</a> I described algebras. They're sets with operations on them satisfying some laws. We can build new elements of an algebra from old ones by using its operations. Eg. if <tt>x</tt> and <tt>y</tt> are in an algebra then <tt>x `mappend` y</tt> must be in it too. Starting with a bunch of symbols, thought of as leaves, we can consider the set of all expressions trees we can build from them. If we consider pairs of trees to be equivalent if the laws say the corresponding expressions are equal, then the set of trees itself forms an algebra known as a free algebra (for the given theory).<br /><br />Let's start with some code. 
This type class says that the type <tt>b</tt> has leaves of type <tt>a</tt>:<br /><br /><pre>> class Free a b where<br />>     leaf :: a -> b<br /><br /></pre><b>Effects from monoids</b><br />Now we can make the type of all trees built from <tt>Monoid</tt> operations and including all leaves of type <tt>a</tt>:<br /><br /><pre>> data FreeMonoid a = FreeMonoid (forall b. (Monoid b, Free a b) => b)<br /><br /></pre>And we have:<br /><br /><pre>> instance Monoid (FreeMonoid a) where<br />>     mempty = FreeMonoid mempty<br />>     FreeMonoid a `mappend` FreeMonoid b = FreeMonoid (a `mappend` b)<br /><br /></pre>Unfortunately elements like <tt>e1</tt> and <tt>e2</tt> ought to be equal but Haskell doesn't know this:<br /><br /><pre>> e1, e2 :: FreeMonoid Char<br />> e1 = FreeMonoid (leaf 'a' `mappend` (leaf 'b' `mappend` leaf 'c'))<br />> e2 = FreeMonoid ((leaf 'a' `mappend` leaf 'b') `mappend` leaf 'c')<br /><br /></pre>Instead we can manually construct a type that does respect equality in monoids. Elements of <tt>FreeMonoid</tt> are binary trees with a <tt>mappend</tt> at each node. Associativity means that we can always replace a tree with an equivalent one where the left branch is a leaf. We can also use the laws to eliminate any occurrence of <tt>mempty</tt>. So every element of <tt>FreeMonoid a</tt> is equivalent to one of the form:<br /><pre>leaf x1 `mappend` (leaf x2 `mappend` (... mempty))<br /></pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-BagBCT39yFU/TzaGZRkhu3I/AAAAAAAAA7c/Aaq9o_2PduU/s1600/list1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="180" width="400" src="http://3.bp.blogspot.com/-BagBCT39yFU/TzaGZRkhu3I/AAAAAAAAA7c/Aaq9o_2PduU/s400/list1.png" /></a></div><br />In other words, free monoids are lists. We can make this explicit. 
The standard prelude already makes <tt>[]</tt> an instance of <tt>Monoid</tt> so we just need:<br /><br /><pre>> instance Free a [a] where<br />>     leaf x = [x]<br /><br /></pre>Here's the isomorphism (modulo tree equivalence):<br /><br /><pre>> iso1 :: FreeMonoid a -> [a]<br />> iso1 (FreeMonoid x) = x<br /><br />> iso1' :: [a] -> FreeMonoid a<br />> iso1' [] = FreeMonoid mempty<br />> iso1' (a : as) = let FreeMonoid r = iso1' as<br />>                  in FreeMonoid (leaf a `mappend` r)<br /><br /></pre>As I talked about in that <a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html">earlier article</a>, free algebras give monads and the trees representing expressions in the algebra can be thought of as abstract syntax trees for domain specific languages. In this case it's the usual list monad. So the <tt>Monoid</tt> type class gives us a language for talking about non-determinism. The operation <tt>mappend</tt> gives us a way to "fork" a process and <tt>mempty</tt> gives us a way to "kill a thread". Here's an example using non-determinism to search for some Pythagorean triples:<br /><br /><pre>> test1 :: [(Int, Int, Int)]<br />> test1 = do<br />>     a <- return 3 `mappend` return 4<br />>     b <- return 4 `mappend` return 5<br />>     c <- return 5 `mappend` return 6<br />>     if a*a+b*b==c*c then return (a, b, c) else mempty<br /><br /></pre><b>Effects from M-sets</b><br />We can do exactly the same for <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets.<br /><br /><pre>> class Monoid m => MSet m s where<br />>     act :: m -> s -> s<br /><br />> data FreeMSet w a = FreeMSet (forall b. (MSet w b, Free a b) => b)<br /><br />> instance Monoid w => MSet w (FreeMSet w a) where<br />>     m `act` FreeMSet b = FreeMSet (m `act` b)<br /><br /></pre>Again we have the problem that <tt>FreeMSet</tt> doesn't automatically make equivalent elements equal. 
But it's not hard to see that every element of <tt>FreeMSet</tt> is equivalent to one of the form:<br /><pre>m `act` (leaf x)<br /></pre>So the free <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set on the set of variables <img src="https://chart.googleapis.com/chart?cht=tx&chl=S" style="vertical-align:middle"> is simply the set of pairs <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5ctimes+S" style="vertical-align:middle">. This is the basis of Haskell's writer monad:<br /><br /><pre>> instance Monoid w => MSet w (Writer w a) where<br />>     act w1 m = let (a, w2) = runWriter m in WriterT (Identity (a, w1 `mappend` w2))<br /><br />> instance Monoid w => Free a (Writer w a) where<br />>     leaf x = return x<br /><br /></pre>Here's the isomorphism (again treating equivalent elements of <tt>FreeMSet</tt> as equal):<br /><br /><pre>> iso2 :: Monoid w => FreeMSet w a -> Writer w a<br />> iso2 (FreeMSet x) = x<br /><br />> iso2' :: Writer w a -> FreeMSet w a<br />> iso2' m = let (a, w) = runWriter m in FreeMSet (act w (leaf a))<br /><br /></pre>And now the <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set operation gives us an interface to an effect. This time the side effect of accumulating in a monoid:<br /><br /><pre>> test2 :: Writer String Int<br />> test2 = do<br />>     act "foo" (return ())<br />>     a <- return 2<br />>     act "bar" (return ())<br />>     b <- return (10*a)<br />>     return b<br /><br /></pre><b>Combining effects</b><br /><br />And now we can finally combine the two effects of non-determinism and accumulation. We make the free algebra that is both a monoid and an <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set:<br /><br /><pre>> data FreeMMonoid w a = FreeMMonoid (forall b. 
(Monoid b, MSet w b, Free a b) => b)<br /><br />> instance Monoid w => Monoid (FreeMMonoid w a) where<br />>     mempty = FreeMMonoid mempty<br />>     FreeMMonoid a `mappend` FreeMMonoid b = FreeMMonoid (a `mappend` b)<br /><br />> instance Monoid w => MSet w (FreeMMonoid w a) where<br />>     m `act` FreeMMonoid b = FreeMMonoid (m `act` b)<br /><br /></pre>Again we have the problem that equivalent elements aren't recognised as equal so we have to manually find a suitable type. For this we need to use the compatibility notion I introduced in Part 1. We can take 2 variables <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=y" style="vertical-align:middle"> and write them in a 1 by 2 array:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%0a%5cbegin%7barray%7d%7bcc%7d%0ax+%26+y%0a%5cend%7barray%7d%0a" style="vertical-align:middle"><br /></blockquote>Apply <tt>mappend</tt> horizontally and then <tt>act</tt> vertically to get:<br /><pre>m `act` (x `mappend` y)<br /></pre>Now apply <tt>act</tt> vertically and then <tt>mappend</tt> horizontally to get:<br /><pre>(m `act` x) `mappend` (m `act` y)<br /></pre>The law we want is:<br /><pre>m `act` (x `mappend` y) == (m `act` x) `mappend` (m `act` y)<br /></pre>Given an arbitrary tree in <tt>FreeMMonoid</tt> we can use this law to "push" all occurrences of <tt>act</tt> inwards. Ultimately every element can be written uniquely in the form:<br /><pre>act m1 (leaf x1) `mappend` (act m2 (leaf x2) `mappend` (... 
mempty))<br /></pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-NFKm6QlR144/TzaH0NtHeeI/AAAAAAAAA7s/B0gRXbgBk6E/s1600/list2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="207" width="400" src="http://4.bp.blogspot.com/-NFKm6QlR144/TzaH0NtHeeI/AAAAAAAAA7s/B0gRXbgBk6E/s400/list2.png" /></a></div><br />We can then use the same argument as above to show that we end up with a list of pairs, each one an element of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5ctimes+S" style="vertical-align:middle">. This is exactly what we get if we apply the <tt>WriterT</tt> monad transformer to <tt>[]</tt>. Here are the relevant instances:<br /><br /><pre>> instance Monoid w => Monoid (WriterT w [] a) where<br />>     mempty = WriterT []<br />>     WriterT xs `mappend` WriterT ys = WriterT (xs ++ ys)<br /><br />> instance Monoid w => MSet w (WriterT w [] a) where<br />>     m `act` WriterT xs = WriterT $ map (\(x, w) -> (x, m `mappend` w)) xs<br /><br />> instance Monoid w => Free a (WriterT w [] a) where<br />>     leaf x = return x<br /><br /></pre>Here's the isomorphism though we won't use it:<br /><br /><pre>> iso3 :: Monoid w => FreeMMonoid w a -> WriterT w [] a<br />> iso3 (FreeMMonoid x) = x<br /><br />> iso3' :: Monoid w => WriterT w [] a -> FreeMMonoid w a<br />> iso3' m = let xws = runWriterT m in FreeMMonoid $<br />>     foldr mappend mempty $ map (\(x, w) -> act w (leaf x)) xws<br /><br /></pre>The monad <tt>WriterT (Product Float) []</tt> is in fact the <a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html">probability monad</a>. 
Here's an example of its use:<br /><br /><pre>> coin :: (Monoid a, MSet (Product Float) a, Free Bool a) => a<br />> coin = act (Product 0.5 :: Product Float) (leaf False)<br />>            `mappend`<br />>        act (Product 0.5 :: Product Float) (leaf True)<br /><br /></pre>Compute the unnormalised conditional probability distribution on a pair of coin tosses given that the first coin can't be <tt>True</tt> unless the second one is:<br /><br /><pre>> test3 :: WriterT (Product Float) [] (Bool, Bool)<br />> test3 = do<br />>     coin1 <- coin<br />>     coin2 <- coin<br />>     if coin1>coin2 then mempty else return (coin1, coin2)<br /><br /></pre>(Compare with Eric Kidd's <a href="http://www.randomhacks.net/articles/2007/02/21/refactoring-probability-distributions">article</a> that also 'refactors' probability theory.)<br /><br /><b>What just happened?</b><br />Something miraculous just happened though it may have been lost in the details. We combined the list monad and the writer monad to get a new monad. We did it without using monad transformers and without specifying an order for the two monads. It just so happens in this case that the result was the same as using a monad transformer.<br /><br /><b>M-set with M-set</b><br />We can try other products of theories. It's tricky to deal with a theory combined with itself because repeating a type class in a context doesn't do anything. We need to make another type class that looks exactly like <tt>MSet</tt> but with different names. The result is that the product of the theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets and the theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=N" style="vertical-align:middle">-sets is the theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5ctimes+N" style="vertical-align:middle">-sets. This agrees with what we'd get from using monad transformers. It also agrees with intuition. 
<img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets correspond to the effect of accumulating data in a monoid. The product theory corresponds to using two accumulators simultaneously.<br /><br />(This makes me think type classes should take as arguments the name of the operations within them. That way a type can be an instance of the same type class in multiple ways. Compare with Agda modules.)<br /><br /><b>Monoid with monoid</b><br />This example illustrates why we can't expect a programming language to use the above method to combine theories. If an algebra has two multiplication operators with identities on it, and the two operators are compatible, then something <a href="http://en.wikipedia.org/wiki/Eckmann%E2%80%93Hilton_argument">surprising</a> happens. The multiplications turn out to be the same operation. What's more, the operation is commutative. So the product of the theory of monoids with itself is the theory of commutative monoids. A free commutative monoid is a multiset. Multisets require a very different implementation to lists and I doubt any automatic algebra combiner in the near future could discover one. (The Eckmann-Hilton argument also appears <a href="http://homotopytypetheory.org/2011/03/26/higher-fundamental-groups-are-abelian/">here</a>.)<br /><br /><b>The compatibility condition</b><br />To form the product of two theories we add in extra laws to ensure commutativity. If we don't add in such laws we get the sum of two theories. For the example theories I used here these theories can lead to quite complex types. 
For example the sum of the theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets and <img src="https://chart.googleapis.com/chart?cht=tx&chl=N" style="vertical-align:middle">-sets is, I think, the theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5coplus+N" style="vertical-align:middle">-sets where <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5coplus+N" style="vertical-align:middle"> is the "free product" of monoids. I think this is a bit of a messy object from the perspective of types. Other effects, however, may behave nicely with respect to <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5coplus" style="vertical-align:middle">. I haven't yet investigated.<br /><br /><b>Conclusion</b><br />If you don't mind computing the relevant types by hand there are perfectly good alternatives to monad transformers for combining effects. But it seems very difficult to automatically combine theories. In fact, I expect finding canonical forms for the elements of free algebras for a product theory isn't even computable. So this approach isn't going to replace monad transformers any time soon.<br /><br /><b>Exercise</b><br />Make a multiplication table showing the result of forming the product of algebras for lots of useful effects.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com8tag:blogger.com,1999:blog-11295132.post-298610297751446202012-02-05T07:57:00.002-08:002012-02-05T09:34:44.031-08:00Lawvere theories made a bit easier<b>Introduction</b><br />I still don't think anyone has found a completely satisfactory way to combine effects in Haskell. (That's computational effects, not visual effects.) Monads are great for one kind of effect at a time, but it's not clear how to combine two arbitrary monads. 
Instead of monads we can work with monad transformers, but they are tricky both to implement and to use.<br /><br />I want to sketch a different, though incomplete, approach to combining effects. There are a bunch of papers that describe this approach, and even some code that implements part of it. Almost everything I say is from a <a href="http://lambda-the-ultimate.org/node/3235">paper</a> by Hyland and Power that I read a few years ago though I recently ran into this helpful <a href="http://mathoverflow.net/questions/70162/why-lawvere-theories-have-finite-products-and-more">answer</a> by Andrej Bauer on MathOverflow. Even if we don't get code we can immediately use, we still get a good way to think about and analyse effects.<br /><br />I'll get onto the effects part in another post. This one concentrates on what are known as Lawvere theories.<br /><br /><b>Monoids</b><br /><br /><pre>> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, ExplicitForAll #-}<br />> import Data.Monoid<br /><br /></pre>The (finite) lists of integers form a monoid. Here are some functions that operate on such lists:<br /><br /><pre>> f1 xs = map (+1) $ reverse xs<br />> f2 xs = xs `mappend` xs<br /><br /></pre>They can both be given signature <tt>[Integer] -> [Integer]</tt>. But there's one very important difference between these functions. <tt>f2</tt> has been written using only operations from the type class <tt>Monoid</tt>. It's a sort of universal function that can be applied to any monoid. On the other hand, the function <tt>f1</tt> can only be applied to very specific monoids. In fact, the type signature of <tt>f2</tt> can be written as:<br /><br /><pre>> f2 :: forall a. 
Monoid a => a -> a<br /><br /></pre>That type signature essentially says that we're not going to do anything with elements of <tt>a</tt> except use the interface defined by the <tt>Monoid</tt> type class.<br /><br />(Although Haskell type classes can't enforce them, we also assume that any instance of <tt>Monoid</tt> satisfies the axioms for a monoid.)<br /><br />We can also define functions on tuples of elements of a monoid. Here are some examples:<br /><br /><pre>> g1 :: forall a. Monoid a => (a, a, a) -> (a, a)<br />> g1 (xs, ys, zs) = (mempty, ys `mappend` xs)<br /><br />> g2 :: forall a. Monoid a => (a, a) -> (a, a)<br />> g2 (xs, ys) = (xs `mappend` ys, ys `mappend` xs)<br /><br /></pre>Notice that we can compose these functions. So we have<br /><pre>g2 . g1 :: forall a. Monoid a => (a, a, a) -> (a, a)<br /></pre><br />We also have identity functions for tuples. Armed with functions, identities and compositions we can now form a category that I'll call <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle">. I'll call the (distinct) objects of this category <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f0" style="vertical-align:middle">, <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f1" style="vertical-align:middle">, <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f2" style="vertical-align:middle"> and so on. The arrows from <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fm" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fn" style="vertical-align:middle"> are the total functions of type <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cforall+a.%5cmbox%7bMonoid+%7da%5cRightarrow+a%5em%5crightarrow+a%5en" style="vertical-align:middle">. 
So, for example, <tt>g1</tt> is an arrow from <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f3" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f2" style="vertical-align:middle">. Note that it doesn't matter what the objects are (as long as they're distinct). They're just placeholders between which we can string our arrows. Note how because of the universal quantifier, the functions we use have a type not of the form <tt>A -> B</tt>. So we can't represent the objects of our category as types in the usual way. We can think of <tt>mempty</tt> as a 0-ary operator, ie. an element of <tt>forall a. Monoid a => () -> a</tt>.<br /><br />But there's one more detail I want. I'll consider two arrows to be equal if they can be proved equal using the axioms for monoids. For example, these two Haskell functions represent the same arrow in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> because of the associativity law:<br /><br /><pre>> h1 (x, y, z) = (x `mappend` y) `mappend` z<br />> h2 (x, y, z) = x `mappend` (y `mappend` z)<br /><br /></pre>We now have a bona fide category. Note that <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> contains lots of arrows. Anything you can build using <tt>mempty</tt> and <tt>mappend</tt> as well as all of the projections and permutations between tuples. <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> completely captures everything that can be deduced using the axioms for monoids. For example, the associativity law is contained in the fact that <tt>h1</tt> and <tt>h2</tt> represent the same arrow.<br /><br />The category <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> also has products. 
In fact it is given by <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fm%5ctimes+M%5fn+%3d+M%5f%7bm%2bn%7d" style="vertical-align:middle"> with the projections back to the factors being represented by the obvious projection functions. <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f0" style="vertical-align:middle"> serves as the product of no factors.<br /><br />So <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> captures the properties shared by all monoids. But what is its relationship to actual monoids? It's pretty nice. A monoid is a functor <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%3aT%5crightarrow+Set" style="vertical-align:middle"> that preserves products.<br /><br />Let's unpack that. First <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align:middle"> must take objects in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> to sets. But we've stipulated that <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28A%5ctimes+B%29%3dF%28A%29%5ctimes+F%28B%29" style="vertical-align:middle"> so <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align:middle"> is completely determined on objects once we know <img src="https://chart.googleapis.com/chart?cht=tx&chl=M+%3d+F%28M%5f1%29" style="vertical-align:middle">. In fact <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28M%5f1%29" style="vertical-align:middle"> will be the carrier for our monoid and <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28M%5fn%29" style="vertical-align:middle"> is its <img src="https://chart.googleapis.com/chart?cht=tx&chl=n" style="vertical-align:middle">th power. <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align:middle"> takes arrows in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> to functions on sets. 
So, for example, it gives concrete realisations of <tt>mempty</tt> and <tt>mappend</tt>. Because, for example, <tt>h1</tt> and <tt>h2</tt> represent the same arrow in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle">, <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28h1%29" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28h2%29" style="vertical-align:middle"> must be equal. So associativity must hold for these realisations. The same goes for all of the other laws, and everything we can deduce from them. So the requirement of functoriality makes <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28M%5f1%29" style="vertical-align:middle"> into a monoid with identity <tt>F(mempty)</tt> and product <tt>F(mappend)</tt>.<br /><br />Given an instance of the <tt>Monoid</tt> type class we can immediately apply an arrow from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> to it. The functor is applied implicitly by the Haskell compiler. For example <tt>h1</tt> can be applied to <tt>("a", "b", "c") :: (String, String, String)</tt>.<br /><br />The object <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f1" style="vertical-align:middle"> in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> is weird. It's a lot like the universal monoid sharing all of the properties you expect to hold simultaneously in all monoids. Except for one important one: it's not a monoid itself.<br /><br />Note that in a sense the category <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> isn't anything new. 
It's just a convenient 'datastructure' into which we can pack everything we can deduce about monoids.<br /><br /><b>M-sets</b><br />Before generalising, let's try a similar treatment for another algebraic structure, the horribly named <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set. An <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set is a structure on a set <img src="https://chart.googleapis.com/chart?cht=tx&chl=S" style="vertical-align:middle"> that assumes we already have some choice of monoid <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">. It defines an action of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle"> on the set. An action is simply a function <img src="https://chart.googleapis.com/chart?cht=tx&chl=S%5crightarrow+S" style="vertical-align:middle"> defined for each element of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle"> and which is compatible with the operations on <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">. Here's a suitable type class:<br /><br /><pre>> class Monoid m => MSet m s where<br />>     act :: m -> s -> s<br /><br /></pre>and the laws are<br /><pre>act mempty x == x<br />act a (act b x) == act (a `mappend` b) x<br /></pre>We're thinking of <tt>act</tt> not as an operation on two arguments but instead thinking of <tt>act a</tt> being an operation on <tt>s</tt> for each element <tt>a</tt>. Note how that second law is really a lot of laws, one for every pair <tt>(a, b)</tt> of elements of our monoid.<br /><br />A simple example is the way scalars act on vectors.<br /><br /><pre>> data Vector a = V a a a deriving Show<br />> instance Monoid a => MSet a (Vector a) where<br />>     act w (V x y z) = V (w `mappend` x) (w `mappend` y) (w `mappend` z)<br /><br /></pre>Given any particular monoid (eg. 
<tt>String</tt>) we can define functions like:<br /><br /><pre>> j1 :: forall a. MSet String a => (a, a) -> (a, a)<br />> j1 (x, y) = ("x" `act` x, "y" `act` y)<br />> j2 :: forall a. MSet String a => (a, a) -> a<br />> j2 (x, y) = "abc" `act` x<br /><br /></pre>We can form a category in exactly the same way as for monoids. As the objects <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fn" style="vertical-align:middle"> were just placeholders we may as well reuse them. The arrows from <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fm" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fn" style="vertical-align:middle"> are the functions on tuples we can make from repeated applications of <tt>act</tt>, ie. <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cforall+a.%5cmbox%7bMSet+String+%7da%5cRightarrow+a%5em%5crightarrow+a%5en" style="vertical-align:middle">. For each choice of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle"> we get a new category <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> encoding everything we want to know about <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets. In much the same way, any functor <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%3aT%27%5crightarrow+Set" style="vertical-align:middle"> gives an <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-set. 
We have <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%28M%5fn%29+%3d+F%28M%5f1%29%5en" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=F" style="vertical-align:middle"> maps <tt>act a</tt> to the actual action on the set.<br /><br /><b>Lawvere theories</b><br />So what do these two categories have in common?<br /><br />Let's start with the simplest possible algebraic structure: a plain old set with no operations on it. The corresponding category will have objects <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5c%7bM%5f0%2c+M%5f1%2c+%5cldots%5c%7d" style="vertical-align:middle">. The arrows from <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fm" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5fn" style="vertical-align:middle"> will be represented by functions of the form <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cforall+a.a%5em%5crightarrow+a%5en" style="vertical-align:middle">. That includes functions like the projections onto elements of tuples or permutations of tuples. This category is called <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5caleph%5f0" style="vertical-align:middle">.<br /><br />Both <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> defined earlier contain all of the objects and arrows of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5caleph%5f0" style="vertical-align:middle">. But both <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> also contain arrows corresponding to all of the operations you can perform in their respective algebraic structures. 
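<br /><br />To make the distinction concrete, here is a small standalone sketch (separate from the literate code in this post; the names <tt>proj2of3</tt>, <tt>swap</tt>, <tt>dup</tt> and <tt>swapThenJoin</tt> are made up for illustration). The first three functions carry no class constraint, so they represent arrows that are available in every theory. The last one needs <tt>mappend</tt>, so it is one of the extra arrows that only the theory of monoids provides:<br /><br /><pre>
-- Arrows available in every theory: they can only copy, discard
-- and rearrange their inputs.
proj2of3 :: (a, a, a) -> a      -- an arrow from M_3 to M_1
proj2of3 (_, y, _) = y

swap :: (a, a) -> (a, a)        -- an arrow from M_2 to M_2
swap (x, y) = (y, x)

dup :: a -> (a, a)              -- an arrow from M_1 to M_2
dup x = (x, x)

-- An extra arrow from M_2 to M_1 that needs the Monoid interface.
swapThenJoin :: Monoid a => (a, a) -> a
swapThenJoin p = let (x, y) = swap p in x `mappend` y
</pre>For instance, <tt>swapThenJoin ("ab", "cd")</tt> evaluates to <tt>"cdab"</tt>.<br /><br />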
So we can define a <a href="http://ncatlab.org/nlab/show/Lawvere+theory">Lawvere theory</a> <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> as nothing more than a category that is <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5caleph%5f0" style="vertical-align:middle"> with extra arrows. The category <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> defined earlier is the "theory of monoids" and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> is the "theory of <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets". A 'model' of <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle">, or a <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle">-algebra, is a product-preserving functor <img src="https://chart.googleapis.com/chart?cht=tx&chl=F%3aT%5crightarrow+Set" style="vertical-align:middle">.<br /><br /><b>Product theories</b><br />Suppose we have two theories <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">. There's a straightforward way to combine them into one theory. Form the category that (like all Lawvere theories) shares the same objects as <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> but has all of the arrows from both. (Obviously we'll have to keep just one identity, not an identity from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and another from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">.) 
The catch is that if <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%3aA%5crightarrow+B" style="vertical-align:middle"> is an arrow in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27%3aB%5crightarrow+C" style="vertical-align:middle"> is an arrow in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> we'll need the composition <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27%5ccirc+f" style="vertical-align:middle"> too. We simply take the smallest set of arrows that contains all of the arrows from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> and contains all of their compositions modulo all of the laws in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">.<br /><br />Thus far we already have a new Lawvere theory built from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">. But sometimes it's useful to ensure the operations from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> are 'compatible', in some sense, with those from <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">. We want them to commute with each other. 
I'll describe what that commutativity means now:<br /><br />Suppose we have arrows <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%3am%5crightarrow+n" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27%3am%27%5crightarrow+n%27" style="vertical-align:middle"> in <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> respectively. If we have a set <img src="https://chart.googleapis.com/chart?cht=tx&chl=S" style="vertical-align:middle"> that is both a <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle">-algebra and a <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">-algebra, then <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> gives an operation <img src="https://chart.googleapis.com/chart?cht=tx&chl=S%5em%5crightarrow+S%5en" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27" style="vertical-align:middle"> gives an operation <img src="https://chart.googleapis.com/chart?cht=tx&chl=S%5e%7bm%27%7d%5crightarrow+S%5e%7bn%27%7d" style="vertical-align:middle">. 
Write out an <img src="https://chart.googleapis.com/chart?cht=tx&chl=m%27%5ctimes+m" style="vertical-align:middle"> array of variables:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cbegin%7barray%7d%7bcccc%7d%0ax%5f%7b11%7d+%26+x%5f%7b12%7d+%26+%5cldots+%26+x%5f%7b1m%7d+%5c%5c%0ax%5f%7b21%7d+%26+x%5f%7b22%7d+%26+%5cldots+%26+x%5f%7b2m%7d+%5c%5c%0a%26%26+%5cldots+%26+%5c%5c%0ax%5f%7bm%271%7d+%26+x%5f%7bm%272%7d+%26+%5cldots+%26+x%5f%7bm%27m%7d+%5c%5c%0a%5cend%7barray%7d" style="vertical-align:middle"><br /></blockquote>We can apply <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> across the rows to get the <img src="https://chart.googleapis.com/chart?cht=tx&chl=m%27%5ctimes+n" style="vertical-align:middle"> array:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cbegin%7barray%7d%7brlccr%7d%0af%26%28x%5f%7b11%7d%2c+%26x%5f%7b12%7d%2c+%26%5cldots%2c+%26x%5f%7b1m%7d%29+%5c%5c%0af%26%28x%5f%7b21%7d%2c+%26x%5f%7b22%7d%2c+%26%5cldots%2c+%26x%5f%7b2m%7d%29+%5c%5c%0af%26%28%26%26%5cldots%26%29%5c%5c%0af%26%28x%5f%7bm%271%7d%2c+%26x%5f%7bm%272%7d%2c+%26%5cldots%2c+%26x%5f%7bm%27m%7d%29+%5c%5c%0a%5cend%7barray%7d" style="vertical-align:middle"><br /></blockquote>and now we can apply <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27" style="vertical-align:middle"> down the columns to get an <img src="https://chart.googleapis.com/chart?cht=tx&chl=n%27%5ctimes+n" style="vertical-align:middle"> array.<br /><br />We could also have done this by applying <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27" style="vertical-align:middle"> down the columns first and then <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> across the rows. 
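<br /><br />The two orders of application can be sketched in Haskell. This is a standalone illustration rather than part of the literate code in this post: it assumes an operation with several inputs and outputs is modelled as a function on lists, and the names <tt>Op</tt>, <tt>rowsThenCols</tt>, <tt>colsThenRows</tt>, <tt>addAll</tt> and <tt>mulAll</tt> are made up for the example.<br /><br /><pre>
import Data.List (transpose)

-- Assumption of this sketch: an operation taking m inputs to n outputs
-- is represented as a function from a list of length m to one of length n.
type Op a = [a] -> [a]

-- Apply f across each row, then f' down each column.
rowsThenCols :: Op a -> Op a -> [[a]] -> [[a]]
rowsThenCols f f' = transpose . map f' . transpose . map f

-- Apply f' down each column first, then f across each row.
colsThenRows :: Op a -> Op a -> [[a]] -> [[a]]
colsThenRows f f' = map f . transpose . map f' . transpose

-- Two operations that each collapse their inputs to a single output.
addAll, mulAll :: Op Integer
addAll xs = [sum xs]
mulAll xs = [product xs]

-- A 2 by 2 array of sample values.
grid :: [[Integer]]
grid = [[1, 2], [3, 4]]

-- Do the two orders agree on the sample array?
sumWithSum, sumWithProduct :: Bool
sumWithSum     = rowsThenCols addAll addAll grid == colsThenRows addAll addAll grid
sumWithProduct = rowsThenCols addAll mulAll grid == colsThenRows addAll mulAll grid
</pre>Here <tt>sumWithSum</tt> is <tt>True</tt>, since both orders give <tt>[[10]]</tt>, while <tt>sumWithProduct</tt> is <tt>False</tt>, since the two orders give <tt>[[21]]</tt> and <tt>[[11]]</tt>. So whether the two orders agree really does depend on the pair of operations.<br /><br />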
We now throw in extra commutativity laws stating that these two operations give equal results, whatever the operations <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%27" style="vertical-align:middle">.<br /><br />For example, if the theory <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> has a binary multiplication operator <img src="https://chart.googleapis.com/chart?cht=tx&chl=m" style="vertical-align:middle"> and the theory <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> has binary multiplication operator <img src="https://chart.googleapis.com/chart?cht=tx&chl=m%27" style="vertical-align:middle"> then commutativity requires <img src="https://chart.googleapis.com/chart?cht=tx&chl=m%28m%27%28a%2cb%29%2cm%27%28c%2cd%29%29+%3d+m%27%28m%28a%2cc%29%2cm%28b%2cd%29%29" style="vertical-align:middle">. This is the usual <a href="http://blog.sigfpe.com/2008/05/interchange-law.html">interchange law</a>. In this example the commutativity law is therefore an assertion about the equality of two arrows in <img src="https://chart.googleapis.com/chart?cht=tx&chl=M%5f4%5crightarrow+M%5f1" style="vertical-align:middle">.<br /><br />The result of combining the two theories <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle">, additionally throwing in these commutativity laws, is known as the product theory <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%5cotimes+T%27" style="vertical-align:middle">.<br /><br /><b>Product theories in Haskell</b><br />Although we can't get Haskell to enforce the commutativity rules we can get part way to defining product theories using type classes. We make our type an instance of both classes. 
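<br /><br />Since the compiler won't check the commutativity laws for us, one option is to spot-check them by hand on sample values. The following is a standalone sketch, separate from the literate code in this post: it repeats the <tt>MSet</tt> class, picks lists of <tt>(String, a)</tt> pairs as an illustrative carrier, and the instance and the names <tt>distributes</tt> and <tt>fixesEmpty</tt> are assumptions made for the example.<br /><br /><pre>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Standalone copy of the MSet class.
class Monoid m => MSet m s where
    act :: m -> s -> s

-- Lists are already a Monoid under (++). Here a String acts on a list of
-- (log, value) pairs by prepending itself to every log.
instance MSet String [(String, a)] where
    act m xs = [(m ++ w, x) | (w, x) <- xs]

-- act m commutes with the binary operation mappend ...
distributes :: String -> [(String, Int)] -> [(String, Int)] -> Bool
distributes m x y = act m (x `mappend` y) == (act m x `mappend` act m y)

-- ... and with the nullary operation mempty.
fixesEmpty :: String -> Bool
fixesEmpty m = act m (mempty :: [(String, Int)]) == mempty
</pre>For example, <tt>distributes "a" [("x", 1)] [("y", 2)]</tt> and <tt>fixesEmpty "a"</tt> are both <tt>True</tt>. Checks like these only test particular values, so they are evidence for the laws rather than a proof of them.<br /><br />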
For <img src="https://chart.googleapis.com/chart?cht=tx&chl=T" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%27" style="vertical-align:middle"> defined as the theories of monoids and <img src="https://chart.googleapis.com/chart?cht=tx&chl=M" style="vertical-align:middle">-sets above, a <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%5cotimes+T%27" style="vertical-align:middle">-algebra is given by a type that is an instance of both <tt>Monoid</tt> and <tt>MSet</tt>. Unfortunately we can't make Haskell automatically enforce the rule that elements be equal if the laws say they should be.<br /><br /><b>Coming soon</b><br />We've defined Lawvere theories. I've explained <a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html">previously</a> how many kinds of effects can be modelled by algebraic theories. I've defined a product on Lawvere theories. Which means we now have a way to combine effects. But for that you'll have to wait for the sequel.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com8tag:blogger.com,1999:blog-11295132.post-79738030661387689412012-01-21T13:38:00.000-08:002012-01-21T16:34:54.159-08:00Some parallels between classical and quantum mechanics<b>Introduction</b><br />This isn't really a blog post. More of something I wanted to interject in a discussion on Google plus but wouldn't fit in the text box.<br /><br />I've always had trouble with the way the <a href="http://en.wikipedia.org/wiki/Legendre_transformation">Legendre transform</a> is introduced in classical mechanics. I know I'm not the only one. Many mathematicians and physicists have recognised that it seems to be plucked out of a hat like a rabbit and have even written papers to address this issue. 
But however much an author attempts to make it seem natural, it still looks like a rabbit to me.<br /><br />So I have to ask myself, what would make me feel comfortable with the Legendre transform?<br /><br />The Legendre transform is an analogue of the Fourier transform that uses a different semiring to the usual. I wrote <a href="http://blog.sigfpe.com/2005/10/quantum-mechanics-and-fourier-legendre.html">briefly</a> about this many years ago. So if we could write classical mechanics in a form that is analogous to another problem where I'd use a Fourier transform, I'd be happier. This is my attempt to do that.<br /><br />When I wrote about <a href="http://blog.sigfpe.com/2011/06/another-elementary-way-to-approach.html">Fourier transforms</a> a little while back the intention was to immediately follow it with an analogous article about Legendre transforms. Unfortunately that's been postponed so I'm going to just assume you know that Legendre transforms can be used to compute <a href="http://en.wikipedia.org/wiki/Legendre_transformation#Infimal_convolution">inf-convolutions</a>. I'll state clearly what that means below, but I won't show any detail on the analogy with Fourier transforms.<br /><br /><b>Free classical particles</b><br />Let's work in one dimension with a particle of mass <img src="https://chart.googleapis.com/chart?cht=tx&chl=m" style="vertical-align:middle"> whose position at time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t" style="vertical-align:middle"> is <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%28t%29" style="vertical-align:middle">. The kinetic energy of this particle is given by <img src="https://chart.googleapis.com/chart?cht=tx&chl=T%3d%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2" style="vertical-align:middle">. 
Its Lagrangian is therefore <img src="https://chart.googleapis.com/chart?cht=tx&chl=L%3d%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29" style="vertical-align:middle">, where <img src="https://chart.googleapis.com/chart?cht=tx&chl=V" style="vertical-align:middle"> is the potential energy.<br /><br />The <a href="http://en.wikipedia.org/wiki/Action_%28physics%29">action</a> of our particle for the time from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> is therefore<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cint%5f%7bt%5f0%7d%5e%7bt%5f1%7d%28%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29%29dt%7d" style="vertical-align:middle"><br /></blockquote><br />The particle motion is that which <a href="http://en.wikipedia.org/wiki/Principle_of_least_action">minimises</a> the action.<br /><br />Suppose the position of the particle at time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> is <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f0" style="vertical-align:middle"> and the position at time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> is <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f1" style="vertical-align:middle">. Then write <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29" style="vertical-align:middle"> for the action of the action-minimising path from <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f1" style="vertical-align:middle">. 
So<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29+%3d+%5cmin%5fx%5cint%5f%7bt%5f0%7d%5e%7bt%5f1%7d%28%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29%29dt%7d" style="vertical-align:middle"><br /></blockquote>where we're minimising over all paths <img src="https://chart.googleapis.com/chart?cht=tx&chl=x" style="vertical-align:middle"> such that <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%28t%5fi%29%3dx%5fi" style="vertical-align:middle">.<br /><br />Now suppose our system evolves from time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle">. We can consider this to be two stages, one from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> followed by one from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle">. Let <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi" style="vertical-align:middle"> be the minimised action analogous to <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi" style="vertical-align:middle"> for the period <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle">. The action from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle"> is the sum of the actions for the two subperiods. 
So the minimum total action for the period <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f0" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle"> is given by<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cpsi%28t%5f0%2cx%5f0%3bt%5f2%2cx%5f2%29+%3d+%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29%2b%5cpsi%28t%5f1%2cx%5f1%3bt%5f2%2cx%5f2%29%29%7d" style="vertical-align:middle"><br /></blockquote><br />Let me simplify that a little. I'll use <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%28t%2cx%29" style="vertical-align:middle"> where I previously used <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%28t%5f0%2cx%5f0%3bt%2cx%29" style="vertical-align:middle"> and <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi%28x%5f1%2cx%5f2%29" style="vertical-align:middle"> for <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%28t%5f1%2cx%5f1%3bt%5f2%2cx%5f2%29" style="vertical-align:middle">. So that last equation becomes:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29%3d%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f1%2cx%5f1%29%2b%5cphi%28x%5f1%2cx%5f2%29%29%7d" style="vertical-align:middle"><br /></blockquote><br />Now suppose <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi" style="vertical-align:middle"> is translation-independent in the sense that <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi%28x%2bs%2cx%27%2bs%29%3d%5cphi%28x%2cx%27%29" style="vertical-align:middle">. So we can write <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi%28x%5f1%2cx%5f2%29%3d%5cphi%28x%5f2-x%5f1%29" style="vertical-align:middle">. 
Then the minimum total action is given by<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29%3d%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f1%2cx%5f1%29%2b%5cphi%28x%5f2-x%5f1%29%29%7d" style="vertical-align:middle"><br /></blockquote><br />Infimal convolution is defined by<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%28f%5codot+g%29%28x%29+%3d+%5cmin%5fs%28f%28s%29%2bg%28x-s%29%29%7d" style="vertical-align:middle"><br /></blockquote>so the minimum we seek is<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29+%3d+%28%5cpsi%28t%5f1%2c%5ccdot%29%5codot%5cphi%29%28x%5f2%29%7d" style="vertical-align:middle"><br /></blockquote><br />So now it's natural to use the Legendre transform. We have the inf-convolution theorem:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%28f%5codot+g%29%5e%5cast%3df%5e%5cast%2bg%5e%5cast" style="vertical-align:middle"><br /></blockquote>where <img src="https://chart.googleapis.com/chart?cht=tx&chl=f%5e%5cast" style="vertical-align:middle"> is the Legendre transform of <img src="https://chart.googleapis.com/chart?cht=tx&chl=f" style="vertical-align:middle"> given by<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7bf%5e%5cast%28p%29+%3d+%5csup%5fx%28px-f%28x%29%29%7d" style="vertical-align:middle"><br /></blockquote>and so <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%5e%5cast%28t%5f2%2cp%29+%3d+%5cpsi%5e%5cast%28t%5f1%2cp%29%2b%5cphi%5e%5cast%28p%29" style="vertical-align:middle"> (where we use <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cast" style="vertical-align:middle"> to represent Legendre transform with respect to the spatial variable).<br /><br />Let's consider the case where from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" 
style="vertical-align:middle"> onwards the particle motion is free, so <img src="https://chart.googleapis.com/chart?cht=tx&chl=V%3d0" style="vertical-align:middle">. In this case we clearly have translation-invariance and so the time evolution is given by repeated inf-convolution with <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi" style="vertical-align:middle"> and in the "Legendre domain" this is nothing other than repeated addition of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi%5e%5cast" style="vertical-align:middle">.<br /><br />Let's take a look at <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi" style="vertical-align:middle">. We know that if a particle travels freely from <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f1" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=x%5f2" style="vertical-align:middle"> over the period from <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> to <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f2" style="vertical-align:middle"> then it must have followed the minimum action path and we know, from basic mechanics, this is the path with constant velocity. So<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7bT+%3d+%5cfrac%7b1%7d%7b2%7dm%28x%5f2-x%5f1%29%5e2%2f%28t%5f2-t%5f1%29%5e2%7d" style="vertical-align:middle"><br /></blockquote>and hence the action is given by<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cphi%28s%29+%3d+%5cfrac%7b1%7d%7b2%7dms%5e2%2f%28t%5f2-t%5f1%29%7d" style="vertical-align:middle"><br /></blockquote>So the time evolution of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi" style="vertical-align:middle"> is given by repeated inf-convolution with a quadratic function. 
The time evolution of <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cpsi%5e%5cast" style="vertical-align:middle"> is therefore given by repeated addition of the Legendre transform of a quadratic function. It's not hard to prove that the Legendre transform of a quadratic function is also quadratic. In fact:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5cphi%5e%5cast%28p%29+%3d+%5cfrac%7b1%7d%7b2m%7dp%5e2%28t%5f2-t%5f1%29%7d" style="vertical-align:middle"><br /></blockquote>Addition is easier to work with than inf-convolution so if we wish to understand the time evolution of the action function it's natural to work with this Legendre transformed function.<br /><br />So that's it for classical mechanics in this post. I've tried to look at the evolution of a classical system in a way that makes the Legendre transform natural.<br /><br /><b>Free quantum particles</b><br />Now I want to take a look at the evolution of a free quantum particle to show how similar it is to what I wrote above. In this case we have the Schrödinger equation<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5cpsi%28t%2cx%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%5cfrac%7b%5cpartial%5e2%7d%7b%5cpartial+x%5e2%7d%5cpsi%28t%2cx%29%2bV%5cpsi%28t%2cx%29%7d" style="vertical-align:middle"><br /></blockquote>Let's suppose that from time <img src="https://chart.googleapis.com/chart?cht=tx&chl=t%5f1" style="vertical-align:middle"> onwards the particle is free so <img src="https://chart.googleapis.com/chart?cht=tx&chl=V%3d0" style="vertical-align:middle">. 
Then we have<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5cpsi%28t%2cx%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%5cfrac%7b%5cpartial%5e2%7d%7b%5cpartial+x%5e2%7d%5cpsi%28t%2cx%29%7d" style="vertical-align:middle"><br /></blockquote>Now let's take the Fourier transform in the spatial variable. We get:<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%28ik%29%5e2%5chat%7b%5cpsi%7d%28t%2ck%29%7d" style="vertical-align:middle"><br /></blockquote>So<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+%5cexp%28-%5cfrac%7bi%5chbar+k%5e2%28t-t%5f1%29%7d%7b2m%7d%29%5chat%7b%5cpsi%7d%28t%5f1%2ck%29%7d" style="vertical-align:middle"><br /></blockquote>We can write this as<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cdisplaystyle%7b%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+%5chat%7b%5cphi%7d%28k%29%5chat%7b%5cpsi%7d%28t%5f1%2ck%29%7d" style="vertical-align:middle"><br /></blockquote>where, up to a normalising constant,<br /><blockquote><img src="https://chart.googleapis.com/chart?cht=tx&chl=%5cphi%28x%29+%5cpropto+%5cexp%28%5cfrac%7bimx%5e2%7d%7b2%5chbar%28t-t%5f1%29%7d%29" style="vertical-align:middle"><br /></blockquote>So the time evolution of the free quantum particle is given by repeated convolution with a Gaussian function which in the Fourier domain is repeated multiplication by a Gaussian. The classical section above is nothing but a <a href="http://ncatlab.org/nlab/show/tropical+semiring">tropical</a> version of this section.<br /><br /><b>Conclusion</b><br />I doubt I've said anything original here. 
Classical mechanics is well known to be the limit of quantum mechanics as <img src="https://chart.googleapis.com/chart?cht=tx&chl=%5chbar%5crightarrow+0" style="vertical-align:middle"> and it's well known that in this limit we find that occurrences of the semiring <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28%5cmathbb%7bR%7d%2c%2b%2c%5ctimes%29" style="vertical-align:middle"> are replaced by the semiring <img src="https://chart.googleapis.com/chart?cht=tx&chl=%28%5cmathbb%7bR%7d%2c%5cmin%2c%2b%29" style="vertical-align:middle">. But I've never seen an article that attempts to describe classical mechanics in terms of repeated inf-convolution even though this is close to Hamilton's formulation and I've never seen an article that shows the parallel with the Schrödinger equation in this way. I'm hoping someone will now be able to say to me "I've seen that before" and post a relevant link below.<br /><br /><b>Note</b><br />I'm not sure how the above applies for a non-trivial potential <img src="https://chart.googleapis.com/chart?cht=tx&chl=V" style="vertical-align:middle">. I wrote this little Schrödinger <a href="http://homepage.mac.com/sigfpe/Harmonic/anharmonic.html">equation solver</a> a while back. As might be expected, it's inconvenient to use the Fourier domain to deal with the part of the evolution due to <img src="https://chart.googleapis.com/chart?cht=tx&chl=V" style="vertical-align:middle">. In order to simulate a time step of <img src="https://chart.googleapis.com/chart?cht=tx&chl=dt" style="vertical-align:middle"> the code simulates <img src="https://chart.googleapis.com/chart?cht=tx&chl=dt%2f2" style="vertical-align:middle"> in the Fourier domain assuming the particle is free and then spends <img src="https://chart.googleapis.com/chart?cht=tx&chl=dt%2f2" style="vertical-align:middle"> solving for the <img src="https://chart.googleapis.com/chart?cht=tx&chl=V" style="vertical-align:middle">-dependent part in the spatial domain. 
So even in the presence of non-trivial <img src="https://chart.googleapis.com/chart?cht=tx&chl=V" style="vertical-align:middle"> it can still be useful to work with a Fourier transform. Almost the same iteration could be used to numerically compute the action for the classical case.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com1tag:blogger.com,1999:blog-11295132.post-25431895417435887332012-01-07T09:40:00.000-08:002012-01-07T10:02:25.642-08:00Lossless decompression and the generation of random samplesLet S be some finite set with a probability distribution on it. Here's a diagram showing some example probabilities for the set S={A, B, C, D, E}:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-JxfDOQAMAas/Twh1-a1qfYI/AAAAAAAAA58/xm22ZkMqWd8/s1600/pdf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-JxfDOQAMAas/Twh1-a1qfYI/AAAAAAAAA58/xm22ZkMqWd8/s1600/pdf.png" /></a></div>How can we generate lots of random samples from this distribution? A popular method involves first storing the cumulative distribution function in a table like so:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-QCCqlK9BR1E/Twh2u2f8y2I/AAAAAAAAA6E/j5sYagwcj1s/s1600/cdf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-QCCqlK9BR1E/Twh2u2f8y2I/AAAAAAAAA6E/j5sYagwcj1s/s1600/cdf.png" /></a></div><br />We then use any of a number of popular methods to generate uniform pseudorandom numbers in the range [0,1) and for each one walk through the table until we find the first entry greater than our pseudorandom number. The symbol above the number is the one we generate. So if we generated 0.512, we'd pick symbol C. It's straightforward to prove this gives the correct probability distribution.<br /><br />As described this algorithm can be slow. 
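<br /><br />To see why, here is a sketch of the table walk in Haskell. It is only an illustration: the probabilities are made-up values (the diagram's exact numbers aren't reproduced here), and the names <tt>pdf</tt>, <tt>cdf</tt> and <tt>sample</tt> are hypothetical.<br /><br /><pre>
import Data.List (find)

-- Made-up probabilities for the symbols A..E (an assumption for this sketch).
pdf :: [(Char, Double)]
pdf = [('A', 0.1), ('B', 0.2), ('C', 0.3), ('D', 0.25), ('E', 0.15)]

-- Running totals: pair each symbol with its cumulative probability.
cdf :: [(a, Double)] -> [(a, Double)]
cdf ps = zip (map fst ps) (scanl1 (+) (map snd ps))

-- Walk the table from the start and return the first symbol whose
-- cumulative probability is greater than u. Nothing means u fell past
-- the end of the table, which floating point rounding can cause.
sample :: [(a, Double)] -> Double -> Maybe a
sample table u = fst <$> find ((> u) . snd) table

main :: IO ()
main = print (sample (cdf pdf) 0.512)   -- prints Just 'C'
</pre>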
If the size of the table is N we may have to walk through up to N entries to generate each sample.<br /><br />One approach to accelerating this algorithm is to quantise our pseudorandom number and use it to look up, in a precomputed table, a jump into our cumulative distribution table. I've used this several times in my visual effects career. But today I'm going to take a different approach.<br /><br />Another natural way to speed up sample generation is to use a binary search to find the appropriate point in the cumulative distribution, for example the C++ standard template library's <span style="font-family: 'Courier New', Courier, monospace;">upper_bound</span><span style="font-family: inherit;"> function will do the job.</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-family: inherit;">But what is a binary search algorithm going to do? Typically it's going to start by comparing our pseudorandom number with the midpoint of the table. If our number is bigger then it will recursively use binary search on the right (looking next at the midpoint of the right half), otherwise on the left, and so on. If we're generating many samples from the same distribution we're going to be repeatedly looking up the same midpoints in the table. At the end of the day, the process can be described by a decision tree. So we may as well throw away the table and build the decision tree up front. Here's what it might look like:</span><br /><span style="font-family: inherit;"><br /></span><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-5BoKnwdGeY0/Twh7_PlNU-I/AAAAAAAAA6Q/kesZroqEOXw/s1600/tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="231" src="http://1.bp.blogspot.com/-5BoKnwdGeY0/Twh7_PlNU-I/AAAAAAAAA6Q/kesZroqEOXw/s320/tree.png" width="320" /></a></div><span style="font-family: inherit;">But maybe we can do better. 
For example C is three times as likely as A but they both take the same amount of time to generate as they both require walking down to depth 2. Meanwhile D has a probability of 0.25 and requires walking to depth 3. We'd like to rebalance the tree. Note also that there's no reason to list our original PDF in the order I gave. Different orderings might give different trees.</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-family: inherit;">It's straightforward to describe what we want to optimise. We want to place sample i at depth d<sub>i</sub> so that the expected value of </span>d<sub>i</sub><span style="font-family: inherit;"> is as small as possible. In other words we want to minimise </span><span style="font-family: inherit;">Σ</span><span style="font-family: inherit;">p</span><sub>i</sub>d<sub>i</sub><span style="font-family: inherit;">. But this is precisely the problem solved by <a href="http://en.wikipedia.org/wiki/Huffman_coding">Huffman coding</a>.</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-family: inherit;">And that's the conclusion I wanted to reach: Huffman coding gives the optimal decision tree to generate random samples. It also tells us this interesting consequence: if we use a decision tree method then the performance of our algorithm is bounded by the entropy of the <a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)">probability distribution</a>. I find this connection between entropy and algorithmic complexity pretty surprising.</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-family: inherit;">I learnt the above during my interview at Google!</span><br /><span style="font-family: inherit;"><br /></span><br /><span style="font-family: inherit;">Why is there this connection between a compression algorithm and the generation of random samples? 
It took me a little longer to realise why but it's quite straightforward.</span><br /><span style="font-family: inherit;"><br /></span><br />Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.<br /><br />So when we're decompressing Huffman encoded data we have a machine that takes as input uniformly distributed bits and which outputs letters sampled from some probability distribution. But that's exactly the same problem that I posed above. So, at least from some perspective, decompression is precisely the same thing as generating samples from a probability distribution.<br /><br />Or to put it another way: there is a class of algorithm whose purpose is to convert samples from one probability distribution into samples from another. When we use one of these algorithms to convert from samples from some distribution P that's easy to generate samples from, into samples from Q, we call this "an algorithm to sample from distribution Q". When we use one of these algorithms to convert from some distribution to a uniform distribution, and the function given by the algorithm is invertible, then we call it "lossless compression".Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com21tag:blogger.com,1999:blog-11295132.post-34026757648417262502011-10-30T13:19:00.000-07:002011-10-30T13:19:44.199-07:00Quick and dirty reinversion of controlIt's taken for granted by many people that Haskell and static types are incompatible with prototyping and quick-and-dirty hacks.<br /><br />I wanted to put together some OpenGL code that had a script for how a bunch of graphics should be displayed. It was essentially an imperative specification for my program. 
For quick and dirty hacks, GLUT is tolerable. But even when programming in C/C++, it's not supportive of programming in a straightforward imperative style because it uses inversion of control. Many graphics applications are written in a state machine style where the state machine gets to tick once each time there is an event callback. This really doesn't fit the imperative script style.<br /><br />But it is possible to reinvert inversion of control in any language that supports continuations. And that includes languages like Python that support linear continuations in the form of generators. But I'm using Haskell here.<br /><br />Continuations reify the remainder of a computation. Or in more down to earth language: they allow you to grab the stuff you're about to do as a function, put it on ice for a while, and then carry on doing it later. So imagine we had a block of imperative code and that we'd like, at each GLUT callback, to make some progress through this block. We can use continuations like this: each time we want to yield control back to the main loop we simply grab the remainder of our 'script' as a continuation and make it the callback to be executed next time GLUT is ready.<br /><br />The slight wrinkle is that OpenGL/GLUT calls use IO. 
To combine <tt>IO</tt> and continuations we need the <tt>ContT</tt> monad transformer.<br /><br />I'll do everything except the <tt>yield</tt> function first and get back to that at the end.<br /><br />Some standard library imports:<br /><br /><pre>> import Graphics.UI.GLUT<br />> import Control.Monad.Cont<br /><br /></pre>Some simple code to draw a line from left to right:<br /><br /><pre>> display :: GLdouble -> IO ()<br />> display y = do<br />>     clear [ColorBuffer]<br />> <br />>     renderPrimitive LineStrip $ do<br />>         vertex (Vertex2 (-1) (-y))<br />>         vertex (Vertex2 1 y)<br /><br />>     swapBuffers<br />>     postRedisplay Nothing<br /><br /></pre>Some standard OpenGL/GLUT setup:<br /><br /><pre>> main = do<br />>     (progname, _) <- getArgsAndInitialize<br />>     initialWindowSize $= Size 500 500<br />>     initialDisplayMode $= [DoubleBuffered, RGBMode]<br />>     createWindow "Bounce!"<br /><br />>     matrixMode $= Modelview 0<br />>     loadIdentity<br /><br />>     matrixMode $= Projection<br />>     loadIdentity<br />>     ortho (-1) 1 (-1) 1 (-1) 1<br /><br /></pre>Our script is called before the main loop.<br /><br /><pre>>     imperative<br />>     mainLoop<br /><br /></pre>And now comes the actual script. Apart from the <tt>liftIO</tt> calls this should be almost as easy to read as BASIC programming from the days of yore:<br /><br /><pre>> imperative = flip runContT return $ do<br /><br />>     liftIO $ print "Start!"<br /><br />>     forever $ do<br /><br />>         forM_ [-1, -0.992 .. 1.0] $ \y -> do<br />>             render $ display y<br />>             yield<br /><br />>         liftIO $ print "Bounce!"<br /><br />>         forM_ [-1, -0.992 .. 1.0] $ \y -> do<br />>             render $ display (-y)<br />>             yield<br /><br />>         liftIO $ print "Bounce!"<br />>         yield<br /><br /></pre>The first thing to note is that <tt>render</tt> doesn't actually do any rendering. At the end of the day we can't tell GLUT when to render, it only calls you. 
So instead <tt>render</tt> tells GLUT what to do next time it's in the mood for a bit of rendering:<br /><br /><pre>> render f = liftIO $ displayCallback $= f<br /><br /></pre>That leaves one thing to explain: <tt>yield</tt>. It needs to grab the remainder of the script and package it up in a form suitable for installation as an idle callback. But there's a catch: continuations are notorious for making your head explode. If you're throwing together a quick and dirty hack, that's the last thing you need. Here's where static types come to the rescue. As <a href="https://plus.google.com/115274377971493973150/posts/jMqzaxPEnrd">Conor McBride</a> points out, we want to just do the easy thing and follow gravity downwards.<br /><br />So first we try to guess the type of <tt>yield</tt>. We know we're working with the <tt>ContT r IO</tt> monad, for some result type <tt>r</tt>. So its type is going to be <tt>ContT r IO a</tt> for some <tt>a</tt>. There's no particular type of data we want to get out of <tt>yield</tt>, it's just a thing we want executed. So we can guess the type is <tt>ContT r IO ()</tt>, the type <tt>()</tt> being the generic filler type when we don't actually have any data.<br /><br />Let's look at the definition of <tt>ContT</tt>:<br /><br /><pre>newtype ContT r m a = ContT {<br />    runContT :: (a -> m r) -> m r<br />}<br /><br /></pre>The type <tt>r</tt> is the final return type from our continuation. We're not interested in a return value, we just want to *do* stuff. So we expect <tt>r</tt> to be <tt>()</tt> as well.<br /><br />So <tt>yield</tt> must essentially be of type <tt>(() -> IO ()) -> IO ()</tt>.<br /><br />So we want to concoct something of this type using GLUT's <tt>idleCallback</tt> function. As <tt>yield</tt> must take a function as argument it must look something like:<br /><br /><pre>yield = ContT $ \f -> ...<br /><br /></pre>We know that <tt>f</tt> is of type <tt>() -> IO ()</tt>. So there's only one thing we can do with it: apply it to <tt>()</tt>. 
That gives us something of type <tt>IO ()</tt>. That's precisely the type of GLUT's <tt>idleCallback</tt>. So we put it all together:<br /><br /><pre>> yield = ContT $ \f -> idleCallback $= Just (f () )<br /><br /></pre>The code now works. I didn't have to spend even a moment thinking about the meaning of a continuation. Implementing <tt>yield</tt> was about as hard as putting together a jigsaw puzzle with three pieces. There's only so many ways you can put the pieces together.<br /><br />And that's a simple example of why I often like to write quick-and-dirty code with a statically typed language.<br /><br />(Oh, and I'm not trying to take part in a war. I like to prototype in many different languages, some of which are dynamically typed.)<br /><br />PS Note also that the above code illustrates one way to avoid <tt>IORef</tt>s in GLUT code.Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com13tag:blogger.com,1999:blog-11295132.post-7426123242367987882011-08-13T08:12:00.000-07:002011-08-13T13:46:43.205-07:00Computing errors with square roots of infinitesimals.<b>Abstract</b><br />Automatic differentiation (AD) gives a way to carry out <a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty">uncertainty propagation</a>. But used in the obvious way it leads to <a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty#Caveats_and_warnings">bias</a>. This article introduces "square roots of infinitesimals" that can be used to give more accurate results.<br /><br /><b>Introduction</b><br />In the real world measurements have errors and we often want to know how much our final answers are affected by those errors. One tool for measuring the sensitivity to errors of our results is calculus. In fact, we can use automatic differentiation to give us a nice way to model error. Here's an implementation:<br /><br /><pre>> import Control.Monad
<br />> import Control.Monad.State
<br />> import Control.Applicative
<br />> import qualified Data.IntMap as I
<br />
<br />> infixl 6 .+.
<br />> infixl 7 .*
<br />> infixl 7 *.
<br />
<br />> data D a = D a a deriving (Eq, Show)
<br />
<br />> instance Num a => Num (D a) where
<br />>     D x a+D x' a' = D (x + x') (a + a')
<br />>     D x a*D x' a' = D (x*x') (a*x' + x*a')
<br />>     negate (D x a) = D (negate x) (negate a)
<br />>     fromInteger n = D (fromInteger n) 0
<br />>     abs _ = error "No abs"
<br />>     signum _ = error "No signum"
<br />
<br />> d = D 0 1
<br />
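<br />> -- Illustrative check, not part of the original post (exampleCube is a
<br />> -- hypothetical name): seeding the infinitesimal part with 1 differentiates
<br />> -- x^3 at x = 2, so this should evaluate to D 8.0 12.0.
<br />> exampleCube :: D Double
<br />> exampleCube = let x = D 2 1 in x*x*x
<br />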
<br /></pre>As I've talked about <a href="http://blog.sigfpe.com/2006/09/practical-synthetic-differential.html">before</a>, the value <tt>d</tt> can be thought of as an infinitesimal number whose square is zero. However, to first order we can replace <tt>d</tt> with a small number and use it to compute errors. Here's a function to perform such a substitution:<br /><br /><pre>> approx :: Num a => D a -> a -> a
<br />> approx (D x d) e = x+d*e
<br />
<br /></pre>Suppose we have a square whose side we've measured as 1m to an accuracy of 1cm. We can represent this as:<br /><br /><pre>> sq_side = D 1 0.01
<br />
<br /></pre>We can now compute the area:<br /><br /><pre>> sq_area = sq_side^2
<br />
<br /></pre>We get <tt>D 1.0 2.0e-2</tt>. We can interpret this as meaning the area is 1m<sup>2</sup> with an accuracy of 0.02m<sup>2</sup>.<br /><br />We can make "to an accuracy of" more precise. Differentiation models a function locally as an approximate linear function. (1m+δ)<sup>2</sup>=1m<sup>2</sup>+2mδ+O(δ<sup>2</sup>). So at 1m, the function to compute area locally scales lengths by 2m. So if the length measurement is actually a sample from a distribution with given mean 1m and standard deviation 0.01m, the area is approximately a sample from a distribution with mean 1m<sup>2</sup> and SD 0.02m<sup>2</sup>.<br /><br />This approach is nice, but sometimes we want a little more accuracy in our estimates. In particular, if you square a sample from a normal distribution with small variance and positive mean, then the nonlinearity of the squaring operation means that samples that are larger than the mean move further away from the mean, when squared, than samples less than the mean. So we should actually expect our area computations to be slightly <a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty#Caveats_and_warnings">biased</a> upwards from 1m<sup>2</sup>. Unfortunately, this is a second order effect that isn't visible from looking only at first derivatives.<br /><br />That's not a problem, we can easily compute second derivatives using automatic differentiation. However, that can complicate things. What happens if we use multiple measurements to compute a quantity? Each one is a different sample from a different distribution and we don't want these measurements to be correlated. If we approach this in the obvious way, when we want to use n measurements we'll need to compute n<sup>2</sup> partial second derivatives. 
However, by tweaking AD slightly we'll only need n derivatives.<br /><br /><b>Square roots of infinitesimals</b><br />In addition to the usual infinitesimal d we want to introduce quantities, w<sub>i</sub>, that represent independent random "noise" variables that are infinitesimal in size. We'll be interested in expectation values so we'll also need an expectation function, e. We want e(w<sub>i</sub>)=0. But w<sub>i</sub><sup>2</sup> is always positive so its expectation is always greater than or equal to zero. We want our random variables to be infinitesimal so we pick e(w<sub>i</sub><sup>2</sup>)=d. We also want e(w<sub>i</sub>w<sub>j</sub>)=0 for i≠j because of independence. If the w<sub>i</sub> are already infinitesimal, the dw<sub>i</sub> should be zero. So let's define an algebraic structure that captures these relationships. We extend <tt>D a</tt> by introducing <tt>w i</tt> so that:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/--86TgXTLQtc/TkZ37tLf4TI/AAAAAAAAAwE/5DYb80Kunuw/s1600/algebra.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="186" src="http://1.bp.blogspot.com/--86TgXTLQtc/TkZ37tLf4TI/AAAAAAAAAwE/5DYb80Kunuw/s320/algebra.png" width="320" /></a></div><br />Any element of this algebra can be written as x+ad+Σb<sub>i</sub>w<sub>i</sub>. We represent b sparsely by using an <tt>IntMap</tt>. Here's an implementation:<br /><br /><pre>> data S a = S a a (I.IntMap a) deriving (Eq, Show)
<br />
<br />> (.+.) :: Num a => I.IntMap a -> I.IntMap a -> I.IntMap a
<br />> ( *.) :: Num a => a -> I.IntMap a -> I.IntMap a
<br />> (.*) :: Num a => I.IntMap a -> a -> I.IntMap a
<br />> (.*.) :: Num a => I.IntMap a -> I.IntMap a -> a
<br />
<br />> (.+.) = I.unionWith (+)
<br />> a *. v = I.map (a *) v
<br />> v .* b = I.map (* b) v
<br />> a .*. b = I.foldr (+) 0 $ I.intersectionWith (*) a b
<br />
<br />> instance Num a => Num (S a) where
<br />>     S x a b+S x' a' b' = S (x + x') (a + a') (b .+. b')
<br />>     S x a b*S x' a' b' = S (x*x') (a*x' + x*a' + b.*.b') (x*.b' .+. b.*x')
<br />>     negate (S x a b) = S (negate x) (negate a) (I.map negate b)
<br />>     fromInteger n = S (fromInteger n) 0 I.empty
<br />>     abs _ = error "No abs"
<br />>     signum _ = error "No signum"
<br />
<br /></pre>Here are the individual <tt>w i</tt>:<br /><br /><pre>> w :: Num a => Int -> S a
<br />> w i = S 0 0 (I.fromList [(i, 1)])
<br />
<br /></pre>We compute expectation values linearly by mapping the <tt>w i</tt> to zero:<br /><br /><pre>> e :: Num a => S a -> D a
<br />> e (S x a _) = D x a
<br />
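<br />> -- Illustrative check, not part of the original post (exampleRules is a
<br />> -- hypothetical name): the three entries are e(w_0^2), e(w_0*w_1) and e(w_0),
<br />> -- which by the rules above should be d, 0 and 0, i.e. D 0.0 1.0, D 0.0 0.0
<br />> -- and D 0.0 0.0.
<br />> exampleRules :: [D Double]
<br />> exampleRules = [e (w 0 * w 0), e (w 0 * w 1), e (w 0)]
<br />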
<br /></pre>We can also represent numbers whose values we know precisely:<br /><br /><pre>> sure x = S x 0 I.empty
<br />
<br /></pre><b>Example</b><br />Let's revisit the area example. This time we can represent the length of the side of our square as<br /><br /><pre>> sq_side' = 1+0.01*w 0
<br />> sq_area' = sq_side'^2
<br />
<br /></pre>We get <tt>S 1.0 1.0e-4 (fromList [(0,2.0e-2)])</tt>. We can directly read off that we have a bias of 10<sup>-4</sup>m<sup>2</sup> which is 1cm<sup>2</sup>. We can encapsulate this as:<br /><br /><pre>> mean f = approx (e f) 1
<br />
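<br />> -- Illustrative use, not part of the original post (exampleMeanArea is a
<br />> -- hypothetical name): the mean area including the second order bias,
<br />> -- which should come out at about 1.0001.
<br />> exampleMeanArea :: Double
<br />> exampleMeanArea = mean sq_area'
<br />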
<br /></pre>We can directly read off the variance from the element of the algebra. However, we can also compute the variance using <tt>mean</tt>. It's just:<br /><br /><pre>> var f = mean ((f-sure (mean f))^2)
<br />
<br /></pre>(Note that this gives a very slightly different result from the value you can read off directly from the <tt>S</tt> object. It depends on whether we're measuring the deviation around the unbiased or biased mean. To the order we're considering here the difference is small. Here's <tt>var'</tt> anyway:<br /><br /><pre>> var' (S _ _ v) = I.foldr (+) 0 $ I.map (\x -> x^2) v
<br />
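<br />> -- Illustrative comparison, not part of the original post (exampleVars is a
<br />> -- hypothetical name): for the square's area, var should give about 3.9999e-4
<br />> -- (deviation around the biased mean) and var' about 4.0e-4.
<br />> exampleVars :: (Double, Double)
<br />> exampleVars = (var sq_area', var' sq_area')
<br />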
<br /></pre>)<br /><br />We can also define covariance:<br /><br /><pre>> cov f g = mean ((f-sure (mean f))*(g-sure (mean g)))
<br />
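<br />> -- Illustrative check, not part of the original post (exampleCov is a
<br />> -- hypothetical name): u and v share the noise term w 0, so their covariance
<br />> -- should be about 0.1*0.1 = 0.01; the independent w 1 term contributes nothing.
<br />> exampleCov :: Double
<br />> exampleCov = let { u = 1 + 0.1*w 0; v = u + 0.2*w 1 } in cov u v
<br />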
<br /></pre><b>More functions</b><br />We can now follow through just like with automatic differentiation to compute lots more functions. We use the fact that:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-RM7_BU5grdU/TkZ4Kxjz0oI/AAAAAAAAAwI/WIuld_bAnvE/s1600/taylor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="86" src="http://4.bp.blogspot.com/-RM7_BU5grdU/TkZ4Kxjz0oI/AAAAAAAAAwI/WIuld_bAnvE/s400/taylor.png" width="400" /></a></div><br /><br /><pre>> instance Fractional a => Fractional (S a) where
<br />>     fromRational x = S (fromRational x) 0 I.empty
<br />>     recip (S x a b) = let r = recip x
<br />>                       in S r (-a*r*r+r*r*r*(b.*.b)) ((-r*r)*.b)
<br />
<br />> instance Floating a => Floating (S a) where
<br />>     pi = sure pi
<br />>     sin (S x a b) = let s = sin x
<br />>                         c = cos x
<br />>                     in S s (a*c - s/2*(b.*.b)) (c*.b)
<br />>     cos (S x a b) = let s = sin x
<br />>                         c = cos x
<br />>                     in S c (-a*s - c/2*(b.*.b)) ((-s)*.b)
<br />>     exp (S x a b) = let e = exp x
<br />>                     in S e (a*e + e/2*(b.*.b)) (e*.b)
<br />>     -- the second order coefficient is f''(x)/2 = -1/(8*s^3) for f = sqrt
<br />>     sqrt (S x a b) = let s = sqrt x
<br />>                      in S s (a/(2*s)-1/(8*s*s*s)*(b.*.b)) (1/(2*s)*.b)
<br />>     log (S x a b) = let r = 1/x
<br />>                     in S (log x) (r*a-r*r/2*(b.*.b)) (r*.b)
<br />>     asin = undefined
<br />>     acos = undefined
<br />>     atan = undefined
<br />>     sinh = undefined
<br />>     cosh = undefined
<br />>     tanh = undefined
<br />>     asinh = undefined
<br />>     acosh = undefined
<br />>     atanh = undefined
<br />
<br /></pre><b>A real example</b><br />Let's make this effort worthwhile. We'll compute errors for a computation that uses the errors in a messy nonlinear way. Suppose we're in the lab measuring radioactive decay. We measure the geiger counter reading at times t = 0hr, 1hr, 2hr, 3hr, 4hr at which point we compute an estimate for when the count rate will drop to 0.1 counts/sec. We'll assume the decay fits a model counts/sec = a exp(-λt) and that the counts have an error with SD 0.05. We're going to compute the error in the estimated time to reach that level in the case when λ=0.5 per hour and a=2:<br /><br /><pre>> t = [0..4]
<br />> counts = map (\i-> 2*exp(-0.5*fromIntegral i)+0.05*w i) t
<br />
<br /></pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/--xCHmeCt40M/TkZ7T_WTVxI/AAAAAAAAAwY/CdU21UOAOjA/s1600/conc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="290" src="http://2.bp.blogspot.com/--xCHmeCt40M/TkZ7T_WTVxI/AAAAAAAAAwY/CdU21UOAOjA/s400/conc.png" width="400" /></a></div><br /><br />We'll be fitting a curve using logarithmic regression so we'll need the following function. Given a pair of lists <tt>x</tt> and <tt>y</tt> it returns <tt>(m, c)</tt> where y=mx+c is the standard least squares fit.<br /><br /><pre>> regress :: Fractional a => [a] -> [a] -> (a, a)
<br />> regress x y =
<br />>     let sx = sum x
<br />>         sy = sum y
<br />>         sxx = sum $ map (^2) x
<br />>         sxy = sum $ zipWith (*) x y
<br />>         n = fromIntegral (length x)
<br />>         s = 1/(sx*sx-n*sxx)
<br />>     in (s*sx*sy-s*n*sxy, -s*sxx*sy+s*sx*sxy)
<br />
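<br />> -- Sanity check, not part of the original post (exampleFit is a hypothetical
<br />> -- name): points lying exactly on y = 2x+1 should be fitted with slope 2 and
<br />> -- intercept 1. Rational keeps the arithmetic exact.
<br />> exampleFit :: (Rational, Rational)
<br />> exampleFit = regress [0, 1, 2] [1, 3, 5]
<br />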
<br /></pre>Logarithmic regression:<br /><br /><pre>> (m, c) = regress (map fromIntegral t) (map log counts)
<br />> lambda = -m
<br />> a = exp c
<br />> t_tenth = -log (0.1/a)/lambda
<br />
<br /></pre>We can now go ahead and compute the mean and variance of our estimate:<br /><pre>*Main> mean t_tenth
<br />5.98036172868899
<br />*Main> var t_tenth
<br />0.15583537298560224
<br /></pre>The correct time is about 5.991 so the regression method above is biased by about 0.01. If we repeated the same experiment over and over again and averaged the estimates we got from logarithmic regression the process would not converge to the correct result. In fact, we can compute "ground truth" by simulating the experiment a million times in Octave and estimate the mean and variance from that. The code is in the appendix. Obviously this is a much slower process but it clearly demonstrates the bias of using regression this way.<br /><pre>GNU Octave, version 3.4.0
<br />Copyright (C) 2011 John W. Eaton and others.
<br />ans = 5.9798
<br />ans = 0.15948
<br /></pre><br /><b>Final thoughts</b><br />This is yet another example of extending automatic differentiation. We have variants for single variable differentiation, multivariate differentiation, multiple differentiation, <a href="http://blog.sigfpe.com/2010/07/automatic-divided-differences.html">divided differences</a>, splitting a function into <a href="http://blog.sigfpe.com/2010/09/automatic-evenodd-splitting.html">odd and even parts</a> and now automatic error propagation.<br /><br />This stuff was very loosely inspired by reading <a href="http://jhupbooks.press.jhu.edu/ecom/MasterServlet/GetItemDetailsHandler?iN=9780801868672&qty=1&viewMode=3">An Introduction to Stochastic Processes in Physics</a>. I'm attempting to capture the semi-formal rules used in that book to reason about differentials and you can think of the algebra above as representing stochastic differentials. I made a guess that the algebra is called the <a href="http://en.wikipedia.org/wiki/It%C5%8D_calculus">Itō</a> algebra. Sure enough, you'll get a few <a href="http://www.google.com/search?q=%22ito+algebra%22">hits</a>.<br /><br />The most similar published work I can find is <a href="https://uhra.herts.ac.uk/dspace/bitstream/2299/3600/3/902183.pdf">Automatic Propagation of Uncertainties</a> but it seems to just use ordinary AD.<br /><br />This technique may be useful for <a href="http://en.wikipedia.org/wiki/Extended_Kalman_filter">Extended Kalman Filtering</a>.<br /><br />I haven't done the work to make precise statements about how accurate you can expect my estimates of expectations to be. <br /><br />It's possible to implement a monad with syntax similar to other probability monads by using state to bump up the <tt>i</tt> in <tt>w i</tt> each time you generate a new random variable. But bear in mind, these are always intended to be used as *infinitesimal* random variables.<br /><br /><b>Appendix: Octave code</b><br /><pre>m = 5;
<br />n = 1000000;
<br />
<br />x = repmat([0:m-1]',1,n);
<br />y = repmat([2*exp(-0.5*[0:m-1]')],1,n)+0.05*normrnd(0,1,m,n);
<br />
<br />sx = sum(x);
<br />sxx = sum(x.*x);
<br />p = sum(log(y));
<br />q = sum(x.*log(y));
<br />
<br />s = 1./(sx.*sx-m*sxx);
<br />m = s.*sx.*p-m*s.*q; # Redefined
<br />c = -s.*sxx.*p+s.*sx.*q;
<br />
<br />lambda = -m;
<br />a = exp(c);
<br />x_tenth = -log(0.1./a)./lambda;
<br />
<br />mean(x_tenth)
<br />var(x_tenth)
<br /></pre>Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com3tag:blogger.com,1999:blog-11295132.post-8659723177764434122011-07-23T14:39:00.000-07:002011-07-23T16:28:04.960-07:00Profunctors in Haskell<pre>> {-# LANGUAGE TypeSynonymInstances, RankNTypes, ExistentialQuantification #-}<br /><br /></pre><b>Introduction</b><br />When I wrote about <a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html">coends</a> a while back I made up a term 'difunctor'. More recently it was pointed out to me that the correct word for this concept is 'profunctor', but unfortunately my knowledge came from <a href="http://books.google.com/books/about/Categories_for_the_working_mathematician.html?id=eBvhyc4z8HQC">MacLane</a> which mentions that word nowhere.<br /><br />Profunctors are ubiquitous in Haskell programming. Probably the most natural definition of Hughes' Arrows is via profunctors. Profunctors also play a role a little like tensors leading to a use of the terms 'covariant' and 'contravariant' that looks remarkably like the way those terms are used in tensor calculus.<br /><br />For categories C and D, a profunctor is a functor D<sup>op</sup>×C→Set and is written C↛D. (I hope that arrow between C and D is in your font. It's missing on iOS.)<br /><br />I'll reuse my Haskell approximation to that definition:<br /><br /><pre>> class Profunctor h where<br />>     lmap :: (d' -> d) -> h d c -> h d' c<br />>     rmap :: (c -> c') -> h d c -> h d c'<br /><br /></pre>We need cofunctoriality for the first argument and functoriality for the second:<br /><br /><pre>lmap (f . g) == lmap g . lmap f<br />rmap (f . g) == rmap f . rmap g<br /></pre><br />(Strictly we probably ought to call these 'endoprofunctors' as we're only really dealing with the category of Haskell types and functions.)<br /><br />There are lots of analogies for thinking about profunctors. 
For example, some people think of them as generalising functors in the same way that relations generalise functions. More specifically, given a function f:A→B, f associates to each element of A, a single element of B. But if we want f to associate elements of A with elements of B more freely, for example 'mapping' elements of A to multiple elements of B then we instead use a relation which can be written as a function f:A×B→{0,1} where we say xfy iff f(x,y)=1. In this case, profunctors map to Set rather than {0,1}.<br /><br />A good example is the type constructor <tt>(->)</tt><br /><br /><pre>> instance Profunctor (->) where<br />>     lmap f g = g . f<br />>     rmap f g = f . g<br /><br /></pre>It's common that the first argument of a profunctor describes how an element related to a type is sucked in, and the second describes what is spit out. <tt>a -> b</tt> sucks in an <tt>a</tt> and spits out a <tt>b</tt>.<br /><br />Given a function f we can turn it into a relation by saying that xfy iff y=f(x). Similarly we can turn a functor into a profunctor. Given a functor F:C→D we can define a profunctor F<sup>*</sup>:C↛D by<br /><br /><pre>> data UpStar f d c = UpStar (d -> f c)<br />> instance Functor f => Profunctor (UpStar f) where<br />>     lmap k (UpStar f) = UpStar (f . k)<br />>     rmap k (UpStar f) = UpStar (fmap k . f)<br /><br /></pre>You may be able to see how the second argument to a profunctor sort of plays a similar role to the return value of a functor, just as the second argument to a relation sometimes plays a role similar to the return value of a function.<br /><br />There is also an opposing way to make a profunctor from a functor just as there is with functions and relations:<br /><br /><pre>> data DownStar f d c = DownStar (f d -> c)<br />> instance Functor f => Profunctor (DownStar f) where<br />>     lmap k (DownStar f) = DownStar (f . fmap k)<br />>     rmap k (DownStar f) = DownStar (k . 
f)<br /><br /></pre>Note that the identity functor gives us something isomorphic to <tt>(->)</tt> whether you use <tt>UpStar</tt> or <tt>DownStar</tt>.<br /><br /><b>Dinatural transformations</b><br />Just as we have natural transformations between functors, we have dinatural transformations between profunctors. My <a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html">previous definition</a> of dinatural was specialised to a particular case - dinaturals between a profunctor and the constant profunctor.<br /><br />Firstly, let's think about natural transformations. If F and G are functors, and h is a natural transformation h:F⇒G, then we have that<br /><pre>h . fmap f = fmap f . h<br /></pre>If we think of F and G as containers, then this rule says that a natural transformation relates the structures of the containers, not the contents. So using f to replace the elements with other elements should be invisible to h and hence commute with it.<br /><br />Something similar happens with dinatural transformations. But this time, instead of relating the argument to a natural transformation to its return result, it instead relates the two arguments to a profunctor.<br /><br />Given two profunctors, F and G, a dinatural transformation is a polymorphic function of type:<br /><br /><pre>> type Dinatural f g = forall a. f a a -> g a a<br /><br /></pre>but we also want something analogous to the case of natural transformations. We want to express the fact that if <tt>phi :: Dinatural F G</tt>, then <tt>phi</tt> doesn't see the elements of <tt>F a a</tt> or <tt>G a a</tt>. Here's a way to achieve this. Suppose we have a dinatural transformation:<br /><pre>phi :: Dinatural F G<br /></pre>and a function <tt>f :: X -> X'</tt> then we can use <tt>lmap</tt> and <tt>rmap</tt> to apply <tt>f</tt> on the left or right of <tt>F</tt> and <tt>G</tt>. The definition of dinaturals demands that:<br /><pre>rmap f . phi . lmap f = lmap f . phi . rmap f<br /></pre>ie. 
that we can apply <tt>f</tt> on the left before applying <tt>phi</tt>, and then do <tt>f</tt> on the right, or vice versa, and still get the same result.<br /><br />I'm not sure but I think that we don't need to check this condition and that just like the case of naturals it just comes as a <a href="http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf">free theorem</a>.<br /><br /><b>Composing profunctors</b><br />It's easy to see how to compose functors. A functor is a polymorphic function from one type to another. It's not straightforward to compose profunctors. It's tempting to say that a profunctor maps a pair of types to a type so they can be composed like functions. But the original definition says a profunctor is a functor D<sup>op</sup>×C→Set. So as a function it doesn't map back to the category but to Set. For Haskell we replace Set with Hask, the category of Haskell functions and types. So we have Hask<sup>op</sup>×Hask→Hask. It's easy to invent a scheme to compose these because Hask appears 3 times. But it'd be wrong to exploit this in a general definition applying to many categories because in the proper definition of profunctor we can't assume that a profunctor maps back to the spaces you started with.<br /><br />We can try composing profunctors by analogy with composing relations. Suppose R and S are relations. If T=S○R is the composition of R and S then xTz if and only if there exists a y such that xRy and ySz. If our relations are on finite sets then we can define T(x,z) = Σ<sub>y</sub>R(x,y)S(y,z) where we work in the semiring on {0,1} with 0+0=0, 0+1=1+0=1+1=1 but with the usual product.<br /><br />There is an analogue of "there exists" in Haskell - the existential type. Remembering that we write Haskell existential types using <tt>forall</tt> we can define:<br /><br /><pre>> data Compose f g d c = forall a. Compose (f d a) (g a c)<br /><br /></pre>As mentioned above, functors give rise to profunctors. 
It'd be good if composition of functors were compatible with composition of profunctors. So consider<br /><pre>Compose (UpStar F) (UpStar G)<br /></pre>for some <tt>F</tt> and <tt>G</tt>. This is essentially the same as<br /><pre>exists a. (d -> F a, a -> G c)<br /></pre>What can we discover about an element of such a type? It consists of a pair of functions <tt>(f, g)</tt>, but we can't ever extract the individual functions because the type <tt>a</tt> has been erased. To get anything meaningful out of <tt>g</tt> we need to apply it to an <tt>a</tt>, but we don't have one immediately to hand, after all, we can't even know what <tt>a</tt> is. But we do have an <tt>F a</tt> if we can make a <tt>d</tt>. So we can use <tt>fmap</tt> to apply <tt>g</tt> to the result of <tt>f</tt>. So we can construct <tt>fmap g . f :: d -> F (G c)</tt>. There is no other information we can obtain. So the composition is isomorphic to <tt>UpStar</tt> of the functorial composition of <tt>F</tt> and <tt>G</tt>. Again, we can probably make this a rigorous proof by making use of free theorems, but I haven't figured that out yet.<br /><br />But there's a catch: I said I wanted a definition that applies to more categories than just Hask. Well we can replace <tt>exists a</tt> with the <a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html">coend</a> operator. We also implicitly used the product operation in the constructor <tt>Compose</tt> so this definition will work in categories with suitable products. <a href="http://ncatlab.org/nlab/show/symmetric+monoidal+category">Symmetric monoidal categories</a> in fact.<br /><br />Under composition of profunctors, <tt>(->)</tt> is the identity. At least up to isomorphism. This composition of profunctors is also associative up to isomorphism. Unfortunately the "up to isomorphism" means that we can't make a category out of profunctors in the obvious way. 
But we can make a <a href="http://ncatlab.org/nlab/show/bicategory">bicategory</a> - essentially a category where we have to explicitly track the isomorphisms between things that are equal in ordinary categories.<br /><br /><b>Profunctors as tensors</b><br />Given a profunctor <tt>F</tt> we can write <tt>F i j</tt> suggestively as F<sub>i</sub><sup>j</sup>. Let's write the composition of <tt>F</tt> and <tt>G</tt> as ∃k. F<sub>i</sub><sup>k</sup> G<sub>k</sub><sup>j</sup>. We can use the <a href="http://en.wikipedia.org/wiki/Einstein_notation">Einstein summation convention</a> to automatically 'contract' on pairs of upper and lower indices and write the composition as F<sub>i</sub><sup>k</sup> G<sub>k</sub><sup>j</sup>. The analogy is even more intriguing when we remember that tensor indices also come in covariant and contravariant kinds, although the usual tensor convention is the reverse of the functorial one: there the lower indices are called covariant and the upper ones contravariant. In the case of profunctors, the lower index is the argument that acts like the argument to a contravariant functor and the upper index is the one that acts like the argument to a covariant functor. Note also that because <tt>(->)</tt> is essentially the identity, we have →<sub>i</sub><sup>j</sup>F<sub>j</sub><sup>k</sup>=F<sub>i</sub><sup>k</sup>. So <tt>(->)</tt> acts like the <a href="http://en.wikipedia.org/wiki/Kronecker_delta">Kronecker delta</a>. You can read more about this at <a href="http://mathoverflow.net/questions/59892/co-ends-as-a-trace-operation-on-profunctors">mathoverflow</a> where it is hinted that this analogy is not yet well understood. Note that we're naturally led to the trace of a profunctor: <tt>exists a. F a a</tt>.<br /><br /><b>Arrows as profunctors</b><br />The last thing I want to mention is that <a href="http://www.haskell.org/arrows/">Hughes' Arrows</a> are profunctors. There is an intuition that fits. If <tt>A</tt> is an Arrow, we often think of <tt>A d c</tt> as consuming something related to type <tt>d</tt> and emitting something related to type <tt>c</tt>. The same goes for profunctors. 
The full paper explaining this is Asada and Hasuo's <a href="http://www-mmm.is.s.u-tokyo.ac.jp/~ichiro/papers/fromComptoComp.pdf"> Categorifying Computations into Components via Arrows as Profunctors</a> with the profunctorial definition of Arrows given as Definition 3.2 (though that definition also appears in some earlier papers.)Dan Piponihttps://plus.google.com/107913314994758123748noreply@blogger.com20