<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-11295132</id><updated>2012-01-26T06:46:53.316-08:00</updated><category term='monad'/><category term='mathematics'/><category term='physics'/><category term='optimisation'/><category term='astronomy'/><category term='self-reference'/><category term='probability'/><category term='comonads'/><category term='haskell'/><category term='types'/><category term='programming'/><category term='quantum'/><title type='text'>A Neighborhood of Infinity</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default?start-index=101&amp;max-results=100'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>273</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-11295132.post-7973803066138768941</id><published>2012-01-21T13:38:00.000-08:00</published><updated>2012-01-21T16:34:54.159-08:00</updated><title type='text'>Some parallels between classical and quantum mechanics</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;This isn't really a blog post. More of something I wanted to interject in a discussion on Google plus but wouldn't fit in the text box.&lt;br /&gt;&lt;br /&gt;I've always had trouble with the way the &lt;a href="http://en.wikipedia.org/wiki/Legendre_transformation"&gt;Legendre transform&lt;/a&gt; is introduced in classical mechanics. I know I'm not the only one. Many mathematicians and physicists have recognised that it seems to be plucked out of a hat like a rabbit and have even written papers to address this issue. But however much an author attempts to make it seem natural, it still looks like a rabbit to me.&lt;br /&gt;&lt;br /&gt;So I have to ask myself, what would make me feel comfortable with the Legendre transform?&lt;br /&gt;&lt;br /&gt;The Legendre transform is an analogue of the Fourier transform that uses a different semiring to the usual. I wrote &lt;a href="http://blog.sigfpe.com/2005/10/quantum-mechanics-and-fourier-legendre.html"&gt;briefly&lt;/a&gt; about this many years ago. So if we could write classical mechanics in a form that is analogous to another problem where I'd use a Fourier transform, I'd be happier. This is my attempt to do that.&lt;br /&gt;&lt;br /&gt;When I wrote about &lt;a href="http://blog.sigfpe.com/2011/06/another-elementary-way-to-approach.html"&gt;Fourier transforms&lt;/a&gt; a little while back the intention was to immediately follow it with an analogous article about Legendre transforms. 
Unfortunately that's been postponed so I'm going to just assume you know that Legendre transforms can be used to compute &lt;a href="http://en.wikipedia.org/wiki/Legendre_transformation#Infimal_convolution"&gt;inf-convolutions&lt;/a&gt;. I'll state clearly what that means below, but I won't show any detail on the analogy with Fourier transforms.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Free classical particles&lt;/b&gt;&lt;br /&gt;Let's work in one dimension with a particle of mass &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=m" style="vertical-align:middle"&gt; whose position at time &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t" style="vertical-align:middle"&gt; is &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%28t%29" style="vertical-align:middle"&gt;. The kinetic energy of this particle is given by &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=T%3d%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2" style="vertical-align:middle"&gt;. 
In a potential &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V%28x%29" style="vertical-align:middle"&gt;, its Lagrangian is therefore &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=L%3d%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29" style="vertical-align:middle"&gt;.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://en.wikipedia.org/wiki/Action_%28physics%29"&gt;action&lt;/a&gt; of our particle for the time from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; is therefore&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cint%5f%7bt%5f0%7d%5e%7bt%5f1%7d%28%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29%29dt%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;The particle motion is that which &lt;a href="http://en.wikipedia.org/wiki/Principle_of_least_action"&gt;minimises&lt;/a&gt; the action.&lt;br /&gt;&lt;br /&gt;Suppose the position of the particle at time &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; is &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f0" style="vertical-align:middle"&gt; and the position at time &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; is &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f1" style="vertical-align:middle"&gt;. Then write &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29" style="vertical-align:middle"&gt; for the action of the action-minimising path from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f1" style="vertical-align:middle"&gt;. 
So&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29+%3d+%5cmin%5fx%5cint%5f%7bt%5f0%7d%5e%7bt%5f1%7d%28%5cfrac%7b1%7d%7b2%7dm%5cdot%7bx%7d%5e2-V%28x%29%29dt%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;where we're minimising over all paths &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x" style="vertical-align:middle"&gt; such that &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%28t%5fi%29%3dx%5fi" style="vertical-align:middle"&gt;.&lt;br /&gt;&lt;br /&gt;Now suppose our system evolves from time &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt;. We can consider this to be two stages, one from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; followed by one from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt;. Let &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi" style="vertical-align:middle"&gt; be the minimised action analogous to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi" style="vertical-align:middle"&gt; for the period &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt;. 
The action from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt; is the sum of the actions for the two subperiods. So the minimum total action for the period &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f0" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt; is given by&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cpsi%28t%5f0%2cx%5f0%3bt%5f2%2cx%5f2%29+%3d+%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f0%2cx%5f0%3bt%5f1%2cx%5f1%29%2b%5cpsi%28t%5f1%2cx%5f1%3bt%5f2%2cx%5f2%29%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Let me simplify that a little. I'll use &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%28t%2cx%29" style="vertical-align:middle"&gt; where I previously used &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%28t%5f0%2cx%5f0%3bt%2cx%29" style="vertical-align:middle"&gt; and &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi%28x%5f1%2cx%5f2%29" style="vertical-align:middle"&gt; for &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%28t%5f1%2cx%5f1%3bt%5f2%2cx%5f2%29" style="vertical-align:middle"&gt;. 
So that last equation becomes:&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29%3d%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f1%2cx%5f1%29%2b%5cphi%28x%5f1%2cx%5f2%29%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Now suppose &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi" style="vertical-align:middle"&gt; is translation-invariant in the sense that &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi%28x%2bs%2cx%27%2bs%29%3d%5cphi%28x%2cx%27%29" style="vertical-align:middle"&gt;. Then we can write &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi%28x%5f1%2cx%5f2%29%3d%5cphi%28x%5f2-x%5f1%29" style="vertical-align:middle"&gt;, and the minimum total action is given by&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29%3d%5cmin%5f%7bx%5f1%7d%28%5cpsi%28t%5f1%2cx%5f1%29%2b%5cphi%28x%5f2-x%5f1%29%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Infimal convolution is defined by&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%28f%5codot+g%29%28x%29+%3d+%5cmin%5fs%28f%28s%29%2bg%28x-s%29%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;so the minimum we seek is&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cpsi%28t%5f2%2cx%5f2%29+%3d+%28%5cpsi%28t%5f1%2c%5ccdot%29%5codot%5cphi%29%28x%5f2%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;So now it's natural to use the Legendre transform. 
We have the inf-convolution theorem:&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%28f%5codot+g%29%5e%5cast%3df%5e%5cast%2bg%5e%5cast" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;where &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=f%5e%5cast" style="vertical-align:middle"&gt; is the Legendre transform of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=f" style="vertical-align:middle"&gt; given by&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7bf%5e%5cast%28p%29+%3d+%5csup%5fx%28px-f%28x%29%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;and so &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%5e%5cast%28t%5f2%2cp%29+%3d+%5cpsi%5e%5cast%28t%5f1%2cp%29%2b%5cphi%5e%5cast%28p%29" style="vertical-align:middle"&gt; (where we use &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cast" style="vertical-align:middle"&gt; to represent Legendre transform with respect to the spatial variable).&lt;br /&gt;&lt;br /&gt;Let's consider the case where from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; onwards the particle motion is free, so &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V%3d0" style="vertical-align:middle"&gt;. In this case we clearly have translation-invariance and so the time evolution is given by repeated inf-convolution with &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi" style="vertical-align:middle"&gt; and in the "Legendre domain" this is nothing other than repeated addition of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi%5e%5cast" style="vertical-align:middle"&gt;.&lt;br /&gt;&lt;br /&gt;Let's take a look at &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi" style="vertical-align:middle"&gt;. 
We know that if a particle travels freely from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f1" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=x%5f2" style="vertical-align:middle"&gt; over the period from &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f2" style="vertical-align:middle"&gt; then it must have followed the minimum action path and we know, from basic mechanics, this is the path with constant velocity. So&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7bT+%3d+%5cfrac%7b1%7d%7b2%7dm%28x%5f2-x%5f1%29%5e2%2f%28t%5f2-t%5f1%29%5e2%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;and hence the action is given by&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cphi%28s%29+%3d+%5cfrac%7b1%7d%7b2%7dms%5e2%2f%28t%5f2-t%5f1%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;So the time evolution of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi" style="vertical-align:middle"&gt; is given by repeated inf-convolution with a quadratic function. The time evolution of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cpsi%5e%5cast" style="vertical-align:middle"&gt; is therefore given by repeated addition of the Legendre transform of a quadratic function. It's not hard to prove that the Legendre transform of a quadratic function is also quadratic. 
In fact, the supremum of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=ps-%5cphi%28s%29" style="vertical-align:middle"&gt; is attained at &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=s%3dp%28t%5f2-t%5f1%29%2fm" style="vertical-align:middle"&gt;, giving:&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5cphi%5e%5cast%28p%29+%3d+%5cfrac%7bp%5e2%28t%5f2-t%5f1%29%7d%7b2m%7d%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;Addition is easier to work with than inf-convolution so if we wish to understand the time evolution of the action function it's natural to work with this Legendre transformed function.&lt;br /&gt;&lt;br /&gt;So that's it for classical mechanics in this post. I've tried to look at the evolution of a classical system in a way that makes the Legendre transform natural.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Free quantum particles&lt;/b&gt;&lt;br /&gt;Now I want to take a look at the evolution of a free quantum particle to show how similar it is to what I wrote above. In this case we have the Schr&amp;ouml;dinger equation&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5cpsi%28t%2cx%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%5cfrac%7b%5cpartial%5e2%7d%7b%5cpartial+x%5e2%7d%5cpsi%28t%2cx%29%2bV%5cpsi%28t%2cx%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;Let's suppose that from time &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=t%5f1" style="vertical-align:middle"&gt; onwards the particle is free so &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V%3d0" style="vertical-align:middle"&gt;. Then we have&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5cpsi%28t%2cx%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%5cfrac%7b%5cpartial%5e2%7d%7b%5cpartial+x%5e2%7d%5cpsi%28t%2cx%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;Now let's take the Fourier transform in the spatial variable. 
We get:&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7bi%5chbar%5cfrac%7b%5cpartial%7d%7b%5cpartial+t%7d%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+-%5cfrac%7b%5chbar%5e2%7d%7b2m%7d%28ik%29%5e2%5chat%7b%5cpsi%7d%28t%2ck%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;So&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+%5cexp%28-%5cfrac%7bi%5chbar+k%5e2%28t-t%5f1%29%7d%7b2m%7d%29%5chat%7b%5cpsi%7d%28t%5f1%2ck%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;We can write this as&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cdisplaystyle%7b%5chat%7b%5cpsi%7d%28t%2ck%29+%3d+%5chat%7b%5cphi%7d%28k%29%5chat%7b%5cpsi%7d%28t%5f1%2ck%29%7d" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;where, up to a normalising constant,&lt;br /&gt;&lt;blockquote&gt;&lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5cphi%28x%29+%5cpropto+%5cexp%28%5cfrac%7bimx%5e2%7d%7b2%5chbar%28t-t%5f1%29%7d%29" style="vertical-align:middle"&gt;&lt;br /&gt;&lt;/blockquote&gt;So the time evolution of the free quantum particle is given by repeated convolution with a (complex) Gaussian function, which in the Fourier domain is repeated multiplication by a Gaussian. The classical section above is nothing but a &lt;a href="http://ncatlab.org/nlab/show/tropical+semiring"&gt;tropical&lt;/a&gt; version of this section.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;I doubt I've said anything original here. 
Classical mechanics is well known to be the limit of quantum mechanics as &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%5chbar%5crightarrow+0" style="vertical-align:middle"&gt; and it's well known that in this limit we find that occurrences of the semiring &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%28%5cmathbb%7bR%7d%2c%2b%2c%5ctimes%29" style="vertical-align:middle"&gt; are replaced by the semiring &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=%28%5cmathbb%7bR%7d%2c%5cmin%2c%2b%29" style="vertical-align:middle"&gt;. But I've never seen an article that attempts to describe classical mechanics in terms of repeated inf-convolution even though this is close to Hamilton's formulation and I've never seen an article that shows the parallel with the Schr&amp;ouml;dinger equation in this way. I'm hoping someone will now be able to say to me "I've seen that before" and post a relevant link below.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Note&lt;/b&gt;&lt;br /&gt;I'm not sure how the above applies for a non-trivial potential &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V" style="vertical-align:middle"&gt;. I wrote this little Schr&amp;ouml;dinger &lt;a href="http://homepage.mac.com/sigfpe/Harmonic/anharmonic.html"&gt;equation solver&lt;/a&gt; a while back. As might be expected, it's inconvenient to use the Fourier domain to deal with the part of the evolution due to &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V" style="vertical-align:middle"&gt;. 
In order to simulate a time step of &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=dt" style="vertical-align:middle"&gt; the code simulates &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=dt%2f2" style="vertical-align:middle"&gt; in the Fourier domain assuming the particle is free and then spends &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=dt%2f2" style="vertical-align:middle"&gt; solving for the &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V" style="vertical-align:middle"&gt;-dependent part in the spatial domain. So even in the presence of non-trivial &lt;img src="https://chart.googleapis.com/chart?cht=tx&amp;chl=V" style="vertical-align:middle"&gt; it can still be useful to work with a Fourier transform. Almost the same iteration could be used to numerically compute the action for the classical case.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-7973803066138768941?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7973803066138768941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=7973803066138768941' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7973803066138768941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7973803066138768941'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2012/01/some-parallels-between-classical-and.html' title='Some parallels between classical and quantum mechanics'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2543189541743588733</id><published>2012-01-07T09:40:00.000-08:00</published><updated>2012-01-07T10:02:25.642-08:00</updated><title type='text'>Lossless decompression and the generation of random samples</title><content type='html'>Let S be some finite set with a probability distribution on it. Here's a diagram showing some example probabilities for the set S={A, B, C, D, E}:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-JxfDOQAMAas/Twh1-a1qfYI/AAAAAAAAA58/xm22ZkMqWd8/s1600/pdf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-JxfDOQAMAas/Twh1-a1qfYI/AAAAAAAAA58/xm22ZkMqWd8/s1600/pdf.png" /&gt;&lt;/a&gt;&lt;/div&gt;How can we generate lots of random samples from this distribution? A popular method involves first storing the cumulative distribution function in a table like so:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-QCCqlK9BR1E/Twh2u2f8y2I/AAAAAAAAA6E/j5sYagwcj1s/s1600/cdf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-QCCqlK9BR1E/Twh2u2f8y2I/AAAAAAAAA6E/j5sYagwcj1s/s1600/cdf.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;We then use any of a number of popular methods to generate uniform pseudorandom numbers in the range [0,1) and for each one walk through the table until we find the first entry greater than our pseudorandom number. The symbol above the number is the one we generate. So if we generated 0.512, we'd pick symbol C. 
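&lt;br /&gt;&lt;br /&gt;As a minimal sketch of this table walk in Haskell: the probabilities below are hypothetical, chosen only to agree with the prose (the real values are in the diagrams), and &lt;tt&gt;pdf&lt;/tt&gt;, &lt;tt&gt;cdf&lt;/tt&gt; and &lt;tt&gt;sample&lt;/tt&gt; are illustrative names.&lt;br /&gt;&lt;pre&gt;import Data.List (find)

-- Hypothetical probabilities; the real ones are only given in the diagram.
pdf :: [(Char, Double)]
pdf = [('A', 0.10), ('B', 0.20), ('C', 0.30), ('D', 0.25), ('E', 0.15)]

-- Running sums of the probabilities: the cumulative distribution table.
cdf :: [(Char, Double)]
cdf = zip (map fst pdf) (scanl1 (+) (map snd pdf))

-- Walk the table to the first entry greater than u, assuming u lies in [0,1).
sample :: Double -&amp;gt; Maybe Char
sample u = fmap fst (find (\(_, c) -&amp;gt; c &amp;gt; u) cdf)

main :: IO ()
main = print (sample 0.512)  -- prints Just 'C'
&lt;/pre&gt;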
It's straightforward to prove this gives the correct probability distribution.&lt;br /&gt;&lt;br /&gt;As described, this algorithm can be slow. If the size of the table is N we may have to walk through up to N entries to generate each sample.&lt;br /&gt;&lt;br /&gt;One approach to accelerating this algorithm is to quantise our pseudorandom number and use it to look up, in a precomputed table, a jump into our cumulative distribution table. I've used this several times in my visual effects career. But today I'm going to take a different approach.&lt;br /&gt;&lt;br /&gt;Another natural way to speed up sample generation is to use a binary search to find the appropriate point in the cumulative distribution; for example, the C++ standard template library's &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;upper_bound&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;&amp;nbsp;function will do the job.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;But what is a binary search algorithm going to do? Typically it's going to start by comparing our pseudorandom number with the midpoint of the table. If our number is bigger then it will recursively use binary search on the right (looking next at the midpoint of the right half), otherwise on the left, and so on. If we're generating many samples from the same distribution we're going to be repeatedly looking up the same midpoints in the table. At the end of the day, the process can be described by a decision tree. So we may as well throw away the table and build the decision tree up front. 
Here's what it might look like:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-5BoKnwdGeY0/Twh7_PlNU-I/AAAAAAAAA6Q/kesZroqEOXw/s1600/tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="231" src="http://1.bp.blogspot.com/-5BoKnwdGeY0/Twh7_PlNU-I/AAAAAAAAA6Q/kesZroqEOXw/s320/tree.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span style="font-family: inherit;"&gt;But maybe we can do better. For example C is three times as likely as A but they both take the same amount of time to generate as they both require walking down to depth 2. Meanwhile D has a probability of 0.25 and requires walking to depth 3. We'd like to rebalance the tree. Note also that there's no reason to list our original PDF in the order I gave. Different orderings might give different trees.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;It's straightforward to describe what we want to optimise. We want to place sample i at depth d&lt;sub&gt;i&lt;/sub&gt; so that the expected value of&amp;nbsp;&lt;/span&gt;d&lt;sub&gt;i&lt;/sub&gt;&lt;span style="font-family: inherit;"&gt;&amp;nbsp;is as small as possible. In other words we want to minimise&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;Σ&lt;/span&gt;&lt;span style="font-family: inherit;"&gt;p&lt;/span&gt;&lt;sub&gt;i&lt;/sub&gt;d&lt;sub&gt;i&lt;/sub&gt;&lt;span style="font-family: inherit;"&gt;. 
But this is precisely the problem solved by &lt;a href="http://en.wikipedia.org/wiki/Huffman_coding"&gt;Huffman coding&lt;/a&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;And that's the conclusion I wanted to reach: Huffman coding gives the optimal decision tree to generate random samples. It also tells us this interesting consequence: if we use a decision tree method then the expected number of decisions per sample is bounded below by the entropy of the &lt;a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)"&gt;probability distribution&lt;/a&gt;. I find this connection between entropy and algorithmic complexity pretty surprising.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;I learnt the above during my interview at Google!&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;Why is there this connection between a compression algorithm and the generation of random samples? It took me a little longer to realise why but it's quite straightforward.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.&lt;br /&gt;&lt;br /&gt;So when we're decompressing Huffman encoded data we have a machine that takes as input uniformly distributed bits and which outputs letters sampled from some probability distribution. But that's exactly the same problem that I posed above. 
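&lt;br /&gt;&lt;br /&gt;A minimal sketch of such a machine in Haskell, assuming a hypothetical code tree (not the one pictured above) and bits supplied as a list of &lt;tt&gt;Bool&lt;/tt&gt;; the names &lt;tt&gt;Tree&lt;/tt&gt;, &lt;tt&gt;step&lt;/tt&gt; and &lt;tt&gt;decode&lt;/tt&gt; are illustrative. Fed uniform random bits, a leaf at depth d comes out with probability 2&lt;sup&gt;-d&lt;/sup&gt;, so the match with the target distribution is exact only when every probability is a power of 1/2.&lt;br /&gt;&lt;pre&gt;-- A binary code tree: symbols at the leaves, one bit consumed per branch.
data Tree = Leaf Char | Node Tree Tree

-- Hypothetical tree, for illustration only.
tree :: Tree
tree = Node (Node (Leaf 'A') (Leaf 'B'))
            (Node (Leaf 'C') (Node (Leaf 'D') (Leaf 'E')))

-- Consume bits until a leaf is reached; return the symbol and the unused bits.
step :: Tree -&amp;gt; [Bool] -&amp;gt; Maybe (Char, [Bool])
step (Leaf c)   bits       = Just (c, bits)
step (Node l r) (b : bits) = step (if b then r else l) bits
step (Node _ _) []         = Nothing

-- Decode a whole bit stream, stopping when the bits run out mid-symbol.
decode :: [Bool] -&amp;gt; String
decode bits = case step tree bits of
  Just (c, rest) -&amp;gt; c : decode rest
  Nothing        -&amp;gt; []

main :: IO ()
main = putStrLn (decode [False, False, True, True, True, False])
&lt;/pre&gt;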
So, at least from some perspective, decompression is precisely the same thing as generating samples from a probability distribution.&lt;br /&gt;&lt;br /&gt;Or to put it another way: there is a class of algorithm whose purpose is to convert samples from one probability distribution into samples from another. When we use one of these algorithms to convert from samples from some distribution P that's easy to generate samples from, into samples from Q, we call this "an algorithm to sample from distribution Q". When we use one of these algorithms to convert from some distribution to a uniform distribution, and the function given by the algorithm is invertible, then we call it "lossless compression".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2543189541743588733?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2543189541743588733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=2543189541743588733' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2543189541743588733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2543189541743588733'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2012/01/lossless-decompression-and-generation.html' title='Lossless decompression and the generation of random samples'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-JxfDOQAMAas/Twh1-a1qfYI/AAAAAAAAA58/xm22ZkMqWd8/s72-c/pdf.png' height='72' width='72'/><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3402675764841726250</id><published>2011-10-30T13:19:00.000-07:00</published><updated>2011-10-30T13:19:44.199-07:00</updated><title type='text'>Quick and dirty reinversion of control</title><content type='html'>It's taken for granted by many people that Haskell and static types are incompatible with prototyping and quick-and-dirty hacks.&lt;br /&gt;&lt;br /&gt;I wanted to put together some OpenGL code that had a script for how a bunch of graphics should be displayed. It was essentially an imperative specification for my program. For quick and dirty hacks, GLUT is tolerable. But even when programming in C/C++, it's not supportive of programming in a straightforward imperative style because it uses inversion of control. Many graphics applications are written in a state machine style where the state machine gets to tick once each time there is an event callback. This really doesn't fit the imperative script style.&lt;br /&gt;&lt;br /&gt;But it is possible to reinvert inversion of control in any language that supports continuations. And that includes languages like Python that support linear continuations in the form of generators. But I'm using Haskell here.&lt;br /&gt;&lt;br /&gt;Continuations reify the remainder of a computation. Or in more down to earth language: they allow you to grab the stuff you're about to do as a function, put it on ice for a while, and then carry on doing it later. So imagine we had a block of imperative code and that we'd like, at each GLUT callback, to make some progress through this block. 
We can use continuations like this: each time we want to yield control back to the main loop we simply grab the remainder of our 'script' as a continuation and make it the callback to be executed next time GLUT is ready.&lt;br /&gt;&lt;br /&gt;The slight wrinkle is that OpenGL/GLUT calls use IO. To combine &lt;tt&gt;IO&lt;/tt&gt; and continuations we need the &lt;tt&gt;ContT&lt;/tt&gt; monad transformer.&lt;br /&gt;&lt;br /&gt;I'll do everything except the &lt;tt&gt;yield&lt;/tt&gt; function first and get back to that at the end.&lt;br /&gt;&lt;br /&gt;Some standard library imports:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Graphics.UI.GLUT&lt;br /&gt;&amp;gt; import Control.Monad.Cont&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some simple code to draw a line from left to right:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; display :: GLdouble -&amp;gt; IO ()&lt;br /&gt;&amp;gt; display y = do&lt;br /&gt;&amp;gt;   clear [ColorBuffer]&lt;br /&gt;&amp;gt;  &lt;br /&gt;&amp;gt;   renderPrimitive LineStrip $ do&lt;br /&gt;&amp;gt;     vertex (Vertex2 (-1) (-y))&lt;br /&gt;&amp;gt;     vertex (Vertex2 1 y)&lt;br /&gt;&lt;br /&gt;&amp;gt;   swapBuffers&lt;br /&gt;&amp;gt;   postRedisplay Nothing&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some standard OpenGL/GLUT setup:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;   (progname, _) &amp;lt;- getArgsAndInitialize&lt;br /&gt;&amp;gt;   initialWindowSize  $= Size 500 500&lt;br /&gt;&amp;gt;   initialDisplayMode $= [DoubleBuffered, RGBMode]&lt;br /&gt;&amp;gt;   createWindow "Bounce!"&lt;br /&gt;&lt;br /&gt;&amp;gt;   matrixMode $= Modelview 0&lt;br /&gt;&amp;gt;   loadIdentity&lt;br /&gt;&lt;br /&gt;&amp;gt;   matrixMode $= Projection&lt;br /&gt;&amp;gt;   loadIdentity&lt;br /&gt;&amp;gt;   ortho (-1) 1 (-1) 1 (-1) 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Our script is called before the main loop.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;   imperative&lt;br /&gt;&amp;gt;   mainLoop&lt;br /&gt;&lt;br 
/&gt;&lt;/pre&gt;And now comes the actual script. Apart from the &lt;tt&gt;liftIO&lt;/tt&gt; calls this should be almost as easy to read as BASIC programming from the days of yore:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; imperative = flip runContT return $ do&lt;br /&gt;&lt;br /&gt;&amp;gt;   liftIO $ print "Start!"&lt;br /&gt;&lt;br /&gt;&amp;gt;   forever $ do&lt;br /&gt;&lt;br /&gt;&amp;gt;     forM_ [-1, -0.992 .. 1.0] $ \y -&amp;gt; do&lt;br /&gt;&amp;gt;       render $ display y&lt;br /&gt;&amp;gt;       yield&lt;br /&gt;&lt;br /&gt;&amp;gt;     liftIO $ print "Bounce!"&lt;br /&gt;&lt;br /&gt;&amp;gt;     forM_ [-1, -0.992 .. 1.0] $ \y -&amp;gt; do&lt;br /&gt;&amp;gt;       render $ display (-y)&lt;br /&gt;&amp;gt;       yield&lt;br /&gt;&lt;br /&gt;&amp;gt;     liftIO $ print "Bounce!"&lt;br /&gt;&amp;gt;     yield&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The first thing to note is that &lt;tt&gt;render&lt;/tt&gt; doesn't actually do any rendering. At the end of the day we can't tell GLUT when to render, it only calls you. So instead &lt;tt&gt;render&lt;/tt&gt; tells GLUT what to do next time it's in the mood for a bit of rendering:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; render f = liftIO $ displayCallback $= f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;That leaves one thing to explain: &lt;tt&gt;yield&lt;/tt&gt;. It needs to grab the remainder of the script and package it up in a form suitable for installation as an idle callback. But there's a catch: continuations are notorious for making your head explode. If you're throwing together a quick and dirty hack, that's the last thing you need. Here's where static types come to the rescue. As &lt;a href="https://plus.google.com/115274377971493973150/posts/jMqzaxPEnrd"&gt;Conor McBride&lt;/a&gt; points out, we want to just do the easy thing and follow gravity downwards.&lt;br /&gt;&lt;br /&gt;So first we try to guess the type of &lt;tt&gt;yield&lt;/tt&gt;. 
We know we're working with the &lt;tt&gt;ContT IO&lt;/tt&gt; monad. So its type is going to be &lt;tt&gt;ContT IO a&lt;/tt&gt; for some &lt;tt&gt;a&lt;/tt&gt;. There's no particular type of data we want to get out of &lt;tt&gt;yield&lt;/tt&gt;, it's just a thing we want executed. So we can guess the type is &lt;tt&gt;ContT IO ()&lt;/tt&gt;, the type &lt;tt&gt;()&lt;/tt&gt; being the generic filler type when we don't actually have any data.&lt;br /&gt;&lt;br /&gt;Let's look at the definition of &lt;tt&gt;ContT&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;newtype ContT r m a = ContT {&lt;br /&gt;    runContT :: (a -&amp;gt; m r) -&amp;gt; m r&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The type &lt;tt&gt;r&lt;/tt&gt; is the final return type from our continuation. We're not interested in a return value, we just want to *do* stuff. So we expect &lt;tt&gt;r&lt;/tt&gt; to be &lt;tt&gt;()&lt;/tt&gt; as well.&lt;br /&gt;&lt;br /&gt;So &lt;tt&gt;yield&lt;/tt&gt; must essentially be of type &lt;tt&gt;(() -&amp;gt; IO ()) -&amp;gt; IO ()&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;So we want to concoct something of this type using GLUT's &lt;tt&gt;idleCallback&lt;/tt&gt; function. As &lt;tt&gt;yield&lt;/tt&gt; must take a function as argument it must look something like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;yield = ContT $ \f -&amp;gt; ...&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We know that &lt;tt&gt;f&lt;/tt&gt; is of type &lt;tt&gt;() -&amp;gt; IO ()&lt;/tt&gt;. So there's only one thing we can do with it: apply it to &lt;tt&gt;()&lt;/tt&gt;. That gives us something of type &lt;tt&gt;IO ()&lt;/tt&gt;. That's precisely the type of GLUT's &lt;tt&gt;idleCallback&lt;/tt&gt;. So we put it all together:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; yield = ContT $ \f -&amp;gt; idleCallback $= Just (f () )&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The code now works. I didn't have to spend even a moment thinking about the meaning of a continuation. 
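&lt;br /&gt;&lt;br /&gt;(To see the same trick without a window system, here is a minimal standalone sketch. It is only an illustration under assumptions: the &lt;tt&gt;IORef&lt;/tt&gt; called &lt;tt&gt;slot&lt;/tt&gt; is a hypothetical stand-in for GLUT's &lt;tt&gt;idleCallback&lt;/tt&gt;, &lt;tt&gt;pump&lt;/tt&gt; is a hypothetical stand-in for &lt;tt&gt;mainLoop&lt;/tt&gt;, and the names &lt;tt&gt;Slot&lt;/tt&gt;, &lt;tt&gt;yieldTo&lt;/tt&gt;, &lt;tt&gt;script&lt;/tt&gt; and &lt;tt&gt;pump&lt;/tt&gt; are made up for this example. It is written without bird tracks because it is a separate program, not part of the code above.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;import Control.Monad (forM_)&lt;br /&gt;import Control.Monad.IO.Class (liftIO)&lt;br /&gt;import Control.Monad.Cont (ContT(..))&lt;br /&gt;import Data.IORef&lt;br /&gt;&lt;br /&gt;-- Hypothetical stand-in for idleCallback: a slot holding the next action.&lt;br /&gt;type Slot = IORef (Maybe (IO ()))&lt;br /&gt;&lt;br /&gt;-- Same shape as yield: store the rest of the script in the slot.&lt;br /&gt;yieldTo :: Slot -&amp;gt; ContT () IO ()&lt;br /&gt;yieldTo slot = ContT $ \f -&amp;gt; writeIORef slot (Just (f ()))&lt;br /&gt;&lt;br /&gt;-- A three step script that gives up control after each step.&lt;br /&gt;script :: Slot -&amp;gt; ContT () IO ()&lt;br /&gt;script slot = forM_ [1 :: Int, 2, 3] $ \i -&amp;gt; do&lt;br /&gt;  liftIO $ putStrLn ("step " ++ show i)&lt;br /&gt;  yieldTo slot&lt;br /&gt;&lt;br /&gt;-- Hypothetical stand-in for mainLoop: run whatever was installed last.&lt;br /&gt;pump :: Slot -&amp;gt; IO ()&lt;br /&gt;pump slot = do&lt;br /&gt;  next &amp;lt;- readIORef slot&lt;br /&gt;  case next of&lt;br /&gt;    Nothing -&amp;gt; return ()&lt;br /&gt;    Just k  -&amp;gt; do&lt;br /&gt;      writeIORef slot Nothing&lt;br /&gt;      k&lt;br /&gt;      pump slot&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = do&lt;br /&gt;  slot &amp;lt;- newIORef Nothing&lt;br /&gt;  runContT (script slot) return&lt;br /&gt;  pump slot&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The call to &lt;tt&gt;runContT&lt;/tt&gt; runs the script up to its first &lt;tt&gt;yieldTo&lt;/tt&gt; and returns, and each pass of &lt;tt&gt;pump&lt;/tt&gt; then resumes it up to the next one, so the three steps should be printed in order.)&lt;br /&gt;&lt;br /&gt;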
Implementing &lt;tt&gt;yield&lt;/tt&gt; was about as hard as putting together a jigsaw puzzle with three pieces. There's only so many ways you can put the pieces together.&lt;br /&gt;&lt;br /&gt;And that's a simple example of why I often like to write quick-and-dirty code with a statically typed language.&lt;br /&gt;&lt;br /&gt;(Oh, and I'm not trying to take part in a war. I like to prototype in many different languages, some of which are dynamically typed.)&lt;br /&gt;&lt;br /&gt;PS Note also that the above code illustrates one way to avoid &lt;tt&gt;IORef&lt;/tt&gt;s in GLUT code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3402675764841726250?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3402675764841726250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=3402675764841726250' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3402675764841726250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3402675764841726250'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-of-control.html' title='Quick and dirty reinversion of control'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-742612324236798788</id><published>2011-08-13T08:12:00.000-07:00</published><updated>2011-08-13T13:46:43.205-07:00</updated><title type='text'>Computing errors with square roots of infinitesimals.</title><content type='html'>&lt;b&gt;Abstract&lt;/b&gt;&lt;br /&gt;Automatic differentiation (AD) gives a way to carry out &lt;a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty"&gt;uncertainty propagation&lt;/a&gt;. But used in the obvious way it leads to &lt;a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty#Caveats_and_warnings"&gt;bias&lt;/a&gt;. This article introduces "square roots of infinitesimals" that can be used to give more accurate results.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;In the real world measurements have errors and we often want to know how much our final answers are affected by those errors. One tool for measuring the sensitivity to errors of our results is calculus. In fact, we can use automatic differentiation to give us a nice way to model error. Here's an implementation:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Control.Monad.State&lt;br /&gt;&amp;gt; import Control.Applicative&lt;br /&gt;&amp;gt; import qualified Data.IntMap as I&lt;br /&gt;&lt;br /&gt;&amp;gt; infixl 6 .+.&lt;br /&gt;&amp;gt; infixl 7 .*&lt;br /&gt;&amp;gt; infixl 7 *.
&lt;br /&gt;&lt;br /&gt;&amp;gt; data D a = D a a deriving (Eq, Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num a =&amp;gt; Num (D a) where&lt;br /&gt;&amp;gt;   D x a+D x' a' = D (x + x') (a + a')&lt;br /&gt;&amp;gt;   D x a*D x' a' = D (x*x') (a*x' + x*a')&lt;br /&gt;&amp;gt;   negate (D x a) = D (negate x) (negate a)&lt;br /&gt;&amp;gt;   fromInteger n = D (fromInteger n) 0&lt;br /&gt;&amp;gt;   abs _ = error "No abs"&lt;br /&gt;&amp;gt;   signum _ = error "No signum"&lt;br /&gt;&lt;br /&gt;&amp;gt; d = D 0 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;As I've talked about &lt;a href="http://blog.sigfpe.com/2006/09/practical-synthetic-differential.html"&gt;before&lt;/a&gt;, the value &lt;tt&gt;d&lt;/tt&gt; can be thought of as an infinitesimal number whose square is zero. However, to first order we can replace &lt;tt&gt;d&lt;/tt&gt; with a small number and use it to compute errors. Here's a function to perform such a substitution:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; approx :: Num a =&amp;gt; D a -&amp;gt; a -&amp;gt; a&lt;br /&gt;&amp;gt; approx (D x d) e = x+d*e&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Suppose we have a square whose side we've measured as 1m to an accuracy of 1cm. We can represent this as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; sq_side = D 1 0.01&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now compute the area:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; sq_area = sq_side^2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We get &lt;tt&gt;D 1.0 2.0e-2&lt;/tt&gt;. We can interpret this as meaning the area is 1m&lt;sup&gt;2&lt;/sup&gt; with an accuracy of 0.02m&lt;sup&gt;2&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;We can make "to an accuracy of" more precise. Differentiation models a function locally as an approximate linear function. (1m+&amp;delta;)&lt;sup&gt;2&lt;/sup&gt;=1m&lt;sup&gt;2&lt;/sup&gt;+2m&amp;delta;+O(&amp;delta;&lt;sup&gt;2&lt;/sup&gt;). So at 1m, the function to compute area locally scales lengths by 2m. 
So if the length measurement is actually a sample from a distribution with given mean 1m and standard deviation 0.01m, the area is approximately a sample from a distribution with mean 1m&lt;sup&gt;2&lt;/sup&gt; and SD 0.02m&lt;sup&gt;2&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;This approach is nice, but sometimes we want a little more accuracy in our estimates. In particular, if you square a sample from a normal distribution with small variance and positive mean, then the nonlinearity of the squaring operation means that samples that are larger than the mean move further away from the mean, when squared, than samples less than the mean. So we should actually expect our area computations to be slightly &lt;a href="http://en.wikipedia.org/wiki/Propagation_of_uncertainty#Caveats_and_warnings"&gt;biased&lt;/a&gt; upwards from 1m&lt;sup&gt;2&lt;/sup&gt;. Unfortunately, this is a second order effect that isn't visible from looking only at first derivatives.&lt;br /&gt;&lt;br /&gt;That's not a problem, we can easily compute second derivatives using automatic differentiation. However, that can complicate things. What happens if we use multiple measurements to compute a quantity? Each one is a different sample from a different distribution and we don't want these measurements to be correlated. If we approach this in the obvious way, when we want to use n measurements we'll need to compute n&lt;sup&gt;2&lt;/sup&gt; partial second derivatives. However, by tweaking AD slightly we'll only need n derivatives.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Square roots of infinitesimals&lt;/b&gt;&lt;br /&gt;In addition to the usual infinitesimal d we want to introduce quantities, w_i, that represent independent random "noise" variables that are infinitesimal in size. We'll be interested in expectation values so we'll also need an expectation function, e. We want e(w_i)=0. 
But w&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;2&lt;/sup&gt; is always positive so its expectation is always greater than or equal to zero. We want our random variables to be infinitesimal so we pick e(w&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;2&lt;/sup&gt;)=d. We also want e(w&lt;sub&gt;i&lt;/sub&gt;w&lt;sub&gt;j&lt;/sub&gt;)=0 for i&amp;ne;j because of independence. If the w&lt;sub&gt;i&lt;/sub&gt; are already infinitesimal, the dw&lt;sub&gt;i&lt;/sub&gt; should be zero. So let's define an algebraic structure that captures these relationships. So we extend &lt;tt&gt;D a&lt;/tt&gt; by introducing &lt;tt&gt;w i&lt;/tt&gt; so that:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/--86TgXTLQtc/TkZ37tLf4TI/AAAAAAAAAwE/5DYb80Kunuw/s1600/algebra.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="186" src="http://1.bp.blogspot.com/--86TgXTLQtc/TkZ37tLf4TI/AAAAAAAAAwE/5DYb80Kunuw/s320/algebra.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Any element of this algebra can be written as x+ad+&amp;Sigma;b&lt;sub&gt;i&lt;/sub&gt;w&lt;sub&gt;i&lt;/sub&gt;. We represent b sparsely by using an &lt;tt&gt;IntMap&lt;/tt&gt;. Here's an implementation:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data S a = S a a (I.IntMap a) deriving (Eq, Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; (.+.) :: Num a =&amp;gt; I.IntMap a -&amp;gt; I.IntMap a -&amp;gt; I.IntMap a&lt;br /&gt;&amp;gt; ( *.) :: Num a =&amp;gt; a -&amp;gt; I.IntMap a -&amp;gt; I.IntMap a&lt;br /&gt;&amp;gt; (.*)  :: Num a =&amp;gt; I.IntMap a -&amp;gt; a -&amp;gt; I.IntMap a&lt;br /&gt;&amp;gt; (.*.) :: Num a =&amp;gt; I.IntMap a -&amp;gt; I.IntMap a -&amp;gt; a&lt;br /&gt;&lt;br /&gt;&amp;gt; (.+.) = I.unionWith (+)&lt;br /&gt;&amp;gt; a *. v = I.map (a *) v&lt;br /&gt;&amp;gt; v .* b = I.map (* b) v&lt;br /&gt;&amp;gt; a .*. 
b = I.fold (+) 0 $ I.intersectionWith (*) a b&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num a =&amp;gt; Num (S a) where&lt;br /&gt;&amp;gt;   S x a b+S x' a' b' = S (x + x') (a + a') (b .+. b')&lt;br /&gt;&amp;gt;   S x a b*S x' a' b' = S (x*x') (a*x' + x*a' + b.*.b') (x*.b' .+. b.*x')&lt;br /&gt;&amp;gt;   negate (S x a b) = S (negate x) (negate a) (I.map negate b)&lt;br /&gt;&amp;gt;   fromInteger n = S (fromInteger n) 0 I.empty&lt;br /&gt;&amp;gt;   abs _ = error "No abs"&lt;br /&gt;&amp;gt;   signum _ = error "No signum"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here are the individual &lt;tt&gt;w i&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; w :: Num a =&amp;gt; Int -&amp;gt; S a&lt;br /&gt;&amp;gt; w i = S 0 0 (I.fromList [(i, 1)])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We compute expectation values linearly by mapping the &lt;tt&gt;w i&lt;/tt&gt; to zero:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; e :: Num a =&amp;gt; S a -&amp;gt; D a&lt;br /&gt;&amp;gt; e (S x a _) = D x a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can also represent numbers whose values we know precisely:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; sure x = S x 0 I.empty&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Example&lt;/b&gt;&lt;br /&gt;Let's revisit the area example. This time we can represent the length of the side of our square as&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; sq_side' = 1+0.01*w 0&lt;br /&gt;&amp;gt; sq_area' = sq_side'^2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We get &lt;tt&gt;S 1.0 1.0e-4 (fromList [(0,2.0e-2)])&lt;/tt&gt;. We can directly read off that we have a bias of 10&lt;sup&gt;-4&lt;/sup&gt;m&lt;sup&gt;2&lt;/sup&gt; which is 1cm&lt;sup&gt;2&lt;/sup&gt;. We can encapsulate this as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; mean f = approx (e f) 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can directly read off the variance from the element of the algebra. However, we can also compute the variance using &lt;tt&gt;mean&lt;/tt&gt;. 
It's just:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; var f = mean ((f-sure (mean f))^2)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Note that this gives a very slightly different result from the value you can read off directly from the &lt;tt&gt;S&lt;/tt&gt; object. It depends on whether we're measuring the deviation around the unbiased or biased mean. To the order we're considering here the difference is small. Here's &lt;tt&gt;var'&lt;/tt&gt; anyway:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; var' (S _ _ v) = I.fold (+) 0 $ I.map (\x -&amp;gt; x^2) v&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;)&lt;br /&gt;&lt;br /&gt;We can also define covariance:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; cov f g = mean ((f-sure (mean f))*(g-sure (mean g)))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;More functions&lt;/b&gt;&lt;br /&gt;We can now follow through just like with automatic differentiation to compute lots more functions. We use the fact that:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-RM7_BU5grdU/TkZ4Kxjz0oI/AAAAAAAAAwI/WIuld_bAnvE/s1600/taylor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="86" src="http://4.bp.blogspot.com/-RM7_BU5grdU/TkZ4Kxjz0oI/AAAAAAAAAwI/WIuld_bAnvE/s400/taylor.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional a =&amp;gt; Fractional (S a) where&lt;br /&gt;&amp;gt;   fromRational x = S (fromRational x) 0 I.empty&lt;br /&gt;&amp;gt;   recip (S x a b) = let r = recip x&lt;br /&gt;&amp;gt;                     in S r (-a*r*r+r*r*r*(b.*.b)) ((-r*r)*.b)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Floating a =&amp;gt; Floating (S a) where&lt;br /&gt;&amp;gt;   pi = sure pi&lt;br /&gt;&amp;gt;   sin (S x a b) = let s = sin x&lt;br /&gt;&amp;gt;                       c = cos x&lt;br /&gt;&amp;gt;                   in S s (a*c - s/2*(b.*.b)) (c*.b)&lt;br 
/&gt;&amp;gt;   cos (S x a b) = let s = sin x&lt;br /&gt;&amp;gt;                       c = cos x&lt;br /&gt;&amp;gt;                   in S c (-a*s - c/2*(b.*.b)) ((-s)*.b)&lt;br /&gt;&amp;gt;   exp (S x a b) = let e = exp x&lt;br /&gt;&amp;gt;                   in S e (a*e + e/2*(b.*.b)) (e*.b)&lt;br /&gt;&amp;gt;   sqrt (S x a b) = let s = sqrt x&lt;br /&gt;&amp;gt;                   in S s (a/(2*s)-1/(4*s*s*s)*(b.*.b)) (1/(2*s)*.b)&lt;br /&gt;&amp;gt;   log (S x a b) = let r = 1/x&lt;br /&gt;&amp;gt;                   in S (log x) (r*a-r*r/2*(b.*.b)) (r*.b)&lt;br /&gt;&amp;gt;   asin = undefined&lt;br /&gt;&amp;gt;   acos = undefined&lt;br /&gt;&amp;gt;   atan = undefined&lt;br /&gt;&amp;gt;   sinh = undefined&lt;br /&gt;&amp;gt;   cosh = undefined&lt;br /&gt;&amp;gt;   tanh = undefined&lt;br /&gt;&amp;gt;   asinh = undefined&lt;br /&gt;&amp;gt;   acosh = undefined&lt;br /&gt;&amp;gt;   atanh = undefined&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;A real example&lt;/b&gt;&lt;br /&gt;Let's make this effort worthwhile. We'll compute errors for a computation that uses the errors in a messy nonlinear way. Suppose we're in the lab measuring radioactive decay. We measure the geiger counter reading at times t = 0hr, 1hr, 2hr, 3hr, 4hr at which point we compute an estimate for when the decay will drop to one tenth of its original value. We'll assume the decay fits a model counts/sec = a exp(-&amp;lambda;t) and that the counts have an error with SD 0.05. 
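&lt;br /&gt;&lt;br /&gt;(As a point of reference, the noise-free target time can be worked out directly from the model. This is just a sketch under assumptions: it uses the values a=2 and &amp;lambda;=0.5 per hour that appear in the code below, it reads "one tenth" as a count rate of 0.1, which is what the &lt;tt&gt;t_tenth&lt;/tt&gt; definition further down computes, and the name &lt;tt&gt;tTarget&lt;/tt&gt; is made up for this example. Solving a exp(-&amp;lambda;t) = 0.1 for t gives t = log(a/0.1)/&amp;lambda;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;-- Time at which a * exp (-lambda * t) falls to the target rate.&lt;br /&gt;tTarget :: Double -&amp;gt; Double -&amp;gt; Double -&amp;gt; Double&lt;br /&gt;tTarget a lambda target = log (a / target) / lambda&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = print (tTarget 2 0.5 0.1)  -- roughly 5.99 hours&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Any estimate built from noisy counts can be compared against this value.)&lt;br /&gt;&lt;br /&gt;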
We're going to compute the error in the estimated time to hit one tenth radioactivity in the case when the decay constant &amp;lambda; is 0.5 per hour (a half life of about 83 minutes) and a=2:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; t = [0..4]&lt;br /&gt;&amp;gt; counts = map (\i-&amp;gt; 2*exp(-0.5*fromIntegral i)+0.05*w i) t&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/--xCHmeCt40M/TkZ7T_WTVxI/AAAAAAAAAwY/CdU21UOAOjA/s1600/conc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="290" src="http://2.bp.blogspot.com/--xCHmeCt40M/TkZ7T_WTVxI/AAAAAAAAAwY/CdU21UOAOjA/s400/conc.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We'll be fitting a curve using logarithmic regression so we'll need the following function. Given a pair of lists &lt;tt&gt;x&lt;/tt&gt; and &lt;tt&gt;y&lt;/tt&gt; it returns &lt;tt&gt;(m, c)&lt;/tt&gt; where y=mx+c is the standard least squares fit.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; regress :: Fractional a =&amp;gt; [a] -&amp;gt; [a] -&amp;gt; (a, a)&lt;br /&gt;&amp;gt; regress x y =&lt;br /&gt;&amp;gt;     let sx = sum x&lt;br /&gt;&amp;gt;         sy = sum y&lt;br /&gt;&amp;gt;         sxx = sum $ map (^2) x&lt;br /&gt;&amp;gt;         sxy = sum $ zipWith (*) x y&lt;br /&gt;&amp;gt;         n = fromIntegral (length x)&lt;br /&gt;&amp;gt;         s = 1/(sx*sx-n*sxx)&lt;br /&gt;&amp;gt;     in (s*sx*sy-s*n*sxy, -s*sxx*sy+s*sx*sxy)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Logarithmic regression:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; (m, c) = regress (map fromIntegral t) (map log counts)&lt;br /&gt;&amp;gt; lambda = -m&lt;br /&gt;&amp;gt; a = exp c&lt;br /&gt;&amp;gt; t_tenth = -log (0.1/a)/lambda&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now go ahead and compute the mean and variance of our estimate:&lt;br /&gt;&lt;pre&gt;*Main&gt; mean t_tenth &lt;br /&gt;5.98036172868899&lt;br /&gt;*Main&gt; var 
t_tenth&lt;br /&gt;0.15583537298560224&lt;br /&gt;&lt;/pre&gt;The correct time is about 5.991 so the regression method above is biased by about 0.01. If we repeated the same experiment over and over again and averaged the estimates we got from logarithmic regression the process would not converge to the correct result. In fact, we can compute "ground truth" by simulating the experiment a million times in Octave and estimate the mean and variance from that. The code is in the appendix. Obviously this is a much slower process but it clearly demonstrates the biasedness of using regression this way.&lt;br /&gt;&lt;pre&gt;GNU Octave, version 3.4.0&lt;br /&gt;Copyright (C) 2011 John W. Eaton and others.&lt;br /&gt;ans =  5.9798&lt;br /&gt;ans =  0.15948&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Final thoughts&lt;/b&gt;&lt;br /&gt;This is yet another example of extending automatic differentiation. We have variants for single variable differentiation, multivariate differentiation, multiple differentiation, &lt;a href="http://blog.sigfpe.com/2010/07/automatic-divided-differences.html"&gt;divided differences&lt;/a&gt;, splitting a function into &lt;a href="http://blog.sigfpe.com/2010/09/automatic-evenodd-splitting.html"&gt;odd and even parts&lt;/a&gt; and now automatic error propagation.&lt;br /&gt;&lt;br /&gt;This stuff was very loosely inspired by reading &lt;a href="http://jhupbooks.press.jhu.edu/ecom/MasterServlet/GetItemDetailsHandler?iN=9780801868672&amp;qty=1&amp;viewMode=3"&gt;An Introduction to Stochastic Processes in Physics&lt;/a&gt;. I'm attempting to capture the semi-formal rules used in that book to reason about differentials and you can think of the algebra above as representing stochastic differentials. I made a guess that the algebra is called the &lt;a href="http://en.wikipedia.org/wiki/It%C5%8D_calculus"&gt;Itō&lt;/a&gt; algebra. 
Sure enough, you'll get a few &lt;a href="http://www.google.com/search?q=%22ito+algebra%22"&gt;hits&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The most similar published work I can find is &lt;a href="https://uhra.herts.ac.uk/dspace/bitstream/2299/3600/3/902183.pdf"&gt;Automatic Propagation of Uncertainties&lt;/a&gt; but it seems to just use ordinary AD.&lt;br /&gt;&lt;br /&gt;This technique may be useful for &lt;a href="http://en.wikipedia.org/wiki/Extended_Kalman_filter"&gt;Extended Kalman Filtering&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I haven't done the work to make precise statements about how accurate you can expect my estimates of expectations to be. &lt;br /&gt;&lt;br /&gt;It's possible to implement a monad with syntax similar to other probability monads by using state to bump up the &lt;tt&gt;i&lt;/tt&gt; in &lt;tt&gt;w i&lt;/tt&gt; each time you generate a new random variable. But bear in mind, these are always intended to be used as *infinitesimal* random variables.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Appendix: Octave code&lt;/b&gt;&lt;br /&gt;&lt;pre&gt;m = 5;&lt;br /&gt;n = 1000000;&lt;br /&gt;&lt;br /&gt;x = repmat([0:m-1]',1,n);&lt;br /&gt;y = repmat([2*exp(-0.5*[0:m-1]')],1,n)+0.05*normrnd(0,1,m,n);&lt;br /&gt;&lt;br /&gt;sx = sum(x);&lt;br /&gt;sxx = sum(x.*x);&lt;br /&gt;p = sum(log(y));&lt;br /&gt;q = sum(x.*log(y));&lt;br /&gt;&lt;br /&gt;s = 1./(sx.*sx-m*sxx);&lt;br /&gt;m = s.*sx.*p-m*s.*q; # Redefined&lt;br /&gt;c = -s.*sxx.*p+s.*sx.*q;&lt;br /&gt;&lt;br /&gt;lambda = -m;&lt;br /&gt;a = exp(c);&lt;br /&gt;x_tenth = -log(0.1./a)./lambda;&lt;br /&gt;&lt;br /&gt;mean(x_tenth)&lt;br /&gt;var(x_tenth)&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-742612324236798788?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/742612324236798788/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=742612324236798788' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/742612324236798788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/742612324236798788'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/08/computing-errors-with-square-roots-of.html' title='Computing errors with square roots of infinitesimals.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/--86TgXTLQtc/TkZ37tLf4TI/AAAAAAAAAwE/5DYb80Kunuw/s72-c/algebra.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-865972317776443412</id><published>2011-07-23T14:39:00.000-07:00</published><updated>2011-07-23T16:28:04.960-07:00</updated><title type='text'>Profunctors in Haskell</title><content type='html'>&lt;pre&gt;&amp;gt; {-# LANGUAGE TypeSynonymInstances, RankNTypes, ExistentialQuantification #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;When I wrote about &lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;coends&lt;/a&gt; a while back I made up a term 'difunctor'. 
More recently it was pointed out to me that the correct word for this concept is 'profunctor', but unfortunately my knowledge came from &lt;a href="http://books.google.com/books/about/Categories_for_the_working_mathematician.html?id=eBvhyc4z8HQC"&gt;MacLane&lt;/a&gt; which mentions that word nowhere.&lt;br /&gt;&lt;br /&gt;Profunctors are ubiquitous in Haskell programming. Probably the most natural definition of Hughes Arrows is via profunctors. Profunctors also play a role a little like tensors, leading to a use of the terms 'covariant' and 'contravariant' that looks remarkably like the way those terms are used in tensor calculus.&lt;br /&gt;&lt;br /&gt;For categories C and D, a profunctor is a functor D&lt;sup&gt;op&lt;/sup&gt;&amp;times;C&amp;rarr;Set and is written C&amp;#x219b;D. (I hope that arrow between C and D is in your font. It's missing on iOS.)&lt;br /&gt;&lt;br /&gt;I'll reuse my Haskell approximation to that definition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; class Profunctor h where&lt;br /&gt;&amp;gt;   lmap :: (d' -&amp;gt; d) -&amp;gt; h d c -&amp;gt; h d' c&lt;br /&gt;&amp;gt;   rmap :: (c -&amp;gt; c') -&amp;gt; h d c -&amp;gt; h d c'&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We need cofunctoriality for the first argument and functoriality for the second:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;lmap (f . g) == lmap g . lmap f&lt;br /&gt;rmap (f . g) == rmap f . rmap g&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(Strictly we probably ought to call these 'endoprofunctors' as we're only really dealing with the category of Haskell types and functions.)&lt;br /&gt;&lt;br /&gt;There are lots of analogies for thinking about profunctors. For example, some people think of them as generalising functors in the same way that relations generalise functions. More specifically, given a function f:A&amp;rarr;B, f associates to each element of A, a single element of B. 
But if we want f to associate elements of A with elements of B more freely, for example 'mapping' elements of A to multiple elements of B then we instead use a relation which can be written as a function f:A&amp;times;B&amp;rarr;{0,1} where we say xfy iff f(x,y)=1. In this case, profunctors map to Set rather than {0,1}.&lt;br /&gt;&lt;br /&gt;A good example is the type constructor &lt;tt&gt;(-&amp;gt;)&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Profunctor (-&amp;gt;) where&lt;br /&gt;&amp;gt;   lmap f g = g . f&lt;br /&gt;&amp;gt;   rmap f g = f . g&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;It's common that the first argument of a profunctor describes how an element related to a type is sucked in, and the second describes what is spit out. &lt;tt&gt;a -&amp;gt; b&lt;/tt&gt; sucks in an &lt;tt&gt;a&lt;/tt&gt; and spits out a &lt;tt&gt;b&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Given a function f we can turn it into a relation by saying that xfy iff y=f(x). Similarly we can turn a functor into a profunctor. Given a functor F:C&amp;rarr;D we can define a profunctor F&lt;sup&gt;*&lt;/sup&gt;:C&amp;#x219b;D by&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data UpStar f d c = UpStar (d -&amp;gt; f c)&lt;br /&gt;&amp;gt; instance Functor f =&amp;gt; Profunctor (UpStar f) where&lt;br /&gt;&amp;gt;   lmap k (UpStar f) = UpStar (f . k)&lt;br /&gt;&amp;gt;   rmap k (UpStar f) = UpStar (fmap k . 
f)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You may be able to see how the second argument to a profunctor sort of plays a similar role to the return value of a functor, just as the second argument to a relation sometimes plays a role similar to the return value of a function.&lt;br /&gt;&lt;br /&gt;There is also an opposing way to make a profunctor from a functor just as there is with functions and relations:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data DownStar f d c = DownStar (f d -&amp;gt; c)&lt;br /&gt;&amp;gt; instance Functor f =&amp;gt; Profunctor (DownStar f) where&lt;br /&gt;&amp;gt;   lmap k (DownStar f) = DownStar (f . fmap k)&lt;br /&gt;&amp;gt;   rmap k (DownStar f) = DownStar (k . f)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Note that the identity functor gives us something isomorphic to &lt;tt&gt;(-&amp;gt;)&lt;/tt&gt; whether you use &lt;tt&gt;UpStar&lt;/tt&gt; or &lt;tt&gt;DownStar&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Dinatural transformations&lt;/b&gt;&lt;br /&gt;Just as we have natural transformations between functors, we have dinatural transformations between profunctors. My &lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;previous definition&lt;/a&gt; of dinatural was specialised to a particular case - dinaturals between a profunctor and the constant profunctor.&lt;br /&gt;&lt;br /&gt;Firstly, let's think about natural transformations. If F and G are functors, and h is a natural transformation h:F&amp;rArr;G, then we have that&lt;br /&gt;&lt;pre&gt;h . fmap f = fmap f . h&lt;br /&gt;&lt;/pre&gt;If we think of F and G as containers, then this rule says that a natural transformation relates the structures of the containers, not the contents. So using f to replace the elements with other elements should be invisible to h and hence commute with it.&lt;br /&gt;&lt;br /&gt;Something similar happens with dinatural transformations. 
But this time, instead of relating the argument to a natural transformation to its return result, the condition relates the two arguments to a profunctor.&lt;br /&gt;&lt;br /&gt;Given two profunctors, F and G, a dinatural transformation is a polymorphic function of type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type Dinatural f g = forall a. f a a -&amp;gt; g a a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;but we also want something analogous to the case of natural transformations. We want to express the fact that if &lt;tt&gt;phi :: Dinatural F G&lt;/tt&gt;, then &lt;tt&gt;phi&lt;/tt&gt; doesn't see the elements of &lt;tt&gt;F a a&lt;/tt&gt; or &lt;tt&gt;G a a&lt;/tt&gt;. Here's a way to achieve this. Suppose we have a dinatural transformation:&lt;br /&gt;&lt;pre&gt;phi :: Dinatural F G&lt;br /&gt;&lt;/pre&gt;and a function &lt;tt&gt;f :: X -&amp;gt; X'&lt;/tt&gt; then we can use &lt;tt&gt;lmap&lt;/tt&gt; and &lt;tt&gt;rmap&lt;/tt&gt; to apply &lt;tt&gt;f&lt;/tt&gt; on the left or right of &lt;tt&gt;F&lt;/tt&gt; and &lt;tt&gt;G&lt;/tt&gt;. The definition of dinaturals demands that:&lt;br /&gt;&lt;pre&gt;rmap f . phi . lmap f = lmap f . phi . rmap f&lt;br /&gt;&lt;/pre&gt;i.e. that we can apply &lt;tt&gt;f&lt;/tt&gt; on the left before applying &lt;tt&gt;phi&lt;/tt&gt;, and then do &lt;tt&gt;f&lt;/tt&gt; on the right, or vice versa, and still get the same result.&lt;br /&gt;&lt;br /&gt;I'm not sure but I think that we don't need to check this condition and that just like the case of naturals it just comes as a &lt;a href="http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf"&gt;free theorem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Composing profunctors&lt;/b&gt;&lt;br /&gt;It's easy to see how to compose functors. A functor is a polymorphic function from one type to another. It's not straightforward to compose profunctors. It's tempting to say that a profunctor maps a pair of types to a type so they can be composed like functions. 
But the original definition says a profunctor is a functor D&lt;sup&gt;op&lt;/sup&gt;&amp;times;C&amp;rarr;Set. So as a function it doesn't map back to the category but to Set. For Haskell we replace Set with Hask, the category of Haskell functions and types. So we have Hask&lt;sup&gt;op&lt;/sup&gt;&amp;times;Hask&amp;rarr;Hask. It's easy to invent a scheme to compose these because Hask appears 3 times. But it'd be wrong to exploit this in a general definition applying to many categories because in the proper definition of profunctor we can't assume that a profunctor maps back to the categories we started with.&lt;br /&gt;&lt;br /&gt;We can try composing profunctors by analogy with composing relations. Suppose R and S are relations. If T=S&amp;#x25cb;R is the composition of R and S then xTz if and only if there exists a y such that xRy and ySz. If our relations are on finite sets then we can define T(x,z) = &amp;Sigma;&lt;sub&gt;y&lt;/sub&gt;R(x,y)S(y,z) where we work in the semiring on {0,1} with 0+0=0, 0+1=1+0=1+1=1 but with the usual product.&lt;br /&gt;&lt;br /&gt;There is an analogue of "there exists" in Haskell - the existential type. Remembering that we write Haskell existential types using &lt;tt&gt;forall&lt;/tt&gt; we can define:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Compose f g d c = forall a.Compose (f d a) (g a c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;As mentioned above, functors give rise to profunctors. It'd be good if composition of functors were compatible with composition of profunctors. So consider&lt;br /&gt;&lt;pre&gt;Compose (UpStar F) (UpStar G)&lt;br /&gt;&lt;/pre&gt;for some &lt;tt&gt;F&lt;/tt&gt; and &lt;tt&gt;G&lt;/tt&gt;. This is essentially the same as&lt;br /&gt;&lt;pre&gt;exists a. (d -&gt; F a, a -&gt; G c)&lt;br /&gt;&lt;/pre&gt;What can we discover about an element of such a type? 
It consists of a pair of functions &lt;tt&gt;(f, g)&lt;/tt&gt;, but we can't ever extract the individual functions because the type of &lt;tt&gt;a&lt;/tt&gt; has been erased. To get anything meaningful out of &lt;tt&gt;g&lt;/tt&gt; we need to apply it to an &lt;tt&gt;a&lt;/tt&gt;, but we don't have one immediately to hand, after all, we can't even know what &lt;tt&gt;a&lt;/tt&gt; is. But we do have an &lt;tt&gt;F a&lt;/tt&gt; if we can make a &lt;tt&gt;d&lt;/tt&gt;. So we can use &lt;tt&gt;fmap&lt;/tt&gt; to apply &lt;tt&gt;g&lt;/tt&gt; to the result of &lt;tt&gt;f&lt;/tt&gt;. So we can construct &lt;tt&gt;fmap g . f :: d -&amp;gt; F (G c)&lt;/tt&gt;. There is no other information we can obtain. So the composition is isomorphic to &lt;tt&gt;UpStar&lt;/tt&gt; of the functorial composition of &lt;tt&gt;F&lt;/tt&gt; and &lt;tt&gt;G&lt;/tt&gt;. Again, we can probably make this a rigorous proof by making use of free theorems, but I haven't figured that out yet.&lt;br /&gt;&lt;br /&gt;But there's a catch: I said I wanted a definition that applies to more categories than just Hask. Well we can replace &lt;tt&gt;exists a&lt;/tt&gt; with the &lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;coend&lt;/a&gt; operator. We also implicitly used the product operation in the constructor &lt;tt&gt;Compose&lt;/tt&gt; so this definition will work in categories with suitable products. &lt;a href="http://ncatlab.org/nlab/show/symmetric+monoidal+category"&gt;Symmetric monoidal categories&lt;/a&gt; in fact.&lt;br /&gt;&lt;br /&gt;Under composition of profunctors, &lt;tt&gt;(-&amp;gt;)&lt;/tt&gt; is the identity. At least up to isomorphism. This composition of profunctors is also associative up to isomorphism. Unfortunately the "up to isomorphism" means that we can't make a category out of profunctors in the obvious way. 
But we can make a &lt;a href="http://ncatlab.org/nlab/show/bicategory"&gt;bicategory&lt;/a&gt; - essentially a category where we have to explicitly track the isomorphisms between things that are equal in ordinary categories.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Profunctors as tensors&lt;/b&gt;&lt;br /&gt;Given a profunctor &lt;tt&gt;F&lt;/tt&gt; we can write &lt;tt&gt;F i j&lt;/tt&gt; suggestively as F&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;j&lt;/sup&gt;. Let's write the composition of &lt;tt&gt;F&lt;/tt&gt; and &lt;tt&gt;G&lt;/tt&gt; as &amp;exist;k. F&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;k&lt;/sup&gt; G&lt;sub&gt;k&lt;/sub&gt;&lt;sup&gt;j&lt;/sup&gt;. We can use the &lt;a href="http://en.wikipedia.org/wiki/Einstein_notation"&gt;Einstein summation convention&lt;/a&gt; to automatically 'contract' on pairs of upper and lower indices and write the composition as F&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;k&lt;/sup&gt; G&lt;sub&gt;k&lt;/sub&gt;&lt;sup&gt;j&lt;/sup&gt;. The analogy is even more intriguing when we remember that in tensor notation, the lower indices are covariant indices and the upper ones are contravariant indices. In the case of profunctors, the two arguments act like the arguments to contravariant and covariant functors respectively, so the two kinds of index again correspond to two kinds of variance, although the names are swapped relative to the tensor convention. Note also that because &lt;tt&gt;(-&amp;gt;)&lt;/tt&gt; is essentially the identity, we have &amp;rarr;&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;j&lt;/sup&gt;F&lt;sub&gt;j&lt;/sub&gt;&lt;sup&gt;k&lt;/sup&gt;=F&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;k&lt;/sup&gt;. So &lt;tt&gt;(-&amp;gt;)&lt;/tt&gt; acts like the &lt;a href="http://en.wikipedia.org/wiki/Kronecker_delta"&gt;Kronecker delta&lt;/a&gt;. You can read more about this at &lt;a href="http://mathoverflow.net/questions/59892/co-ends-as-a-trace-operation-on-profunctors"&gt;mathoverflow&lt;/a&gt; where it is hinted that this analogy is not yet well understood. Note that we're naturally led to the trace of a profunctor: &lt;tt&gt;exists a. 
F a a&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Arrows as profunctors&lt;/b&gt;&lt;br /&gt;The last thing I want to mention is that &lt;a href="http://www.haskell.org/arrows/"&gt;Hughes' Arrows&lt;/a&gt; are profunctors. There is an intuition that fits. If &lt;tt&gt;A&lt;/tt&gt; is an Arrow, we often think of &lt;tt&gt;A d c&lt;/tt&gt; as consuming something related to type &lt;tt&gt;d&lt;/tt&gt; and emitting something related to type &lt;tt&gt;c&lt;/tt&gt;. The same goes for profunctors. The full paper explaining this is Asada and Hasuo's &lt;a href="http://www-mmm.is.s.u-tokyo.ac.jp/~ichiro/papers/fromComptoComp.pdf"&gt; Categorifying Computations into Components via Arrows as Profunctors&lt;/a&gt; with the profunctorial definition of Arrows given as Definition 3.2 (though that definition also appears in some earlier papers.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-865972317776443412?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/865972317776443412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=865972317776443412' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/865972317776443412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/865972317776443412'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/07/profunctors-in-haskell.html' title='Profunctors in Haskell'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3213030729419630796</id><published>2011-07-16T16:16:00.000-07:00</published><updated>2011-07-16T18:08:48.949-07:00</updated><title type='text'>The Infinitude of the Primes</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Data.List hiding (intersect)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;A &lt;a href="http://www.cs.cmu.edu/~treuille/"&gt;colleague&lt;/a&gt; at work reminded me of &lt;a href="http://en.wikipedia.org/wiki/F%C3%BCrstenberg's_proof_of_the_infinitude_of_primes"&gt;Fürstenberg's topological proof of the infinitude of primes&lt;/a&gt;. But as I &lt;a href="http://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets/19156#19156"&gt;tried to argue&lt;/a&gt; a while back, topology is really a kind of logic. So Fürstenberg's proof should unpack into a sort of logical proof. In fact, I'm going to unpack it into what I'll call the &lt;a href="http://en.wikipedia.org/wiki/Programming_language_theory"&gt;PLT&lt;/a&gt; proof of the infinitude of the primes. I apologise in advance that I'm just going to present the unpacked proof, and not how I got there from Fürstenberg's.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A small formal language&lt;/b&gt;&lt;br /&gt;We're going to start with a little language. Propositions of this language are of type &lt;tt&gt;Prop&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Prop = Modulo Integer Integer&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The intention is that &lt;tt&gt;Modulo k n&lt;/tt&gt; is the property of an integer being equal to k modulo n. More precisely, it represents the property of being writable in the form sn+k for some s. (We disallow n=0.) But I also want to allow people to combine properties using "and" and "or". 
So we extend the language with:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;           | Or [Prop] | And [Prop] deriving Show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The intention now is that &lt;tt&gt;And ...&lt;/tt&gt; holds when all of the properties in the list hold and similarly for &lt;tt&gt;Or ...&lt;/tt&gt;. We can write an interpreter to test whether integers have the specified property:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; eval (Modulo k n) x = (x-k) `mod` n == 0&lt;br /&gt;&amp;gt; eval (Or ps)      x = or $ map (\p -&amp;gt; eval p x) ps&lt;br /&gt;&amp;gt; eval (And ps)     x = and $ map (\p -&amp;gt; eval p x) ps&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Note we limit ourselves to *finite* compositions of &lt;tt&gt;And&lt;/tt&gt; and &lt;tt&gt;Or&lt;/tt&gt;, otherwise &lt;tt&gt;eval&lt;/tt&gt; wouldn't actually define a property due to non-termination.)&lt;br /&gt;&lt;br /&gt;There are lots of things we can say in our language. For example we can give the 'extreme' properties that are never true or always true:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; never = Or []&lt;br /&gt;&amp;gt; always = And []&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can say that one number is divisible by another:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; divisibleBy k = Modulo 0 k&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can test it with expressions like:&lt;br /&gt;&lt;pre&gt;*Main&gt; eval (divisibleBy 3) 9&lt;br /&gt;True&lt;br /&gt;*Main&gt; eval (divisibleBy 5) 11&lt;br /&gt;False&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can also express non-divisibility. 
We say that x isn't divisible by n by saying that x is either 1, 2, ..., or n-1 modulo n:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; notDivisibleBy n =&lt;br /&gt;&amp;gt;     let n' = abs n&lt;br /&gt;&amp;gt;     in Or (map (\i -&amp;gt; Modulo i n') [1..(n'-1)])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Disallowing n=0.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;*Main&gt; eval (notDivisibleBy 3) 9&lt;br /&gt;False&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Eliminating And&lt;/b&gt;&lt;br /&gt;It's not obvious at first sight, but there is a big redundancy in our language. There is no need for &lt;tt&gt;And&lt;/tt&gt;. Consider &lt;tt&gt;And [Modulo k1 n1, Modulo k2 n2]&lt;/tt&gt;. This asserts, for the number x, that x = s*n1+k1 and x = t*n2+k2. The &lt;a href="http://en.wikipedia.org/wiki/Chinese_remainder_theorem"&gt;Chinese remainder theorem&lt;/a&gt; tells us that either these have no solution, or that this pair of propositions is equivalent to one of the form x = k3 mod n3 for some k3 and n3. So every time we &lt;tt&gt;And&lt;/tt&gt; a pair of propositions we can eliminate the &lt;tt&gt;And&lt;/tt&gt; by using the theorem. Solving for k3 and n3 is straightforward. 
I use the &lt;a href="http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm"&gt;extended Euclidean algorithm&lt;/a&gt; and the proof of the Chinese remainder theorem given at &lt;a href="http://www.cut-the-knot.org/blue/chinese.shtml"&gt;Cut the Knot&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; egcd n1 n2 | n2 == 0 = (1, 0, n1)&lt;br /&gt;&amp;gt;            | n1 == 0 = (0, 1, n2)&lt;br /&gt;&amp;gt;            | otherwise = (y, x-y*(n1 `div` n2), g)&lt;br /&gt;&amp;gt;            where (x, y, g) = egcd n2 (n1 `mod` n2)&lt;br /&gt;&lt;br /&gt;&amp;gt; intersect (Modulo k1 n1) (Modulo k2 n2) =&lt;br /&gt;&amp;gt;     let (s, _, g) = egcd n1 n2&lt;br /&gt;&amp;gt;         (q, r) = (k2-k1) `quotRem` g&lt;br /&gt;&amp;gt;     in if r == 0&lt;br /&gt;&amp;gt;         then Modulo (q*s*n1+k1) (n1*n2 `quot` g)&lt;br /&gt;&amp;gt;         else never&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;So now we can repeatedly use &lt;tt&gt;intersect&lt;/tt&gt; pairwise on our properties to eliminate all uses of &lt;tt&gt;And&lt;/tt&gt;. Here is some code to do so. 
Firstly, it's convenient to sometimes write any property as if it is a list of "subproperties", all &lt;tt&gt;Or&lt;/tt&gt;red together:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; subproperties (Or ps) = ps&lt;br /&gt;&amp;gt; subproperties p = [p]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we can go ahead and remove all of the &lt;tt&gt;And&lt;/tt&gt;s:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; removeAnd (Or ps) = Or (map removeAnd ps)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The property &lt;tt&gt;always&lt;/tt&gt; can be rewritten as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; removeAnd (And []) = Modulo 0 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Remove &lt;tt&gt;And&lt;/tt&gt; from the head of the list, remove it from the tail of the list, and then form all possible intersections of these two parts:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; removeAnd (And (p:ps)) = Or [q `intersect` q' |&lt;br /&gt;&amp;gt;     q &amp;lt;- subproperties (removeAnd p),&lt;br /&gt;&amp;gt;     q' &amp;lt;- subproperties (removeAnd (And ps))]&lt;br /&gt;&amp;gt; removeAnd p = p&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;By induction, the return value from &lt;tt&gt;removeAnd&lt;/tt&gt; can no longer contain an &lt;tt&gt;And&lt;/tt&gt;. Note that the properties can grow in size considerably. Here is the proposition that x isn't divisible by 5 or 7 written out in full:&lt;br /&gt;&lt;pre&gt;*Main&gt; removeAnd (And [notDivisibleBy 5, notDivisibleBy 7])&lt;br /&gt;Or [Modulo 1 35,Modulo 16 35,Modulo 31 35,Modulo 46 35,Modulo 61&lt;br /&gt;35,Modulo 76 35,Modulo (-13) 35,Modulo 2 35,Modulo 17 35,Modulo 32&lt;br /&gt;35,Modulo 47 35,Modulo 62 35,Modulo (-27) 35,Modulo (-12) 35,Modulo&lt;br /&gt;3 35,Modulo 18 35,Modulo 33 35,Modulo 48 35,Modulo (-41) 35,Modulo&lt;br /&gt;(-26) 35,Modulo (-11) 35,Modulo 4 35,Modulo 19 35,Modulo 34 35]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now to the primes. 
Here's a &lt;a href="http://www.haskell.org/haskellwiki/Prime_numbers"&gt;standard way&lt;/a&gt; to make the list of primes in Haskell:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; isPrime primes n = foldr (\p r -&amp;gt; p*p &amp;gt; n || (rem n p /= 0 &amp;amp;&amp;amp; r))&lt;br /&gt;&amp;gt;                          True primes&lt;br /&gt;&amp;gt; primes = 2 : filter (isPrime primes) [3..]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;The Proof&lt;/b&gt;&lt;br /&gt;Now we can give the proof that this set is infinite. Suppose it were finite. Then we could form this property:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; prop = removeAnd $ And (map notDivisibleBy primes)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;It contains no &lt;tt&gt;And&lt;/tt&gt;s, and so must simply be the &lt;tt&gt;Or&lt;/tt&gt; of a bunch of &lt;tt&gt;Modulo&lt;/tt&gt;s. At least one &lt;tt&gt;Modulo&lt;/tt&gt; must be present, because 1 has the property. But each &lt;tt&gt;Modulo&lt;/tt&gt; defines an infinite set, so &lt;tt&gt;prop&lt;/tt&gt; must define an infinite set.&lt;br /&gt;&lt;br /&gt;But &lt;tt&gt;prop&lt;/tt&gt; is the property of not being divisible by any prime. So &lt;tt&gt;prop&lt;/tt&gt; can only &lt;tt&gt;eval&lt;/tt&gt; to &lt;tt&gt;True&lt;/tt&gt; on -1 or 1, a finite set. Contradiction. Therefore &lt;tt&gt;primes&lt;/tt&gt; is infinite.&lt;br /&gt;&lt;br /&gt;We can look at approximations to &lt;tt&gt;prop&lt;/tt&gt; like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; prop' n = removeAnd $ And (map notDivisibleBy (take n primes))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You can see that the proposition grows in size rapidly:&lt;br /&gt;&lt;pre&gt;*Main&gt; removeAnd (prop' 3)&lt;br /&gt;Or [Modulo 1 30,Modulo (-83) 30,Modulo (-167) 30,Modulo (-251)&lt;br /&gt;30,Modulo 71 30,Modulo (-13) 30,Modulo (-97) 30,Modulo (-181) 30]&lt;br /&gt;*Main&gt; removeAnd (prop' 4)&lt;br /&gt;Or [Modulo 1 210,Modulo (-56159) 210,Modulo (-112319) 210,Modulo...]&lt;br /&gt;&lt;/pre&gt;Nonetheless, it would always be finite if there were only finitely many primes. 
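&lt;br /&gt;&lt;br /&gt;As a quick sanity check, here is a sketch of a GHCi session that assumes the definitions above are loaded exactly as written. It lists the integers in a small range that have the property &lt;tt&gt;prop' 3&lt;/tt&gt;, i.e. those not divisible by 2, 3 or 5:&lt;br /&gt;&lt;pre&gt;*Main&gt; filter (eval (prop' 3)) [-10..10]&lt;br /&gt;[-7,-1,1,7]&lt;br /&gt;&lt;/pre&gt;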
As &lt;tt&gt;primes&lt;/tt&gt; is infinite, you can think of the sequence &lt;tt&gt;prop' n&lt;/tt&gt; as somehow trying to creep up on the set {-1, 1}, never quite getting there.&lt;br /&gt;&lt;br /&gt;Unfortunately I have no time to explain why a topological proof should lead to one about a simple &lt;a href="http://en.wikipedia.org/wiki/Domain-specific_language"&gt;DSL&lt;/a&gt; beyond mentioning that there's a deeper story relating to the &lt;a href="http://blog.sigfpe.com/2008/01/what-does-topology-have-to-do-with.html"&gt;computability&lt;/a&gt; of &lt;tt&gt;eval&lt;/tt&gt; for possibly infinite expressions of type &lt;tt&gt;Prop&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Addendum&lt;/b&gt;&lt;br /&gt;I'll just say a little on the computability connection. Suppose we have a really dumb algorithm to test whether an integer x equals k mod n by doing a brute force search for s such that x=s*n+k. Suppose this is the only kind of test on x that we have available to us. The test will only terminate if it finds a solution. So with such an algorithm, testing for equality mod n is only semi-decidable. Now suppose we are allowed multi-threaded code. The infinitude of the primes implies that with our dumb tests, membership of {-1, 1} is also semi-decidable. So we can turn the problem of proving the infinitude of the primes into one about computability. You can see roughly how: we can "semi-test" &lt;tt&gt;Or ps&lt;/tt&gt; by launching a process to test the first element of ps. Then launch a process to check the next, and so on. If any of these processes terminates, we have our answer. 
The argument presented above gives the details of how to construct a suitable &lt;tt&gt;Or ps&lt;/tt&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3213030729419630796?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3213030729419630796/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=3213030729419630796' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3213030729419630796'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3213030729419630796'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/07/infinitude-of-primes.html' title='The Infinitude of the Primes'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2698212297987135360</id><published>2011-06-25T14:57:00.000-07:00</published><updated>2011-06-25T14:58:22.571-07:00</updated><title type='text'>An elementary way to approach Fourier transforms</title><content type='html'>&lt;b&gt;Why another introduction to the Fourier transform?&lt;/b&gt;&lt;br /&gt;There are many elementary &lt;a href="http://www.google.com/search?q=introduction+fourier+transform"&gt;introductions&lt;/a&gt; to the discrete Fourier transform on the web. But this one is going to be a little different. 
I hope to motivate and demonstrate an application of the Fourier transform starting from no knowledge of the subject at all. But apart from this sentence, I'll make no mention of complex numbers or trigonometric functions. That may seem weird - the standard &lt;a href="http://www.google.com/search?q=introduction+fourier+definition"&gt;definitions&lt;/a&gt; seem to make it clear that these concepts are crucial parts of the definition. But I claim that in some ways these definitions miss the point. An analogy from software engineering is appropriate: most definitions of the Fourier transform are a bit like explaining an interface to an API by reference to the implementation. That might tell you how it works at a nuts-and-bolts level, but that can obscure what the API is actually for.&lt;br /&gt;&lt;br /&gt;There's code all the way through so if anything I've said is ambiguous, that should help resolve it. I've chosen to write the following in 'literate Octave'. I can't say I love the language but it seems like the easiest way to express what I want here computationally.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Convolution&lt;/b&gt;&lt;br /&gt;Suppose your camera has a hexagonal iris and it's out of focus. 
When you take a picture of a point light source (towards your lower left) you'll end up with a result like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-5EzWTPUYxkw/TgU92Kuyb_I/AAAAAAAAAs8/6hqCdfiQerw/s1600/hexagon.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="304" width="400" src="http://2.bp.blogspot.com/-5EzWTPUYxkw/TgU92Kuyb_I/AAAAAAAAAs8/6hqCdfiQerw/s400/hexagon.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The effect is known as &lt;a href="http://en.wikipedia.org/wiki/Bokeh"&gt;bokeh&lt;/a&gt; and the Wikipedia page has a nice example with an octagonal iris.&lt;br /&gt;&lt;br /&gt;(If you want to use the Octave code in this article, save the image above as &lt;tt&gt;hexagon.png&lt;/tt&gt;.)&lt;br /&gt;&lt;br /&gt;Suppose we take a picture of an ordinary scene instead. We can think of every visible point in the scene as a point light source and so the out-of-focus photograph will be the sum of many hexagons, one for each point in the image, with each hexagon's brightness determined by the brightness of the point it comes from. You can also flip this around and think about the out-of-focus image as being a sum of lots of copies of the original image, one for each point in the hexagon. 
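&lt;br /&gt;&lt;br /&gt;As a sketch in symbols, writing I for the scene, J for the picture of the iris and K for the out-of-focus result (names assumed here for illustration), and assuming that indices wrap around at the image edges, the second description says, up to an overall brightness scale:&lt;br /&gt;&lt;br /&gt;K(x,y) = &amp;Sigma;&lt;sub&gt;i,j&lt;/sub&gt; J(i,j) I(x-i,y-j)&lt;br /&gt;&lt;br /&gt;Each term is a copy of I shifted by (i,j) and weighted by the brightness of the iris image at (i,j).&lt;br /&gt;&lt;br /&gt;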
We can go ahead and code this up directly in octave.&lt;br /&gt;&lt;br /&gt;Here's the picture that we'll apply bokeh to:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-FddK347Dq6U/TgU_7vHcc7I/AAAAAAAAAtI/al1kSGvbZ2I/s1600/marcie1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="304" width="400" src="http://2.bp.blogspot.com/-FddK347Dq6U/TgU_7vHcc7I/AAAAAAAAAtI/al1kSGvbZ2I/s400/marcie1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;(Background: I used to work for a &lt;a href="http://www.cinesite.com/"&gt;division of Kodak&lt;/a&gt; and that was the test image we frequently used. Save the image as &lt;tt&gt;marcie1.png&lt;/tt&gt;.)&lt;br /&gt;&lt;br /&gt;Let's read in our image and iris shape, converting the image to grayscale:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; I = double(imread('marcie1.png'))/256;&lt;br /&gt;&amp;gt; I = (I(:,:,1)+I(:,:,2)+I(:,:,3))/3;&lt;br /&gt;&amp;gt; J = double(imread('hexagon.png'))/256;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt; h = size(I)(1);&lt;br /&gt;&amp;gt; w = size(I)(2);&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we'll apply the bokeh by looping over the entire iris, accumulating shifted copies of the original image. We're optimising a little bit. As many of the pixels in &lt;tt&gt;J&lt;/tt&gt; are black we can skip over them. I'm using &lt;tt&gt;circshift&lt;/tt&gt; so that we'll get wraparound at the edge. 
That turns out to be very convenient later.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; total = 0;&lt;br /&gt;&amp;gt; K = zeros(h, w);&lt;br /&gt;&amp;gt; for i = 1:h&lt;br /&gt;&amp;gt;     i&lt;br /&gt;&amp;gt;     for j = 1:w&lt;br /&gt;&amp;gt;         if J(i, j) != 0&lt;br /&gt;&amp;gt;             K = K + J(i, j)*circshift(I, [i, j]);&lt;br /&gt;&amp;gt;             total = total + J(i, j);&lt;br /&gt;&amp;gt;         endif&lt;br /&gt;&amp;gt;     endfor&lt;br /&gt;&amp;gt; endfor&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We use &lt;tt&gt;total&lt;/tt&gt; to scale the overall brightness back into a reasonable range:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; imshow(K/total)&lt;br /&gt;&amp;gt; pause(5)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Make sure you understand what that code is doing because the rest of this article depends on it. The central line in the double loop is repeatedly adding copies of the original image, shifted by &lt;tt&gt;[i, j]&lt;/tt&gt;, and scaled by &lt;tt&gt;J(i, j)&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;There's just one problem with that code. It's incredibly slow. It's only tolerable because I know that most of &lt;tt&gt;J&lt;/tt&gt; is zero so I could optimize it with a conditional. More general higher resolution images will leave you waiting for a long time.&lt;br /&gt;&lt;br /&gt;The image we have computed above is known as the convolution of &lt;tt&gt;I&lt;/tt&gt; and &lt;tt&gt;J&lt;/tt&gt;. My goal is to show how we can use the Fourier transform to look at this in a very different way. As a side effect we will also get a much faster convolution algorithm - but the reason it runs faster is a story for another time. In this article I just want to show what Fourier transforms are and why they're relevant at all.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fourier transforms&lt;/b&gt;&lt;br /&gt;Central to the code above is the fact that we're shifting and adding the same image over and over again. 
If we could avoid that we might find a way to speed the code up. So let me define some shorthand. I'll use R to represent the function that shifts an image right one pixel. In Octave that's given by &lt;tt&gt;circshift(I,[0,1])&lt;/tt&gt;. I'll use U to mean a shift up by one pixel. Rather than define *the* Fourier transform I'm going to define a family of operations, all called Fourier transforms. (More precisely, these are two-dimensional discrete Fourier transforms.)&lt;br /&gt;&lt;br /&gt;(1) A Fourier transform is a linear operation that converts an image to another one of the same dimensions. This means that if you apply it to sums and multiples of images you get sums and multiples of the Fourier transforms. In the language of Octave, if &lt;tt&gt;F&lt;/tt&gt; is such an operation, then &lt;tt&gt;F(A+B) == F(A)+F(B)&lt;/tt&gt; and &lt;tt&gt;F(s*A) == s*F(A)&lt;/tt&gt;, for &lt;tt&gt;s&lt;/tt&gt; a scalar.&lt;br /&gt;&lt;br /&gt;(2) A Fourier transform has this property: there is a pair of images, &lt;tt&gt;A&lt;/tt&gt; and &lt;tt&gt;B&lt;/tt&gt;, such that for any image I, F(U(I)) = AF(I) and F(R(I)) = BF(I). (AF(I) means the pixelwise product of A and F(I)). In Octave notation:&lt;br /&gt;&lt;pre&gt;F(circshift(I, [1,0])) == A .* F(I)&lt;br /&gt;F(circshift(I, [0,1])) == B .* F(I)&lt;br /&gt;&lt;/pre&gt;Fourier transforms convert shifts to multiplications. (If you only learn one thing from this article, this should be it.)&lt;br /&gt;&lt;br /&gt;(3) Fourier transforms are invertible so there must be another linear transform F&lt;sup&gt;-1&lt;/sup&gt; such that F&lt;sup&gt;-1&lt;/sup&gt;(F(I)) = I and F(F&lt;sup&gt;-1&lt;/sup&gt;(I)) = I.&lt;br /&gt;&lt;br /&gt;Anything with these properties is an example of a Fourier transform. It's the second property that is crucially important. 
In jargon it's said to diagonalise translation.&lt;br /&gt;&lt;br /&gt;From (2) it follows that we can compute the Fourier transform of an image shifted by &lt;tt&gt;[i, j]&lt;/tt&gt; by multiplying the original Fourier transform by &lt;tt&gt;A.^i .* B.^j&lt;/tt&gt;. (&lt;tt&gt;.^&lt;/tt&gt; is the pixelwise power function.)&lt;br /&gt;&lt;br /&gt;If h is the image height, then shifting up h times should wrap us around to the beginning again. Similarly for w. So from (2) we know that A&lt;sup&gt;h&lt;/sup&gt;=1 and B&lt;sup&gt;w&lt;/sup&gt;=1. (That's shorthand for saying that each of the individual elements of A raised to the power of h give 1 and so on.)&lt;br /&gt;&lt;br /&gt;It just so happens that Octave comes supplied with a suitable function that satisfies the three conditions I listed. It's called &lt;tt&gt;fft2&lt;/tt&gt;. Let's find out what the corresponding images A and B are:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; r = rand(h, w);&lt;br /&gt;&amp;gt; A = fft2(circshift(r, [1, 0])) ./ fft2(r);&lt;br /&gt;&amp;gt; B = fft2(circshift(r, [0, 1])) ./ fft2(r);&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Note that ./ is the pixelwise division operator. If you're unlucky your random numbers will lead to a division by zero. Just try again.)&lt;br /&gt;&lt;br /&gt;Let's try again for another random image:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; s = rand(h, w);&lt;br /&gt;&amp;gt; A0 = fft2(circshift(s, [1, 0])) ./ fft2(s);&lt;br /&gt;&amp;gt; B0 = fft2(circshift(s, [0, 1])) ./ fft2(s);&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can see that A is almost exactly A0 and B is almost B0:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; abs(A-A0)&lt;br /&gt;&amp;gt; abs(B-B0)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;So now we can go back to our original convolution algorithm. 
Let's define&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; II = fft2(I);&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now we can use the first and second properties to compute the Fourier transform of the image with bokeh applied to it. We're applying properties (1) and (2) to the central line of the double loop above:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; KK = zeros(h, w);&lt;br /&gt;&amp;gt; for i = 1:h&lt;br /&gt;&amp;gt;     i&lt;br /&gt;&amp;gt;     for j = 1:w&lt;br /&gt;&amp;gt;         if J(i, j) != 0&lt;br /&gt;&amp;gt;             KK = KK + J(i, j).*A.^i.*B.^j.*II;&lt;br /&gt;&amp;gt;         endif&lt;br /&gt;&amp;gt;     endfor&lt;br /&gt;&amp;gt; endfor&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I said that the definition of Fourier transform requires the existence of an inverse, and claimed that &lt;tt&gt;fft2&lt;/tt&gt; was a Fourier transform. So we must have an inverse. Octave conveniently provides one under the name &lt;tt&gt;ifft2&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; K = ifft2(KK/total);&lt;br /&gt;&amp;gt; imshow(K)&lt;br /&gt;&amp;gt; pause(5)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We've eliminated all that shifting, but at the end of the day this code is slower. But did you notice something curious about the innermost line of the double loop? It's always adding a multiple of the same image, II, to KK. We can completely factor it out. 
So we can rewrite the code as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; JJ = zeros(h, w);&lt;br /&gt;&amp;gt; for i = 1:h&lt;br /&gt;&amp;gt;     i&lt;br /&gt;&amp;gt;     for j = 1:w&lt;br /&gt;&amp;gt;         if J(i, j) != 0&lt;br /&gt;&amp;gt;             JJ = JJ + J(i, j).*A.^i.*B.^j;&lt;br /&gt;&amp;gt;         endif&lt;br /&gt;&amp;gt;     endfor&lt;br /&gt;&amp;gt; endfor&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can leave the multiplication by II all the way to the end:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; KK = JJ .* II;&lt;br /&gt;&amp;gt; K = ifft2(KK/total);&lt;br /&gt;&amp;gt; imshow(K)&lt;br /&gt;&amp;gt; pause(5)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This is pretty cool. We can precompute JJ. Any time we want to convolve an image with the hexagon we apply &lt;tt&gt;fft2&lt;/tt&gt; to the image, multiply by JJ and then apply &lt;tt&gt;ifft2&lt;/tt&gt;. But there's something even better going on. Let's look more closely at that double loop above involving powers of the elements of A and B. Let's write out the function it computes in standard mathematical notation:&lt;br /&gt;&lt;br /&gt;f(J) = &amp;Sigma;&lt;sub&gt;i,j&lt;/sub&gt;A&lt;sup&gt;i&lt;/sup&gt; B&lt;sup&gt;j&lt;/sup&gt; J&lt;sub&gt;i,j&lt;/sub&gt;&lt;br /&gt;&lt;br /&gt;What is f(U(J))?&lt;br /&gt;&lt;br /&gt;f(U(J)) = &amp;Sigma;A&lt;sup&gt;i&lt;/sup&gt; B&lt;sup&gt;j&lt;/sup&gt; J&lt;sub&gt;i-1,j&lt;/sub&gt; (by definition of shifting up)&lt;br /&gt;f(U(J)) = &amp;Sigma;A&lt;sup&gt;i+1&lt;/sup&gt; B&lt;sup&gt;j&lt;/sup&gt; J&lt;sub&gt;i,j&lt;/sub&gt; (by summing over i+1 instead of i and using wraparound)&lt;br /&gt;f(U(J)) = A f(J)&lt;br /&gt;&lt;br /&gt;Similarly,&lt;br /&gt;&lt;br /&gt;f(R(J)) = B f(J)&lt;br /&gt;&lt;br /&gt;In other words, &lt;tt&gt;f&lt;/tt&gt; satisfies the second property of Fourier transforms. It obviously satisfies the first property. Our transform &lt;tt&gt;f&lt;/tt&gt; looks like a Fourier transform. 
In fact, it is, and Octave's &lt;tt&gt;fft2&lt;/tt&gt; is defined this way. So now we have this procedure for convolving:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; II = fft2(I);&lt;br /&gt;&amp;gt; JJ = fft2(J);&lt;br /&gt;&amp;gt; KK = II .* JJ;&lt;br /&gt;&amp;gt; K = ifft2(KK/total);&lt;br /&gt;&lt;br /&gt;&amp;gt; imshow(K)&lt;br /&gt;&amp;gt; pause(5)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We now have a fast convolution algorithm. Of course I've left out an important point: &lt;tt&gt;fft2&lt;/tt&gt; and &lt;tt&gt;ifft2&lt;/tt&gt; are fast. They don't use the obvious summation algorithm suggested by my definition of f. But that's an implementation detail. We're able to reason successfully about important properties of &lt;tt&gt;fft2&lt;/tt&gt; using the properties I listed above.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;Let me recapitulate:&lt;br /&gt;&lt;br /&gt;1. I defined Fourier transforms&lt;br /&gt;2. I showed how convolution could be rewritten using one&lt;br /&gt;3. I told you that &lt;tt&gt;fft2&lt;/tt&gt; was a Fourier transform, giving an alternative algorithm for convolution&lt;br /&gt;4. I showed that another part of the convolution algorithm looks like a Fourier transform. (I didn't prove it had an inverse.)&lt;br /&gt;5. I told you that (by amazing coincidence!) this other part is also &lt;tt&gt;fft2&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;I haven't shown you how to implement a Fourier transform. But I have shown you how you can reason about many of their properties. Enough to get much of the way to the &lt;a href="http://en.wikipedia.org/wiki/Convolution_theorem"&gt;convolution theorem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Approaching Fourier transforms through the properties I listed is common in the more advanced mathematical literature. But for some reason, in the elementary literature people often choose to describe things in a more complicated way. 
This is true of many things in mathematics.&lt;br /&gt;&lt;br /&gt;I hope the above serves as useful motivation when you come to check out a more standard and more complete introduction to Fourier transforms. In particular, now's a good time to try to understand why the usual &lt;a href="http://en.wikipedia.org/wiki/Discrete_Fourier_transform#Definition"&gt;definition&lt;/a&gt; of the discrete Fourier transform satisfies the properties above and so fill in the steps I missed out.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Note&lt;/b&gt;&lt;br /&gt;You may need to set up Octave for graphics. On MacOSX I started the X server and used "export GNUTERM=x11" before running Octave.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2698212297987135360?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2698212297987135360/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=2698212297987135360' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2698212297987135360'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2698212297987135360'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/06/another-elementary-way-to-approach.html' title='An elementary way to approach Fourier transforms'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/-5EzWTPUYxkw/TgU92Kuyb_I/AAAAAAAAAs8/6hqCdfiQerw/s72-c/hexagon.png' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8207351591103855578</id><published>2011-06-04T12:02:00.000-07:00</published><updated>2011-06-15T09:05:49.584-07:00</updated><title type='text'>Simulating visual artifacts with Fourier optics</title><content type='html'>&lt;b&gt;The problem&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One of the fun things about working in the visual effects industry is the large number of disciplines involved, especially when working for a smaller company where everyone has to do everything. At one point many years ago we did some work on simulating camera artifacts including &lt;a href="http://en.wikipedia.org/wiki/Diffraction_spike"&gt;diffraction spikes&lt;/a&gt;. Although the wikipedia page is in the context of astronomy you can get diffraction artifacts with any kind of camera, and you'll get spikes whenever the iris has a polygonal shape. I hope to sketch why later.&lt;br /&gt;&lt;br /&gt;I noticed an intriguing question about a &lt;a href="http://www.quora.com/What-causes-this-diffraction-pattern"&gt;diffraction pattern&lt;/a&gt; on Quora. (For those not following the link: this photo is a picture of a reflection of the photographer and camera in an LCD screen.) My immediate thought was "how could I have faked this in a render?".&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-eCRE-5YU9k4/Tep9jOO9HEI/AAAAAAAAAsw/0x-D3v0Q4Qk/s1600/original.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-eCRE-5YU9k4/Tep9jOO9HEI/AAAAAAAAAsw/0x-D3v0Q4Qk/s320/original.jpeg" width="152" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;My main interest was in the 'X' shape. 
And the first question is this: how do I know this is a diffraction effect? It might be that there are specular reflections off periodic structures in the LCD display orthogonal to the arms of the 'X'. This would explain everything with only &lt;a href="http://en.wikipedia.org/wiki/Geometrical_optics"&gt;geometric optics&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;There are two big clues I know of that diffraction is at work: we have separation of white light into separate colours. This is typical of diffraction effects. We also see a discrete structure: rays directed along a discrete set of directions in addition to the arms of the 'X'. This is typical of diffraction from periodic microstructures, and again I hope to sketch why later.&lt;br /&gt;&lt;br /&gt;So how can we explain the appearance of this photograph?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A simple model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;If we consider diffraction effects we need to model light as waves. Let's start by working with just one frequency of light from the camera flash. Let's also suppose that the flash is far enough away from the monitor that the spherical wavefronts can be modelled as plane waves when they meet the monitor. Here's a picture:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-cWozuEZAw7Y/Tepdl-4xWmI/AAAAAAAAAsQ/UczRJu18pZ0/s1600/incoming.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="288" src="http://1.bp.blogspot.com/-cWozuEZAw7Y/Tepdl-4xWmI/AAAAAAAAAsQ/UczRJu18pZ0/s400/incoming.png" width="288" /&gt;&lt;/a&gt;&lt;/div&gt;The thick black vertical line is the monitor screen. The other vertical lines are the incoming plane waves. 
And the red arrow shows the direction of motion.&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Huygens%E2%80%93Fresnel_principle"&gt;Huygens' principle&lt;/a&gt;&amp;nbsp;tells us that when the waves reflect off the monitor surface we can treat the entire monitor surface as a collection of points, each of which is emitting spherical waves equally in all directions. If you're unfamiliar with Huygens' principle it may seem wrong to you. When you shine a light at a mirror you get reflection along one particular direction, not in all directions. That's true, but we can view this fact as a consequence of interference between all of the little points on the surface emitting in all directions. Again, that's another thing that will emerge from the calculation below.&lt;br /&gt;&lt;br /&gt;The question I now want to answer is this: how much light gets reflected in direction &lt;i&gt;k&lt;/i&gt;? I'll assume we're observing things from far enough away that we can neglect the details of the geometry. We'll just compute the contribution in direction &lt;i&gt;k&lt;/i&gt;&amp;nbsp;(a unit vector (&lt;i&gt;l, m, n&lt;/i&gt;)) from all of the points on our surface. We'll assume that because of variations on the surface, the intensity of the emission varies along the surface. We'll model the emission as a function &lt;i&gt;I&lt;/i&gt;, so &lt;i&gt;I(y, t)&lt;/i&gt;&amp;nbsp;is the emission from position &lt;i&gt;y&lt;/i&gt;&amp;nbsp;at time &lt;i&gt;t&lt;/i&gt;. We want to collect the light going in direction &lt;i&gt;k&lt;/i&gt;&amp;nbsp;through some light collector at time &lt;i&gt;t&lt;/i&gt;. Call that &lt;i&gt;J(k,t)&lt;/i&gt;. 
Here's a picture:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-qCiiTuHwpr8/TepdmJjuGjI/AAAAAAAAAsY/5g7GXDcHhBw/s1600/outgoing.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="360" src="http://3.bp.blogspot.com/-qCiiTuHwpr8/TepdmJjuGjI/AAAAAAAAAsY/5g7GXDcHhBw/s400/outgoing.png" width="360" /&gt;&lt;/a&gt;&lt;/div&gt;Remember: there are waves going in all directions, but I've just drawn the waves going in direction &lt;i&gt;k&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fourier optics for beginners&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Now comes the interesting bit: the light arriving at the collector is the integral of all the light emitted from the points on the monitor surface. But the time of flight from each point on the monitor surface is different. So when we perform our integral we need to build in a suitable delay. It's straightforward geometry to show that the time delay between waves arriving from &lt;i&gt;y&lt;/i&gt;₁ and &lt;i&gt;y&lt;/i&gt;₂ is &lt;i&gt;m&lt;/i&gt;(&lt;i&gt;y&lt;/i&gt;₂-&lt;i&gt;y&lt;/i&gt;₁)/&lt;i&gt;c&lt;/i&gt;, so light from position &lt;i&gt;y&lt;/i&gt; is delayed by &lt;i&gt;my&lt;/i&gt;/&lt;i&gt;c&lt;/i&gt; relative to light from the origin, where &lt;i&gt;c&lt;/i&gt;&amp;nbsp;is the speed of light and &lt;i&gt;m,&lt;/i&gt;&amp;nbsp;as defined above, is the &lt;i&gt;y&lt;/i&gt;-component of &lt;i&gt;k&lt;/i&gt;. So, up to a constant of proportionality, and some absolute choice of time, we want this integral&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;J&lt;/i&gt;(&lt;i&gt;k,t&lt;/i&gt;) = ∫ d&lt;i&gt;y&lt;/i&gt;&amp;nbsp;&lt;i&gt;I&lt;/i&gt;(&lt;i&gt;y,t&lt;/i&gt;-&lt;i&gt;my&lt;/i&gt;/&lt;i&gt;c&lt;/i&gt;)&lt;/div&gt;&lt;br /&gt;We assumed that the monitor surface was being struck by plane waves. Assuming they were orthogonal to the surface and coherent this means that &lt;i&gt;I&lt;/i&gt;&amp;nbsp;is proportional to exp(&lt;i&gt;iωt&lt;/i&gt;). The time dependence in &lt;i&gt;J&lt;/i&gt; is just the sinusoid, so we drop that from further calculations. 
So we can write, ignoring another constant of proportionality,&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;J&lt;/i&gt;(&lt;i&gt;k&lt;/i&gt;) = ∫ d&lt;i&gt;y&lt;/i&gt;&amp;nbsp;&lt;i&gt;I&lt;/i&gt;(&lt;i&gt;y)&lt;/i&gt;exp(-&lt;i&gt;iω&lt;/i&gt;&lt;i&gt;my/c&lt;/i&gt;) =&amp;nbsp;∫&amp;nbsp;d&lt;i&gt;y&lt;/i&gt;&amp;nbsp;&lt;i&gt;I&lt;/i&gt;(&lt;i&gt;y)&lt;/i&gt;exp(-&lt;i&gt;i2πmy/λ&lt;/i&gt;)&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Where I used &lt;i&gt;ω = 2πc/λ &lt;/i&gt;and&amp;nbsp;&lt;i&gt;λ&lt;/i&gt; is the wavelength of the light. In other words, &lt;i&gt;J&lt;/i&gt;&amp;nbsp;is the Fourier transform of &lt;i&gt;I&lt;/i&gt;. More generally, if we modelled the &lt;i&gt;z&lt;/i&gt;-axis we'd find that &lt;i&gt;J&lt;/i&gt;&amp;nbsp;is given by the 2D Fourier transform of &lt;i&gt;I&lt;/i&gt;&amp;nbsp;as a function of the 2D surface of the monitor. The actual power striking the sensor is given by the square of the absolute value of &lt;i&gt;J&lt;/i&gt;.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;Ordinary reflections&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;So now I can pop the first of my pending explanations off the stack. Suppose that the surface is completely uniform. Then &lt;i&gt;I(y) = &lt;/i&gt;constant. The Fourier transform of a &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform#Distributions"&gt;constant function&lt;/a&gt; is the Dirac delta function. In other words, despite the fact that we're modelling every point on the surface as its own emitter, the resulting reflection from a perfectly smooth surface is a wave going straight back where it came from. 
(Exercise: modify the above treatment to show that waves striking a mirror at an angle obey the usual reflection law for mirrors.)&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;Periodic structure&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Now suppose that the surface of our monitor has a periodic structure. The Fourier transform of a periodic function is concentrated at spikes corresponding to the fundamental frequency of the structure and its overtones. So we expect to see a grid-like structure in the result. I believe that's what we're seeing in the spray of quantised vectors coming from the center of the image and the fact that the arms of the 'X' look like discrete blobs of colour arranged in a line rather than the white spikes in the astronomical example I linked to above. That tells us we're probably seeing artifacts caused by the microstructure of the LCD pixels.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;An edge&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Consider the function &lt;i&gt;I&lt;/i&gt;(&lt;i&gt;y,z&lt;/i&gt;) = -1&amp;nbsp;for &lt;i&gt;y&lt;/i&gt;&amp;lt;0&amp;nbsp;and 1 for &lt;i&gt;y&lt;/i&gt;&amp;gt;0. In other words, the sgn function along the &lt;i&gt;y&lt;/i&gt;-axis and constant along the &lt;i&gt;z&lt;/i&gt;-axis. Up to multiplication by a constant, its &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform#Distributions"&gt;Fourier transform&lt;/a&gt; is given by the Dirac delta along the &lt;i&gt;z&lt;/i&gt;-axis and 1/ω along the &lt;i&gt;y&lt;/i&gt;-axis. In other words, it's a spike, starting at the origin, extending along the &lt;i&gt;y&lt;/i&gt;-axis, but fading away as we approach infinity. 
This is the source of diffraction spikes. Any time we see such a spike it's likely there was a perpendicular edge somewhere in the function &lt;i&gt;I&lt;/i&gt;. For telescopes it often comes from the struts supporting the secondary mirror. For cameras it comes from the polygonal shape of the iris. In this case, it looks like we must have pixels whose shape has edges perpendicular to the 'X'. I have to admit, when I came to this conclusion it sounded very implausible. But then someone posted this image from &lt;a href="http://cmitja.wordpress.com/2011/03/02/displays-pixel-size/"&gt;here&lt;/a&gt;:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-OuVFlF-RE_Y/Tepr-1zi8wI/AAAAAAAAAsg/TFI7zaPgMTw/s1600/lcd-pattern.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-OuVFlF-RE_Y/Tepr-1zi8wI/AAAAAAAAAsg/TFI7zaPgMTw/s320/lcd-pattern.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;There are those edges at the predicted angles.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;Simulating the artifact&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Now we have enough information to simulate the artifact. 
You can think of the remainder of this article as written in "literate octave".&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Save the above close-up of the pixels in a file and read it in.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;I = imread('lcd-pattern.jpg');&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;I crop out one quarter as it simply repeats. This also removes the text.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;I1 = I(251:500,251:500,1);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Now clean up some stray (image) pixels and pick out just one (LCD) pixel.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;I1(1:250,100:250) = 0;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 
inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;The photo is blurry, probably has compression artifacts and seems quite noisy to me. So I threshold it at intensity 200 to get a hard shape. I then compute the discrete Fourier transform and scale the intensity to something nice.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;I2 = 0.0001*abs(fft2(I1&amp;gt;200).^2);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;The Fourier transform puts the origin in the corner so I shift it to the centre.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;I2 = circshift(I2,[125,125]);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Now we need to do the same thing for red and green. We reuse the blue image but it needs scaling in proportion to the wavelength. The numbers 1.2 and 1.4 are intended to be the ratio of the wavelengths to that of blue light. 
They're just ballpark figures: I chose them to make the image sizes convenient when rescaled. I also crop out a piece from the centre of the rescaled images so they line up nicely. I use &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;imresize2&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt; from &lt;a href="http://www.irit.fr/PERSONNEL/SAMOVA/joly/Teaching/M2IRR/IRR05/index.html"&gt;here&lt;/a&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Ib = I2;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Ig = imresize2(Ib,1.2,'nearest')(26:275, 26:275);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Ir = imresize2(Ib,1.4,'nearest')(51:300, 51:300);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Now I assemble the final RGB image.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;J = zeros(250, 250, 3);&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;J(:, :, 1) = Ir;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;J(:, :, 2) = Ig;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;J(:, :, 3) = Ib;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;imwrite(J,'cross.jpg')&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Here's the result:&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-ISPugKvM7_k/Tep5g7ogTII/AAAAAAAAAso/4vTHCLAwxgM/s1600/cross.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-ISPugKvM7_k/Tep5g7ogTII/AAAAAAAAAso/4vTHCLAwxgM/s1600/cross.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;Analysis&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Not bad, but it's not perfect. That's unsurprising. Have a look at the thresholded image straight after thresholding. It's a very poor representation of the pixel shape. It would be better to redraw the shape correctly at a higher resolution (or even work analytically, not infeasible for simple shapes like these pixels). 
Stray (image) pixels can make a big difference in Fourier space. That explains the 'dirtiness' of my image. Nonetheless, it has the big X and it has the rows of dots near the middle. Qualitatively it's doing the right thing.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Note that the big X in the middle seems to have ghost reflections at the left and right. These are, I think,&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Aliasing#Sampling_sinusoidal_functions"&gt;aliasing&lt;/a&gt;&amp;nbsp;and purely artifacts of the digitisation.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;It'd look slightly better with some multi-spectral rendering. I've treated the pixels as reflecting one wavelength but actually each one reflects a band.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;There are also, almost certainly, lots more effects going on in that picture. The close-up photograph is only revealing one part of the structure. I'm sure there is 3D structure to those pixels, not just a flat surface. I suspect that's where the horizontal white line is coming from. Because it's white it suggests an artifact due to something interacting equally with all visible wavelengths. 
Maybe a vertical edge between pixels.&lt;br /&gt;&lt;br /&gt;But overall I think I've showed that Fourier optics is a step in the right direction for simulating this kind of effect.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;Note&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Everything I know about Fourier optics I learnt from my ex-colleague &lt;a href="http://www.imdb.com/name/nm0416803/"&gt;Oliver James&lt;/a&gt;.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;All the errors are mine of course.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8207351591103855578?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8207351591103855578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8207351591103855578' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8207351591103855578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8207351591103855578'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/06/simulating-visual-artifacts-with.html' title='Simulating visual artifacts with Fourier optics'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-eCRE-5YU9k4/Tep9jOO9HEI/AAAAAAAAAsw/0x-D3v0Q4Qk/s72-c/original.jpeg' height='72' width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-5475755539849835914</id><published>2011-05-28T16:47:00.000-07:00</published><updated>2011-05-28T17:14:19.825-07:00</updated><title type='text'>Fast forwarding lrand48()</title><content type='html'>A break from abstract nonsense to answer a question I've seen asked online a number of times. It requires nothing more than elementary modular arithmetic and it ends in some exercises.&lt;br /&gt;&lt;br /&gt;Given a pseudo-random number generator, say BSD Unix &lt;tt&gt;lrand48()&lt;/tt&gt;, is there a quick way to jump forward a billion numbers in the sequence, say, without having to work through all of the intermediate numbers? The method is no secret, but I couldn't find explicit code online so I thought I'd put some here. 
Literate Haskell of course.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; {-# LANGUAGE ForeignFunctionInterface #-}&lt;br /&gt;&amp;gt; {-# OPTIONS_GHC -fno-warn-missing-methods #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;On MacOSX, if you type 'man lrand48', you'll see the function &lt;tt&gt;lrand48()&lt;/tt&gt; returns a sequence of 31-bit non-negative integers defined using the sequence r&lt;sub&gt;n+1&lt;/sub&gt; = ar&lt;sub&gt;n&lt;/sub&gt;+c mod m where&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; a = 25214903917&lt;br /&gt;&amp;gt; c = 11&lt;br /&gt;&amp;gt; m = 2^48&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The actual returned value is the floor of r&lt;sub&gt;n&lt;/sub&gt;/2&lt;sup&gt;17&lt;/sup&gt; and r&lt;sub&gt;0&lt;/sub&gt; = 20017429951246.&lt;br /&gt;&lt;br /&gt;We can compute the nth element in the sequence the hard way by importing &lt;tt&gt;lrand48&lt;/tt&gt; and looping n times:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; foreign import ccall "lrand48" lrand48 :: IO Int&lt;br /&gt;&lt;br /&gt;&amp;gt; nthrand 1 = lrand48&lt;br /&gt;&amp;gt; nthrand n = lrand48 &amp;gt;&amp;gt; nthrand (n-1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;But there is a better way. If we iterate twice we get that r&lt;sub&gt;n+2&lt;/sub&gt; = a(ar&lt;sub&gt;n&lt;/sub&gt;+c)+c mod m = a&lt;sup&gt;2&lt;/sup&gt;r&lt;sub&gt;n&lt;/sub&gt;+ac+c mod m. Note how two applications of the iteration give you back another iteration in the same form: a multiplication followed by an addition modulo m. We can abstract this a bit. Given two functions f(x) = ax+c mod m and g(x) = a'x+c' mod m we get g(f(x)) = (a'*a)*x + a'*c+c' mod m. We can represent functions of this type using a simple Haskell type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Affine = Affine { multiply :: Integer, add :: Integer } deriving (Show, Eq, Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now write a function to compose these functions. 
I'm going to use the operator &lt;tt&gt;*&lt;/tt&gt; to represent composition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Num Affine where&lt;br /&gt;&amp;gt;    Affine a' c' * Affine a c = Affine (a'*a `mod` m) ((a'*c+c') `mod` m)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;To skip forward n steps we just need to multiply n of these together, ie. raise &lt;tt&gt;Affine a c&lt;/tt&gt; to the power of n using &lt;tt&gt;^&lt;/tt&gt;. We then need to apply this function to r&lt;sub&gt;0&lt;/sub&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; initial = Affine 0 20017429951246&lt;br /&gt;&lt;br /&gt;&amp;gt; nthrand' n = (add $ Affine a c ^ n * initial) `div` (2^17)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now try firing up ghci and comparing the outputs of &lt;tt&gt;nthrand 1000000&lt;/tt&gt; and &lt;tt&gt;nthrand' 1000000&lt;/tt&gt;. Don't run &lt;tt&gt;nthrand&lt;/tt&gt; more than once without resetting the seed, eg. by restarting ghci. (I know someone will post a reply  below that it doesn't work...)&lt;br /&gt;&lt;br /&gt;There are lots of &lt;a href="http://www.google.com/search?q=jump+ahead+random+number+generator"&gt;papers&lt;/a&gt; on how to do this with other kinds of random number generator. My example is probably the easiest. The main application I can see is for jumping straight to that annoying regression test failure without going through all of the intermediates.&lt;br /&gt;&lt;br /&gt;Exercises.&lt;br /&gt;1. Read the corresponding man page for Linux. Port the above code to work there. Or any other OS you feel like. Or any other random number generator.&lt;br /&gt;2. Can you split &lt;tt&gt;lrand48()&lt;/tt&gt; into two? Ie. can you make two random generators that produce sequences s&lt;sub&gt;i&lt;/sub&gt; and t&lt;sub&gt;i&lt;/sub&gt; so that s&lt;sub&gt;0&lt;/sub&gt;, t&lt;sub&gt;0&lt;/sub&gt;, s&lt;sub&gt;1&lt;/sub&gt;, t&lt;sub&gt;1&lt;/sub&gt;, ... form the sequence given by &lt;tt&gt;lrand48()&lt;/tt&gt;.&lt;br /&gt;3. 
I've neglected to mention some special sauce in the code above. Why does it actually run so fast? (Clue: why did I use &lt;tt&gt;Num&lt;/tt&gt;?)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-5475755539849835914?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/5475755539849835914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=5475755539849835914' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5475755539849835914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/5475755539849835914'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/05/fast-forwarding-lrand48.html' title='Fast forwarding lrand48()'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-393797559188947149</id><published>2011-04-30T10:34:00.000-07:00</published><updated>2011-04-30T18:46:54.839-07:00</updated><title type='text'>Perturbation confusion confusion</title><content type='html'>Update: I'm making a bit of a turnabout here.&lt;br /&gt;&lt;br /&gt;Firstly, I have to point out that at no point have I disagreed with S&amp;P on any purely technical issue. We're all working with the same code and agree on what it does. The issue is a human one: is it possible to get wrong results by writing code that *looks* correct. 
I was sent this example:&lt;br /&gt;&lt;br /&gt;d (\x -&gt; (d (x*) 2)) 1&lt;br /&gt;&lt;br /&gt;Read naively it produces a different result to what you might expect. We can see why it fails by looking at this:&lt;br /&gt;&lt;br /&gt;d (\x -&gt; (d (x*) 2)) (1::Integer)&lt;br /&gt;&lt;br /&gt;It fails to typecheck! That "1" is actually of type D Integer. So the semantics of that code are entirely different to what you might expect if you read it like an ordinary mathematical expression and ignored the types of all the subexpressions.&lt;br /&gt;&lt;br /&gt;So I agree that it is possible for a programmer to misread that code. I still don't consider the algorithm to have failed in this case. It is standard in Haskell that the appearance of an integer in code doesn't necessarily mean it's of Integer type, so that when we write Haskell code we always need to be aware of the types of all of our terms. When we write d (\x -&gt; (d (x*) 2)) 1, we're asking for the wrong thing before the code has started executing. But I have been convinced that there are dangers here for people reading the code naively.&lt;br /&gt;&lt;br /&gt;However I am now completely convinced the situation is more complex than this and I'll have to address it again.&lt;br /&gt;&lt;br /&gt;Anyway, I suggest using AD regardless. It's an awesomely powerful technique. But don't capture variables in a lambda or function without immediately wrapping them in a lift, and if you multiply nest, you need to multiply lift. Essentially lift is a signal to the compiler that you desire the variable to be held constant with respect to the derivative. It also makes the code more readable. 
It's analogous to using fmap the correct number of times in order to push a function down into nested lists.&lt;br /&gt;&lt;br /&gt;So I'll leave the following in place because removing it would be bad form and some of it still stands.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;In a &lt;a href="http://eprints.nuim.ie/566/1/Perturbation.pdf"&gt;recent paper&lt;/a&gt;, Siskind and Pearlmutter warn how the use of differentiation operators in functional programming languages is "fraught with danger" and discuss a problem common to "all attempts to integrate a forward-mode AD operator into Haskell".&lt;br /&gt;&lt;br /&gt;This is curious. I have had great success using automatic differentiation code both in Haskell and functional-style C++ and failed to notice the danger. Clearly I needed to take better notice of what I was doing.&lt;br /&gt;&lt;br /&gt;So let's go ahead and implement AD and try to reproduce the problem they point out.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Automatic Differentiation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data D a = D { real :: a, infinitesimal :: a } deriving (Eq, Show)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num a =&amp;gt; Num (D a) where&lt;br /&gt;&amp;gt;   fromInteger n = D (fromInteger n) 0&lt;br /&gt;&amp;gt;   D a a'+D b b' = D (a+b) (a'+b')&lt;br /&gt;&amp;gt;   D a a'*D b b' = D (a*b) (a*b'+a'*b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now define a differentiation operator:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; d f x = infinitesimal (f (D x 1))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can use &lt;tt&gt;d&lt;/tt&gt; to differentiate a function like f(x) = x&lt;sup&gt;3&lt;/sup&gt;+2x&lt;sup&gt;2&lt;/sup&gt;+x+1 at 2 to get:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; example0 =  d (\x -&amp;gt; x^3+2*x^2+x+1) 2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Imagine you were confused enough by AD to write Siskind and Pearlmutter's example exactly as described in equation (2) of the paper:&lt;br 
/&gt;&lt;pre&gt;example1 =  d (\x -&gt; x*(d (\y -&gt; x+y) 1)) 1&lt;br /&gt;&lt;/pre&gt;We don't get an incorrect result. Instead, we get this error message:&lt;br /&gt;&lt;pre&gt;Occurs check: cannot construct the infinite type: a0 = D a0&lt;br /&gt;Expected type: D (D a0)&lt;br /&gt;  Actual type: D a0&lt;br /&gt;In the first argument of `(+)', namely `x'&lt;br /&gt;In the expression: x + y&lt;br /&gt;&lt;/pre&gt;The Haskell type checker identifies precisely where the problem is. &lt;tt&gt;x&lt;/tt&gt; and &lt;tt&gt;y&lt;/tt&gt; don't have the same types so they can't be added. Rather than immediately analyse how to fix this, let's try a completely different approach to calculus: symbolic differentiation. We'll define an expression type and write code to differentiate it&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Symbolic Differentiation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data E a = X | Const a | E a :+: E a | E a :*: E a deriving (Eq, Show)&lt;br /&gt;&amp;gt; diff X = 1&lt;br /&gt;&amp;gt; diff (Const _) = 0&lt;br /&gt;&amp;gt; diff (a :+: b) = diff a + diff b&lt;br /&gt;&amp;gt; diff (a :*: b) = a:*: diff b + diff a :*: b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We want to be able to evaluate these expressions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; eval X x = x&lt;br /&gt;&amp;gt; eval (Const a) x = a&lt;br /&gt;&amp;gt; eval (a :+: b) x = eval a x + eval b x&lt;br /&gt;&amp;gt; eval (a :*: b) x = eval a x * eval b x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can make this easier to use by making &lt;tt&gt;E a&lt;/tt&gt; an instance of &lt;tt&gt;Num&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Num a =&amp;gt; Num (E a) where&lt;br /&gt;&amp;gt;   fromInteger n = Const (fromInteger n)&lt;br /&gt;&amp;gt;   a + b = a :+: b&lt;br /&gt;&amp;gt;   a * b = a :*: b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now we can write an alternative to &lt;tt&gt;d&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; d' f x = eval (diff (f X)) x&lt;br 
/&gt;&amp;gt; example1 =  d' (\x -&amp;gt; x^3+2*x^2+x+1) 2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We of course get the same result.&lt;br /&gt;&lt;br /&gt;So let's try the example from the paper again:&lt;br /&gt;&lt;pre&gt;example1 =  d' (\x -&gt; x*(d' (\y -&gt; x+y) 1)) 1&lt;br /&gt;&lt;/pre&gt;We don't get an incorrect result. Instead, we get this error message:&lt;br /&gt;&lt;pre&gt;Occurs check: cannot construct the infinite type: t0 = E t0&lt;br /&gt;Expected type: E (E t0)&lt;br /&gt;  Actual type: E t0&lt;br /&gt;In the first argument of `(+)', namely `x'&lt;br /&gt;In the expression: x + y&lt;br /&gt;&lt;/pre&gt;An almost identical error message. So what's going on?&lt;br /&gt;&lt;br /&gt;Look at the type signature to &lt;tt&gt;d'&lt;/tt&gt;:&lt;br /&gt;&lt;pre&gt;d' :: Num t =&gt; (E a -&gt; E t) -&gt; t -&gt; t&lt;br /&gt;&lt;/pre&gt;When we evaluate&lt;br /&gt;&lt;pre&gt;d' (\x -&gt; x*(d' (\y -&gt; x+y) 1)) 1&lt;br /&gt;&lt;/pre&gt;the outer &lt;tt&gt;d'&lt;/tt&gt; differentiates symbolically so internally it uses an expression type. This means that &lt;tt&gt;x&lt;/tt&gt; is an expression, not a numerical value. But this means that the inner &lt;tt&gt;d'&lt;/tt&gt; is being asked to evaluate its result at a value that is itself an expression, not a value. So internally it uses expressions of expressions. So &lt;tt&gt;x&lt;/tt&gt; is of type &lt;tt&gt;E a&lt;/tt&gt; and &lt;tt&gt;y&lt;/tt&gt; is of type &lt;tt&gt;E (E a)&lt;/tt&gt;. It's no surprise we can't add them. Our bug is easily fixed. 
We define a lifting function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; lift' x = Const x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;and now we can correctly evaluate:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; example2 =  d' (\x -&amp;gt; x*(d' (\y -&amp;gt; lift' x+y) 1)) 1&lt;br /&gt;&lt;br /&gt;&amp;gt; lift x = D x 0&lt;br /&gt;&amp;gt; example3 =  d (\x -&amp;gt; x*(d (\y -&amp;gt; lift x+y) 1)) 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Much the same discussion applies to the AD code. In that case, &lt;tt&gt;D a&lt;/tt&gt; is the type of &lt;tt&gt;a&lt;/tt&gt;'s that have been infinitesimally perturbed. The type &lt;tt&gt;D (D a)&lt;/tt&gt; is a value whose perturbation is itself perturbed. But Siskind and Pearlmutter say that we might be subject to perturbation confusion and could get 2. Curious. There is no way we can 'confuse' these distinct perturbations. Not no how. Not no way. They correspond to data of entirely different types. Far from being fraught with danger, the Haskell type system keeps us safe. Perturbation confusion in Haskell is about as likely as expression tree confusion.&lt;br /&gt;&lt;br /&gt;Let's take a look at the analysis in section 2. They introduce a perturbation &amp;epsilon; and show how we might have computed 2 instead of 1. But this derivation doesn't correspond to any computation we could possibly have made using the function &lt;tt&gt;d&lt;/tt&gt; in Haskell. The only issue we have identified is that we have to write our code using correct types. This has nothing to do with automatic differentiation or perturbations. It applies equally well to the symbolic differentiation code. In fact, it has nothing to do with differentiation as it applies equally well to my code to compute the &lt;a href="http://blog.sigfpe.com/2010/09/automatic-evenodd-splitting.html"&gt;even and odd parts&lt;/a&gt; of a function.&lt;br /&gt;&lt;br /&gt;We can press on with the paper. Section three tries to sell us a remedy - tagging. But we have no need for tagging. 
The Haskell compiler already deduced that the perturbation inside the inner &lt;tt&gt;d&lt;/tt&gt; is of a different type to that in the outer &lt;tt&gt;d&lt;/tt&gt;. The only explanation I can come up with for this section is that the authors have some experience with functional programming languages that are dynamically typed and are trying to apply that experience to Haskell.&lt;br /&gt;&lt;br /&gt;Section 4 reiterates the point in the abstract that implementations of AD "fail to preserve referential transparency". I think we can safely ignore this claim. The AD code above isn't using any unsafe Haskell operations. It clearly is referentially transparent.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Confusion lifted&lt;/b&gt;&lt;br /&gt;So now to section 6. They talk about the &lt;tt&gt;lift&lt;/tt&gt; function and how correctly inserting it requires "sophisticated non-local analysis". Now everything becomes clear. Siskind and Pearlmutter don't consider this algorithm to be "automatic differentiation" unless they can leave out the &lt;tt&gt;lift&lt;/tt&gt; operations. But it is not common practice to write Haskell code by deliberately writing a significant body of code that doesn't type check, and then expect to automatically insert the missing pieces. You simply write the code correctly to start with. In fact, when I first found myself inserting a &lt;tt&gt;lift&lt;/tt&gt; in my own code I didn't even think of myself as having solved a problem - I just wrote the code that was analogous to code that Haskell programmers write every day. If I had forgotten the &lt;tt&gt;lift&lt;/tt&gt;, the compiler would have set me straight anyway.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;"Perturbation confusion" is a non-existent problem in Haskell AD. Siskind and Pearlmutter correctly point out that Haskell code using AD is not exactly analogous to mathematical notation for derivatives. 
This is unsurprising: differentiation is not computable (for deep reasons that are similar to the reason that &lt;a href="http://blog.sigfpe.com/2010/05/constructing-intermediate-values.html"&gt;intermediate values are not computable&lt;/a&gt;). But functions like &lt;tt&gt;lift&lt;/tt&gt; are an everyday part of typed functional programming. This is not a feature of AD: it applies equally well to symbolic differentiation (and indeed many other parts of Haskell such as using monad transformers and nested functors). The algorithm is no less a form of AD because of the use of &lt;tt&gt;lift&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;I have been advocating AD for many applications over recent years. It has been an uphill struggle. It is often incorrectly assumed that I am talking about numerical or symbolic differentiation. Programmers will often assert that what I am describing is completely impossible. But when they grasp what I am explaining the response often changes to "why didn't anyone tell me about this years ago?" It is disappointing to see two major contributors to computer science whose work I greatly respect scaring away potential users with FUD about a non-existent problem.&lt;br /&gt;&lt;br /&gt;(And if you got here but skipped the update at the top you really need to go to the top as I now think Siskind and Pearlmutter have a good point! 
I'm leaving the above just for the sake of posterity.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-393797559188947149?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/393797559188947149/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=393797559188947149' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/393797559188947149'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/393797559188947149'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/04/perturbation-confusion-confusion.html' title='Perturbation confusion confusion'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7096752050077606328</id><published>2011-04-09T09:37:00.000-07:00</published><updated>2011-04-09T09:42:43.524-07:00</updated><title type='text'>Image-based rendering and some ancient history</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;I've not said much about my work in visual effects in this blog. This is mainly because I try very carefully to avoid any kind of intellectual property conflict with my employer. But now I've left the world of visual effects to work at Google, I think I might throw in the occasional article. So here's something about my own work from many years ago. 
In particular, back in the year 2000 I and my colleagues George and Kim received an &lt;a href="http://www.oscars.org/awards/scitech/index.html"&gt;Academy Scientific and Technical Achievement Award&lt;/a&gt; for "the development of a system for image-based rendering allowing choreographed camera movements through computer graphic reconstructed sets." I rarely tell people what this work was for. But now I'll have a convenient web page to which I can direct anyone who asks. I'm hoping to aim this at people who know a little mathematics and geometry, but not necessarily anything about computer graphics.&lt;br /&gt;&lt;br /&gt;Many years ago I worked at a company called &lt;a href="http://en.wikipedia.org/wiki/Manex_Visual_Effects"&gt;Manex Visual Effects&lt;/a&gt;. One day the story of that company needs to be told. Here are some things that have been alleged about Manex that I neither confirm nor deny: its stock was traded in one of the largest ever pump-and-dump scams in the Far East, it extracted millions of dollars from Trenton, New Jersey for a fake scheme to create an East Coast "Hollywood", and it spent a couple of years making press releases about how it was doing work on the Matrix sequels despite the fact that there was nobody at the company who was actually doing work on the movies, including making press releases quoting me describing the ongoing work long after I had left. At one point I was apparently being trailed by a private detective who even came rowing a boat past the back of my house in order to get pictures of me. I haven't checked the fine print, but I think the contracts that prevented me from speaking about any of this expired long ago.&lt;br /&gt;&lt;br /&gt;But back around 1998-2000, when Manex was doing real effects work, we developed a pipeline for a technique known as &lt;a href="http://en.wikipedia.org/wiki/Image-based_modeling_and_rendering"&gt;image based rendering&lt;/a&gt;. 
We became adept at taking large numbers of photographs of locations and then reproducing those locations as photorealistic renders. When I say photorealistic here I don't mean something that's merely supposed to look real, but actually looks fake. I mean renders that were indistinguishable from photography. That's commonplace now but back in 2000 it was a challenge, especially on a large scale.&lt;br /&gt;&lt;br /&gt;The last ten seconds of this video clip from &lt;a href="http://www.imdb.com/title/tt0120755/"&gt;MI:2&lt;/a&gt; should give some idea of what I'm talking about. The city around Tom Cruise as he jumps from the building is entirely CGI, but using real photography:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;object height="344" width="425"&gt; &lt;param name="movie" value="http://www.youtube.com/v/1UqM_PHSv8w?fs=1&amp;start=265"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/1UqM_PHSv8w?fs=1&amp;start=265"  type="application/x-shockwave-flash"    allowfullscreen="true"      allowscriptaccess="always"        width="425" height="344"&gt;  &lt;/embed&gt;  &lt;/object&gt;&lt;/div&gt;&lt;br /&gt;&lt;b&gt;Texturing Cities&lt;/b&gt;&lt;br /&gt;Although MI:2 isn't set in Sydney, that is the location that was used. A team returned with many hundreds, if not thousands of photographs of the central business district. We also managed to construct 3D models of these buildings by various means. We started by using &lt;a href="http://ict.debevec.org/~debevec/Research/"&gt;Façade&lt;/a&gt; to construct the geometry, but ultimately we used a variety of methods including buying 3D models from, I think, the city itself. 
In later years I completely replaced the way we built geometry so we didn't need to use any other source.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-K-8ZwJ6mtdg/TaBseXpnEmI/AAAAAAAAArk/bYXgc-2l-w0/s1600/projection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/-K-8ZwJ6mtdg/TaBseXpnEmI/AAAAAAAAArk/bYXgc-2l-w0/s320/projection.png" width="251" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The goal was to map the photography onto the models. Here's a picture illustrating three points being mapped onto a building. To render correctly we needed every visible point on every 3D object to be coloured (correct terminology: textured) using a pixel from one of the photographs.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Matrix&lt;/b&gt;&lt;br /&gt;We're ready for a theorem. Let &lt;i&gt;P&lt;/i&gt; be a photograph of a scene &lt;i&gt;S&lt;/i&gt;. Let (&lt;i&gt;u&lt;/i&gt;, &lt;i&gt;v&lt;/i&gt;) be ordinary rectilinear coordinates in &lt;i&gt;P&lt;/i&gt;. Let (&lt;i&gt;x&lt;/i&gt;, &lt;i&gt;y&lt;/i&gt;, &lt;i&gt;z&lt;/i&gt;) be 3D coordinates in the scene &lt;i&gt;S&lt;/i&gt;. Define proj(&lt;i&gt;u&lt;/i&gt;, &lt;i&gt;v&lt;/i&gt;, &lt;i&gt;w&lt;/i&gt;) = (&lt;i&gt;u&lt;/i&gt;/&lt;i&gt;w&lt;/i&gt;, &lt;i&gt;v&lt;/i&gt;/&lt;i&gt;w&lt;/i&gt;). Then there is a 3×4 matrix &lt;i&gt;M&lt;/i&gt; such that for every point (&lt;i&gt;x&lt;/i&gt;, &lt;i&gt;y&lt;/i&gt;, &lt;i&gt;z&lt;/i&gt;) visible in &lt;i&gt;S&lt;/i&gt;, its colour is given by the colour of the point with coordinates &lt;i&gt;proj&lt;/i&gt;(&lt;i&gt;M&lt;/i&gt;(&lt;i&gt;x&lt;/i&gt;,&lt;i&gt;y&lt;/i&gt;,&lt;i&gt;z&lt;/i&gt;,1)&lt;sup&gt;&lt;i&gt;T&lt;/i&gt;&lt;/sup&gt;) in &lt;i&gt;P&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;(This assumes a pinhole projection model for the camera. 
In fact, real cameras have &lt;a href="http://en.wikipedia.org/wiki/Distortion_(optics)"&gt;lens distortion&lt;/a&gt;. I wrote software to measure and fix this.)&lt;br /&gt;&lt;br /&gt;The important point is that for each camera we just needed one matrix. We generated it by a very simple scheme: we had artists mark correspondences between points on our 3D models and points in the photography. I implemented a least squares solver to find the matrix that best fit each view. Pity the artists. They came fresh from college with well developed 3d modelling skills and I was responsible for reducing their careers to this simple point and click operation. But don't feel too bad. Most of these artists have since gone on to very successful careers in visual effects. They were a really great team.&lt;br /&gt;&lt;br /&gt;But just mapping a single photograph is no good. The theorem tells us what to do for each point visible in a photograph, but how do we select which photograph to use? When producing the final render for production we had the luxury of being able to allow hours to render a single frame. We could make the decision on a pixel by pixel basis by performing a test for visibility from each camera, and then using heuristics to make the selection. (My colleague George had already proved this worked for the backgrounds to most of the bullet-time shots from &lt;i&gt;The Matrix&lt;/i&gt;.)&lt;br /&gt;&lt;br /&gt;But that was no good for the artists. They needed to see their work as they were doing it. If we could get the render time down to seconds it would be good. In fact, we got it down to a fraction of a second.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Splitter&lt;/b&gt;&lt;br /&gt;The first thing to note is that if we render a single triangle then the mapping defined by the theorem above is provided by modern graphics hardware. Back in 2000 it wasn't universally available, but it was in the SGI hardware we used. So we only needed to solve the camera selection problem fast. 
The solution was simple: we'd preprocess the geometry of the scene by splitting it into pieces (reducible to triangles), each piece corresponding to a part visible from one camera. &lt;a href="http://ict.debevec.org/~debevec/Research/"&gt;Debevec et al.&lt;/a&gt; had already given demos of a technique for approximately partitioning a scene between cameras like this, but it didn't have the quality we wanted.&lt;br /&gt;&lt;br /&gt;(By the way, my career in graphics started when I figured out an algorithm for eliminating the division from the &lt;i&gt;proj&lt;/i&gt;() function and my new &lt;a href="http://en.wikipedia.org/wiki/Jez_San"&gt;boss-to-be&lt;/a&gt; noticed I'd &lt;a href="http://groups.google.com/group/comp.graphics.algorithms/browse_thread/thread/f6dceefbf6c980e8/e2d7a73e2cb621f1"&gt;posted&lt;/a&gt; it online.)&lt;br /&gt;&lt;br /&gt;There were three things needed:&lt;br /&gt;&lt;br /&gt;1. For each camera, we needed to compute the subscene that was the intersection of the scene S with the volume viewed by the camera, its &lt;a href="http://en.wikipedia.org/wiki/Frustum"&gt;frustum&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;2. For each subscene we needed to remove any parts occluded from view by any other part of the scene.&lt;br /&gt;&lt;br /&gt;3. We needed to ensure that no two subscenes overlapped, ie. that no part of any subscene was visible in more than one camera.&lt;br /&gt;&lt;br /&gt;It seemed like a messy computational geometry problem. But after a little thought it dawned on me that most of the geometry work, at least as measured by computation time, could be factored into one tiny geometry routine and the rest was logic. 
Here's a picture of the algorithm:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-s0E5lezs_8g/TaB6cmQhl8I/AAAAAAAAArw/suVfjFGVtZg/s1600/clipping.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-s0E5lezs_8g/TaB6cmQhl8I/AAAAAAAAArw/suVfjFGVtZg/s320/clipping.png" width="251" /&gt;&lt;/a&gt;&lt;/div&gt;It takes a convex polygon and slices it using a plane (indicated in grey). Now everything else follows. For example, step 1 above can be achieved like this: the frustum associated with a camera is a 6-sided convex polyhedron (or a 4- or 5-sided polyhedron if you want it to extend all the way to infinity and/or all the way to the point of projection in the camera.) We can decide which points are in the frustum by slicing using the 6 planes and keeping the pieces that fall inside.&lt;br /&gt;&lt;br /&gt;Step 2 works a little like this:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-Lv7M3QZyZuQ/TaB7c0xlyyI/AAAAAAAAAr4/kY8W-BNG_mQ/s1600/carving.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-Lv7M3QZyZuQ/TaB7c0xlyyI/AAAAAAAAAr4/kY8W-BNG_mQ/s320/carving.png" width="251" /&gt;&lt;/a&gt;&lt;/div&gt;There is a 5-sided polygon lying in front of a box casting a shadow. The shadow volume is itself a 6-sided frustum (with a seventh side "at infinity", so to speak). So to remove the parts shadowed by a polygon we use the same slicing algorithm and collect up all the pieces that fall outside of the shadow volume. To remove&amp;nbsp;&lt;i&gt;all&lt;/i&gt;&amp;nbsp;of the scene that is occluded from view by this camera we simply remove the shadow volume corresponding to every single polygon in the scene. 
One of the virtues of image based rendering is that the detail in the photography makes up for the simplicity of the geometry, keeping the polygon count low. So the total number of operations might not seem too bad. Unfortunately, every time you slice through the scene you risk doubling the total number of polygons. It could have taken worse than exponential time in the number of polygons. But I took the risk, and surprisingly the time to preprocess was measured in minutes for typical scenes. The individual pieces of geometry were very intricate due to the fact that occlusion from any object could carve into a building. But like a jigsaw, the union of all the pieces gave something that looked just like a real city.&lt;br /&gt;&lt;br /&gt;The last stage was ensuring that each part is visible from one camera. This was straightforward. As every polygon was processed with respect to every camera then I could associate to every polygon a list of cameras in which it was visible. At the end I could just sweep through and pick the best camera based on a heuristic. Typically we wanted to use the camera with the most undistorted detail.&lt;br /&gt;&lt;br /&gt;Along the way I had to apply a few more heuristics to make things manageable. Many thin slivers would appear in the slicing. If they were small enough I threw them away. I'd also sweep through from time to time and fuse together neighbouring polygons that had been sliced but ended up still being visible in the same cameras, and whose union was still convex. That would reduce the total polygon count and speed up further processing.&lt;br /&gt;&lt;br /&gt;It worked. The artists would mark their correspondences, run the optimiser to extract the per-camera matrices, kick off the 'splitter', have a coffee, and then return to a fully interactive view of Sydney, or wherever. They could now examine it from all angles and quickly ascertain what further work was needed, eg. if there were holes in one of the buildings. 
That allowed us to develop an efficient workflow and carry out the work on the scale needed to complete movie shots. We could also interactively visualise how other objects interacted with our cities, and use the tool to plan the photography and ensure we had all the coverage we needed.&lt;br /&gt;&lt;br /&gt;In retrospect it all seems trivial. But at the time I don't think any other company could churn out entire city blocks the way we could. Today, Google do similar work on a much larger scale, and much more cleverly, with Street View.&lt;br /&gt;&lt;br /&gt;And that was just one part of the technique that we marketed as "virtual cinematography" and which won George, Kim and me our award. But it's really important to remember that movie making is a big team effort. It took the work of many dozens of people to create our photorealistic cities, and the fact that I've chosen to write about my contribution doesn't mean that it was the only part that was important. It wouldn't have happened if George hadn't taught us the method originally, and if Kim hadn't believed in us and formed part of the company around our pipeline. And of course nothing would have happened if the artists hadn't politely put up with our crudely engineered tools.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Acknowledgement&lt;/b&gt;&lt;br /&gt;The example city image above was derived from a picture by Calvin Teo on &lt;a href="http://en.wikipedia.org/wiki/File:Singapore_Skyline_Raffles_Place.jpg"&gt;Wikipedia&lt;/a&gt;. 
I don't have the original photography we used.</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7096752050077606328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=7096752050077606328' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7096752050077606328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7096752050077606328'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/04/image-based-rendering-and-some-ancient.html' title='Image-based rendering and some ancient history'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-K-8ZwJ6mtdg/TaBseXpnEmI/AAAAAAAAArk/bYXgc-2l-w0/s72-c/projection.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8959524458695651456</id><published>2011-04-02T15:04:00.000-07:00</published><updated>2011-04-02T15:04:58.290-07:00</updated><title type='text'>Generalising Gödel's Theorem with Multiple Worlds. Part IV.</title><content type='html'>&lt;b&gt;Interpolating Between Propositions&lt;/b&gt;&lt;br /&gt;Suppose a, b and c are propositions. Then a&amp;and;b&amp;rarr;b&amp;or;c. 
It seems that the a is irrelevant to what's happening on the right hand side. We could remove it and still have a true proposition: b&amp;rarr;b&amp;or;c. It seems that the c on the right hand side is also irrelevant so that this is also true: a&amp;and;b&amp;rarr;b. In fact, we can "refactor" the original proposition as a&amp;and;b&amp;rarr;b&amp;rarr;b&amp;or;c. (By &lt;a href="http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence"&gt;Curry Howard&lt;/a&gt; it *is* a kind of refactoring.)&lt;br /&gt;&lt;br /&gt;Let's try using the methods I described in Part I to demonstrate the validity of a&amp;and;b&amp;rarr;b&amp;or;c:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-rwMNywE8tyE/TZPPHqkXxuI/AAAAAAAAAq4/BddwLjabAm0/s1600/interp1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-rwMNywE8tyE/TZPPHqkXxuI/AAAAAAAAAq4/BddwLjabAm0/s320/interp1.png" width="214" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I've done two things slightly differently. I brought the negation inside the implication at the start. This means we start with an implication with a clearly defined left- and right-hand side. It makes no difference to the outcome. But I've also coloured terms coming from the left-hand side in bLue and those on the right in oRange. Eventually we see a b and a &amp;not;b meet each other resulting in the big red X telling us that the negation of a&amp;and;b&amp;rarr;b&amp;or;c is invalid. But notice how the b and &amp;not;b came from opposite sides of the implication. This tells us that the implication is valid because of the b's and that the appearance of a and c plays no role in establishing validity.&lt;br /&gt;&lt;br /&gt;Now suppose a blue a met a blue &amp;not;a. 
They would have both originated from the left hand side, telling us that the left hand side was a contradiction, regardless of what's on the right. So if the original implication we wished to establish were written as L&amp;rarr;R then we'd have established that we could factor it as L&amp;rarr;&amp;perp;&amp;rarr;R. Similarly, if an orange a met an orange &amp;not;a we'd have established that &amp;not;R is a contradiction, so that R is valid regardless of what's on the left, meaning that we get L&amp;rarr;&amp;#x22a4;&amp;rarr;R.&lt;br /&gt;&lt;br /&gt;In all three cases we've managed to find an "interpolating" formula F such that L&amp;rarr;F&amp;rarr;R with the property that F only refers to letters that occur *both* in L and R. It may seem intuitively obvious that we can do this. Irrelevant hypotheses shouldn't play a role in establishing an implication. In fact, this is generally true of propositional calculus and is known as the &lt;a href="http://en.wikipedia.org/wiki/Craig_interpolation"&gt;Craig Interpolation Lemma&lt;/a&gt;. It also holds for Provability Logic, which is certainly not an obvious fact: the interpolation property fails for many logics.&lt;br /&gt;&lt;br /&gt;I'm only going to roughly sketch how the proof of the interpolation lemma looks. Essentially it will be a proof by construction with the construction being the following code. We'll implement a bunch of tableau rules that construct the interpolating formula. We've also seen what some of these rules look like in the discussion above.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Constructing Interpolations&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I coloured propositions orange and blue above. I can't use colour in Haskell code so I'll instead label propositions &lt;tt&gt;L&lt;/tt&gt; and &lt;tt&gt;R&lt;/tt&gt; using the following type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data SignedProp a = L a | R a deriving (Eq, Ord, Show)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Think of the colour or sidedness of a proposition as its 'sign'. 
Sometimes we'll make a selection based on the sign:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; select (L _) l r = l&lt;br /&gt;&amp;gt; select (R _) l r = r&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Functor SignedProp where&lt;br /&gt;&amp;gt;     fmap f (L a) = L (f a)&lt;br /&gt;&amp;gt;     fmap f (R a) = R (f a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Sometimes we'll want to remove the sign:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; unsign (L a) = a&lt;br /&gt;&amp;gt; unsign (R b) = b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now you may see why I had to use Haskell's &lt;tt&gt;ViewPatterns&lt;/tt&gt; extension. I want to do the same pattern matching as before on these propositions even though they are of a different type. As all the rules view the patterns through &lt;tt&gt;propType&lt;/tt&gt; we can achieve this by making &lt;tt&gt;SignedProp a&lt;/tt&gt; an instance of &lt;tt&gt;PropTypeable&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance PropTypeable a =&amp;gt; PropTypeable (SignedProp a) where&lt;br /&gt;&amp;gt;     propType (L a)    = fmap L (propType a)&lt;br /&gt;&amp;gt;     propType (R a)    = fmap R (propType a)&lt;br /&gt;&amp;gt;     neg               = fmap neg&lt;br /&gt;&amp;gt;     isF               = isF . unsign&lt;br /&gt;&amp;gt;     positiveComponent = positiveComponent . unsign&lt;br /&gt;&amp;gt;     negative          = negative . unsign&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;With this framework in place the code to interpolate is quite small. Instead of returning a &lt;tt&gt;Bool&lt;/tt&gt; or a diagram, these rules return &lt;tt&gt;Just&lt;/tt&gt; the interpolating proposition when possible, otherwise &lt;tt&gt;Nothing&lt;/tt&gt;. But the code is also going to do something slightly more general. If we close a potential world containing propositions l&lt;sub&gt;1&lt;/sub&gt;,l&lt;sub&gt;2&lt;/sub&gt;,...,r&lt;sub&gt;1&lt;/sub&gt;,r&lt;sub&gt;2&lt;/sub&gt;,... 
we have shown that l&lt;sub&gt;1&lt;/sub&gt;&amp;and;l&lt;sub&gt;2&lt;/sub&gt;&amp;and;...&amp;and;r&lt;sub&gt;1&lt;/sub&gt;&amp;and;r&lt;sub&gt;2&lt;/sub&gt;&amp;and;... isn't valid. Ie. that l&lt;sub&gt;1&lt;/sub&gt;&amp;and;l&lt;sub&gt;2&lt;/sub&gt;&amp;and;...&amp;rarr;&amp;not;r&lt;sub&gt;1&lt;/sub&gt;&amp;or;&amp;not;r&lt;sub&gt;2&lt;/sub&gt;&amp;or;... *is* valid. If the l&lt;sub&gt;i&lt;/sub&gt; are L-propositions, and the r&lt;sub&gt;i&lt;/sub&gt; are R-propositions, then the rules I define below will find an interpolating formula for l&lt;sub&gt;1&lt;/sub&gt;&amp;and;l&lt;sub&gt;2&lt;/sub&gt;&amp;and;...&amp;rarr;&amp;not;r&lt;sub&gt;1&lt;/sub&gt;&amp;or;&amp;not;r&lt;sub&gt;2&lt;/sub&gt;&amp;or;.... For the original case of p&amp;rarr;q we prime it with p on the left and &amp;not;q on the right:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; interp (p :-&amp;gt; q) = let t = runTableau interpRules [L p, R (Neg q)]&lt;br /&gt;&amp;gt;                    in simplify &amp;lt;$&amp;gt; t&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now we can give the rules:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; interpRules = TableauRules {&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The first three branches of this case are the three rules discussed at the beginning. The fourth should be fairly obvious:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     foundContradiction = \a -&amp;gt; Just $ case a of&lt;br /&gt;&amp;gt;        (L _, L _) -&amp;gt; F&lt;br /&gt;&amp;gt;        (R _, R _) -&amp;gt; T&lt;br /&gt;&amp;gt;        (L n, R _) -&amp;gt; n&lt;br /&gt;&amp;gt;        (R n, L _) -&amp;gt; Neg n,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The algorithm needs to know when a world has closed. The original validity rules returned a boolean. 
These rules return a &lt;tt&gt;Maybe&lt;/tt&gt;, and we know a world closed if the algorithm succeeded in returning an interpolating proposition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     closes = isJust,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;These are the trivial cases of finding a blue or orange &amp;perp;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     foundF = \a -&amp;gt; Just $ select a F T,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some rules that really just serve as glue:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     open     = \_ -&amp;gt; Nothing,&lt;br /&gt;&amp;gt;     doubleNegation = \_ t -&amp;gt; t,&lt;br /&gt;&amp;gt;     conjRule = \_ _ t -&amp;gt; t,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now suppose we want to find an interpolating formula for l&lt;sub&gt;1&lt;/sub&gt;&amp;or;l&lt;sub&gt;2&lt;/sub&gt;&amp;rarr;r. If we set up our tableau we get:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-SjoRpiUsidw/TZPPH6cKfUI/AAAAAAAAAq8/RJvBW8ZMsZk/s1600/interp2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-SjoRpiUsidw/TZPPH6cKfUI/AAAAAAAAAq8/RJvBW8ZMsZk/s320/interp2.png" width="263" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;(Note I'm using l and r as metavariables here, so l&lt;sub&gt;1&lt;/sub&gt;, l&lt;sub&gt;2&lt;/sub&gt; and r represent propositions made up of (possibly many) ordinary single letter variables.) If we complete the two sides of the divide using our interpolation algorithm recursively we'll find propositions f&lt;sub&gt;i&lt;/sub&gt; such that l&lt;sub&gt;i&lt;/sub&gt;&amp;rarr;f&lt;sub&gt;i&lt;/sub&gt;&amp;rarr;r. Hence we find l&lt;sub&gt;1&lt;/sub&gt;&amp;or;l&lt;sub&gt;2&lt;/sub&gt;&amp;rarr;f&lt;sub&gt;1&lt;/sub&gt;&amp;or;f&lt;sub&gt;2&lt;/sub&gt;&amp;rarr;r. Clearly the middle proposition only contains letters that appear on both sides of the original implication. 
A similar analysis allows us to find an interpolating proposition for l&amp;rarr;r&lt;sub&gt;1&lt;/sub&gt;&amp;and;r&lt;sub&gt;2&lt;/sub&gt;. That gives us this rule:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     disjRule = \p _ _ tl tr -&amp;gt; select p (:\/) (:/\) &amp;lt;$&amp;gt; tl &amp;lt;*&amp;gt; tr,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Those are the rules for classical propositional calculus.&lt;br /&gt;&lt;br /&gt;Now comes the tricky bit. If we draw a big red X in a subworld it allows us to back out and deduce an interpolating proposition in the parent world. The rule is simple but I'll leave the proof sketch as an appendix:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     processWorld  = \p t -&amp;gt; select p Dia Box &amp;lt;$&amp;gt; t,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;If any subworld of a world is closed, so is the parent world, so we can ignore all but the first closed subworld:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     combineWorlds = mplus,&lt;br /&gt;&lt;br /&gt;&amp;gt;     tableau = \_ t -&amp;gt; t&lt;br /&gt;&amp;gt; }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You can reproduce the wikipedia &lt;a href="http://en.wikipedia.org/wiki/Craig_interpolation#Example"&gt;example&lt;/a&gt; with this &lt;tt&gt;interp ((neg (p /\q) --&amp;gt; neg r /\q) --&amp;gt; (t --&amp;gt; p) \/ (t --&amp;gt; neg r))&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Definability&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In my previous post I talked about how in provability logic we can define propositions implicitly. Now we're in a position to do the construction. Firstly we need to say precisely what we mean by a definition inside the language of provability logic, and then we need to say what ensures that definitions make sense.&lt;br /&gt;&lt;br /&gt;An implicit definition of a proposition in provability logic is a function &lt;tt&gt;Prop -&amp;gt; Prop&lt;/tt&gt; that doesn't analyze its argument. 
Informally, it defines a proposition &lt;tt&gt;p&lt;/tt&gt; if &lt;tt&gt;f p&lt;/tt&gt; is valid. Some candidates might be:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; def1 p = T --&amp;gt; p&lt;br /&gt;&amp;gt; def2 p = p --&amp;gt; T&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The first, &lt;tt&gt;def1&lt;/tt&gt;, seems fine. It seems to uniquely single out &lt;tt&gt;p == T&lt;/tt&gt; because &lt;tt&gt;def1 T&lt;/tt&gt; is valid. But &lt;tt&gt;def1 (neg F)&lt;/tt&gt; is also valid. So we can't uniquely pin down propositions, but only up to some sort of equivalence. In fact, we can use &lt;tt&gt;&amp;lt;-&amp;gt;&lt;/tt&gt; as our equivalence relation.&lt;br /&gt;&lt;br /&gt;But the second attempted definition is useless. Any proposition satisfies it. So we only consider a definition &lt;tt&gt;d&lt;/tt&gt; to be valid if any two propositions satisfying it are equivalent. So we can say that &lt;tt&gt;d&lt;/tt&gt; defines a proposition &lt;tt&gt;h&lt;/tt&gt; if &lt;tt&gt;d p --&amp;gt; (p &amp;lt;-&amp;gt; h)&lt;/tt&gt; is valid.&lt;br /&gt;&lt;br /&gt;So here's the beginning of a function that attempts to find a proposition satisfying an implicit definition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; beth d = let p = Letter "__p"&lt;br /&gt;&amp;gt;              q = Letter "__q"&lt;br /&gt;&amp;gt;          in if not $ valid $ (d p /\ d q) --&amp;gt; (p &amp;lt;-&amp;gt; q)&lt;br /&gt;&amp;gt;             then error $ show (d p) ++ " doesn't satisfy precondition"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You can see how I've encoded our precondition for a definition to be good. I also used &lt;tt&gt;__p&lt;/tt&gt; and &lt;tt&gt;__q&lt;/tt&gt; to make sure we didn't clash with any letters in the definition. It's called &lt;tt&gt;beth&lt;/tt&gt; because the theorem that ensures we can write the &lt;tt&gt;else&lt;/tt&gt; clause is known as the Beth Definability Theorem. 
The last line is astonishingly short:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;             else interp (d p /\ p --&amp;gt; (d q --&amp;gt; q))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Suppose we have proved that d(p)&amp;and;d(q)&amp;rarr;(p&amp;harr;q). It immediately follows that d(p)&amp;and;p&amp;rarr;(d(q)&amp;rarr;q). (It's essentially a bit of currying.) This is a candidate for Craig interpolation as the left hand side has no q and the right hand side has no p. So we can make a sentence h so that d(p)&amp;and;p&amp;rarr;h and h&amp;rarr;(d(q)&amp;rarr;q). The letters p and q are just letters: if a proposition is valid for q, it's also valid with p in its place, so h&amp;rarr;(d(p)&amp;rarr;p). With a little rearrangement the two implications become d(p)&amp;rarr;(p&amp;rarr;h) and d(p)&amp;rarr;(h&amp;rarr;p), and together they give d(p)&amp;rarr;(p&amp;harr;h). Or in English, if p satisfies our definition, h is equivalent to it. So we've constructed an h that does what we want.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fixed Points&lt;/b&gt;&lt;br /&gt;Now it's a small step to get a fixed point. We just make a definition of fixed point and apply &lt;tt&gt;beth&lt;/tt&gt;. This looks like it might work:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; isFixedPoint' f p = p &amp;lt;-&amp;gt; f p&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Unfortunately it's possible for &lt;tt&gt;p&lt;/tt&gt; and &lt;tt&gt;p'&lt;/tt&gt; to be inequivalent and yet both satisfy &lt;tt&gt;isFixedPoint' f&lt;/tt&gt;. We have to use a "stronger" definition. I'll leave the proof to Boolos's book, but what we'll do is assert not just that &lt;tt&gt;p&lt;/tt&gt; is a fixed point, but also that it is provably so. So we use:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; box' a = a /\ Box a&lt;br /&gt;&amp;gt; isFixedPoint f p = box' (p &amp;lt;-&amp;gt; f p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;In order for the uniqueness to work, every occurrence of the argument of &lt;tt&gt;f&lt;/tt&gt; must be inside a &lt;tt&gt;Box&lt;/tt&gt; or &lt;tt&gt;Dia&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;And that's it. 
We can now churn out fixed points to our heart's content.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;G&amp;ouml;del's Second Incompleteness Theorem again&lt;/b&gt;&lt;br /&gt;Let's find a proposition that asserts its own unprovability:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; godel = fixedpoint $ \p -&amp;gt; neg (Box p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We get &lt;tt&gt;Dia T&lt;/tt&gt;, which is the claim that arithmetic is consistent. So the proposition asserting its own unprovability is equivalent to the consistency of arithmetic, and &lt;tt&gt;godel&lt;/tt&gt; shows that if arithmetic is consistent, then it can't prove its consistency.&lt;br /&gt;&lt;br /&gt;Here's a large number of examples, lifted from Boolos's book, that I use for regression testing:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; fpexamples = [&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Neg (Box p), Dia T),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box p, T),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (Neg p), Box F),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Neg (Box (Neg p)), F),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Neg (Box (Box p)), Dia (Dia T)),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box p :-&amp;gt; Box (Neg p), Dia (Dia T) \/ Box F),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (Neg p :-&amp;gt; Box F) :-&amp;gt; Box (p :-&amp;gt; Box F),&lt;br /&gt;&amp;gt;                Dia (Dia (Neg (Box F) /\ Neg (Box F)) /\&lt;br /&gt;&amp;gt;                Neg (Box F)) \/ Box (Box F)),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box p :-&amp;gt; q, Dia (Neg q) \/ q /\ q),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (p :-&amp;gt; q), Box q),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box p /\ q, Box q /\ q),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (p /\ q), Box q),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; q \/ Box p, T),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Neg (Box (q :-&amp;gt; p)), Dia q),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (p :-&amp;gt; q) :-&amp;gt; Box (Neg p),&lt;br /&gt;&amp;gt;                Dia (Box F /\ Neg q) \/ Box F),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; q 
/\ (Box (p :-&amp;gt; q) :-&amp;gt; Box (Neg p)), q /\ Box (Neg q)),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Dia p :-&amp;gt; (q /\ Neg (Box (p :-&amp;gt; q))),&lt;br /&gt;&amp;gt;                Box F /\ Box F \/ q /\ Dia ((Box F /\ Box F) /\ Neg q)),&lt;br /&gt;&amp;gt;         (\p -&amp;gt; Box (Box (p /\ q) /\ Box (p /\ r)),&lt;br /&gt;&amp;gt;                Box (Box q /\ Box r))&lt;br /&gt;&amp;gt;     ]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Note how in every case, &lt;tt&gt;p&lt;/tt&gt; is inside a &lt;tt&gt;Box&lt;/tt&gt; or &lt;tt&gt;Dia&lt;/tt&gt;.  It's pretty mind-bending to try to think about what all of these propositions could possibly mean.&lt;br /&gt;&lt;br /&gt;We can easily write a test to see whether two propositions are equivalent (in the sense that they imply each other):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; equiv p q = valid (p &amp;lt;-&amp;gt; q)&lt;br /&gt;&lt;br /&gt;&amp;gt; regress2 = do&lt;br /&gt;&lt;br /&gt;&amp;gt;     print $ and $ map (\(f, x) -&amp;gt; fromJust (fixedpoint f) `equiv` x) fpexamples&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I am disconcerted that one of the easy looking examples in Boolos fails my tests. Given that the difficult cases agree with my code I think it is likely an error by Boolos, though that's a scary claim to make against one of the best known logicians in the world.&lt;br /&gt;&lt;br /&gt;The company &lt;a href="http://theorymine.co.uk/"&gt;Theory Mine&lt;/a&gt; has been selling theorems. But they don't seem too interesting. Far better to generate theorems that generalise G&amp;ouml;del's work and tell you about the very nature of provability. 
Just send me $19.99 and I'll send you a certificate.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;           regress1&lt;br /&gt;&amp;gt;           regress2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;br /&gt;Here is some &lt;a href="http://comet.lehman.cuny.edu/fitting/bookspapers/glfixpt.pro"&gt;Prolog code&lt;/a&gt; by Melvin Fitting to do much the same thing. Note that his code relies crucially on backtracking whereas my code explicitly searches through subworlds.&lt;br /&gt;&lt;br /&gt;The function &lt;tt&gt;runTableau&lt;/tt&gt; is a kind of generalised fold. Such things have associated induction principles. In this case it's a generalisation of the "Unifying Principle" defined by Smullyan in &lt;a href="http://www.amazon.com/gp/product/0486683702/ref=as_li_ss_tl?ie=UTF8&amp;tag=sigfpe-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0486683702"&gt;First-Order Logic&lt;/a&gt;. I guess it is an ordinary generalised fold over a tree structure representing fully expanded tableaux, but we never explicitly build such a structure.&lt;br /&gt;&lt;br /&gt;Oh...and apologies (if anyone has come this far) for how long this took. Last year I threw together code to find fixed points in a couple of hours thinking "that was easy, I can blog about it". But I hadn't realised how many hours of work it would take to explain what I had done. 
And even if nobody came this far, the &lt;a href="http://en.wikipedia.org/wiki/Rubber_duck_debugging"&gt;rubber ducking&lt;/a&gt; improved my own understanding greatly.&lt;br /&gt;&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;b&gt;Appendix&lt;/b&gt;&lt;br /&gt;Suppose we successfully showed that this closed:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-qk9r93ckOS4/TYtoCbcjNGI/AAAAAAAAAqY/-4ZyKKls8jA/s1600/modal1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://lh4.googleusercontent.com/-qk9r93ckOS4/TYtoCbcjNGI/AAAAAAAAAqY/-4ZyKKls8jA/s1600/modal1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Then the generalised Craig Interpolation Lemma says that for some suitable f, both of the following close:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-vauslEykiPw/TYtobZ81qlI/AAAAAAAAAqc/LKqWrAY5vOM/s1600/modal2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://lh6.googleusercontent.com/-vauslEykiPw/TYtobZ81qlI/AAAAAAAAAqc/LKqWrAY5vOM/s1600/modal2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The way we'll prove this is to follow what we must have done to get the first potential world to close, assuming inductively that the interpolation lemma works for all of the potential subworlds that we entered.&lt;br /&gt;&lt;br /&gt;So let's start by supposing that we used &amp;#x25ca;l&lt;sub&gt;2&lt;/sub&gt; to open up a subworld, giving us:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-lzrSuBKswig/TYtpyazTw1I/AAAAAAAAAqk/nmM6nl29bzc/s1600/modal3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" 
src="https://lh4.googleusercontent.com/-lzrSuBKswig/TYtpyazTw1I/AAAAAAAAAqk/nmM6nl29bzc/s1600/modal3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;If the subworld closed then recursively using interpolation we know that for some appropriate choice of g, these closed too:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-Vcze3LDLyyI/TYtqYYXExHI/AAAAAAAAAqo/UnoJVk4SFDc/s1600/modal4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://lh5.googleusercontent.com/-Vcze3LDLyyI/TYtqYYXExHI/AAAAAAAAAqo/UnoJVk4SFDc/s1600/modal4.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Now consider trying to close these two worlds:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-Zi9R2P2NuDQ/TYtq6nnN5gI/AAAAAAAAAqs/sedlT2LmYA4/s1600/modal5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://lh3.googleusercontent.com/-Zi9R2P2NuDQ/TYtq6nnN5gI/AAAAAAAAAqs/sedlT2LmYA4/s1600/modal5.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;Using Worlds 1 and 2 above we find they close immediately:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-LmzdxR_IPWg/TYtrPqR99II/AAAAAAAAAqw/C09B-YvGyCU/s1600/modal6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="249" src="https://lh3.googleusercontent.com/-LmzdxR_IPWg/TYtrPqR99II/AAAAAAAAAqw/C09B-YvGyCU/s320/modal6.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;And that now tells us that when we recurse back up, 
we can use f=&amp;#x25ca;g.&lt;br /&gt;&lt;br /&gt;I started with a world containing just 6 propositions so I seem only to have proved this for that case. But in fact, any combination of propositions just ends up with an argument that is substantially the same. The only thing that might change is that if we use &amp;#x25ca;r&lt;sub&gt;2&lt;/sub&gt; to try to open up a subworld we find that f=&amp;#x25fb;g for some other g. And this is all summarised compactly by the rule &lt;tt&gt;processWorld&lt;/tt&gt; above.</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8959524458695651456/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8959524458695651456' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8959524458695651456'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/8959524458695651456'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/04/generalising-godels-theorem-with.html' title='Generalising Gödel&apos;s Theorem with Multiple Worlds. Part IV.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-rwMNywE8tyE/TZPPHqkXxuI/AAAAAAAAAq4/BddwLjabAm0/s72-c/interp1.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6304954680216261346</id><published>2011-02-27T14:42:00.000-08:00</published><updated>2011-02-27T14:42:11.515-08:00</updated><title type='text'>Build Yourself a Bluetooth Controlled Six-Legged Robot</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Introduction&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;I don't know why you can't easily find toy robots that can be controlled by Bluetooth. But that lack can be remedied by a few hours work. 
So you can see exactly what I mean, here is the robot I'm going to describe being controlled by me via a program running on a Mac:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://1.gvt0.com/vi/kO8aaN1X-lk/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/kO8aaN1X-lk&amp;fs=1&amp;source=uds" /&gt;&lt;param name="bgcolor" value="#FFFFFF" /&gt;&lt;embed width="320" height="266" src="http://www.youtube.com/v/kO8aaN1X-lk&amp;fs=1&amp;source=uds" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;As it's controlled from a much more powerful computer you can use sophisticated algorithms to direct it. And if you don't want to do that, it can easily be programmed to run autonomously. Either way, you can make use of a pair of infrared proximity sensors as input to your algorithms.&lt;br /&gt;&lt;br /&gt;The first thing I must do is give credit where it's due. This is a modification of someone else's design. In fact, it's a&amp;nbsp;&lt;a href="http://www.pololu.com/docs/0J42"&gt;sample project&lt;/a&gt;&amp;nbsp;at&amp;nbsp;&lt;a href="http://www.pololu.com/"&gt;Pololu Electronics and Robotics&lt;/a&gt;&amp;nbsp;modified by addition of an off-the-shelf &lt;a href="http://www.sparkfun.com/products/582"&gt;Bluetooth board&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;You'll need:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Everything required to build the &lt;a href="http://www.pololu.com/docs/0J42"&gt;Pololu sample project&lt;/a&gt;. That includes Windows. I was developing on a Mac so I made use of VMWare. 
I can't vouch for any other Windows emulation because you'll need a USB port and I've experienced troubles with emulation and USB in the past. You'll only need Windows for initial configuration of the robot. After that you can use the development platform of your choice. The list of parts is&amp;nbsp;&lt;a href="http://www.pololu.com/docs/0J42/2"&gt;here&lt;/a&gt;. Order the partial kit for the motor controller. You'll need the header pins it comes with.&lt;/li&gt;&lt;li&gt;One &lt;a href="http://www.sparkfun.com/products/582"&gt;BlueSMiRF Gold Bluetooth modem&lt;/a&gt;&amp;nbsp;from &lt;a href="http://www.sparkfun.com/"&gt;Sparkfun&lt;/a&gt;.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;The first step is to build the Pololu project exactly as described. At this point, you'll have a complete robot that can run autonomously. You can program it from Windows using Pololu's scripting language described in the &lt;a href="http://www.pololu.com/docs/0J40"&gt;manual&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note that the battery used is&amp;nbsp;&lt;a href="http://www.pololu.com/catalog/product/2251"&gt;this&lt;/a&gt;. Its connector is different to that described in the project. You simply need to solder a pair of pins rather than a special type of connector:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-ZGpa34zlmNk/TWmZGZkvuiI/AAAAAAAAAoU/MxVy_oct0DY/s1600/IMG_4048.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="https://lh6.googleusercontent.com/-ZGpa34zlmNk/TWmZGZkvuiI/AAAAAAAAAoU/MxVy_oct0DY/s320/IMG_4048.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By the way, there is a minor flaw in Pololu's design. The sensor and servos are glued to the battery using hot glue. When the battery is charged it heats up. You can guess the rest. 
So once everything works, you may want to switch to a different glue. I used epoxy.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The strands of wire from the servos are very fine. At one point I think some strayed, causing a short circuit. I was surprised the controller board still worked after I saw smoke rise up from it. So after that I used some dabs of hot glue to act as insulation around my soldering.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I found that I couldn't use the motor controller via an external USB hub; I always had to use a cable directly from my Mac.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the process of making the robot you cut the connectors off the servos. Keep two of these with 2 or 3 inches of wire attached. We'll reuse them later.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Setting up Bluetooth&lt;/b&gt;&lt;/div&gt;&lt;div&gt;In its default mode, the robot is scriptable and controllable via the USB port. However, two pins on the board are attached to a serial port (running at 5V, not the usual RS232 voltage). These can be connected directly to a pair of pins on the BlueSMiRF board.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First we'll get the modem working independently of the robot. 
So go ahead and solder 6 header pins to the board like so:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-UQOAS77vl-4/TWl2Qh58jgI/AAAAAAAAAoM/bTeCSte2mXY/s1600/IMG_4045.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="https://lh4.googleusercontent.com/-UQOAS77vl-4/TWl2Qh58jgI/AAAAAAAAAoM/bTeCSte2mXY/s320/IMG_4045.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Now we can connect it directly to the robot's battery using its connector to make the following circuit:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-ea6GB8PqAD8/TWmdgo-wZBI/AAAAAAAAAoc/wdrZuacOuRM/s1600/Hexapod.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="https://lh4.googleusercontent.com/-ea6GB8PqAD8/TWmdgo-wZBI/AAAAAAAAAoc/wdrZuacOuRM/s320/Hexapod.png" width="196" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Conveniently, VCC and GND are next to each other, just like in the battery connector. The little red LED marked &lt;b&gt;Stat&lt;/b&gt; should start flashing. This means it's ready to go. Take care not to reverse the polarity (this isn't Star Trek) or make an off-by-one pin error.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now go ahead and pair it with your main computer. On the Mac it's the usual process. It'll probably appear as a device called &lt;b&gt;Firefly-something&lt;/b&gt;. There have been various revisions of documentation for BlueSMiRFs. I guessed, correctly, that the passkey was &lt;b&gt;1234&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once it's connected you should get a new serial port. Mine appeared as&amp;nbsp;&lt;b&gt;/dev/tty.FireFly-DB29-SPP&lt;/b&gt;. 
SPP is &lt;a href="http://en.wikipedia.org/wiki/Bluetooth_profile#Serial_Port_Profile_.28SPP.29"&gt;Serial Port Profile&lt;/a&gt;. Now you need a terminal application that can talk to a serial port. On Unix machines you can use &lt;b&gt;screen&lt;/b&gt;&amp;nbsp;followed by the path to the dev file. You'll find help on the web for other OSes. On Windows I expect you'll get a new &lt;b&gt;COM&lt;/b&gt; port. It didn't seem to matter what baud rate I chose at this stage. I guess that SPP presents an interface that looks like a serial port but that the connection rate doesn't really mean anything. But I may be wrong and you may need to experiment and/or read documentation. It may take several attempts to successfully connect. (It likes to prove it has the power to say no if it wants.) When you do, you'll see the &lt;b&gt;Conn&lt;/b&gt; LED light up green.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you connect within one minute of powering up the BlueSMiRF you can configure it by entering its command mode. One revision of the documentation says you should type &lt;b&gt;+++&lt;/b&gt;. My version used &lt;b&gt;$$$&lt;/b&gt;. You won't see your keys echoed on the screen but it should reply &lt;b&gt;CMD&lt;/b&gt;. Now type &lt;b&gt;d&lt;/b&gt; and return to get a summary of its status. Now type &lt;b&gt;SU,19200&lt;/b&gt;&amp;nbsp;(and return) to set the baud rate on the serial port side. It'll go much faster than this but it's worth being cautious to start with. It acknowledges this but it doesn't actually change speed until you restart the device (e.g. by power cycling it). Quit command mode by typing &lt;b&gt;---&lt;/b&gt;. After the restart, anything you type in the terminal will come out at 19200 baud on the TX pin. Quit screen by hitting ^A followed by k and confirming.&lt;br /&gt;&lt;br /&gt;I'm not sure how you can easily test this short of completing the robot. 
I tested it using a program on an&amp;nbsp;&lt;a href="http://processors.wiki.ti.com/index.php/MSP430_LaunchPad_(MSP-EXP430G2)?DCMP=launchpad&amp;amp;HQS=Other+OT+launchpadwiki"&gt;MSP430 Launchpad&lt;/a&gt;. I wish I had a fancy diagnostic device to read serial transmissions. I ought to build one.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By the way, talking to the BlueSMiRF as if it's a serial port is a form of backward compatibility. It should be possible to talk to it directly via the &lt;a href="http://en.wikipedia.org/wiki/Bluetooth_protocols#Radio_frequency_communication_.28RFCOMM.29"&gt;RFCOMM API&lt;/a&gt;. Among other things, this would allow better diagnostics.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Configuring the Robot&lt;/b&gt;&lt;/div&gt;&lt;div&gt;Now you need to configure the robot to use 19200 baud on its TX/RX pins. Connect via USB to the Windows Control Center application (as you did if you built the original design) and set it to use the UART at 19200 baud. Make sure CRC is off. Like so:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-Zdr6rBpS_E4/TWmoGUKTxdI/AAAAAAAAAok/lEVJlFJ4ng0/s1600/Pololu.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="172" src="https://lh6.googleusercontent.com/-Zdr6rBpS_E4/TWmoGUKTxdI/AAAAAAAAAok/lEVJlFJ4ng0/s320/Pololu.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Once that is done you don't need Windows again unless you change the baud rate.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Connecting Modem to Robot&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Now you need to connect the robot to the modem. You have 6 pins on the modem board so you need a 6 pin connector. 
I glued together the two three pin connectors I mentioned above to make a connector like this: &amp;nbsp;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-lBJR3c16rPo/TWmrqeItRhI/AAAAAAAAAos/qeqET9u1B7c/s1600/IMG_4050.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="https://lh4.googleusercontent.com/-lBJR3c16rPo/TWmrqeItRhI/AAAAAAAAAos/qeqET9u1B7c/s320/IMG_4050.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;The two outer wires can be soldered together because we're not using flow control, and so the BlueSMiRF is clear to send (CTS) whenever it is ready to send (RTS). The ends of the other four wires can be soldered directly to the motor controller board to make this circuit:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh4.googleusercontent.com/-HkzcSQRzrvk/TWmwMSH99mI/AAAAAAAAAo0/6ITUN4N0L6s/s1600/Hexapod2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="https://lh4.googleusercontent.com/-HkzcSQRzrvk/TWmwMSH99mI/AAAAAAAAAo0/6ITUN4N0L6s/s320/Hexapod2.png" width="143" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Now comes the iffy bit. 
I hot glued the BlueSMiRF board directly to the side of the servos like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-z3C3v78au8c/TWrPX_I08ZI/AAAAAAAAAo8/ExG6p11J2VU/s1600/IMG_4056.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="https://lh6.googleusercontent.com/-z3C3v78au8c/TWrPX_I08ZI/AAAAAAAAAo8/ExG6p11J2VU/s320/IMG_4056.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That's iffy because it places electric motors directly next to a device using RF. Motors emit lots of RF noise. But they can't be that bad; after all, servos have been used for years on radio-controlled models, and Bluetooth probably has some error correction. It seemed to work for me. (But see note below.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Talking to the Robot&lt;/b&gt;&lt;/div&gt;&lt;div&gt;Now we need to talk to the robot. Just about every development platform has a way of talking to serial ports. I used Haskell with the &lt;a href="http://hackage.haskell.org/package/serialport"&gt;serialport&lt;/a&gt; package. As I had fixed the baud rate, I used the &lt;a href="http://www.pololu.com/docs/0J40/5.c"&gt;Pololu compact protocol&lt;/a&gt;. As the focus in this article is on hardware, I'll just give the code at the end. Once it's running you should be able to control the robot using the following keys:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;f - take a step forward&lt;/li&gt;&lt;li&gt;b - take a step back&lt;/li&gt;&lt;li&gt;l - turn left&lt;/li&gt;&lt;li&gt;r - turn right&lt;/li&gt;&lt;li&gt;e - read the state of the two proximity sensors&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Now you're free to code up whatever you want. 
I experimented with localization using a method I learnt from &lt;a href="http://www.randomhacks.net/articles/2007/04/19/robot-localization-particle-system-monad"&gt;Eric Kidd&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/div&gt;&lt;div&gt;I occasionally had dropped bytes. I haven't yet worked out a way to 100% reliably recover when this happens and I'm not sure if it's due to the RF noise I mentioned above. You can probably mess with the timeout parameter in the Pololu Control Center to ensure the robot returns to a known state if communications cease for a bit.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Apple SPP driver is pretty crappy. It generally works (apart from the annoying need to make multiple attempts to connect), but if you interrupt your code at the wrong time with ^C (say) you can find it locks up so hard that you end up with an unkillable zombie process. The OS didn't even seem able to kill it on shutdown (maybe it would have eventually timed out) and I had to power down the Mac if I wanted to use the robot again. It only happened a couple of times in two weeks of intensive use, but it's annoying.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And if the robot stops working - remember these batteries are pretty small and are doing a lot, so you may just need a recharge.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Code&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;The code uses two threads, a server that talks to the serial port and a client for users. This means that independent threads can control the legs and eyes, say, without the serial port transactions becoming interleaved. This isn't very polished but should be good enough to start experimentation. 
I've only tested it under Mac OS X but it looks platform independent to me.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; module Robot where&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Prelude hiding (Left, Right)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Control.Concurrent&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Control.Monad.Trans&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Control.Monad.Trans.Maybe&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Control.Monad.Trans.Reader&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import Control.Exception&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; import System.Hardware.Serialport&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Server side using &lt;a href="http://www.pololu.com/docs/0J40/5.c"&gt;Pololu compact protocol&lt;/a&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" 
style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; sendByte :: Int -&amp;gt; SerialPort -&amp;gt; IO ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; sendByte b s = sendChar s (toEnum b)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; getByte :: SerialPort -&amp;gt; IO (Maybe Int)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; getByte s = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; b &amp;lt;- recvChar s&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; return $ fmap fromEnum b&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; setTarget :: Int -&amp;gt; Int -&amp;gt; SerialPort -&amp;gt; IO ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; setTarget channel target port = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; sendByte 0x84 port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; sendByte channel port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; let (b1, b0) = target `divMod` 0x80&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; sendByte b0 port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; sendByte b1 port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; getPosition channel port = runMaybeT $ do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; liftIO $ sendByte 0x90 port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; liftIO $ sendByte channel port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; lo &amp;lt;- MaybeT $ liftIO $ getByte port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; hi &amp;lt;- MaybeT $ liftIO $ getByte port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; return $ lo + 0x100*hi&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; data SerialCommand = SetTarget Int Int&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 
'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| GetPosition Int (Maybe Int -&amp;gt; IO ())&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;| End&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; serialExec command port =&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; case command of&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; SetTarget channel target -&amp;gt; setTarget channel target port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; GetPosition channel continuation -&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;getPosition channel port &amp;gt;&amp;gt;= continuation&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _ -&amp;gt; return ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" 
style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; serialThread commandChannel port = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; command &amp;lt;- liftIO $ readChan commandChannel&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; serialExec command port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; case command of&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; End -&amp;gt; do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; closeSerial port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _ -&amp;gt; serialThread commandChannel port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Client side&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; data Servo = Servo { channel :: Int, loLimit :: Int, hiLimit :: Int }&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; data Direction = Left | Right | Forward | Back&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; start tty = try (openSerial tty defaultSerialSettings&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;{ baudRate = B19200 }) &amp;gt;&amp;gt;=&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; either&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (\ex -&amp;gt; do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; print (ex :: IOException)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; threadDelay 250000&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; start tty)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (\port -&amp;gt; do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; commandChannel &amp;lt;- newChan&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; forkIO $ serialThread commandChannel port&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return commandChannel)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; type RobotM a = ReaderT (Chan SerialCommand) IO a&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; writeChan' = flip writeChan&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; end = ReaderT $ writeChan' End&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&amp;gt; setServo limit servo = ReaderT $ writeChan' $&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp;SetTarget (channel servo) (limit servo)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; setLo = setServo loLimit&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; setHi = setServo hiLimit&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; readEye channel cont = ReaderT $ writeChan' $ GetPosition channel cont&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; delay = 100*1000&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; raise Left &amp;nbsp; &amp;nbsp;= setLo midLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; raise Right &amp;nbsp; = setHi midLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; forward Left &amp;nbsp;= setLo leftLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" 
style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; forward Right = setHi rightLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; back Left &amp;nbsp; &amp;nbsp; = setHi leftLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; back Right &amp;nbsp; &amp;nbsp;= setLo rightLegs&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; opposite Left &amp;nbsp; &amp;nbsp;= Right&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; opposite Right &amp;nbsp; = Left&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; opposite Forward = Back&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; opposite Back &amp;nbsp; &amp;nbsp;= Forward&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; move Forward = forward&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; move Back &amp;nbsp; &amp;nbsp;= back&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; halfCycle side direction0 direction1 = do&lt;/span&gt;&lt;br 
/&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; raise side&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; liftIO $ threadDelay delay&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; move direction0 Left&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; move direction1 Right&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; liftIO $ threadDelay delay&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; motion direction0 direction1 = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; halfCycle Left &amp;nbsp;direction0 direction1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; halfCycle Right (opposite direction0) (opposite direction1)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; walkCycle &amp;nbsp; &amp;nbsp;= motion Forward Back&lt;/span&gt;&lt;br /&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; reverseCycle = motion Back Forward&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; turn Right &amp;nbsp; = motion Forward Forward&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; turn Left &amp;nbsp; &amp;nbsp;= motion Back Back&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; withRobot tty cmds = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; commandChannel &amp;lt;- start tty&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; flip runReaderT commandChannel (cmds &amp;gt;&amp;gt; end)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; eyes = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; readEye 4 $ print . (("Left eye &amp;nbsp;= " ++) . show . fmap (&amp;lt;512))&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; readEye 3 $ print . (("Right eye = " ++) . show . 
fmap (&amp;lt;512))&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; commandLoop = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; key &amp;lt;- liftIO $ getChar&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; case key of&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'f' -&amp;gt; walkCycle&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'b' -&amp;gt; reverseCycle&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'l' -&amp;gt; turn Left&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'r' -&amp;gt; turn Right&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 'e' -&amp;gt; eyes&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _ &amp;nbsp; -&amp;gt; return ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; if key=='q'&lt;/span&gt;&lt;br /&gt;&lt;span 
class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; then return ()&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; else commandLoop&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; testRobot = do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; withRobot tty $ do&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; liftIO $ print "Ready"&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; commandLoop&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Here's some stuff for you to configure. In particular, set the limits of the servos so that the legs don't whack into the body of the robot, stripping the gears. These numbers will be a little different depending on exactly how you built the robot. 
You may want to start with the upper and lower limits closer to 6000, the middle of the range of the servo motion.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; tty = "/dev/tty.FireFly-DB29-SPP"&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; rightLegs = Servo 0 5000 6500&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; midLegs &amp;nbsp; = Servo 1 5000 6500&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;gt; leftLegs &amp;nbsp;= Servo 2 5000 6500&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;b&gt;One Last Thing&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Double check everything I say above. I could have easily made a mistake. 
In particular, make sure that I haven't inadvertently introduced short circuits by comparing my diagrams against the online documentation for the parts.&lt;br /&gt;&lt;hr /&gt;&lt;iframe align="left" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?t=sigfpe-20&amp;amp;o=1&amp;amp;p=8&amp;amp;l=bpl&amp;amp;asins=0596514980&amp;amp;fc1=000000&amp;amp;IS2=1&amp;amp;lt1=_blank&amp;amp;m=amazon&amp;amp;lc1=0000FF&amp;amp;bc1=000000&amp;amp;bg1=FFFFFF&amp;amp;f=ifr" style="align: left; height: 245px; padding-right: 10px; padding-top: 5px; width: 131px;"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6304954680216261346?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6304954680216261346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6304954680216261346' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6304954680216261346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6304954680216261346'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/02/build-yourself-bluetooth-controlled-six.html' title='Build Yourself a Bluetooth Controlled Six-Legged Robot'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='https://lh6.googleusercontent.com/-ZGpa34zlmNk/TWmZGZkvuiI/AAAAAAAAAoU/MxVy_oct0DY/s72-c/IMG_4048.JPG' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2016935179779856749</id><published>2011-02-25T15:55:00.000-08:00</published><updated>2011-02-25T15:55:03.008-08:00</updated><title type='text'>Generalising Gödel's Theorem with Multiple Worlds. Part III.</title><content type='html'>&lt;b&gt;Fixed Points&lt;/b&gt;&lt;br /&gt;A number of branches of mathematics have some sort of implicit function theorem. These guarantee that we can define functions implicitly, rather than explicitly. For example, we can define the function f from the positive reals to the positive reals by the relation f(x)^2 = x. In this case we can write an explicit formula, f(x) = x&lt;sup&gt;1/2&lt;/sup&gt;, but the &lt;a href="http://en.wikipedia.org/wiki/Implicit_function_theorem"&gt;implicit function theorem&lt;/a&gt; gives quite general conditions when the implicit equation defines a function uniquely, even when it is very hard to write an explicit formula. We have something similar in the theory of datatypes. We define the list type implicitly in terms of lists: L(X) = 1+X L(X). There is a unique smallest type (and unique largest type) that satisfies this equation and we use such equations in Haskell programs to uniquely specify our types. (Haskell uses the largest type.)&lt;br /&gt;&lt;br /&gt;Provability logic also has a kind of implicit function theorem. Consider the following function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f0 p = Neg (Box p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Suppose the equation &lt;tt&gt;p &amp;lt;-&amp;gt; f0 p&lt;/tt&gt; had a solution. Then we would have a proposition that asserts its own unprovability. In fact, we know such a proposition: the Godel sentence from Godel's first incompleteness theorem. 
However, the Godel sentence uses a clever &lt;a href="http://blog.sigfpe.com/2011/01/quine-central.html"&gt;Quining&lt;/a&gt; technique to make a proposition that makes assertions about its own Godel number. We can't express such a thing in the language of provability logic. However, consider Godel's second incompleteness theorem. This tells us that if PA can prove its consistency then it is in fact inconsistent. Or to turn it around, if PA is consistent, then PA can't prove its consistency. Asserting that PA is consistent is the same as &amp;#x25ca;&amp;#x22a4; (because asserting that &amp;#x22a4; is consistent with the rest of PA is the same as saying the rest of PA is consistent). So Godel's second incompleteness theorem can be written in provability logic as &amp;#x25ca;&amp;#x22a4;&amp;rarr;&amp;not;&amp;#x25fb;(&amp;#x25ca;&amp;#x22a4;). Using the methods of part one we can show that the converse, &amp;not;&amp;#x25fb;(&amp;#x25ca;&amp;#x22a4;)&amp;rarr;&amp;#x25ca;&amp;#x22a4;, is also valid. So &lt;tt&gt;Dia T&lt;/tt&gt; is the solution to &lt;tt&gt;p &amp;lt;-&amp;gt; f0 p&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; test0 = valid $ let p = Dia T in p &amp;lt;-&amp;gt; f0 p&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Considered as a data structure, notice how &lt;tt&gt;f0&lt;/tt&gt; places its argument inside a &lt;tt&gt;Box&lt;/tt&gt;. Consider just the functions &lt;tt&gt;Prop -&amp;gt; Prop&lt;/tt&gt; that (1) simply copy &lt;tt&gt;p&lt;/tt&gt; into various places in their return value (i.e. that don't analyse &lt;tt&gt;p&lt;/tt&gt; in any way) and that (2) put all copies of their argument somewhere inside a &lt;tt&gt;Box&lt;/tt&gt; (possibly in a deeply nested way). We'll call these 'modalising' functions. Then there is an amazing result due to Solovay:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Fixed Point Theorem&lt;/b&gt;&lt;br /&gt;If &lt;tt&gt;f&lt;/tt&gt; is modalising then it has a unique fixed point in provability logic. 
(Unique in the sense that if &lt;tt&gt;p&lt;/tt&gt; and &lt;tt&gt;q&lt;/tt&gt; are fixed points for &lt;tt&gt;f&lt;/tt&gt;, then &lt;tt&gt;p &amp;lt;-&amp;gt; q&lt;/tt&gt; is valid.)&lt;br /&gt;&lt;br /&gt;Our goal in the next post in this series will be to implement an algorithm to find these fixed points.&lt;br /&gt;&lt;br /&gt;Think about what this means. We can make up any old modalising function and find a corresponding Godel-like theorem. For example, pick&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f1 p = Box p --&amp;gt; Box (Neg p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This has a fixed point &lt;tt&gt;p = Box (Box F) --&amp;gt; Box F&lt;/tt&gt; which we can test with&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; test1 = valid $ let p = Box (Box F) --&amp;gt; Box F in p &amp;lt;-&amp;gt; f1 p&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Or in English (quoting verbatim from Boolos) "a sentence is equivalent to the assertion that it is disprovable-if-provable if and only if it is equivalent to the assertion that arithmetic is inconsistent if the inconsistency of arithmetic is provable". (That's a good argument for using provability logic instead of English!) These are the generalisations of Godel's theorem in my title.&lt;br /&gt;&lt;br /&gt;But there are many other important consequences. The solution to this fixed point equation doesn't involve any self-reference (because we can't do self-reference in provability logic as we have no way to talk about Godel numbers). Self-reference is less essential than you might think - we can solve these equations without using it.&lt;br /&gt;&lt;br /&gt;At first the fixed point theorem seems like a positive thing: we can crank out as many of these theorems as we like. But there's also a flip side: it says that once we've proved Lob's theorem, there aren't any more techniques we need in order to construct fixed points. 
So there aren't any funky new variations on what Godel did, waiting to be discovered, that will give us a bunch of new fixed points. We have them all already.&lt;br /&gt;&lt;br /&gt;In order to find these fixed points I'm going to use the &lt;tt&gt;runTableau&lt;/tt&gt; function we wrote last time. But that's for the next article in the series. In this post I want to put it to a simpler use:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Automatically Drawing Tableau Diagrams&lt;/b&gt;&lt;br /&gt;First we'll need a very simple library of functions for drawing ASCII art diagrams. We'll represent a drawing simply as a list of ASCII strings, one for each row. Think of diagrams as forming a box. The width of a box is the length of its longest row:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; width box = foldr max 0 (map length box)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Sometimes we'll want to pad the length of the rows so that they all have the same length:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; pad len b a = a ++ replicate (len - length a) b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The &lt;tt&gt;aside&lt;/tt&gt; function allows us to 'typeset' one box next to another:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; aside a b = let w = width a&lt;br /&gt;&amp;gt;                 h = max (length a) (length b)&lt;br /&gt;&amp;gt;                 a' = pad h [] a&lt;br /&gt;&amp;gt;                 b' = pad h [] b&lt;br /&gt;&amp;gt;                 a'' = map (pad w ' ') a'&lt;br /&gt;&amp;gt;             in zipWith (++) a'' b'&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can draw nice frames around our boxes:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; frame a = let w = width a&lt;br /&gt;&amp;gt;               h = length a&lt;br /&gt;&amp;gt;               strut = replicate h "|"&lt;br /&gt;&amp;gt;               rule = "+" ++ replicate w '-' ++ "+"&lt;br /&gt;&amp;gt;           in [rule] ++ strut `aside` a `aside` strut ++ [rule]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And the rule for disjunctions requires 
us to draw side-by-side boxes with a vertical line between them:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; alt a b = let h = max (length a) (length b)&lt;br /&gt;&amp;gt;               strut = replicate h " | "&lt;br /&gt;&amp;gt;           in a `aside` strut `aside` b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Generating diagrams is now a matter of generating a diagram for each of the 'hooks' in our algorithm:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; diagramRules = TableauRules {&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We'll modify the rules so that instead of simply returning a &lt;tt&gt;Bool&lt;/tt&gt; to indicate closure we return a pair. The first element is a list of strings representing the rows of the ASCII art, but the second element plays the same role as before:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     closes             = snd,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;These are the functions that handle discovery of a contradiction. They draw an "X" in a circle to indicate this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     foundF             = \_ -&amp;gt; (["(X)"], True),&lt;br /&gt;&amp;gt;     foundContradiction = \_ -&amp;gt; (["(X)"], True),&lt;br /&gt;&lt;br /&gt;&amp;gt;     open               = \_ -&amp;gt; ([], False),&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We deal with conjunctions simply by listing the two subpropositions that went into them:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     conjRule           = \a b -&amp;gt; first ([show a, show b] ++),&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;For disjunctions we draw the diagrams:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     disjRule           = \p a b (tl, lb) (tr, rb) -&amp;gt;&lt;br /&gt;&amp;gt;                             (([show a] ++ tl) `alt` ([show b] ++ tr), lb &amp;amp;&amp;amp; rb),&lt;br /&gt;&lt;br /&gt;&amp;gt;     doubleNegation     = \q -&amp;gt; first ([show q] ++),&lt;br /&gt;&amp;gt;     combineWorlds      = \(t0, b0) (t1, b1) -&amp;gt; (t0 ++ t1, b0 || b1),&lt;br /&gt;&amp;gt;     
processWorld       = \_ t -&amp;gt; t,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;When we create a box for a new world we 'import' some propositions from the parent world. We mark these with an asterisk:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     tableau            = \br (t, b) -&amp;gt; (frame (map (("* " ++) . show) br ++ map ("  "++) t), b)&lt;br /&gt;&amp;gt; }&lt;br /&gt;&lt;br /&gt;&amp;gt; diagram p = do&lt;br /&gt;&amp;gt;     let (t, b) = runTableau diagramRules [Neg p]&lt;br /&gt;&amp;gt;     mapM_ putStrLn t&lt;br /&gt;&amp;gt;     print b&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we can prove that we really do have a fixed point of &lt;tt&gt;f1&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; diagram1 = diagram $ let p = Box (Box F) --&amp;gt; Box F in p &amp;lt;-&amp;gt; f1 p&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;You may want to pick a really small font and stretch your terminal wide before you run that:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-DrTVYlmPLUI/TWg_pjbvNvI/AAAAAAAAAoE/NP6SB8jAYlk/s1600/Screen+shot+2011-02-25+at+3.41.03+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="180" src="http://2.bp.blogspot.com/-DrTVYlmPLUI/TWg_pjbvNvI/AAAAAAAAAoE/NP6SB8jAYlk/s320/Screen+shot+2011-02-25+at+3.41.03+PM.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;See how every box ends in an &lt;tt&gt;(X)&lt;/tt&gt;, just as we want.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;note&lt;/b&gt;&lt;br /&gt;To run the code above, just append this article to the previous one to make a single literate Haskell program.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2016935179779856749?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2016935179779856749/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=2016935179779856749' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2016935179779856749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2016935179779856749'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2011/02/generalising-godels-theorem-with.html' title='Generalising Gödel&apos;s Theorem with Multiple Worlds. Part III.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-DrTVYlmPLUI/TWg_pjbvNvI/AAAAAAAAAoE/NP6SB8jAYlk/s72-c/Screen+shot+2011-02-25+at+3.41.03+PM.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2220443145187268402</id><published>2011-01-30T18:01:00.000-08:00</published><updated>2011-01-30T18:19:08.230-08:00</updated><title type='text'>Quine Central</title><content type='html'>&lt;b&gt;Quines for Everyone&lt;/b&gt;&lt;br /&gt;This is an interruption to the sequence on provability logic (though it's not entirely unrelated).&lt;br /&gt;&lt;br /&gt;The code below spits out a Haskell program that prints out a Perl program that prints out a Python program that prints out a Ruby program that prints out a C program that prints out a Java program that prints out the original program. Nothing new, an obvious generalisation of &lt;a href="http://blog.sigfpe.com/2008/02/third-order-quine-in-three-languages.html"&gt;this&lt;/a&gt;. 
(Well, truth be told, it's a block of HTML that generates some Haskell code...)&lt;br /&gt;&lt;br /&gt;But there's one big difference. To allow everyone else to join in the fun you can configure it yourself! Any non-empty list of languages will do, including a list of length one. You may start hitting language line length limits if you make your list too long. Note that you can repeat languages so you can give someone some Haskell which decays into Perl after n iterations.&lt;br /&gt;&lt;br /&gt;The code is geared towards generating tightly packed code but it's easily adapted to generate something slightly more readable. Apologies for the C warnings. Trivial to fix.&lt;br /&gt;&lt;br /&gt;It's easily extended to support many more languages. C++, C#, Objective-C, OCaml, Prolog, Lisp, Scheme and Go, say, should all be trivial apart from maybe a tiny bit of work with delimiting strings. (E.g. for Go you may need to tweak the import statement slightly so it doesn't use double quotation marks.)&lt;br /&gt;&lt;br /&gt;The code leaves many opportunities for refactoring but it's not like anyone is actually going to use this code for real production so I'm leaving it as is for now.&lt;br /&gt;&lt;br /&gt;I've only tested it under Mac OS X. I don't know if there are carriage return/linefeed issues with other OSes. The shell script at the end is a regression test.&lt;br /&gt;&lt;br /&gt;There was a little bit of theory involved which I learnt from &lt;a href="http://www.amazon.com/gp/product/1575860082?ie=UTF8&amp;tag=sigfpe-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1575860082"&gt;Vicious Circles&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's a challenge for you: write a quine that takes as input the name of a language and outputs the same thing implemented in the input language. 
Much harder than what I just wrote.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Data.List&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here's the bit you can easily play with:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; langs = [Haskell, Perl, Python, Ruby, C, Java]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Implementation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Languages = Haskell | Ruby | Perl | C | Python | Java&lt;br /&gt;&lt;br /&gt;&amp;gt; sequenceFromString Haskell s = "map toEnum[" ++ (intercalate "," $&lt;br /&gt;&amp;gt;     map (\c -&amp;gt; show (fromEnum c)) s) ++ "]"&lt;br /&gt;&amp;gt; sequenceFromString Perl s    = (intercalate "," $&lt;br /&gt;&amp;gt;     map (\c -&amp;gt; "chr(" ++ show (fromEnum c) ++ ")") s)&lt;br /&gt;&amp;gt; sequenceFromString Python s  = (intercalate "+" $&lt;br /&gt;&amp;gt;     map (\c -&amp;gt; "chr(" ++ show (fromEnum c) ++ ")") s)&lt;br /&gt;&amp;gt; sequenceFromString Ruby s    = (intercalate "+" $&lt;br /&gt;&amp;gt;     map (\c -&amp;gt; show (fromEnum c) ++ ".chr") s)&lt;br /&gt;&amp;gt; sequenceFromString C s       = concatMap&lt;br /&gt;&amp;gt;     (\c -&amp;gt; "putchar(" ++ show (fromEnum c) ++ ");") s&lt;br /&gt;&amp;gt; sequenceFromString Java s    = concatMap&lt;br /&gt;&amp;gt;     (\c -&amp;gt; "o.write(" ++ show (fromEnum c) ++ ");") s&lt;br /&gt;&lt;br /&gt;&amp;gt; paramList' Haskell = intercalate " " . map (\n -&amp;gt; "a" ++ show n)&lt;br /&gt;&amp;gt; paramList' C       = intercalate "," . map (\n -&amp;gt; "char *a" ++ show n)&lt;br /&gt;&amp;gt; paramList' Python  = intercalate "," . map (\n -&amp;gt; "a" ++ show n)&lt;br /&gt;&amp;gt; paramList' Ruby    = intercalate "," . map (\n -&amp;gt; "a" ++ show n)&lt;br /&gt;&amp;gt; paramList' Java    = intercalate "," . 
map (\n -&amp;gt; "String a" ++ show n)&lt;br /&gt;&lt;br /&gt;&amp;gt; paramList Perl    _ = ""&lt;br /&gt;&amp;gt; paramList lang n = paramList' lang [0..n-1]&lt;br /&gt;&lt;br /&gt;&amp;gt; driver l args = defn l ++ intercalate (divider l) args ++ endDefn l&lt;br /&gt;&lt;br /&gt;&amp;gt; divider C       = "\",\""&lt;br /&gt;&amp;gt; divider Perl    = "','"&lt;br /&gt;&amp;gt; divider Ruby    = "\",\""&lt;br /&gt;&amp;gt; divider Python  = "\",\""&lt;br /&gt;&amp;gt; divider Haskell = "\" \""&lt;br /&gt;&amp;gt; divider Java    = "\",\""&lt;br /&gt;&lt;br /&gt;&amp;gt; defn C       = "main(){q(\""&lt;br /&gt;&amp;gt; defn Perl    = "&amp;amp;q('"&lt;br /&gt;&amp;gt; defn Python  = "q(\""&lt;br /&gt;&amp;gt; defn Ruby    = "q(\""&lt;br /&gt;&amp;gt; defn Haskell = "main=q \""&lt;br /&gt;&amp;gt; defn Java    = "public static void main(String[]args){q(\""&lt;br /&gt;&lt;br /&gt;&amp;gt; endDefn C       = "\");}"&lt;br /&gt;&amp;gt; endDefn Perl    = "')"&lt;br /&gt;&amp;gt; endDefn Python  = "\")"&lt;br /&gt;&amp;gt; endDefn Ruby    = "\")"&lt;br /&gt;&amp;gt; endDefn Haskell = "\""&lt;br /&gt;&amp;gt; endDefn Java    = "\");}}"&lt;br /&gt;&lt;br /&gt;&amp;gt; arg Haskell n = "a" ++ show n&lt;br /&gt;&amp;gt; arg Perl n    = "$_[" ++ show n ++ "]"&lt;br /&gt;&amp;gt; arg C n       = "printf(a" ++ show n ++ ");"&lt;br /&gt;&amp;gt; arg Python n  = "a" ++ show n&lt;br /&gt;&amp;gt; arg Ruby n    = "a" ++ show n&lt;br /&gt;&amp;gt; arg Java n    = "o.print(a" ++ show n ++ ");"&lt;br /&gt;&lt;br /&gt;&amp;gt; argDivide Haskell l = "++" ++ sequenceFromString Haskell (divider l) ++ "++"&lt;br /&gt;&amp;gt; argDivide Perl l    = ","    ++ sequenceFromString Perl (divider l) ++ ","&lt;br /&gt;&amp;gt; argDivide C l       = sequenceFromString C (divider l)&lt;br /&gt;&amp;gt; argDivide Python l  = "+" ++ sequenceFromString Python (divider l) ++ "+"&lt;br /&gt;&amp;gt; argDivide Ruby l    = "+" ++ sequenceFromString Ruby (divider l) ++ "+"&lt;br /&gt;&amp;gt; argDivide Java 
l    = sequenceFromString Java (divider l)&lt;br /&gt;&lt;br /&gt;&amp;gt; argList lang1 lang2 n = intercalate (argDivide lang1 lang2) $&lt;br /&gt;&amp;gt;     map (arg lang1) ([1..n-1] ++ [0])&lt;br /&gt;&lt;br /&gt;&amp;gt; fromTo Haskell l n = "q " ++ paramList Haskell n ++ "=putStrLn$a0++" ++&lt;br /&gt;&amp;gt;     sequenceFromString Haskell ("\n" ++ defn l) ++ "++" ++&lt;br /&gt;&amp;gt;     argList Haskell l n ++ "++" ++ sequenceFromString Haskell (endDefn l)&lt;br /&gt;&amp;gt; fromTo Perl    l n = "sub q {" ++ "print $_[0]," ++&lt;br /&gt;&amp;gt;     sequenceFromString Perl ("\n" ++ defn l) ++ "," ++ argList Perl l n ++ "," ++&lt;br /&gt;&amp;gt;     sequenceFromString Perl (endDefn l ++ "\n") ++ "}"&lt;br /&gt;&amp;gt; fromTo Python  l n = "def q(" ++ paramList Python n ++&lt;br /&gt;&amp;gt;     "): print a0+" ++ sequenceFromString Python ("\n" ++ defn l) ++&lt;br /&gt;&amp;gt;     "+" ++ argList Python l n ++ "+" ++ sequenceFromString Python (endDefn l)&lt;br /&gt;&amp;gt; fromTo Ruby    l n = "def q(" ++ paramList Ruby n ++&lt;br /&gt;&amp;gt;     ") print a0+" ++ sequenceFromString Ruby ("\n" ++ defn l) ++&lt;br /&gt;&amp;gt;     "+" ++ argList Ruby l n ++ "+" ++ sequenceFromString Ruby (endDefn l ++ "\n") ++ " end"&lt;br /&gt;&amp;gt; fromTo C       l n = "q(" ++ paramList C n ++ "){" ++ "printf(a0);" ++&lt;br /&gt;&amp;gt;     sequenceFromString C ("\n" ++ defn l) ++ argList C l n ++&lt;br /&gt;&amp;gt;     sequenceFromString C (endDefn l ++ "\n") ++ "}"&lt;br /&gt;&amp;gt; fromTo Java    l n = "public class quine{public static void q(" ++&lt;br /&gt;&amp;gt;     paramList Java n ++ "){java.io.PrintStream o=System.out;o.print(a0);" ++&lt;br /&gt;&amp;gt;     sequenceFromString Java ("\n" ++ defn l) ++ argList Java l n ++&lt;br /&gt;&amp;gt;     sequenceFromString Java (endDefn l ++ "\n") ++ "}"&lt;br /&gt;&lt;br /&gt;&amp;gt; main = do&lt;br /&gt;&amp;gt;     let n = length langs&lt;br /&gt;&amp;gt;     let langs' = cycle langs&lt;br /&gt;&amp;gt; 
    putStrLn $ fromTo (head langs') (head (tail langs')) n&lt;br /&gt;&amp;gt;     putStrLn $ driver (head langs') $ zipWith (\lang1 lang2 -&amp;gt; fromTo lang1 lang2 n)&lt;br /&gt;&amp;gt;         (take n (tail langs')) (tail (tail langs'))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Regression Test&lt;/b&gt;&lt;br /&gt;Assuming this article is stored in &lt;tt&gt;quineCentral.lhs&lt;/tt&gt;.&lt;br /&gt;&lt;pre&gt;runghc quineCentral.lhs&gt;1.hs&lt;br /&gt;cat 1.hs&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;runghc 1.hs&gt;2.pl&lt;br /&gt;cat 2.pl&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;perl 2.pl&gt;3.py&lt;br /&gt;cat 3.py&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;python 3.py&gt;4.ruby&lt;br /&gt;cat 4.ruby&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;ruby 4.ruby&gt;5.c&lt;br /&gt;cat 5.c&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;gcc -o 5 5.c&lt;br /&gt;./5&gt;quine.java&lt;br /&gt;cat quine.java&lt;br /&gt;echo "---------------------------------"&lt;br /&gt;javac quine.java&lt;br /&gt;java quine&gt;7.hs&lt;br /&gt;cat 7.hs&lt;br /&gt;diff 1.hs 7.hs&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2220443145187268402?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2220443145187268402/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=2220443145187268402' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2220443145187268402'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2220443145187268402'/><link rel='alternate' type='text/html' 
href='http://blog.sigfpe.com/2011/01/quine-central.html' title='Quine Central'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6550716211953466047</id><published>2010-12-24T08:27:00.000-08:00</published><updated>2010-12-24T11:25:42.417-08:00</updated><title type='text'>Generalising Gödel's Theorem with Multiple Worlds. Part II.</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;&lt;a href="http://blog.sigfpe.com/2010/12/generalising-godels-theorem-with.html"&gt;Last time&lt;/a&gt; we looked at a method for testing whether propositions of provability logic were valid by looking at the consequences of propositions within nested collections of worlds. This lends itself naturally to an algorithm that we can implement in Haskell. The diagrams I used last time are a variant on what are known as tableaux. But tableaux can be used in a number of ways and so we need code that is suitably generalised. In the code that follows I've ensured that the core algorithm has an interface that is suitable for carrying out the four tasks I'll demand of it. This means that the code leans more towards practicality than mathematical elegance.&lt;br /&gt;&lt;br /&gt;And I apologise in advance: this post is mostly a bunch of implementation details. We'll get back to some mathematics next time. Nonetheless, by the end of this post you'll have working code to test the validity of propositions of provability logic.&lt;br /&gt;&lt;br /&gt;Central to tableau algorithms is pattern matching. 
I'd like the Haskell pattern matcher to do much of the work, and to increase its flexibility I'll need this extension:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; {-# LANGUAGE ViewPatterns #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We'll need these libraries too:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Control.Applicative&lt;br /&gt;&amp;gt; import Control.Arrow&lt;br /&gt;&amp;gt; import Control.Monad&lt;br /&gt;&amp;gt; import Data.Function&lt;br /&gt;&amp;gt; import Data.List&lt;br /&gt;&amp;gt; import Data.Maybe&lt;br /&gt;&amp;gt; import Text.Show&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Logical Propositions&lt;/b&gt;&lt;br /&gt;Now we'll need a bunch of logical operators. The first three are constructors for a proposition type and the rest are sugar to make expressions look nicer:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; infixr 1 :-&amp;gt;&lt;br /&gt;&amp;gt; infixr 2 :\/&lt;br /&gt;&amp;gt; infixr 3 :/\&lt;br /&gt;&lt;br /&gt;&amp;gt; infixr 1 --&amp;gt;&lt;br /&gt;&amp;gt; infixr 1 &amp;lt;--&lt;br /&gt;&amp;gt; infixr 1 &amp;lt;-&amp;gt;&lt;br /&gt;&amp;gt; infixr 2 \/&lt;br /&gt;&amp;gt; infixr 3 /\&lt;br /&gt;&lt;br /&gt;&amp;gt; (\/)    = (:\/)&lt;br /&gt;&amp;gt; (/\)    = (:/\)&lt;br /&gt;&amp;gt; (--&amp;gt;)   = (:-&amp;gt;)&lt;br /&gt;&amp;gt; (&amp;lt;--)   = flip (:-&amp;gt;)&lt;br /&gt;&amp;gt; p &amp;lt;-&amp;gt; q = (p :-&amp;gt; q) :/\ (q :-&amp;gt; p)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here's our basic proposition type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Prop = Letter String&lt;br /&gt;&amp;gt;           | Prop :\/ Prop&lt;br /&gt;&amp;gt;           | Prop :/\ Prop&lt;br /&gt;&amp;gt;           | Prop :-&amp;gt; Prop&lt;br /&gt;&amp;gt;           | Box Prop&lt;br /&gt;&amp;gt;           | Dia Prop&lt;br /&gt;&amp;gt;           | F&lt;br /&gt;&amp;gt;           | T&lt;br /&gt;&amp;gt;           | Neg Prop deriving (Eq, Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I want &lt;tt&gt;show&lt;/tt&gt; to know about operator precedence for 
propositions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Show Prop where&lt;br /&gt;&amp;gt;     showsPrec p (a :/\ b) = showParen (p&amp;gt;3) $ showsPrec 3 a . showString " /\\ " . showsPrec 3 b&lt;br /&gt;&amp;gt;     showsPrec p (a :\/ b) = showParen (p&amp;gt;2) $ showsPrec 2 a . showString " \\/ " . showsPrec 2 b&lt;br /&gt;&amp;gt;     showsPrec p (a :-&amp;gt; b) = showParen (p&amp;gt;1) $ showsPrec 1 a . showString " --&amp;gt; " . showsPrec 1 b&lt;br /&gt;&amp;gt;     showsPrec p (Neg r)   = showParen (p&amp;gt;4) $ showString "Neg " . showsPrec 5 r&lt;br /&gt;&amp;gt;     showsPrec p (Box r)   = showParen (p&amp;gt;4) $ showString "Box " . showsPrec 5 r&lt;br /&gt;&amp;gt;     showsPrec p (Dia r)   = showParen (p&amp;gt;4) $ showString "Dia " . showsPrec 5 r&lt;br /&gt;&amp;gt;     showsPrec p (Letter n)= showParen (p&amp;gt;5) $ showsPrec 6 n&lt;br /&gt;&amp;gt;     showsPrec p T         = showString "T"&lt;br /&gt;&amp;gt;     showsPrec p F         = showString "F"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some simple rules for simplification of some logical expressions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; simplify p = let simplify' (a :\/ F) = a&lt;br /&gt;&amp;gt;                  simplify' (F :\/ b) = b&lt;br /&gt;&amp;gt;                  simplify' (a :/\ T) = a&lt;br /&gt;&amp;gt;                  simplify' (T :/\ b) = b&lt;br /&gt;&amp;gt;                  simplify' (a :\/ T) = T&lt;br /&gt;&amp;gt;                  simplify' (T :\/ b) = T&lt;br /&gt;&amp;gt;                  simplify' (a :/\ F) = F&lt;br /&gt;&amp;gt;                  simplify' (F :/\ b) = F&lt;br /&gt;&amp;gt;                  simplify' (F :-&amp;gt; b) = T&lt;br /&gt;&amp;gt;                  simplify' (T :-&amp;gt; b) = b&lt;br /&gt;&amp;gt;                  simplify' (a :-&amp;gt; F) = Neg a&lt;br /&gt;&amp;gt;                  simplify' (a :-&amp;gt; T) = T&lt;br /&gt;&amp;gt;                  simplify' (Neg T) = F&lt;br /&gt;&amp;gt;                  simplify' 
(Neg F) = T&lt;br /&gt;&amp;gt;                  simplify' (Box T) = T&lt;br /&gt;&amp;gt;                  simplify' (Dia F) = F&lt;br /&gt;&amp;gt;                  simplify' z = z&lt;br /&gt;&amp;gt;    in case p of&lt;br /&gt;&amp;gt;        a :/\ b -&amp;gt; let a' = simplify a&lt;br /&gt;&amp;gt;                       b' = simplify b&lt;br /&gt;&amp;gt;                   in simplify' (a' :/\ b')&lt;br /&gt;&amp;gt;        a :\/ b -&amp;gt; let a' = simplify a&lt;br /&gt;&amp;gt;                       b' = simplify b&lt;br /&gt;&amp;gt;                   in simplify' (a' :\/ b')&lt;br /&gt;&amp;gt;        a :-&amp;gt; b -&amp;gt; simplify' (simplify a :-&amp;gt; simplify b)&lt;br /&gt;&amp;gt;        Box a   -&amp;gt; simplify' (Box (simplify a))&lt;br /&gt;&amp;gt;        Dia a   -&amp;gt; simplify' (Dia (simplify a))&lt;br /&gt;&amp;gt;        Neg (Neg a) -&amp;gt; simplify a&lt;br /&gt;&amp;gt;        a           -&amp;gt; a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Kinds of Proposition&lt;/b&gt;&lt;br /&gt;I'm actually going to use more than one proposition type. So when we do case analysis I need to make my patterns more abstract so they can work with multiple types. I'm going to use the &lt;tt&gt;PropType&lt;/tt&gt; type to represent the ways we're going to classify logical propositions. The types are:&lt;br /&gt;&lt;br /&gt;1. &lt;tt&gt;Atomic&lt;/tt&gt;: A single letter or its negation.&lt;br /&gt;2. &lt;tt&gt;Constant&lt;/tt&gt;: Simply &lt;tt&gt;T&lt;/tt&gt; or &lt;tt&gt;F&lt;/tt&gt; or a negation thereof.&lt;br /&gt;3. &lt;tt&gt;DoubleNegation&lt;/tt&gt;.&lt;br /&gt;4. &lt;tt&gt;Disjunction&lt;/tt&gt;: used to represent things like a&amp;or;b or &amp;not;(a&amp;and;b).&lt;br /&gt;5. &lt;tt&gt;Conjunction&lt;/tt&gt;: used to represent things like a&amp;and;b or &amp;not;(a&amp;or;b).&lt;br /&gt;6. &lt;tt&gt;Provability&lt;/tt&gt;: These are statements about provability like those starting with &amp;#x25fb; or &amp;not;&amp;#x25ca;.&lt;br /&gt;7. 
&lt;tt&gt;Consistency&lt;/tt&gt;: These are statements about consistency like those starting with &amp;#x25ca; or &amp;not;&amp;#x25fb;.&lt;br /&gt;&lt;br /&gt;As this type is a simple container we can make it a functor too.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data PropType a = Atomic a&lt;br /&gt;&amp;gt;                 | Constant a&lt;br /&gt;&amp;gt;                 | DoubleNegation a&lt;br /&gt;&amp;gt;                 | Disjunction a a&lt;br /&gt;&amp;gt;                 | Conjunction a a&lt;br /&gt;&amp;gt;                 | Provability a&lt;br /&gt;&amp;gt;                 | Consistency a&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Functor PropType where&lt;br /&gt;&amp;gt;     fmap f (Atomic a)         = Atomic (f a)&lt;br /&gt;&amp;gt;     fmap f (Constant a)       = Constant (f a)&lt;br /&gt;&amp;gt;     fmap f (DoubleNegation a) = DoubleNegation (f a)&lt;br /&gt;&amp;gt;     fmap f (Provability a)    = Provability (f a)&lt;br /&gt;&amp;gt;     fmap f (Consistency a)    = Consistency (f a)&lt;br /&gt;&amp;gt;     fmap f (Conjunction a b)  = Conjunction (f a) (f b)&lt;br /&gt;&amp;gt;     fmap f (Disjunction a b)  = Disjunction (f a) (f b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I'll introduce a typeclass that will allow us to use the &lt;tt&gt;PropType&lt;/tt&gt; view to query propositions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; class PropTypeable a where&lt;br /&gt;&amp;gt;     propType :: a -&amp;gt; PropType a&lt;br /&gt;&amp;gt;     neg      :: a -&amp;gt; a&lt;br /&gt;&amp;gt;     isF      :: a -&amp;gt; Bool&lt;br /&gt;&amp;gt;     negative :: a -&amp;gt; Bool&lt;br /&gt;&amp;gt;     positiveComponent :: a -&amp;gt; Prop&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now here we have the cases that I summarised in English above:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance PropTypeable Prop where&lt;br /&gt;&amp;gt;     propType (a :\/ b)        = Disjunction a b&lt;br /&gt;&amp;gt;     propType (a :/\ b)        = Conjunction a b&lt;br 
/&gt;&amp;gt;     propType (Neg (a :\/ b))  = Conjunction (Neg a) (Neg b)&lt;br /&gt;&amp;gt;     propType (Neg (a :/\ b))  = Disjunction (Neg a) (Neg b)&lt;br /&gt;&amp;gt;     propType (a :-&amp;gt; b)        = Disjunction (Neg a) b&lt;br /&gt;&amp;gt;     propType (Neg (a :-&amp;gt; b))  = Conjunction a (Neg b)&lt;br /&gt;&amp;gt;     propType (Neg (Neg a))    = DoubleNegation a&lt;br /&gt;&amp;gt;     propType (Box a)          = Provability a&lt;br /&gt;&amp;gt;     propType (Neg (Box a))    = Consistency (Neg a)&lt;br /&gt;&amp;gt;     propType (Dia a)          = Consistency a&lt;br /&gt;&amp;gt;     propType (Neg (Dia a))    = Provability (Neg a)&lt;br /&gt;&amp;gt;     propType (Letter a)       = Atomic (Letter a)&lt;br /&gt;&amp;gt;     propType (Neg (Letter a)) = Atomic (Neg (Letter a))&lt;br /&gt;&amp;gt;     propType T                = Constant T&lt;br /&gt;&amp;gt;     propType F                = Constant F&lt;br /&gt;&amp;gt;     propType (Neg F)          = Constant T&lt;br /&gt;&amp;gt;     propType (Neg T)          = Constant F&lt;br /&gt;&amp;gt;     neg                       = Neg&lt;br /&gt;&amp;gt;     isF F                     = True&lt;br /&gt;&amp;gt;     isF (Neg T)               = True&lt;br /&gt;&amp;gt;     isF _                     = False&lt;br /&gt;&amp;gt;     positiveComponent (Neg a) = a&lt;br /&gt;&amp;gt;     positiveComponent a       = a&lt;br /&gt;&amp;gt;     negative (Neg _)          = True&lt;br /&gt;&amp;gt;     negative _                = False&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;It'll be a while before we need the full generality so it's going to seem like overkill for the moment!&lt;br /&gt;&lt;br /&gt;And some pre-packaged letters for convenience:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; [a, b, c, d, p, q, r, s, t] = map (Letter . 
return) "abcdpqrst"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We're going to need some operations that act on lists.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;placesWhere&lt;/tt&gt; finds all of the elements of a list for which some predicate holds. Instead of just listing the elements that match, it lists the elements paired with the rest of the list after the matching element is removed. We can think of these pairs as elements and their surrounding context:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; placesWhere p []     = []&lt;br /&gt;&amp;gt; placesWhere p (x:xs) = let r = map (second (x:)) $ placesWhere p xs&lt;br /&gt;&amp;gt;                        in if p x then ((x, xs) : r) else r&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This finds something in the intersection of two sets using a given 'equality' predicate for matching. As we may be using a predicate different from &lt;tt&gt;==&lt;/tt&gt; we need to see both of the (possibly different) elements that satisfy the predicate.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; findIntersection eq a b = listToMaybe [(x, y) | x &amp;lt;- a, y &amp;lt;- b, x `eq` y]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Sometimes we'll meet propositions that we can start to reason about using ordinary propositional calculus. 
These will match the &lt;tt&gt;propositional&lt;/tt&gt; predicate:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; propositional (propType -&amp;gt; DoubleNegation _) = True&lt;br /&gt;&amp;gt; propositional (propType -&amp;gt; Conjunction _ _)  = True&lt;br /&gt;&amp;gt; propositional (propType -&amp;gt; Disjunction _ _)  = True&lt;br /&gt;&amp;gt; propositional _                              = False&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;On the other hand the &lt;tt&gt;provability&lt;/tt&gt; predicate is used to identify propositions that need rules pertaining to provability and consistency:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; provability (propType -&amp;gt; Provability _) = True&lt;br /&gt;&amp;gt; provability (propType -&amp;gt; Consistency _) = True&lt;br /&gt;&amp;gt; provability _                           = False&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;The Algorithm&lt;/b&gt;&lt;br /&gt;And now we're almost ready to implement the tableau rules. Because we'll be using tableaux in a number of different ways I need lots of hooks into the algorithm that can perform different operations. I've collected all of these hooks into a single type. The algorithm will take a proposition of type &lt;tt&gt;prop&lt;/tt&gt; (which will be &lt;tt&gt;Prop&lt;/tt&gt; for the first three cases) and produce something of type &lt;tt&gt;result&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data TableauRules prop result = TableauRules {&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;If our result corresponds to a world that is self-contradictory it is said to close. Here's how we indicate a closed (and hence not really existing) world. In the simplest case we won't actually store any information about a world, just whether or not it closes. 
So &lt;tt&gt;closes&lt;/tt&gt; will be the identity function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     closes :: result -&amp;gt; Bool,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Occasionally we'll find a world with something obviously equivalent to &lt;tt&gt;F&lt;/tt&gt; in it. It closes. Here's what we want to return in that case. The argument is the offending proposition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     foundF :: prop -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Sometimes we'll find a pair of propositions that obviously contradict each other, like a and &amp;not;a:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     foundContradiction :: (prop, prop) -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And sometimes we'll find an open world (i.e. a real non-closed one). This function gets handed the list of propositions that have been found to hold in it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     open :: [prop] -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here's what we do when we find a conjunction. I hope you remember that when we meet a conjunction we can delete it and replace it with the two subpropositions. &lt;tt&gt;conjRule&lt;/tt&gt; is handed the subpropositions as well as the result from proceeding with the tableau rule for conjunctions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     conjRule :: prop -&amp;gt; prop -&amp;gt; result -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Disjunctions work a little differently. When handling a&amp;or;b, say, we need to handle two subtableaux, one with a and one with b. 
The first argument to &lt;tt&gt;disjRule rules&lt;/tt&gt; is the disjunction itself, the next two are the left and right subpropositions, and the last two arguments are the results of continuing the two subtableaux.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     disjRule :: prop -&amp;gt; prop -&amp;gt; prop -&amp;gt; result -&amp;gt; result -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;With &lt;tt&gt;doubleNegation&lt;/tt&gt; we get to see propositions that have undergone double negation elimination.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     doubleNegation :: prop -&amp;gt; result -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;When we use &lt;tt&gt;Dia&lt;/tt&gt; to open new worlds we need to ensure that each of these subworlds is valid. Each subworld is processed with &lt;tt&gt;processWorld&lt;/tt&gt;. For example, when we're drawing tableau diagrams we can use this hook to draw a frame around the subtableaux. We then fold together these subworlds using &lt;tt&gt;combineWorlds&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     processWorld  :: prop -&amp;gt; result -&amp;gt; result,&lt;br /&gt;&amp;gt;     combineWorlds :: result -&amp;gt; result -&amp;gt; result,&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Lastly we have our driver function that kicks off the whole tableau algorithm on a list of propositions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;     tableau :: [prop] -&amp;gt; result -&amp;gt; result&lt;br /&gt;&amp;gt; }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now implement a tableau computation algorithm that supports these hooks. We'll start with the check for whether or not we have an immediate contradiction:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; simpleClosure rules ps = case find isF ps of&lt;br /&gt;&amp;gt;    Just a  -&amp;gt; foundF rules a&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Split the propositions into those that are negated and those that are not. 
We're looking for propositions in one part that directly contradict propositions in the other:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    Nothing -&amp;gt;&lt;br /&gt;&amp;gt;      let (neg, pos) = partition negative ps&lt;br /&gt;&amp;gt;          maybePair = findIntersection ((==) `on` positiveComponent) neg pos&lt;br /&gt;&amp;gt;      in case maybePair of&lt;br /&gt;&amp;gt;          Just pair -&amp;gt; foundContradiction rules pair&lt;br /&gt;&amp;gt;          Nothing   -&amp;gt; open rules ps&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Double negation elimination is straightforward to apply. Delete the original proposition and replace it with the version without double negation:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; applyDNeg rules p a props = doubleNegation rules a $&lt;br /&gt;&amp;gt;   applyPropositional rules (a : delete p props)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The rule for handling conjunctions. We delete the conjunction from the current list of propositions and replace it with the two subpropositions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; applyConj rules p a b props = conjRule rules a b $&lt;br /&gt;&amp;gt;   applyPropositional rules (a : b : delete p props)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Disjunctions require running two separate subtableaux:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; applyDisj rules p a b props =&lt;br /&gt;&amp;gt;    let props' = delete p props&lt;br /&gt;&amp;gt;        left   = applyPropositional rules (a : props')&lt;br /&gt;&amp;gt;        right  = applyPropositional rules (b : props')&lt;br /&gt;&amp;gt;    in disjRule rules p a b left right&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here we tie together the rules for propositional calculus. 
We use a bit of case analysis to decide which rule to apply, and if no rule applies we try the provability rules instead:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; applyPropositional rules props =&lt;br /&gt;&amp;gt;     let t = simpleClosure rules props in if closes rules t&lt;br /&gt;&amp;gt;         then t&lt;br /&gt;&amp;gt;         else case find propositional props of&lt;br /&gt;&amp;gt;             Nothing -&amp;gt; applyProvability t rules props&lt;br /&gt;&amp;gt;             Just p  -&amp;gt; case p of&lt;br /&gt;&amp;gt;                 (propType -&amp;gt; DoubleNegation q) -&amp;gt; applyDNeg rules p q props&lt;br /&gt;&amp;gt;                 (propType -&amp;gt; Conjunction a b)  -&amp;gt; applyConj rules p a b props&lt;br /&gt;&amp;gt;                 (propType -&amp;gt; Disjunction a b)  -&amp;gt; applyDisj rules p a b props&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;When we've exhausted all possible rules from propositional calculus we scan for propositions like &lt;tt&gt;Dia p&lt;/tt&gt; or &lt;tt&gt;Neg (Box p)&lt;/tt&gt;. These may imply the existence of subworlds. We then try to instantiate these subworlds, seeded according to the rules I gave in the previous article:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; applyProvability t rules props =&lt;br /&gt;&amp;gt;     let impliedWorlds = placesWhere consistency props&lt;br /&gt;&lt;br /&gt;&amp;gt;         consistency (propType -&amp;gt; Consistency _) = True&lt;br /&gt;&amp;gt;         consistency _ = False&lt;br /&gt;&lt;br /&gt;&amp;gt;         testWorld (p@(propType -&amp;gt; Consistency q), props) =&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;In the following line, &lt;tt&gt;neg p&lt;/tt&gt; corresponds to the application of Löb's Theorem. 
&lt;tt&gt;provabilities&lt;/tt&gt; is the list of propositions inherited by a subworld from statements about provability in the parent:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;              let tableau = runTableau rules (q : neg p : provabilities)&lt;br /&gt;&amp;gt;                  provabilities = do &lt;br /&gt;&amp;gt;                      p@(propType -&amp;gt; Provability q) &amp;lt;- props&lt;br /&gt;&amp;gt;                      [p, q]&lt;br /&gt;&amp;gt;              in processWorld rules p tableau&lt;br /&gt;&lt;br /&gt;&amp;gt;     in foldr (combineWorlds rules) t (map testWorld impliedWorlds)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And finally, here's where we kick our algorithm off:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; runTableau rules props = tableau rules props $ applyPropositional rules props&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;Testing Validity&lt;/b&gt;&lt;br /&gt;We can now use a set of simple rules to test the validity of propositions. As mentioned above, the return data is a &lt;tt&gt;Bool&lt;/tt&gt; used to indicate whether a subworld was invalid. By and large, these rules do the trivial thing. Note how the rule for disjunction requires both of the alternatives to be invalid in order to completely invalidate, so we use &lt;tt&gt;(&amp;amp;&amp;amp;)&lt;/tt&gt;. 
But when considering subworlds, just one bad subworld is enough to invalidate a world:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; validRules = TableauRules {&lt;br /&gt;&amp;gt;     closes = id,&lt;br /&gt;&amp;gt;     open   = \_ -&amp;gt; False,&lt;br /&gt;&lt;br /&gt;&amp;gt;     foundF             = \_ -&amp;gt; True,&lt;br /&gt;&amp;gt;     foundContradiction = \_ -&amp;gt; True,&lt;br /&gt;&lt;br /&gt;&amp;gt;     conjRule       = \_ _ t -&amp;gt; t,&lt;br /&gt;&amp;gt;     disjRule       = \_ _ _ -&amp;gt; (&amp;amp;&amp;amp;),&lt;br /&gt;&amp;gt;     doubleNegation = \_ t -&amp;gt; t,&lt;br /&gt;&lt;br /&gt;&amp;gt;     combineWorlds = (||),&lt;br /&gt;&amp;gt;     processWorld  = \_ t -&amp;gt; t,&lt;br /&gt;&lt;br /&gt;&amp;gt;     tableau = \_ t -&amp;gt; t&lt;br /&gt;&amp;gt; }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can now write a simple validity test. We negate the proposition we're interested in and test whether the implied world closes:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; valid p = runTableau validRules [neg p]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here's a small regression test to ensure everything works. 
It's just a bunch of examples that I worked out by hand or lifted from Boolos's book:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; valids = [&lt;br /&gt;&amp;gt;         T,&lt;br /&gt;&amp;gt;         a :-&amp;gt; a,&lt;br /&gt;&amp;gt;         Box a :-&amp;gt; Box a,&lt;br /&gt;&amp;gt;         Box a :-&amp;gt; Box (Box a),&lt;br /&gt;&amp;gt;         Box (Box a :-&amp;gt; a) :-&amp;gt; Box a,&lt;br /&gt;&amp;gt;         Box F &amp;lt;-&amp;gt; Box (Dia T),&lt;br /&gt;&amp;gt;         let x = p :/\ q :-&amp;gt; r :-&amp;gt; a in Box (Box x :-&amp;gt; x) :-&amp;gt; Box x,&lt;br /&gt;&amp;gt;         F :-&amp;gt; Dia p,&lt;br /&gt;&amp;gt;         Box (Dia p) :-&amp;gt; Box (Box F :-&amp;gt; F),&lt;br /&gt;&amp;gt;         (Box F \/ q /\ Dia (Box F /\ Neg q)) &amp;lt;-&amp;gt;&lt;br /&gt;&amp;gt;           (Dia (Box F \/ q /\ Dia (Box F /\ Neg q))&lt;br /&gt;&amp;gt;           --&amp;gt; q /\ Neg (Box (Box F \/ q /\ Dia (Box F /\ Neg q)&lt;br /&gt;&amp;gt;           --&amp;gt; q)))&lt;br /&gt;&amp;gt;     ]&lt;br /&gt;&lt;br /&gt;&amp;gt; invalids = [&lt;br /&gt;&amp;gt;         F,&lt;br /&gt;&amp;gt;         a :-&amp;gt; Box a,&lt;br /&gt;&amp;gt;         Box a :-&amp;gt; a,&lt;br /&gt;&amp;gt;         Box (Box a :-&amp;gt; a) :-&amp;gt; a,&lt;br /&gt;&amp;gt;         Dia T,&lt;br /&gt;&amp;gt;         Box (Dia T),&lt;br /&gt;&amp;gt;         Neg (Box F),&lt;br /&gt;&amp;gt;         (Box F \/ p /\ Dia (Box F /\ Neg q)) &amp;lt;-&amp;gt;&lt;br /&gt;&amp;gt;           (Dia (Box F \/ q /\ Dia (Box F /\ Neg q))&lt;br /&gt;&amp;gt;           --&amp;gt; q /\ Neg (Box (Box F \/ q /\ Dia (Box F /\ Neg q)&lt;br /&gt;&amp;gt;           --&amp;gt; q)))&lt;br /&gt;&amp;gt;     ]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;If everything is working, &lt;tt&gt;regress1&lt;/tt&gt; should give the result &lt;tt&gt;True&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; regress1 = do&lt;br /&gt;&amp;gt;     print $ (and $ map valid valids) &amp;amp;&amp;amp;&lt;br /&gt;&amp;gt;             
(and $ map (not . valid) invalids)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;That's enough implementation. In the next installment we'll start putting this code to work.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;References&lt;/b&gt;&lt;br /&gt;This code is an implementation of the algorithm described in Chapter 10 of &lt;a href="http://www.amazon.com/gp/product/0521483255?ie=UTF8&amp;tag=sigfpe-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0521483255"&gt;The Logic of Provability&lt;/a&gt;. See that book if you want a proof that (1) the above algorithm always terminates and (2) it really does correctly decide the validity of propositions of provability logic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6550716211953466047?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6550716211953466047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6550716211953466047' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6550716211953466047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6550716211953466047'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/12/generalising-godels-theorem-with_24.html' title='Generalising Gödel&apos;s Theorem with Multiple Worlds. 
Part II.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6530290512510321373</id><published>2010-12-11T15:58:00.000-08:00</published><updated>2010-12-20T18:01:01.627-08:00</updated><title type='text'>Generalising Gödel's Theorem with Multiple Worlds. Part I.</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;In his first &lt;a href="http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems"&gt;incompleteness theorem&lt;/a&gt;, Gödel showed us that we can construct a sentence that denies its own provability. In his second incompleteness theorem he showed that an example of such a sentence is the one that asserts the consistency of arithmetic. If arithmetic is consistent then it can't prove its own consistency. On the other hand, if arithmetic is inconsistent then we can prove anything, and hence we can prove its consistency.&lt;br /&gt;&lt;br /&gt;Can we generalise what Gödel did? For example, can we construct sentences that we can prove assert their own provability? What about sentences that deny that their provability is provable? Or what about sentences that assert that if they're provable then it's not provable that it's inconsistent that they imply that they're inconsistent with the rest of arithmetic?&lt;br /&gt;&lt;br /&gt;Not only can we do these things, we can also write a computer program that generates such theorems for us. 
We can do so by working with the idea that a consistent set of axioms describes a world, and that a set of axioms able to talk about sets of axioms describes a world within a world, and a set of axioms that...well you can guess how it goes. A bit like Inception really.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Gödel's Theorem&lt;/b&gt;&lt;br /&gt;Briefly, Gödel's first incompleteness theorem goes like this: we work with &lt;a href="http://en.wikipedia.org/wiki/Peano_axioms"&gt;PA&lt;/a&gt;, the logical system built from Peano's Axioms. Within PA we can state and prove theorems about arithmetic like the fact that 1+2=3 or that for any prime there is always another greater prime. But even though PA is about numbers, we can talk about other things if we can encode them as numbers. In particular, Gödel came up with a scheme to encode propositions of PA as numbers. We use [P] to represent the number for the proposition P in Gödel's scheme. A proof is essentially just a list of propositions where each one is connected to some earlier ones by simple mechanical rules. These rules can be turned into arithmetical statements about the Gödel numbers for the propositions. So in the language of PA it is possible to assert that a particular number is the Gödel number of a provable proposition. Let Prov(n) denote the proposition of PA that says that n is the Gödel number of a provable proposition. Prov(n) is a proposition of PA and so is ¬Prov(n). Imagine we could find a proposition G with the property that G↔¬Prov([G]). It would assert its own unprovability. But it also appears to involve stuffing a representation of the Gödel number of G within G as well as all the rules for determining provability. Amazingly Gödel figured out how to do this (using tricks not unlike those used for quining). If G were false, G would be provable. 
So if we trust PA as a means of reasoning about proofs in PA, then G is true, though it can't be proved using the axioms of PA.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Provability Logic&lt;/b&gt;&lt;br /&gt;We're going to be interested specifically in provability, so we don't need all of the power of PA. So we're going to work with a simplified domain specific logic called GL (for Gödel-Löb), otherwise known as &lt;a href="http://en.wikipedia.org/wiki/Provability_logic"&gt;Provability Logic&lt;/a&gt;. The idea is that sentences of GL will be shorthand for classes of statement in PA. GL will contain &lt;a href="http://en.wikipedia.org/wiki/Propositional_calculus"&gt;propositional calculus&lt;/a&gt;. So here's an example statement in GL: p∧q. The unknowns p and q represent propositions of PA and statements of GL are considered valid if they're provable whatever statements of PA they represent. For example, we could assign p="there are at least 10 primes" and q="7&amp;gt;2", in which case p∧q holds. But we could assign p="there are just 10 primes" in which case it's false. So p∧q isn't a valid proposition of GL as it doesn't hold for all p and q. On the other hand p→p∨q is valid because no matter what crazy propositions we assign to p and q, p→p∨q is true. (Of course, when I say "there are at least 10 primes", I mean a long and complicated sentence of PA that amounts to the same thing.)&lt;br /&gt;&lt;br /&gt;But there's more to GL than propositional calculus. It also has the one-argument predicate ◻ which asserts that its argument is provable. More precisely, ◻p says that whatever proposition of PA p represents, let's say it's P, we have Prov([P]). It's just shorthand. Here's another example: ◻(q→◻p) says that whatever P we assign to p, and whatever Q we assign to q, Prov([Q →Prov([P])]). 
Or in English it says "for any propositions P and Q, it is provable that Q implies that P is provable".&lt;br /&gt;&lt;br /&gt;We'll use ⊤ and ⊥ from propositional calculus as the always true and always false propositions in the usual way. ◻⊥ is the assertion that we can prove ⊥. In other words it's the assertion that PA is inconsistent. So now we can state Gödel's second incompleteness theorem: ¬◻⊥→¬◻¬◻⊥. If PA is consistent, PA can't prove its own consistency.&lt;br /&gt;&lt;br /&gt;We can also introduce the symbol ◊. This is just shorthand for ¬◻¬. ◊p says that it's not provable that p is false. In other words, it says that p is consistent with the rest of PA. So ◻ is provability and ◊ is consistency.&lt;br /&gt;&lt;br /&gt;A set of assignments of propositions of PA to a bunch of letters in GL can be thought of as a world. For example, we can imagine a world in which p="2&amp;gt;1" and q="2&amp;lt;1". In this world, we have p and ¬q. The GL proposition ◻p says that we have a proof of p so it must be true in all worlds. Conversely, ◊p says that it's not true that there is no world where p holds. In other words, it asserts the existence of a world where p holds. So worlds can talk about other worlds. If we have ◊◊p in some world, then it's asserting that there's another world in which ◊p holds. In other words, ◊◊p asserts there is a world in which it is asserted that there is another world where p holds. We can draw pictures to represent this. If a world has propositions that talk about another world then we draw the talked-about world as a kind of subworld. 
Here's how we can picture ◊◊p:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPRquQ3BVI/AAAAAAAAAlc/5_cTlQWF450/s1600/dia_dia_p.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPRquQ3BVI/AAAAAAAAAlc/5_cTlQWF450/s320/dia_dia_p.png" width="266" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Worlds are assignments of propositions of PA to letters of GL. But most of the time we won't particularly care about specific propositions themselves like "2&amp;gt;1". We'll be more interested in what truth assignments we can make to the propositions represented by the letters. So we can think of a world as a place where the letters of GL have been assigned truth values, true or false. And we can think of each world as containing subworlds consisting of propositions that can be proved or disproved by their parent worlds.&lt;br /&gt;&lt;br /&gt;I'm going to spend most of my time looking at how we can explore and unfold all of the implications contained in a world.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Rules of the Game&lt;/b&gt;&lt;br /&gt;We can give some rules. 
If a world contains p∧q like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPd-eyP7xI/AAAAAAAAAls/Qf1dsaS9HY4/s1600/p_and_q.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPd-eyP7xI/AAAAAAAAAls/Qf1dsaS9HY4/s1600/p_and_q.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;then it must contain both p and q as well:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPd5daBX_I/AAAAAAAAAlo/sL5s4ylFljo/s1600/p_and_q2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPd5daBX_I/AAAAAAAAAlo/sL5s4ylFljo/s1600/p_and_q2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I crossed out the p∧q as it became redundant. It's important to note that when I wrote p∧q we could have any propositions of GL standing in for p and q. So a world containing (p∨q)∧r must also contain p∨q and r. Other rules may apply too. If we had p∧q and r∧s we can unfold both to get p, q, r and s. This rule also kicks in if it applies to negated propositions that become conjunctions if we "push down" the negation using de Morgan's laws. For example if a world contains ¬(p→q) then it also contains p and ¬q.&lt;br /&gt;&lt;br /&gt;If a world contains p∨q then we don't know exactly what it looks like.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPgLaM6kRI/AAAAAAAAAl0/NvenBo5K0BU/s1600/p_or_q.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPgLaM6kRI/AAAAAAAAAl0/NvenBo5K0BU/s1600/p_or_q.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;There are two possibilities. 
We can draw both together like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPgLjfXllI/AAAAAAAAAl4/iiFd9g5788c/s1600/p_or_q2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPgLjfXllI/AAAAAAAAAl4/iiFd9g5788c/s1600/p_or_q2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The line in the middle means that the world either looks like what's on the left or it looks like what's on the right. Once we've indicated that there are two alternatives, the original proposition becomes redundant again. When we have a vertical line like this then everything above the vertical line applies in both the left and right possibilities. This saves us having to split our world into two and copy all of our formulae to both sides. As in the case of conjunctions, this rule also kicks in for other kinds of disjunction. So a world containing p→q splits into two separate worlds headed by ¬p and q.&lt;br /&gt;&lt;br /&gt;If a world contains ⊥, or a contradiction, it means that it wasn't really a valid world after all. If we meet a world like this, we know that our starting point must have been self-contradictory. 
If a world implies the existence of a subworld that isn't valid, then it must itself be invalid:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPhMFVMkmI/AAAAAAAAAmA/pToBmBvLONs/s1600/bot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPhMFVMkmI/AAAAAAAAAmA/pToBmBvLONs/s1600/bot.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;On the other hand, if we've split a world into two possibilities because we found p∨q then even if one branch is invalid the world might still be valid if the other branch is valid.&lt;br /&gt;&lt;br /&gt;If we have a world containing ◻p:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPj_NQrKbI/AAAAAAAAAmU/RB-rd4W5FxE/s1600/box_p.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPj_NQrKbI/AAAAAAAAAmU/RB-rd4W5FxE/s1600/box_p.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;then we know that p must be in every subworld. Actually, we know it must also hold in any subsubworld too, all the way down. This is because if we can prove something we can then use that proof directly to form a constructive proof that we can prove it. So that means ◻p holds in every subworld too:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPlHveTOjI/AAAAAAAAAmg/9OdRp278Cco/s1600/box_p3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPlHveTOjI/AAAAAAAAAmg/9OdRp278Cco/s1600/box_p3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;We treat negation as above. 
So if we see ¬◻p we treat it like ◊¬p.&lt;br /&gt;&lt;br /&gt;As we have already said, if we have a world with a ◊p in it then it must contain a subworld with p in it. Note that ◊ implies the existence of subworlds, but ◻ doesn't. It just tells us what must be in them if they exist.&lt;br /&gt;&lt;br /&gt;But there's one last rule we'll need. It's &lt;a href="http://en.wikipedia.org/wiki/Lob's_theorem"&gt;Löb's theorem&lt;/a&gt;. This is the big theorem on which everything else I say depends. It states that ◻(◻p →p) →◻p. If we can prove that the provability of p implies p, then we can prove p. I could sketch a proof here, but I highly recommend the cartoon proof &lt;a href="http://lesswrong.com/lw/t6/the_cartoon_guide_to_l%C3%B6bs_theorem/"&gt;here&lt;/a&gt;. In a way, Löb's theorem is a bit sad. The raison d'être of mathematics is that we can use proofs to be sure of things. In other words, we take for granted that ◻p→p. But if we could prove this then we could prove ◻p, even if p were false! ◻p→p is not valid!&lt;br /&gt;&lt;br /&gt;(Philosophical digression: Mathematicians assume ◻p→p. They don't assume ◻p→p because there's a proof. They assume it because experience has shown it to work. So I claim that ◻p→p is an empirical fact that we learn by scientific induction. That's controversial because I'm basically saying that much of mathematics is empirical - at least it is if you talk about truth rather than proof. If you disagree, don't worry about it. The rest of what I say here is independent of this digression.)&lt;br /&gt;&lt;br /&gt;Anyway, we can flip Löb's theorem around (replace p with ¬p and take the contrapositive) to get ◊p →◊(p ∧¬◊p). 
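&lt;br /&gt;&lt;br /&gt;In more detail, the flip goes like this, using only the definition of ◊ as ¬◻¬ and propositional calculus:&lt;br /&gt;1. Substitute ¬p for p in Löb's theorem: ◻(◻¬p →¬p) →◻¬p&lt;br /&gt;2. Take the contrapositive: ¬◻¬p →¬◻(◻¬p →¬p)&lt;br /&gt;3. Rewrite ¬◻¬p as ◊p, and ¬◻ as ◊¬: ◊p →◊¬(◻¬p →¬p)&lt;br /&gt;4. Use the fact that ¬(a→b) is the same as a∧¬b: ◊p →◊(◻¬p ∧p)&lt;br /&gt;5. Rewrite ◻¬p as ¬◊p and reorder: ◊p →◊(p ∧¬◊p)&lt;br /&gt;&lt;br /&gt;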
So if we have a world in which ◊p holds:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPjF3VjPBI/AAAAAAAAAmI/Sb2HxJrZjxY/s1600/dia_p.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPjF3VjPBI/AAAAAAAAAmI/Sb2HxJrZjxY/s1600/dia_p.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;then we have a subworld in which both p and ¬◊p hold:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_UdKHLrHa05M/TQPjHvfx1UI/AAAAAAAAAmM/zM97PGKmjHA/s1600/dia_p2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_UdKHLrHa05M/TQPjHvfx1UI/AAAAAAAAAmM/zM97PGKmjHA/s1600/dia_p2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;And that's all we'll need apart from the obvious rule that we can remove double negation. Suppose we have a proposition of GL. We can use the rules above to extract as many implications as we can. Eventually there will come a point where there is nothing more we can do. If we can do this without hitting a contradiction then we've found a bunch of possible worlds for the proposition. If we can't find a valid world, however, then the original proposition can't hold in any world. You might ask whether or not there are any other rules we need in addition to the ones above. Maybe there are other theorems like Löb's theorem that we need to use. Amazingly it can be shown that no other rules are needed. Since a proposition is valid exactly when its negation has no valid world, this is a sure-fire terminating algorithm for determining whether or not a proposition of GL is valid! This is a powerful tool. We can now start constructing wild propositions like those I started with and find out whether they are valid. (Note that I've not in any way proved this procedure always works. 
You'll need to look at &lt;a href="http://www.amazon.com/gp/product/0521483255?ie=UTF8&amp;amp;tag=sigfpe-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0521483255"&gt;Boolos's book&lt;/a&gt; to see why.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Some Proofs&lt;/b&gt;&lt;br /&gt;Let's work through some examples. First I'll prove something from propositional calculus: (p∧q)∨(p∧r)→p. We start by drawing a world with the negation:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPpM_0HVbI/AAAAAAAAAmo/xSz58vngcDc/s1600/proof_a_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPpM_0HVbI/AAAAAAAAAmo/xSz58vngcDc/s1600/proof_a_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Now ¬(a→b) is the same as a∧¬b. So we get:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPpNN25QYI/AAAAAAAAAms/OGYVw5vfVrU/s1600/proof_a2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPpNN25QYI/AAAAAAAAAms/OGYVw5vfVrU/s1600/proof_a2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Now the only way to proceed is to consider the disjunction and consider two alternatives:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPpNsI-96I/AAAAAAAAAmw/ftbWv8hvrEI/s1600/proof_a3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPpNsI-96I/AAAAAAAAAmw/ftbWv8hvrEI/s1600/proof_a3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;On both sides of the vertical line we find p. But that contradicts the ¬p we discovered earlier. 
So there is no way the negation of our original proposition can hold in any world. And therefore the original proposition must be valid.&lt;br /&gt;&lt;br /&gt;How about showing Gödel's second incompleteness theorem this way? ◊⊤ says that ⊤ is consistent with the rest of arithmetic, i.e. it expresses the consistency of arithmetic. The theorem is then ◊⊤→¬◻◊⊤, i.e. if PA is consistent, then PA can't prove it is consistent. We'll start with its negation:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPuMlQDAKI/AAAAAAAAAm4/pguO7HHU3wY/s1600/godel1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TQPuMlQDAKI/AAAAAAAAAm4/pguO7HHU3wY/s1600/godel1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;We can deal with the negated implication as before. We can also deal straightforwardly with the double negation:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPuNEJLn-I/AAAAAAAAAm8/Y9OxlqaEGfs/s1600/godel2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TQPuNEJLn-I/AAAAAAAAAm8/Y9OxlqaEGfs/s1600/godel2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;There's now no way to proceed except to use the ◊⊤ to open up a new world. 
Remember that when we use this we must use Löb's theorem as well as inheriting p and ◻p from the parent world's ◻p:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPuNbi7aCI/AAAAAAAAAnA/5G-jgHKKM2c/s1600/godel3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TQPuNbi7aCI/AAAAAAAAAnA/5G-jgHKKM2c/s320/godel3.png" width="256" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;And we get a contradiction because we have both ◊⊤ and ¬◊⊤. So Gödel's second incompleteness theorem is indeed valid. Note that this isn't a demonstration from scratch. We've shown its validity from Löb's theorem. So this isn't really a useful way to show it's valid. But it *is* a useful way to show the validity of generalisations of Gödel's theorem. Unfortunately, I have to stop for now.&lt;br /&gt;&lt;br /&gt;In the coming posts I'll implement all of the above as a computer program. If there's any ambiguity in what I've said I hope the source code to the program will resolve those ambiguities. What's more, our program won't just test the validity of a proposition but it will draw out a nice picture of our world with its subworlds. So you'll be able to trace through every step to make sure you understand!&lt;br /&gt;&lt;br /&gt;But that's not all. We'll also see how to write a program to illustrate a bunch more theorems like Craig's Interpolation Lemma and Beth's Definability Theorem and then we'll finish with a program that is designed to construct self-referential propositions. 
In particular, given any self-referential description like "p is a proposition that is equivalent to the proposition that denies that it's provable that p's provability is consistent with p itself" it will solve to find p, even though GL doesn't allow us to directly construct self-referential propositions.&lt;br /&gt;&lt;br /&gt;So let me recap: a world is an assignment of consistent truth values to the letters (and consequently propositions) of GL. Some of these propositions imply the existence of other subworlds with different truth values for these propositions. We draw these worlds as subworlds of the original world. For a world to be valid it mustn't contain any contradictions (unless the world contains a bunch of alternatives, in which case just one alternative needs to be valid). A proposition of GL is valid if unfolding its negation results in an invalid world.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Exercises&lt;/b&gt;&lt;br /&gt;Which of the following are valid? If you report back any difficulties you have I can incorporate any needed revisions into the description above.&lt;br /&gt;1. p→◻p&lt;br /&gt;2. ◻p→◻◻p&lt;br /&gt;3. ◊p→◊◻◻p&lt;br /&gt;4. ◊(p→◻q) →◊(◻(◊(p∧q)))&lt;br /&gt;5. ◻(p∧q∧r∧s) →◻p∧◻q∧◻r∧◻s&lt;br /&gt;&lt;br /&gt;&lt;b&gt;References&lt;/b&gt;&lt;br /&gt;Most of this stuff is based on Chapter 10 of &lt;a href="http://www.amazon.com/gp/product/0521483255?ie=UTF8&amp;amp;tag=sigfpe-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0521483255"&gt;The Logic of Provability&lt;/a&gt; by Boolos. The idea of using multiple worlds to prove theorems is due to &lt;a href="http://en.wikipedia.org/wiki/Kripke"&gt;Kripke&lt;/a&gt;. I believe the procedure of unfolding the implications of a proposition in the tree-like way I describe above is due to Smullyan.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;Solutions to problems:&lt;br /&gt;1. Not valid. 
Diagram:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TRAJfqGWLbI/AAAAAAAAAnI/lB_Bcnd4V8M/s1600/ex1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TRAJfqGWLbI/AAAAAAAAAnI/lB_Bcnd4V8M/s1600/ex1.png" /&gt;&lt;/a&gt;&lt;/div&gt;2. Valid&lt;br /&gt;3. Valid.&lt;br /&gt;4. Valid.&lt;br /&gt;5. Valid. Diagram:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TRAJ3B1CI9I/AAAAAAAAAnM/ODc4USh_hDk/s1600/ex5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="214" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TRAJ3B1CI9I/AAAAAAAAAnM/ODc4USh_hDk/s320/ex5.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6530290512510321373?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6530290512510321373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6530290512510321373' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6530290512510321373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6530290512510321373'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/12/generalising-godels-theorem-with.html' title='Generalising Gödel&apos;s Theorem with Multiple Worlds. 
Part I.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/TQPRquQ3BVI/AAAAAAAAAlc/5_cTlQWF450/s72-c/dia_dia_p.png' height='72' width='72'/><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-1106589231951497985</id><published>2010-11-19T04:50:00.000-08:00</published><updated>2010-11-21T01:01:36.606-08:00</updated><title type='text'>Beating the odds with entangled qubits</title><content type='html'>Quantum mechanics allows the possibility of "spooky action at a distance", correlations between widely separated but simultaneous random events that can't be explained by probability theory. These events look like they secretly communicate with each other, but we also know that quantum mechanics prevents us sending messages faster than the speed of light. Nonetheless, even though we can't exploit non-locality to send messages faster than the speed of light, two cooperating parties can exploit non-locality to perform tasks better than would be possible without non-locality. The CHSH game is one such example.&lt;br /&gt;&lt;br /&gt;My goal here is to write code to emulate the &lt;a href="http://www.nature.com/nature/journal/v466/n7310/fig_tab/4661053a_F1.html"&gt;CHSH game&lt;/a&gt;. It will require reusing the probability and quantum mechanics monads I've used here many times &lt;a href="http://blog.sigfpe.com/2007/03/independence-entanglement-and.html"&gt;before&lt;/a&gt;. 
So I won't be explaining how these work.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Data.Map (toList, fromListWith)&lt;br /&gt;&amp;gt; import Data.Complex&lt;br /&gt;&amp;gt; infixl 7 .*&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The CHSH game is cooperative in the sense that the two players, A and B, are attempting to work together to win. The two players are widely separated. Between the two players is the game show host. The host randomly generates a pair of bits, s and t. s is sent to A and t is sent to B. Neither A nor B gets to see the message sent to their partner. A and B must now simultaneously make their moves, stating a choice of bit.&lt;br /&gt;&lt;br /&gt;Call A's move a and B's move b. A and B win if they can arrange that a XOR b equals s AND t. So, for example, if A receives a true bit, and thinks B has also received a true bit, then A wants to make a move that differs from B's, otherwise A wants to make the same move as B. A and B are allowed to plan as much as they like beforehand but it should be pretty clear that they can't possibly guarantee a win.&lt;br /&gt;&lt;br /&gt;We can formally write the victory condition as:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; victory a b s t = (a `xor` b) == (s &amp;amp;&amp;amp; t)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now there's a pretty good strategy A and B can adopt: three quarters of the time, s AND t will be false. In that case, A and B want their answers to match. So they could simply choose false, regardless of what message the game show host sends them. 
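&lt;br /&gt;&lt;br /&gt;To make the three quarters concrete, here are the four equally likely pairs of messages when both players always answer False, so that a XOR b is always False:&lt;br /&gt;1. s=False, t=False: s AND t is False, so they win.&lt;br /&gt;2. s=False, t=True: s AND t is False, so they win.&lt;br /&gt;3. s=True, t=False: s AND t is False, so they win.&lt;br /&gt;4. s=True, t=True: s AND t is True, so they lose.&lt;br /&gt;&lt;br /&gt;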
We can give their strategies as a function of the host's message:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; astrategy s = False&lt;br /&gt;&amp;gt; bstrategy t = False&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we can simulate our game:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; game = do&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The host picks a random bit to send to each player:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    s &amp;lt;- 0.5 .* return False + 0.5 .* return True&lt;br /&gt;&amp;gt;    t &amp;lt;- 0.5 .* return False + 0.5 .* return True&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The players now respond to each of their messages:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    let a = astrategy s&lt;br /&gt;&amp;gt;    let b = bstrategy t&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we can collect the replies and score the result:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    let score = victory a b s t&lt;br /&gt;&amp;gt;    return score&lt;br /&gt;&lt;br /&gt;&amp;gt; play1 = collect game&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Running &lt;tt&gt;play1&lt;/tt&gt; gives the expected result that A and B have a 3/4 chance of winning. It's not hard to prove classically that they can do no better than this.&lt;br /&gt;&lt;br /&gt;But in a quantum universe it is possible to do better! We now allow A and B to adopt strategies that involve making measurements of a quantum system. To describe their strategies we need to use the quantum monad. Here is the previous strategy rewritten for this monad. In this case, the argument &lt;tt&gt;b&lt;/tt&gt; is the state of the quantum system they observe. 
The first element of the pair is the move in the game, the second element is the state the physical system is left in by the player:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; aqstrategy b s = qreturn (False, b)&lt;br /&gt;&amp;gt; bqstrategy b t = qreturn (False, b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we can rewrite &lt;tt&gt;game&lt;/tt&gt; to support quantum processes:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; game' aqstrategy bqstrategy bits = do&lt;br /&gt;&amp;gt;    s &amp;lt;- 0.5 .* preturn False + 0.5 .* preturn True&lt;br /&gt;&amp;gt;    t &amp;lt;- 0.5 .* preturn False + 0.5 .* preturn True&lt;br /&gt;&amp;gt;    (score, _, _) &amp;lt;- collect $ observe $ do&lt;br /&gt;&amp;gt;        (abit, bbit) &amp;lt;- bits&lt;br /&gt;&amp;gt;        (a, abit') &amp;lt;- aqstrategy abit s&lt;br /&gt;&amp;gt;        (b, bbit') &amp;lt;- bqstrategy bbit t&lt;br /&gt;&amp;gt;        let score = victory a b s t&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Note that we have to return &lt;tt&gt;abit'&lt;/tt&gt; and &lt;tt&gt;bbit'&lt;/tt&gt; because quantum processes are reversible and can't erase information about a state. Also note that &lt;tt&gt;abit'&lt;/tt&gt; and &lt;tt&gt;bbit'&lt;/tt&gt; can be widely separated in space.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;        qreturn (score, abit', bbit')&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Probabilistic processes can erase whatever they like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    preturn score&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The quantum version of a coin toss to generate a random bit, ie. 
an equal superposition of &lt;tt&gt;False&lt;/tt&gt; and &lt;tt&gt;True&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; coin' = (1/sqrt 2) .* qreturn False + (1/sqrt 2) .* qreturn True&lt;br /&gt;&lt;br /&gt;&amp;gt; bits = do&lt;br /&gt;&amp;gt;   a &amp;lt;- coin'&lt;br /&gt;&amp;gt;   b &amp;lt;- coin'&lt;br /&gt;&amp;gt;   return (a, b)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;So now we play exactly as in &lt;tt&gt;play1&lt;/tt&gt; except that we give each player an independent qubit, which they ignore:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; play2 = collect $ game' aqstrategy bqstrategy bits&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Unsurprisingly, the probability of winning is just the same as before.&lt;br /&gt;&lt;br /&gt;But now we can try something impossible in the classical case. We give each player half of a perfectly correlated pair of qubits. The players are now entangled and can exploit non-locality. Of course if their strategies ignore the qubits we get the same result as before:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; bell = (1/sqrt 2) .* qreturn (False, False) + (1/sqrt 2) .* qreturn (True, True)&lt;br /&gt;&amp;gt; play3 = collect $ game' aqstrategy bqstrategy bell&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now comes the surprising bit. The players can each look at the (classical) bit given to them by the game host. Depending what it is they rotate the qubit's state through some angle in state space. (In the case of the qubit being an electron state, this is an actual physical rotation of the electron.) 
In these strategies, the choice of move is the same as the state the qubit is left in after observation, hence the &lt;tt&gt;qreturn (b', b')&lt;/tt&gt; bit.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; aqstrategy' b False = rotate (0) b &amp;gt;&amp;gt;= \b' -&amp;gt; qreturn (b', b')&lt;br /&gt;&amp;gt; aqstrategy' b True  = rotate (pi/2) b &amp;gt;&amp;gt;= \b' -&amp;gt; qreturn (b', b')&lt;br /&gt;&lt;br /&gt;&amp;gt; bqstrategy' b False = rotate (pi/4) b &amp;gt;&amp;gt;= \b' -&amp;gt; qreturn (b', b')&lt;br /&gt;&amp;gt; bqstrategy' b True  = rotate (-pi/4) b &amp;gt;&amp;gt;= \b' -&amp;gt; qreturn (b', b')&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now when we play, the probability of winning is greater than 3/4.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; play4 = collect $ game' aqstrategy' bqstrategy' bell&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;All of the 'communication' took place before the game started. A and B didn't communicate s and t to each other.  And yet they can beat the classical odds.&lt;br /&gt;&lt;br /&gt;So in conclusion: Quantum mechanics gives opportunities for collusion that are impossible classically. Sadly we don't yet know how to maintain the state of separated entangled qubits for extended periods of time. But I remember seeing recently that people are managing to maintain qubit states for nanoseconds.&lt;br /&gt;&lt;br /&gt;By the way, there are games where it is possible to achieve a 100% success rate with the help of quantum states. These give examples of what is known as &lt;a href="http://en.wikipedia.org/wiki/Quantum_pseudo-telepathy"&gt;quantum pseudo-telepathy&lt;/a&gt;. I presume the "pseudo" is because despite the 100% success rate, it still doesn't give a way to send messages instantly.&lt;br /&gt;&lt;br /&gt;A last thought from me: one reason why humans send messages is to allow them to coordinate strategies. But quantum game theory shows that we can coordinate strategies without sending messages. 
In other words, even though non-locality doesn't give us faster-than-light communication, it does allow us to do things that were previously thought to require FTL. I think this may have some profound consequences. &lt;br /&gt;&lt;br /&gt;And an example from a different domain: in biochemistry one could imagine remote parts of ligands coordinating the way they bind to receptors, something that would be completely missed by the kind of quasi-classical simulation I've seen biochemists use.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;My standard quantum mechanics code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data W b a = W { runW :: [(a, b)] } deriving (Eq, Show, Ord)&lt;br /&gt;&lt;br /&gt;&amp;gt; mapW f (W l) = W $ map (\(a, b) -&amp;gt; (a, f b)) l&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Functor (W b) where&lt;br /&gt;&amp;gt;  fmap f (W a) = W $ map (\(a, p) -&amp;gt; (f a, p)) a&lt;br /&gt;&lt;br /&gt;&amp;gt; -- Applicative is a superclass of Monad in GHC 7.10 and later.&lt;br /&gt;&amp;gt; instance Num b =&amp;gt; Applicative (W b) where&lt;br /&gt;&amp;gt;  pure x = W [(x, 1)]&lt;br /&gt;&amp;gt;  mf &amp;lt;*&amp;gt; mx = mf &amp;gt;&amp;gt;= \f -&amp;gt; fmap f mx&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num b =&amp;gt; Monad (W b) where&lt;br /&gt;&amp;gt;  return x = W [(x, 1)]&lt;br /&gt;&amp;gt;  l &amp;gt;&amp;gt;= f = W $ concatMap (\(W d, p) -&amp;gt; map (\(x, q)-&amp;gt;(x, p*q)) d) (runW $ fmap f l)&lt;br /&gt;&lt;br /&gt;&amp;gt; a .* b = mapW (a*) b&lt;br /&gt;&lt;br /&gt;&amp;gt; instance (Eq a, Show a, Num b) =&amp;gt; Num (W b a) where&lt;br /&gt;&amp;gt;   W a + W b = W $ (a ++ b)&lt;br /&gt;&amp;gt;   a - b = a + (-1) .* b&lt;br /&gt;&amp;gt;   _ * _ = error "Num is annoying"&lt;br /&gt;&amp;gt;   abs _ = error "Num is annoying"&lt;br /&gt;&amp;gt;   signum _ = error "Num is annoying"&lt;br /&gt;&amp;gt;   fromInteger a = if a==0 then W [] else error "fromInteger can only take zero argument"&lt;br /&gt;&lt;br /&gt;&amp;gt; -- Eq b is needed for the (/= 0) test; Num no longer implies Eq.&lt;br /&gt;&amp;gt; collect :: (Ord a, Eq b, Num b) =&amp;gt; W b a -&amp;gt; W b a&lt;br /&gt;&amp;gt; collect = W . filter ((/= 0) . snd) . toList . fromListWith (+) . 
runW&lt;br /&gt;&lt;br /&gt;&amp;gt; type P a = W Double a&lt;br /&gt;&amp;gt; type Q a = W (Complex Double) a&lt;br /&gt;&lt;br /&gt;&amp;gt; a `xor` b = a/=b&lt;br /&gt;&lt;br /&gt;&amp;gt; rotate :: Double -&amp;gt; Bool -&amp;gt; Q Bool&lt;br /&gt;&amp;gt; rotate theta True = let theta' = theta :+ 0&lt;br /&gt;&amp;gt;   in cos (theta'/2) .* return True - sin (theta'/2) .* return False&lt;br /&gt;&amp;gt; rotate theta False = let theta' = theta :+ 0&lt;br /&gt;&amp;gt;   in cos (theta'/2) .* return False + sin (theta'/2) .* return True&lt;br /&gt;&lt;br /&gt;&amp;gt; observe :: Ord a =&amp;gt; Q a -&amp;gt; P a&lt;br /&gt;&amp;gt; observe = W . map (\(a, w) -&amp;gt; (a, magnitude (w*w))) . runW . collect&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some help for the compiler (and maybe humans too):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; preturn = return :: a -&amp;gt; P a&lt;br /&gt;&amp;gt; qreturn = return :: a -&amp;gt; Q a&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-1106589231951497985?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/1106589231951497985/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=1106589231951497985' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/1106589231951497985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/1106589231951497985'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/11/beating-odds-with-entangled-qubits.html' title='Beating the odds with entangled qubits'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-3062585063578105934</id><published>2010-11-09T07:26:00.000-08:00</published><updated>2010-11-09T07:26:37.456-08:00</updated><title type='text'>Statistical Fingertrees</title><content type='html'>I have no time to post a proper article. But I have time to post a mini-article with details in the links.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; {-# LANGUAGE MultiParamTypeClasses #-}&lt;br /&gt;&amp;gt; import Data.FingerTree&lt;br /&gt;&amp;gt; import Data.Monoid&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I'm a bad person. I often use &lt;a href="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Na.C3.AFve_algorithm"&gt;this&lt;/a&gt; method to compute the mean and variance of some samples. It's not robust. It performs badly if the variance is small compared to the size of the samples. I shouldn't use it. As penance, this is a quick article on dynamically computing variances robustly and efficiently for datasets that are frequently manipulated.&lt;br /&gt;&lt;br /&gt;For convenience I'll talk about the unscaled variance: the sum, not the average, of the squared deviations from the mean. You can easily compute the variance from this if you know the dataset size.&lt;br /&gt;&lt;br /&gt;If we know the size, mean and unscaled variance of two sets (more properly, multisets) we can find the mean and unscaled variance of their union using the formula &lt;a href="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm"&gt;here&lt;/a&gt;. 
This method is much more robust than the naive algorithm.&lt;br /&gt;&lt;br /&gt;The rule for combining two datasets gives us a monoid:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Stats = Stats { n :: Float, mean :: Float, unscaledVariance :: Float }&lt;br /&gt;&amp;gt;              deriving Show&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Monoid Stats where&lt;br /&gt;&amp;gt;     mempty = Stats 0 0 0&lt;br /&gt;&amp;gt;     Stats n m v `mappend` Stats 0 _ _ = Stats n m v&lt;br /&gt;&amp;gt;     Stats 0 _ _ `mappend` Stats n m v = Stats n m v&lt;br /&gt;&amp;gt;     Stats n m v `mappend` Stats n' m' v' = &lt;br /&gt;&amp;gt;       let delta = m' - m&lt;br /&gt;&amp;gt;       in Stats (n + n') ((n*m+n'*m')/(n+n'))&lt;br /&gt;&amp;gt;                (v + v' + delta*delta*n*n'/(n+n'))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Given a single sample, we can compute its stats:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Measured Stats Float where&lt;br /&gt;&amp;gt;     measure x = Stats 1 x 0&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we need just one more line of code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type StatsTree = FingerTree Stats Float&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We now have a data structure that allows us to freely split, join, delete elements from and add elements to sequences of samples, all the while robustly keeping track of their mean and unscaled variance (and hence their variance).&lt;br /&gt;&lt;br /&gt;For example:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; example = fromList [1..10] :: StatsTree&lt;br /&gt;&amp;gt; test = let (_, b) = split ((&amp;gt;=4) . n) example&lt;br /&gt;&amp;gt;            (c, _) = split ((&amp;gt;3)  . 
n) b&lt;br /&gt;&amp;gt;        in measure c&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;computes the stats for the 3 elements starting at the 4th element of &lt;tt&gt;example&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;An example application might be maintaining rolling averages and variances for a sliding window.&lt;br /&gt;&lt;br /&gt;This was inspired by an article by John D Cook somewhere around &lt;a href="http://www.johndcook.com/blog/"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-3062585063578105934?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/3062585063578105934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=3062585063578105934' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3062585063578105934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/3062585063578105934'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/11/statistical-fingertrees.html' title='Statistical Fingertrees'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8804149847519007167</id><published>2010-09-18T13:07:00.000-07:00</published><updated>2010-09-19T15:45:24.292-07:00</updated><title type='text'>On Removing Singularities from Rational Functions</title><content 
type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;Suppose we have the function&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f x = 1/(x+x^2) - 1/(x+2*x^2)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Some basic algebraic manipulation shows that in the limit as x&amp;rarr;0, f(x)&amp;rarr;1. But we can't simply compute &lt;tt&gt;f 0&lt;/tt&gt; because this computation involves division by zero at intermediate stages. How can we automate the process of computing the limit without implementing symbolic algebra?&lt;br /&gt;&lt;br /&gt;I've already &lt;a href="http://blog.sigfpe.com/2008/05/desingularisation-and-its-applications.html"&gt;described&lt;/a&gt; one way to remove singularities from a function. But that approach is very limited in its applicability.&lt;br /&gt;&lt;br /&gt;This article is about a variation on the approach to &lt;a href="http://blog.sigfpe.com/2005/07/formal-power-series-and-haskell.html"&gt;formal power series&lt;/a&gt; that nicely &lt;a href="http://www.cs.dartmouth.edu/~doug/powser.html"&gt;showcases&lt;/a&gt; some advantages of lazy lists. It will allow us to form Laurent series of functions so we can keep track of the singularities.&lt;br /&gt;&lt;br /&gt;The usual Haskell approach to power series allows you to examine the coefficients of any term in the power series of the functions you can form. These series can't be used, however, to evaluate the function. Doing so requires summing an infinite series, but we can't do so reliably because no matter how many terms in a power series we add, we can never be sure that there aren't more large terms further downstream that we haven't reached yet. 
And if we want to perform computations completely over the rationals, say, we don't want to be dealing with infinite sums.&lt;br /&gt;&lt;br /&gt;I'd like to look at a way of working with power series that allows us to perform exact computations, making it possible to answer questions like "what is the sum of all the terms in this power series starting with the x^n term?" By extending to Laurent series, and implementing the ability to selectively sum over just the terms with non-negative powers, we can compute functions like &lt;tt&gt;f&lt;/tt&gt; above at 0 and simply skip over the troublesome &lt;a href="http://en.wikipedia.org/wiki/Pole_%28complex_analysis%29"&gt;poles&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Power Series&lt;/b&gt;&lt;br /&gt;When I &lt;a href="http://blog.sigfpe.com/2005/07/formal-power-series-and-haskell.html"&gt;previously&lt;/a&gt; discussed power series I used code that worked with the coefficients of the power series. This time we want to work with values of the function so it makes sense to store not the coefficients a&lt;sub&gt;i&lt;/sub&gt; but the terms themselves, a&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i&lt;/sup&gt;. So instead of a list of coefficients, &lt;tt&gt;Num a =&amp;gt; [a]&lt;/tt&gt;, we need a representation that looks a little like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Power a = Power (a -&amp;gt; [a])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;where we pass x in as an argument to the function contained in a &lt;tt&gt;Power&lt;/tt&gt;. But we also want to allow Laurent series so we need to also store an offset to say which (possibly negative) term our series starts with:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Laurent a = Laurent (a -&amp;gt; (Int, [a]))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;But this fails us for at least two reasons:&lt;br /&gt;&lt;br /&gt;1. We have the individual terms, but evaluating the function requires summing all of the terms in an infinite list.&lt;br /&gt;2. 
If we have a Laurent series, then we need to store values of a&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i&lt;/sup&gt; for x=0 and i&amp;lt;0. We'll end up with division by zero errors.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Partial Sum Series&lt;/b&gt;&lt;br /&gt;So here's what we'll do instead. Suppose our power series is &amp;Sigma;&lt;sub&gt;i=n&lt;/sub&gt;&lt;sup&gt;&amp;infin;&lt;/sup&gt;a&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i&lt;/sup&gt;. We'll store the terms s&lt;sub&gt;j&lt;/sub&gt;=&amp;Sigma;&lt;sub&gt;i=j&lt;/sub&gt;&lt;sup&gt;&amp;infin;&lt;/sup&gt;a&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i-j&lt;/sup&gt;. Our type will look like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Partial a = Partial (a -&amp;gt; (Int, [a]))&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Eq (Partial a)&lt;br /&gt;&amp;gt; instance Show (Partial a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;It's straightforward to add two functions in this form. We just add them term by term after first aligning them so that the x&lt;sup&gt;i&lt;/sup&gt; term in one is lined up with the x&lt;sup&gt;i&lt;/sup&gt; term in the other:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Num a =&amp;gt; Num (Partial a) where&lt;br /&gt;&lt;br /&gt;&amp;gt;  Partial f + Partial g = Partial $ \x -&amp;gt;&lt;br /&gt;&amp;gt;     let (m, xs) = f x&lt;br /&gt;&amp;gt;         (n, ys) = g x&lt;br /&gt;&amp;gt;         pad 0 _ ys = ys&lt;br /&gt;&amp;gt;         pad n x ys = let z:zs = pad (n-1) x ys&lt;br /&gt;&amp;gt;                      in x*z : z : zs&lt;br /&gt;&amp;gt;         l = min m n&lt;br /&gt;&amp;gt;     in (l, zipWith (+) (pad (m-l) x xs) (pad (n-l) x ys))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Notice the slight subtlety in the alignment routine &lt;tt&gt;pad&lt;/tt&gt;. By the definition above, the jth term has a factor of x&lt;sup&gt;j&lt;/sup&gt; built into it. So we need to multiply by x each time we pad our list on the left.&lt;br /&gt;&lt;br /&gt;Now we need to multiply series. 
We know from ordinary power series that we need some sort of convolution. But in this case it looks like we have an extra complication. We appear to need to difference our representation to get back the original terms, convolve, and then resum. Amazingly, we don't need to do this at all. We can convolve 'in place' so to speak.&lt;br /&gt;&lt;br /&gt;Here's what an ordinary convolution looks like when we want to multiply the sequence of terms (a&lt;sub&gt;i&lt;/sub&gt;) by (b&lt;sub&gt;i&lt;/sub&gt;):&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TIu6ntLuO5I/AAAAAAAAAkc/OA_K2bq6Q2I/s1600/grid1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TIu6ntLuO5I/AAAAAAAAAkc/OA_K2bq6Q2I/s320/grid1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In this example, the blue diagonal corresponds to the terms that are summed to get the 4th term in the result.&lt;br /&gt;&lt;br /&gt;However, we wish to work with partial sums s&lt;sub&gt;j&lt;/sub&gt;=&amp;Sigma;&lt;sub&gt;i=j&lt;/sub&gt;&lt;sup&gt;&amp;infin;&lt;/sup&gt;a&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i-j&lt;/sup&gt; and t&lt;sub&gt;j&lt;/sub&gt;=&amp;Sigma;&lt;sub&gt;i=j&lt;/sub&gt;&lt;sup&gt;&amp;infin;&lt;/sup&gt;b&lt;sub&gt;i&lt;/sub&gt;x&lt;sup&gt;i-j&lt;/sup&gt;, constructing the partial sums of the convolution of a and b from s and t. 
The partial sums of the convolution can be derived from the partial sums by tweaking the convolution so it looks like this:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TJTqSqSV99I/AAAAAAAAAk4/GxqXWHDgJ7A/s1600/grid2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TJTqSqSV99I/AAAAAAAAAk4/GxqXWHDgJ7A/s320/grid2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The blue terms work just like before and need to be summed. But we also need to subtract off the red terms, weighted by a factor of x. That's it! (I'll leave that as an exercise to prove. The &lt;a href="http://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle"&gt;inclusion-exclusion principle&lt;/a&gt; helps.)&lt;br /&gt;&lt;br /&gt;The neat thing is that the red terms for each sum are a subset of the blue terms needed for the next element. We don't need to perform two separate sums. We can share much of the computation between the red and blue terms. 
All we need to do is write an ordinary convolution routine that returns not just the blue sum, but a pair containing the blue sum and the red sum.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;  Partial f * Partial g = Partial $ \x -&amp;gt;&lt;br /&gt;&amp;gt;     let (m, xs) = f x&lt;br /&gt;&amp;gt;         (n, ys) = g x&lt;br /&gt;&amp;gt;         (outer, inner) = convolve xs ys&lt;br /&gt;&amp;gt;         f' a b = a-x*b -- (the subtraction I mentioned above)&lt;br /&gt;&amp;gt;     in (m+n, zipWith f' outer inner)&lt;br /&gt;&amp;gt;  fromInteger n = let n' = fromInteger n in Partial $ \_ -&amp;gt; (0, n' : repeat 0)&lt;br /&gt;&lt;br /&gt;&amp;gt;  negate (Partial f) = Partial $ \x -&amp;gt; let (m, xs) = f x&lt;br /&gt;&amp;gt;                                 in (m, map negate xs)&lt;br /&gt;&amp;gt;  signum = error "signum not implemented"&lt;br /&gt;&amp;gt;  abs    = error "abs not implemented"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here is an ordinary convolution routine, tweaked to also return the partial sum &lt;tt&gt;inner&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; convolve (a0:ar@(a1:as)) ~(b0:br@(b1:bs)) =&lt;br /&gt;&amp;gt;  let (inner, _) = convolve ar br&lt;br /&gt;&amp;gt;      ab = map (a0 *) bs&lt;br /&gt;&amp;gt;      ba = map (* b0) as&lt;br /&gt;&amp;gt;  in (a0*b0 : a0*b1+a1*b0&lt;br /&gt;&amp;gt;            : zipWith3 (\a b c -&amp;gt; a+b+c) inner ab ba, 0 : inner)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The code is very similar to the &lt;a href="http://www.cs.dartmouth.edu/~doug/powser.html"&gt;usual power series multiplication&lt;/a&gt; routine. 
We can also use the same method described by &lt;a href="http://www.cs.dartmouth.edu/~doug/powser.html"&gt;McIlroy&lt;/a&gt; to divide our series.&lt;br /&gt;&lt;br /&gt;As our series are a munged up version of the usual power series it's pretty surprising that it's possible to divide with so little code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional a =&amp;gt; Fractional (Partial a) where&lt;br /&gt;&amp;gt;   fromRational n = let n' = fromRational n in Partial $ \_ -&amp;gt; (0, n' : repeat 0)&lt;br /&gt;&amp;gt;   recip (Partial f) = Partial $ \x -&amp;gt;&lt;br /&gt;&amp;gt;      let nibble (n, x:xs) | x==0      = nibble (n+1, xs)&lt;br /&gt;&amp;gt;                           | otherwise = (n, (x:xs))&lt;br /&gt;&amp;gt;          (n, xs) = nibble (f x)&lt;br /&gt;&amp;gt;      in (-n, rconvolve x xs)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;In effect, &lt;tt&gt;rconvolve&lt;/tt&gt; solves the equation &lt;tt&gt;convolve a b==1&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; rconvolve x (a0:ar@(a1:as)) =&lt;br /&gt;&amp;gt;   let (outer, inner) = convolve ar result&lt;br /&gt;&amp;gt;       f a b = x*b-a&lt;br /&gt;&amp;gt;       r = -1/f a0 a1&lt;br /&gt;&amp;gt;       result = recip a0 : (map (r *) $ zipWith f outer inner)&lt;br /&gt;&amp;gt;   in result&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Note one ugly quirk of this code. I need to 'nibble' off leading zeroes from the series. This requires our underlying type &lt;tt&gt;a&lt;/tt&gt; to have computable equality. (In principle we can work around this using &lt;a href="http://conal.net/blog/posts/functional-concurrency-with-unambiguous-choice/"&gt;parallel or&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;That's it. We can now write a function to compute the positive part of a rational function. 
(By positive part, I mean all of the terms using non-negative powers of x.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; pos f z = let Partial g = f $ Partial $ \x -&amp;gt; (1, 1 : repeat 0)&lt;br /&gt;&amp;gt;               (n, xs) = g z&lt;br /&gt;&amp;gt;           in if n&amp;gt;0&lt;br /&gt;&amp;gt;               then z^n*head xs&lt;br /&gt;&amp;gt;               else xs!!(-n)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here are some examples:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; test1 = let f x = (1+2*x)/(3-4*x*x)&lt;br /&gt;&amp;gt;         in pos (\x -&amp;gt; 1/(f x-f 0)/x) (0::Rational)&lt;br /&gt;&lt;br /&gt;&amp;gt; test2 = pos (\x -&amp;gt; 1/(1+4*x+3*x^2+x^3) - 1/(1+x)) (1::Rational)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The original example I started with:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; test3 = pos (\x -&amp;gt; 1/(x+x^2) - 1/(x+2*x^2)) (0::Rational)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;No division by zero anywhere!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusions&lt;/b&gt;&lt;br /&gt;The code works. But it does have limitations. As written it only supports rational functions. It's not hard to extend to square roots. (Try writing the code - it makes a nice exercise.) Unfortunately, any implementation of square root will (I think) require a division by x. This means that you'll be able to compute the positive part away from zero, but not at zero.&lt;br /&gt;&lt;br /&gt;This method can't be extended fully to transcendental functions. But it is possible to add partial support for them. So with a little work we can still compute the positive part of functions like &lt;tt&gt;1/sqrt(cos x-1)&lt;/tt&gt; away from x==0. But applying &lt;tt&gt;cos&lt;/tt&gt; to an arbitrary rational function may need more complex methods. I encourage you to experiment.&lt;br /&gt;&lt;br /&gt;Note that this code makes good use of laziness. 
If your function has no singularities then you might find it performs no computations beyond what is required to compute the ordinary numerical value.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8804149847519007167?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8804149847519007167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8804149847519007167' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8804149847519007167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8804149847519007167'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/09/on-removing-singularities-from-rational.html' title='On Removing Singularities from Rational Functions'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/TIu6ntLuO5I/AAAAAAAAAkc/OA_K2bq6Q2I/s72-c/grid1.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8296075839582296279</id><published>2010-09-05T09:34:00.000-07:00</published><updated>2010-09-05T09:34:45.414-07:00</updated><title type='text'>Automatic even/odd splitting</title><content type='html'>&lt;b&gt;Statement of the Problem&lt;/b&gt;&lt;br /&gt;Suppose you have a real valued function on the reals, say f. 
We can split it into the sum of an even and odd part:&lt;br /&gt;&lt;blockquote&gt;f(x) = f&lt;sub&gt;odd&lt;/sub&gt;(x)+f&lt;sub&gt;even&lt;/sub&gt;(x)&lt;br /&gt;&lt;/blockquote&gt;where&lt;br /&gt;&lt;blockquote&gt;f&lt;sub&gt;odd&lt;/sub&gt;(x) = (f(x)-f(-x))/2, f&lt;sub&gt;even&lt;/sub&gt;(x) = (f(x)+f(-x))/2&lt;br /&gt;&lt;/blockquote&gt;If f&lt;sub&gt;odd&lt;/sub&gt; has a power series around zero, then all of its terms must have odd powers in x. So f&lt;sub&gt;odd&lt;/sub&gt;(x)/x must have all even powers and it becomes natural to 'compress' down the terms to form f'(x) = f&lt;sub&gt;odd&lt;/sub&gt;(sqrt(x))/sqrt(x). (Take that as a definition of f' for this article.) We can implement this operation as a higher order function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Prelude hiding (odd)&lt;br /&gt;&lt;br /&gt;&amp;gt; odd f x = let s = sqrt x in 0.5*(f s - f (-s))/s&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Here's a simple example:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f x = x*(1+x*(2+x*(3+7*x)))&lt;br /&gt;&lt;br /&gt;&amp;gt; test0 = odd f 3&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;But there's something not quite right about this. If &lt;tt&gt;f&lt;/tt&gt; is rational, then so is &lt;tt&gt;odd f&lt;/tt&gt;. But the implementation of &lt;tt&gt;odd&lt;/tt&gt; involves square roots. Among other things, square roots introduce inaccuracy. As square roots don't appear in the final result, can we eliminate them from the intermediate steps of the computation too?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Better Solution&lt;/b&gt;&lt;br /&gt;Let's use the results of last week's article to compute &lt;tt&gt;odd&lt;/tt&gt; another way. 
We want a linear function that maps as follows:&lt;br /&gt;&lt;br /&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;sup&gt;2&lt;/sup&gt;&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;sup&gt;3&lt;/sup&gt;&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;x&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;sup&gt;2n&lt;/sup&gt;&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;sup&gt;2n+1&lt;/sup&gt;&lt;/TD&gt;&lt;TD&gt;-&gt;&lt;/TD&gt;&lt;TD&gt;x&lt;sup&gt;n&lt;/sup&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;br /&gt;Here's an automaton:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TIPAZwMN6SI/AAAAAAAAAkQ/yP_WsaMCW64/s1600/automaton.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TIPAZwMN6SI/AAAAAAAAAkQ/yP_WsaMCW64/s320/automaton.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If we start at 0, end at 1, and take exactly n steps, then the product of the factors we collect up along the way is given by the right-hand column of that table. If n is even there is no such path so we collect up 0. As we're working with polynomials over the reals, rather than types, we have x1=1x and so on. We can construct a transition matrix:&lt;br /&gt;&lt;br /&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;x&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;br /&gt;We now do something like we did &lt;a href="http://blog.sigfpe.com/2010/08/constraining-types-with-regular.html"&gt;last time&lt;/a&gt;. Any time we have a function of some variable x, we replace x with the transition matrix. 
Our functions now take matrix values like&lt;br /&gt;&lt;br /&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TD&gt;a&lt;/TD&gt;&lt;TD&gt;b&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;c&lt;/TD&gt;&lt;TD&gt;d&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;br /&gt;Any polynomial of our transition matrix always gives us equal elements along the diagonal. This is true even if we form the inverse of the transition matrix. So we don't need to store d. So now we implement a simple matrix type needing only to store three elements instead of four:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data O a = O a a a deriving (Show, Eq)&lt;br /&gt;&lt;br /&gt;&amp;gt; instance Num a =&amp;gt; Num (O a) where&lt;br /&gt;&amp;gt;    O a b c + O a' b' c' = O (a+a') (b+b') (c+c')&lt;br /&gt;&amp;gt;    O a b c * O a' b' c' = O (a*a'+b*c') (a*b'+b*a') (c*a'+a*c')&lt;br /&gt;&amp;gt;    fromInteger n = let i = fromInteger n in O i 0 0&lt;br /&gt;&amp;gt;    negate (O a b c) = O (negate a) (negate b) (negate c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Notice how similar this is to automatic differentiation. 
We can extend to reciprocals too:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional a =&amp;gt; Fractional (O a) where&lt;br /&gt;&amp;gt;    fromRational n = let i = fromRational n in O i 0 0&lt;br /&gt;&amp;gt;    recip (O a b c) = let idet = recip (a*a-b*c) in O (idet*a) (idet*negate b) (idet*negate c)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now we can implement a replacement for &lt;tt&gt;odd&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; transition x = O 0 1 x&lt;br /&gt;&amp;gt; odd' f x = let O _ fx _ = f (transition x) in fx&lt;br /&gt;&lt;br /&gt;&amp;gt; test1 = odd' f 3&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Another example:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; test3 = odd' (\x -&amp;gt; (x+3*x*x-1/x)/(x*x)) 2&lt;br /&gt;&amp;gt; test4 = odd  (\x -&amp;gt; (x+3*x*x-1/x)/(x*x)) 2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This new version has many advantages: it uses only rational functions, it's more accurate, and it's well defined at zero.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;Automatic differentiation is just one of a family of methods that can be used to compute a wide variety of functions of real-valued functions. Essentially we're just working over real-valued matrices instead of real numbers. By using automata we can simplify the process of working out which matrices to use. (Though for the simple example above, you may have been able to guess the matrix without any other help).&lt;br /&gt;&lt;br /&gt;(BTW I think there are hidden automata lurking in a few places in mathematics. 
For example, in &lt;a href="http://en.wikipedia.org/wiki/Umbral_calculus"&gt;Umbral calculus&lt;/a&gt;.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8296075839582296279?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8296075839582296279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8296075839582296279' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8296075839582296279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8296075839582296279'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/09/automatic-evenodd-splitting.html' title='Automatic even/odd splitting'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/TIPAZwMN6SI/AAAAAAAAAkQ/yP_WsaMCW64/s72-c/automaton.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-4864434678468834247</id><published>2010-08-14T17:31:00.000-07:00</published><updated>2010-08-14T21:06:47.277-07:00</updated><title type='text'>Constraining Types with Regular Expressions</title><content type='html'>&lt;b&gt;Structures with Constraints&lt;/b&gt;&lt;br /&gt;Here's a picture of an element of a binary tree type:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; 
text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TGNjvNEaqvI/AAAAAAAAAi0/Aiyqonq9bnM/s1600/tree1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TGNjvNEaqvI/AAAAAAAAAi0/Aiyqonq9bnM/s320/tree1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The leaves correspond to elements and the letters indicate the type of those elements. If we read the leaves from left to right they match the regular expression A&lt;sup&gt;*&lt;/sup&gt;B&lt;sup&gt;*&lt;/sup&gt;. We can define a binary tree type whose leaves always match this expression by making use of &lt;a href="http://www.cs.nott.ac.uk/~ctm/Dissect.pdf"&gt;dissections&lt;/a&gt;. Similarly we can construct all kinds of other structures, such as lists and more general trees, whose leaves match this expression.&lt;br /&gt;&lt;br /&gt;But can we make structures that match other regular expressions? Here are some examples:&lt;br /&gt;&lt;br /&gt;1. Structures that match the expression A&lt;sup&gt;n&lt;/sup&gt; for some fixed n. These are structures, like trees, that have a fixed size. For the case n=5 we need a type that includes elements like:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_UdKHLrHa05M/TGNnLmwtd_I/AAAAAAAAAjA/Lx-d5u1dZ1A/s1600/tree2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_UdKHLrHa05M/TGNnLmwtd_I/AAAAAAAAAjA/Lx-d5u1dZ1A/s320/tree2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;2. Structures that match (AA)&lt;sup&gt;*&lt;/sup&gt;. These are structures like trees with an even number of elements. This is easy for lists. We can just use a list of pairs. But it's more complex for trees because subtrees may have odd or even size even though the overall structure is even in size. 
We could also generalise to ((AA)&lt;sup&gt;n&lt;/sup&gt;)&lt;sup&gt;*&lt;/sup&gt; for some fixed n.&lt;br /&gt;&lt;br /&gt;3. Structures that match A&lt;sup&gt;*&lt;/sup&gt;1A&lt;sup&gt;*&lt;/sup&gt;. Here's an example:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_UdKHLrHa05M/TGNoVxkRu6I/AAAAAAAAAjM/fzi3B6q6gqk/s1600/tree3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_UdKHLrHa05M/TGNoVxkRu6I/AAAAAAAAAjM/fzi3B6q6gqk/s320/tree3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The element marked 1 is of unit type &lt;tt&gt;()&lt;/tt&gt;. It's a 'hole'. So this regular expression is a &lt;a href="http://www.cs.nott.ac.uk/~ctm/diff.pdf"&gt;derivative&lt;/a&gt; type. We can also construct a variety of types like those matching A&lt;sup&gt;*&lt;/sup&gt;BA&lt;sup&gt;*&lt;/sup&gt;CB&lt;sup&gt;*&lt;/sup&gt; by using a mixture of dissection and differentiation:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_UdKHLrHa05M/TGNq80CU5FI/AAAAAAAAAjY/mw4Nou3EaL8/s1600/tree4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_UdKHLrHa05M/TGNq80CU5FI/AAAAAAAAAjY/mw4Nou3EaL8/s320/tree4.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;4. A &lt;a href="http://www.cse.unsw.edu.au/~dons/fps.html"&gt;bytestring&lt;/a&gt;- or rope-like type used to represent a string that is statically guaranteed to match a given regular expression.&lt;br /&gt;&lt;br /&gt;5. Many kinds of other constraints you could imagine on a datastructure. 
Like trees which are guaranteed not to have two neighbouring leaves of the same type, or whose sequence of leaves never contains a certain subsequence.&lt;br /&gt;&lt;br /&gt;So the challenge is this: can we implement a uniform way of building any container type to match any regular expression we want?&lt;br /&gt;&lt;br /&gt;We'll need some extensions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; {-# OPTIONS_GHC -fwarn-incomplete-patterns #-}&lt;br /&gt;&amp;gt; {-# LANGUAGE TypeFamilies, EmptyDataDecls, UndecidableInstances, TypeOperators #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;b&gt;First Example: Differentiating Lists&lt;/b&gt;&lt;br /&gt;Let l(x) be the type of lists of elements of type x. We can define a list type through&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;data List x = Nil | Cons x (List x)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Algebraically we can write this as&lt;br /&gt;&lt;blockquote&gt;l(x) = 1+x l(x)&lt;br /&gt;&lt;/blockquote&gt;Let's try approaching the derivative of a list from the perspective of regular expressions. We know from &lt;a href="http://en.wikipedia.org/wiki/Deterministic_finite_automata"&gt;Kleene's Theorem&lt;/a&gt; that the set of strings (i.e. the language) matching a regular expression is precisely the language accepted by some finite state automaton. Let's consider the case x&lt;sup&gt;*&lt;/sup&gt;1x&lt;sup&gt;*&lt;/sup&gt;. 
This is the language we get by considering all paths from state 0 to state 1 in the following automaton:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_UdKHLrHa05M/TGcaDcP8oMI/AAAAAAAAAkE/RFwDCHq5oDg/s1600/automaton1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_UdKHLrHa05M/TGcaDcP8oMI/AAAAAAAAAkE/RFwDCHq5oDg/s320/automaton1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Note that I have overloaded the numeral '1' to mean both the type with one element (when labelling an edge) and to mean the state numbered 1 (when labelling a vertex).&lt;br /&gt;&lt;br /&gt;Define the L&lt;sub&gt;ij&lt;/sub&gt;(x) to be the type of lists whose sequence of elements correspond to all possible paths from state i to state j. A list in L&lt;sub&gt;ij&lt;/sub&gt;(x) is either an empty list, or constructed from a first element and a list. If we combine an element and a list, the combination has to match a possible path from i to j. There are a number of ways we could do this. The element could correspond to a transition from i to k. But if this is the case, then the remainder of the list must correspond to a path from k to j. So we must replace &lt;tt&gt;Cons&lt;/tt&gt; with something that for its first argument takes a type corresponding to a single automaton step from i to k. For its second argument it must take an element of L&lt;sub&gt;kj&lt;/sub&gt;(x). The set of paths from i to j that correspond to a single step is the (i,j)-th element of the transition matrix for the automaton. 
This is X&lt;sub&gt;ij&lt;/sub&gt;(x) where&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;X(x) = (X&lt;sub&gt;ij&lt;/sub&gt;(x)) =&lt;/TD&gt;&lt;td&gt;(x&lt;/td&gt;&lt;td&gt;1)&lt;/TD&gt;&lt;tr&gt;&lt;td&gt;                            &lt;/TD&gt;&lt;td&gt;(0&lt;/td&gt;&lt;td&gt;x)&lt;/TD&gt; &lt;/table&gt;&lt;/blockquote&gt;&lt;br /&gt;On the other hand, we also need to replace &lt;tt&gt;Nil&lt;/tt&gt; with a version that respects transitions too. As &lt;tt&gt;Nil&lt;/tt&gt; takes no arguments, it must correspond to paths of length zero in the automaton. The only such paths are zero length paths from a state to itself. So the matrix for such paths is:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;I =&lt;/TD&gt;&lt;td&gt;(1&lt;/TD&gt;&lt;td&gt;0)&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;   &lt;/TD&gt;&lt;td&gt;(0&lt;/TD&gt;&lt;td&gt;1)&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;/blockquote&gt;&lt;br /&gt;Let's also define the matrix L(x)= (L&lt;sub&gt;ij&lt;/sub&gt;(x)).&lt;br /&gt;&lt;br /&gt;The words above boil down to:&lt;br /&gt;&lt;blockquote&gt;L&lt;sub&gt;ij&lt;/sub&gt;(x) = I&lt;sub&gt;ij&lt;/sub&gt;+&amp;Sigma;&lt;sub&gt;k&lt;/sub&gt; X&lt;sub&gt;ik&lt;/sub&gt;(x)L&lt;sub&gt;kj&lt;/sub&gt;(x)&lt;br /&gt;&lt;/blockquote&gt;where the sum is over all the places we might visit on the first step of the journey from i to j.&lt;br /&gt;&lt;br /&gt;We can rewrite this using standard matrix notation:&lt;br /&gt;&lt;blockquote&gt;L(x) = I + X(x)L(x)&lt;br /&gt;&lt;/blockquote&gt;Compare with the definition of ordinary lists given above. We get the type of constrained lists by taking the original definition of a list and replacing everything with matrices. We replace 1 with I. We replace x with the transition matrix of the automaton. And we replace the structure we're trying to define with a family of types - one for each pair of start and end states for the automaton. 
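&lt;br /&gt;&lt;br /&gt;To make that concrete, here's a rough sketch of the entries of L(x) for the automaton above, unrolled by hand into ordinary Haskell. The names &lt;tt&gt;L00&lt;/tt&gt;, &lt;tt&gt;L11&lt;/tt&gt; and &lt;tt&gt;L01&lt;/tt&gt; and their constructors are made up for illustration, and the block isn't part of the literate program. I've also left the summand X&lt;sub&gt;01&lt;/sub&gt;(x)L&lt;sub&gt;10&lt;/sub&gt;(x) out of L&lt;sub&gt;00&lt;/sub&gt;(x), on the assumption that we only care about finite lists: L&lt;sub&gt;10&lt;/sub&gt;(x) = x L&lt;sub&gt;10&lt;/sub&gt;(x) has no finite elements.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;-- L00 = 1 + x L00: paths from state 0 to state 0, ie. x*&lt;br /&gt;data L00 x = Nil00 | Cons00 x (L00 x)&lt;br /&gt;&lt;br /&gt;-- L11 = 1 + x L11: paths from state 1 to state 1, also x*&lt;br /&gt;data L11 x = Nil11 | Cons11 x (L11 x)&lt;br /&gt;&lt;br /&gt;-- L01 = x L01 + 1 L11: paths from state 0 to state 1, ie. x*1x*&lt;br /&gt;-- There's no Nil here because no zero length path goes from 0 to 1.&lt;br /&gt;data L01 x = Cons01 x (L01 x) | Hole01 () (L11 x)&lt;br /&gt;&lt;br /&gt;-- The list 1, 2, hole, 3&lt;br /&gt;example :: L01 Int&lt;br /&gt;example = Cons01 1 (Cons01 2 (Hole01 () (Cons11 3 Nil11)))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So &lt;tt&gt;L00&lt;/tt&gt; and &lt;tt&gt;L11&lt;/tt&gt; are just ordinary lists again, and &lt;tt&gt;L01&lt;/tt&gt; is a list with exactly one hole in it.&lt;br /&gt;&lt;br /&gt;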
We can describe this replacement more formally: it's a homomorphism from the set of types to the set of matrices of types. (Actually, it's a bit more subtle than that. This isn't quite the usual semiring of types. For one thing, the order of multiplication matters.) And it doesn't just apply to lists. We can apply this rule to any container type. For example, suppose we wish to repeat the above for trees. Then we know that for ordinary binary trees, t(x), we have&lt;br /&gt;&lt;blockquote&gt;t(x) = x+t(x)&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;&lt;/blockquote&gt;We replace this with&lt;br /&gt;&lt;blockquote&gt;T(x) = X(x)+T(x)&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;We started this section by considering the specific case of the pattern x&lt;sup&gt;*&lt;/sup&gt;1x&lt;sup&gt;*&lt;/sup&gt; with a corresponding matrix X(x). Because X&lt;sub&gt;10&lt;/sub&gt;=0 and X&lt;sub&gt;00&lt;/sub&gt;=X&lt;sub&gt;11&lt;/sub&gt; it's not hard to see that any type T we constrain using this regular expression will also have similar 'shape', ie.  T&lt;sub&gt;10&lt;/sub&gt;=0 and T&lt;sub&gt;00&lt;/sub&gt;=T&lt;sub&gt;11&lt;/sub&gt;. So we can write&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;T =&lt;/td&gt;&lt;td&gt;(t(x)&lt;/TD&gt;&lt;td&gt;t'(x))&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;   &lt;/td&gt;&lt;td&gt;(0   &lt;/TD&gt;&lt;td&gt;t(x))&lt;/TD&gt;&lt;/TR&gt;&lt;/table&gt;&lt;/blockquote&gt;&lt;br /&gt;where by definition, t(x)=T&lt;sub&gt;00&lt;/sub&gt;(x) and t'(x)=T&lt;sub&gt;01&lt;/sub&gt;(x). Suppose we have two such collections of types, (S&lt;sub&gt;ij&lt;/sub&gt;) and (T&lt;sub&gt;ij&lt;/sub&gt;). Now consider the types of pairs where the first element is of type S&lt;sub&gt;ij&lt;/sub&gt; and the second of T&lt;sub&gt;jk&lt;/sub&gt;. Then the leaves of the pair structure correspond to a path from i to k. 
So we have&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;(st(x)&lt;/TD&gt;&lt;td&gt;(st)'(x))&lt;/TD&gt;&lt;td&gt;=&lt;/TD&gt;&lt;td&gt;(s(x)&lt;/TD&gt;&lt;td&gt;s'(x))&lt;/TD&gt;&lt;td&gt;(t(x)&lt;/TD&gt;&lt;td&gt;t'(x))&lt;/TD&gt;&lt;tr&gt;&lt;tr&gt;&lt;td&gt;(0    &lt;/TD&gt;&lt;td&gt;st(x))   &lt;/TD&gt;&lt;td&gt; &lt;/TD&gt;&lt;td&gt;(0   &lt;/TD&gt;&lt;td&gt;s(x)) &lt;/TD&gt;&lt;td&gt;(0   &lt;/TD&gt;&lt;td&gt;t(x))&lt;/TD&gt;&lt;tr&gt; &lt;/TABLE&gt;&lt;/blockquote&gt;&lt;br /&gt;Multiply out and we find that&lt;br /&gt;&lt;blockquote&gt;(st)'(x) = s(x)t'(x)+s'(x)t(x)&lt;br /&gt;&lt;/blockquote&gt;In other words, the usual Leibniz rule for differentiation is nothing more than a statement about transitions for the automaton I drew above. To get a transition 0&amp;rarr;1 you either go 0&amp;rarr;0&amp;rarr;1 or 0&amp;rarr;1&amp;rarr;1.&lt;br /&gt;&lt;br /&gt;Although I talked specifically about differentiation, much of what I said above applies for any finite state automaton whose edges are labelled by types. The best thing now is probably to put together some code to see how this all looks.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Specification&lt;/b&gt;&lt;br /&gt;If you haven't checked &lt;a href="http://byorgey.wordpress.com/2010/08/12/on-a-problem-of-sigfpe/"&gt;Brent Yorgey&lt;/a&gt;'s solution to my problem last week, now is a good opportunity. My code is a generalisation of that but it may be helpful to look at Brent's specialisation first.&lt;br /&gt;&lt;br /&gt;The goal is to be able to define a transition matrix like this. 
('K' is an abbreviation for 'constant' matrix.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type D a = K22 (a,    ())&lt;br /&gt;&amp;gt;                (Void, a )&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And then define a functor like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type ListF = I :+: (X :*: Y)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Think of &lt;tt&gt;ListF&lt;/tt&gt; as being a bifunctor taking arguments &lt;tt&gt;X&lt;/tt&gt; and &lt;tt&gt;Y&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;We'd then like to be able to form the matrix of fixed points of &lt;tt&gt;Y = ListF X Y&lt;/tt&gt;. In this case, ordinary lists should appear as the element at position (0,0) in the matrix:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type List x = Fix (I0, I0) (D x) ListF&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I'm using &lt;tt&gt;In&lt;/tt&gt; (that is, &lt;tt&gt;I0&lt;/tt&gt;, &lt;tt&gt;I1&lt;/tt&gt; and so on) to represent the integer n at the type level.&lt;br /&gt;&lt;br /&gt;Derivatives of lists should appear at (0,1) so we want&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type List' x = Fix (I0, I1) (D x) ListF&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;But &lt;tt&gt;Fix&lt;/tt&gt; is intended to be completely generic. So it needs to be defined in a way that also works for trees:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type TreeF = X :+: (Y :*: Y)&lt;br /&gt;&amp;gt; type Tree  = Fix (I0, I0) (D Int) TreeF&lt;br /&gt;&amp;gt; type Tree' = Fix (I0, I1) (D Int) TreeF&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And of course it needs to work with other transition matrices. 
For example x&lt;sup&gt;*&lt;/sup&gt;1y&lt;sup&gt;*&lt;/sup&gt;1z&lt;sup&gt;*&lt;/sup&gt; has the following transition diagram and matrix:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type E x y z = K33 (x,    (),   Void)&lt;br /&gt;&amp;gt;                    (Void, y,    ()  )&lt;br /&gt;&amp;gt;                    (Void, Void, z   )&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;The paths matching the whole expression run from state 0 to state 2, so we'd expect&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type DDTree x y z = Fix (I0, I2) (E x y z) TreeF&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;to define the second divided difference of trees.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Implementation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We'll need a type with no elements:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Void&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And some type-level integers:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data Zero&lt;br /&gt;&amp;gt; data S a&lt;br /&gt;&amp;gt; type I0 = Zero&lt;br /&gt;&amp;gt; type I1 = S I0&lt;br /&gt;&amp;gt; type I2 = S I1&lt;br /&gt;&amp;gt; type I3 = S I2&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now we'll need some type-level matrices. For any square matrix, we need a type-level function to give its dimension and another to access its (i, j)-th element:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type family Dim x :: *&lt;br /&gt;&amp;gt; type family (:!) x ij :: *&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Thanks to &lt;a href="http://byorgey.wordpress.com/2010/07/06/typed-type-level-programming-in-haskell-part-ii-type-families/"&gt;Brent&lt;/a&gt;'s tutorial, that code is much better than how it used to look.)&lt;br /&gt;&lt;br /&gt;We can now define matrix addition through pointwise addition of the elements:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data (:+) m n&lt;br /&gt;&amp;gt; type instance Dim (m :+ n) = Dim m&lt;br /&gt;&amp;gt; type instance (m :+ n) :! ij = Either (m :! ij) (n :! ij)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And similarly we can define multiplication. 
I'm using the type-level function &lt;tt&gt;Product'&lt;/tt&gt; to perform the loop required in the definition of matrix multiplication:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data (:*) m n&lt;br /&gt;&amp;gt; type instance Dim (m :* n) = Dim m&lt;br /&gt;&amp;gt; type instance (m :* n) :! ij = Product' I0 (Dim m) m n :! ij&lt;br /&gt;&lt;br /&gt;&amp;gt; data Product' i k m n&lt;br /&gt;&amp;gt; type instance Product' p I1 m n :! (i, j) = (m :! (i, p), n :! (p, j))&lt;br /&gt;&amp;gt; type instance Product' p (S (S c)) m n :! (i, j) = Either&lt;br /&gt;&amp;gt;    (m :! (i, p), n :! (p, j))&lt;br /&gt;&amp;gt;    (Product' (S p) (S c) m n :! (i, j))&lt;br /&gt;&amp;gt; &lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(Weird seeing all that familiar matrix multiplication code at the type level.)&lt;br /&gt;&lt;br /&gt;Now we need some types to represent our functors:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data I&lt;br /&gt;&amp;gt; data X&lt;br /&gt;&amp;gt; data Y&lt;br /&gt;&amp;gt; data K n&lt;br /&gt;&amp;gt; data (f :+: g)&lt;br /&gt;&amp;gt; data (f :*: g)&lt;br /&gt;&amp;gt; data F m f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;(I think phantom empty data types should be called ethereal types.)&lt;br /&gt;&lt;br /&gt;To turn these into usable types we need to implement the homomorphism I described above. 
So here are the rules laid out formally:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type Id = K22 ((),   Void)&lt;br /&gt;&amp;gt;               (Void, ())&lt;br /&gt;&lt;br /&gt;&amp;gt; type family   Hom self m f :: *&lt;br /&gt;&amp;gt; type instance Hom self m I = Id&lt;br /&gt;&amp;gt; type instance Hom self m X = m&lt;br /&gt;&amp;gt; type instance Hom self m Y = self&lt;br /&gt;&amp;gt; type instance Hom self m (K n) = n&lt;br /&gt;&amp;gt; type instance Hom self m (f :+: g) = Hom self m f :+ Hom self m g&lt;br /&gt;&amp;gt; type instance Hom self m (f :*: g) = Hom self m f :* Hom self m g&lt;br /&gt;&amp;gt; type instance Dim (F m f) = Dim m&lt;br /&gt;&lt;br /&gt;&amp;gt; data Fix ij m f = Fix (Hom (F m f) m f :! ij)&lt;br /&gt;&amp;gt; type instance (:!) (F m f) ij = Fix ij m f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;That's more or less it. We can now go ahead and try to construct some elements. We could (as Brent suggests) write some smart constructors to make our life easier. But for now I'm writing everything explicitly so you can see what's going on:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; x0 = Fix (Left ())                         :: List Int&lt;br /&gt;&amp;gt; x1 = Fix (Right (Left (1, Fix (Left ())))) :: List Int&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;tt&gt;x0&lt;/tt&gt; is the empty list. &lt;tt&gt;x1&lt;/tt&gt; is the list &lt;tt&gt;[1]&lt;/tt&gt;. The &lt;tt&gt;Left&lt;/tt&gt; and &lt;tt&gt;Right&lt;/tt&gt; get a bit tedious to write. 
But this is intended as a proof that the concept works rather than a user-friendly API.&lt;br /&gt;&lt;br /&gt;We can explicitly implement the isomorphism with the more familiar list type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; iso1 :: [x] -&amp;gt; List x&lt;br /&gt;&amp;gt; iso1 []     = Fix (Left ())&lt;br /&gt;&amp;gt; iso1 (a:as) = Fix (Right (Left (a, iso1 as)))&lt;br /&gt;&lt;br /&gt;&amp;gt; iso1' :: List x -&amp;gt; [x]&lt;br /&gt;&amp;gt; iso1' (Fix (Left ()))              = []&lt;br /&gt;&amp;gt; iso1' (Fix (Right (Left (a, as)))) = a : iso1' as&lt;br /&gt;&amp;gt; iso1' (Fix (Right (Right a)))      = error "Can't be called as a is void"&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;So that's it! If we can write our container as the fixed point of a polynomial functor, and if we can convert our regular expression to a finite state automaton, then &lt;tt&gt;Fix&lt;/tt&gt; completely automatically builds the constrained container type.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What have we learnt?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We haven't just solved the original problem. We've shown that derivatives and dissections are special cases of a more general operation. Take a look at the definition of &lt;tt&gt;D x&lt;/tt&gt; again. We can think of it as xI+Delta where Delta is the matrix&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; type Delta = K22 (Void, ()  )&lt;br /&gt;&amp;gt;                  (Void, Void)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This matrix has the property that its square is zero. It's the 'infinitesimal type' I described &lt;a href="http://blog.sigfpe.com/2006/09/infinitesimal-types.html"&gt;here&lt;/a&gt;. In other words, this is type-level automatic differentiation. 
We've also been doing type-level automatic &lt;a href="http://blog.sigfpe.com/2010/07/automatic-divided-differences.html"&gt;divided differencing&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;We can now go back and look at the matrix form of &lt;a href="http://en.wikipedia.org/wiki/Divided_differences#Matrix_form"&gt;divided differences&lt;/a&gt; on wikipedia. I hope you can now see that the matrix T&lt;sub&gt;id&lt;/sub&gt;(x&lt;sub&gt;0&lt;/sub&gt;,...,x&lt;sub&gt;n-1&lt;/sub&gt;) defined there is nothing other than a transition matrix for this automaton:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_UdKHLrHa05M/TGcYq2E34fI/AAAAAAAAAj4/KCiy_drv9eQ/s1600/automaton2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TGcYq2E34fI/AAAAAAAAAj4/KCiy_drv9eQ/s320/automaton2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;In fact, we can use what we've learnt about regular expressions here to solve some numerical problems. But I won't write about that until the next article.&lt;br /&gt;&lt;br /&gt;By the way, I think what I've described here can be viewed as an application of what Backhouse talks about in &lt;a href="http://www.cs.nott.ac.uk/~rcb/talks/NWPT2002.pdf"&gt;these slides&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I think that for any automaton we have a 2-category. The 0-cells are states, the 1-cells are the types associated with paths from one state to another, and the 2-cells are functions between types that respect the constraint. I haven't worked out the details however. The 2-category structure is probably important. As things stand, I've just shown how to make the types. But we don't yet have an easy way to write functions that respect these constraints. I suspect 2-categories give a language to talk about these things. 
But that's just speculation right now.&lt;br /&gt;&lt;br /&gt;By the way, I couldn't write a working &lt;tt&gt;Show&lt;/tt&gt; instance for &lt;tt&gt;Fix&lt;/tt&gt;. Can you write one? And an implementation of &lt;tt&gt;arbitrary&lt;/tt&gt; for QuickCheck?&lt;br /&gt;&lt;br /&gt;And I hope you can now solve my problem from &lt;a href="http://blog.sigfpe.com/2010/08/divided-differences-and-tomography-of.html"&gt;last week&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Leftover bits of code&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;K22&lt;/tt&gt; and &lt;tt&gt;K33&lt;/tt&gt; are constructors for 2&amp;times;2 and 3&amp;times;3 matrices. It would probably have been better to have used lists like Brent did.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data K22 row0 row1&lt;br /&gt;&amp;gt; type instance Dim (K22 row0 row1) = I2&lt;br /&gt;&amp;gt; &lt;br /&gt;&amp;gt; type instance (:!) (K22 (m00, m01) row1) (I0, I0) = m00&lt;br /&gt;&amp;gt; type instance (:!) (K22 (m00, m01) row1) (I0, I1) = m01&lt;br /&gt;&amp;gt; type instance (:!) (K22 row0 (m10, m11)) (I1, I0) = m10&lt;br /&gt;&amp;gt; type instance (:!) (K22 row0 (m10, m11)) (I1, I1) = m11&lt;br /&gt;&amp;gt; &lt;br /&gt;&amp;gt; data K33 row0 row1 row2&lt;br /&gt;&amp;gt; type instance Dim (K33 row0 row1 row2) = I3&lt;br /&gt;&amp;gt; &lt;br /&gt;&amp;gt; type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I0) = m00&lt;br /&gt;&amp;gt; type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I1) = m01&lt;br /&gt;&amp;gt; type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I2) = m02&lt;br /&gt;&amp;gt; type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I0) = m10&lt;br /&gt;&amp;gt; type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I1) = m11&lt;br /&gt;&amp;gt; type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I2) = m12&lt;br /&gt;&amp;gt; type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I0) = m20&lt;br /&gt;&amp;gt; type instance (:!) 
(K33 row0 row1 (m20, m21, m22)) (I2, I1) = m21&lt;br /&gt;&amp;gt; type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I2) = m22&lt;br /&gt;&lt;/pre&gt;Update: I neglected to mention that there is a bit of subtlety with the issue of being able to create the same string by different walks through the automaton. I'll leave that as an exercise :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-4864434678468834247?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/4864434678468834247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=4864434678468834247' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4864434678468834247'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/4864434678468834247'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/08/constraining-types-with-regular.html' title='Constraining Types with Regular Expressions'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/TGNjvNEaqvI/AAAAAAAAAi0/Aiyqonq9bnM/s72-c/tree1.png' height='72' width='72'/><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6122287169525245825</id><published>2010-08-07T15:52:00.000-07:00</published><updated>2010-08-07T15:52:27.983-07:00</updated><title 
type='text'>Divided Differences and the Tomography of Types</title><content type='html'>&lt;b&gt;Health Warning&lt;/b&gt;&lt;br /&gt;This article assumes knowledge of &lt;a href="http://www.cs.nott.ac.uk/~ctm/publications.html"&gt;Conor McBride's work&lt;/a&gt; on the differentiation and dissection of types, even if the early parts look deceptively simple.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Tables of Divided Differences&lt;/b&gt;&lt;br /&gt;Given a sequence of numbers like 2, 6, 12, 20, 30 we can use the well known method of finite differences to predict the next in the series. We write the numbers in a column. In the next column we write the differences between the numbers in the previous column and iterate like so:&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;table border="0" bgcolor="#c0c0ff"&gt;&lt;colgroup span="3" width="40"&gt;&lt;tr&gt;&lt;td&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;6&lt;/TD&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;6&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;12&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;8&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;20&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;30&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;On the assumption that the rightmost column is zero all the way down we can extend this table to:&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;table border="0" bgcolor="#c0c0ff"&gt;&lt;colgroup span="3" 
width="40"&gt;&lt;tr&gt;&lt;td&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;6&lt;/TD&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;6&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;12&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;8&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td&gt;20&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td bgcolor="#8080ff"&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td bgcolor="#8080ff"&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td bgcolor="#8080ff"&gt;12&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#8080ff"&gt;42&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;We can think of this as an exercise in polynomial fitting. We have some x-values, in this case: x&lt;sub&gt;0&lt;/sub&gt;=1, x&lt;sub&gt;1&lt;/sub&gt;=2,..., x&lt;sub&gt;4&lt;/sub&gt;=5, and some y-values: y&lt;sub&gt;0&lt;/sub&gt;=2, y&lt;sub&gt;1&lt;/sub&gt;=6,..., y&lt;sub&gt;4&lt;/sub&gt;=30. We hope to fit a polynomial f so that f(x&lt;sub&gt;i&lt;/sub&gt;)=y&lt;sub&gt;i&lt;/sub&gt;. The table of finite differences has the property that the (n+1)-th column becomes constant, and so every column after it is zero, if a degree n polynomial fits the data.&lt;br /&gt;&lt;br /&gt;But what happens if we want to fit data to x-values that aren't equally spaced? Then we can use the *divided* difference table instead. In this case we don't record the differences between the y-values, but the quotient between the y-differences and the x-differences. 
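&lt;br /&gt;&lt;br /&gt;Here's a minimal sketch of that recipe in Haskell, in case it helps to see it spelled out. The names &lt;tt&gt;nextColumn&lt;/tt&gt; and &lt;tt&gt;dividedDifferences&lt;/tt&gt; are my own for this illustration, and the code assumes the x-values are distinct and that there are as many of them as y-values:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;-- Build one column of the table from the previous one.&lt;br /&gt;-- gap is the distance, in positions, between the two x-values in each denominator.&lt;br /&gt;nextColumn :: [Double] -&amp;gt; Int -&amp;gt; [Double] -&amp;gt; [Double]&lt;br /&gt;nextColumn xs gap col =&lt;br /&gt;  zipWith3 (\lo hi (xl, xr) -&amp;gt; (hi - lo) / (xr - xl))&lt;br /&gt;           col (drop 1 col) (zip xs (drop gap xs))&lt;br /&gt;&lt;br /&gt;-- All the columns of the table, starting with the y-values themselves.&lt;br /&gt;dividedDifferences :: [Double] -&amp;gt; [Double] -&amp;gt; [[Double]]&lt;br /&gt;dividedDifferences xs ys = go 1 ys&lt;br /&gt;  where&lt;br /&gt;    go _   []  = []&lt;br /&gt;    go gap col = col : go (gap + 1) (nextColumn xs gap col)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For instance, &lt;tt&gt;dividedDifferences [1,2,4,6] [3,6,18,38]&lt;/tt&gt; works out to the columns 3, 6, 18, 38, then 3, 6, 10, then 1, 1, and finally 0.&lt;br /&gt;&lt;br /&gt;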
For x&lt;sub&gt;0,1,2,3&lt;/sub&gt; = 1, 2, 4, 6 and y&lt;sub&gt;0,1,2,3&lt;/sub&gt; = 3, 6, 18, 38 we get:&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;table border="0" bgcolor="#c0c0ff"&gt;&lt;colgroup span="1" width="40"&gt;&lt;colgroup span="1" width="40"&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;1&lt;/td&gt;&lt;td&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(6-3)/(2-1)=3&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;2&lt;/td&gt;&lt;td&gt;6&lt;/TD&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(6-3)/(4-1)=1&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(18-6)/(4-2)=6&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;(1-1)/(6-1)=0&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;4&lt;/td&gt;&lt;td&gt;18&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;(10-6)/(6-2)=1&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(38-18)/(6-4)=10&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;6&lt;/td&gt;&lt;td&gt;38&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;Again we reach zero. This is because y&lt;sub&gt;i&lt;/sub&gt; = f(x&lt;sub&gt;i&lt;/sub&gt;) = x&lt;sub&gt;i&lt;/sub&gt;&lt;sup&gt;2&lt;/sup&gt;+2 so we have a quadratic again.&lt;br /&gt;&lt;br /&gt;Let's assume y&lt;sub&gt;i&lt;/sub&gt; = f(x&lt;sub&gt;i&lt;/sub&gt;) for some set of points. 
We'll leave x, y and f unknown and fill out the table:&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;br /&gt;&lt;table border="0" bgcolor="#c0c0ff" width="800"&gt;&lt;colgroup span="1" width="40"&gt;&lt;colgroup span="1" width="40"&gt;&lt;colgroup span="1" width="180"&gt;&lt;colgroup span="1" width="180"&gt;&lt;colgroup span="1" width="180"&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;x&lt;sub&gt;0&lt;/sub&gt;&lt;/td&gt;&lt;td&gt;f(x&lt;sub&gt;0&lt;/sub&gt;)&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(f(x&lt;sub&gt;1&lt;/sub&gt;)-f(x&lt;sub&gt;0&lt;/sub&gt;))/(x&lt;sub&gt;1&lt;/sub&gt;-x&lt;sub&gt;0&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;x&lt;sub&gt;1&lt;/sub&gt;&lt;/td&gt;&lt;td&gt;f(x&lt;sub&gt;1&lt;/sub&gt;)&lt;/TD&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;]-f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;])/(x&lt;sub&gt;2&lt;/sub&gt;-x&lt;sub&gt;0&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;]&lt;/TD&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(f(x&lt;sub&gt;2&lt;/sub&gt;)-f(x&lt;sub&gt;1&lt;/sub&gt;))/(x&lt;sub&gt;2&lt;/sub&gt;-x&lt;sub&gt;1&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;]&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;(f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;,x&lt;sub&gt;3&lt;/sub&gt;]-f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;])/(x&lt;sub&gt;3&lt;/sub&gt;-x&lt;sub&gt;0&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;,x&lt;sub&gt;3&lt;/sub&gt;]&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td 
bgcolor="#ffa0a0"&gt;x&lt;sub&gt;2&lt;/sub&gt;&lt;/td&gt;&lt;td&gt;f(x&lt;sub&gt;2&lt;/sub&gt;)&lt;/TD&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;(f[x&lt;sub&gt;2&lt;/sub&gt;,x&lt;sub&gt;3&lt;/sub&gt;]-f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;])/(x&lt;sub&gt;3&lt;/sub&gt;-x&lt;sub&gt;1&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;,x&lt;sub&gt;3&lt;/sub&gt;]&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt; &lt;/td&gt;&lt;td&gt;&lt;/TD&gt;&lt;td&gt;(f(x&lt;sub&gt;3&lt;/sub&gt;)-f(x&lt;sub&gt;2&lt;/sub&gt;))/(x&lt;sub&gt;3&lt;/sub&gt;-x&lt;sub&gt;2&lt;/sub&gt;)=&lt;br&gt;f[x&lt;sub&gt;2&lt;/sub&gt;,x&lt;sub&gt;3&lt;/sub&gt;]&lt;/TD&gt;&lt;/td&gt;&lt;/TR&gt;&lt;tr&gt;&lt;td bgcolor="#ffa0a0"&gt;x&lt;sub&gt;3&lt;/sub&gt;&lt;/td&gt;&lt;td&gt;f(x&lt;sub&gt;3&lt;/sub&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;Note that for any f, this table also defines the generalised divided differences f[x,y], f[x,y,z] and so on. (Compare with the notation at &lt;a href="http://en.wikipedia.org/wiki/Divided_differences"&gt;Wikipedia&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Divided Differences of Types&lt;/b&gt;&lt;br /&gt;Now suppose that f isn't a function of real numbers, but is a container type. Then as I noted &lt;a href="http://blog.sigfpe.com/2009/09/finite-differences-of-types.html"&gt;here&lt;/a&gt;, the second column corresponds to Conor McBride's dissection types. f[x,y] is an f-container with a hole, with everything to the left of the hole containing an x, and everything to the right of the hole containing a y.&lt;br /&gt;&lt;br /&gt;So what's the next column? Consider f(x)=x&lt;sup&gt;4&lt;/sup&gt;. Then f[x,y] = (y&lt;sup&gt;4&lt;/sup&gt;-x&lt;sup&gt;4&lt;/sup&gt;)/(y-x) = x&lt;sup&gt;3&lt;/sup&gt;+x&lt;sup&gt;2&lt;/sup&gt;y+xy&lt;sup&gt;2&lt;/sup&gt;+y&lt;sup&gt;3&lt;/sup&gt;. This corresponds to the description I just gave involving a hole. 
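&lt;br /&gt;&lt;br /&gt;As a sketch, here's that polynomial written out as a Haskell type, one constructor per term. The names &lt;tt&gt;Dissect4&lt;/tt&gt; and &lt;tt&gt;Hole0&lt;/tt&gt; to &lt;tt&gt;Hole3&lt;/tt&gt; are made up for this illustration:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;-- f[x,y] for f(x) = x^4: a 4-tuple with one slot replaced by a hole.&lt;br /&gt;-- Everything left of the hole is an x, everything right of it is a y.&lt;br /&gt;data Dissect4 x y&lt;br /&gt;  = Hole0 y y y   -- hole in the first slot:  y^3&lt;br /&gt;  | Hole1 x y y   -- hole in the second slot: x y^2&lt;br /&gt;  | Hole2 x x y   -- hole in the third slot:  x^2 y&lt;br /&gt;  | Hole3 x x x   -- hole in the last slot:   x^3&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There are four constructors because there are four places the hole can go.&lt;br /&gt;&lt;br /&gt;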
Now consider f[x,y,z] = (f[y,z]-f[x,y])/(z-x) = z&lt;sup&gt;2&lt;/sup&gt;+xz+x&lt;sup&gt;2&lt;/sup&gt;+y&lt;sup&gt;2&lt;/sup&gt;+yz+yx. This is the type consisting of a 4-tuple, with two holes, and with x left of the first hole, y between the first and second holes, and z to the right of the second hole. In other words, it's a trisection of a container. More generally, f[x&lt;sub&gt;0&lt;/sub&gt;,...,x&lt;sub&gt;n&lt;/sub&gt;] is the type corresponding to n holes and blocks of elements of type x&lt;sub&gt;i&lt;/sub&gt; between them. I've only shown this for the type x&lt;sup&gt;4&lt;/sup&gt; but it works for all the same types in &lt;a href="http://www.cs.nott.ac.uk/~ctm/CJ.pdf"&gt;Conor's paper&lt;/a&gt;. But there's a catch: I've used a definition involving subtraction, and that makes no sense for types. Don't worry, I'll address that later.&lt;br /&gt;&lt;br /&gt;We can give trisections and so on a computational interpretation like in Conor's paper. Dissections correspond to tail recursion elimination. They make explicit the state of a function traversing and transforming a recursive type. (I recommend Conor's description in the paper so I won't reproduce it here.) Trisections correspond to a pair of coroutines. The first is transforming a recursive type. The second is transforming the result of the first routine. The second one can (optionally) start consuming pieces of tree as soon as the first has started producing them. At any moment in time we have 3 pieces: (1) the part of tree that is as yet untouched, (2) the part that the first coroutine has produced, but which the second hasn't seen yet, and (3) the output from the second coroutine. Make that explicit, and you get a trisection. 
The deal's much the same for quadrisections and so on.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Rediscovering Newton&lt;/b&gt;&lt;br /&gt;A consequence of this is that we can now give a computational interpretation to much of the description of divided differences at &lt;a href="http://en.wikipedia.org/wiki/Divided_differences"&gt;Wikipedia&lt;/a&gt;. Among other things, when Conor derives a type theoretical analogue of the Taylor series, he's actually rediscovering a form of the &lt;a href="http://en.wikipedia.org/wiki/Newton_form"&gt;Newton polynomial&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Type Matrices&lt;/b&gt;&lt;br /&gt;One interesting property of divided differences is that they have a description in terms of &lt;a href="http://en.wikipedia.org/wiki/Divided_differences#Matrix_form"&gt;matrices&lt;/a&gt;. In particular, Wikipedia describes a homomorphism from the ring of functions of x to the ring of matrices whose elements are functions of n unknowns. We know that types don't form a ring, but they do form a semiring. So we can form an analogous semiring homomorphism from the set of container types to the set of matrices whose entries are *types*. Matrices of types aren't something we see every day. How can we put this notion to work?&lt;br /&gt;&lt;br /&gt;Let's use the notation from Wikipedia and consider the matrices we get when n = 1. (At this point I recommend reading all of the Wikipedia article on divided differences.) 
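&lt;br /&gt;&lt;br /&gt;Before building the type-level version, here's a small numeric sanity check of the homomorphism property for n = 1. It's only a sketch: the names &lt;tt&gt;M&lt;/tt&gt;, &lt;tt&gt;mul&lt;/tt&gt;, &lt;tt&gt;add&lt;/tt&gt; and &lt;tt&gt;t&lt;/tt&gt; are made up for this illustration and aren't part of this post's literate code, and it assumes exact &lt;tt&gt;Rational&lt;/tt&gt; arithmetic with x&lt;sub&gt;0&lt;/sub&gt; and x&lt;sub&gt;1&lt;/sub&gt; distinct:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;import Data.Ratio ((%))&lt;br /&gt;&lt;br /&gt;-- Upper-triangular 2x2 matrix [[a, b], [0, d]], stored as its entries a, b, d.&lt;br /&gt;data M = M Rational Rational Rational deriving (Eq, Show)&lt;br /&gt;&lt;br /&gt;-- Ordinary matrix product, specialised to upper-triangular matrices.&lt;br /&gt;mul :: M -&amp;gt; M -&amp;gt; M&lt;br /&gt;mul (M a b d) (M a' b' d') = M (a*a') (a*b' + b*d') (d*d')&lt;br /&gt;&lt;br /&gt;-- Entrywise matrix sum.&lt;br /&gt;add :: M -&amp;gt; M -&amp;gt; M&lt;br /&gt;add (M a b d) (M a' b' d') = M (a+a') (b+b') (d+d')&lt;br /&gt;&lt;br /&gt;-- T(f)(x0,x1), computed naively with subtraction and division (needs x0 /= x1).&lt;br /&gt;t :: (Rational -&amp;gt; Rational) -&amp;gt; Rational -&amp;gt; Rational -&amp;gt; M&lt;br /&gt;t f x0 x1 = M (f x0) ((f x1 - f x0) / (x1 - x0)) (f x1)&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = do&lt;br /&gt;  let f x = x*x + 1&lt;br /&gt;      g x = 3*x - 2&lt;br /&gt;      x0 = 1 % 2&lt;br /&gt;      x1 = 5 % 3&lt;br /&gt;  -- T(fg) = T(f)T(g) and T(f+g) = T(f)+T(g)&lt;br /&gt;  print (t (\x -&amp;gt; f x * g x) x0 x1 == t f x0 x1 `mul` t g x0 x1)&lt;br /&gt;  print (t (\x -&amp;gt; f x + g x) x0 x1 == t f x0 x1 `add` t g x0 x1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Both comparisons should come out &lt;tt&gt;True&lt;/tt&gt;, because the top-right entry of the product is f(x&lt;sub&gt;0&lt;/sub&gt;)g[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]+f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]g(x&lt;sub&gt;1&lt;/sub&gt;), which is the product rule for divided differences.&lt;br /&gt;&lt;br /&gt;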
Let's define ourselves a tree container:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data F a = Leaf a | Fork (F a) (F a)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We'll call the homomorphism T so that&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;colgroup span="1" width="40"&gt;&lt;colgroup span="2" width="100" bgcolor="#c0c0ff"&gt;&lt;TR&gt;&lt;TD&gt;T(f)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;)=&lt;/TD&gt;&lt;TD&gt;f(x&lt;sub&gt;0&lt;/sub&gt;)&lt;/TD&gt;&lt;TD&gt;f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]&lt;/TD&gt;&lt;/TR&gt;&lt;br /&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;f(x&lt;sub&gt;1&lt;/sub&gt;)&lt;/TD&gt;&lt;/TR&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;Our tree type satisfies the equation f(x)=x+f(x)&lt;sup&gt;2&lt;/sup&gt;. As we have a homomorphism, we also expect this to hold:&lt;br /&gt;&lt;blockquote&gt;T(f)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;) = T(i)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;)+T(f)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt; (Equation 1)&lt;br /&gt;&lt;/blockquote&gt;where i is the identity function i(x) = x.&lt;br /&gt;&lt;br /&gt;We can easily compute T(i)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;) directly using (y-x)/(y-x)=1. 
We get&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;colgroup span="1" width="40"&gt;&lt;colgroup span="2" width="100" bgcolor="#c0c0ff"&gt;&lt;TR&gt;&lt;TD&gt;T(i)(x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;)=&lt;/TD&gt;&lt;TD&gt;x&lt;sub&gt;0&lt;/sub&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;br /&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;x&lt;sub&gt;1&lt;/sub&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;Multiplying out the matrices in equation 1 now gives us a bunch of equations:&lt;br /&gt;&lt;blockquote&gt;f(x&lt;sub&gt;0&lt;/sub&gt;) = x&lt;sub&gt;0&lt;/sub&gt; + f(x&lt;sub&gt;0&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;f(x&lt;sub&gt;1&lt;/sub&gt;) = x&lt;sub&gt;1&lt;/sub&gt; + f(x&lt;sub&gt;1&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt;&lt;br /&gt;f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;] = 1 + f(x&lt;sub&gt;0&lt;/sub&gt;)f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]+f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]f(x&lt;sub&gt;1&lt;/sub&gt;)&lt;br /&gt;&lt;/blockquote&gt;The first two equations are just the definition of our tree. The third line, in Haskell, is:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data F' x0 x1 = Empty | ForkL (F x0) (F' x0 x1) | ForkR (F' x0 x1) (F x1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This is essentially the application of Conor's version of the Leibniz law to trees. &lt;tt&gt;F'&lt;/tt&gt; is the dissected tree. And note how we don't have any subtraction. By using the matrix formulation of divided differences we only have matrix multiplications and sums to deal with.&lt;br /&gt;&lt;br /&gt;The image of f(x) = x+f(x)&lt;sup&gt;2&lt;/sup&gt; under the homomorphism T yields the simultaneous definitions of f and its dissection. More generally, if we had chosen to work with larger n we'd get the simultaneous definition of trees, their dissections, their trisections and so on. 
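&lt;br /&gt;&lt;br /&gt;As an illustration of the next case up, here's a sketch of what the extra corner entry of the 3x3 matrix equation would give. Since i[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;] = 0, the top-right entry reads f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;] = f(x&lt;sub&gt;0&lt;/sub&gt;)f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;] + f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;]f[x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;] + f[x&lt;sub&gt;0&lt;/sub&gt;,x&lt;sub&gt;1&lt;/sub&gt;,x&lt;sub&gt;2&lt;/sub&gt;]f(x&lt;sub&gt;2&lt;/sub&gt;). All type and constructor names below are invented for this example and are kept separate from the literate code in this post:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;-- Hypothetical names; Tree and Dissected mirror the tree and its dissection.&lt;br /&gt;data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving Show&lt;br /&gt;&lt;br /&gt;-- f[x0,x1] = 1 + f(x0) f[x0,x1] + f[x0,x1] f(x1)&lt;br /&gt;data Dissected x0 x1&lt;br /&gt;  = Hole&lt;br /&gt;  | HoleInRight (Tree x0) (Dissected x0 x1)&lt;br /&gt;  | HoleInLeft (Dissected x0 x1) (Tree x1)&lt;br /&gt;  deriving Show&lt;br /&gt;&lt;br /&gt;-- f[x0,x1,x2] = f(x0) f[x0,x1,x2] + f[x0,x1] f[x1,x2] + f[x0,x1,x2] f(x2)&lt;br /&gt;data Trisected x0 x1 x2&lt;br /&gt;  = BothInRight (Tree x0) (Trisected x0 x1 x2)&lt;br /&gt;  | OneEachSide (Dissected x0 x1) (Dissected x1 x2)&lt;br /&gt;  | BothInLeft (Trisected x0 x1 x2) (Tree x2)&lt;br /&gt;  deriving Show&lt;br /&gt;&lt;br /&gt;-- A tree with a Char to the left of both holes and a Bool to the right of both.&lt;br /&gt;example :: Trisected Char () Bool&lt;br /&gt;example = BothInRight (Leaf 'a') (OneEachSide Hole (HoleInLeft Hole (Leaf True)))&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = print example&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Note that there's no base case in &lt;tt&gt;Trisected&lt;/tt&gt;: a single leaf can't hold two holes, which matches the zero in the corner of T(i). Again no subtraction is needed.&lt;br /&gt;&lt;br /&gt;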
And it'll work for any recursive container type.&lt;br /&gt;&lt;br /&gt;So matrices of types are a meaningful concept. They give a way to organise the mutually recursive definitions of higher order divided differences. If you look at this hard enough you may also see that the matrix I defined above is playing the same role as the &lt;tt&gt;D&lt;/tt&gt; type in my &lt;a href="http://blog.sigfpe.com/2010/07/automatic-divided-differences.html"&gt;previous article&lt;/a&gt;. With a little template Haskell I think we could in fact implement automatic (higher order) divided differences at the type level.&lt;br /&gt;&lt;br /&gt;But this is all just scratching the surface. The matrix and homomorphism defined above don't just apply to divided differences. Matrices of types have another deeper and surprising interpretation that will allow me to unify just about everything I've ever said on automatic differentiation, divided differences, and derivatives of types as well as solve a wide class of problems relating to building data types with certain constraints on them. I'll leave that for my next article.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Exercise&lt;/b&gt;&lt;br /&gt;In preparation for the next installment, here's a problem to think about: consider the tree type above. We can easily build trees whose elements are of type A or of type B. We just need f(A+B). We can scan this tree from left to right building a list of elements of type A+B, ie. whose types are each either A or B. How can we redefine the tree so that the compiler enforces the constraint that at no point in the list, the types of four elements in a row spell the word BABA? Start with a simpler problem, like enforcing the constraint that AA never appears.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Apology&lt;/b&gt;&lt;br /&gt;Sorry about the layout above. My literate Haskell-&amp;gt;HTML program doesn't support tables yet so there was a lot of manual HTML. 
This meant that I didn't write stuff out as fully as I could have and it may be a bit sketchy in places. I may have to switch to PDF for my next post. (Or use Wordpress, but I had trouble there too. Or I could use one of the Javascript TeX renderers, but I don't like the external dependency.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6122287169525245825?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6122287169525245825/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6122287169525245825' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6122287169525245825'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6122287169525245825'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/08/divided-differences-and-tomography-of.html' title='Divided Differences and the Tomography of Types'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8237604777329296040</id><published>2010-07-31T09:28:00.000-07:00</published><updated>2010-08-04T13:26:14.447-07:00</updated><title type='text'>Automatic Divided Differences</title><content type='html'>&lt;b&gt;Divided Differences&lt;/b&gt;&lt;br /&gt;I've &lt;a href="http://blog.sigfpe.com/2006/09/practical-synthetic-differential.html"&gt;previously&lt;/a&gt; talked 
about automatic differentiation here a few times. One of the standard arguments for using automatic differentiation is that it is more accurate than numeric differentiation implemented via &lt;a href="http://en.wikipedia.org/wiki/Divided_differences"&gt;divided differences&lt;/a&gt;. We can approximate f'(x) by using (f(x)-f(y))/(x-y) with a value of y near x. Accuracy requires y to be close to x, and that requires computing the difference between two numbers that are very close. But subtracting close numbers is itself a source of numerical error when working with finite precision. So you're doomed to error no matter how close you choose x and y to be.&lt;br /&gt;&lt;br /&gt;However, the accuracy problem with computing divided differences can itself be fixed. In fact, we can adapt the methods behind automatic differentiation to work with divided differences too.&lt;br /&gt;&lt;br /&gt;(This paragraph can be skipped. I just want to draw a parallel with what I said &lt;a href="http://blog.sigfpe.com/2009/09/finite-differences-of-types.html"&gt;here&lt;/a&gt;. Firstly I need to correct the title of that article. I should have said it was about *divided differences*, not *finite differences*. The idea in that article was that the notion of a divided difference makes sense for types because for a large class of functions you can define divided differences without using either differencing or division. You just need addition and multiplication. That's the same technique I'll be using here. 
I think it's neat to see the same trick being used in entirely different contexts.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Direct Approach&lt;/b&gt;&lt;br /&gt;Here's a first attempt at divided differencing:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; diff0 f x y = (f x - f y)/(x - y)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;We can try it on the function &lt;tt&gt;f&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f x = (3*x+1/x)/(x-2/x)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;tt&gt;diff0 f 1 1.000001&lt;/tt&gt; gives -14.0000350000029. Repeating the calculation with an arbitrary precision package (I used &lt;a href="http://darcs.augustsson.net/Darcs/CReal/"&gt;CReal&lt;/a&gt;) gives -14.000035000084000. We are getting nowhere near the precision we'd like when working with double precision floating point.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Indirect Approach&lt;/b&gt;&lt;br /&gt;Automatic differentiation used a bunch of properties of differentiation: linearity, the product rule and the chain rule. Similar rules hold for divided differences. First let me introduce some notation. If f is a function then I'll use f(x) for normal function application. But I'll use f[x,y] to mean the divided difference (f(x)-f(y))/(x-y). 
We have&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;(f+g)[x,y] = f[x,y]+g[x,y]&lt;br /&gt;(fg)[x,y] = f(x)g[x,y]+f[x,y]g(y)&lt;br /&gt;h[x,y] = f[g(x),g(y)]g[x,y] when h(x)=f(g(x))&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;We can modify the product rule to make it more symmetrical though it's not strictly necessary:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;(fg)[x,y] = 0.5(f(x)+f(y))g[x,y]+0.5f[x,y] (g(x)+g(y))&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;(I got that from &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.9483"&gt;this&lt;/a&gt; paper by Kahan.)&lt;br /&gt;&lt;br /&gt;In each case, given f evaluated at x and y, and its divided difference at [x, y], and the same for g, we can compute the corresponding quantities for the sum and product of f and g. So we can store f(x), f(y) and f[x,y] together in a single structure:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data D a = D { fx :: a, fy :: a, fxy :: a } deriving (Eq, Show, Ord)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;And now we can implement arithmetic on these structures using the rules above:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional a =&amp;gt; Num (D a) where&lt;br /&gt;&amp;gt;    fromInteger n = let m = fromInteger n in D m m 0&lt;br /&gt;&amp;gt;    D fx fy fxy + D gx gy gxy = D (fx+gx) (fy+gy) (fxy+gxy)&lt;br /&gt;&amp;gt;    D fx fy fxy * D gx gy gxy = D (fx*gx) (fy*gy) (0.5*(fxy*(gx+gy) + (fx+fy)*gxy))&lt;br /&gt;&amp;gt;    negate (D fx fy fxy) = D (negate fx) (negate fy) (negate fxy)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;I'll leave as an exercise the proof that this formula for division works:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional a =&amp;gt; Fractional (D a) where&lt;br /&gt;&amp;gt;    fromRational n = let m = fromRational n in D m m 0&lt;br /&gt;&amp;gt;    D fx fy fxy / D gx gy gxy = D (fx/gx) (fy/gy) (0.5*(fxy*(gx+gy) - (fx+fy)*gxy)/(gx*gy))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;For the identity function, i, we have i(x)=x, i(y)=y and 
i[x,y]=1. So for any x and y, the evaluation of the identity function at x, y and [x,y] is represented as &lt;tt&gt;D x y 1&lt;/tt&gt;. To compute divided differences for any function f making use of addition, subtraction, multiplication and division we simply need to apply &lt;tt&gt;f&lt;/tt&gt; to &lt;tt&gt;D x y 1&lt;/tt&gt;. We pick off the divided difference from the &lt;tt&gt;fxy&lt;/tt&gt; element of the structure. Here's our replacement for &lt;tt&gt;diff0&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; diff1 f x y = fxy $ f (D x y 1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;This is all mimicking the construction for automatic differentiation.&lt;br /&gt;&lt;br /&gt;Evaluating &lt;tt&gt;diff1 f 1 1.000001&lt;/tt&gt; gives -14.000035000083997. Much closer to the result derived using &lt;tt&gt;CReal&lt;/tt&gt;. One neat thing about this is that we have a function that's well defined even in the limit as x tends to y. When we evaluate &lt;tt&gt;diff1 f 1 1&lt;/tt&gt; we get the derivative of f at 1.&lt;br /&gt;&lt;br /&gt;I thought that this was a novel approach but I found it sketched at the end of &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.1665"&gt;this&lt;/a&gt; paper by Reps and Rall. (Though their sketch is a bit vague so it's not entirely clear what they intend.)&lt;br /&gt;&lt;br /&gt;Both the &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.9483"&gt;Kahan paper&lt;/a&gt; and the Reps and Rall paper give some applications of computing divided differences this way.&lt;br /&gt;&lt;br /&gt;It's not clear how to deal with the standard transcendental functions. They have divided differences that are very complex compared to their derivatives.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Aside&lt;/b&gt;&lt;br /&gt;There is a sense in which divided differences are uncomputable(!) and that what we've had to do is switch from an extensional description of functions to an intensional description to compute them. 
I'll write about this some day.&lt;br /&gt;&lt;br /&gt;Note that the ideas here can be extended to higher order divided differences and that there are some really nice connections with type theory. I'll try to write about these too.&lt;br /&gt;&lt;br /&gt;Update: I found &lt;a href="http://www.cs.wisc.edu/wpis/abstracts/tr1415r.abs.html"&gt;another paper&lt;/a&gt; by Reps and Rall that uses precisely the method described here.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8237604777329296040?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8237604777329296040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8237604777329296040' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8237604777329296040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8237604777329296040'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/07/automatic-divided-differences.html' title='Automatic Divided Differences'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-8201342159792013333</id><published>2010-07-03T19:14:00.000-07:00</published><updated>2010-07-04T08:10:58.886-07:00</updated><title type='text'>Death to Hydrae (or the operational semantics of ordinals)</title><content type='html'>&lt;b&gt;Unprovable 
Propositions&lt;/b&gt;&lt;br /&gt;Among other things, &lt;a href="http://en.wikipedia.org/wiki/Godel's_incompleteness_theorems"&gt;Godel's first incompleteness theorem&lt;/a&gt; allows us to construct a statement in the language of &lt;a href="http://en.wikipedia.org/wiki/Peano_axioms"&gt;Peano arithmetic&lt;/a&gt; that can't be proved using the axioms of Peano arithmetic. Unfortunately, this statement is a highly contrived proposition whose sole purpose is to be unprovable. People who learn of Godel's theorems often ask if there are other more natural and uncontrived mathematical statements that can't be proved from the Peano axioms.&lt;br /&gt;&lt;br /&gt;My goal in this post will be to describe one of these propositions. Not just uncontrived, but actually very useful. I only intend to tell half of the story here because I feel like there are many good treatments already out there that tell the rest. I'm just going to get to the point where I can state the unprovable proposition, and then sketch how it can be proved if you allow yourself a little Set Theory.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# OPTIONS_GHC -fno-warn-missing-methods #-}&lt;br /&gt;&amp;gt; import Prelude hiding ((^))&lt;br /&gt;&amp;gt; infixr 8 ^&lt;br /&gt;&amp;gt; type Natural = Integer&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Termination&lt;/b&gt;&lt;br /&gt;Suppose we implement a function to compute the Fibonacci numbers like so:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; fib 0 = 0&lt;br /&gt;&amp;gt; fib 1 = 1&lt;br /&gt;&amp;gt; fib n = fib (n-2) + fib (n-1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;How do we know that &lt;tt&gt;fib&lt;/tt&gt; terminates for all natural number arguments? One approach is this: if we pass in the argument &lt;tt&gt;n&lt;/tt&gt; it clearly never recurses more than n levels. Each time it recurses it calls itself at most twice. 
So it must terminate in O(2&lt;sup&gt;n&lt;/sup&gt;) steps (assuming that the primitive operations such as addition take constant time). We can think of this code in a kind of imperative way. It's a bit like &lt;tt&gt;n&lt;/tt&gt; nested loops, each loop going round up to two times.&lt;br /&gt;&lt;br /&gt;Suppose instead that we have some kind of recursive function &lt;tt&gt;g&lt;/tt&gt; that goes &lt;tt&gt;n&lt;/tt&gt; levels deep but for which the number of calls of &lt;tt&gt;g&lt;/tt&gt; to itself is no longer two. In fact, suppose the number of self-calls is very large. Even worse, suppose that each time &lt;tt&gt;g&lt;/tt&gt; is called, it calls itself many more times than it did previously, maybe keeping track of this ever growing number through a global variable. Or instead of a global variable, maybe an evil demon decides how many times &lt;tt&gt;g&lt;/tt&gt; calls itself at each stage. Can you still be sure of termination?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Simple Machine&lt;/b&gt;&lt;br /&gt;In order to look at this question, we'll strip a computer right down to the bare minimum. It will have an input (that the evil demon could use) for natural numbers and will output only one symbol.  Here's a design for such a machine:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; data Machine = Done | Output Machine | Input (Natural -&amp;gt; Machine)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;A value of type &lt;tt&gt;Machine&lt;/tt&gt; represents the state of the machine. &lt;tt&gt;Done&lt;/tt&gt; means it has finished running. &lt;tt&gt;Output s&lt;/tt&gt; means output a symbol and continue in state &lt;tt&gt;s&lt;/tt&gt;. &lt;tt&gt;Input f&lt;/tt&gt; means stop to input a number from the demon (or elsewhere), call it &lt;tt&gt;i&lt;/tt&gt;, and then continue from state &lt;tt&gt;f i&lt;/tt&gt;. 
This is very much in the style discussed by apfelmus and me in &lt;a href="http://apfelmus.nfshost.com/articles/operational-monad.html"&gt;recent&lt;/a&gt; &lt;a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html"&gt;blog&lt;/a&gt; posts.&lt;br /&gt;&lt;br /&gt;Here's an interpreter for one of these machines:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; run1 Done       = return ()&lt;br /&gt;&amp;gt; run1 (Output x) = print "*" &amp;gt;&amp;gt; run1 x&lt;br /&gt;&amp;gt; run1 (Input f)  = readLn &amp;gt;&amp;gt;= (run1 . f)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;For any &lt;tt&gt;n&lt;/tt&gt; we can easily build a machine to output &lt;tt&gt;n&lt;/tt&gt; stars. This is such a natural machine to want to build it seems only right to give it the name &lt;tt&gt;n&lt;/tt&gt;. If we want to do this then we need to make &lt;tt&gt;Machine&lt;/tt&gt; an instance of &lt;tt&gt;Num&lt;/tt&gt; and define &lt;tt&gt;fromInteger&lt;/tt&gt; for it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Num Machine where&lt;br /&gt;&amp;gt;    fromInteger 0 = Done&lt;br /&gt;&amp;gt;    fromInteger n = Output (fromInteger (n-1))&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Typing &lt;tt&gt;run1 8&lt;/tt&gt;, say, will output 8 stars.&lt;br /&gt;&lt;br /&gt;Now given two of these machines there is a natural notion of adding them. &lt;tt&gt;a + b&lt;/tt&gt; is the machine that does everything &lt;tt&gt;b&lt;/tt&gt; does followed by everything &lt;tt&gt;a&lt;/tt&gt; does. (Remember, that's &lt;tt&gt;b&lt;/tt&gt; then &lt;tt&gt;a&lt;/tt&gt;.) To do this we need to dig into &lt;tt&gt;b&lt;/tt&gt; and replace every occurrence of &lt;tt&gt;Done&lt;/tt&gt; in it with &lt;tt&gt;a&lt;/tt&gt;. That way, instead of finishing like &lt;tt&gt;b&lt;/tt&gt;, it leads directly into &lt;tt&gt;a&lt;/tt&gt;. 
In the case of &lt;tt&gt;a + Input f&lt;/tt&gt;, for each number &lt;tt&gt;i&lt;/tt&gt; we need to dig into &lt;tt&gt;f i&lt;/tt&gt; replacing each &lt;tt&gt;Done&lt;/tt&gt; with &lt;tt&gt;a&lt;/tt&gt;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt;    a + Done     = a&lt;br /&gt;&amp;gt;    a + Output b = Output (a + b)&lt;br /&gt;&amp;gt;    a + Input f  = Input (\i -&amp;gt; a + f i)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There's a natural way to multiply these machines too. The idea is that in &lt;tt&gt;a * b&lt;/tt&gt; we run machine &lt;tt&gt;b&lt;/tt&gt;. But each time the &lt;tt&gt;Output&lt;/tt&gt; command is run, instead of printing a star it executes &lt;tt&gt;a&lt;/tt&gt;. You can think of this as a control structure. If &lt;tt&gt;n&lt;/tt&gt; is a natural number then &lt;tt&gt;a * n&lt;/tt&gt; means running machine &lt;tt&gt;a&lt;/tt&gt; &lt;tt&gt;n&lt;/tt&gt; times. In the case of &lt;tt&gt;a * Input f&lt;/tt&gt;, instead of multiplying by a fixed natural number, we get an input from the user and multiply by &lt;tt&gt;f i&lt;/tt&gt; instead:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt;    _ * Done     = Done&lt;br /&gt;&amp;gt;    a * Output b = a*b + a&lt;br /&gt;&amp;gt;    a * Input f  = Input (\i -&amp;gt; a * f i)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can make a machine to input a number and then output that many stars. Here it is:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; w = Input fromInteger&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Try running &lt;tt&gt;run1 w&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Can you guess what the machine &lt;tt&gt;w * w&lt;/tt&gt; does? Your first guess might be that it inputs two numbers and outputs as many stars as the product of the two numbers. Try it. What actually happens is that we're computing &lt;tt&gt;w * Input fromInteger&lt;/tt&gt;. Immediately from the definition of &lt;tt&gt;*&lt;/tt&gt; we get &lt;tt&gt;Input (\i -&amp;gt; w*i)&lt;/tt&gt;. 
In other words, the first input gives us an input &lt;tt&gt;i&lt;/tt&gt;, and then &lt;tt&gt;w&lt;/tt&gt; is run &lt;tt&gt;i&lt;/tt&gt; times. So if we initially input &lt;tt&gt;i&lt;/tt&gt;, we are then asked for &lt;tt&gt;i&lt;/tt&gt; more inputs and after each input, the corresponding number of stars is output. Although the original expression contains just two occurrences of &lt;tt&gt;w&lt;/tt&gt;, we are required to enter &lt;tt&gt;i+1&lt;/tt&gt; numbers.&lt;br /&gt;&lt;br /&gt;Given the definitions of &lt;tt&gt;+&lt;/tt&gt; and &lt;tt&gt;*&lt;/tt&gt; it seems natural to define the power operation too:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; (^) :: Machine -&amp;gt; Machine -&amp;gt; Machine&lt;br /&gt;&amp;gt; a ^ Done     = Output Done&lt;br /&gt;&amp;gt; a ^ Output b = a^b * a&lt;br /&gt;&amp;gt; a ^ Input f  = Input (\i -&amp;gt; a ^ f i)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The power operation corresponds to the nesting of loops. So, for example, &lt;tt&gt;w ^ n&lt;/tt&gt; can be thought of as loops nested &lt;tt&gt;n&lt;/tt&gt; deep.&lt;br /&gt;&lt;br /&gt;Try working out what &lt;tt&gt;w ^ w&lt;/tt&gt; does when executed with &lt;tt&gt;run1&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Consider the set M of all machines built using just a finite number of applications of the three operators &lt;tt&gt;+&lt;/tt&gt;, &lt;tt&gt;*&lt;/tt&gt; and &lt;tt&gt;^&lt;/tt&gt; to &lt;tt&gt;w&lt;/tt&gt; and the non-zero naturals. (The non-zero condition means we exclude machines like 0*w that accept an input and do nothing with it.) Any such expression can be written as &lt;tt&gt;f w&lt;/tt&gt;, where the definition of &lt;tt&gt;f&lt;/tt&gt; makes no mention of &lt;tt&gt;w&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Suppose we use &lt;tt&gt;run1&lt;/tt&gt; and we always enter the same natural &lt;tt&gt;n&lt;/tt&gt;. Then each occurrence of &lt;tt&gt;w&lt;/tt&gt; acts like &lt;tt&gt;n&lt;/tt&gt;. 
So if we start with some expression in &lt;tt&gt;w&lt;/tt&gt;, say &lt;tt&gt;f w&lt;/tt&gt;, then always inputting &lt;tt&gt;n&lt;/tt&gt; results in &lt;tt&gt;f n&lt;/tt&gt; stars. We could test this with &lt;tt&gt;run1 (w^w^w^w)&lt;/tt&gt;, always entering 2, but it would require a lot of typing. Instead we can write another interpreter that consumes its inputs from a list rather than from the user (or demon). And instead of printing stars it simply prints out the total number of stars at the end:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; run2 Done    _        = 0&lt;br /&gt;&amp;gt; run2 (Output x)   as  = 1 + run2 x as&lt;br /&gt;&amp;gt; run2 (Input f) (a:as) = run2 (f a) as&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now you can try &lt;tt&gt;run2 (w^w^w^w) [2,2..]&lt;/tt&gt; and see that we (eventually) get 2&lt;sup&gt;2&lt;sup&gt;2&lt;sup&gt;2&lt;/sup&gt;&lt;/sup&gt;&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Termination Again&lt;/b&gt;&lt;br /&gt;If we run a machine in M there's a pattern that occurs again and again. We input a number, and then as a result we go into a loop requesting more numbers. These inputs may in turn request more inputs. Like the mythological &lt;a href="http://www.youtube.com/watch?v=Ow-dkIuIaz8"&gt;hydra&lt;/a&gt;, every input we give may spawn many more requests for inputs. As the number of inputs required may depend on our previous inputs, and we may input numbers as large as we like, these machines may run for a long time. Suppose our machine terminates after requesting &lt;tt&gt;n&lt;/tt&gt; inputs. Then there must be some highest number that we entered. Call it &lt;tt&gt;m&lt;/tt&gt;. Then if the original machine was &lt;tt&gt;f w&lt;/tt&gt; (with &lt;tt&gt;f&lt;/tt&gt; defined in terms of the 3 operators and non-zero naturals), the machine must have terminated outputting no more than &lt;tt&gt;f m&lt;/tt&gt; stars. 
So if our machine terminates, we can bound how many steps it took.&lt;br /&gt;&lt;br /&gt;But do our machines always terminate? The input we give to the machine might not be bounded. If we run &lt;tt&gt;run2 (w^w) [4,5..]&lt;/tt&gt;, say, the inputs grow and grow. If these inputs grow faster than we can chop off the heads of our hydra, we might never reach termination.&lt;br /&gt;&lt;br /&gt;Consider a program to input &lt;tt&gt;n&lt;/tt&gt; and then output &lt;tt&gt;fib n&lt;/tt&gt;. It accepts an input, recurses to a depth of at most &lt;tt&gt;n&lt;/tt&gt;, and calls itself at most twice in each recursion. Compare with the machine &lt;tt&gt;2 ^ w&lt;/tt&gt;. This accepts an input &lt;tt&gt;n&lt;/tt&gt;, recurses to a depth &lt;tt&gt;n&lt;/tt&gt;, calling itself exactly twice each time. So if &lt;tt&gt;2 ^ w&lt;/tt&gt; terminates, so does &lt;tt&gt;fib&lt;/tt&gt;. The more complex example above where I introduced the evil demon will terminate if &lt;tt&gt;w ^ w&lt;/tt&gt; does, as long as the demon doesn't stop inputting numbers. So if we can show in one proof that every machine of type &lt;tt&gt;Machine&lt;/tt&gt; terminates, then there are many programs whose termination we could easily prove.&lt;br /&gt;&lt;br /&gt;Let's consider an example like &lt;tt&gt;run1 (w ^ w)&lt;/tt&gt; with inputs 2, 3, 4, ...&lt;br /&gt;&lt;br /&gt;We start with &lt;tt&gt;w ^ w&lt;/tt&gt;. Examining the definition of the operator &lt;tt&gt;^&lt;/tt&gt; we see that this proceeds by requesting an input. The first input is 2. Now we're left with &lt;tt&gt;w ^ 2&lt;/tt&gt;. This is &lt;tt&gt;w * w&lt;/tt&gt;. Again it accepts an input. This time 3. Now we go to state &lt;tt&gt;w * 3&lt;/tt&gt;. This is &lt;tt&gt;w*2 + w&lt;/tt&gt;. Again we accept an input. This time 4. We are led to &lt;tt&gt;w*2 + 4&lt;/tt&gt;. This now outputs 4 stars and we are left with &lt;tt&gt;w * 2&lt;/tt&gt; which is &lt;tt&gt;w + w&lt;/tt&gt;. 
We accept an input 5, output 5 stars and are left with &lt;tt&gt;w&lt;/tt&gt;. After a further input of 6, it outputs 6 stars and terminates. Or we could just run &lt;tt&gt;run2 (w ^ w) [2,3..]&lt;/tt&gt; and get 15 (= 4+5+6) as output.&lt;br /&gt;&lt;br /&gt;The transitions are:&lt;br /&gt;&lt;tt&gt;w ^ w&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w ^ 2&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w * w&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w * 3&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w*2 + w&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w*2 + 4&lt;/tt&gt; -&amp;gt; ... -&amp;gt; &lt;tt&gt;w * 2&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w + w&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;w + 5&lt;/tt&gt; -&amp;gt; ... -&amp;gt; &lt;tt&gt;w&lt;/tt&gt;&lt;br /&gt;-&amp;gt; &lt;tt&gt;6&lt;/tt&gt; -&amp;gt; ... -&amp;gt; &lt;tt&gt;0&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;Now for some Set Theory. Rewrite the above sequence using the transfinite ordinal &amp;omega; instead of &lt;tt&gt;w&lt;/tt&gt;. The sequence becomes a sequence of ordinals. Any time we accept an input, the rightmost &amp;omega; becomes a finite ordinal. So we have a descending sequence of ordinals. This is true whatever ordinal we start with. The execution of either &lt;tt&gt;Input&lt;/tt&gt; or &lt;tt&gt;Output&lt;/tt&gt; always strictly decreases our ordinal, and any strictly descending sequence of ordinals &lt;a href="http://en.wikipedia.org/wiki/Well-order"&gt;must eventually terminate&lt;/a&gt;. Therefore every machine in M eventually terminates.&lt;br /&gt;&lt;br /&gt;But here's the important fact: to show termination we used the ordinal &amp;omega;, and this required the &lt;a href="http://en.wikipedia.org/wiki/Axiom_of_infinity"&gt;axiom of infinity&lt;/a&gt; and some Set Theory. Instead we could encode the termination question, via G&amp;ouml;del numbering, as a proposition of Peano arithmetic. If we do this, then we run up against an amazing fact. It can't be proved using the axioms of Peano arithmetic. 
So we have here a useful fact, not a contrived self-referential one, that can't be proved with Peano arithmetic.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why can't it be proved using just the Peano axioms?&lt;/b&gt;&lt;br /&gt;A few years back, Jim Apple made a post about constructing (some) &lt;a href="http://blog.jbapple.com/2007/02/countable-ordinals-in-haskell.html"&gt;countable ordinals in Haskell&lt;/a&gt;. His construction nicely reflects the definitions a set theorist might make, but the code doesn't actually do anything. Later I learned from &lt;a href="http://lambda-the-ultimate.org/node/3235"&gt;Hyland and Power&lt;/a&gt; how you can interpret algebraic structures as computational effects. &lt;a href="http://apfelmus.nfshost.com/articles/operational-monad.html"&gt;apfelmus&lt;/a&gt; illustrates nicely how an abstract datatype can be made to do things with the help of an interpreter. Roughly speaking, doing this is what is known as &lt;a href="http://en.wikipedia.org/wiki/Operational_semantics"&gt;operational semantics&lt;/a&gt;. So I thought, why not apply this approach to the algebraic rules for defining and combining ordinals? The results are the interpreters &lt;tt&gt;run1&lt;/tt&gt; and &lt;tt&gt;run2&lt;/tt&gt; above.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;run1&lt;/tt&gt; gives an example of a &lt;a href="http://math.andrej.com/2008/02/02/the-hydra-game/"&gt;Hydra game&lt;/a&gt;. In fact, it's precisely the hydra game described in &lt;a href="http://www.few.vu.nl/~ariya/pub/WoLLIC07hydra.pdf"&gt;this&lt;/a&gt; paper because it always chops off the rightmost head. The &lt;a href="http://en.wikipedia.org/wiki/Goodstein's_theorem"&gt;Kirby-Paris&lt;/a&gt; theorem tells us we can't prove this game terminates using just the Peano axioms. 
A web search on &lt;a href="http://en.wikipedia.org/wiki/Goodstein's_theorem"&gt;Goodstein's theorem&lt;/a&gt; will reveal many great articles with the details.&lt;br /&gt;&lt;br /&gt;A well-ordered quantity that you can keep decreasing as a program runs, and that can be used to prove termination, is an example of a &lt;a href="http://en.wikipedia.org/wiki/Loop_variant"&gt;loop variant&lt;/a&gt;. Loop variants are often natural numbers but the above shows that transfinite ordinals make fine loop variants. But in the interest of being fair and balanced, &lt;a href="http://archive.eiffel.com/doc/faq/variant.html"&gt;here&lt;/a&gt;'s a dissenting view. The author has a point. If you are forced to use transfinite ordinals to show your program terminates, the age of the universe will probably be but the briefest flicker compared to your program's execution. On the other hand, if you don't want an actual bound on the execution time, ordinals can provide very short proofs of termination for useful programs.&lt;br /&gt;&lt;br /&gt;(By the way, this article is a sequel to my own &lt;a href="http://blog.sigfpe.com/2008/10/whats-use-of-transfinite-ordinal.html"&gt;article&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Exercise&lt;/b&gt;&lt;br /&gt;Algebraic structures give rise to monads. Can you see how to generalise the definition of &lt;tt&gt;Machine&lt;/tt&gt; to make it a monad? If you pick the right definition then the substitution of &lt;tt&gt;a&lt;/tt&gt; for &lt;tt&gt;Done&lt;/tt&gt; in the definition of &lt;tt&gt;+&lt;/tt&gt; should give you a particularly simple definition of ordinal addition. 
(See &lt;a href="http://blog.sigfpe.com/2006/11/variable-substitution-gives.html"&gt;this&lt;/a&gt; for a hint on how substitution works in monads.)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; instance Show Machine&lt;br /&gt;&amp;gt; instance Eq Machine&lt;br /&gt;&amp;gt; instance Ord Machine&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-8201342159792013333?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/8201342159792013333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=8201342159792013333' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8201342159792013333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/8201342159792013333'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/07/death-to-hydrae-or-denotational.html' title='Death to Hydrae (or the operational semantics of ordinals)'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6558089432442147035</id><published>2010-05-29T10:08:00.001-07:00</published><updated>2011-05-01T09:23:28.158-07:00</updated><title type='text'>Constructing Intermediate Values</title><content type='html'>&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;The &lt;a 
href="http://en.wikipedia.org/wiki/Intermediate_value_theorem"&gt;Intermediate Value Theorem&lt;/a&gt;, as it is usually told, tells us that if we have a continuous real-valued function f on the closed interval [0,1], with f(0)&amp;lt;0 and f(1)&amp;gt;0 then there is a point x in (0,1) such that f(x)=0. Here's a sketch of a proof:&lt;br /&gt;&lt;br /&gt;Define three sequences l&lt;sub&gt;i&lt;/sub&gt;, u&lt;sub&gt;i&lt;/sub&gt;, x&lt;sub&gt;i&lt;/sub&gt; recursively as follows:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;x&lt;sub&gt;i+1&lt;/sub&gt; = (l&lt;sub&gt;i&lt;/sub&gt;+u&lt;sub&gt;i&lt;/sub&gt;)/2&lt;br /&gt;(l&lt;sub&gt;i+1&lt;/sub&gt;, u&lt;sub&gt;i+1&lt;/sub&gt;) = (l&lt;sub&gt;i&lt;/sub&gt;, x&lt;sub&gt;i+1&lt;/sub&gt;) if f(x&lt;sub&gt;i+1&lt;/sub&gt;)&amp;gt;0&lt;br /&gt;(l&lt;sub&gt;i+1&lt;/sub&gt;, u&lt;sub&gt;i+1&lt;/sub&gt;) = (x&lt;sub&gt;i+1&lt;/sub&gt;, u&lt;sub&gt;i&lt;/sub&gt;) otherwise&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Set l&lt;sub&gt;0&lt;/sub&gt;=0, u&lt;sub&gt;0&lt;/sub&gt;=1.&lt;br /&gt;The x&lt;sub&gt;i+1&lt;/sub&gt; are bracketed by the l&lt;sub&gt;i&lt;/sub&gt; and u&lt;sub&gt;i&lt;/sub&gt;, and |u&lt;sub&gt;i&lt;/sub&gt;-l&lt;sub&gt;i&lt;/sub&gt;|=2&lt;sup&gt;-i&lt;/sup&gt; so the x&lt;sub&gt;i&lt;/sub&gt; form a Cauchy sequence with some limit x. Because f is continuous, we get f(x)=0.&lt;br /&gt;&lt;br /&gt;This proof not only shows that we have an intermediate value x where f crosses the x-axis, it also seems to describe a procedure for computing x. 
Let's use this strategy to write a numerical method.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Numerical Method&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; import Data.Ratio&lt;br /&gt;&amp;gt; infixl 6 .*&lt;br /&gt;&lt;br /&gt;&amp;gt; ivt0 :: (Float -&amp;gt; Float) -&amp;gt; Float&lt;br /&gt;&amp;gt; ivt0 f = ivt' 0 1 where&lt;br /&gt;&amp;gt;    ivt' l u =&lt;br /&gt;&amp;gt;          let z = 0.5*(l+u)&lt;br /&gt;&amp;gt;          in if z==l || z==u || f z == 0&lt;br /&gt;&amp;gt;               then z&lt;br /&gt;&amp;gt;               else if f z &amp;lt; 0&lt;br /&gt;&amp;gt;                   then ivt' z u&lt;br /&gt;&amp;gt;                   else ivt' l z&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here are some simple functions we can use to test it:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f0, g0 :: Float -&amp;gt; Float&lt;br /&gt;&amp;gt; f0 x = 3*x-1&lt;br /&gt;&amp;gt; g0 x = 2*x-1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;tt&gt;ivt0 f0&lt;/tt&gt; and &lt;tt&gt;ivt0 g0&lt;/tt&gt; give the expected results.&lt;br /&gt;&lt;br /&gt;Of course, it might not actually be the case that &lt;tt&gt;f (ivt0 f) == 0&lt;/tt&gt;. We're dealing with the type &lt;tt&gt;Float&lt;/tt&gt; here and there might not actually be a solution to &lt;tt&gt;f x == 0&lt;/tt&gt; to the precision of a &lt;tt&gt;Float&lt;/tt&gt;. And it's not clear that the word continuous means anything here anyway because the limited precision of a &lt;tt&gt;Float&lt;/tt&gt; precludes us from constructing the deltas and epsilons the definition of continuity demands.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Switching to Exact Real Arithmetic&lt;/b&gt;&lt;br /&gt;So can we do better? Can we implement a function to compute intermediate values for exact real arithmetic? I &lt;a href="http://blog.sigfpe.com/2010/04/on-representing-some-real-numbers.html"&gt;recently&lt;/a&gt; talked about computable real numbers, and they are what I intend to use here. 
I'll represent the real number x using a sequence of rationals, x&lt;sub&gt;i&lt;/sub&gt;, such that |x&lt;sub&gt;i&lt;/sub&gt;-x|&amp;lt;2&lt;sup&gt;-i&lt;/sup&gt;. Here's a suitable type:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; data R = R { runR :: Integer -&amp;gt; Rational }&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(I'll only be calling it for non-negative arguments, but alas, Haskell has no natural number type.)&lt;br /&gt;&lt;br /&gt;We'll need to display our numbers. Here's something cheap and cheerful that's good enough for this article:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Show R where&lt;br /&gt;&amp;gt;    show (R f) = show (fromRational (f 30) :: Float)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It displays a &lt;tt&gt;Float&lt;/tt&gt; that is formed from a rational approximation that is within 2&lt;sup&gt;-30&lt;/sup&gt; of our value.&lt;br /&gt;&lt;br /&gt;We can inject the rationals into the computable reals by using constant sequences:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Fractional R where&lt;br /&gt;&amp;gt;    fromRational x = R $ const $ fromRational x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;An important point here is that two different sequences might define the same real. So when we construct functions mapping reals to reals we must be sure that they take different representations of the same real to representations of the same real. I'll call these 'well-defined'.&lt;br /&gt;&lt;br /&gt;Now our problems begin. Haskell wants us to implement &lt;tt&gt;Eq&lt;/tt&gt; before &lt;tt&gt;Num&lt;/tt&gt;, but equality of computable reals is &lt;a href="http://blog.sigfpe.com/2010/04/on-representing-some-real-numbers.html"&gt;not decidable&lt;/a&gt;. Still, it is semi-decidable in that if two reals differ, we can detect this. 
Here's a partial equality function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Eq R where&lt;br /&gt;&amp;gt;    R f == R g = eq' 2 0 f g where&lt;br /&gt;&amp;gt;        eq' delta i f g = &lt;br /&gt;&amp;gt;               let fi = f i&lt;br /&gt;&amp;gt;                   gi = g i&lt;br /&gt;&amp;gt;               in if fi&amp;lt;gi-delta&lt;br /&gt;&amp;gt;                   then False&lt;br /&gt;&amp;gt;                   else if fi&amp;gt;gi+delta&lt;br /&gt;&amp;gt;                       then False&lt;br /&gt;&amp;gt;                       else eq' (delta/2) (i+1) f g&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The idea is that if &lt;tt&gt;f i&lt;/tt&gt; and &lt;tt&gt;g i&lt;/tt&gt; can be separated by more than 2*2&lt;sup&gt;-i&lt;/sup&gt; then the reals that &lt;tt&gt;f&lt;/tt&gt; and &lt;tt&gt;g&lt;/tt&gt; represent can't be equal.&lt;br /&gt;&lt;br /&gt;We have to implement &lt;tt&gt;Ord&lt;/tt&gt; too. If one computable real is bigger than another then they must differ by more than some power of 1/2. This means we can eventually distinguish between them. 
As a result, &lt;tt&gt;compare&lt;/tt&gt; will always terminate if we compare distinct numbers:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Ord R where&lt;br /&gt;&amp;gt;    compare (R f) (R g) = compare' 2 0 f g where&lt;br /&gt;&amp;gt;        compare' delta i f g =&lt;br /&gt;&amp;gt;               let fi = f i&lt;br /&gt;&amp;gt;                   gi = g i&lt;br /&gt;&amp;gt;               in if fi&amp;lt;gi-delta&lt;br /&gt;&amp;gt;                   then LT&lt;br /&gt;&amp;gt;                   else if fi&amp;gt;gi+delta&lt;br /&gt;&amp;gt;                       then GT&lt;br /&gt;&amp;gt;                       else compare' (delta/2) (i+1) f g&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can define &lt;tt&gt;min&lt;/tt&gt; and &lt;tt&gt;max&lt;/tt&gt; straightforwardly:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt;    min (R f) (R g) = R $ \i -&amp;gt; min (f i) (g i)&lt;br /&gt;&amp;gt;    max (R f) (R g) = R $ \i -&amp;gt; max (f i) (g i)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And now we can define addition. We use the fact that if |x-x'|&amp;lt;&amp;epsilon; and |y-y'|&amp;lt;&amp;epsilon; then |(x+y)-(x'+y')|&amp;lt;2&amp;epsilon;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; instance Num R where&lt;br /&gt;&amp;gt;    fromInteger n = R $ const $ fromInteger n&lt;br /&gt;&amp;gt;    R f + R g = R $ \i -&amp;gt; let j = i+1 in f j + g j&lt;br /&gt;&amp;gt;    negate (R f) = R $ negate . f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'm going to skip multiplication of two reals. But it's easy to define multiplication of a rational with a real. Here I'm using the fact that if |x-x'|&amp;lt;&amp;epsilon;, then |ax-ax'|&amp;lt;|a|&amp;epsilon;:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; x .* R f = R $ mul x x f where&lt;br /&gt;&amp;gt;    mul x x' f i | abs x' &amp;lt; 1 = x * f i&lt;br /&gt;&amp;gt;                 | otherwise  = mul x (x'/2) f (i+1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we're ready! 
It's straightforward to translate our previous algorithm. Note that we don't need a function to take the limit of a sequence. By definition we use sequences of rational approximations that are accurate to within 2&lt;sup&gt;-i&lt;/sup&gt; so our &lt;tt&gt;ivt1&lt;/tt&gt; function merely needs to compute an approximation to our intermediate value that is accurate to within 2&lt;sup&gt;-i&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; ivt1 :: (R -&amp;gt; R) -&amp;gt; R&lt;br /&gt;&amp;gt; ivt1 f = R $ \i -&amp;gt; ivt' 0 1 i where&lt;br /&gt;&amp;gt;    ivt' :: Rational -&amp;gt; Rational -&amp;gt; Integer -&amp;gt; Rational&lt;br /&gt;&amp;gt;    ivt' x y i =&lt;br /&gt;&amp;gt;           let z = (1%2)*(x+y)&lt;br /&gt;&amp;gt;           in if i==0&lt;br /&gt;&amp;gt;                then z&lt;br /&gt;&amp;gt;                else if f (fromRational z) &amp;lt; 0&lt;br /&gt;&amp;gt;                    then ivt' z y (i-1)&lt;br /&gt;&amp;gt;                    else ivt' x z (i-1)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Some new test functions:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; f, g :: R -&amp;gt; R&lt;br /&gt;&amp;gt; f x = 3.*x-1&lt;br /&gt;&amp;gt; g x = 2.*x-1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that (amazingly!) any well-defined function &lt;tt&gt;R -&amp;gt; R&lt;/tt&gt; we implement is continuous because of the argument I sketched &lt;a href="http://blog.sigfpe.com/2008/01/what-does-topology-have-to-do-with.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;And you'll find that &lt;tt&gt;ivt1 f&lt;/tt&gt; gives the result you expect. But sadly, &lt;tt&gt;ivt1 g&lt;/tt&gt; never returns. What went wrong?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Clutching Defeat from the Jaws of Victory&lt;/b&gt;&lt;br /&gt;After one iteration, &lt;tt&gt;ivt1&lt;/tt&gt; sets &lt;tt&gt;z&lt;/tt&gt; to the value we want. This seems like a particularly easy case. 
But the algorithm immediately fails because it then proceeds to compare &lt;tt&gt;f z&lt;/tt&gt; with zero. We already know that our comparison can't terminate in this case, because the two values are equal. How do we fix our algorithm?&lt;br /&gt;&lt;br /&gt;Consider the actions of any higher order function that operates on a continuous function &lt;tt&gt;f&lt;/tt&gt; of type &lt;tt&gt;R -&amp;gt; R&lt;/tt&gt;. It will evaluate &lt;tt&gt;f&lt;/tt&gt; at various real values. The return values from &lt;tt&gt;f&lt;/tt&gt; will then be sampled at various values of &lt;tt&gt;i&lt;/tt&gt;. Each sample corresponds to a bracket around a value of &lt;tt&gt;f&lt;/tt&gt;. Consider a higher order function acting on the function below:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/TAF3F5LJMBI/AAAAAAAAAhc/l-9_AdEXWDo/s1600/ivt.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 171px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/TAF3F5LJMBI/AAAAAAAAAhc/l-9_AdEXWDo/s400/ivt.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5476789564989845522" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If our higher order function terminates, it can only compute a finite number of these brackets. So I can choose an &amp;epsilon; small enough that I could nudge the graph up or down by &amp;epsilon; without invalidating the brackets. I could provide the higher order function with a different function &lt;tt&gt;R -&amp;gt; R&lt;/tt&gt; that responds in exactly the same way to the finite number of samples. &lt;br /&gt;&lt;br /&gt;Now consider the case of an &lt;tt&gt;ivt&lt;/tt&gt; function. It will respond the same way to two functions differing only by &amp;epsilon;. The function in the graph above has been designed so that it is flat from 0.4 to 0.6. 
Nudging the function above up or down by &amp;epsilon;&amp;gt;0 causes the crossing of the x-axis to move from x&amp;lt;0.4 to x&amp;gt;0.6. In other words, no algorithm could possibly return a valid intermediate value for all possible inputs because I can always contrive a pair of functions, to which &lt;tt&gt;ivt&lt;/tt&gt; would respond identically, and yet which have widely separated points where they cross the x-axis. This shows that the problem of determining a crossing point of a continuous function that starts off less than zero, and ends up greater than zero, is uncomputable. We can do it for some functions. But no function can solve this problem for all continuous inputs. What's really weird about this is that it fails precisely when the problem seems easiest - when there are lots of zeroes!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Revised Approach&lt;/b&gt;&lt;br /&gt;But we don't need to give up. If we could ensure that at each bisection we could pick a point that wasn't exactly a zero, then we'd be fine. But how can we do this? Well here's a cheating way: we don't bother. We let that problem be the caller's responsibility. Instead of bisecting at each stage, we'll trisect. We'll pick a point that isn't a zero in the middle third by using a function passed in by the caller. Call this function &lt;tt&gt;g&lt;/tt&gt;. &lt;tt&gt;g&lt;/tt&gt; is of type &lt;tt&gt;Rational -&amp;gt; Rational -&amp;gt; Rational&lt;/tt&gt; and the specification is that &lt;tt&gt;g x y&lt;/tt&gt; must be a rational number between &lt;tt&gt;x&lt;/tt&gt; and &lt;tt&gt;y&lt;/tt&gt; that isn't a zero. 
Here's our new &lt;tt&gt;ivt2&lt;/tt&gt; function:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; ivt2 :: (Rational -&amp;gt; Rational -&amp;gt; Rational) -&amp;gt; (R -&amp;gt; R) -&amp;gt; R&lt;br /&gt;&amp;gt; ivt2 g f = R $ \i -&amp;gt;&lt;br /&gt;&amp;gt;     let delta = (1%2)^i&lt;br /&gt;&amp;gt;         ivt' :: Rational -&amp;gt; Rational -&amp;gt; Rational&lt;br /&gt;&amp;gt;         ivt' x y =&lt;br /&gt;&amp;gt;               let l = (1%3)*(2*x+y)&lt;br /&gt;&amp;gt;                   u = (1%3)*(x+2*y)&lt;br /&gt;&amp;gt;                   z = g l u&lt;br /&gt;&amp;gt;               in if u-l &amp;lt; delta&lt;br /&gt;&amp;gt;                    then z&lt;br /&gt;&amp;gt;                    else if f (fromRational z) &amp;lt; 0&lt;br /&gt;&amp;gt;                        then ivt' z y&lt;br /&gt;&amp;gt;                        else ivt' x z&lt;br /&gt;&amp;gt;     in ivt' 0 1&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;How can we make use of this? Well suppose we want to find where a polynomial, f, crosses the x-axis. If the polynomial has degree n, it can only cross at n points at most. So if we pick n+1 distinct values, f must evaluate to a non-zero value at one of them. We can examine the n+1 values of f(x) in parallel to find one that is non-zero. This must terminate. Here's an implementation for the case n+1=2. If we pass two real values to &lt;tt&gt;pickNonZero&lt;/tt&gt; it will return &lt;tt&gt;True&lt;/tt&gt; or &lt;tt&gt;False&lt;/tt&gt; according to whether the first or second argument is non-zero. 
It must only be called with a pair of numbers where one is non-zero:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; pickNonZero (R f) (R g) = pickNonZero' 0 1 where&lt;br /&gt;&amp;gt;     pickNonZero' i e = let fi = f i&lt;br /&gt;&amp;gt;                            gi = g i&lt;br /&gt;&amp;gt;                        in if fi &amp;gt; e || fi &amp;lt; -e&lt;br /&gt;&amp;gt;                           then True&lt;br /&gt;&amp;gt;                           else if gi &amp;gt; e || gi &amp;lt; -e&lt;br /&gt;&amp;gt;                               then False&lt;br /&gt;&amp;gt;                               else pickNonZero' (i+1) ((1%2)*e)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note how it examines the two values in parallel. If it simply tested the two numbers in turn against zero it would fail if the first were zero. (Interesting side note: we could write this using &lt;tt&gt;par&lt;/tt&gt;.)&lt;br /&gt;&lt;br /&gt;Suppose the function &lt;tt&gt;f&lt;/tt&gt; is strictly monotonically increasing. Then we know that if y&amp;gt;x, then f(y)&amp;gt;f(x) so we can't have that both f(x) and f(y) are zero. 
We can use this to write a suitable helper function to write an intermediate value function that is 100% guaranteed to work for all monotonically increasing functions (assuming we don't run out of memory):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&amp;gt; monotonicPicker :: (R -&amp;gt; R) -&amp;gt; Rational -&amp;gt; Rational -&amp;gt; Rational&lt;br /&gt;&amp;gt; monotonicPicker f x y = if pickNonZero&lt;br /&gt;&amp;gt;                               (f (fromRational x)) (f (fromRational y))&lt;br /&gt;&amp;gt;                             then x&lt;br /&gt;&amp;gt;                             else y&lt;br /&gt;&lt;br /&gt;&amp;gt; ivt3 :: (R -&amp;gt; R) -&amp;gt; R&lt;br /&gt;&amp;gt; ivt3 f = ivt2 (monotonicPicker f) f&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;By varying &lt;tt&gt;monotonicPicker&lt;/tt&gt; we can find crossing points for all manner of continuous function. Try &lt;tt&gt;ivt3 g&lt;/tt&gt;. But no matter how hard we try, we can never write one that works for all continuous functions.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What went wrong?&lt;/b&gt;&lt;br /&gt;So why did the Intermediate Value Theorem, despite its appearance of constructing an intermediate value, fail to yield an algorithm? For an answer to that, you'll have to wait until my next article. The surprising thing is that it's not a problem of analysis, but a problem of logic.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Acknowledgements&lt;/b&gt;&lt;br /&gt;I have borrowed liberally from &lt;a href="http://www.paultaylor.eu/"&gt;Paul Taylor&lt;/a&gt;, especially from &lt;a href="http://www.paultaylor.eu/ASD/lamcra/introivt"&gt;here&lt;/a&gt;. However, I've chosen to phrase everything in terms of classical reasoning about Haskell programs. Any errors are, of course, mine. 
It was probably &lt;a href="http://math.andrej.com/"&gt;Andrej Bauer&lt;/a&gt;'s awesome blog that got me interested in this stuff.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Afterword&lt;/b&gt;&lt;br /&gt;I want to stress that I have made absolutely no mention of intuitionism or constructivism in what I've said above. I have simply used ordinary classical mathematics to reason about some Haskell programs. I'm leaving that for the next article.&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=0486428753" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;/TD&gt;&lt;td&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=067453767X" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6558089432442147035?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6558089432442147035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6558089432442147035' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6558089432442147035'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/6558089432442147035'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/05/constructing-intermediate-values.html' title='Constructing Intermediate Values'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_UdKHLrHa05M/TAF3F5LJMBI/AAAAAAAAAhc/l-9_AdEXWDo/s72-c/ivt.png' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-7790745507221255597</id><published>2010-05-15T11:49:00.001-07:00</published><updated>2010-05-20T11:52:23.055-07:00</updated><title type='text'>Optimising pointer subtraction with 2-adic integers.</title><content type='html'>Here is a simple C type and a function definition:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;struct A&lt;br /&gt;{&lt;br /&gt;    char x[7];&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;int diff(struct A *a, struct A *b)&lt;br /&gt;{&lt;br /&gt;    return a-b;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It doesn't seem like there could be much to say about that. The &lt;tt&gt;A&lt;/tt&gt; structure is 7 bytes long so the subtraction implicitly divides by 7. That's about it. But take a look at the assembly language generated when it's compiled with gcc:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;movl 4(%esp), %eax&lt;br /&gt;subl 8(%esp), %eax&lt;br /&gt;imull $-1227133513, %eax, %eax&lt;br /&gt;ret&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Where is the division by 7? Instead we see multiplication by -1227133513. 
A good first guess is that maybe this strange constant is an approximate fixed point representation of 1/7. But this is a single multiplication with no shifting or bit field selection tricks. So how does this work? And what is -1227133513? Answering that question will lead us on a trip through some surprising and abstract mathematics. Among other things, we'll see how not only can you represent negative numbers as positive numbers in binary using two's complement, but that we can also represent fractions in binary in a similar way.&lt;br /&gt;&lt;br /&gt;But first, some history.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Some n-bit CPUs&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/S-8sGYPI4sI/AAAAAAAAAg0/R9cP5tVUW8o/s1600/4004.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 68px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/S-8sGYPI4sI/AAAAAAAAAg0/R9cP5tVUW8o/s400/4004.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5471640560375227074" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;That's an Intel 4004 microprocessor, the first microprocessor completely contained on a single chip. It was a 4 bit processor equipped with 4 bit registers. With 4 bits we can represent unsigned integers from 0 to 15 in a single register. But what happens if we want to represent larger integers?&lt;br /&gt;&lt;br /&gt;Let's restrict ourselves to arithmetic operations using only addition, subtraction and multiplication and using one register per number. Then a curious thing happens. Take some numbers outside of the range 0 to 15 and store only the last 4 bits of each number in our registers. Now perform a sequence of additions, subtractions and multiplications. Obviously we usually expect to get the wrong result because if the final result is outside of our range we can't represent it in a single register. 
But the result we do get will have the last 4 bits of the correct result. This happens because in the three operations I listed, the value of a bit in a result doesn't depend on higher bits in the inputs. Information only propagates from low bit to high bit. We can think of a 4004 as allowing us to correctly resolve the last 4 bits of a result. From the perspective of a 4004, 1 and 17 and 33 all look like the same number. It doesn't have the power to distinguish them. But if we had a more powerful 8 bit processor like the 6502 in this machine, we could distinguish them:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/S_GRzA1q9aI/AAAAAAAAAhQ/H4cm-mKFINs/s1600/800px-BBC_Micro.jpeg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 139px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/S_GRzA1q9aI/AAAAAAAAAhQ/H4cm-mKFINs/s400/800px-BBC_Micro.jpeg" border="0" alt=""id="BLOGGER_PHOTO_ID_5472315327816332706" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;b&gt;Introduction&lt;/b&gt;&lt;br /&gt;This is analogous to the situation we have with distances in the physical world. With our eyes we can resolve details maybe down to 0.5mm. If we want to distinguish anything smaller we need more powerful equipment, like a magnifying glass. When that fails we can get a microscope, or an electron microscope, or these days even an atomic force microscope. The more we pay, the smaller the detail we can resolve. We can think of the cost of the equipment required to resolve two points as being a kind of measure of how close they are.&lt;br /&gt;&lt;br /&gt;We have the same with computers. To resolve 1 and 17 we need an 8-bit machine. To resolve 1 and 65537 we need a 32-bit machine. And so on. So if we adopt a measure based on cost like in the previous paragraph, there is a sense in which 1 is close to 17, but 1 is even closer to 257, and it's closer still to 65537. 
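&lt;br /&gt;&lt;br /&gt;Here's a rough sketch of this measure in Haskell, assuming we score closeness by the number of low bits on which two numbers agree. The name &lt;tt&gt;agreeingLowBits&lt;/tt&gt; is made up purely for illustration:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import Data.Bits (xor, countTrailingZeros)&lt;br /&gt;&lt;br /&gt;-- Hypothetical helper: the number of low bits on which a and b agree.&lt;br /&gt;-- A machine with no more than this many bits can't tell a and b apart.&lt;br /&gt;-- For a == b it just returns the word size of Int.&lt;br /&gt;agreeingLowBits :: Int -&amp;gt; Int -&amp;gt; Int&lt;br /&gt;agreeingLowBits a b = countTrailingZeros (a `xor` b)&lt;br /&gt;&lt;br /&gt;main :: IO ()&lt;br /&gt;main = print (map (agreeingLowBits 1) [17, 257, 65537])&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The three scores come out as 4, 8 and 16, so by this measure 65537 is the closest of the three to 1.&lt;br /&gt;&lt;br /&gt;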
We have this inverted notion of closeness where numbers separated by large (in the usual sense) powers of two are close in this new sense.&lt;br /&gt;&lt;br /&gt;We have an interesting relationship between computing machines with different 'resolving' power. If we take an arithmetical computation on an N-bit machine, and then take the last M bits of the inputs and result, we get exactly what the M-bit machine would have computed. So an M-bit machine can be thought of as a kind of window onto the last M bits of an N-bit machine. Here's a sequence of machines:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/S_GRYTr3mQI/AAAAAAAAAhI/uYvPUGFKDzE/s1600/n-bit.001.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/S_GRYTr3mQI/AAAAAAAAAhI/uYvPUGFKDzE/s400/n-bit.001.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5472314869019023618" /&gt;&lt;/a&gt;&lt;br /&gt;Each machine provides a window onto the low bits of the previous machine in the sequence. But what happens at the "..." on the left? That suggests the bizarre idea that maybe all of these finite machines could be thought of as windows onto some infinite bit machine. Does that idea make any kind of sense?&lt;br /&gt;&lt;br /&gt;I'll try to convince you that's a sensible idea by pointing out that it's something familiar to anyone who's taken a rigorous analysis course. (And I'll mention in passing that the above diagram illustrates a &lt;a href="http://en.wikipedia.org/wiki/Limit_%28category_theory%29"&gt;limit&lt;/a&gt; in an appropriate &lt;a href="http://en.wikipedia.org/wiki/Category_of_rings#Category_of_commutative_rings"&gt;category&lt;/a&gt;!)&lt;br /&gt;&lt;br /&gt;Mathematicians (often) build the real numbers from the rational numbers by a process known as completion. 
Consider a sequence like 1, 14/10, 141/100, 1414/1000, ... . The nth term is the largest fraction, with 10&lt;sup&gt;n&lt;/sup&gt; in the denominator, such that its square is less than 2. It's well known that there is no rational number whose square is 2. And yet it feels like this sequence ought to be converging to something. It feels this way because successive terms in the sequence get as close to each other as you like. If you pick any &amp;epsilon; there will be a term in the sequence, say x, with the property that later terms never deviate from x by more than &amp;epsilon;. Such a sequence is called a Cauchy sequence. But these sequences don't all converge to rational numbers. A number like &amp;radic;2 is a gap. What are we to do?&lt;br /&gt;&lt;br /&gt; Mathematicians fill the gap by defining a new type of number, the real number. These are &lt;a href="http://en.wikipedia.org/wiki/Real_number#Construction_from_the_rational_numbers"&gt;by definition&lt;/a&gt; Cauchy sequences. Now every Cauchy sequence converges to a real number because, by definition, the real number it converges to is the sequence. For this to be anything more than sleight of hand we need to prove that we can do arithmetic with these sequences. But that's just a technical detail that can be found in any analysis book. So, for example, we can think of the sequence I gave above as actually being the square root of two. In fact, the decimal notation we use to write &amp;radic;2, 1.414213..., can be thought of as shorthand for the sequence (1, 14/10, 141/100, ...).&lt;br /&gt;&lt;br /&gt;The notion of completeness depends on an idea of closeness. I've described an alternative to the usual notion of closeness and so we can define an alternative notion of Cauchy sequence. We'll say that the sequence x&lt;sub&gt;1&lt;/sub&gt;, x&lt;sub&gt;2&lt;/sub&gt;, ... is a Cauchy sequence in the new sense if all the numbers from x&lt;sub&gt;n&lt;/sub&gt; onwards agree on their last n bits. 
(This isn't quite the usual definition but it'll do for here.) For example, 1, 3, 7, 15, 31, ... define a Cauchy sequence. We consider a Cauchy sequence equal to zero if x&lt;sub&gt;n&lt;/sub&gt; always has zero for its n lowest bits. So 2, 4, 8, 16, 32, ... is a representation of zero. We can add, subtract and multiply Cauchy sequences pointwise, so, for example, the product of x&lt;sub&gt;n&lt;/sub&gt; and y&lt;sub&gt;n&lt;/sub&gt; has terms x&lt;sub&gt;n&lt;/sub&gt;y&lt;sub&gt;n&lt;/sub&gt;. Two Cauchy sequences are considered equal if their difference is zero. These numbers are called 2-adic integers.&lt;br /&gt;&lt;br /&gt;Exercise: prove that if x is a 2-adic integer then x+0=x and that 0x=0.&lt;br /&gt;&lt;br /&gt;There's another way of looking at 2-adic integers. They are infinite strings of binary digits, extending to the left. The last n digits are simply given by the last n digits of x&lt;sub&gt;n&lt;/sub&gt;. For example we can write 1, 3, 7, 15, 31, ... as ...1111111. Amazingly we can add, subtract and multiply these numbers using the obvious extensions of the usual algorithms. Let's add ...1111111 to 1:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;...11111111&lt;br /&gt;...00000001&lt;br /&gt;-----------&lt;br /&gt;...00000000&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We get a carry of 1 that ripples off to infinity and gives us zeroes all the way.&lt;br /&gt;&lt;br /&gt;We can try doing long multiplication of ...111111 with itself. We get:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;...11111111&lt;br /&gt;...1111111 &lt;br /&gt;...111111  &lt;br /&gt;...11111   &lt;br /&gt;...&lt;br /&gt;-----------&lt;br /&gt;...00000001&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's important to notice that even though there are an infinite number of rows and columns in that multiplication you only need to multiply and add a finite number of numbers to get any digit of the result. 
If you don't like that infinite arrangement you can instead compute the last n digits of the product by multiplying 11...n digits...111 by itself and taking the last n digits. The infinite long multiplication is really the same as doing this for all n and organising it in one big table.&lt;br /&gt;&lt;br /&gt;So ...1111111 has many of the properties we expect of -1. Added to 1 we get zero and squaring it gives 1. It is -1 in the 2-adic integers. This gives us a new insight into twos complement arithmetic. The negative &lt;a href="http://en.wikipedia.org/wiki/Two's_complement"&gt;twos-complements&lt;/a&gt; are the truncated last n digits of the 2-adic representations of the negative integers. We should properly be thinking of twos-complement numbers as extending out to infinity on the left.&lt;br /&gt;&lt;br /&gt;The field of analysis makes essential use of the notion of closeness with its &amp;delta; and &amp;epsilon; proofs. Many theorems from analysis carry over to the 2-adic integers. We find ourselves in a strange alternative number universe which is a sort of mix of analysis and number theory. In fact, people have even tried studying physics in p-adic universes. (p-adics are what you get when you repeat the above for base p numbers, but I don't want to talk about that now.) One consequence of analysis carrying over is that some of our intuitions about real numbers carry over to the 2-adics, even though some of our intuitive geometric pictures seem like they don't really apply. I'm going to concentrate on one example.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Newton-Raphson Method&lt;/b&gt;&lt;br /&gt;I hope everyone is familiar with the Newton-Raphson method for solving equations. If we wish to solve f(x)=0 we start with an estimate x&lt;sub&gt;n&lt;/sub&gt;. We find the tangent to y=f(x) at x=x&lt;sub&gt;n&lt;/sub&gt;. 
The tangent line is an approximation to the curve y=f(x) so we solve the easy problem of finding where the tangent line crosses the x-axis to get a new estimate x&lt;sub&gt;n+1&lt;/sub&gt;. This gives the formula&lt;br /&gt;&lt;br /&gt;x&lt;sub&gt;n+1&lt;/sub&gt; = x&lt;sub&gt;n&lt;/sub&gt;-f(x&lt;sub&gt;n&lt;/sub&gt;)/f'(x&lt;sub&gt;n&lt;/sub&gt;).&lt;br /&gt;&lt;br /&gt;With luck the new estimate will be closer than the old one. We can do some &lt;a href="http://en.wikipedia.org/wiki/Newton's_method#Analysis"&gt;analysis&lt;/a&gt; to get some sufficient conditions for convergence.&lt;br /&gt;&lt;br /&gt;The surprise is this: the Newton-Raphson method often works very well for the 2-adic integers even though the geometric picture of lines crossing axes doesn't quite make sense. In fact, it often works much better than with real numbers, allowing us to state very precise and easy to satisfy conditions for convergence.&lt;br /&gt;&lt;br /&gt;Now let's consider the computation of reciprocals of real numbers. To find 1/a we wish to solve f(x)=0 where f(x)=1/x-a. Newton's method gives the iteration x&lt;sub&gt;n+1&lt;/sub&gt; = x&lt;sub&gt;n&lt;/sub&gt;(2-ax&lt;sub&gt;n&lt;/sub&gt;). This is a well known iteration that is used internally by CPUs to compute reciprocals. But for it to work we need to start with a good estimate. The famous &lt;a href="http://en.wikipedia.org/wiki/Pentium_FDIV_bug"&gt;Pentium divide bug&lt;/a&gt; was likewise the result of an incorrect lookup table, although there the table supplied quotient digits for SRT division rather than a first estimate for this iteration. So let's say we want to find 1/7. We might start with an estimate like 0.1 and quickly get estimates 0.13, 0.1417, 0.142848, ... . It's converging to the familiar 0.142857...&lt;br /&gt;&lt;br /&gt;But what happens if we start with a bad estimate like 1? 
We get the sequence:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;1&lt;br /&gt;-5&lt;br /&gt;-185&lt;br /&gt;-239945&lt;br /&gt;-403015701065&lt;br /&gt;-1136951587135200126341705&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;It's diverging badly. But now let's look at the binary:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;00000000000000000000000000000000000000000000000000000000000000000000000000000000001&lt;br /&gt;11111111111111111111111111111111111111111111111111111111111111111111111111111111011&lt;br /&gt;11111111111111111111111111111111111111111111111111111111111111111111111111101000111&lt;br /&gt;11111111111111111111111111111111111111111111111111111111111111111000101011010110111&lt;br /&gt;11111111111111111111111111111111111111111111010001000101010011001000110110110110111&lt;br /&gt;11100001111001111011011101100100000010000000011011010110110110110110110110110110111&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Our sequence may be diverging rapidly in the usual sense, but amazingly it's converging rapidly in our new 2-adic sense!&lt;br /&gt;&lt;br /&gt;If it's really converging to a meaningful reciprocal we'd expect that if we multiplied the last n digits of these numbers by 7 then we'd get something that agreed with the number 1 in the last n digits. Let's take the last 32 digits:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;10110110110110110110110110110111&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and multiply by 7:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;10100000000000000000000000000000001&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The last 32 bits are&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;00000000000000000000000000000001.&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So if we're using 32 bit registers, and we're performing multiplication, addition and subtraction, then this number is, to all intents and purposes, a representation of 1/7. If we interpret it as a twos complement number, then in decimal it's -1227133513. 
And that's the mysterious number gcc generated.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Epilogue&lt;/b&gt;&lt;br /&gt;There are many things to follow up with. I'll try to be brief.&lt;br /&gt;&lt;br /&gt;Try compiling C code with a struct of size 14. You'll notice some extra bit shifting going on. So far I've only defined the 2-adic integers. But to get the reciprocal of every non-zero number we need numbers whose digits don't just extend leftwards to infinity but also extend a finite number of steps to the right of the "binary point". These are the full 2-adic numbers as opposed to merely the 2-adic integers. That's how the extra shifts can be interpreted. Or more simply, if you need to divide by 14 you can divide by 2 first and then use the above method to divide by 7.&lt;br /&gt;&lt;br /&gt;I don't know how gcc generates its approximate 2-adic reciprocals. Possibly it uses something based on the Euclidean GCD algorithm. I wasn't able to find the precise line of source in a reasonable time.&lt;br /&gt;&lt;br /&gt;An example of a precise version of the Newton-Raphson method for the p-adics is the &lt;a href="http://en.wikipedia.org/wiki/Hensel's_lemma"&gt;Hensel lemma&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The last thing I want to say is that all of the above is intended purely to whet your appetite and point out that a curious abstraction from number theory has an application to compiler writing. It's all non-rigorous and hand-wavey. I recommend reading further at &lt;a href="http://en.wikipedia.org/wiki/P-adic_number"&gt;Wikipedia&lt;/a&gt;. I learnt most of what I know on the subject from the first few chapters of Koblitz's book &lt;a href="http://www.amazon.com/gp/product/0387960171?ie=UTF8&amp;tag=sigfpe-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0387960171"&gt;p-adic Numbers, p-adic Analysis, and Zeta functions&lt;/a&gt;. 
The proof of the &lt;a href="http://en.wikipedia.org/wiki/Von_Staudt%E2%80%93Clausen_theorem"&gt;von Staudt–Clausen theorem&lt;/a&gt; in that book is mindblowing. It reveals that the real numbers and the p-adic numbers are equally valid ways to approximately get a handle on rational numbers and that there are whole alternative p-adic universes out there inhabited by weird versions of familiar things like the Riemann zeta function.&lt;br /&gt;&lt;br /&gt;(Oh, and please don't take the talk of CPUs too literally. I'm fully aware that you can represent big numbers even on a 4 bit CPU. But what I say about a model of computation restricted to multiplication, addition and subtraction in single n-bit registers holds true.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Some Challenges&lt;/b&gt;&lt;br /&gt;1. Prove from first principles that the iteration for 1/7 converges. Can you prove how many digits it generates at a time?&lt;br /&gt;2. Can you find a 32 bit square root of 7? Using the Newton-Raphson method? Any other number? Any problems?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Acknowledgements&lt;/b&gt;&lt;br /&gt;Update: I have replaced the images with images that are, to the best of my knowledge, public domain, or composited from public domain images. (Thanks jisakujien. My bad.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;If you want to play with some p-adics yourself, there is some code to be found &lt;a href="http://www.polyomino.f2s.com/david/haskell/p-adic.html"&gt;here&lt;/a&gt;. That also has code for transcendental functions applied to p-adics.&lt;br /&gt;&lt;br /&gt;Here's some C code to compute inverses of odd numbers modulo 2&lt;sup&gt;32&lt;/sup&gt; (assuming 32 bit ints). Like the real valued Newton method, it doubles the number of correct digits at each step so we only need 5 iterations to get 32 bits. 
(C code as I think it's traditional to twiddle one's bits in C rather than Haskell.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;assert.h&amp;gt;&lt;br /&gt;&lt;br /&gt;typedef unsigned int uint;&lt;br /&gt;&lt;br /&gt;/* Inverse of an odd x modulo 2^32, assuming 32 bit unsigned ints. */&lt;br /&gt;uint inverse(uint x)&lt;br /&gt;{&lt;br /&gt;   uint y = 2-x;   /* first estimate: x*y is 1 in its last 2 bits */&lt;br /&gt;   y = y*(2-x*y);  /* now correct in the last 4 bits */&lt;br /&gt;   y = y*(2-x*y);  /* 8 bits */&lt;br /&gt;   y = y*(2-x*y);  /* 16 bits */&lt;br /&gt;   y = y*(2-x*y);  /* 32 bits */&lt;br /&gt;&lt;br /&gt;   return y;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main()&lt;br /&gt;{&lt;br /&gt;   uint i;&lt;br /&gt;   /* Check odd i up to 0xfffffffd; this bound keeps i += 2 from wrapping around. */&lt;br /&gt;   for (i = 1; i&amp;lt;0xfffffffe; i += 2)&lt;br /&gt;   {&lt;br /&gt;       assert (i*inverse(i) == 1);&lt;br /&gt;   }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/7790745507221255597/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=7790745507221255597' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7790745507221255597'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/7790745507221255597'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/05/optimising-pointer-subtraction-with-2.html' 
title='Optimising pointer subtraction with 2-adic integers.'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/S-8sGYPI4sI/AAAAAAAAAg0/R9cP5tVUW8o/s72-c/4004.jpg' height='72' width='72'/><thr:total>24</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-6369504533228653968</id><published>2010-04-24T08:47:00.000-07:00</published><updated>2010-04-26T07:40:39.521-07:00</updated><title type='text'>On representing some real numbers exactly</title><content type='html'>There are uncountably many real numbers, and only countably many finite strings of symbols, so we know for sure that there is no scheme to represent all real numbers as finite strings of symbols in such a way that different reals get different representations. However, there are useful subsets of the real numbers that *are* equipped with finite encoding schemes. I hope to give some examples of such sets, including one, the set of periods, that isn't currently well known outside the mathematical world.&lt;br /&gt;&lt;br /&gt;It's traditional to think of the real numbers as divided up into three types of number: the rational numbers, the algebraic numbers, and the leftovers known collectively as the transcendental numbers. If you're not familiar with those terms I'll be explaining them below. We can represent this as follows:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;#x211a;&amp;sub;A&amp;sub;&amp;#x211d;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;I'm using unicode for blackboard bold R and Q there. I hope your browser supports them. 
&amp;#x211d; is the set of real numbers, &amp;#x211a; is the set of rational numbers, and A is the algebraic numbers.&lt;br /&gt;&lt;br /&gt;But there are many different kinds of transcendental number in existence, so the subject of this article will be three more levels of classification we could insert between the set of algebraic numbers and the set of real numbers in the sequence above.&lt;br /&gt;&lt;br /&gt;But first, let's revisit &amp;#x211a; and A.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Rational Numbers, &amp;#x211a;&lt;/b&gt;&lt;br /&gt;The rational numbers are fairly straightforward to deal with. They are nothing more than the ratios of integers. Given two such ratios we can easily compare them to see whether they are equal. For example, we're taught at a young age how to tell if 2/3 is the same as 4/6.&lt;br /&gt;&lt;br /&gt;However, it didn't take long for the Ancient Greeks to realise that they had some equations they could solve approximately using rationals, but couldn't solve exactly. The best known example is solving x&lt;sup&gt;2&lt;/sup&gt;=2.&lt;br /&gt;&lt;br /&gt;Assume x is rational, so that x=p/q, with p and q integers with all common factors cancelled out. We have p&lt;sup&gt;2&lt;/sup&gt;=2q&lt;sup&gt;2&lt;/sup&gt;, and so p must be divisible by 2. But then p&lt;sup&gt;2&lt;/sup&gt; is divisible by 4, so q&lt;sup&gt;2&lt;/sup&gt; is divisible by 2, and so q is divisible by 2. This means that we can cancel 2 from the top and bottom of p/q. That contradicts our assumption about cancellation. So the equation has no rational solution.&lt;br /&gt;&lt;br /&gt;If we expect x&lt;sup&gt;2&lt;/sup&gt;=2 to have a solution we must work with numbers that can't be represented as fractions.&lt;br /&gt;&lt;br /&gt;Notice how the rational numbers come equipped with a notation to describe them. We can just write one integer over another, with a horizontal line between them. 
Given such a description we can immediately tell if it's valid (it's only invalid if the denominator is 0) and whether it equals some other rational number.&lt;br /&gt;&lt;br /&gt;One interesting property of the rational numbers is that they are *dense* in the real numbers. This is just another way of saying that any real number can be approximated as well as we like by a rational number. For example, we can approximate &amp;radic;2 to within 1/1000 by the rational number 141421/100000. This is an important property that will come up again later.&lt;br /&gt;&lt;br /&gt;We also know that there are only &lt;a href="http://en.wikipedia.org/wiki/Countable_set"&gt;countably many&lt;/a&gt; rational numbers because there are only countably many pairs of integers.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Real Algebraic Numbers, A&lt;/b&gt;&lt;br /&gt;The real algebraic numbers are all the real numbers that we can obtain by finding roots of polynomial equations whose coefficients are rational numbers. For example, the real solutions to x&lt;sup&gt;5&lt;/sup&gt;-2x&lt;sup&gt;2&lt;/sup&gt;+&amp;#xbd;=0 are all algebraic numbers. Among other things, this includes the solution to x&lt;sup&gt;2&lt;/sup&gt;=2 that wasn't contained in the rationals. The algebraic real numbers also include the solutions to "&lt;a href="http://en.wikipedia.org/wiki/Galois_theory#A_non-solvable_quintic_example"&gt;insoluble&lt;/a&gt;" algebraic equations. (By 'algebraic' equations, I just mean polynomial equations.) Even though Galois proved we can't write expressions for the solutions to all algebraic equations using the four arithmetic operations and nth roots, these solutions still exist in the real numbers, and they are all, by definition, algebraic numbers.&lt;br /&gt;&lt;br /&gt;We can describe an algebraic number by writing down the equation it solves, and additionally providing some description to say which root of the equation we're interested in. 
Unfortunately, given any algebraic number there is always an infinite number of algebraic equations that it satisfies. So, as with the rationals, there is some redundancy. The good news, however, is that given two such descriptions of algebraic numbers there is an algorithm to tell whether or not they describe equal numbers.&lt;br /&gt;&lt;br /&gt;Even though there are vastly more algebraic numbers than rational numbers, they still only form a countable set. As mentioned in the previous paragraph, every algebraic number can be described by a finite string of symbols, and there are only countably many such strings.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Computable Real Numbers, C&lt;/b&gt;&lt;br /&gt;Now I want to insert a set between A and &amp;#x211d;. These are the computable real numbers. But first think back to what I said about the rational numbers.&lt;br /&gt;&lt;br /&gt;Pick a real number x. Given any &amp;epsilon;&amp;gt;0 we can always find a rational number that comes within &amp;epsilon; of x. We can define a function f on the positive rationals with the property that f(&amp;epsilon;) is rational and within &amp;epsilon; of x for any positive rational &amp;epsilon;. So f gives rational approximations to x to any desired accuracy. Such a function uniquely specifies x. In fact x is the limit of f(&amp;epsilon;) as &amp;epsilon; tends to zero. The computable real numbers are the real numbers specified in this way by *computable* functions. Notice how there are no infinities involved here. A computable real number is represented by nothing more than a finite string of symbols forming a computer program that processes finite integers. We can find out what the real number is to any desired accuracy in a finite time.&lt;br /&gt;&lt;br /&gt;Because we can write computer programs to find roots of algebraic equations to any desired accuracy we know that the algebraic numbers are contained in the computables. 
So now we have:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;#x211a;&amp;sub;A &amp;sub;C &amp;sub;&amp;#x211d;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Where I'm using C to represent the computables.&lt;br /&gt;&lt;br /&gt;Almost any number we ever want to work with is computable. Whether it's &amp;radic;2, or &amp;pi;, or the value of some humongous integral that's appeared in your engineering problem, chances are there's an algorithm somewhere that can approximate it as accurately as you like, given enough CPU time and RAM.&lt;br /&gt;&lt;br /&gt;Note that every computable is described by a finite string of symbols, and so the computables form a countable set.&lt;br /&gt;&lt;br /&gt;There are some problems with computable numbers. Given two representations of computable numbers we have no way of telling whether or not they are equal. It's not just that this is hard to do. There simply is no algorithm to prove that the numbers are equal. The problem is that the two computer programs may give exactly the same approximations to every degree of accuracy down to a certain &amp;epsilon;. But for higher accuracy they may give different results. We have no way of knowing in advance at what value of &amp;epsilon; they'll start differing. And anyway, there simply is no algorithm for telling if two computer programs will always generate the same results. So paradoxically, testing the equality of two computable numbers is itself uncomputable.&lt;br /&gt;&lt;br /&gt;Actually, the situation is much worse. We can't even tell if we have a valid representation of a computable real. In order to do that, we need to know that our computer program to generate approximations terminates. 
But solving that would solve the halting problem.&lt;br /&gt;&lt;br /&gt;By the way, despite these issues this representation of computable real numbers as functions leads to &lt;a href="http://www.haskell.org/haskellwiki/Exact_real_arithmetic"&gt;practical&lt;/a&gt; ways to do arbitrary precision arithmetic on a computer in a way that gives us guaranteed bounds on the accuracy of our results.&lt;br /&gt;&lt;br /&gt;From here we could go two ways. We could try to rein in our class of number to something more manageable. Or we could try to push further and try to find ways to represent real numbers that aren't even computable. Let's do the former first.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Periods, P&lt;/b&gt;&lt;br /&gt;The periods are an interesting class of real number that doesn't seem to be well known. The definition may be old, but it was only recently that they were being &lt;a href="http://inc.web.ihes.fr/prepub/PREPRINTS/M01/M01-22.ps.gz"&gt;promoted&lt;/a&gt; in mathematical circles as an interesting thing to study. So I thought I'd contribute to their promotion.&lt;br /&gt;&lt;br /&gt;Consider the number &amp;pi;. It arises straightforwardly in geometric problems. An example is computing the area of a circle. Back in the 18th century Lambert proved that it was irrational and in the 19th century Lindemann proved that it was transcendental. But in a sense the transcendental numbers are simply a rubbish heap into which the leftover numbers have been discarded. Can we get some kind of handle on at least some of these numbers in a way that puts &amp;pi; back on the map?&lt;br /&gt;&lt;br /&gt;The real number &amp;pi; is the area of a unit circle. 
So it is the area of the region of the plane given by the inequality x&lt;sup&gt;2&lt;/sup&gt;+y&lt;sup&gt;2&lt;/sup&gt;&amp;lt;1.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_UdKHLrHa05M/S9MTYyeZqXI/AAAAAAAAAgM/LmAl70kuvaA/s1600/circle.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 200px;" src="http://2.bp.blogspot.com/_UdKHLrHa05M/S9MTYyeZqXI/AAAAAAAAAgM/LmAl70kuvaA/s200/circle.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5463732089517615474" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's a similar construction of another real number:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/S9MbpHqnAUI/AAAAAAAAAgU/iGLIzS1re8E/s1600/log2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 94px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/S9MbpHqnAUI/AAAAAAAAAgU/iGLIzS1re8E/s200/log2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5463741166176895298" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The area of that region is log(2). (Natural logarithm of course!) We can construct the logarithm of any positive rational number in a similar way. Apart from log(1)=0, these are all transcendental numbers. But in a way they are nice transcendental numbers. They arise from considering areas of the plane described by straightforward algebraic inequalities. (As in the previous section, I'm using 'algebraic inequalities' to mean 'polynomial inequalities'.)&lt;br /&gt;&lt;br /&gt;That suggests a class of real number: those numbers that can be represented as the volume of a region defined by a bunch of algebraic inequalities with rational coefficients. These are known as the periods. I'm talking about generalised volumes in n-dimensions, not just areas in the plane. 
Clearly &amp;pi; and log(2) are periods.&lt;br /&gt;&lt;br /&gt;But does this representation solve the problem we had with computable numbers where we were unable to guarantee we could check the equality of two numbers? Now it gets interesting. The answer is: we don't know. It is conjectured that if we have the same number represented in two different ways as a period that we can transform one representation into another simply by using a small set of elementary operations. It is also conjectured that there is a terminating algorithm for finding such a sequence of operations, or if one doesn't exist, demonstrating this.&lt;br /&gt;&lt;br /&gt;One obvious question now is this: are there any computable reals that aren't periods? A recent paper, &lt;a href="http://arxiv.org/abs/0805.0349"&gt;Periods and elementary real numbers&lt;/a&gt; claims to exhibit one by means of a kind of diagonalisation argument. But in general it's hard to prove that a number isn't a period. I don't believe there is a proof yet that e = 2.718... isn't a period, but mathematicians expect that it isn't.&lt;br /&gt;&lt;br /&gt;I think the name 'period' comes from the fact that the period of a pendulum of rational length, in a gravitational field of rational strength, is a period. Computing this requires &lt;a href="http://en.wikipedia.org/wiki/Elliptic_function"&gt;elliptic functions&lt;/a&gt;, and these often result in periods when applied to rational numbers.&lt;br /&gt;&lt;br /&gt;Although mathematics is about numbers, few mathematical publications talk much about specific real numbers. Curiously, when they do, they often talk about numbers that are periods. For example there have been many recent papers on the values of the &lt;a href="http://en.wikipedia.org/wiki/Riemann_zeta_function"&gt;Riemann zeta function&lt;/a&gt; for interesting arguments. 
These are often periods.&lt;br /&gt;&lt;br /&gt;It has been suggested that the study of periods is actually the study of algebraic geometry in disguise. There is certainly no end to the interesting mathematics we can do using only periods.&lt;br /&gt;&lt;br /&gt;We now have:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Q &amp;sub;A &amp;sub;P &amp;sub;C &amp;sub;R&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Definables&lt;/b&gt;&lt;br /&gt;In the section on computable numbers I mentioned that we could go the other way and try to find a bigger class of real numbers that could be represented by finite strings of symbols. We could simply go "all the way" and consider those real numbers that can be defined, by any means possible, using the symbols of mathematics. We can try to pin this down a bit better. We'll work with the language of set theory. In this language we can write strings of symbols like S = "x&amp;gt;0 and x&lt;sup&gt;2&lt;/sup&gt;=2". Such a string uniquely defines a real number if when we glue the string "there exists a unique x such that" onto the beginning of it we get a true proposition. We can now represent the number &amp;radic;2 as the string S.&lt;br /&gt;&lt;br /&gt;It looks like we now have:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Q &amp;sub;A &amp;sub;P &amp;sub;C &amp;sub;D &amp;sub;R&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;with D the set of &lt;a href="http://en.wikipedia.org/wiki/Definable_real_number"&gt;definables&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Except there's a problem. The set D is not definable! The problem is this: we can talk about strings of symbols representing real numbers. To talk about these in set theory we'd encode these strings of symbols as sets so that we can apply the language of set theory. In order to use our attempted definition of definable we need some way to say when a string of symbols represents a true proposition. 
But to define a set of definables we need to talk about true propositions in the language of set theory. G&amp;ouml;del showed us how to talk about the provability of a proposition within set theory. He did this by showing that provability is about mechanical operations we can perform on strings. But there's nothing analogous for talking about the truth of propositions. In fact, Tarski showed us this is &lt;a href="http://en.wikipedia.org/wiki/Tarski's_undefinability_theorem"&gt;impossible&lt;/a&gt;. So while we can talk about all kinds of individual numbers as being definable, we can't construct the set of definable numbers.&lt;br /&gt;&lt;br /&gt;You may be interested to see an example of a definable number that isn't computable. Probably the most publicised example is &lt;a href="http://en.wikipedia.org/wiki/Chaitin's_constant"&gt;Chaitin's constant&lt;/a&gt;. It represents the probability that a randomly generated string of symbols is a computer program that terminates in a finite time. We can't actually compute this number because doing so would require us to solve the &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;halting problem&lt;/a&gt;. Nonetheless, it's perfectly well defined.&lt;br /&gt;&lt;br /&gt;You can find the original paper on periods by Zagier and Kontsevich &lt;a href="http://inc.web.ihes.fr/prepub/PREPRINTS/M01/M01-22.ps.gz"&gt;here&lt;/a&gt;. I first found out about periods from the book &lt;a href="http://www.amazon.com/gp/product/3540669132?ie=UTF8&amp;tag=sigfpe-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=3540669132"&gt;Mathematics Unlimited&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Update: Jared asked if I intended the real algebraic numbers. That's what I said in the text, but I confusingly wrote &lt;span style="text-decoration: overline"&gt;&amp;#x211a;&lt;/span&gt; for this set. But that usually means the *complex* algebraic numbers. 
So I've changed notation and now use A for the real algebraic numbers.&lt;br /&gt;&lt;br /&gt;Similarly, the periods are usually defined to be complex numbers whose real and imaginary parts are given by algebraically specified volumes. I was trying to avoid mention of the complex numbers to keep prerequisites to a minimum and the essential ideas here work without mention of the complex numbers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-6369504533228653968?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/6369504533228653968/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=6369504533228653968' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6369504533228653968'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/6369504533228653968'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/04/on-representing-some-real-numbers.html' title='On representing some real numbers exactly'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_UdKHLrHa05M/S9MTYyeZqXI/AAAAAAAAAgM/LmAl70kuvaA/s72-c/circle.png' height='72' 
width='72'/><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-2295616127977346642</id><published>2010-03-28T14:38:00.000-07:00</published><updated>2010-03-29T08:22:24.876-07:00</updated><title type='text'>A Partial Ordering of some Category Theory applied to Haskell</title><content type='html'>I've had a few requests from people wanting to teach themselves applications of Category Theory to Haskell based on posts I've made. I've made things difficult by posting stuff at random levels of difficulty and without any kind of organising thread through them. So here's an attempt to list a bunch of posts related to aspects of category theory. I've grouped them by themes and within each theme I've tried to list the articles in order of difficulty. Unfortunately there can be big gaps between one article and the next as none of this material was intended to be linked together continuously. Nonetheless, I hope this is of some help.&lt;br /&gt;&lt;br /&gt;First a warning: &lt;a href="http://blog.sigfpe.com/2006/03/category-theory-screws-you-up.html"&gt;Category Theory Screws You Up!&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Monads&lt;/b&gt;&lt;br /&gt;The first theme has to be monads of course. But don't forget: monads don't do anything. They're simply an interface to something that you must already have implemented some other way. So don't believe all that hype about how monads are what allow Haskell to use side effects and I/O.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If you just want to use &lt;tt&gt;IO&lt;/tt&gt; and don't care about monads: &lt;a href="http://blog.sigfpe.com/2007/11/io-monad-for-people-who-simply-dont.html"&gt;The IO Monad for People who Simply Don't Care&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;This gets more hits than anything else on my blog: &lt;a href="http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html"&gt;You Could Have Invented Monads! 
(And Maybe You Already Have.)&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/04/trivial-monad.html"&gt;The Trivial Monad&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/04/homeland-security-threat-level-monad.html"&gt;Homeland Security Threat Level Monad&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2006/06/monads-kleisli-arrows-comonads-and.html"&gt;Monads, Kleisli Arrows, Comonads and other Rambling Thoughts&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The idea that monads give a way to describe substitutions in a tree forms the basis for this and the next two posts: &lt;a href="http://blog.sigfpe.com/2006/11/variable-substitution-gives.html"&gt;Variable substitution gives a...&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2010/01/monads-are-trees-with-grafting.html"&gt;Monads are Trees with Grafting&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html"&gt;Where do monads come from?&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/11/from-monoids-to-monads.html"&gt;From Monoids to Monads&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/11/some-thoughts-on-reasoning-and-monads.html"&gt;Some thoughts on reasoning and monads&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/12/mother-of-all-monads.html"&gt;The Mother of all Monads&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2009/02/beyond-monads.html"&gt;Beyond Monads&lt;/a&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fold and Unfold&lt;/b&gt;&lt;br /&gt;I think this is one of the great applications of Category Theory to Computer Science. Structural recursion can be characterised really nicely in terms of F-algebras. That's cool. 
But even cooler is that when you dualise the definitions you get a great way to look at non-terminating computations on things like streams.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/07/data-and-codata.html"&gt;Data and Codata&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2006/03/coalgebras-and-automata.html"&gt;Coalgebras and Automata&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/02/purely-functional-recursive-types-in.html"&gt;Purely functional recursive types in Haskell and Python&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Commutative Monads and Vector Spaces&lt;/b&gt;&lt;br /&gt;Trying to order these is tricky. I'm not sure I define the term commutative monad until the talk.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;I realised a while back that the operation to build a vector space from a basis is a monad. In fact, like many well known algebraic structures, we get a commutative monad. &lt;a href="http://blog.sigfpe.com/2007/02/monads-for-vector-spaces-probability.html"&gt;Monads for vector spaces, probability and quantum mechanics pt. I&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/03/monads-vector-spaces-and-quantum.html"&gt;Monads, Vector Spaces and Quantum Mechanics pt. 
II&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2009/05/trace-diagrams-with-monads.html"&gt;Trace Diagrams with Monads&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;A talk this time: &lt;a href="http://vimeo.com/6590617"&gt;Commutative Monads, Diagrams and Knots&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;And what happens when you try to use a non-commutative monad when a commutative monad is expected: &lt;a href="http://blog.sigfpe.com/2006/11/why-isnt-listt-monad.html"&gt;Why isn't ListT a monad?&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Comonads&lt;/b&gt;&lt;br /&gt;I'm not sure the killer application for comonads has been found yet. But I do think they're good for things like dataflow, and cellular automata fit the comonad model very well:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-is.html"&gt;Evaluating cellular automata is comonadic&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/03/comonadic-arrays.html"&gt;Comonadic Arrays&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/02/comonads-and-reading-from-future.html"&gt;Comonads and reading from the future&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2007/01/monads-hidden-behind-every-zipper.html"&gt;The Monads Hidden Behind Every Zipper&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/06/categories-of-polynomials-and-comonadic.html"&gt;Categories of polynomials and comonadic plumbing&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Category Theory&lt;/b&gt;&lt;br /&gt;And these are generally categorical articles:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2008/05/you-could-have-defined-natural.html"&gt;You Could Have Defined Natural Transformations&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a 
href="http://blog.sigfpe.com/2006/11/yoneda-lemma.html"&gt;Reverse Engineering Machines with the Yoneda Lemma&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2006/12/yonedic-addendum.html"&gt;A Yonedic Addendum&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Applying the Yoneda lemma to memoize functions that at first seem unmemoizable: &lt;a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html"&gt;Memoizing Polymorphic Functions with High School Algebra and Quantifiers&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2010/03/products-limits-and-parametric.html"&gt;Products, Limits and Parametric Polymorphism&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;Dinatural Transformations and Coends&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The goal here was not to understand what Category Theory applies to Haskell, but how Haskell code can be interpreted in other categories: &lt;a href="http://blog.sigfpe.com/2009/10/what-category-do-haskell-types-and.html"&gt;"What Category do Haskell Types and Functions Live In?"&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;A first step towards 2-category theory: &lt;a href="http://blog.sigfpe.com/2008/05/interchange-law.html"&gt;The Interchange Law&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=1441931236" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;iframe 
src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=B0035BMK5Y" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-2295616127977346642?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/2295616127977346642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=2295616127977346642' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2295616127977346642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/2295616127977346642'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/03/partial-ordering-of-some-category.html' title='A Partial Ordering of some Category Theory applied to Haskell'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-361364758521479897</id><published>2010-03-06T09:26:00.000-08:00</published><updated>2010-03-07T08:21:55.902-08:00</updated><title type='text'>Products, Limits and Parametric Polymorphism</title><content type='html'>&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE RankNTypes #-}&lt;br /&gt;&lt;br 
/&gt;&lt;/pre&gt;&lt;br /&gt;When I wrote about &lt;a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html"&gt;memoizing polymorphic types&lt;/a&gt; I mentioned that you can think of &lt;tt&gt;forall a. F(a)&lt;/tt&gt; as the product over all types &lt;tt&gt;a&lt;/tt&gt; of &lt;tt&gt;F(a)&lt;/tt&gt;, where &lt;tt&gt;F&lt;/tt&gt; is some type level function. For example &lt;tt&gt;F&lt;/tt&gt; might be a type constructor like &lt;tt&gt;[]&lt;/tt&gt;. That's not completely accurate, as I hope to now explain. Along the way we should get some insight into the meaning of the limit of a functor in a category.&lt;br /&gt;&lt;br /&gt;Suppose we have two types, &lt;tt&gt;A&lt;/tt&gt; and &lt;tt&gt;B&lt;/tt&gt;. We can form their product &lt;tt&gt;(A, B)&lt;/tt&gt;. We have the two projections &lt;tt&gt;fst&lt;/tt&gt; and &lt;tt&gt;snd&lt;/tt&gt; and if we have an element &lt;tt&gt;x&lt;/tt&gt; in &lt;tt&gt;(A, B)&lt;/tt&gt; we know that there is no necessary relationship between &lt;tt&gt;fst x&lt;/tt&gt; and &lt;tt&gt;snd x&lt;/tt&gt;. We can freely choose &lt;tt&gt;x&lt;/tt&gt; so that each of &lt;tt&gt;fst x&lt;/tt&gt; and &lt;tt&gt;snd x&lt;/tt&gt; can take on any values we like in &lt;tt&gt;A&lt;/tt&gt; and &lt;tt&gt;B&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;But now consider an element of &lt;tt&gt;forall a. F(a)&lt;/tt&gt;. For each concrete type &lt;tt&gt;X&lt;/tt&gt; we have a projection &amp;pi;&lt;sub&gt;X&lt;/sub&gt;&lt;tt&gt;::forall a. F(a) -&amp;gt; F(X)&lt;/tt&gt;. So it looks like a product of all types. However, we can't freely choose elements of &lt;tt&gt;forall a. F(a)&lt;/tt&gt; so as to get any element of &lt;tt&gt;F(X)&lt;/tt&gt; we like for each choice of &lt;tt&gt;X&lt;/tt&gt;. To demonstrate this, consider an element &lt;tt&gt;x&lt;/tt&gt; of the type &lt;tt&gt;forall a. [a]&lt;/tt&gt;. For any choice of &lt;tt&gt;X&lt;/tt&gt; we get a projection. 
For example, picking &lt;tt&gt;X&lt;/tt&gt; to be &lt;tt&gt;Int&lt;/tt&gt; or &lt;tt&gt;String&lt;/tt&gt; gives:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; p1 :: (forall a. [a]) -&amp;gt; [Int]&lt;br /&gt;&amp;gt; p2 :: (forall a. [a]) -&amp;gt; [String]&lt;br /&gt;&amp;gt; p1 x = x&lt;br /&gt;&amp;gt; p2 x = x&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can draw a simple category diagram representing this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_UdKHLrHa05M/S5KmK477RzI/AAAAAAAAAfw/1YOZWjNQg_g/s1600-h/diag0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 94px;" src="http://3.bp.blogspot.com/_UdKHLrHa05M/S5KmK477RzI/AAAAAAAAAfw/1YOZWjNQg_g/s200/diag0.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5445597605456987954" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;However, &lt;tt&gt;forall a. [a]&lt;/tt&gt; comes with a &lt;a href="http://www-ps.iai.uni-bonn.de/cgi-bin/exfind.cgi"&gt;free theorem&lt;/a&gt;. For this particular type we have a free theorem that says that for any &lt;tt&gt;f :: X -&amp;gt; Y&lt;/tt&gt;, &lt;tt&gt;fmap f (p1 x) == p2 x&lt;/tt&gt;. For example consider the well known function &lt;tt&gt;show :: Int -&amp;gt; String&lt;/tt&gt;. 
The free theorem tells us that this diagram commutes:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/S5KmRiLGrbI/AAAAAAAAAf4/MTyvk2YRJOc/s1600-h/diag1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 107px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/S5KmRiLGrbI/AAAAAAAAAf4/MTyvk2YRJOc/s200/diag1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5445597719605718450" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So if &lt;tt&gt;p1 x == [3]&lt;/tt&gt; then &lt;tt&gt;p2 x == ["3"]&lt;/tt&gt;. We have lost free choice. But we have lost a lot more freedom than this. We have a commuting triangle like this for absolutely any function &lt;tt&gt;f :: X -&amp;gt; Y&lt;/tt&gt;. It should be clear that there is no way we can pick elements of our list to satisfy all of these constraints. So &lt;tt&gt;x&lt;/tt&gt; must be the empty list.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cones&lt;/b&gt;&lt;br /&gt;This scenario of having one projection for each type has a name. It's an example of a &lt;a href="http://en.wikipedia.org/wiki/Cone_%28category_theory%29"&gt;cone&lt;/a&gt;. Let's borrow the definition from Wikipedia:&lt;br /&gt;&lt;br /&gt;Let F:J&amp;rarr;C be a functor. Let N be an object of C. 
A cone from N to F is a family of morphisms with one morphism for each object X of J,&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&amp;pi;&lt;sub&gt;X&lt;/sub&gt;:N&amp;rarr;F(X)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;so that for every morphism f:X&amp;rarr;Y in J, the following diagram commutes:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_UdKHLrHa05M/S5KmYoTCABI/AAAAAAAAAgA/F2e81FtfV_s/s1600-h/diag2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 111px;" src="http://1.bp.blogspot.com/_UdKHLrHa05M/S5KmYoTCABI/AAAAAAAAAgA/F2e81FtfV_s/s200/diag2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5445597841508663314" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For any Haskell functor &lt;tt&gt;F&lt;/tt&gt;, the free theorem tells us that we have exactly these diagrams with &lt;tt&gt;forall a. F(a)&lt;/tt&gt; playing the role of N. So &lt;tt&gt;forall a. F(a)&lt;/tt&gt;, with its projections to F(X), forms a cone.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Limits&lt;/b&gt;&lt;br /&gt;If &lt;tt&gt;F&lt;/tt&gt; is an instance of the Haskell &lt;tt&gt;Functor&lt;/tt&gt; type class, i.e. an endofunctor on Hask, then the type &lt;tt&gt;forall a. F(a)&lt;/tt&gt; gives us a cone. But not just any old cone. I don't know how to prove this but I'm pretty sure it's true that the free theorems for a functor are the only non-trivial relations between &lt;tt&gt;F(X)&lt;/tt&gt; and &lt;tt&gt;F(Y)&lt;/tt&gt; that we're forced to obey. If that's true, then, in a sense, &lt;tt&gt;forall a. F(a)&lt;/tt&gt; is the "biggest" type satisfying the free theorems. We can make this more precise by saying that for any cone N with associated projections &amp;pi;&lt;sub&gt;X&lt;/sub&gt;, we can map it uniquely to &lt;tt&gt;forall a. 
F(a)&lt;/tt&gt; so that the following diagram commutes:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_UdKHLrHa05M/S5Kl_ox1h6I/AAAAAAAAAfo/0nQvdVykQdk/s1600-h/diag3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 182px; height: 200px;" src="http://4.bp.blogspot.com/_UdKHLrHa05M/S5Kl_ox1h6I/AAAAAAAAAfo/0nQvdVykQdk/s200/diag3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5445597412141139874" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This special kind of cone has a name. It's called a &lt;a href="http://en.wikipedia.org/wiki/Limit_%28category_theory%29"&gt;limit&lt;/a&gt;. In other words, &amp;forall;a. F(a) = lim F. In fact, this is exactly how &lt;tt&gt;Limit&lt;/tt&gt; is defined in &lt;a href="http://hackage.haskell.org/packages/archive/category-extras/latest/doc/html/Control-Functor-Limit.html"&gt;category-extras&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In a sense you can think of a limit of a functor in any category as being like a product for which a version of the free theorems for a functor holds.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Colimits&lt;/b&gt;&lt;br /&gt;A dual story can be told for existential types and colimits. But to do this we need free theorems for existential types. I'll leave that until I've figured out a nice way to derive these free theorems...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Final words&lt;/b&gt;&lt;br /&gt;I should have written this article before I wrote one on &lt;a href="http://blog.sigfpe.com/2009/03/dinatural-transformations-and-coends.html"&gt;coends&lt;/a&gt;. Think of it as a prequel.&lt;br /&gt;&lt;br /&gt;The fact that we don't have complete freedom of choice when defining polymorphic elements in Haskell is what we mean by 'parametric' polymorphism. Instead of specifying one individual value for each type we define the elements in a uniform way. 
In a language like C++ we can use template specialisation to freely construct a rule for getting a value from a type using 'ad hoc' polymorphism. That freedom comes at a price - it becomes harder to reason about polymorphism.&lt;br /&gt;&lt;br /&gt;It's amazing that many definitions in category theory emerge naturally (pun fully intended) from the free theorems. I keep hoping that one day I'll find a paper on exactly what is going on here that I understand. The original free theorems &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.9875"&gt;paper&lt;/a&gt; is very uncategorical in its language.&lt;br /&gt;&lt;br /&gt;&lt;HR&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=0387984038" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;bc1=000000&amp;IS2=1&amp;bg1=FFFFFF&amp;fc1=000000&amp;lc1=0000FF&amp;t=sigfpe-20&amp;o=1&amp;p=8&amp;l=as1&amp;m=amazon&amp;f=ifr&amp;md=10FE9736YVPPT7A0FBG2&amp;asins=0262660717" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/11295132-361364758521479897?l=blog.sigfpe.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sigfpe.com/feeds/361364758521479897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=11295132&amp;postID=361364758521479897' title='18 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/11295132/posts/default/361364758521479897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/11295132/posts/default/361364758521479897'/><link rel='alternate' type='text/html' href='http://blog.sigfpe.com/2010/03/products-limits-and-parametric.html' title='Products, Limits and Parametric Polymorphism'/><author><name>sigfpe</name><uri>http://www.blogger.com/profile/08096190433222340957</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://homepage.mac.com/sigfpe/.Pictures/Photo%20Album%20Pictures/2002-12-07%2014.53.40%20-0800/ImageDSC01397_1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_UdKHLrHa05M/S5KmK477RzI/AAAAAAAAAfw/1YOZWjNQg_g/s72-c/diag0.png' height='72' width='72'/><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-11295132.post-1814777119392201914</id><published>2010-02-06T14:32:00.000-08:00</published><updated>2010-02-07T10:24:47.969-08:00</updated><title type='text'>The Categorification of the Naturals</title><content type='html'>A heavyweight looking title, but this post is really about nothing more than doing arithmetic.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Peano Arithmetic&lt;/b&gt;&lt;br /&gt;I've seen &lt;a href="http://www.haskell.org/haskellwiki/Type_arithmetic"&gt;many&lt;/a&gt; &lt;a href="http://okmij.org/ftp/Computation/type-arithmetics.html"&gt;articles&lt;/a&gt; on &lt;a href="http://www.sigfpe.com/Computing/peano.html"&gt;type&lt;/a&gt; level arithmetic. They all seem to share the idea that the Haskell type system can be made to perform computations by treating types as symbols that can be manipulated according to rules. 
But every article I have seen seems to miss the important idea that the naturals don't have to simply be empty symbols - that they are perfectly good types with elements and that the basic operations of arithmetic have a nice interpretation as functions between types. Implementing these missing pieces will also give an example of &lt;a href="http://en.wikipedia.org/wiki/Categorification"&gt;categorification&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As usual, some Haskell administration first because this post is runnable Haskell code:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; {-# LANGUAGE ScopedTypeVariables, UndecidableInstances #-}&lt;br /&gt;&amp;gt; {-# OPTIONS -fglasgow-exts #-}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here are what are commonly called (some of) the &lt;a href="http://en.wikipedia.org/wiki/Peano_axioms"&gt;Peano axioms&lt;/a&gt; defining addition and multiplication:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;1. 0+b = b&lt;br /&gt;2. Sa+b = S(a+b)&lt;br /&gt;3. 0.b = 0&lt;br /&gt;4. Sa.b = b+a.b&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;The idea is that S represents the "successor" function mapping n to n+1. Using just these definitions, and induction, we can define addition and multiplication for all natural numbers. For example, 3 is represented by SSS0 and 2 by SS0, and we can compute 2+3 using&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;2+3&lt;br /&gt;= SS0+SSS0 by definition&lt;br /&gt;= S(S0+SSS0) by 2&lt;br /&gt;= S(S(0+SSS0)) by 2&lt;br /&gt;= SSSSS0 by 1&lt;br /&gt;= 5 by definition&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;But where do addition and multiplication come from? One point of view is that the natural numbers are what we get when we take finite sets but consider sets of the same size to be equal. We can do the same with finite types. The types &lt;tt&gt;Bool&lt;/tt&gt; and &lt;tt&gt;Maybe ()&lt;/tt&gt; both have two elements (ignoring bottoms) and are isomorphic. 
We can just consider these to be the same type, called 2. Given two types &lt;tt&gt;A&lt;/tt&gt; and &lt;tt&gt;B&lt;/tt&gt; we can form &lt;tt&gt;Either A B&lt;/tt&gt;. The number of elements in this new type is the sum of the number of elements in &lt;tt&gt;A&lt;/tt&gt; and &lt;tt&gt;B&lt;/tt&gt;. If we blur the distinction between isomorphic types we can th
