A Neighborhood of Infinity: July 2005

Friday, July 29, 2005

The Sleeping Beauty Problem

Rather than explain it I'll point you here. It looks a bit like the Monty Hall problem at first. But the Monty Hall problem is simple once you use the right method - and almost everyone agrees when it's pointed out to them. But philosophers can't actually agree on the solution to the Monty Hall problem.

There isn't really any mathematical content. But it does say something interesting about the interpretation of mathematics. In particular, what exactly do we mean when we say the probability of an event X is p? Turns out that the meaning becomes quite subtle when you know your memory is unreliable. Anyway, follow the links to find out more.

Are you a halfer or a thirder?

Thursday, July 28, 2005

Automatic Differentiation

The most important equation in mathematics is x²=0
Sir Michael Atiyah

Automatic Differentiation (also known as AD) is a great way to numerically compute derivatives of functions which is 'exact' (up to machine precision) and yet is easy to implement and can run fast. In particular it's useful for computing derivatives of functions that are implemented as many lines of code. I've seen people do crazy things like try to write their hundred lines of code as one big expression and then attempt to differentiate it symbolically in Mathematica. But in general, symbolic differentiation produces expressions that are extremely inefficient.

Here's one approach to automatic differentiation:

In a programming language which supports classes we can easily implement the complex numbers. We write complex numbers as a+ib which is typically represented in memory by a pair of machine precision floats a and b. We then implement rules for things like complex multiplication by translating rules like

(a+a'i)*(b+b'i) = ab-a'b'+(ab'+a'b)i

into code like

c0 = a0*b0-a1*b1
c1 = a0*b1+a1*b0

In a language like C++ we can then overload operators like * with an implementation that acts on our complex numbers to give a transparent way to write functions that operate on complex numbers.

Now, suppose that instead of introducing a complex number i such that i²=-1 we instead introduce d such that d²=0. Numbers of the form a+bd are known as dual numbers.
We now get rules like

(a+a'd)*(b+b'd) = ab+(ab'+a'b)d

that translates into code like

c0 = a0*b0
c1 = a0*b1+a1*b0

Note the important thing: if we implement a function f on real numbers, and then use operator overloading to allow f to act on dual numbers just as we might allow it to act on complex numbers then we automatically compute the derivative. In particular, we find that for real x

f(x+d) = f(x)+f'(x)d

When we build functions by composing our overloaded operators the d²=0 rule automagically implements the chain and product rules for us.

Once we have such classes it is essentially trivial to compute derivatives using them. This is much easier than almost any of the other methods available, and is much more accurate and reliable than finite differencing methods. For more details see the web site or even my own paper (recently published in the Journal of Graphics Tools) giving an application.

I've only mentioned first derivatives above. The method generalises to nth derivatives straightforwardly and also generalises to derivatives with respect to multiple variables. The catch is that if you have a function, f, of n variables, and your code takes time T to compute f, then computing the derivative with respect to the n variables takes time proportional to nT. But there's also another trick you can do to get this back to a time proportional to T and independent of n. This is called adjoint or reverse automatic differentiation. There is more on this at the autodiff web site.

I learnt this stuff originally from a paper by Max Jerrell.

Anyway...you now have no excuse for wasting your time with symbolic differentiation. If it takes more than one sheet of paper to differentiate your expressions you should be using AD. It's provably no harder to implement than complex numbers and yet you wouldn't hesitate to use complex numbers to evaluate expressions would you? It's not exactly rocket science. (OK, I'm lying, rocket scientists use this stuff all the time to differentiate entire fluid dynamics simulations.)

BTW Dual numbers also crop up in algebraic geometry as they give a nice algebraic way to define things like tangent spaces. The tensor product of the quaternions with the dual numbers has some nice applications to Newtonian dynamics. I've also seen fluid dynamics papers where the authors use complex numbers as an approximation to the dual numbers. In fact, there are quite a few people in aerodynamics doing this - it's called the complex step method. It's a very stupid thing to do seeing as the dual numbers give better results and are easier to implement than complex numbers. But I guess if you're writing in an ancient dialect of FORTRAN and it gives you the COMPLEX type for free then maybe it's the easiest option.

Tuesday, July 26, 2005

Exact Markov Chain Monte Carlo

Markov Chain Monte Carlo methods are well known as a way to approximately generate random variables from a probability density function even when its overall normalisation constant is unknown. One of my favourite applications is the Metropolis Light Transport algorithm used in graphics. But it also has applications ranging from nuclear physics to finance. Unfortunately it has some drawbacks. The underlying theorems tell you that in the limit if you carry out enough trials you are drawing from the correct distribution, but it doesn't tell you how many trials you need in practice and neither does it give a good way to tell if you're getting close.

On the other hand, the Propp-Wilson algorithm is a variation that can converge to the correct distribution in a finite number of steps and, more importantly, it tells you when it has converged. It's quite curious the way it uses a notion of coupling from minus infinity which is quite distinct from convergence at infinity.

A nice bit of research you can do with this algorithm is play with Aztec Diamonds, coverings of the diamond with 2 by 1 dominoes. Notice how towards the corners you have little or no choice in how you place your dominoes but as you move away from the corners you gain more freedom. The Propp-Wilson algorithm allows you to sample uniformly from the space of all coverings and you can clearly see that the patterns are typically 'frozen' outside of a circle known as the arctic circle.

The Propp-Wilson algorithm can also be used to sample from the Ising Model exactly.

One problem with the Propp-Wilson algorithm is that it takes time to converge. It's tempting for a researcher, who's getting bored for waiting for convergence, to restart the program hoping to get a faster converging sequence. But this introduces bias because the distribution of samples at the end of short chains is different from the distribution of samples from unbounded chains. Fill's algorithm deals with that. I've never before heard of an algorithm that factors in user impatience!

I learnt about this stuff from Olle Haggstrom's book Finite Markov Chains and Algorithmic Applications.

Monday, July 25, 2005

Calculators are Bad

I've just been reading this paper on caclulators.

Calculators are terrible. I own an HP48 and a TI83 and despite their incredible power I find them both horrible to use. The simplest of operations can take many keystrokes and despite using both of them for years I never reached a stage when I felt I was an experienced user. I always spend my time hunting for the next key to press. They both have a plethora of operating modes and many functions assigned to every key. I was so annoyed by all this recently that I went out and bought myself a simple $12.99 Sharp calculator from Walgreens. While it's not a great calculator, if you want to do simple things it's much easier to use than the HP48 and TI83.

On the other hand, I miss RPN and 3 days ago I was reunited with my old HP32SII after 8 years. If you look on places like Ebay and Amazon you'll see that these machines are often sold used for over $100 despite being quite 'outdated' now. For most things this calculator is a joy to use. Even though I used it very little, and not having at all for the last 8 years, I was instantly able to dive into using much of the functionality very quickly. I still have a few complaints about it: with the 32SII HP had started playing with 'algebraic notation' side by side with RPN. This gives the calculator a bit of an inconsistent schizophrenic feel, especially when dealing with expressions ie. solving equations or computing integrals. It also has only a 4 level stack, a major shortcoming, especially when complex numbers take two stack entries each.

So I'm still waiting for the ultimate calculator to arrive. I've been tempted to build my own using a microcontroller, an FPGA and an LCD display - but I've been unable to find a cheap way to make my own keyboard.

Actually, what I'd really like is a pocket machine running Haskell...

UPDATE: I just got a HP42S for nothing. The size of a normal calculator, fairly uncluttered, yet with full support for graphics, complex matrices, equation solving and even sound.

Friday, July 22, 2005

Black Hole Information Loss

Hot off the press is Hawking's preprint on Black Hole Information Loss. Doesn't seem to contain much detail though.

Anyway, I'm glad Hawking came to his senses eventually. I've always been firmly in the no information loss camp. It seems plainly obvious that if you have information loss then you aren't doing quantum mechanics because quantum mechanics is about unitary linear systems. But what do I know?

Tuesday, July 19, 2005

Mathematics Sighting

I was watching an early episode of the British comedy series Cold Feet recently. In one episode one of the characters has an urgent need for a condom. He runs into the local pub and into the men's. For about a couple of seconds we see that one wall of the toilets is completely covered in mathematics. I didn't have time to register exactly what it was though the prevalence of psis made me think it was something involving the Schrodinger equation. Unfortunately I didn't get around to rewinding to check what it was. This is a pretty weird place to find a bunch of mathematics in a sitcom. Did anyone else spot this? Anyone know of any weirder sightings of mathematics on TV? Anyone out there at all?

Monday, July 18, 2005

The Quantum Computer Condition

This looks like just what I want to read.

I'm a sceptic when it comes to quantum computers. Given a real quantum system, I expect it to diverge from an idealised mathematical description of it exponentially fast. The rate of divergence comes from the difference between the real and idealised Hamiltonians. This is generally pretty damn fast. There are lots of error correction schemes out there but they always assume the Hamiltonian has a special ideal form . Real Hamiltonians don't look like this so I expect the error correction merely to retard the exponential divergence. So I have grave doubts about whether quantum computers can do the kinds of things that many people would like them to do. I've a feeling quantum computers will eventally find a use, but it won't be anything like current researchers are doing. I don't see a rosy future for binary qubit machines outside of implementing (and not breaking) cryptography.

The above paper looks interesting as it apparently gives precise conditions for a quantum computer to function effectively in the presence of decoherence. That will help me to sharpen up (or even demolish) my arguments.

Sunday, July 17, 2005

An Unfortunate Legacy

The Bell Numbers are named after Eric Temple Bell. Unfortunately, because of the Wikipedia and Mathworld articles on Umbral Calculus, as well as the introduction to Steven Roman's book on the subject, Bell will now go down in history as the person who failed to make the subject rigorous. Poor guy. Mathematical programs that don't quite work out are generally forgotten, so I don't know what Bell did to deserve this.

But Bell is doubly unfortunate. He is the author of a book, that despite being an inspiration to many, is notorious for its unreliability. He is the same Bell that wrote Men of Mathematics.

BTW He also wrote science fiction under the pseudonym "John Taine".

Friday, July 15, 2005

Formal Power Series and Haskell

Haskell is flavour of the month for me right now. Lots of mathematical algorithms that can be quite messy in an imperative language turn out to be nearly trivial in a lazy functional language.

I thought I'd have a go at implementing formal power series in Haskell. There are some notoriously messy problems to deal with especially when implementing things like power series 'reversion', ie. given f finding the power series g s.t. f(g) = x. In other words, finding the power series of the inverse of a function defined by a power series. But in Haskell it's easy to give a recursive definitions so that like the mythical orobouros the formal series feed themselves by eating their own tails. (Well, in this case they eat their own heads to feed their tails...)

Consider defining formal power series (FPS) addition in Haskell. It's just:

a ^+ b = zipWith (+) a b

Before multiplying series let's define:

(*!) _ 0 = 0
(*!) a b = a*b

(!*) 0 _ = 0
(!*) a b = a*b

They are just 'short cut' versions of multiplication.

That's trivial enough. Now onto multiplication. Basically we use

(a₀+a₁z+a₂z²+...)(b₀+b₁z+b₂z²+...)
=a₀b₀+ a₀(b₁z+b₂z²+...)+(a₁z+a₂z²+...)(b₀+b₁z+b₂z²+...)

But the last term is just another convolution so we can write:

~(a:as) `convolve` (b:bs) = (a *! b):
((map (a !*) bs) ^+ (as `convolve` (b:bs)))

Almost as trivial.

The ~ marks an irrefutable pattern which we need later because we want to hold back our orobouros from being too eager in a later function definition.

Composition of functions is easy too.

(a₀+a₁z+a₂z²+...)( (b₀+b₁z+b₂z²+...) )
=a₀+a₁(b₀+b₁z+b₂z²+...)+a₂(b₀+b₁z+b₂z²+...)²+...
Note that we need to assume b₀=0 to avoid having to sum an infinite series.

We can write this simply as the one liner:

compose (f:fs) (0:gs) = f:(gs `convolve` (compose fs (0:gs)))

Now comes the inverse. We have the a_i and want to solve for the b_i:
z = a₀+a₁(b₀+b₁z+b₂z²+...)+a₂(b₀+b₁z+b₂z²+...)²+...
b₀+b₁z+b₂z²+... = (z-a₀-a₂(b₀+b₁z+b₂z²+...)²-a₃(...)³+...)/a₁

This looks useless at first. But note that if we know b₀...b_n we can compute the right hand side up to b_n+1 (assuming a₀=0 and a₁!=0). So if we give Haskell a little bootstrap by telling it the first couple of terms the above circular definition gives an immediate algorithm for reversion. Here's the code:

inverse (0:f:fs) = x where x = map (recip f *) (0:1:g)
_:_:g = map negate (compose (0:0:fs) x)

That's all there is to it!

In general, given any formula for x of the form x=f(x) we can use it to solve for x simply by defining x=f(x), as long as evaluating f(x) is one step ahead of x, ie. it can compute the (n+1)-th term using only the first n terms. And this doesn't just apply to algebraic formulae. We can use any function we like in the definition of f, including integrals and derivatives. We can even use tricks like the Newton-Raphson method. Except that we don't actually need to iterate more than once.

For example d/dx exp(f(x)) = exp(f(x))f'(x). So we can define exponentiation of power series via

exp(f(x)) = integral (exp(f(x))f'(x))

This translates into one line of Haskell. (Plus the one liner that defines 'integrate'). We can use other integro-differential equation formulae to give series for a wide variety of functions including hypergeometric functions.

Anyway, I now have my formal power series class defined to the point where it's an instance of class Floating. It's easy to verify things like cos²x+sin²x=1 or that the Schwarzian derivative of the ratio of solutions to f''(x)+u(x)f(x)=0 is 2f(x) for formal power series.

I have writen a small combinatorial domain specific language that uses FPSes as generating functions to count numbers of combinatorial species. I also plan to do some stuff with Sheffer sequences too.

I'll put the code on my web site some time soon.

And this is the last time I write about something that requires so many subscripts and superscripts in HTML. So no entries on supersymmetry.

Monday, July 11, 2005

Rithmomachia

Today, Rithmomachia is a little known game, but for 5 centuries it was a popular game across Europe even rivalling chess. It's similar to chess and draughts in that players take turns to move their own pieces on a grid of squares attempting to capture the opponents pieces. In Rithmomachia the board is 8x16 squares and each piece has a number on it. There have been many sets of rules published over the years but they all have one thing in common: they involve mathematical relationships between these numbers. For example in some variants you can win by arranging numbered pieces in an arithmetic, geometric, or harmonic progression. Rithmomachia is a challenging game to play and it seems astonishing that so many people were able to play. But this type of arithmetic formed an important part of the Quadrivium and so would have been part of the knowledge of any well educated person in medieval times.

Anyway, there's no point me writing about it as there's plenty of stuff out there already. Check out what Wikipedia has to say on the subject. There's also an introduction on Kuro5hin. (I contributed to both of those articles.) You can even play a modern version called Ambush on your PC.

Saturday, July 09, 2005

Who Invented That Symbol?

Jeff Miller maintains a list of the earliest known uses of various mathematical symbols here. He also has a list of the earliest known uses of various mathematical words. Sometimes you can forget that symbols like < and !, when used as mathematical notation, haven't been around for eternity but were in fact invented.

Thursday, July 07, 2005

Heaviside Operators and Umbral Calculus

There are a bunch of mathematical 'tricks' I've used to solve real problems since before I was half the age I am now. Only recently have I actually found out who was responsible for my two favourite.

The first trick was manipulating the operator D=d/dx as if it were an ordinary variable. Along with the rule f(D)exp(ax)=exp(ax)f(D+a) you can solve a wide variety of differential equations trivially. For example to find a solution to

d^2y/dx^2-6dy/dx+6y=x^2 exp(x)

write

(D-2)(D-3)y=x^2

y = 1/((D-2)(D-3)) exp(x) x^2

= exp(x) 1/((D-1)(D-2)) x^2

= exp(x) (1/2+3D/4+7D^2/8+3D/4+...) x^2 (Taylor series in D)

= (x^2/2+3x/2+7/4) exp(x)

(I haven't checked that BTW...) For complex problems this was a lot easier than what I was originally taught, which was to guess the form of the result with some unknown constants and then solve for the constants.

I've rarely seen this presented as a method for solving differential equations in a textbook. I actually learnt it from a schoolteacher. In my day we did differential equations at school. (That's high school for our American brethren.)

Anyway, I only found out in the last couple of years that this idea was due to Heaviside who in fact used 'p' instead of 'D'. There was a blog entry about it over at ChapterZero recently. It was pretty controversial stuff in its day but the work was eventually placed on a rigorous foundation.

The other trick I used quite a bit was a method for solving recurrence equations. Write x^[r] to mean x(x-1)(x-2)...(x-r+1). Define the linear operator d by df(x)=f(x+1)-f(x). Then d x^[r]=rx^[r-1]. Notice the formal similarity to D x^r = rx^(r-1). What this gives is is a vague 'metatheorem' which says that many theorems about the calculus of polynomials in x can be translated directly into theorems about difference equations by replacing x^r with x^[r] and D with d. This meant I could use many of Heaviside's tricks on difference equations.

Anyway, this trick is an example of umbral calculus, a term invented by Sylvester. I had a vague idea of this before as I found some obscure articles about it a couple of years ago. But I just found an entry on Wikipedia that explains what it's all about. Turns out there are many families of polynomials, P_n(x), that have the property that P_n(x) acts a lot like x^n. The word 'umbral' comes from the fact that the 'n' is a kind of fake shadowy exponent that isn't actually an exponent. The 'metatheorem' I mentioned above can be used for many other series of polynomials too, and that's what umbral calculus is all about. Again, this stuff was controversial in its day but was eventually made rigorous by Gian-Carlo Rota.

I find it curious that both the methods I was using were only put on a rigorous footing relatively recently. Maybe that explains why they're not in textbooks - they still haven't shaken off the stigma of being 'tricks' despite being methods that are easy to use and have plenty of applications.

BTW You can combine the above tricks by noting that d=exp(D)-1. Note that the 'ratio' of the two operators I've been discussing is D/(exp(D)-1), the generator function of the Bernoulli numbers. Thinking about this is how I was eventually led to the paper I'm trying to write up on Brion's theorem.

Wednesday, July 06, 2005

Quantum Systems are Big

Just thinking about the blog entry about water over at ars mathematica.

One of the things that is underappreciated about quantum systems is how big they are. Consider a classical system made of two subsystems A and B. Then the space of configurations of the union of the systems is AxB. Typically this is a finite dimensional manifold. To describe the state of the combined system we just need to describe the A susbsystem and then the B subsystem. In this sense classical mechanics is additive.

In quantum mechanics we look at much larger spaces. Typically we're looking at wavefunctions, complex valued functions on the classical configuration space. When we combine two systems the state space is the tensor product of the state spaces of the individual systems. So in a sense combining quantum mechanical systems is multiplicative. When you start combining quantum systems they get very large.

Anyway, to get an idea of what I'm talking about see my comment. With what seems like a really simple system we're way beyond what is reasonable to simulate on a computer. It's not surprising then that it's hard to predict the behaviour of water. On the contrary, we should be happy that we can make any kinds of predictions at all about the bulk properties of matter.

Now, many years ago, I used to work in a computational chemistry group. People were simulating molecules all the time. Not just little ones - big ones with hundreds or even thousands of atoms. They were trying to understand biology at a molecular level in order to design drugs. But given that water is so hard to understand this seems like an insurmountable task. They either used simplified quantum models (eg. single electron models) or empirical classical models based on masses and springs. Typically what happened was post hoc fitting of data. The empirical models had many parameters that could be tweaked. The user would run sims with many different parameters, eventually see the real data (eg. from X-ray diffraction techniques or NMR) and then choose the model that best fit the experimental results claiming it explained the observed behaviour. It had next to zero predictive value. Occasionally my colleagues would have moments of lucidity and realise that they might as well predict molecular behaviour using yarrow stalks and the I Ching - and then a short while later they'd go back to work tweaking the parameters.

And you thought you were helping to cure cancer when you donated your cycles to folding@home!

Incidentally, the largeness of quantum systems is why quantum computers are potentially so powerful. The states of a 5 qubit computer form a 5D vector space. The states of a 10 qubit system form a 10D vector space. The latter is vastly larger than the former and has much more complex time evolution. The potential increase in power is much more than what you get from upgrading a 5 bit classical computer to a 10 bit one. (When I say 10 bits, I don't mean a computer with a 10 bit address bus, I mean 10 bits total memory.) But, alas, I suspect, for the same reason, that an (N+1)-qubit computer is much harder to build than an N-qubit computer, and so the cost of making a quantum computer of a given power will be much the same as the cost of making a classical computer of equivalent power. (I'll make an exception for specialised types of problem that you can't even begin to tackle classcially such as certain types of encryption.)

Saturday, July 02, 2005

Oh! To live forever!

I buy books at a constant rate, faster than I can read them. This has the unfortunate consequence that no matter how long I live I can't possibly read all of the books I buy. In fact, the longer I live, the more unread books I will have upon my death. But if I live infinitely long I can read them all.

If I live forever then the set of books I buy will be infinite. So you might think there is a 'race' going on. I'll buy infinitely many books, and read infinitely many, so depending on the relative rates I might buy more than I read or vice versa. But suppose it takes time t_i to read book i. Then I'll have read book i by time t_1+...+t_i. So every book is eventually read. Even if every book contains a bibliography with 100 new books, and I add those books to my to-buy list each time I finish a book, I'll still read them all. Even if I read books slower and slower so book i+1 takes twice as long to read as book i, I'll still get to finish all of my books.

Now let's kick it up a notch. Each book contains an countably infinite bibliography and our goal is to ensure that for every book we read, we eventually also read the books in its bibliography. Here it gets a little harder. If we just take the entire bibliography from the first book and stick it in our to-read list we'll never read any of the books in the bibliographies of those books, we'll always be stuck on the first bibliography. What strategy can we use to achieve our goal? (It can be achieved.)

Anyway, this was derived from Smullyan's arguments about Hintikka sets in his First Order Logic.

Friday, July 01, 2005

Two Papers

I'm trying to get the time to write two papers at the moment.

The first is an application of a result about convex polygons, that first came out of the theory of Toric Varieties, for an image processing application. I actually submitted to a conference but it wasn't quite the right venue so I'll do what they recommended in the rejection and resubmit to another journal related to the conference. Need to add another example though.

The second is on automatic differentiation (AD). Here's the idea behind AD. Given a bit of code like
y = cos(x) z = x+y
we can rewrite them as
dx = 1 y = cos(x) dy = -sin(x)*dx z = x+y dz = dx+dy
In order to compute the derivatives w.r.t x.

But that's an ugly way to work. The code to compute the original function is interleaved with the code to compute derivatives. You have to maintain both sets of code simultaneously, you have to do the differentiation yourself, and it violates the principle of separating concerns. But a nice solution that I first learnt from a paper by Max Jerrell is to replace the float or double datatype with a new type that tracks derivatives for you. The next effect is that the original code
y = cos(x) z = x+y
is left intact but you override +, cos and so on so that they automatically implement the chain rule for you. That way the differentiation code is neatly encapsulated. Here's a link to a paper I've already written on the subject. This approach is called forward AD. Note that this method is exact (unlike finite differences) but not symbolic algebra (which is usually a really inefficient way of computing derivatives).

Note how the code above computes derivatives in the same order as the original code. It turns out that there is a more efficient method for computing derivatives (at least in many cases) that works by reversing the order of the original code and then applying the chain rule. In effect it's like computing the matrix/vector product ABCx as A(B(Cx)) instead of as ((AB)C)x. The former is usually much faster. A solution to this is to again override +, cos and so on. But this time, instead of immediately applying the chain rule, a tree representing the computation is built. Only at the end are the numbers crunched, this time sweeping through the expression tree in reverse order. This usually requires quite different code to the method in the previous paragraph and is called reverse or adjoint AD.

But I have an approach to unifying these two algorithms. Again there is a separation of concerns issue: the second AD algorithm performs two functions - reversing the order of the computation and then computing the derivative. I've figured out a way to override the operators * and + in such a way that the reverse AD code is simply the forward AD code with the underlying float type replaced with yet another type. This simplifies AD libraries considerably, makes the code easier to maintain, and just seems like a nice way to do things.

So now you know what I'm up to.

A Neighborhood of Infinity

Search This Blog