Thursday, September 22, 2005

Are Men Innately Better than Women at Mathematics?

No, argues Elizabeth Spelke. In the linked paper (which isn't yet peer reviewed) she looks at a great wealth of published data and comes to the conclusion that the differences can all be explained by socialisation. There's also an old debate between Spelke and Pinker over at Edge.


Wednesday, September 21, 2005

Tetris and matrices over semirings

Here's a position in Tetris:


We can describe this as a list of heights (1,1,2,1,0,1,1,2,0,1).
Here's a Tetris piece:


and here's what you get when you drop one on the other


We get a new list of heights (1,3,3,3,0,1,1,2,0,1). This has failed to capture the 'overhang' in the 4th column but lets just think about Tetris pieces dropped from the top so overhangs make no difference. What is the relationship between the old and new list of heights?

It turns out that you can get one from the other from a single matrix 'multiplication' - the catch is that we have to work over the semiring (R union infinity,min,+) instead of the more usual semiring (well, actually field) (R,+,*).

Let hn be the initial height in the nth column and hn' be the height after dropping the new piece. Think about how low the new piece made of Ys can fall. If the left most column of the Ys is in contact with the X's then it's lowest point ends up at height h2+1. If he middle column is in contact then the lowest point ends up at h3. And if the right column is in contact the lowest point ends up at height h4. So the lowest point ends up at height min(h2+1,h3,h4). So we can now write down expressions for the new height: h2' = min(h2+1,h3,h4)+1, h3' = min(h2+1,h3,h4)+1, h4' = min(h2+1,h3,h4)+2. We can rewrite this as:

h2' = min(h2+2,h3+1,h4+1)

h3' = min(h2+2,h3+1,h4+1)

h4' = min(h2+3,h3+2,h4+2)

Now let's switch to the semiring (R union infinity,min,+) (with 'additive' identity OO = infinity). Use the notation (+) = min and (*) = +. Then we can write:

h2' = 2(*)h2 (+) 1(*)h3 (+) (1)*(h4)

h3' = 2(*)h2 (+) 1(*)h3 (+) (1)*(h4)

h4' = 3(*)h2 (+) 2(*)h3 (+) (2)*(h4)

We can now write this in matrix form h' = Mh where h is

( 0 OO OO OO ...)
( OO 2 1 1 ...)
( OO 2 1 1 ...)
( OO 3 2 2 ...)
( ... ... ...)
( ... 0 )

So corresponding to each Tetris piece in a particular column we have a matrix. To find the effect of dropping multiple pieces we just multiply the matrices in the semiring in turn.

Turns out this has all types of applications - for example resource allocation in a queing system. I'll leave you to figure out how falling testris pieces might represent jobs submitted to a queing system and how the columns might represent types of resource.

Here's an easy exercise: If M is the matrix of a piece, what piece is the transpose MT the matrix of?

I learnt this from this paper by Gaubert and Max Plus. (I think Max Plus is a relative of Bourbaki.)

Many algorithms are 'linear' in this semiring. Another example that seems quite different is the dynamic programming algorithm for approximate DNA string matching. I tried to work out what the transpose matrix did (ie. what the adjoint algorithm was) and then use the code reversal method I mentioned earlier to see what new algorithm I obtained was. It was another approximate string matcher but this time it worked through the strings in the opposite direction. (Note how each operation in this description is essentially of the form a = max(some terms added to constants).)


Tuesday, September 20, 2005

The 100 Greatest Theorems

Here is the list, presented at a conference in 1999. I think that like IMDB we also need a list of the 100 worst theorems. Any suggestions for candidates?


Monday, September 19, 2005

A Certain 24th Root

This paper by Heninger, Rains and Sloane (that's Sloane as in the famous Encyclopedia) is interesting. It's about the conditions required for a formal power series with integer coefficients, to be the nth power of another such series. In particular it focuses on theta functions for lattices.

The theta function of a lattice is simply the sum of xu.u for all u in the lattice. Clearly it is a formal power series with integer coefficients if the lattice is such that u.u is always an integer. These are often pretty amazing functions, for example for certain types of lattice they are modular forms. It turns out that the theta function for the Leech lattice is a power of 24. More amazingly: this is a new result!

It's clear that the theta function is the generating function for the number of lattice elements of a given length. So if it's a 24th power you'd expect this to have a combinatorial explanation too. Unfortunately the proof isn't combinatorial. So even though the 24th root appears to be counting something, nobody knows what it is that is being counted!

Anyway, Sloane and friends had fun trawling through their sequence database trying to find other sequences that were powers. Another interesting one they found was the exponential generating function for the number of partitions of {1,...,n} into an odd number of ordered subsets. It turns out this is a perfect square. This is begging for a combinatorial explanation but it hasn't yet been found. I'll have to see if I can reproduce the square root of this series with my (unfinished) Haskell code for formal power series.

And of course this paper is interesting to me because of the prominent role of the number 24 :-)


Monday, September 12, 2005

Of Partitions and Primes

I must have been hiding under a rock or something. I've only just found out about a big mathematical result about the partition function, p(n). p(n) is the number of ways of writing n as distinct (unordered) sums of integers. p(4) is 5 as 4 = 4+1 = 2+2 = 2+1+1 = 1+1+1+1.

First a digression: There's a bizarre formula for p(n) due to Rademacher and you can see it here. Note the n-1/24 subexpression. You might wonder where that 1/24 comes comes from. Actually it has a neat physical interpretation. In bosonic string theory you have a string which can oscillate with different harmonics. Given a string with a certain amount of energy there are different ways the energy could be shared out between the different harmonics. In quantum mechanics the amount of energy that can go into each harmonic is 'quantised' so that it is proportional to an integer. In addition the energy of a single quantum of energy in the nth harmomic is proportional to n. What this means is that if you have m1 quanta in the lowest harmonic, m2 in the second and so on then the total energy in the system, in suitable units, is E=m1+2*m2+3*m3+... In other words the number of different states with energy E is just the partition function p(E). In practice it's a little more complex because strings oscillate in a high dimensional background space so in fact there are harmonics in different directions. But essentially it's a similar counting problem. What I've actually described is a type of conformal field theory. Bosonic string theory (which is a 26D theory) is basically 24 copies of this system.

The interesting twist with quantum string theory is that when you consider the minimum energy you can put into each harmonic it isn't zero. You can argue this from the Hesenberg uncertainty principle, it can't be at rest because then you'd know its position and momentum simultaneously. So it must have non-zero energy. You can treat each harmonic independently so each harmonic has some energy. In fact they each have a minimum of half a quantum. (This is standard stuff.) So the correct formula is really E=(m1+1/2)+2*(m2+1/2)+3*(m3+1/2)+... The catch is that when you try to add up you get a divergent series. But this is no problem for physicists as you can sum the series using the methods outlined here. In fact, the 'zero-point' energy is essentially 1/2(1+2+3+...) so zeta regularisation gives -1/24. So the partition function of n is actually a function of the energy of a conformal field theory in its nth excited state. As I've obsessed about before, this is the same 24 that is the dimensionality of the Leech lattice, Golay code and lots of other good stuff. It's all connected mysteriously! Somehow this information about the quantum theory of strings is already encoded in the properties of the partition function discovered 70 years ago even though the partition function is defined by the elementary properties of numbers. This is also why I think String Theory is cool even if it might be useless as a physical theory. Even if we wanted to give up the string theory 'research programme' today it'd still be there hidden away inside the mathematics we study, regardless of whether we think that mathematics has physical applications.

I'll stop there as I don't even pretend to understand the connections. But do check out the proof of Rademacher's formula via Farey sequences and Ford circles. It's in a few analytic number theory books (eg. Apostol's Springer text on the subject) and it's one of the more beautiful proofs out there.

Anyway, that's all irrelevant. What's relevant is that the partition function has neat congruence properties. Ramanujan proved p(5m+4) is a multiple of 5 for all natural numbers m. He also showed p(7m+5) is a multiple of 7 and p(11m+6) is a multiple of 11. Many years ago Dyson explained the 5 and 7 cases using what was called the "crank conjecture" and then later this was extended to 11. But in the last couple of months Mahlburg has proved that the crank conjecture applies for all primes giving a way to produce such congruences for any prime. This is a pretty major result. People have been investigating these congruences at least since Ramanujan's day but now we have a handle on all of them simultaneously. More detail is posted here and this brings to the end a long chapter in number theory. Note, by the way, how often the number 24 appears through out this paper. :-)

Update: I've edited this so it makes slightly more sense.



I keep starting papers and not finishing them. Both reading and writing, but this time I mean writing. But my latest is short enough that the first draft must be practically the final draft so once my colleagues have suggested improvements I'll see if I can submit it to a journal before the end of the month. Anyway, the inspiration behind this paper was the realisation that you can transform code that computes linear functions into its adjoint by writing the code 'backwards' and taking the adjoint of each line. I could explain what I mean but I found a web site that explains it all. I found an application to digital filtering where I was able to transform a useful algorithm into another by essentially writing it backwards and reversing all of the loops. But the general method of computing the adjoint of code seems so useful I assume it's commonplace, but I can't find a single paper on transforming code in this way, just the web site I mentioned above. It's closely connected to reverse mode automatic differentiation but more general as it applies to any linear operator, not just the action of the derivative on the tangent space.

OK, I will explain. Consider the line of code, in C++, a += b. We can write this in matrix form as

(a) = (1 1) (a)
(b) (0 1) (b)

If we take the transpose of the matrix we get

(a) = (1 0) (a)
(b) (1 1) (b)

which is the same as the code b += a. More generaly, any block of code that performs linear updates to variables can be converted to its adjoint by writing the steps backwards and transforming each line of code like this. A more complex example (exercise) is a = 2*a+3*b-4*c; b = 2*a which maps to a += 2*b; b = 0; b += 3*a; c -= 4*a; a *= 2. If your code has loops, eg. iterates over arrays of variables, you just run the loops backwards. The application for me is that there is a published fast algorithm for some linear operator but I need the adjoint. Surprisingly the adjoint of the fast algorithm can actually turn out to be a whole new fast algorithm.

Incidentally, I asked the journal editor for the expected turnaround time for the decision on whether to publish. He responded with the mean, standard deviation, extrema and quartiles. If only people like plumbers, doctors and cable installers could respond in this way. "If I attempt to fix this leak it has a 50% chance of taking 1 hour to fix, a 25% chance of taking 2 and there's a 5% chance I won't get it done today" rather than "I'm charging by the hour and it'll take as long as it takes unless there's a complication in which case it'll take longer".


Tuesday, September 06, 2005


I'm currently reading this article on the history of the infinitesimals. My interest in the subject comes from a subject I've mentioned earlier: automatic differentiation. Implementing infinitesimals as an extension to the real numbers is straightforward in modern programming languages but for some reason it's not well known. They give a nice efficient way to compute derivatives of functions and remove removable singularities from functions.

One of the earliest ways to construct the real numbers from the rationals was as an equivalence class of sequences of rational numbers. Two sequences are equivalent if the difference tends to zero. In Non-Standard Analysis, developed by Robinson, this definition is tweaked slightly so that the sequences are equivalent if they are equal on a 'large' subset of the sequence. The trick is to define 'large' appropriately leading to the concept of an ultrafilter. This results in a set *R which is an extension of the usual set of reals R but also contains infinitely small and large numbers. There's also a nice 'metatheorem' that goes with this construction showing that 'reasonable' theorems about *R are also theorems about R. This basically gives us license to use infinitesimals in our proofs and still be sure that our results make sense as propositions about real numbers. So many of Leibniz's old 'proofs' using infinitesimals, say, can now be reinterpreted as perfectly valid proofs.

Anyway, you don't need all this machinery to do automatic differentiation, just an algebraic structure with elements whose square is zero.

Incidentally, Gödel used ultrafilters to 'prove' the existence of God.