
Friday, March 31, 2006

Quantum Probability

I took part in a brief discussion over at antimeta which reminded me that I ought to get back to a document I started writing on quantum mechanics for dummies. One of my pet peeves is that I believe there to be a little bit of a conspiracy to make quantum mechanics seem less accessible to people. Not a deliberate conspiracy - but people maintaining an aura of mystery about it that puts people off the subject. All of the fuzzy talk about quantum mechanics in the popular science press does nothing to help the situation. In particular, there is a core of quantum mechanics that I believe requires few prerequisites beyond elementary probability theory, vector spaces and complex numbers.

Anyway, I did some more digging on the web and found this course by Greg Kuperberg. The opening paragraphs almost take the words I wanted to say out of my mouth. In particular, despite the mystical mumbo-jumbo that is often written on the subject, the rules of quantum mechanics are "rigorous and clear" and "The precepts of quantum mechanics are neither a set of physical forces nor a geometrical model for physical objects. Rather, they are a variant, and ultimately a generalization, of classical probability theory." Most of all "...more mathematicians could and should learn quantum mechanics...". You don't even have to understand F=ma to get started with quantum mechanics and get to the point where you can really and truly get to grips, directly, with the so-called paradoxes of quantum mechanics such as the Bell Paradox. The strange thing is that you won't find words like this in most of the quantum mechanics textbooks. They throw you into physical situations that require finding tricky solutions to the Schrödinger equation while completely failing to give any insight into the real subject matter of quantum mechanics. Most QM books I know are really introductions to solving partial differential equations. (Remark to physicists: I bet you didn't know you could get the simultaneous eigenvalues for the energy and angular momentum operators for the hydrogen atom by a beautifully simple method that doesn't require even looking at a differential equation...) The best thing about the newly appearing field of quantum computing is that it's slowly forcing people to think about quantum mechanics separately from the mechanics.

So even though I haven't read that course myself yet, I'm recommending that everyone read it :-) And some time I might get back to the even more elementary introduction I hope to put together.

Thursday, March 30, 2006

A Neat Proof Technique

Last night I read this paper. There wasn't really anything in the paper I wanted; I was more interested in flexing my newly developing computer science muscles. Anyway, as with all computer science papers I've managed to finish, I felt like I understood the paper but had no idea what the punch line was. Still, it was worthwhile because part way through there was a neat mathematical proof technique used that I think is worthy of a mention.

The paper is about the generalisations of 'fold' and 'unfold' that I played with recently. The usual fold function acts on a list like
[4,8,15,16,23,42].

Imagine it written as
cons(4,cons(8,cons(15,cons(16,cons(23,cons(42,[]))))))

where the cons function is a 'constructor' that constructs a list from a list element and another list. [] is the empty list. The fold function takes three arguments, a binary function f, a constant g, and a list. It then replaces cons by f and [] by g. For example
fold (+) 0 [4,8,15,16,23,42]
becomes
4+(8+(15+(16+(23+(42+0)))))

and so is 108, the sum of the numbers. An obvious theorem is that
fold cons [] x = x.
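
For reference, here's a minimal Haskell sketch of this fold for ordinary lists - it's just the standard foldr written out in full:

-- Replace cons by f and [] by g. In Haskell, cons is written (:).
fold :: (a -> b -> b) -> b -> [a] -> b
fold f g []       = g
fold f g (x : xs) = f x (fold f g xs)

-- fold (+) 0 [4,8,15,16,23,42] == 4+(8+(15+(16+(23+(42+0))))) == 108
-- fold (:) [] xs == xs   (the obvious theorem above)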


fold also generalises to structures such as trees. In this case there is another constructor that builds a tree from a node and its children. I'll stick with lists as the proof technique is much the same. The authors prove a bunch of theorems about the fold function. But now they want to prove something is true of all of the elements in a list (actually, a tree, but I'll use a list). Suppose the list is as above and you want to prove the property P holds for all elements in that list. Then you want to prove

P(4) && P(8) && P(15) && P(16) && P(23) && P(42) = True.

(I'm using Haskell's && to mean 'and'.) In other words you want to prove that
fold (&&) True [P(4),P(8),P(15),P(16),P(23),P(42)] = True.
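
In Haskell that statement is just a fold over the mapped list. Here's a sketch, using the standard foldr and a hypothetical predicate p of my own (not a name from the paper):

-- "P holds of every element" expressed as a fold, for some p :: a -> Bool.
allP :: (a -> Bool) -> [a] -> Bool
allP p xs = foldr (&&) True (map p xs)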

The authors then proceed to show their theorem is true using the theorems about fold they just proved. Neat eh? But maybe I haven't explained this very well so I'll try to spell it out differently.

The authors want to prove theorems about all elements of a certain datastructure. So what they do is take the logical proposition that expresses what they want and rewrite the proposition so that the proposition itself is in the same 'shape' as the datastructure. The statement of the truth of the proposition is an application of fold to the datastructure. So they can now use the theory of datastructures they develop to prove something about the structure of the logical proposition itself - namely that it's true. You need to read the actual paper (top left, page 5) to see the actual details of the proof technique.

In the particular case of lists the proof technique turns out to be standard induction. If you can prove P of the first element of the list, and you can prove "if P is true of an element it must be true of the next element", then P holds for every element of the list. Every time you generalise fold to a different datastructure there is a new induction principle that goes with it.

I suppose what they have actually done is prove a metatheorem about certain types of logical proposition and then used that to prove specific propositions.

Well, I hope that makes some kind of sense. I think it's quite a neat trick and that many of the paper's readers might miss the cool bit of self-reference that's going on here.

Monday, March 27, 2006

The General Theory of Self-Reproducing Programs

I'm guessing that anyone reading this is already familiar with the idea of programs that output themselves. If you're not, there's a great list of such programs here. But what I was surprised to discover at the weekend was that there is in fact a bit of general theory about this. In particular, what do we need in a computer language to guarantee we can write a self-replicator?

Consider the following in Haskell:

let p x = x ++ show x in putStrLn $ p"let p x = x ++ show x in putStrLn $ p"

Evaluate this expression in an interactive Haskell session and it prints itself out. But there's a nice little cheat that made this easy: the Haskell 'show' function conveniently wraps a string in quotation marks. So we simply have two copies of one piece of code: one without quotes followed by one in quotes. In C, on the other hand, there is a bit of a gotcha. You need to explicitly write code to print those extra quotation marks. And of course, just like in Haskell, this code needs to appear twice, once out of quotes and once in. But the version in quotes needs the quotation marks to be 'escaped' using backslash so it's not actually the same as the first version. And that means we can't use exactly the same method as with Haskell. The standard workaround is not to represent the quotation marks directly in the strings, but instead to use the ASCII code for this character and use C's convenient %c mechanism to print it. For example:


main(a){printf(a="main(a){printf(a=%c%s%c,34,a,34);}",34,a,34);}


Again we were lucky: C provides this great %c mechanism. What do you need in a language to be sure you can write a self-replicator?

It turns out there is a very general approach to writing self-replicators that's described in Vicious Circles. What follows is essentially from there except that I've simplified the proofs by reducing generality.

We'll use capital letters to represent programs. Typically these mean 'inert' strings of characters. I'll use square brackets to indicate the function that the program evaluates. So if P is a program to compute the mathematical function p, we write [P](x) = p(x). P is a program and [P] is a function. We'll consider both programs that take arguments like the P I just mentioned, and also programs, R, that take no arguments, so [R] is simply the output or return value of the program R.

Now we come to an important operation. We've defined [P](x) to be the result of running P with input x. Now we define P(x) to be the program P modified so that it no longer takes an argument or input but instead substitutes the 'hard-coded' value of x. In other words [P(x)] = [P](x). P(x) is, of course, another program. There are many ways of implementing P(x). We could simply evaluate [P](x) and write a program to print this out or return it. On the other hand, we could do the absolute minimum and write a new piece of code that simply calls P and supplies it with a hard-coded argument. Whatever we choose is irrelevant to the following discussion. So here's the demand that we make of our programming language: that it's powerful enough for us to write a program that can compute P(x) from inputs P and x. This might not be a trivial program to write, but it's not conceptually hard either. It doesn't have gotchas like the quotation mark issue above. Typically we can compute P(x) by some kind of textual substitution on P.
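
Here's a minimal sketch of one way to do that in Haskell, assuming programs are represented as strings of Haskell source and the input x is itself a string. None of this is from Vicious Circles; it's just to make the P(x) operation concrete:

-- Build the program P(x): wrap the source of P in parentheses and append
-- the argument x as a quoted literal. Then [P(x)] = [P](x) by construction.
specialize :: String -> String -> String
specialize p x = "(" ++ p ++ ") " ++ show x

-- e.g. specialize "reverse" "hello" == "(reverse) \"hello\""
-- and evaluating that program gives the same result as running reverse on "hello".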

With that assumption in mind, here's a theorem: any program P that takes one argument or input has a fixed point, X, in the sense that running P with input X gives the same result as just running X. Given an input X, P acts just like an interpreter for the programming language as it outputs the same thing as an interpreter would given input X.

So here's a proof:
Define the function f(Q) = [P](Q(Q)). We've assumed that we can write a program that computes P(x) from P and x so we know we can write a program to compute Q(Q) for any Q. We can then feed this as an input to [P]. So f is obviously computable by some program which we call Q0. So [Q0](Q) = [P](Q(Q)).

Now the fun starts:

[P](Q0(Q0)) = [Q0](Q0) (by definition of Q0)
= [Q0(Q0)] (by definition of P(x))


In other words Q0(Q0) is our fixed point.

So now take P to compute the identity function. Then [Q0(Q0)] = [P](Q0(Q0)) = Q0(Q0). So Q0(Q0) outputs itself when run! What's more, this also tells us how to do other fun stuff like write a program to print itself out backwards. And it tells us how to do this in any reasonably powerful programming language. We don't need to worry about having to work around problems like 'escaping' quotation marks - we can always find a way to replicate the escape mechanism too.

So does it work in practice? Well it does for Haskell - I derived the Haskell fragment above by applying this theorem directly, and then simplifying a bit. For C++, however, it might give you a piece of code that is longer than you want. In fact, you can go one step further and write a program that automatically generates a self-replicator. Check out Samuel Moelius's kpp. It is a preprocessor that converts an ordinary C++ program into one that can access its own source code by including the code to generate its own source within it.

Another example of an application of these methods is Futamura's theorem which states that there exists a program that can take as input an interpreter for a language and output a compiler. I personally think this is a little bogus.

Stanislaw Lem has Passed Away

He has an obituary in The Times. If you haven't read any of his science fiction you really should read some. For lovers of mathematical poetry out there, here is a translation of a passage from his Cyberiad:


Come, let us hasten to a higher plane,
Where dyads tread the fairy fields of Venn,
Their indices bedecked from one to n,
Commingled in an endless Markov chain!

Come, every frustum longs to be a cone,
And every vector dreams of matrices.
Hark to the gentle gradient of the breeze
It whispers of a more ergodic zone.

In Riemann, Hilbert or in Banach space
Let superscripts and subscripts go their ways.
Our asymptotes no longer out of phase,
We shall encounter, counting, face to face.

I'll grant thee random access to my heart,
Thou'lt tell me all the constants of thy love;
And so we two shall all love's lemmas prove,
And in our bound partition never part.

For what did Cauchy know, or Christoffel,
Or Fourier, or any Boole or Euler,
Wielding their compasses, their pens and rulers,
Of thy supernal sinusoidal spell?

Cancel me not--for what then shall remain?
Abscissas, some mantissas, modules, modes,
A root or two, a torus and a node:
The inverse of my verse, a null domain.

Ellipse of bliss, converge, O lips divine!
The product of our scalars is defined!
Cyberiad draws nigh, and the skew mind
Cuts capers like a happy haversine.

I see the eigenvalue in thine eye,
I hear the tender tensor in thy sigh.
Bernoulli would have been content to die,
Had he known such a2cos(2φ) !

The Most Amazing and Mysterious Thing in All of Mathematics

I think that a good candidate is the table of the Homotopy Groups of Spheres.


For those not familiar with algebraic topology, πm(Sn) is the set of equivalence classes of continuous functions from the m-dimensional sphere to the n-dimensional sphere where two functions are considered equivalent if they are homotopic. An easy way to visualise this is that two functions are homotopic if you can interpolate a continuous animation between them. (Can you guess what industry I work in?) This set also has a group structure which is straightforward to define but which I won't go into here (unless someone requests it). That's all there is to it. How can such simplicity generate such complexity?


Monstrous Moonshine is pretty mysterious too - but it takes a lot of work to state it for a non-expert. So this wins by default.


Like John Baez I also wonder about the curious appearance of 24 on row 3.


And an aside I discovered on Anarchaia: foldr.com related to an earlier post.

Friday, March 24, 2006

6

Talking of category theory...I can't remember if I previously mentioned the bizarre functorial property of the number six I came across in Designs, Codes and their Links.

This is the actual theorem:

Consider the category whose objects are the n element sets and whose arrows are the bijections between the sets. This category has a non-trivial functor to itself only for n=6.

By smart use of Google Print you should be able to view the proof. It's the first five pages of Chapter 6. (Don't make the obvious mistake with Google Print and end up with only three pages of that chapter.)

Anyway, it's not too hard to give a bit of insight into what this means. Consider the set with n elements. You can build all kinds of combinatorial objects which have some underlying set. Permuting the original n-element set induces a permutation of the combinatorial object and hence its underlying set. If the underlying set also has n elements you usually just end up with the original permutation. For example consider n=3 and the combinatorial object that is the even permutations on the 3 element set. This also has three elements. But the induced permutations on this new set are equivalent to the original permutations (via a bijection from the 3 element set to the set of its even permutations). On the other hand, if n=6 then you can construct another 6 element combinatorial object where the induced action of S6 is quite different to the original one. In fact, it gives an outer automorphism of S6, another bizarre thing that only exists for n=6. To see the actual details of the construction look here.

I should also mention that Todd wrote a paper on this subject: The Odd Number 6, JA Todd, Math. Proc. Camb. Phil. Soc. 41 (1945) 66--68.

It's also mentioned in the Wikipedia but that's only because yours truly wrote that bit.

Anyway, I'm vaguely interested in how this connects to other exceptional objects in mathematics such as S(5,6,12), the Mathieu groups, the Golay codes, the Leech lattice, Modular Forms, String Theory, as well as Life, the Universe and Everything.

Thursday, March 23, 2006

Sets, Classes and Voodoo

After much anticipation I started reading Barwise and Moss's "Vicious Circles". Unfortunately by chapter 2 I've reached the point where the statements don't just seem incorrect, they don't even seem to be propositions. I raised my question on USENET but I may as well mention it here too.

Here's how I understand the concept of a class in ZF Set Theory: talk about classes is really just talk about predicates. We enrich the language of Set Theory with a bunch of new terms ('class', 'subclass') and overload other terms ('is an element of', 'is a subset of') to give a new language that reifies classes, but instead of adding new axioms to deal with classes we provide a translation back to ZF without these extra terms. For example if P and Q are classes then "x is in P" means "P(x)" and "P is contained in Q" means "for all x, P(x) implies Q(x)" or even "a subclass of a set is a set" which translates to the axiom of separation. (We could alternatively add new axioms, instead of the translation, and then we'd get NBG Set Theory.)

Am I right so far? (By the way, nobody ever seems to say what I've just said explicitly. In particular, it seems to me that once you add the term 'class' you need to start proving metatheorems about classes to show what kind of deductions about them are valid, but nobody ever seems to do this.)

I understand that this is a sensible thing to do because of the overloading - in the enriched language sets and classes look similar and that allows us to do category theory, for example, in a much wider context, without having to define everything twice. (And also, maybe, because talk about sets is really a substitute for talk about classes...but that's a philosophical point for another day...)

So what does "If a class is a member of a class then it is a set" mean in the context of ZF? A class is really just a predicate, so it doesn't make sense to me that there could be a predicate about predicates. So at this point the book is looking like Voodoo.

Can anyone out there clarify this for me?

(Hmmm...I wonder if the translation to ZF is an adjoint functor...ow...MUST STOP OBSESSING ABOUT ADJUNCTIONS...)

Wednesday, March 22, 2006

The Representation of Integers by Quadratic Forms

There's a nice article at Science News on the work of Bhargava on the representation of integers by quadratic forms. Rather than just restate what's written there (and in a few other blogs) let me quote the main theorem which is really quite amazing:

If Q: Zn->Z is a positive definite integer-valued quadratic form whose image contains {1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290}, then its image contains every positive integer.


Closely related is Conway's 15-theorem which originally inspired Bhargava and which (I think) I first read about in the excellent book The Sensual (Quadratic) Form. Check out the section on topographs where Conway works his usual magic and makes some parts of the theory seem so clear and obvious that even your cat could start proving theorems about quadratic forms.

Tuesday, March 21, 2006

Category Theory Screws You Up!

Well it does. Since I started a burst of intense category theory reading a couple of weeks ago (not that intense as I have a full time job) I've been showing unpleasant symptoms. These include insomnia, lack of concentration and grumpiness. We're not just talking correlation here, I have causal mechanisms too: how can I sleep when an example of an adjunction might pop into my mind at any moment, how can I concentrate when my brain is already fully occupied in finding those examples, and of course I'm grumpy with all this effort to understand difficult theorems that always turn out to be trivial and content-free. Fortunately I find that drugs help with the insomnia, but there's no cure for the other symptoms.

At least I haven't reached the stage where I sit down to dinner wondering whether or not my eating it is an operation with a left or right adjoint. (But I thought it, didn't I, so I must be pretty far gone.) And I'm not dreaming commutative diagrams yet.

So here's my advice: if someone comes up to you in a shady bar or alleyway and offers you a monad, or an adjunction, or even an innocent little natural transformation, just say "no!".

Friday, March 17, 2006

What can I do with adjoints? And Lemma 28.

By 'do' I mean 'compute'.

In order to teach myself about F-(co)algebras I wrote some literate Haskell that actually made use of them. The neat thing was that it gave me something for nothing. I wrote a generic unfold function (mostly not by thinking about it but by juggling all the functions that were available to me until I found one of the correct type) and I was amazed to find that it did something that I found useful, or at least interesting.

Can I do the same for adjunctions? Unfortunately, all of the examples I tried were basically uninteresting. They were quintessential category theory: you spend ages unpacking the definitions only to find that what you ended up with was trivial. Unlike the fold/unfold case I didn't find myself ever defining a new function that I could use for something.

I did just find this link with some intuition that I'd never read before: 'To get some understanding of adjunctions, it's best to think of adjunction as a relationship between a "path" to an element and a "data structure" or "space" of elements.' Hmmm...none of the category theory books talks about that. I think I can see it: a forgetful functor rubs out the path and just leaves you with the destination. (Don't take my word for it, I'm just guessing at this point.)

Maybe I'm now on target to be able to understand Lambek & Scott which I bought 20 years ago (I'm older than you thought), not really knowing what it was actually about. BTW Check out the review on Amazon: "I was looking for a book for my girlfriend this Christmas and stumbled upon this one..."

And I thought I'd mention Newton's Lemma 28. But after I asked a question about it on Usenet, some other people said far more interesting things than I could ever say.

Thursday, March 16, 2006

Answers and questions

The answer to that cellular automaton puzzle was that it generates the derivatives of the tan function at zero. It uses the fact that if f=tan x, then f'=1+f^2. We can now repeatedly use this to compute higher derivatives, eg.

(f^n)' = nf^(n-1)f'=nf^(n-1)(1+f^2) = nf^(n-1)+nf^(n+1).

You should now see why the CA rule arises.

(Why n,n instead of (n-1),(n+1) as in the rule I gave? Because this is the 'push' version rather than the 'pull' version of the rule that I gave. If you imagine differentiation acting as a linear operation on the vectors with respect to the basis (1,f,f^2,f^3,...) then the 'push' and 'pull' versions are described by matrices that are transposes of each other.)
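
Here's a minimal Haskell sketch of the automaton, using the rule exactly as given in the puzzle and truncating the infinite row of cells to a finite list (so only the first forty or so values come out exactly). The names step and tanDerivatives are my own:

-- One update: a0 <- a1, and for i > 0, ai <- (i-1)*left + (i+1)*right.
step :: [Integer] -> [Integer]
step as = zipWith3 cell [0 ..] (0 : as) (tail as ++ [0])
  where cell i left right = (i - 1) * left + (i + 1) * right

-- The leftmost cell at successive steps gives the derivatives of tan at zero,
-- starting with the first derivative.
tanDerivatives :: Int -> [Integer]
tanDerivatives n = map head (take n (iterate step start))
  where start = 1 : 0 : 1 : replicate 40 0   -- the initial state 1,0,1,0,0,...

-- tanDerivatives 8 == [1,0,2,0,16,0,272,0]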

I was pretty excited by this and tried applying it to all sorts of other functions. But actually it was hard to make it work. Maybe the fact that there is a nice formula for the tangent function is just an accident. Anyway, Jan Rutten, whose paper I got this from, has lots of nice papers on coalgebras and infinite streams. I think I've now learned enough about coalgebras to move onto the next section of my category theory book: Adjoints.

Wednesday, March 15, 2006

Homotopies between proofs and between programs

Many years ago, while out drinking after an algebraic topology seminar, someone mentioned the idea of using algebraic topology to analyse computer programs. I never did get the details but I'm guessing it must have been something like the material presented at this conference.

Anyway, the comment got me thinking. I spend much of my time refactoring code. This means making small modifications to software intended to restructure it to a better form without changing its behaviour (eg. I've just been dumped with a single C++ source file 0.25MB long most of which is one function and is in considerable need of some tidying up!). If you think of a program as a path to get you from an input to an output then a change to that program, that keeps the behaviour the same, is a kind of homotopy between paths. But I couldn't really get anywhere with that idea.

In Baez's This Week's Finds this week he talks about homotopies between proofs. These are ways to convert one proof of a proposition into another proof of the same proposition. Cut elimination is such an operation but there are surely others.

According to the Curry-Howard isomorphism, a computer program that maps type A to type B is essentially the same thing as a proof of B assuming A. So a homotopy between programs is the same thing as a homotopy between proofs. For example, Baez's example of converting between two proofs of P -> R corresponds exactly to converting the piece of Haskell code f . ( g . h) to (f . g) . h using the associativity of . (. is Haskell for function composition.)

So, is there some kind of interesting topology to be extracted from this? Does the space of programs that map input x to output f(x) have interesting fundamental groups or homology groups? I guess a good place to start would be with two simple programs to perform the same operation that can't be transformed into each other by simple steps. And do I really need omega-categories to make sense of this?

Anyway, besides Baez's stuff there's also theo's comments with a real world example of what looks like a homotopy between proofs.

Tuesday, March 14, 2006

Cellular automaton puzzle

Here are the cells of a 1D cellular automaton:


+--+--+--+--+
|a0|a1|a2|a3| ...
+--+--+--+--+


Instead of a finite number of states the ai are integers.

Here's the update rule:

a0 <- a1
ai <- (i-1)*left+(i+1)*right, i>0

where left and right refer to the cells to the left or right of ai.

We start it in this state:

+--+--+--+--+--+--+--+
| 1| 0| 1| 0| 0| 0| 0| ... all zeroes
+--+--+--+--+--+--+--+


What is the significance of the numbers that appear in the leftmost cell?

If you can't figure it out the answer is here. (Though it might be easier to work it out for yourself than read that paper!)

I think this may actually be useful. I've spent quite a bit of time recently writing code to do the sorts of things that these automata do - but these automata may do it better.

Monday, March 13, 2006

Coalgebras and Automata

fold is a really useful operator in a number of programming languages, including Haskell. It also has a dual partner, unfold. But more importantly, both can be generalised, using a little category theory, to F-algebras and F-coalgebras. Amazingly, unfold, in the right category (as I discovered after a weekend of frenzied category theory), gives a really nice way to convert descriptions of automata into runnable automata. Anyway, I wrote the whole thing up as a piece of literate Haskell.

UPDATE: That came out formatted ugly so I've moved it to my web site where I'll eventually format it better. Here's the link.
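
In the meantime, here's roughly what the ordinary list version of unfold looks like - just a sketch of the starting point (essentially Data.List.unfoldr); the F-coalgebra generalisation is in the write-up linked above:

-- From a seed, either stop (Nothing) or emit an element and a new seed.
unfold :: (b -> Maybe (a, b)) -> b -> [a]
unfold step seed = case step seed of
  Nothing         -> []
  Just (x, seed') -> x : unfold step seed'

-- unfold (\n -> if n == 0 then Nothing else Just (n, n - 1)) 5 == [5,4,3,2,1]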

Thursday, March 09, 2006

Hyperfun!

Smullyan's favourite game is called the Hypergame, invented by Zwicker. I hadn't heard of it until today when I read about it in Barwise and Moss's book Vicious Circles.

Consider games where the players take turns and the game is guaranteed to terminate in a finite number of moves. These are called well-founded games.

The Hypergame is quite simple. The first player chooses a well-founded game. The game now turns into that game: the second player makes the opening move and play continues in that game. Eg. if the first player says "Chess (with whatever house rules are required to make it terminate)" then they start playing Chess with the second player moving first. Because of the stipulation of well-foundedness the Hypergame lasts precisely one move more than some well-founded game. Therefore the Hypergame is itself well-founded and always terminates.

Anyway, Alice and Bob decide to play:

Alice: Let's play the Hypergame!
Bob: Cool! I'll go first. My move is to select the Hypergame.
Alice: OK, now we're playing the hypergame. So my first move is to select the Hypergame.
Bob: The Hypergame is cool, I pick that.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
etc.

Wednesday, March 08, 2006

It's a square, square world!

I was browsing for books on elliptic functions on Ebay when I came across this link to a book by Oscar S Adams called "Elliptic Functions Applied To World Maps". This is pretty obscure, I thought to myself, so I googled Oscar. Sure enough, there really was an Oscar S Adams who worked on cartographic projections and his speciality was conformal (ie. angle preserving) projections including this one:


Anyway, there are lots of interesting conformal projections out there. Some of them are quite pretty so I thought I'd share. I like the use of the Schwarz-Christoffel transformation, better known (to me) from 2D fluid dynamics, to map the world into a variety of polygons. I was also surprised to see that Charles Sanders Peirce worked in this area.


Just for the hell of it I might place a bid on the book.


UPDATE: Come on! 'fess up if you're the person bidding against me!

Tuesday, March 07, 2006

Blind Games

I've always been fascinated by games that involve bluffing and I've always been more fascinated by 'blind' games. I define a 'blind' game to be one where there is some state in the game that a player can choose not to reveal. Instead the player claims what that state is. The other player can choose to accept or reject this state of affairs, but if it is rejected then the actual state is revealed and one or other player is 'punished' as appropriate. (Eg. blind chess isn't 'blind'.)

Consider this trivial game: you deal a pack of cards to N players. The players take turns and the winner is the first one to get rid of all their cards. A turn consists of discarding a set of cards that all have the same number or rank. This game isn't very interesting.

But now apply the 'blind transform' to this game with the 'punishment' of picking up all of the discards. The players play cards face down and make some claim about what cards they are laying down. If someone challenges this claim you have a look and mete out the punishment. A trivial game has now been turned into a mildly entertaining game. It can be turned into an excellent game by means of the 'ethyl transform' - playing while drunk. But moving on...

One of my favourite games years ago, when I could talk people into playing it, was "Blind Le Truc". It's Le Truc but with all of the cards played face down but players claiming what the cards are. In this case the punishment is losing a trick. Even played 'sighted' this game is full of bluff. Playing it blind brings it to a whole new level. Players can play against each other for extended periods of time playing what is practically an imaginary game going through all the motions of Le Truc without ever seeing a card. It's a lot of fun. Well, I think so anyway.

Even the most trivial of games can become interesting. Eg. each player is given an entire suit of a deck of cards. Each round a player places one card in the middle. Whoever places the highest card in the centre wins that round. Play continues until nobody has any cards left and the winner is whoever won the most rounds. After applying the blind transform, with round losing as punishment, this game is turned into something fun, at least for a few minutes.

Poker is also a blind version of a trivial game.

How amenable are these games to analysis? That last game is pretty simple. Suppose two players each have only 3 cards. What is the optimal way to play?

You can also apply the blind transform multiple times. It's more interesting if the nth application has a tougher punishment than the (n-1)th. In that case it corresponds to raising the stakes for being caught, or accusing someone of, 'cheating'.

Friday, March 03, 2006

When is one thing equal to some other thing?

That's the title of a rough draft of a paper by Barry Mazur (courtesy of Lambda the Ultimate). It's basically an introduction to the ideas of category theory but it has some nice comments along the way on what we mean by equivalence - often a tricky subject in category theory.


And here's a coincidence. I'd just done a web search to find out what properties the number 691 has. (It pops up in the theory of modular forms.) And then I see a link to this paper (completely independently) which uses 691 as an example on the first page. Weird!

Thursday, March 02, 2006

An Actual Application of Fractal Dimension

According to this story, and others, a trove of 32 paintings, purported to be Jackson Pollocks, was discovered last year. But many people had doubts about whether or not they were really by the master of paint pouring. So a physicist, Richard Taylor, used some software to compute the fractal dimension of Pollock's paintings and compare them to these new paintings. It turned out that the fractal dimension didn't fit in with the trend of Pollock's work and hence they look like fakes.

Of course, me mentioning this fact isn't intended to be an endorsement of the reliability of these methods. I put as much trust in them as I'd put in using catastrophe theory to navigate my way through a prison riot.

Wednesday, March 01, 2006

A Cautionary Tale for Would-Be Generalisers

You may or may not know this story already as it's been floating around for a while. But I'm going to retell it anyway:

Define sinc(x) = sin(x)/x and sinc(0) = 1

Let 'I' mean integration from 0 to infinity.

I sinc(x) = pi/2
I sinc(x)sinc(x/3) = pi/2
I sinc(x)sinc(x/3)sinc(x/5) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9) = pi/2

You see the pattern, right? So the story is that some guy was evaluating these integrals for some reason or other, getting pi/2 all the time. They happily chugged on getting...

I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11)sinc(x/13) = pi/2

So when the computer algebra package this person was using said:

I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11)sinc(x/13)sinc(x/15) = 467807924713440738696537864469/935615849440640907310521750000*pi

they knew they'd hit a bug. They complained to the vendor who agreed that, yes, definitely, there was something screwy in their integration routine.

Except there wasn't. Weird eh?

I've known this story for a while but I've only just stumbled on Borwein's paper that explains the phenomenon. They're called Borwein Integrals now.

Update: Corrected a small error. It's pi/2 for the first 7 terms, not just the first 6. Even more amazing!

Monday, February 27, 2006

Counter-counterfactual Computation

There has been quite a bit of reporting about Counterfactual Computation in the popular press. This work is essentially a realisation of a technique first published in this paper. I think it's worth looking at the process a little more closely. Let's see if I can first cram a course in quantum mechanics and quantum computing into a couple of paragraphs. (Physicists like to obfuscate these subjects but the principles are very simple. I'm not sure if they're simple enough for two paragraphs, however...)

Firstly, let's review classical computing. In a classical computer you have a machine in a state, call it s. You let a bit of time pass and it ends up in a new state, call it t. We can write this using the notation

|s> => |t>

That's basically all there is to know about computing, the rest is details. Let's look at an example. Suppose we have a computer with two one bit registers. The first is an on-off switch and the second will contain the result of the computation. For technical reasons we'll make it so that the answer doesn't just go into the second register but is exclusive-ored with it. We initialise the second register to zero so it's effectively the same thing as storing the result there. Write the state as |s,x> where s is the state of the switch and x is the second register. Suppose the result of the computation is r - this is the thing we are actually interested in. Then we have

|s,x> => |s,x^(s&r)>

In other words, this is saying that we only exclusive-or x with r if s is 1. (I used '^' to mean boolean 'exclusive-or' and '&' to mean 'and'.) For example

|0,0> => |0,0> and |1,0> => |1,r>.

Obviously s must be 1 in order to get a useful value of x.
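
Just to pin down the notation, here's a one-line Haskell sketch of this classical step. It's my own illustration (the name classicalStep isn't from anywhere), with the registers represented as the bits 0 and 1:

-- The classical step |s,x> => |s,x^(s&r)>: the answer r is XORed into the
-- second register x only when the switch s is on.
classicalStep :: Int -> (Int, Int) -> (Int, Int)
classicalStep r (s, x) = (s, (x + s * r) `mod` 2)

-- classicalStep r (0,0) == (0,0)  and  classicalStep r (1,0) == (1,r)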

Now let's move onto quantum computing. Now we no longer have a discrete set of states but instead a vector space. The time evolution of a quantum computer is given by a unitary linear operator. If the state s evolves to state t we write this as

|s> => |t>

where we now interpret |s> and |t> as vectors in a complex inner product space. The first important fact (fact (1)) to know about quantum computers is that if |a> and |b> are possible states then so is any complex linear combination of these states. The second (fact (2)) is that if system A has states in a vector space V(A), and system B has states given by vector space V(B), then the combined system is given by V(A) (x) V(B), where (x) is the usual tensor product over C. The third important fact (fact (3)) is about the interpretation of the state |a>+|b>. This makes no sense classically, but in QM linear combinations (called superpositions) are allowed. The rule is that if you observe this state to see whether it's in state |a> or state |b> then it 'collapses' into state |a> or |b>. The probability of state |s> collapsing into state |a> is given by |<a|s>|^2/(<a|a><s|s>) where <x|y> is fancy notation for the inner product of |x> and |y>. (There's another rule of QM: if |a> and |b> are distinct possible observable outcomes of an experiment, then <a|b>=0.)

We now move our classical computer into Quantumland by considering the states |s,x> to be the basis elements of a vector space. Using fact (2) above we can consider these to be |s> (x) |x>. The rule

|s,x> => |s,x^(s&r)> (rule A)

defines what happens to each basis element and hence defines a linear map. On basis elements it acts just like the classical computer. But with a quantum computer we have available all kinds of interesting linear operators that make no sense for a classical computer. In particular, given a register we can apply this operator to it:

|0> => a*(|0>+|1>)
|1> => a*(|0>-|1>)

where a = 1/sqrt(2). This is linear and unitary. Notice also that if we do it twice we get back to where we started from. To make things easy we'll drop the constant a and work with linear operators that are unitary up to multiplication by a constant. The state |0>+|1> is a kind of weird mixture of on and off-ness.

So the trick to counterfactual computation is that instead of switching on the computer you instead 'half' switch it on using the linear operator above. Note that if you 'half' switch it on twice it just ends up being off again. But instead of 'half' switching it on twice in a row, we half switch it on, allow the quantum computer to evolve by rule (A) and then half switch it on again. Let's work through the details. First note that we are only 'half' switching on the first register. This means the linear operator applies only to the first factor in the tensor product |s>(x)|x>. So we get:

|0,0> => |0,0> + |1,0>

Now consider the result of this followed by allowing the computer to evolve by (A)

|0,0> => |0,0> + |1,r>.

And now follow this by another 'half' switch on:

|0,0> => |0,0> + |1,0> - |1,r> + |0,r>.

If r = 0 then:

|0,0> => |0,0> (modulo a constant multiple)

In other words, the state is unchanged.

If r = 1 then:

|0,0> => |0,0>+|1,0>+|0,1>-|1,1>

We no longer have the cancellation. Following fact (3), because of the |0,1> term above we have a non-zero probability of finding that the register x contains the answer r and yet the on-off switch is off.

And that's all there is to it. With a bit more cunning you can actually get the probability of finding r as high as you like. There are some great potential applications here. We have effectively left x untouched and this translates into practical 'interaction-free' experiments that allow us to make certain kinds of measurement leaving the measured system unchanged. (You may see why I used 'exclusive-or' above - just copying the result into the second register wouldn't have given me a unitary operator.)
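
If you want to check the arithmetic, here's a minimal Haskell sketch of the calculation. It's my own toy model, not anything from the paper: states are lists of basis vectors |s,x> with integer amplitudes, and I drop the overall factors of 1/sqrt(2) just as in the text above. The names Basis, State, tidy, halfSwitch, evolve and run are all mine:

type Basis = (Int, Int)       -- (switch s, result register x)
type State = [(Basis, Int)]   -- basis vector paired with its amplitude

-- Combine equal basis vectors and drop zero amplitudes.
tidy :: State -> State
tidy st = [ (b, a) | b <- [ (s, x) | s <- [0, 1], x <- [0, 1] ]
                   , let a = sum [ amp | (b', amp) <- st, b' == b ], a /= 0 ]

-- The 'half' switch on the first register: |0> -> |0>+|1>, |1> -> |0>-|1>.
halfSwitch :: State -> State
halfSwitch st = tidy (concat [ [ ((0, x), a), ((1, x), if s == 0 then a else -a) ]
                             | ((s, x), a) <- st ])

-- Rule (A): the computer XORs its answer r into x only when the switch is on.
evolve :: Int -> State -> State
evolve r st = tidy [ ((s, if s == 1 then (x + r) `mod` 2 else x), a)
                   | ((s, x), a) <- st ]

-- Half switch on, run the machine, half switch on again, starting from |0,0>.
run :: Int -> State
run r = halfSwitch (evolve r (halfSwitch [ ((0, 0), 1) ]))

-- run 0 == [((0,0),2)]                              -- just |0,0>, up to a constant
-- run 1 == [((0,0),1),((0,1),1),((1,0),1),((1,1),-1)]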

BUT, and this is a big but, I see no valid way to interpret this as 'counterfactual computation'. In other words, I completely disagree with the interpretation of this result. In particular, suppose r equals zero. Then the reason we end up with the result |0,0> is that we have had a fortuitous cancellation. But we have actually passed the register |0>+|1> through the system. We only get |0,0> at the end because our second 'half' switching operation has made the effects of this 'destructively' interfere. In other words, it's just like we put on noise-cancelling headphones and then claim that there was no sound at all. That's silly, there was lots of sound, we just arranged for it to cancel at our ears. In our quantum computer both |0> and |1> waves passed through the system, but the second 'half' switch operation made them cancel. In the r=1 case you can think of the shapes of these waves being messed up so they no longer cancel. (By the way, the talk of waves is quite fair here, you can think of the elements of these state vector spaces in fact being wavefunctions and the linear operators being time evolution of differential equations remarkably similar to the wave equation.)

In other words, I'm calling these physicists' bluff. There is nothing counterfactual about this computation at all. Though we might be able to get the result of our computation with the switch in the off state, that's only because it was partly on during the computation and we arranged for the state of that register to be destructively cancelled out at the end. I claim that the descriptions given of the experiment are completely bogus.

In fact, if you read Jozsa and Mitchison's paper you'll see that in their discussion their language is quite guarded. I think they realise that this talk of counterfactuality is a little bogus.

Update: Scott Aaronson rants about this too over here. Make sure you read Kwiat's response.

Sunday, February 26, 2006

The ambiguous operator, pt.2

In 1963 John McCarthy, the inventor of Lisp, published the paper A Basis for a Mathematical Theory of Computation in which he proposed the function (in the computer program sense of the word) amb(.,.). The idea is that amb(x,y) is first equal to x. But if later in the computation it is found that this leads to some sort of contradiction the value of x is retracted and replaced with y. This is a much more complex business than it may seem to be at first. Retracting a value essentially means winding back the entire state of the computation to where it was when amb returned the value x, and then slipping in the value of y. This means somehow freezing and copying the entire state when x was first returned. When a contradiction is found the entire state of the program is discarded and replaced with the frozen version which is reactivated. These frozen states are known as continuations. In many ways it's like a GOTO statement on acid. It can cause a jump to an arbitrary spot in your code. But continuations are nicer than GOTOs because they are more amenable to logical reasoning.

There are a number of languages that have support for continuations built in, including Scheme with its call-with-current-continuation function. But it takes a little care to understand exactly what is meant by "wind back" in this context. Although the state of the current computation is wound back, including things like local variables and the state of the functions currently being executed, global variables remain unchanged. So even though we wind back time and throw away much of the current state some information can still "leak through" from the path that we are now pretending never happened.

So, to get back to the devilish story of part 1: suppose we have C keywords called TRY and FAIL with the bizarre property that TRY functions just like the return statement except that if a later FAIL statement is met your program winds back to the last TRY statement, undoing any effects caused by the previous return value, and continues execution after the TRY statement. The ambiguous operator would be implemented like this:


int amb(int a,int b)
{
    TRY(a);
    return b;
}


amb(a,b) would return a. But if a later FAIL is met it would unwind the computer back to its previous state and then continue execution with the return b line. Global variables would remain unundone. Given such keywords we can now define:


int devilChoice()
{
    printf("I choose B\n");
    TRY(B);
    printf("Sorry I meant to say A\n");
    TRY(A);
}


void devilCheats()
{
    FAIL;
}


But C has no such keywords and there's no way to write such functions - unless you cheat a bit. The required code is on my web site here. The trick is to freeze the state of the program by literally reaching in and taking a copy of the C stack. Remarkably this code runs fine, without modification, when compiled with gcc, mipspro or MSVC on IA32, AMD64, PowerPC and mips under Windows, MacOS X, Irix and Linux for all legal combinations of those compilers, processors and operating systems.

But despite the implementation of TRY and FAIL being an ugly hack they really are nice operators to have. Papers like Wadler's lend a little legitimacy to continuation shenanigans by showing there is a nice theory underlying them, and that by allowing them you bridge the gap between intuitionistic and classical logic in the Curry-Howard isomorphism. They can also make for some surprisingly simple implementations of problems. For example, using TRY you can write an elegant non-brute force C sudoku solver in a couple of lines. In fact, you find yourself able to write code that is declarative rather than imperative, even though you are working in C. The backtracking is handled behind the scenes by TRY and FAIL which can be nested recursively (as suggested in McCarthy's original paper) without any difficulty. I originally wrote that code in order to implement a pattern matcher similar to Mathematica's over ten years ago. It took me several years to figure out how to rewrite it without using continuations - using continuation passing style instead. (I had no Computer Science training, I hadn't even heard of this stuff.) Nowadays I'd just use a backtracking monad in Haskell...
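
For example, here's a tiny sketch of that backtracking style using Haskell's list monad. It's my own illustration rather than the pattern matcher mentioned above, and amb and pairs are names I've made up:

-- amb offers a list of alternatives; a failed branch (amb []) contributes
-- no results, and the list monad does the backtracking for us.
amb :: [a] -> [a]
amb = id

-- A toy search: digits x < y with x * y == 12.
pairs :: [(Int, Int)]
pairs = do
  x <- amb [1 .. 9]
  y <- amb [1 .. 9]
  if x < y && x * y == 12 then return (x, y) else amb []

-- pairs == [(2,6),(3,4)]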

For an implementation in Ruby see randomhacks. For code similar to my C code see Aubrey Jaffer's implementation (in C) of Scheme. You can also do the same thing semilegally through the setcontext() and getcontext() functions if they are available for your operating system. You can probably do something similar using threads or fork().

Saturday, February 25, 2006

The ambiguous operator, pt. 1

This was originally intended as my programming and mathematics blog but I never got around to any programming here, until today.

My recent interest in logic is partly motivated by a desire to understand the paper Call by Name is Dual to Call by Value. I suspect the paper is almost trivial, but that you have to be very familiar with the prerequisites before you can appreciate its triviality. (Ie. I'm using the mathematician's definition of trivial of course :-) )

Now I don't understand that paper yet, but on the seventh page there is a story about the Devil. Below I have a computational representation of the story written in C. The idea is, what must the functions devilChoice() and devilCheats() do in order to allow the devil to capture this soul? devilChoice() must return A or B.


#include <stdio.h>
#include <stdlib.h>

#define A 0
#define B 1

int my_main() {
  int choice;

  static int devilMoney = 0;
  static int suckerMoney = 0;
  static int suckerKarma = 0;

  printf("Devil says:\n");
  printf("Here's my offer.\n");
  printf("I will choose either A or B\n");
  printf("If A I'll give you $1000000000\n");
  printf("If B I'll give you any wish for $1000000000\n");
  printf("Do you accept\n");

  printf("Sucker says: 'yes'.\n");
  choice = devilChoice();

  switch (choice)
  {
    case A:
        if (devilMoney<1000000000)
        {
         printf("Devil is unable to pay up.\n");
         printf("Devil is in big trouble...\n");
         exit(1);
        }
        printf("Devil gives 1000000000 to sucker\n");
        suckerMoney += 1000000000;
        devilMoney -= 1000000000;
        break;
    case B:
        printf("Sucker begs, borrows, steals $1000000000\n");
        suckerMoney += 1000000000;
        suckerKarma -= 1000000000;

        printf("Sucker gives money to Devil\n");
        devilMoney += 1000000000;
        suckerMoney -= 1000000000;

        printf("Sucker is about to make wish...\n");
        printf("'I wish to go to heaven'\n");
        devilCheats();
        printf("Sucker goes to heaven and devil loses a soul\n");
        exit(1);
        break;
    }

    if (suckerKarma<0)
    {
        printf("Sucker goes to hell with $%d\n",suckerMoney);
    }
}

(Hmmm...syntax colouring didn't work out quite as well as I wanted.)

In fact, I'm going to leave this as an exercise and give a possible solution another day. But the title of this post gives a clue - and of course the paper proposes a solution too. The code doesn't need to be 100% portable.

PS You don't need to ask. I have a solution in not quite 100% portable C. But of course I cheat. But is there an interesting cheat?

Friday, February 24, 2006

The cut rule and reading seminars

Despite being moderately good at mathematics, even managing to scrape together a PhD, there are certain topics that are always brick walls to me so that I find it hard to get started even at the most elementary level. In algebraic topology I always had problems with spectral sequences but they're not so elementary and are notoriously tricky. But in logic I can barely get off the ground. Here's an example of a sentence from an introduction to linear logic that baffles the hell out of me: "One of the most important properties of the proof-rules for Classical Logic is that the cut-rule is redundant". This is one of the most ridiculous things I have read in mathematics writing. If it's redundant then don't study it. Excise it from the list of derivation rules and don't bother with it ever again.

I'm sure that when set theorists first tried to write down the axioms that became ZF they found lots of redundant axioms. Over the years they whittled them down to the list we have today so that I bet you can't even name the axioms that were jettisoned for redundancy. Not so in "Gentzen style" logic. Every document ever written on the subject seems to introduce this rule and then with a flourish they show how it can be eliminated. They develop the whole subject and then proceed to demolish the earlier work by showing how they can rewrite everything they did earlier without using this rule. The only explanation I can come up with is that authors of books on logic are paid by the word and that this allows them a few extra chapters for very little work.

Of course the problem here must be me. I'm sure there's a perfectly good reason for harping on about the cut rule, I just don't see it. And I think this points to a difficulty with trying to read mathematics texts outside of academia. When you're a student you're sharing extratextual material all the time. People tell you that such and such a result is interesting because it has an application somewhere and then you go and read the formal details knowing what the motivation was. Or someone will give an informal seminar where they write equations on the board, but speak much more informally between the equations. But most mathematics texts hide this material from you and present the bare facts. This is fine when you're in an academic environment, but I have to confess to finding it difficult when working on my own.

One thing I'd love to see online is the equivalent of the reading seminars we did during my PhD work. Each week we'd read a chapter and then have a seminar to discuss what we'd just read. Does anyone do this? Blogs seem like a great way to do this but I've seen no evidence of groups working like this.

Anyway, I shall try to persist until I see the light...

Friday, February 17, 2006

Mathematics has Competition

Mathematics is the lingua franca of science. Except that Bernard Chazelle now claims that there are two such languages - mathematics and computer science. It's a curious claim. He likens mathematics to "epithets" and computer science to "novels" and claims that a computer science background is imperative to studying biology. All he thinks we need are a great populariser of computer science (wasn't that what Wolfram tried to be?) and an Einstein.


I'm inclined to ask, with the article's author, "Isn't computer science really just a stepchild of mathematics?"

Thursday, February 16, 2006

Young babies count

At least according to an article in Scientific American.


Of course it's not clear that this is "counting" in any conventional sense. Additionally, some monkeys can count this well (and I think also many other mammals). So I'm not sure what conclusions can be drawn from this experiment.

Wednesday, February 08, 2006

What's the most implausible but true mathematical proposition?

Some would say the Banach-Tarski paradox but my attitude has always been that all bets are off when dealing with unmeasurable sets. I think the existence of a number satisfying the main property of Khinchin's constant is much less plausible.

Addendum: According to this paper, even though the Khinchin property is true for almost all real numbers, nobody has explicitly found a number for which it is true! You can artificially construct a number for which it is true but that doesn't give you an explicit result, and anyway, that's cheating. Experimentally, π gives something near the Khinchin constant, but there's no proof of what happens in the limit.

Monday, February 06, 2006

Is Mathematics too Easy?

Over the years many people have felt that mathematicians have made life too easy for themselves by using axioms that seem much like the proverbial sledgehammer cracking open a nut. Look at the way analysts and algebraists will wield the Axiom of Choice when they can prove what they need with much weaker axioms. Wouldn't it be more enlightening to carry out these proofs using the weakest axiom system possible so we can see exactly what is needed to make them provable?

It turns out that this is more or less what reverse mathematicians do. They take theorems such as the well known Bolzano-Weierstrass theorem and try to figure out precisely how much mathematical machinery is required to prove them. There is no need to use all of ZFC, or even ZF, to prove such a result.

So reverse mathematicians sometimes start with something simpler like second order arithmetic. In its universe are integers and sets of integers, and nothing else. It can still talk about real numbers by encoding them as Cauchy sequences in the usual way but there is no way to encode subsets of the reals in this way. Maybe surprisingly, it is possible to represent continuous functions because the rationals are dense in the reals so continuous functions are defined by the values they take at rational numbers. It turns out that almost all classical mathematics can be encoded in this system even though it's much weaker than ZF.

But usually reverse mathematicians like to go even weaker still and work with recursive comprehension which is essentially the Peano axioms combined with induction and a restricted comprehension axiom. Using these weak tools it's still possible to prove things like the Intermediate Value Theorem, the uncountability of the reals and even results about metric spaces such as the Baire category theorem (but only for separable spaces).

One thing that has always intrigued me about number theory is how many theorems are proved by making excursions into analysis. For example, think about results in analytic number theory that can easily be stated using the language of Peano's axioms, and yet whose proofs require techniques like contour integration that make reference to infinite sized objects and limits. In 1988 Stephen G Simpson proved that such theorems can in fact be reduced to "primitive recursive arithmetic" and hence turned into "finitistic" proofs. On the other hand, I'm not sure that these proofs are necessarily going to give any insight into why the theorems are true. They probably end up being as unwieldy as Bourbaki's definition of one.

Sunday, February 05, 2006

An End to Coding Theory

Error correcting codes have inspired some beautiful mathematics. One of my favourite mathematical constructions, the binary Golay code, is an error correcting code.

The idea behind the theory is simple. Suppose there is a set M of messages, one of which you want to send over a noisy communication channel. If you simply send one of the messages it might be corrupted and interpreted as another. So instead you embed the set M inside a larger set C and send an element of C as the message. If the message is corrupted then the receiver might get an element of C that isn't in M, and they'll know right away that the message has been corrupted. But even better, if the embedding of M in C is designed cunningly enough they'll be able to make a good guess as to what the original message was, assuming some probability model for the corruption. You pay a price of course: the elements of C typically take longer to transmit than the elements of M, reducing the rate of data transmission.

There have been lots of proposed schemes over the years since Shannon started the field. They use a variety of methods from discrete mathematics and algebraic geometry. There are nice connections between the sporadic groups and coding theory: the Mathieu groups arise directly as symmetries of the Golay code, and the Leech lattice can be built from the Golay code, with its symmetries described by the Conway groups (which are in turn closely related to the Monster). A common feature of these codes is that they are linear. The set C is a vector space over F2, the field with two elements, and M is a subspace of it. M is the kernel of an F2-linear map represented by a matrix, the parity check matrix; the result of applying this matrix to a received word is called its syndrome. If the syndrome of a received word is zero, then it's likely that the message has arrived uncorrupted. Much of the work in coding theory has been about constructing suitable parity check matrices.
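
As a toy illustration of the syndrome idea (my own sketch in Haskell, not taken from any coding theory library): the 3-bit repetition code {000, 111} is the kernel of a 2x3 parity check matrix over F2. A nonzero syndrome flags a corrupted word, and which syndrome you get tells you which bit was probably flipped.

    import Data.Bits (xor)

    type Bit = Int                                   -- 0 or 1

    parityCheck :: [[Bit]]                           -- the code is the kernel of this matrix over F2
    parityCheck = [[1,1,0],[1,0,1]]

    syndrome :: [Bit] -> [Bit]                       -- multiply by the matrix, arithmetic mod 2
    syndrome w = [foldr xor 0 (zipWith (*) row w) | row <- parityCheck]

    main :: IO ()
    main = do
      print (syndrome [1,1,1])                       -- [0,0]: a valid codeword
      print (syndrome [1,0,1])                       -- [1,0]: corrupted, and the syndrome points at bit 2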

But in the sixties Robert Gallager looked at generating random sparse parity check matrices and found that the resulting codes, called Low Density Parity Check (LDPC) codes, were good in the sense that they allowed messages to be transmitted at rates near the optimum rate found by Shannon - the so-called Shannon limit. Unfortunately the computers of the day weren't up to the task of finding the most likely element of M from a given element of C. But now they are. We now have near-optimal error correcting codes, and the design of these codes is ridiculously simple. There was no need to use exotic mathematics; random matrices are as good as almost anything else. The past forty years of coding theory have been, more or less, a useless excursion. Any further research in the area can only yield tiny improvements.

A good place to learn about these codes is the book by one of the 'rediscoverers' of Gallager's codes, David MacKay. There's also the Wikipedia article. I originally heard of these codes from a friend of mine and was sceptical at first. Then I read this Science News article and I'm now a bit embarrassed about my scepticism.

Monday, January 30, 2006

The Flyspeck Project

I just came across the Flyspeck Project. The problem they are dealing with is the verification of the Kepler conjecture. Although there is a potential proof, nobody is 100% sure of its validity and it seems that the group assigned to verify it have run out of steam. So the Flyspeck Project seeks to build a complete formal proof using HOL Light, a computational framework for proving things. Unfortunately the people behind the project expect it might take as long as twenty years to verify the potential proof. Let's hope that not too many interesting conjectures require this treatment.


Note that this is different from the four colour problem. For that problem, the proof required an enumeration that was too long and complicated to carry out by hand, so a machine carried it out. For this problem we have a slightly different situation: we already have a potential proof, but we need to reduce it to a completely formal form so that the entire thing can be verified to follow from axioms.

Sunday, January 29, 2006

Biggest Test of Benford's Law?

Earlier this week I swapped a couple of emails with someone at Google involved in looking at web page statistics on the Google database. They probably have one of the largest corpora of textual data in the world, and it's a mixture of data from a wide variety of sources. This makes it an ideal candidate for testing Benford's law, and that's what I'm suggesting they do. The published tests I've seen so far are pretty small, but it'd be fun to see the results from, say, a billion web pages. Coincidentally, Boing Boing just had a link to a New York Times story on it from 1998. Anyway... I asked the guy to email me if they carry out the test. It's probably pretty low down on their list of priorities, but if they do it, of course I'll report back.
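
For what it's worth, here's the kind of computation I have in mind, sketched in Haskell on a made-up sample (powers of 2, which famously obey the law) rather than on anybody's corpus:

    import Data.List (group, sort)

    leadingDigit :: Double -> Char                    -- first nonzero digit of a number
    leadingDigit = head . filter (`elem` ['1'..'9']) . show . abs

    benford :: Int -> Double                          -- Benford's predicted frequency of digit d
    benford d = logBase 10 (1 + 1 / fromIntegral d)

    main :: IO ()
    main = do
      let sample   = [2 ** fromIntegral n | n <- [1 .. 1000 :: Int]]
          observed = [ (head g, fromIntegral (length g) / 1000)
                     | g <- group (sort (map leadingDigit sample)) ]
      mapM_ print observed                            -- compare with benford 1, ..., benford 9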

Friday, January 27, 2006

De Rham Cohomology and Computer Algebra

Computers are now good at a number of branches of mathematics. They've been performing symbolic integration, differentiation, summation and differential equation solving for decades. In recent years, with the advent of Grobner basis methods, they've become pretty competent at algebraic geometry. But one area has stood out in my mind as being unsuitable for computational methods and that's computing de Rham cohomology. In order to compute this we have to study spaces of differential forms on manifolds and these are large spaces. Given lots of pieces of differential form you can glue them to other bits of differential form using a partition of unity so in a sense you have a great freedom to mix and match forms. This is in stark contrast to algebraic geometry where spaces of functions frequently tend to be finite dimensional.


So I was surprised to find that in recent years computers have been calculating de Rham cohomology, and doing so using methods from algebraic geometry. In fact, functions to compute the de Rham cohomology of the complement of various algebraic varieties have been available in Macaulay2 for a while. There's a chapter of the Macaulay2 book available on the subject. The methods used are based on D-modules and discussed in the book Grobner Deformations of Hypergeometric Differential Equations.


I have no idea where the connection with hypergeometric series comes from but it's interesting to see the connection with algebraic geometry. D-modules are modules over a ring (the Weyl algebra) with elements xi and di with the property that xixj = xjxi and didj = djdi, but dixj - xjdi = δij. We can form ideals of these things and study them using Grobner basis methods. Given a smooth function we can form the ideal of elements that annihilate the function by considering the di to be the differential operators d/dxi. This gives a connection between the methods of algebraic geometry and differential geometry. But that's as far as my knowledge extends for now.
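
To see the basic relation in action, here's a one-variable sketch of my own (in Haskell, and nothing like what Macaulay2 actually does): represent a polynomial by its list of coefficients and check that the operator d.x - x.d acts as the identity.

    type Poly = [Rational]              -- [a0, a1, a2, ...] means a0 + a1*t + a2*t^2 + ...

    d :: Poly -> Poly                   -- the operator d/dt
    d []       = []
    d (_ : as) = zipWith (*) [1 ..] as

    x :: Poly -> Poly                   -- the operator "multiply by t"
    x as = 0 : as

    commutator :: Poly -> Poly          -- (d.x - x.d) applied to a polynomial
    commutator p = zipWith (-) (d (x p)) (x (d p))

    main :: IO ()
    main = print (commutator [5, 0, 3]) -- [5 % 1,0 % 1,3 % 1]: we get the polynomial back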

Saturday, January 21, 2006

Geometry is Innate

It seems that geometry is innate, according to an MSNBC article. The study worked with the Mundurukú people from a remote region of the Amazon, who lack vocabulary for geometric terms. Nonetheless they scored well on a variety of nonverbal tests. Judging by the pictures, they were testing for the ability to identify connectedness, similarity and congruence. They also displayed an ability to use maps even though they are not a map-making culture.


It's curious to note that the people of this culture weave baskets that appear to involve interesting geometry and whose designs look anything but 'innate'. And though 'remote', they also wear glasses and Western-style clothing. So clearly they do transmit 'geometric' culture, even if it's nonverbal, and they have had non-trivial contact with cultures that do have a strong geometric vocabulary.

Thursday, January 19, 2006

Eleven Reasons to use Haskell as a Mathematician

As a mathematician, what programming language should you use when you need to do anything algorithmic? For some specific domains there are languages tailored for the task, for example Macaulay2 is great for some types of algebra. For general problems many turn to Mathematica or Maple. But although they are universal (in the sense of Turing) they're not great platforms for general purpose algorithms and can perform very poorly unless the feature set of these languages just happens to marry well with the problem you're trying to solve. These tools are also like the proverbial sledgehammer - you're using a giant engine to solve what can sometimes be a simple problem and you'll pay the price in terms of memory and CPU usage. If you want a general purpose programming language the answer is simple. There are eleven reasons why you should use Haskell. So if you're a mathematician who occasionally needs to write code, get started on a tutorial right away!

(1) Haskell is Functional


Functions are the bread and butter of mathematics so it makes sense to use a programming language in which functions are first class objects. It's not at all unusual for a mathematician to take a function of two arguments, say f(.,.), and construct from it a function of a single variable defined by f_x(y) = f(x,y). Computer scientists call such a thing a closure - it's a function constructed by supplying values for some, but not all, of the arguments of another function. It's one of the most natural operations in the world and yet it's actually quite horrible to construct such a thing in languages like Fortran, C or C++. It's slightly cleaner in C++ because you can use objects and templates to simulate closures, but the syntax is still pretty hideous. (I know there are libraries that make this even easier - but they do so by implementing a functional language as a 'domain specific language' within C++, essentially making my point for me.) Here's how you do this in Haskell: you write "f x". The way it works is quite neat. Given sets A, B and C there is a natural bijection between the set of functions AxB->C and the set of functions A->C^B. So rather than take multiple arguments, Haskell functions always take one argument and return the function that operates on the rest of the arguments. At first this might seem like a bad idea because functions of multiple arguments are now, in some sense, second class objects. But it's not a problem at all. To apply the function f to the arguments 1 and 2 we simply write "f 1 2".
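
For example (a throwaway function of my own, just to show the idiom):

    f :: Double -> Double -> Double        -- really Double -> (Double -> Double)
    f u v = u * u + v

    g :: Double -> Double                  -- the one-argument function g(v) = f(3,v)
    g = f 3

    main :: IO ()
    main = print (g 4, f 3 4)              -- (13.0,13.0)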

To get an idea of how horrible it can be to work with functions without functional programming consider the code in Numerical Recipes in C. Most of their code is 'functional' in character in the sense that it is intended to perform operations on functions. But look at how you have to make use of horrible void * pointers to pass in those extra arguments to simulate closures.

(2) Haskell Insulates You


As a mathematician you're more interested in solving the problem than in dealing with the idiosyncrasies of the hardware and issues like memory management. Haskell does a great job of insulating you from the underlying hardware so that you don't have to worry about these things. Languages like Java aren't too bad at this either. C++ is terrible: if you decide you don't want to manage memory yourself you can use shared_ptr<> objects, and turn your code into a horribly unreadable mess in the process.

(3) Haskell Performs Well

It's surprising how well Haskell performs. You might think "if every time I add a to b it has to construct a temporary intermediate object that represents the function that adds a to a number, and then apply that function to b, it's going to be slow". But that is not the way to look at things. Haskell doesn't simply execute code by working top to bottom, left to right through your source code. The compiler (itself written in Haskell) is pretty sophisticated and the code it produces frequently performs comparably to C. It's generally not as fast as C, but it's usually in the same league. The Haskell compiler rips apart your code and reconstitutes it as an executable in ways that are quite surprising. In Haskell you can often write code using idioms that feel natural, but are generally considered slow in other languages, without a performance penalty. One example is list handling. Often you want to return a temporary list of objects and perform some operation on them. In many languages there is an overhead in making such a list, only to take it apart again when you want to operate on the actual elements, so you have to restructure your code so that no list is constructed. In Haskell the compiler will often figure out that your list is temporary and do the restructuring behind the scenes for you.
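
For instance (a standard example, with the usual caveat that exactly what gets optimised depends on the compiler and its settings), the intermediate list here is typically eliminated rather than ever being built in memory:

    sumOfSquares :: Int -> Int
    sumOfSquares n = sum (map (\k -> k * k) [1 .. n])   -- looks like it builds a list of n squares

    main :: IO ()
    main = print (sumOfSquares 1000000)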

(3) Haskell is Declarative


Mathematicians like to spend their time saying what is the case rather than thinking in terms of sequences of operations. If b = f(a) and c = f(b) then it doesn't matter if you say c = f(b) and then b = f(a). Order is irrelevant. In an imperative language like C it matters to a large extent what order your code is written in because a statement like b = f(a) is an instruction. In Haskell you aren't giving a list of instructions so much as stating what is true. This fits much better with the way that mathematicians work. Haskell can take care of some of the details of precisely what order the operations are going to be performed in and hide those details from you.
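
For example (a throwaway definition of my own), the bindings in a where clause can be written in any order and Haskell works out the dependencies:

    result :: Double
    result = c
      where
        c = f b          -- written "backwards" relative to the order of evaluation
        b = f a
        a = 2
        f t = t * t + 1

    main :: IO ()
    main = print result  -- 26.0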

(4) Haskell is Lazy


This goes hand-in-hand with Haskell being declarative. Haskell only computes what it needs to compute in order to get the answer you require. It tries not to do unnecessary work. Languages that aren't lazy like this are called 'strict' languages. In a strict language, say, if you required the first component of the cross product of two vectors, you would typically compute all three components and throw away the second and third. If your code needed to be efficient you'd probably end up writing special-purpose code to compute just the first component of the cross product, and you'd be unable to reuse your cross product code. Mathematicians don't want to do this. They want to say that x is the first component of the cross product because it's a true statement, and not worry about what exactly gets implemented when. This laziness is ubiquitous throughout all of Haskell.

One place that laziness really pays off is in working with lists. Haskell has absolutely no problem working with infinite lists as long as you don't actually try to look at all of the elements. Instead of evaluating all of the elements it simply retains enough information to construct as many elements as are needed. Computing the elementwise sum, say, of two infinite lists is completely straightforward. So is performing computations with power series, where the later elements are functions of the earlier ones: you don't have to check whether you've computed enough of the earlier elements to compute a later one, Haskell does all that for you. If you want to see the first ten elements of the final infinite list you can just ask for them. You don't need to decide in advance how many elements you want and build that into the code from the beginning.
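
The classic example (not mine) is the Fibonacci numbers, an infinite list defined in terms of itself; asking for the first ten elements forces only those ten to be computed:

    fibs :: [Integer]
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

    main :: IO ()
    main = print (take 10 fibs)   -- [0,1,1,2,3,5,8,13,21,34]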

(5) Haskell is good at Datastructures


Haskell is good at dealing with datastructures. Often mathematicians don't realise that they are using datastructures all the time. But they are. For example a group is a set equipped with functions to perform multiplication and inversion as well as a singled-out element, the identity. Haskell makes it very easy to work with structures that are tuples of objects like this. It's also very easy to build one datastructure out of another and even to specify rules about datastructures. For example a matrix is a two-dimensional array of objects, but if the objects form a ring then so do the matrices. This kind of statement is easy to express in Haskell, making it well suited to working with algebraic structures. If you've defined formal power series over a ring and you've defined the ring of polynomials over another ring, it's trivial to build the ring of power series over polynomials over the integers, say. What's more, the compiler can tell you when the structure you're trying to create doesn't make sense, for example if you were to try to multiply two matrices whose underlying elements can't be multiplied. A programming language like Mathematica is very poor at this as it doesn't really distinguish between datatypes: you'll only discover your error when the program is actually running and it tries to multiply the underlying objects. Haskell also uses type inference. You don't have to explicitly tell it the type of something because it can often figure it out for itself. If you see the number '1' in a textbook what does it mean? Is it an element of N, Z, R, Q, R[sqrt(2)], C or some other completely different algebraic structure? It could be any of these things depending on the context. The same is true in Haskell: it'll figure out what you mean by '1' using completely unambiguous rules. This means that you don't have to use a different symbol for '1' in different contexts, and yet you don't have to laboriously state what you mean every time you use it.

And rather than having an ad-hoc system of datastructure types Haskell's is algebraic.
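
As a small illustration (my own sketch; a real library would be more careful, and Haskell's built-in Num class is a blunt instrument, as I complain about below): 2x2 matrices over anything Num-like form something Num-like themselves, so matrices of Integers, of Rationals, or even of other matrices all come for free.

    data M2 a = M2 a a a a deriving Show

    instance Num a => Num (M2 a) where
      M2 a b c d + M2 e f g h = M2 (a + e) (b + f) (c + g) (d + h)
      M2 a b c d * M2 e f g h = M2 (a*e + b*g) (a*f + b*h) (c*e + d*g) (c*f + d*h)
      negate (M2 a b c d)     = M2 (negate a) (negate b) (negate c) (negate d)
      fromInteger n           = M2 (fromInteger n) 0 0 (fromInteger n)   -- n times the identity
      abs    = error "no sensible abs for matrices"
      signum = error "no sensible signum for matrices"

    main :: IO ()
    main = print (M2 1 2 3 4 * M2 0 1 1 0 :: M2 Integer)   -- M2 2 1 4 3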

(6) Haskell is Compact


Not in the topological sense. Haskell code tends to be much shorter than its counterpart written in other languages. There is little extraneous verbiage and the verbiage that does exist is very powerful. This is more than a space-saving exercise. Consider the history of calculus in n dimensions. When Maxwell first wrote down his equations of electrodynamics they took up most of a page. With the introduction of vector notation they were simplified. By using the notation of exterior calculus they were simplified to dF = 0, d*F = *J. This isn't just typographically simpler but conceptually simpler. You can see immediately how you might generalise to n dimensions instead of three, you can see trivially that d*J = 0 from d^2 = 0, and you can now perform countless computations in a few lines that previously would have taken dozens of pages. What's more, the new notation makes it harder to accidentally write equations of motion that don't make physical sense, because they automatically obey symmetries that are difficult to see in more complex notation. Haskell is just like this too. You can do a lot with a tiny amount of code. And if two blocks of code appear to do similar things, you can often factor out the common part and make it a separate block of code, in a way that's much harder in just about any other language. This makes it much easier to generalise your code and have insights about it. This works well with the way mathematicians like to think about things.

(7) Haskell is Mind Expanding

Haskell draws on ideas from interesting branches of mathematics, for example Category Theory and Logic. And the logic part doesn't appear the way you expect - it appears via the Curry-Howard 'isomorphism' in the type system used to define datastructures. As mentioned above, Haskell code lends itself well to generalisations. You'll find all sorts of commonality between things that previously didn't seem similar, and you'll find ways to express that similarity in your code. One of my favourite examples is by Ralph Hinze. He doesn't just illustrate the similarity between binomial trees and binary arithmetic, he actually shows how merging binomial heaps and adding binary numbers are instances of the very same thing. Haskell encourages this kind of abstraction. Though I've wandered into the realm of computer science with talk of trees, this is still the type of abstraction that mathematicians find interesting.

(8) Haskell Has No Side Effects


When you add two numbers you get the sum. You don't find that something else in the universe has happened as well (except maybe the temperature in the room goes up if it's a really hard sum). So functions in a programming language shouldn't have any side effects. In Haskell a function does nothing other than return its result. This means that it's easy to bring your mathematical experience to bear when analysing programs. (Almost) all of the usual rules for manipulating mathematical expressions apply so that, for example, if b = f(a) and c = g(b,b), then c = g(f(a),f(a)). In many other programming languages you'd have to make sure that calling f twice didn't have some side effect that made it behave differently to calling it once.
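
In code (a trivial example of my own):

    f :: Integer -> Integer
    f a = a * a + 1

    g :: Integer -> Integer -> Integer
    g p q = p + q

    c1, c2 :: Integer
    c1 = let b = f 10 in g b b     -- name the intermediate result...
    c2 = g (f 10) (f 10)           -- ...or substitute it in: guaranteed to be the same

    main :: IO ()
    main = print (c1 == c2)        -- True, and this is guaranteed, not a coincidence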

(9) Haskell Eats Circular Definitions for Breakfast


In most imperative languages a line of code like x=f(x) means replace the value of x with f applied to the old value of x. In Haskell it means x equals f(x). It has no problem with you defining a variable in terms of itself, and x=f(x) means that x is a fixed point of the function f. In practice it's not so simple: you can't just write x = 1-x^2 and expect it to figure out that x is the golden ratio (well, its reciprocal). Haskell is a programming language, not an equation-solving package. But in my formal power series library I'm able to use a line like "e = 1+integrate e" to define the power series for exp x. See here for another example of using circular definitions to solve differential equations.
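
Here's a stripped-down version of that idea (my own sketch, not the actual library): power series as lazy lists of coefficients, with exp defined by exactly the circular equation above.

    integrate :: [Rational] -> [Rational]
    integrate as = 0 : zipWith (/) as [1 ..]        -- termwise integration, constant term 0

    plusConst :: Rational -> [Rational] -> [Rational]
    plusConst c []       = [c]                      -- add a constant to the series
    plusConst c (a : as) = (c + a) : as

    e :: [Rational]
    e = plusConst 1 (integrate e)                   -- the circular definition e = 1 + integrate e

    main :: IO ()
    main = print (take 6 e)                         -- [1 % 1,1 % 1,1 % 2,1 % 6,1 % 24,1 % 120]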

(10) Haskell is Well Founded

What I mean is that Haskell is built on fairly solid mathematical foundations in lambda calculus and proof theory. It's not just an ad hoc language that has accreted features over the years. Everything is there for a reason and people have thought long and hard about the theoretical underpinnings of what went into it.

(11) Haskell is Pretty


Well, that's a matter of taste. But I find it very elegant.

That's 11. I think there's a rule against making lists with prime numbers of elements but I'm ignoring it.

To be fair I ought to point out that it's not all a bowl of cherries. Haskell isn't very good at dealing with arrays. In most programming languages it takes O(1) time to access the ith element of an n-element array. In Haskell you can get this kind of performance if you don't mind using the various painful-to-use array operators available, or you can sacrifice a little speed, keep the elegance, and settle for O(log n) access. On the other hand, you'd be surprised just how many useful matrix algorithms don't require random access to the ith element.
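
If you want to see what the O(log n) option looks like in practice, one common choice (a sketch; Data.Map is just one of several such libraries) is a purely functional balanced tree behind a dictionary interface:

    import qualified Data.Map as M

    v :: M.Map Int Double                 -- an "array" as a balanced tree: O(log n) access
    v = M.fromList (zip [0 ..] [1.0, 4.0, 9.0, 16.0])

    main :: IO ()
    main = do
      print (M.lookup 2 v)                -- Just 9.0
      print (M.lookup 9 v)                -- Nothing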

Because Haskell functions don't have side effects it doesn't make sense to perform an operation more than once: it'll just do the same thing each time. So there's no notion of a 'for' loop (well, maybe there is) in Haskell and instead you must find other ways to express iterative operations, i.e. using 'higher order functions' or recursion. But sometimes you feel like you just want to write the loop and be done with it. There's always another way of expressing the problem, but when the iterative approach is the first thing to pop into your head it can occasionally be frustrating. And sometimes you'd like side effects.

Haskell comes with a library of functions called the 'standard prelude'. Unfortunately, whoever designed it knew just enough mathematics to be dangerous and made a complete hash of it. Instead of using a well known hierarchy of algebraic structures such as group->ring->field they defined all kinds of bizarre structures starting with something like a ring with a norm or valuation. And while the library supports complex numbers it's not flexible enough to support the Gaussian integers (probably a consequence of forcing rings to have a norm). All things that could have been implemented correctly in a few seconds if a mathematician had been consulted early on.

And the last caveat. I found learning Haskell hard. Let me put it in perspective: I'm 39. I've been programming since I was 11. I've been writing recursive code since I was about 15 and I've been using whatever approximations to functional programming are available to me in whatever language I'm using for almost that long (including torturing myself with C++ template metaprogramming). I've even implemented a couple of declarative languages, Prolog and SASL, the latter being a lazy functional language a lot like Haskell that I thought I'd investigate before Haskell. And I still found Haskell tricky to learn. But it was worth it. And I believe that the trickiness of the language is the kind of trickiness that mathematicians should be able to handle. (And it probably helps if you don't know any C or C++...)

A friend of mine and I have been writing all kinds of mathematical code and finding Haskell well suited to the task. Logic (e.g. building analytic tableaux as in Smullyan's book on first-order logic), Group Theory (e.g. the Schreier-Sims algorithm), Calculus (automatic differentiation), Combinatorics (via formal power series), Algebra (e.g. Grobner bases), and now we're investigating relationships between elliptic functions and modular forms. (Hmmm...my friend takes most of the credit here.) Haskell is great at all of this stuff even though it's not in any way an algebra or mathematics package.

And one last thing about Haskell. If x is an integer (an "Integer" in Haskell) then x+1>x. It's nice to work over the actual integers rather than a cheap and nasty 2-adic approximation (though the cheap and nasty approximation is there if you want it).
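
For example:

    main :: IO ()
    main = print (let n = 2 ^ 100 :: Integer in (n + 1 > n, n + 1))
    -- (True,1267650600228229401496703205377): no silent wraparound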

Friday, January 13, 2006

You're needed!

As reported in Business Week, and cited on Slashdot, here's a story about why mathematicians aren't so useless after all.

Friday, January 06, 2006

General Relativity for Dummies

No, really, for complete dummies. John Baez has just posted a description of Einstein's field equations that just about anyone can understand. So simple, even, that it can be described in words. And yet the formulation is equivalent to the field equations. It goes like this:

Given a small ball of freely falling test particles initially at rest with respect to each other, the rate at which it begins to shrink is proportional to its volume times: the energy density at the center of the ball, plus the pressure in the x direction at that point, plus the pressure in the y direction, plus the pressure in the z direction.
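In symbols (my own paraphrase, not a quote from Baez, working in units with c = 1): for a small ball of test particles of volume V, momentarily at rest with respect to each other, the second time derivative of V at that moment satisfies V''/V = -4 pi G (rho + Px + Py + Pz), where rho is the energy density and Px, Py, Pz are the pressures at the centre of the ball. Choosing units in which 8 pi G = 1 turns the constant into a tidier -1/2.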

More information is here. He also has a derivation of Newton's law of gravitation here.


As Baez points out - at first it looks implausible because the Einstein field equation is a tensor equation while this looks like a scalar equation. But the statement must hold true for any small ball of test particles initially at rest relative to each other, so at any point it actually imposes many constraints, not just one.


This is pretty neat stuff and it ought to go into all GR textbooks...

Wednesday, December 14, 2005

Theorem Proving and Code Generation

According to the Curry-Howard isomorphism a type in a programming language is essentially the same thing as a proposition, and an object or program of that type is an (intuitionistic) proof of the proposition. So an automatic proof generator should also be usable to generate objects or programs of a given type. This is exactly what Djinn does. It seems pretty neat.


(I only learned about the Curry-Howard isomorphism in the last couple of years but it's probably of general interest. So I recommend reading the Wikipedia article I link to if you haven't met it before. It's also probably not what you're currently guessing it is. It's not about program correctness but type correctness and it doesn't prove programs to be correct but shows how programs themselves are proofs of type correctness. In the language of the article, Djinn solves the "Type Inhabitation Problem".)
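
A toy example of the correspondence (mine, not Djinn's output): the type below, read as a proposition, says "A and B implies B and A", and the program is its proof. Djinn automates the search for such programs given such types.

    swap :: (a, b) -> (b, a)    -- the proposition: A and B implies B and A
    swap (u, v) = (v, u)        -- the proof: a program of that type

    main :: IO ()
    main = print (swap (1 :: Int, "two"))   -- ("two",1)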

Wednesday, December 07, 2005

Rant: Just for kids?

Search Google News for stories about mathematics and you'll find that they are dominated by a particular subsection of our society: kids. It seems that the only people in the world doing mathematics are children. Very occasionally you'll see a story about an adult, but then it's usually about an adult teaching children. Do a search on history and you'll see some actual history. Search on physics and you'll get stories about kids mixed in with stories about people doing actual physics. So why are there so few stories about adults doing mathematics?


It's not just in news stories. Look at Texas Instruments' and Hewlett Packard's web sites and you'll see that calculators are largely marketed towards kids. TI are much worse on this front. They have powerful calculators that can solve all kinds of real world problems and yet they're not being marketed as devices for solving such problems, they're about getting kids through exams. HP at least have a section for professionals. Even so, in the old days if an HP calculator had exchangeable faceplates it was so that you could label the keys with new functions, but now it's to change the colour to make it look cool. The only people working with numbers are clearly children.


On closer inspection I see that I was wrong. I missed a story. It's about an adult. No mention of children or education. Ah...it's about an adult who can't do mathematics.


So clearly no adults use mathematics. So why do we bother teaching children a skill they'll never use in adulthood?

Friday, December 02, 2005

Bees can fly after all!

According to this article it seems that bees can fly after all.


PS I'm wondering about the field of 'human aviation' that the article mentions. Am I lacking a superpower that normally comes standard with being human?
