Friday, March 31, 2006
Quantum Probability
Anyway, I did some more digging on the web and found this course by Greg Kuperberg. The opening paragraphs almost take the words I wanted to say out of my mouth. In particular, despite the mystical mumbo-jumbo that is often written on the subject, the rules of quantum mechanics are "rigorous and clear" and "The precepts of quantum mechanics are neither a set of physical forces nor a geometrical model for physical objects. Rather, they are a variant, and ultimately a generalization, of classical probability theory." Most of all "...more mathematicians could and should learn quantum mechanics...". You don't even have to understand F=ma to get started with quantum mechanics and get to the point where you can really and truly get to grips, directly, with the so-called paradoxes of quantum mechanics such as the Bell Paradox. The strange thing is that you won't find words like this in most of the quantum mechanics textbooks. They throw you into physical situations that require finding tricky solutions to the Schrödinger equation while completely failing to give any insight into the real subject matter of quantum mechanics. Most QM books I know are really introductions to solving partial differential equations. (Remark to physicists: I bet you didn't know you could get the simultaneous eigenvalues for the energy and angular momentum operators for the hydrogen atom by a beautifully simple method that doesn't require even looking at a differential equation...) The best thing about the newly appearing field of quantum computing is that it's slowly forcing people to think about quantum mechanics separately from the mechanics.
So even though I haven't read that course myself yet, I'm recommending that everyone read it :-) And some time I might get back to the even more elementary introduction I hope to put together.
Thursday, March 30, 2006
A Neat Proof Technique
The paper is about the generalisations of 'fold' and 'unfold' that I played with recently. The usual fold function acts on a list like
[4,8,15,16,23,42].
Imagine it written as
cons(4,cons(8,cons(15,cons(16,cons(23,cons(42,[]))))))
where the cons function is a 'constructor' that constructs a list from a list element and another list. [] is the empty list. The fold function takes three arguments, a binary function f, a constant g, and a list. It then replaces cons by f and [] by g. For example
fold (+) 0 [4,8,15,16,23,42]
becomes
4+(8+(15+(16+(23+(42+0)))))
and so is 108, the sum of the numbers. An obvious theorem is that
fold cons [] x = x.
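In Haskell, the fold described here is the standard foldr; a minimal sketch (using the name fold to match the text):

```haskell
-- fold replaces cons (written : in Haskell) with f and [] with g
fold :: (a -> b -> b) -> b -> [a] -> b
fold f g []     = g
fold f g (x:xs) = f x (fold f g xs)

-- fold (+) 0 [4,8,15,16,23,42] evaluates to 4+(8+(15+(16+(23+(42+0))))) = 108
-- fold (:) [] xs rebuilds xs unchanged, which is the obvious theorem above
```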
fold also generalises to structures such as trees. In this case there is another constructor that builds a tree from a node and its children. I'll stick with lists as the proof technique is much the same. The authors prove a bunch of theorems about the fold function. But now they want to prove something is true of all of the elements in a list (actually, a tree, but I use a list). Suppose the list is as above and you want to prove the property P holds for all elements in that list. Then you want to prove
P(4) && P(8) && P(15) && P(16) && P(23) && P(42) = True.
(I'm using Haskell's && to mean 'and'.) In other words you want to prove that
fold (&&) True [P(4),P(8),P(15),P(16),P(23),P(42)] = True.
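This reading of "P holds of every element" is itself a fold; a quick Haskell sketch, with p standing for the property P:

```haskell
-- 'P holds of every element' as a fold of (&&) over the P-values
allP :: (a -> Bool) -> [a] -> Bool
allP p = foldr (\x rest -> p x && rest) True

-- e.g. allP even [4,8,16,42] folds to True, allP even [4,8,15,16] to False
```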
The authors then proceed to show their theorem is true using the theorems about fold they just proved. Neat eh? But maybe I haven't explained it very well so I'll try to spell it out differently.
The authors want to prove theorems about all elements of a certain datastructure. So what they do is take the logical proposition that expresses what they want and rewrite it so that the proposition itself has the same 'shape' as the datastructure. The statement of the truth of the proposition is then an application of fold to the datastructure. So they can now use the theory of datastructures they develop to prove something about the structure of the logical proposition itself - namely that it's true. You need to read the actual paper (top left, page 5) to see the details of the proof technique.
In the particular case of lists the proof technique turns out to be standard induction. If you can prove P of the first element of the list, and you can prove "if P is true of an element it must be true of the next element", then P holds for every element of the list. Every time you generalise fold to a different datastructure there is a new induction principle that goes with it.
I suppose what they have actually done is prove a metatheorem about certain types of logical proposition and then used that to prove specific propositions.
Well, I hope that makes some kind of sense. I think it's quite a neat trick and that many of the paper's readers might miss the cool bit of self-reference that's going on here.
Monday, March 27, 2006
The General Theory of Self-Reproducing Programs
Consider the following in Haskell:
let p x = x ++ show x in putStrLn $ p"let p x = x ++ show x in putStrLn $ p"
Evaluate this expression in an interactive Haskell session and it prints itself out. But there's a nice little cheat that made this easy: the Haskell 'show' function conveniently wraps a string in quotation marks. So we simply have two copies of one piece of code: one without quotes followed by one in quotes. In C, on the other hand, there is a bit of a gotcha. You need to explicitly write code to print those extra quotation marks. And of course, just like in Haskell, this code needs to appear twice, once out of quotes and once in. But the version in quotes needs the quotation marks to be 'escaped' using backslash so it's not actually the same as the first version. And that means we can't use exactly the same method as with Haskell. The standard workaround is not to represent the quotation marks directly in the strings, but instead to use the ASCII code for this character and use C's convenient %c mechanism to print it. For example:
main(a){printf(a="main(a){printf(a=%c%s%c,34,a,34);}",34,a,34);}
Again we were lucky, C provides this great %c mechanism. What do you need in a language to be sure you can write a self-replicator?
It turns out there is a very general approach to writing self-replicators that's described in Vicious Circles. What follows is essentially from there except that I've simplified the proofs by reducing generality.
We'll use capital letters to represent programs. Typically these mean 'inert' strings of characters. I'll use square brackets to indicate the function that the program evaluates. So if P is a program to compute the mathematical function p, we write [P](x) = p(x). P is a program and [P] is a function. We'll consider both programs that take arguments like the P I just mentioned, and also programs, R, that take no arguments, so [R] is simply the output or return value of the program R.
Now we come to an important operation. We've defined [P](x) to be the result of running P with input x. Now we define P(x) to be the program P modified so that it no longer takes an argument or input but instead substitutes the 'hard-coded' value of x instead. In other words [P(x)] = [P](x). P(x) is, of course, another program. There are also many ways of implementing P(x). We could simply evaluate [P](x) and write a program to simply print this out or return it. On the other hand, we could do the absolute minimum and write a new piece of code that simply calls P and supplies it with a hard-coded argument. Whatever we choose is irrelevant to the following discussion. So here's the demand that we make of our programming language: that it's powerful enough for us to write a program that can compute P(x) from inputs P and x. This might not be a trivial program to write, but it's not conceptually hard either. It doesn't have gotchas like the quotation mark issue above. Typically we can compute P(x) by some kind of textual substitution on P.
With that assumption in mind, here's a theorem: any program P that takes one argument or input has a fixed point, X, in the sense that running P with input X gives the same result as just running X. Given an input X, P acts just like an interpreter for the programming language: it outputs the same thing as an interpreter would given input X.
So here's a proof:
Define the function f(Q) = [P](Q(Q)). We've assumed that we can write a program that computes P(x) from P and x so we know we can write a program to compute Q(Q) for any Q. We can then feed this as an input to [P]. So f is obviously computable by some program which we call Q0. So [Q0](Q) = [P](Q(Q)).
Now the fun starts:
[P](Q0(Q0)) = [Q0](Q0) (by definition of Q0)
= [Q0(Q0)] (by definition of P(x))
In other words Q0(Q0) is our fixed point.
So now take P to compute the identity function. Then [Q0(Q0)] = [P](Q0(Q0)) = Q0(Q0). So Q0(Q0) outputs itself when run! What's more, this also tells us how to do other fun stuff like write a program to print itself out backwards. And it tells us how to do this in any reasonably powerful programming language. We don't need to worry about having to work around problems like 'escaping' quotation marks - we can always find a way to replicate the escape mechanism too.
So does it work in practice? Well it does for Haskell - I derived the Haskell fragment above by applying this theorem directly, and then simplifying a bit. For C++, however, it might give you a piece of code that is longer than you want. In fact, you can go one step further and write a program that automatically generates a self-replicator. Check out Samuel Moelius's kpp. It is a preprocessor that converts an ordinary C++ program into one that can access its own source code by including the code to generate its own source within it.
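As a sanity check of the Haskell fragment at the top of this post, you can verify mechanically that applying p to the quoted half reproduces the whole one-liner:

```haskell
-- the unquoted half of the quine
half :: String
half = "let p x = x ++ show x in putStrLn $ p"

p :: String -> String
p x = x ++ show x   -- show wraps the string in quotes (escaping if needed)

-- putStrLn (p half) prints exactly the original one-liner
```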
Another example of an application of these methods is Futamura's theorem which states that there exists a program that can take as input an interpreter for a language and output a compiler. I personally think this is a little bogus.
Stanislaw Lem has Passed Away
Come, let us hasten to a higher plane,
Where dyads tread the fairy fields of Venn,
Their indices bedecked from one to n,
Commingled in an endless Markov chain!
Come, every frustum longs to be a cone,
And every vector dreams of matrices.
Hark to the gentle gradient of the breeze
It whispers of a more ergodic zone.
In Riemann, Hilbert or in Banach space
Let superscripts and subscripts go their ways.
Our asymptotes no longer out of phase,
We shall encounter, counting, face to face.
I'll grant thee random access to my heart,
Thou'lt tell me all the constants of thy love;
And so we two shall all love's lemmas prove,
And in our bound partition never part.
For what did Cauchy know, or Christoffel,
Or Fourier, or any Boole or Euler,
Wielding their compasses, their pens and rulers,
Of thy supernal sinusoidal spell?
Cancel me not--for what then shall remain?
Abscissas, some mantissas, modules, modes,
A root or two, a torus and a node:
The inverse of my verse, a null domain.
Ellipse of bliss, converge, O lips divine!
The product of our scalars is defined!
Cyberiad draws nigh, and the skew mind
Cuts capers like a happy haversine.
I see the eigenvalue in thine eye,
I hear the tender tensor in thy sigh.
Bernoulli would have been content to die,
Had he known such a^2 cos(2φ)!
The Most Amazing and Mysterious Thing in All of Mathematics
For those not familiar with algebraic topology, π_m(S^n) is the set of equivalence classes of continuous functions from the m-dimensional sphere to the n-dimensional sphere where two functions are considered equivalent if they are homotopic. An easy way to visualise this is that two functions are homotopic if you can interpolate a continuous animation between them. (Can you guess what industry I work in?) This set also has a group structure which is straightforward to define but which I won't go into here (unless someone requests it). That's all there is to it. How can such simplicity generate such complexity?
Monstrous Moonshine is pretty mysterious too - but it takes a lot of work to state it for a non-expert. So this wins by default.
Like John Baez I also wonder about the curious appearance of 24 on row 3.
And an aside I discovered on Anarchaia: foldr.com related to an earlier post.
Friday, March 24, 2006
6
This is the actual theorem:
consider the category whose objects are the n element sets and whose arrows are the bijections between the sets. This category has a non-trivial functor to itself only for n=6.
By smart use of Google Print you should be able to view the proof. It's the first five pages of Chapter 6. (Don't make the obvious mistake with Google Print and end up with only three pages of that chapter.)
Anyway, it's not too hard to give a bit of insight into what this means. Consider the set with n elements. You can build all kinds of combinatorial objects which have some underlying set. Permuting the original n-element set induces a permutation of the combinatorial object and hence its underlying set. If the underlying set also has n elements you usually just end up with the original permutation. For example consider n=3 and the combinatorial object that is the even permutations on the 3 element set. This also has three elements. But the induced permutations on this new set are equivalent to the original permutations (via a bijection from the 3 element set to the set of its even permutations). On the other hand, if n=6 then you can construct another 6 element combinatorial object where the induced action of S6 is quite different to the original one. In fact, it gives an outer automorphism of S6, another bizarre thing that only exists for n=6. To see the actual details of the construction look here.
I should also mention that Todd wrote a paper on this subject: The Odd Number 6, JA Todd, Math. Proc. Camb. Phil. Soc. 41 (1945) 66--68.
It's also mentioned in the Wikipedia but that's only because yours truly wrote that bit.
Anyway, I'm vaguely interested in how this connects to other exceptional objects in mathematics such as S(5,6,12), the Mathieu groups, the Golay codes, the Leech lattice, Modular Forms, String Theory, as well as Life, the Universe and Everything.
Thursday, March 23, 2006
Sets, Classes and Voodoo
Here's how I understand the concept of a class in ZF Set Theory: talk about classes is really just talk about predicates. We enrich the language of Set Theory with a bunch of new terms ('class', 'subclass') and overload other terms ('is an element of', 'is a subset of') to give a new language that reifies classes, but instead of adding new axioms to deal with classes we provide a translation back to ZF without these extra terms. For example if P and Q are classes then "x is in P" means "P(x)", "P is contained in Q" means "for all x, P(x) implies Q(x)", and even "a subclass of a set is a set" translates to the axiom of separation. (We could alternatively add new axioms, instead of the translation, and then we'd get NBG Set Theory.)
Am I right so far? (By the way, nobody ever seems to say what I've just said explicitly. In particular, it seems to me that once you add the term 'class' you need to start proving metatheorems about classes to show what kind of deductions about them are valid, but nobody ever seems to do this.)
I understand that this is a sensible thing to do because of the overloading - in the enriched language sets and classes look similar and that allows us to do category theory, for example, in a much wider context, without having to define everything twice. (And also, maybe, because talk about sets is really a substitute for talk about classes...but that's a philosophical point for another day...)
So what does "If a class is a member of a class then it is a set" mean in the context of ZF? A class is really just a predicate, so it doesn't make sense to me that there could be a predicate about predicates. So at this point the book is looking like Voodoo.
Can anyone out there clarify this for me?
(Hmmm...I wonder if the translation to ZF is an adjoint functor...ow...MUST STOP OBSESSING ABOUT ADJUNCTIONS...)
Wednesday, March 22, 2006
The Representation of Integers by Quadratic Forms
If Q: Z^n -> Z is a positive-definite quadratic form and the image of Q contains {1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290}, then it contains every positive integer.
Closely related is Conway's 15-theorem, which originally inspired Bhargava and which (I think) I first read about in the excellent book The Sensual Quadratic Form.
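As a sanity check of the criterion, here's a sketch testing one familiar form, x^2+y^2+z^2+w^2. Lagrange's four-square theorem says it represents every positive integer, so it had better represent all 29 critical numbers. (The helper names here are mine, not from the paper.)

```haskell
-- the 29 critical numbers from the theorem above
critical :: [Int]
critical = [1,2,3,5,6,7,10,13,14,15,17,19,21,22,23,26,29,30,31,34,35
           ,37,42,58,93,110,145,203,290]

-- does x^2+y^2+z^2+w^2 represent n? (brute-force search with x<=y<=z<=w)
fourSquares :: Int -> Bool
fourSquares n = or [ x*x + y*y + z*z + w*w == n
                   | let m = floor (sqrt (fromIntegral n :: Double))
                   , x <- [0..m], y <- [x..m], z <- [y..m], w <- [z..m] ]

-- all fourSquares critical should come out True
```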
Tuesday, March 21, 2006
Category Theory Screws You Up!
At least I haven't reached the stage where I sit down to dinner wondering whether or not my eating it is an operation with a left or right adjoint. (But I thought it, didn't I, so I must be pretty far gone.) And I'm not dreaming commutative diagrams yet.
So here's my advice: if someone comes up to you in a shady bar or alleyway and offers you a monad, or an adjunction, or even an innocent little natural transformation, just say "no!".
Friday, March 17, 2006
What can I do with adjoints? And Lemma 28.
In order to teach myself about F-(co)algebras I wrote some literate Haskell that actually made use of them. The neat thing was that it gave me something for nothing. I wrote a generic unfold function (mostly not by thinking about it but by juggling all the functions that were available to me until I found one of the correct type) and I was amazed to find that it did something that I found useful, or at least interesting.
Can I do the same for adjunctions? Unfortunately, all of the examples I tried were basically uninteresting. They were quintessential category theory: you spend ages unpacking the definitions only to find that what you end up with is trivial. Unlike the fold/unfold case I didn't find myself ever defining a new function that I could use for something.
I did just find this link with some intuition that I'd never read before: 'To get some understanding of adjunctions, it's best to think of adjunction as a relationship between a "path" to an element and a "data structure" or "space" of elements.' Hmmm...none of the category theory books talks about that. I think I can see it: a forgetful functor rubs out the path and just leaves you with the destination. (Don't take my word for it, I'm just guessing at this point.)
Maybe I'm now on target to be able to understand Lambek & Scott.
And I thought I'd mention Newton's Lemma 28. But after I asked a question about it on Usenet, some other people said far more interesting things than I could ever say.
Thursday, March 16, 2006
Answers and questions
(f^n)' = n f^(n-1) f' = n f^(n-1)(1 + f^2) = n f^(n-1) + n f^(n+1).
You should now see why the CA rule arises.
(Why n,n instead of (n-1),(n+1) as in the rule I gave? Because this is the 'push' version rather than the 'pull' version of the rule that I gave. If you imagine differentiation acting as a linear operation on the vectors with respect to the basis (1,f,f^2,f^3,...) then the 'push' and 'pull' versions are described by matrices that are transposes of each other.)
I was pretty excited by this and tried applying it to all sorts of other functions. But actually it was hard to make it work. Maybe the fact that there is a nice formula for the tangent function is just an accident. Anyway, Jan Rutten, whose paper I got this from, has lots of nice papers on coalgebras and infinite streams. I think I've now learned enough about coalgebras to move onto the next section of my category theory book: Adjoints.
Wednesday, March 15, 2006
Homotopies between proofs and between programs
Anyway, the comment got me thinking. I spend much of my time refactoring code: making small modifications to software intended to restructure it into a better form without changing its behaviour (eg. I've just been dumped with a single C++ source file 0.25MB long, most of which is one function, and which is in considerable need of some tidying up!). If you think of a program as a path to get you from an input to an output then a change to that program, that keeps the behaviour the same, is a kind of homotopy between paths. But I couldn't really get anywhere with that idea.
In Baez's This Week's Finds this week he talks about homotopies between proofs. These are ways to convert one proof of a proposition into another proof of the same proposition. Cut elimination is such an operation but there are surely others.
According to the Curry-Howard isomorphism, a computer program that maps type A to type B is essentially the same thing as a proof of B assuming A. So a homotopy between programs is the same thing as a homotopy between proofs. For example, Baez's example of converting between two proofs of P -> R corresponds exactly to converting the piece of Haskell code f . (g . h) to (f . g) . h using the associativity of . (. is Haskell for function composition.)
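For a concrete, if trivial, instance in Haskell (the three example functions here are arbitrary choices of mine):

```haskell
-- hypothetical example functions; any three composable functions would do
f, g, h :: Int -> Int
f = (+ 1)
g = (* 2)
h = subtract 3

-- the two bracketings correspond to two proofs of the same implication
viaRight, viaLeft :: Int -> Int
viaRight = f . (g . h)
viaLeft  = (f . g) . h
-- viaRight and viaLeft agree on every input
```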
So, is there some kind of interesting topology to be extracted from this? Does the space of programs that map input x to output f(x) have interesting fundamental groups or homology groups? I guess a good place to start would be with two simple programs to perform the same operation that can't be transformed into each other by simple steps. And do I really need omega-categories to make sense of this?
Anyway, besides Baez's stuff there's also theo's comments with a real world example of what looks like a homotopy between proofs.
Tuesday, March 14, 2006
Cellular automaton puzzle
+--+--+--+--+
|a0|a1|a2|a3| ...
+--+--+--+--+
Instead of a finite number of states the ai are integers.
Here's the update rule:
a0 <- a1
ai <- (i-1)*left+(i+1)*right, i>0
where left and right refer to the cells to the left or right of ai.
We start it in this state:
+--+--+--+--+--+--+--+
| 1| 0| 1| 0| 0| 0| 0| ... all zeroes
+--+--+--+--+--+--+--+
What is the significance of the numbers that appear in the leftmost cell?
If you can't figure it out the answer is here. (Though it might be easier to work it out for yourself than read that paper!)
I think this may actually be useful. I've spent quite a bit of time recently writing code to do the sorts of things that these automata do - but these automata may do it better.
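If you'd like to experiment before peeking at the answer, here's a sketch of the automaton on a finite, zero-padded tape (safe as long as the tape is wider than the initial support plus the number of steps):

```haskell
-- one update: a0 <- a1;  ai <- (i-1)*a(i-1) + (i+1)*a(i+1) for i > 0
step :: [Integer] -> [Integer]
step as = zipWith cell [0..] as
  where
    n = length as
    get j | j >= 0 && j < n = as !! j
          | otherwise       = 0          -- zero padding off the end
    cell 0 _ = get 1
    cell i _ = fromIntegral (i-1) * get (i-1) + fromIntegral (i+1) * get (i+1)

-- the leftmost cell after each of the first few steps,
-- starting from the state 1,0,1,0,0,...
leftmost :: Int -> [Integer]
leftmost steps = map head (take (steps + 1) (iterate step tape))
  where tape = [1,0,1] ++ replicate (steps + 4) 0
```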
Monday, March 13, 2006
Coalgebras and Automata
UPDATE: That came out formatted ugly so I've moved it to my web site where I'll eventually format it better. Here's the link.
Thursday, March 09, 2006
Hyperfun!
Consider games where the players take turns and the game is guaranteed to terminate in a finite number of moves. These are called well-founded games.
The Hypergame is quite simple. The first player chooses a well-founded game. The Hypergame then becomes that game, with the second player making the opening move and play continuing in that game. Eg. if the first player says "Chess (with whatever house rules are required to make it terminate)" then they start playing Chess with the second player moving first. Because of the stipulation of well-foundedness the Hypergame lasts precisely one move more than some well-founded game. Therefore the Hypergame is itself well-founded and always terminates.
Anyway, Alice and Bob decide to play:
Alice: Let's play the Hypergame!
Bob: Cool! I'll go first. My move is to select the Hypergame.
Alice: OK, now we're playing the hypergame. So my first move is to select the Hypergame.
Bob: The Hypergame is cool, I pick that.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
Alice: I pick the Hypergame.
Bob: I pick the Hypergame.
etc.
Wednesday, March 08, 2006
It's a square, square world!

Anyway, there are lots of interesting conformal projections out there. Some of them are quite pretty so I thought I'd share. I like the use of the Schwarz-Christoffel transformation, better known (to me) from 2D fluid dynamics, to map the world into a variety of polygons. I was also surprised to see that Charles Sanders Peirce worked in this area.
Just for the hell of it I might place a bid on the book.
UPDATE: Come on! 'fess up if you're the person bidding against me!
Tuesday, March 07, 2006
Blind Games
Consider this trivial game: you deal a pack of cards to N players. The players take turns and the winner is the first one to get rid of all their cards. A turn consists of discarding a set of cards that all have the same number or rank. This game isn't very interesting.
But now apply the 'blind transform' to this game with the 'punishment' of picking up all of the discards. The players play cards face down and make some claim about what cards they are laying down. If someone challenges this claim you have a look and mete out the punishment. A trivial game has now been turned into a mildly entertaining game. It can be turned into an excellent game by means of the 'ethyl transform' - playing while drunk. But moving on...
One of my favourite games years ago, when I could talk people into playing it, was "Blind Le Truc". It's Le Truc but with all of the cards played face down but players claiming what the cards are. In this case the punishment is losing a trick. Even played 'sighted' this game is full of bluff. Playing it blind brings it to a whole new level. Players can play against each other for extended periods of time playing what is practically an imaginary game going through all the motions of Le Truc without ever seeing a card. It's a lot of fun. Well, I think so anyway.
Even the most trivial of games can become interesting. Eg. each player is given an entire suit of a deck of cards. Each round a player places one card in the middle. Whoever places the highest card in the centre wins that round. Play continues until nobody has any cards left and the winner is whoever won the most rounds. After applying the blind transform, with round losing as punishment, this game is turned into something fun, at least for a few minutes.
Poker is also a blind version of a trivial game.
How amenable are these games to analysis? That last game is pretty simple. Suppose two players each have only 3 cards. What is the optimal way to play?
You can also apply the blind transform multiple times. It's more interesting if the nth application has a tougher punishment than the (n-1)th. In that case it corresponds to raising the stakes for being caught, or accusing someone of, 'cheating'.
Friday, March 03, 2006
When is one thing equal to some other thing?
And here's a coincidence. I'd just done a web search to find out what properties the number 691 has. (It pops up in the theory of modular forms.) And then I see a link to this paper (completely independently) which uses 691 as an example on the first page. Weird!
Thursday, March 02, 2006
An Actual Application of Fractal Dimension
Of course, me mentioning this fact isn't intended to be an endorsement of the reliability of these methods. I put as much trust in them as I'd put in using catastrophe theory to navigate my way through a prison riot.
Wednesday, March 01, 2006
A Cautionary Tale for Would-Be Generalisers
Define sinc(x) = sin(x)/x and sinc(0) = 1
Let 'I' mean integration from 0 to infinity.
I sinc(x) = pi/2
I sinc(x)sinc(x/3) = pi/2
I sinc(x)sinc(x/3)sinc(x/5) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9) = pi/2
You see the pattern right? So the story is that some guy was evaluating these integrals for some reason or other getting pi/2 all the time. They happily chugged on getting...
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11) = pi/2
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11)sinc(x/13) = pi/2
So when the computer algebra package this person was using said:
I sinc(x)sinc(x/3)sinc(x/5)sinc(x/7)sinc(x/9)sinc(x/11)sinc(x/13)sinc(x/15) = 467807924713440738696537864469/935615849440640907310521750000*pi
they knew they'd hit a bug. They complained to the vendor who agreed that, yes, definitely, there was something screwy in their integration routine.
Except there wasn't. Weird eh?
I've known this story for a while but I've only just stumbled on Borwein's paper that explains the phenomenon. They're called Borwein Integrals now.
Update: Corrected a small error. It's pi/2 for the first 7 terms, not just the first 6. Even more amazing!
Monday, February 27, 2006
Counter-counterfactual Computation
Firstly, let's review classical computing. In a classical computer you have a machine in a state, call it s. You let a bit of time pass and it ends up in a new state, call it t. We can write this using the notation
|s> => |t>
That's basically all there is to know about computing, the rest is details. Let's look at an example. Suppose we have a computer with two one bit registers. The first is an on-off switch and the second will contain the result of the computation. For technical reasons we'll make it so that the answer doesn't just go into the second register but is exclusive-ored with it. We initialise the second register to zero so it's effectively the same thing as storing the result there. Write the state as |s,x> where s is the state of the switch and x is the second register. Suppose the result of the computation is r - this is the thing we are actually interested in. Then we have
|s,x> => |s,x^(s&r)>
In other words, this is saying that we only exclusive-or x with r if s is 1. (I used '^' to mean boolean 'exclusive-or' and '&' to mean 'and'.) For example
|0,0> => |0,0> and |1,0> => |1,r>.
Obviously s must be 1 in order to get a useful value of x.
Now let's move on to quantum computing. We no longer have a discrete set of states but instead a vector space. The time evolution of a quantum computer is given by a unitary linear operator. If state s evolves to state t we write this as
|s> => |t>
where we now interpret |s> and |t> as vectors in a normed vector space. The first important fact to know about quantum computers is that if |a> and |b> are possible states then so is any complex linear combination of these states. The second (fact (2)) is that if system A has states in a vector space V(A), and system B has states given by vector space V(B), then the combined system is given by V(A) (x) V(B), where (x) is the usual tensor product over C. The third important fact (fact (3)) is about the interpretation of the state |a>+|b>. This makes no sense classically, but in QM linear combinations (called superpositions) are allowed. The rule is that if you observe this state to see whether it's in state |a> or state |b> then it 'collapses' into state |a> or |b>. The probability of state |s> collapsing into state |a> is given by |<a|s>|^2/(<a|a><s|s>) where <x|y> is fancy notation for the inner product of |x> and |y>. (There's another rule of QM: if |a> and |b> are distinct possible observable outcomes of an experiment, then <a|b>=0.)
We now move our classical computer into Quantumland by considering the states |s,x> to be the basis elements of a vector space. Using fact (2) above we can consider these to be |s> (x) |x>. The rule
|s,x> => |s,x^(s&r)> (rule A)
defines what happens to each basis element and hence defines a linear map. On basis elements the quantum computer acts just like the classical one. But with a quantum computer we have available all kinds of interesting linear operators that make no sense for a classical computer. In particular, given a register we can apply this operator to it:
|0> => a*(|0>+|1>)
|1> => a*(|0>-|1>)
where a = 1/sqrt(2). This is linear and unitary. Notice also that if we do it twice we get back to where we started from. To make things easy we'll drop the constant a and work with linear operators that are unitary up to multiplication by a constant. The state |0>+|1> is a kind of weird mixture of on and off-ness.
So the trick to counterfactual computation is that instead of switching on the computer you instead 'half' switch it on using the linear operator above. Note that if you 'half' switch it on twice it just ends up being off again. But instead of 'half' switching it on twice in a row, we half switch it on, allow the quantum computer to evolve by rule (A) and then half switch it on again. Let's work through the details. First note that we are only 'half' switching on the first register. This means the linear operator applies only to the first factor in the tensor product |s>(x)|x>. So we get:
|0,0> => |0,0> + |1,0>
Now consider the result of this followed by allowing the computer to evolve by (A)
|0,0> => |0,0> + |1,r>.
And now follow this by another 'half' switch on:
|0,0> => |0,0> + |1,0> - |1,r> + |0,r>.
If r = 0 then:
|0,0> => |0,0> (modulo a constant multiple)
In other words, the state is unchanged.
If r = 1 then:
|0,0> => |0,0>+|1,0>+|0,1>-|1,1>
We no longer have the cancellation. Following fact (3), because of the |0,1> term above we have a non-zero probability of finding that the register x contains the answer r and yet the on-off switch is off.
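The whole walkthrough above is easy to check mechanically. Here's a sketch that stores an unnormalized state as a map from basis labels (switch, register) to amplitudes and replays the half switch / rule (A) / half switch sequence; the representation and names are my own, not any quantum computing library:

```haskell
import Data.Bits ((.&.), xor)
import qualified Data.Map as M

-- Unnormalized state: basis label (switch, register) |-> amplitude.
type State = M.Map (Int, Int) Double

-- Apply a linear map given on basis elements, collecting like terms.
apply :: ((Int, Int) -> [(Double, (Int, Int))]) -> State -> State
apply f st = M.filter (/= 0) $ M.fromListWith (+)
  [ (b', c * c0) | (b, c0) <- M.toList st, (c, b') <- f b ]

-- The 'half' switch on the first register, constant dropped.
halfSwitch :: (Int, Int) -> [(Double, (Int, Int))]
halfSwitch (0, x) = [(1, (0, x)), ( 1, (1, x))]
halfSwitch (_, x) = [(1, (0, x)), (-1, (1, x))]

-- Rule (A): |s,x> => |s,x^(s&r)>.
ruleA :: Int -> (Int, Int) -> [(Double, (Int, Int))]
ruleA r (s, x) = [(1, (s, x `xor` (s .&. r)))]

-- Half switch on, evolve by rule (A), half switch again, from |0,0>.
run :: Int -> State
run r = apply halfSwitch (apply (ruleA r) (apply halfSwitch (M.singleton (0, 0) 1)))
```

run 0 gives 2|0,0> while run 1 gives |0,0>+|1,0>+|0,1>-|1,1>, matching the two cases above.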
And that's all there is to it. With a bit more cunning you can actually get the probability of finding r as high as you like. There are some great potential applications here. We have effectively left x untouched and this translates into practical 'interaction-free' experiments that allow us to make certain kinds of measurement leaving the measured system unchanged. (You may see why I used 'exclusive-or' above - just copying the result into the second register wouldn't have given me a unitary operator.)
BUT, and this is a big but, I see no valid way to interpret this as 'counterfactual computation'. In other words, I completely disagree with the interpretation of this result. In particular, suppose r equals zero. Then the reason we end up with the result |0,0> is that we have had a fortuitous cancellation. But we have actually passed the register |0>+|1> through the system. We only get |0,0> at the end because our second 'half' switching operation has made the effects of this 'destructively' interfere. In other words, it's just like we put on noise-cancelling headphones and then claim that there was no sound at all. That's silly, there was lots of sound, we just arranged for it to cancel at our ears. In our quantum computer both |0> and |1> waves passed through the system, but the second 'half' switch operation made them cancel. In the r=1 case you can think of the shapes of these waves being messed up so they no longer cancel. (By the way, the talk of waves is quite fair here, you can think of the elements of these state vector spaces as in fact being wavefunctions and the linear operators as time evolution of differential equations remarkably similar to the wave equation.)
In other words, I'm calling these physicists' bluff. There is nothing counterfactual about this computation at all. Though we might be able to get the result of our computation with the switch in the off state, that's only because it was partly on during the computation and we arranged for the state of that register to be destructively cancelled out at the end. I claim that the descriptions given of the experiment are completely bogus.
In fact, if you read Jozsa and Mitchison's paper you'll see that in their discussion their language is quite guarded. I think they realise that this talk of counterfactuality is a little bogus.
Update: Scott Aaronson rants about this too over here. Make sure you read Kwiat's response.
Sunday, February 26, 2006
The ambiguous operator, pt.2
There are a number of languages that have support for continuations built in, including Scheme with its call-with-current-continuation function. But it takes a little care to understand exactly what is meant by "wind back" in this context. Although the state of the current computation is wound back, including things like local variables and the state of the functions currently being executed, global variables remain unchanged. So even though we wind back time and throw away much of the current state some information can still "leak through" from the path that we are now pretending never happened.
So, to get back to the devilish story of part 1: suppose we have C keywords called TRY and FAIL with the bizarre property that TRY functions just like the return statement except that if a later FAIL statement is met your program winds back to the last TRY statement, undoing any effects caused by the previous return value, and continues execution after the TRY statement. The ambiguous operator would be implemented like this:
int amb(int a,int b)
{
TRY(a);
return b;
}
amb(a,b) would return a. But if a later FAIL is met it would unwind the computer back to its previous state and then continue execution with the return b line. Global variables, however, would not be undone. Given such keywords we can now define:
int devilChoice()
{
printf("I choose B\n");
TRY(B);
printf("Sorry I meant to say A\n");
TRY(A);
}
void devilCheats()
{
FAIL;
}
But C has no such keywords and there's no way to write such functions - unless you cheat a bit. The required code is on my web site here. The trick is to freeze the state of the program by literally reaching in and taking a copy of the C stack. Remarkably this code runs fine, without modification, when compiled with gcc, mipspro or MSVC on IA32, AMD64, PowerPC and mips under Windows, MacOS X, Irix and Linux for all legal combinations of those compilers, processors and operating systems.
But despite the implementation of TRY and FAIL being an ugly hack they really are nice operators to have. Papers like Wadler's lend a little legitimacy to continuation shenanigans by showing there is a nice theory underlying them, and that by allowing them you bridge the gap between intuitionistic and classical logic in the Curry-Howard isomorphism. They can also make for some surprisingly simple implementations of problems. For example, using TRY you can write an elegant non-brute force C sudoku solver in a couple of lines. In fact, you find yourself able to write code that is declarative rather than imperative, even though you are working in C. The backtracking is handled behind the scenes by TRY and FAIL which can be nested recursively (as suggested in McCarthy's original paper) without any difficulty. I originally wrote that code in order to implement a pattern matcher similar to Mathematica's over ten years ago. It took me several years to figure out how to rewrite it without using continuations - using continuation passing style instead. (I had no Computer Science training, I hadn't even heard of this stuff.) Nowadays I'd just use a backtracking monad in Haskell...
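As a taste of the backtracking-monad version, here's a sketch of amb using Haskell's list monad: each amb offers alternatives, guard plays the role of FAIL, and the monad's bind quietly does the unwinding (the amb and triples names are mine, and triples is just an illustrative search):

```haskell
import Control.Monad (guard)

-- amb is just the list of alternatives; the list monad's bind
-- explores them and backtracks for us.
amb :: [a] -> [a]
amb = id

-- Example: Pythagorean triples by declarative search.
triples :: [(Int, Int, Int)]
triples = do
  a <- amb [1 .. 20]
  b <- amb [a .. 20]
  c <- amb [b .. 20]
  guard (a * a + b * b == c * c)  -- guard plays the role of FAIL
  return (a, b, c)
```

take 2 triples gives [(3,4,5),(5,12,13)]; a failed guard simply prunes that branch of the search and execution resumes at the last choice point.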
For an implementation in Ruby see randomhacks. For code similar to my C code see Aubrey Jaffer's implementation (in C) of Scheme. You can also do the same thing semilegally through the setcontext() and getcontext() functions if they are available for your operating system. You can probably do something similar using threads or fork().
Saturday, February 25, 2006
The ambiguous operator, pt. 1
My recent interest in logic is partly motivated by a desire to understand the paper Call by Name is Dual to Call by Value. I suspect the paper is almost trivial, but that you have to be very familiar with the prerequisites before you can appreciate its triviality. (I.e. I'm using the mathematician's definition of trivial of course :-) )
Now I don't understand that paper yet, but on the seventh page there is a story about the Devil. Below I have a computational representation of the story written in C. The idea is, what must the functions devilChoice() and devilCheats() do in order to allow the devil to capture this soul? devilChoice() must return A or B.
#include <stdio.h>
#include <stdlib.h>

#define A 0
#define B 1

/* The devil's moves - to be defined; that's the exercise. */
int devilChoice(void);
void devilCheats(void);
int my_main() {
int choice;
static int devilMoney = 0;
static int suckerMoney = 0;
static int suckerKarma = 0;
printf("Devil says:\n");
printf("Here's my offer.\n");
printf("I will choose either A or B\n");
printf("If A I'll give you $1000000000\n");
printf("If B I'll give you any wish for $1000000000\n");
printf("Do you accept?\n");
printf("Sucker says: 'yes'.\n");
choice = devilChoice();
switch (choice)
{
case A:
if (devilMoney<1000000000)
{
printf("Devil is unable to pay up.\n");
printf("Devil is in big trouble...\n");
exit(1);
}
printf("Devil gives 1000000000 to sucker\n");
suckerMoney += 1000000000;
devilMoney -= 1000000000;
break;
case B:
printf("Sucker begs, borrows, steals $1000000000\n");
suckerMoney += 1000000000;
suckerKarma -= 1000000000;
printf("Sucker gives money to Devil\n");
devilMoney += 1000000000;
suckerMoney -= 1000000000;
printf("Sucker is about to make wish...\n");
printf("'I wish to go to heaven'\n");
devilCheats();
printf("Sucker goes to heaven and devil loses a soul\n");
exit(1);
break;
}
if (suckerKarma<0)
{
printf("Sucker goes to hell with $%d\n",suckerMoney);
}
return 0;
}
(Hmmm...syntax colouring didn't work out quite as well as I wanted.)
In fact, I'm going to leave this as an exercise and give a possible solution another day. But the title of this post gives a clue - and of course the paper proposes a solution too. The code doesn't need to be 100% portable.
PS You don't need to ask. I have a solution in not quite 100% portable C. But of course I cheat. But is there an interesting cheat?
Friday, February 24, 2006
The cut rule and reading seminars
I'm sure that when set theorists first tried to write down the axioms that became ZF they found lots of redundant axioms. Over the years these were whittled down to the list we have today, so that I bet you can't even name the axioms that were jettisoned for redundancy. Not so in "Gentzen style" logic. Every document ever written on the subject seems to introduce this rule and then with a flourish they show how it can be eliminated. They develop the whole subject and then proceed to demolish the earlier work by showing how they can rewrite everything they did earlier without using this rule. The only explanation I can come up with is that authors of books on logic are paid by the word and that this allows them a few extra chapters for very little work.
Of course the problem here must be me. I'm sure there's a perfectly good reason for harping on about the cut rule, I just don't see it. And I think this points to a difficulty with trying to read mathematics texts outside of academia. When you're a student you're sharing extratextual material all the time. People tell you that such and such a result is interesting because it has an application somewhere and then you go and read the formal details knowing what the motivation was. Or someone will give an informal seminar where they write equations on the board, but speak much more informally between the equations. But most mathematics texts hide this material from you and present the bare facts. This is fine when you're in an academic environment, but I have to confess to finding it difficult when working on my own.
One thing I'd love to see online is the equivalent of the reading seminars we did during my PhD work. Each week we'd read a chapter and then have a seminar to discuss what we'd just read. Does anyone do this? Blogs seem like a great way to do this but I've seen no evidence of groups working like this.
Anyway, I shall try to persist until I see the light...
Friday, February 17, 2006
Mathematics has Competition
I'm inclined to ask, with the article's author, "Isn't computer science really just a stepchild of mathematics?"
Thursday, February 16, 2006
Young babies count
Of course it's not clear that this is "counting" in any conventional sense. Additionally, some monkeys can count this well (and I think also many other mammals). So I'm not sure what conclusions can be drawn from this experiment.
Wednesday, February 08, 2006
What's the most implausible but true mathematical proposition?
Addendum: According to this paper, even though the Khinchin property is true for almost all real numbers, nobody has explicitly found a number for which it is true! You can artificially construct a number for which it is true but that doesn't give you an explicit result, and anyway, that's cheating. Experimentally, π gives something near the Khinchin constant, but there's no proof of what happens in the limit.
Monday, February 06, 2006
Is Mathematics too Easy?
It turns out that this is more or less what reverse mathematicians do. They take theorems such as the well known Bolzano-Weierstrass theorem and try to figure out precisely how much mathematical machinery is required to prove them. There is no need to use all of ZFC, or even ZF, to prove such a result.
So reverse mathematicians sometimes start with something simpler like second order arithmetic. In its universe are integers and sets of integers, and nothing else. It can still talk about real numbers by encoding them as Cauchy sequences in the usual way but there is no way to encode subsets of the reals in this way. Maybe surprisingly, it is possible to represent continuous functions because the rationals are dense in the reals so continuous functions are defined by the values they take at rational numbers. It turns out that almost all classical mathematics can be encoded in this system even though it's much weaker than ZF.
But usually reverse mathematicians like to go even weaker still and work with recursive comprehension which is essentially the Peano axioms combined with induction and a restricted comprehension axiom. Using these weak tools it's still possible to prove things like the Intermediate Value Theorem, the uncountability of the reals and even results about metric spaces such as the Baire category theorem (but only for separable spaces).
One thing that has always intrigued me about number theory is how many theorems are proved by making excursions into analysis. For example, think about results in analytic number theory that can easily be stated using the language of Peano's axioms, and yet whose proofs require techniques like contour integration that make reference to infinite sized objects and limits. In 1988 Stephen G Simpson proved that such theorems can in fact be reduced to "primitive recursive arithmetic" and hence turned into "finitistic" proofs. On the other hand, I'm not sure that these proofs are necessarily going to give any insight into why the theorems are true. They probably end up being as unwieldy as Bourbaki's definition of one.
Sunday, February 05, 2006
An End to Coding Theory
The idea behind the theory is simple. Suppose there is a set M of messages, one of which you want to send over a noisy communication channel. If you simply send one of the messages it might be corrupted and interpreted as another. So instead you embed the set M inside a larger set C and send an element of C as a message. If the message is corrupted then the receiver might get a message in C but not in M. They'll know right away that the message has been corrupted. But even better, if the embedding of M in C is designed cunningly enough they'll be able to make a good guess as to what the original message was assuming some probability model for the corruption. You pay a price of course, the elements of C typically take longer to transmit than the elements of M, reducing the rate of data transmission.
There have been lots of proposed schemes over the years since Shannon started the field. They use a variety of methods from discrete mathematics and algebraic geometry. There are nice connections between the sporadic groups and coding theory. The Mathieu groups arise directly as symmetries of the Golay code, and the Golay code can itself be used to construct the Leech lattice, whose symmetries are described by the Conway groups. A common feature of these codes is that they are linear. The set C is a vector space over F2, the field with two elements. M is a subspace of this space. M is the kernel of an F2-linear map represented by a matrix called the parity check matrix; the result of applying this matrix to a received message is called the syndrome. If the syndrome is zero, then it's likely that the message has arrived uncorrupted. Much of the work in coding theory has been about generating suitable parity check matrices.
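To make the syndrome idea concrete, here's a sketch using the classic (7,4) Hamming code, a small linear code whose parity check matrix has the numbers 1 to 7 as its columns in binary, so the syndrome of a single-bit error spells out the error's position (the code and names here are illustrative, not from any library):

```haskell
import Data.Bits (xor)

-- Parity check matrix of the (7,4) Hamming code over F2;
-- column j is the number j written in binary.
h :: [[Int]]
h = [ [0,0,0,1,1,1,1]
    , [0,1,1,0,0,1,1]
    , [1,0,1,0,1,0,1] ]

-- Syndrome of a received word: the matrix applied over F2.
syndrome :: [Int] -> [Int]
syndrome w = [ foldr xor 0 (zipWith (*) row w) | row <- h ]

-- A single flipped bit in position i gives syndrome i in binary,
-- so we can locate and repair it.
correct :: [Int] -> [Int]
correct w = case foldl (\n b -> 2 * n + b) 0 (syndrome w) of
  0 -> w
  i -> [ if j == i then b `xor` 1 else b | (j, b) <- zip [1 ..] w ]
```

correct [0,0,1,0,0,0,0] recovers the zero codeword: the syndrome [0,1,1] reads as 3 in binary, pointing at the flipped bit.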
But in the sixties Robert Gallager looked at generating random sparse parity check matrices and found that the resulting codes, called Low Density Parity Check (LDPC) codes, were good in the sense that they allowed messages to be transmitted at rates near the optimum rate found by Shannon - the so-called Shannon limit. Unfortunately the computers of the day weren't up to the task of finding the most likely element of M from a given element of C. But now they are. We now have near-optimal error correcting codes and the design of these codes is ridiculously simple. There was no need to use exotic mathematics, random matrices are as good as almost anything else. The past forty years of coding theory have been, more or less, a useless excursion. Any further research in the area can only yield tiny improvements.
A good place to learn about these codes is in the book by one of the 'rediscoverers' of Gallager's codes, David MacKay. There's also the Wikipedia article. I originally heard of these codes from a friend of mine and was sceptical at first. Then I read this Science News article and I'm now a bit embarrassed about my scepticism.
Monday, January 30, 2006
The Flyspeck Project
Note that this is different from the four colour problem. For that problem, the proof required an enumeration that was too long and complicated to carry out by hand, so a machine carried it out. For this problem we have a slightly different situation: we already have a potential proof, but we need to reduce it to a completely formal form so that the entire thing can be verified to follow from axioms.
Sunday, January 29, 2006
Biggest Test of Benford's Law?
Friday, January 27, 2006
De Rham Cohomology and Computer Algebra
So I was surprised to find that in recent years computers have been calculating de Rham cohomology, and doing so using methods from algebraic geometry. In fact, functions to compute the de Rham cohomology of the complement of various algebraic varieties have been available in Macaulay2 for a while. There's a chapter of the Macaulay2 book available on the subject. The methods used are based on D-modules and discussed in the book Gröbner Deformations of Hypergeometric Differential Equations.
I have no idea where the connection with hypergeometric series is coming from but it's interesting to see the connection with algebraic geometry. D-modules are modules over the Weyl algebra, the ring with elements xi and di with the property that xixj=xjxi and didj=djdi but dixj-xjdi=δij. We can form ideals of this ring and study them using Gröbner basis methods. Given a smooth function we can form the ideal of elements that annihilate the function by considering the di to be the differential operators d/dxi. This gives a connection between the methods of algebraic geometry and differential geometry. But that's as far as my knowledge extends for now.
Saturday, January 21, 2006
Geometry is Innate
It's curious to note that the people of this culture weave baskets that appear to involve interesting geometry and whose design look anything but 'innate'. And though 'remote' they also wear glasses and Western style clothing. So clearly they do transmit 'geometric' culture, even if it's nonverbal, and they have had non-trivial contact with cultures that do have a strong geometric vocabulary.
Thursday, January 19, 2006
Eleven Reasons to use Haskell as a Mathematician
(1) Haskell is Functional
Functions are the bread and butter of mathematics so it makes sense to use a programming language in which functions are first class objects. It's not at all unusual for a mathematician to take a function of two arguments, say f(.,.), and construct from it a function of a single variable defined by f_x(y) = f(x,y). Computer scientists call such a thing a closure - it's a function constructed by supplying values for some, but not all, of the arguments of another function. It's one of the most natural operations in the world and yet it's actually quite horrible to construct such a thing in languages like Fortran, C or C++. It's slightly cleaner in C++ because you can use objects and templates to simulate closures, but the syntax is still pretty hideous. (I know there are libraries that make this even easier - but they do so by implementing a functional language as a 'domain specific language' within C++, essentially making my point for me.) Here's how you do this in Haskell: you write "f x". The way it works is quite neat. Given sets A, B and C there is a natural bijection between the set of functions AxB->C and the set of functions A->C^B. So rather than write functions of multiple arguments, Haskell functions always take one argument and return the function that operates on the rest of the arguments. At first this might seem like a bad idea because functions of multiple arguments are now, in some sense, second class objects. But it's not a problem at all. To apply the function f to the arguments 1 and 2 we simply write "f 1 2".
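Here's that idea in a couple of lines of Haskell (f and fx are made-up examples):

```haskell
-- A two-argument function...
f :: Int -> Int -> Int
f x y = 10 * x + y

-- ...and the closure f_3: just apply f to its first argument.
fx :: Int -> Int
fx = f 3
```

fx 4 is f 3 4 = 34, and "f 1 2" really parses as "(f 1) 2": apply f to 1, get a function back, apply that to 2.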
To get an idea of how horrible it can be to work with functions without functional programming consider the code in Numerical Recipes in C. Most of their code is 'functional' in character in the sense that it is intended to perform operations on functions. But look at how you have to make use of horrible void * pointers to pass in those extra arguments to simulate closures.
(2) Haskell Insulates You
As a mathematician you're more interested in solving the problem than dealing with the idiosyncrasies of the hardware and issues like memory management. Haskell does a great job of insulating you from the underlying hardware so that you don't have to worry about these things. Languages like Java aren't too bad at this either. C++ is terrible: if you decide you don't want to manage memory yourself you can use shared_ptr<> objects and turn your code into a horribly unreadable mess.
(3) Haskell Performs Well
It's surprising how well Haskell performs. You might think "if every time I add a to b it has to construct a temporary intermediate object that represents the function that adds a to a number and then apply that function to b it's going to be slow". But that is not the way to look at things. Haskell doesn't simply execute code by working top to bottom, left to right through your source code. The compiler (itself written in Haskell) is pretty sophisticated and frequently performs comparably to C. It's generally not as fast as C, but it's usually in the same league. The Haskell compiler rips apart your code and reconstitutes it as an executable in ways that are quite surprising. In Haskell you can often write code using idioms that feel natural, but are generally considered slow in other languages, without performance impact. One example is in list handling. Often you want to return a temporary list of objects and perform some operation on them. In many languages there is an overhead in making such a list, only to take it apart again when you want to operate on the actual elements, so you instead have to restructure your code so that no list is constructed. In Haskell the compiler will often figure out that your list is temporary and do the restructuring behind the scenes for you.
(3) Haskell is Declarative
Mathematicians like to spend their time saying what is the case rather than thinking in terms of sequences of operations. If b = f(a) and c = f(b) then it doesn't matter if you say c = f(b) and then b = f(a). Order is irrelevant. In an imperative language like C it matters to a large extent what order your code is written in because a statement like b = f(a) is an instruction. In Haskell you aren't giving a list of instructions so much as stating what is true. This fits much better with the way that mathematicians work. Haskell can take care of some of the details of precisely what order the operations are going to be performed in and hide those details from you.
(4) Haskell is Lazy
This goes hand-in-hand with Haskell being declarative. Haskell only computes what it needs to compute in order to get the answer you require. It tries not to do unnecessary work. Languages that aren't lazy like this are called 'strict' languages. In a strict language, say, if you required the first component of the cross product of two vectors, you would typically compute all three components of the product and throw away the second and third. If your code needed to be efficient you'd probably end up writing special purpose code to compute just the first component of the cross product. You'd be unable to reuse your cross product code. Mathematicians don't want to do this. They want to say that x is the first component of the cross product because it's a true statement and not worry about what exactly gets implemented when. This laziness is ubiquitous throughout all of Haskell.
One place that laziness really pays off is in working with lists. Haskell has absolutely no problem working with infinite lists as long as you don't actually try to look at all of the elements. Instead of evaluating all of the elements it simply retains enough information to construct as many elements as are needed. Computing the elementwise sum, say, of two infinite lists is completely straightforward. And performing computations with power series, say, where the later elements are functions of the earlier elements, is absolutely straightforward. You don't have to check to see if you've computed enough of the earlier elements to compute the later one. Haskell does all that for you. If you want to see the first ten elements of the final infinite list you can just ask for them. You don't need to decide in advance how many elements you want and build that into the code from the beginning.
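A small sketch of this (the names are mine): two infinite lists, their elementwise sum, and the classic self-referential Fibonacci list where later elements are functions of earlier ones:

```haskell
-- Infinite lists: nothing is computed until demanded.
nats, squares, sums :: [Integer]
nats = [0 ..]
squares = map (^ 2) nats
sums = zipWith (+) nats squares   -- still infinite, still fine

-- Later elements defined in terms of earlier ones.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```

take 5 sums gives [0,2,6,12,20] and take 10 fibs gives the first ten Fibonacci numbers; nothing beyond the demanded prefix is ever evaluated.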
(5) Haskell is good at Datastructures
Haskell is good at dealing with datastructures. Often mathematicians don't realise that they are using datastructures all the time. But they are. For example a group is a set equipped with functions to perform multiplication and inversion as well as a singled out element, the identity. Haskell makes it very easy to work with structures that are tuples of objects like this. It's also very easy to build one datastructure out of another and even specify rules about datastructures. For example a matrix is a two-dimensional array of objects. But if the objects form a ring then so do the matrices. This kind of statement is easy to express in Haskell making it well suited to working with algebraic structures. If you've defined formal power series over a ring and you've defined the ring of polynomials over another ring it's trivial to build the ring of power series over polynomials over the integers, say. What's more, the compiler can tell you when the structure you're trying to create doesn't make sense, for example if you were to try to multiply two matrices whose underlying elements can't be multiplied. A programming language like Mathematica is very poor at this as it doesn't really distinguish between datatypes. You'll only discover your error when the program is actually running and it tries to multiply the underlying objects. Haskell also uses type inference. You don't have to explicitly tell it the type of something because it can often figure it out for itself. If you see the number '1' in a textbook what does it mean? Is it an element of N, Z, Q, R, R[sqrt(2)], C or some other completely different algebraic structure? It could be any of these things depending on the context. The same is true in Haskell, it'll figure out what you mean by '1' using completely unambiguous rules. This means that you don't have to use a different symbol for '1' in different contexts, and yet you don't have to laboriously state what you mean every time you use it.
And rather than having an ad-hoc system of datastructure types Haskell's is algebraic.
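As a sketch of the 'if the elements form a ring, so does the structure' idea, here's a toy Pair type (my own, much simpler than a full matrix type) made into a ring via a Num instance:

```haskell
-- Pairs over any ring form a ring, componentwise.
newtype Pair a = Pair (a, a) deriving (Eq, Show)

instance Num a => Num (Pair a) where
  Pair (a, b) + Pair (c, d) = Pair (a + c, b + d)
  Pair (a, b) * Pair (c, d) = Pair (a * c, b * d)
  negate (Pair (a, b))      = Pair (negate a, negate b)
  fromInteger n             = Pair (fromInteger n, fromInteger n)
  abs    = error "not needed for this sketch"
  signum = error "not needed for this sketch"
```

Because the instance is polymorphic, Pair (Pair (1,2), Pair (3,4)) + 1 typechecks too: pairs of pairs are again a ring with no extra code, and the compiler rejects a Pair of things that can't be multiplied.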
(6) Haskell is Compact
Not in the topological sense. Haskell code tends to be much shorter than its counterpart written in other languages. There is little extraneous verbiage and the verbiage that does exist is very powerful. This is more than a space saving exercise. Consider the history of calculus in n dimensions. When Maxwell first wrote down his equations of electrodynamics they took up most of a page. With the introduction of vector notation they were simplified. By using the notation of exterior calculus they were simplified to dF = 0, d*F = *J. This isn't just typographically simpler but conceptually simpler. You can see immediately how you might generalise to n dimensions instead of three, you can see trivially that d*J=0 from d^2 = 0 and you can now perform countless computations in a few lines that previously would have taken dozens of pages. What's more the new notation makes it harder to accidentally write equations of motion that don't make physical sense because they automatically obey symmetries that are difficult to see in more complex notation. Haskell is just like this too. You can do a lot with a tiny amount of code. And if two blocks of code appear to do similar things you can often factor out the common part and make it a separate block of code, in a way that would be much harder in just about any other language. This makes it much easier to generalise your code and have insights about it. This works well with the way mathematicians like to think about things.
(7) Haskell is Mind Expanding
Haskell draws on ideas from interesting branches of mathematics, for example Category Theory and Logic. And the logic part doesn't appear the way you expect - it appears via the Curry-Howard 'isomorphism' in the type system used to define datastructures. As mentioned above, Haskell code lends itself well to generalisations. You'll find all sorts of commonality between things that previously didn't seem similar, and you'll find ways to express that similarity in your code. One of my favourite examples is by Ralph Hinze. He doesn't just illustrate the
similarity between binomial trees and binary arithmetic, he actually shows how merging binomial heaps and adding binary numbers are instances of the very same thing. Haskell encourages this kind of abstraction. Though I've wandered into the realm of computer science with talk of trees this is still the type of abstraction that mathematicians find interesting.
(8) Haskell Has No Side Effects
When you add two numbers you get the sum. You don't find that something else in the universe has happened as well (except maybe the temperature in the room goes up if it's a really hard sum). So functions in a programming language shouldn't have any side effects. In Haskell a
function does nothing other than return the value of its result. This means that it's easy to bring your mathematical experience to bear when analysing programs. (Almost) all of the usual rules for manipulating mathematical expressions apply so that, for example, if b = f(a) and c =
g(b,b), then c = g(f(a),f(a)). In many other programming languages you'd have to make sure that calling f twice didn't have some side effect that made it behave differently to calling it once.
(9) Haskell Eats Circular Definitions for Breakfast
In most imperative languages a line of code like x=f(x) means replace the value of x with f applied to the old value of x. In Haskell it means x equals f(x). It has no problem with you defining a variable in terms of itself and x=f(x) means that x is a fixed point of the function f. In practice it's not so simple, you can't just write x = 1-x^2 and expect it to figure out that x is the reciprocal of the golden ratio. Haskell is a programming language not an equation solving package. But in my formal power series library I'm able to use a line like "e = 1+integrate e" to define the power series for exp x. See here for another example of using circular definitions to solve differential
equations.
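Here's a miniature of that "e = 1+integrate e" trick, with a power series represented as its infinite list of coefficients (this integrate is a two-line stand-in for the library version):

```haskell
-- Integrating a power series shifts the coefficients and divides:
-- the coefficient of x^n in the integral is the old x^(n-1) one over n.
integrate :: [Rational] -> [Rational]
integrate xs = 0 : zipWith (/) xs [1 ..]

-- e = 1 + integrate e, written out on coefficient lists.
e :: [Rational]
e = zipWith (+) (1 : repeat 0) (integrate e)
```

take 5 e gives [1, 1, 1/2, 1/6, 1/24], the Taylor coefficients of exp x: laziness supplies each coefficient just before it's needed, so the circularity is harmless.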
(10) Haskell is Well Founded
What I mean is that Haskell is built on fairly solid mathematical foundations in lambda calculus and proof theory. It's not just an ad hoc language that has accreted features over the years. Everything is there for a reason and people have thought long and hard about the theoretical underpinnings of what went into it.
(11) Haskell is Pretty
Well, that's a matter of taste. But I find it very elegant.
That's 11. I think there's a rule against making lists with a prime number of elements but I'm ignoring it.
To be fair I ought to point out that it's not all a bowl of cherries. Haskell isn't very good at dealing with arrays. In most programming languages it takes time O(1) to access the ith element of an n element array. In Haskell you can get this kind of performance if you don't mind using the various painful-to-use array operators available, or you can sacrifice performance, keep the elegance, and get O(log n) performance. On the other hand, you'd be surprised just how many useful matrix algorithms don't require random access to the ith element.
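One common way to get the O(log n) option while staying pure (an illustration, not the only choice) is Data.Map, which ships with GHC: "updating" it produces a new map sharing most of its structure with the old one, so the old version remains valid.

```haskell
import qualified Data.Map as Map

-- An immutable map indexed by Int: lookup and insert are O(log n).
arr :: Map.Map Int Double
arr = Map.fromList (zip [0 ..] [10.0, 20.0, 30.0])

-- "Updating" builds a new map; the original arr is untouched,
-- which is exactly what purity demands.
updated :: Map.Map Int Double
updated = Map.insert 1 99.0 arr
```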
Because Haskell functions don't have side effects it doesn't make sense to perform an operation more than once because it'll just do the same thing each time. So there's no notion of a 'for' loop (well, maybe there is) in Haskell and instead you must find other ways to express iterative operations - i.e. using 'higher order functions' or recursion. But sometimes you feel like you just want to write the loop and be done with it. There's always another way of expressing the problem but when the iterative approach is the first thing to pop into your head it can occasionally be frustrating. And sometimes you'd like side effects.
Haskell comes with a library of functions called the 'standard prelude'. Unfortunately, whoever designed it knew just enough mathematics to be dangerous and made a complete hash of it. Instead of using a well known hierarchy of algebraic structures such as group->ring->field they defined all kinds of bizarre structures starting with something like a ring with a norm or valuation. And while the library supports complex numbers it's not flexible enough to support the Gaussian integers (probably a consequence of forcing rings to have a norm). All things that could have been implemented correctly in a few seconds if a mathematician had been consulted early on.
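The Gaussian integer complaint is easy to make concrete. You can bolt a Num instance onto them, but the class insists on abs and signum - essentially a norm mapping back into the same type - and the Gaussian integers have no natural candidate (their norm a²+b² lands in the ordinary integers). A sketch of the awkwardness:

```haskell
-- Gaussian integers a + b·i.
data Gauss = Gauss Integer Integer deriving (Eq, Show)

-- A perfectly good ring... except that Num also demands abs and signum,
-- which have no natural Gaussian-integer values, so we must stub them out.
instance Num Gauss where
  Gauss a b + Gauss c d = Gauss (a + c) (b + d)
  Gauss a b * Gauss c d = Gauss (a * c - b * d) (a * d + b * c)
  negate (Gauss a b)    = Gauss (negate a) (negate b)
  fromInteger n         = Gauss n 0
  abs    = error "no natural abs for Gaussian integers"
  signum = error "no natural signum for Gaussian integers"
```

The ring operations work fine - i squared really is -1 - but the instance is only honest up to the two stubbed methods, which is precisely the design flaw being complained about.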
And the last caveat. I found learning Haskell hard. Let me put it in perspective: I'm 39. I've been programming since I was 11. I've been writing recursive code since about 15 and I've been using whatever approximations to functional programming are available to me in any language I use for almost that long (including torturing myself with C++ template metaprogramming). I've even implemented a couple of declarative languages, Prolog and SASL, the latter being a functional lazy language a lot like Haskell that I thought I'd investigate before Haskell. And I still found Haskell tricky to learn. But it was worth it. And I believe that the trickiness of the language is the kind of trickiness that mathematicians should be able to handle. (And it probably helps if you don't know any C or C++...)
A friend of mine and I have been writing all kinds of mathematical code and finding Haskell well suited to the task. Logic (eg. building analytic tableaux as in Smullyan's book on first order logic), Group Theory (eg. the Schreier-Sims algorithm), Calculus (automatic differentiation), Combinatorics (via formal power series), Algebra (eg. Gröbner bases) and now we're investigating relationships between elliptic functions and modular forms. (Hmmm...my friend takes most of the credit here.) Haskell is great at all of this stuff even though it's not in any way an algebra or mathematics package.
And one last thing about Haskell. If x is an integer (called an "Integer" in Haskell) then x+1>x. It's nice to work over the integers rather than a cheap and nasty 2-adic approximation (though the cheap and nasty approximation is there if you want it.)
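The difference is easy to demonstrate. (One caveat for the sketch below: the Haskell standard leaves fixed-width overflow undefined, but GHC's Int wraps around two's-complement style, which is what makes the first test fail.)

```haskell
-- With GHC's fixed-width Int, x+1 > x fails at the top of the range
-- because the sum wraps around to minBound.
wraps :: Bool
wraps = let x = maxBound :: Int in x + 1 > x      -- False under GHC

-- With arbitrary-precision Integer, x+1 > x holds for every x.
honest :: Bool
honest = let x = fromIntegral (maxBound :: Int) :: Integer in x + 1 > x
```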
Friday, January 13, 2006
You're needed!
Friday, January 06, 2006
General Relativity for Dummies
Given a small ball of freely falling test particles initially at rest with respect to each other, the rate at which it begins to shrink is proportional to its volume times: the energy density at the center of the ball, plus the pressure in the $x$ direction at that point, plus the pressure in the $y$ direction, plus the pressure in the $z$ direction.
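In symbols (this is my transcription of Baez and Bunn's statement, with c = 1, V the volume of the ball, and the derivative taken at the moment the particles are at rest):

```latex
\left.\frac{\ddot V}{V}\right|_{t=0} = -4\pi G \left(\rho + p_x + p_y + p_z\right)
```

Here $\rho$ is the energy density at the center of the ball and $p_x$, $p_y$, $p_z$ are the pressures in the three coordinate directions at that point.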
More information is here. He also has a derivation of Newton's law of gravitation here.
As Baez points out - at first it looks implausible because the Einstein field equation is a tensor equation and this looks like a scalar equation. But the statement must hold true for any set of test particles that are at rest relative to each other, so at any point it gives many constraints, not just one.
This is pretty neat stuff and it ought to go into all GR textbooks...
Tuesday, January 03, 2006
Wednesday, December 14, 2005
Theorem Proving and Code Generation
(I only learned about the Curry-Howard isomorphism in the last couple of years but it's probably of general interest. So I recommend reading the Wikipedia article I link to if you haven't met it before. It's also probably not what you're currently guessing it is. It's not about proving programs correct: rather, it shows how types correspond to propositions, and how a program of a given type is itself a proof of the corresponding proposition. In the language of the article, Djinn solves the "Type Inhabitation Problem".)
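A tiny illustration of the correspondence (my example, not from the article): read the type below as the tautology "if A and B together imply C, then A implies that B implies C". Any well-typed term of that type is a proof of the tautology, and finding such terms from types alone is exactly what Djinn does.

```haskell
-- Under Curry-Howard, the type is the proposition and the function body
-- is its proof: pairing corresponds to conjunction, arrows to implication.
proof :: ((a, b) -> c) -> a -> b -> c
proof f x y = f (x, y)
```

(Haskellers will recognise this particular proof as the curry function itself - the correspondence is named after the same Curry.)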
Wednesday, December 07, 2005
Rant: Just for kids?
you'll see a story about an adult, but then it's usually about an adult teaching children. Do a search on history and you'll see some actual history. Search on physics and you'll get stories about kids mixed in with stories about people doing actual physics. So why are there so few stories about adults doing mathematics?
It's not just in news stories. Look at Texas Instruments' and Hewlett Packard's web sites and you'll see that calculators are largely marketed towards kids. TI are much worse on this front. They have powerful calculators that can solve all kinds of real world problems and yet they're not being marketed as devices for solving such problems, they're about getting kids through exams. HP at least have a section for professionals. Even so, in the old days if an HP calculator had exchangeable faceplates it was so that you could label the keys with new functions, but now it's to change the colour to make it look cool. The only people working with numbers are clearly children.
On closer inspection I see that I was wrong. I missed a story. It's about an adult. No mention of children or education. Ah...it's about an adult who can't do mathematics.
So clearly no adults use mathematics. So why do we bother teaching children a skill they'll never use in adulthood?
Friday, December 02, 2005
Bees can fly after all!
PS I'm wondering about the field of 'human aviation' that the article mentions. Am I lacking a superpower that normally comes standard with being human?