Wednesday, December 14, 2005

Theorem Proving and Code Generation

According to the Curry-Howard isomorphism a type in a programming language is essentially the same thing as a proposition and an object or program of the type is a(n intuitionist) proof of the proposition. So an automatic proof generator should also be usable to generate objects or programs of a given type. This is exactly what Djinn does. It seems pretty neat.


(I only learned about the Curry-Howard isomorphism in the last couple of years but it's probably of general interest. So I recommend reading the Wikipedia article I link to if you haven't met it before. It's also probably not what you're currently guessing it is. It's not about program correctness but type correctness and it doesn't prove programs to be correct but shows how programs themselves are proofs of type correctness. In the language of the article, Djinn solves the "Type Inhabitation Problem".)
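To make the Curry-Howard idea concrete, here is a small sketch in Python (nothing to do with Djinn's internals, which work on Haskell types; the names are my own). Under the correspondence, the type of `compose` below is the proposition "((A implies B) and (B implies C)) implies (A implies C)", and writing a total function of that type is giving a proof of it - which is exactly the kind of term Djinn would generate from the type alone.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

# The proposition "((A -> B) and (B -> C)) implies (A -> C)" corresponds,
# under Curry-Howard, to the type of this function. An inhabitant of the
# type -- a total function with this signature -- is a proof of the
# proposition, and composition is essentially the only way to write one.
def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    return lambda a: g(f(a))

# Any concrete instantiation works, because the proof is uniform in A, B, C.
print(compose(len, lambda n: n + 1)("abc"))  # 4
```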

Wednesday, December 07, 2005

Rant: Just for kids?

Search Google News for stories about mathematics and you'll find that they are dominated by a particular subsection of our society: kids. It seems that the only people in the world doing mathematics are children. Very occasionally
you'll see a story about an adult, but then it's usually about an adult teaching children. Do a search on history and you'll see some actual history. Search on physics and you'll get stories about kids mixed in with stories about people doing actual physics. So why are there so few stories about adults doing mathematics?


It's not just in news stories. Look at Texas Instruments' and Hewlett Packard's web sites and you'll see that calculators are largely marketed towards kids. TI are much worse on this front. They have powerful calculators that can solve all kinds of real world problems and yet they're not being marketed as devices for solving such problems, they're about getting kids through exams. HP at least have a section for professionals. Even so, in the old days if an HP calculator had exchangeable faceplates it was so that you could label the keys with new functions, but now it's to change the colour to make it look cool. The only people working with numbers are clearly children.


On closer inspection I see that I was wrong. I missed a story. It's about an adult. No mention of children or education. Ah...it's about an adult who can't do mathematics.


So clearly no adults use mathematics. So why do we bother teaching children a skill they'll never use in adulthood?

Friday, December 02, 2005

Bees can fly after all!

According to this article it seems that bees can fly after all.


PS I'm wondering about the field of 'human aviation' that the article mentions. Am I lacking a superpower that normally comes standard with being human?

Wednesday, November 30, 2005

A Game of Logic

I've been thinking about a game which I haven't named yet. The rules are simple. The playing board is a logical expression built from the binary connectives 'and' and 'or', the logical constants 'true' and 'false', and atoms 'A', 'B', 'C', ... For example (A and B) or C. To make things easier I'll use abuttal for 'and' and + for 'or'. The previous example is just AB+C.

There are two players, False and True, who take turns. A legal move for False consists of replacing all occurrences of one atom with 'false', and you can guess what True's moves are. True wins if the expression evaluates to 'true' under the obvious interpretation of the symbols I've given, and False wins if the expression evaluates to 'false'. There is no possibility of a tie.

For example consider the game (A+B)(C+D)(E+F). This is a win for True whoever plays next. Any time False sets an atom to 'false' True merely has to set the other atom in the same factor to 'true' making the factor 'true'. Eventually all of the factors will have the value 'true'.
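Positions this small can be solved by brute force. Here is a minimal Python sketch (the representation and names are my own): an expression is a bool, an atom name, or a nested tuple, and a minimax search tries every atom for the player to move.

```python
# Expressions: a bool, an atom name (str), or a tuple ('and'|'or', e1, e2).

def atoms(e):
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return atoms(e[1]) | atoms(e[2])
    return set()

def subst(e, name, val):
    # Replace every occurrence of atom `name` with the constant `val`.
    if isinstance(e, str):
        return val if e == name else e
    if isinstance(e, tuple):
        return (e[0], subst(e[1], name, val), subst(e[2], name, val))
    return e

def value(e):
    # Evaluate a fully substituted expression.
    if isinstance(e, bool):
        return e
    op, a, b = e
    return (value(a) and value(b)) if op == 'and' else (value(a) or value(b))

def true_wins(e, true_to_move):
    """Does True win with best play from this position?"""
    free = atoms(e)
    if not free:
        return value(e)
    if true_to_move:
        # True wins if setting some atom to 'true' leaves a won position.
        return any(true_wins(subst(e, a, True), False) for a in free)
    # False wins if setting some atom to 'false' loses for True.
    return all(true_wins(subst(e, a, False), True) for a in free)

# The example from the post: (A+B)(C+D)(E+F) is a win for True whoever moves.
g = ('and', ('and', ('or', 'A', 'B'), ('or', 'C', 'D')), ('or', 'E', 'F'))
print(true_wins(g, True), true_wins(g, False))  # True True
```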

So here are some questions:

  1. Has this game already been studied? If so, what's it called?
  2. Is there a nice way to evaluate any position in the game? A partial answer is given by some papers by Anshelevich on the subject of Hex. All Hex positions are positions in this game and many interesting Hex positions and subproblems can be approximated well by simple positions in this game. (Note that a good strategy for this game might not be a good strategy for Hex because a Hex game corresponds to a very large expression.)
  3. Can you assign Conway numbers to positions in this game? What is the value of 'true'?
  4. What happens if you allow the unary operator 'not' and allow the two players to replace an atom with either 'true' or 'false'?
  5. What other games, besides Hex, are special cases of this game? (All combinatorial games I guess.) What games are special cases in an interesting way?

Sunday, November 06, 2005

Opinion: A Farewell to Angles

This story was on Slashdot recently. While the guy might be seen as a bit of a crackpot I'm coming round to his point of view. As a worker in the field of computer graphics I'm frequently faced with software that performs operations like finding the angle, theta, between two normalised vectors by computing the inverse cosine of the dot product. And yet later in the code the angle is used only as the argument to further trig functions. Trig functions are expensive to compute. If you know the cosine and the quadrant of an angle then you immediately also know the value of all of the angle's other trig functions, using only the four arithmetic operations and square roots. These are far cheaper than trig functions. And yet programmers frequently insist on working with the angles themselves because they are familiar with them.

As an extreme example: I recently optimised a block of someone else's code in which a point (x,y) was rotated through an angle π around (0,0) by converting it to polar coordinates, adding π to the angle and then converting back to rectangular coordinates! (Incidentally, this block of code was hard-coded for this one particular rotation, and there were similar blocks of code for π/2 and -π/2 rotations.)

It seems to me that people learning computer graphics need to learn that there are many operations that can be performed without using angles. There needn't be a reflex action to convert any dot product to an angle. Converting to an angle this way may also have unfortunate numerical inaccuracies, especially when the angles are converted back to trig functions again later in the code. Maybe I'll put together a document with a list of things you can compute given the sines, cosines and tangents of angles without having to evaluate a single trig function.
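As a small illustration (my own sketch, with made-up function names): given the cosine you can recover the sine with a square root, and you can rotate a point through the angle between two unit vectors without ever calling acos, because the dot and cross products hand you the cosine and sine directly.

```python
import math

def sin_from_cos(c, sin_positive=True):
    """Recover sin(theta) from cos(theta) and the half-plane of theta.
    Only a square root is needed -- no inverse trig functions."""
    s = math.sqrt(max(0.0, 1.0 - c * c))
    return s if sin_positive else -s

def rotate_by_angle_between(p, u, v):
    """Rotate point p through the angle from unit vector u to unit vector v,
    without ever constructing the angle itself."""
    c = u[0] * v[0] + u[1] * v[1]   # cos(theta): the dot product
    s = u[0] * v[1] - u[1] * v[0]   # sin(theta): the 2D cross product
    x, y = p
    return (c * x - s * y, s * x + c * y)

# Rotation through pi needs no trig at all: cos = -1, sin = 0, i.e. negation.
print(rotate_by_angle_between((1.0, 2.0), (1.0, 0.0), (-1.0, 0.0)))  # (-1.0, -2.0)
```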

The issue is that the word 'angle' has two meanings. In one sense an 'angle' is an equivalence class of pairs of intersecting lines: the pair of intersecting lines (A,B) and the pair (C,D) are said to be equivalent if there is a rigid motion of space that transforms A to C and B to D. There are a number of easily computed invariants we can associate with these classes, but the most popular one is called 'the angle' and is often horrible to compute as it typically requires the use of transcendental functions. Unfortunately, as soon as people think 'angle' they also think 'the angle' and are led to use these functions unnecessarily. (It's worse than this: 'the angle' properly lives on S^1 but people work with its cover, R, leading to troublesome ambiguities.) It seems to me that other invariants besides 'the angle' ought also to be taught. And that is what this book seeks to do, despite its crackpot trappings.

Friday, November 04, 2005

Quadratic Residues and Acoustics

Here are some articles on quadratic residue diffusors to improve sound quality in your home. And an old classic: golden section stranding.


I hear that numerology can also be used to tell your fortune.

Tuesday, October 25, 2005

A Fabulous Biplane

Just read this paper on the (11,5,2)-biplane. A biplane is like a projective plane except that each pair of points is contained in precisely two lines and each pair of lines intersects at exactly two points. It's an exceptional object in the sense that a whole slew of exceptional objects can be constructed from it: the S(5,6,12) and S(5,8,24) Steiner systems, the binary and ternary Golay codes and the Mathieu groups.


I know it's all kids' stuff but I never studied (finite) groups beyond a first course and so little things please me!

Monday, October 24, 2005

This post brought to you by the number six...

The number 6 has an amazing property singling it out as unique among all of the integers. In fact, back in 1945 JA Todd was so amazed by this property he published a paper called "The Odd Number 6". What I have to say about it comes from chapter 6 of the book "Designs, Codes and their Links" by Cameron and van Lint. The chapter is called "A property of the number 6".


The simplest expression of this property is this: "There are six objects constructed in a canonical way on a set of 6 points. 6 is the only number for which this is possible." This is pretty vague so I'll quote the theorem:


Six is the only natural number n for which there is a construction of n isomorphic objects on an n-set A, invariant under all permutations of A, but not naturally in one-to-one correspondence with the points of A.


Pretty amazing result! Except, embarrassingly, I'm a bit confused about this because I'm not sure what "naturally" means here. Come to think of it, I'm not sure what "isomorphic" means here either. I don't remember ever doing a course on "objects" and their isomorphisms. The authors make the meaning of "natural" clearer over the page in a category-theoretical definition:


the category whose objects are the n-element sets, and whose morphisms are the bijections between them has a non-trivial functor to itself if and only if n=6

Which is all very well except I don't know what "non-trivial" means here. Or at least I think I know what it means and I can construct counterexamples. My brain must be working sub-par today. Ho hum.


Anyway, there is one version of this property whose statement is absolutely unambiguous to me: the symmetric group Sn has an outer automorphism only for n=6. In fact S6 acts on a set of 6 objects in two inequivalent ways. Interestingly a bunch of 'exceptional' constructions follow from this: the Hoffman-Singleton graph, the projective plane of order 4, the Steiner system S(5,6,12).


This reminds me a little of another outer automorphism that leads to an exceptional object: the "triality" outer automorphism of Spin(8), which leads to a well known exceptional object - the octonions.


Hmmm...I just discovered that John Baez has written his own musings on the number six.

Wednesday, October 19, 2005

Alternative Logic

As mentioned a while back, this is an article on "Alternative Logic" I wrote for K5 a while back. I've made some tiny changes to it, in particular some claims that probably reflect my frustration with understanding some logic texts more than anything else. It's intended as a "pop science" level article on alternatives to classical logic. I'm in no way an expert on logic but I think it's mostly correct. I do have some worries about my explanation of an intuitionistic proof but I provide links for people who want a more accurate picture. I just wrote this for fun.


Introduction

In a previous installment you've seen some of the alternatives to numbers that exist. But can we do something even more radical? How about going to the very foundations of mathematics itself and replacing logic itself? That might sound a little unusual, I mean logic's logic, you can't choose what you want to be true can you? Well actually there are plenty of alternatives to familiar classical logic, so many that I can't hope to do justice to them in one article. Nonetheless, I can still try to throw out a few teasers and hope you are motivated to check out the references and links to see what I'm really talking about.


Before talking about alternative logics I must talk a little about classical logic, the familiar logic used by mathematicians, philosophers and others. What do we mean by logic? In the old days it simply meant reasoning correctly. That was pretty vague. These days the subject has been formalized and there are two aspects to consider.


On the one hand consider the sentences "All men have less than 3 legs" and "John is a man". From this we can conclude "John has less than 3 legs". Note the very important fact that we don't really need to know what men or legs are to make that deduction. If we were told "All X are P" and "x is an X" then we know "x is P" regardless of what x, X and P are. So we can think of deductions as mechanical operations on sentences. In fact we can consider sentences to be nothing more than strings of characters and proofs to be strings of sentences where each sentence follows from some previous ones by application of a rule. These rules are called deduction rules. Of course we also need some starting points to which we can start applying rules. Those are called axioms. An example deduction rule is the one that says if you have derived A and have derived "A implies B" then you can derive B. In fact that rule has a name, modus ponens. If you can derive a statement from the axioms then the derivation is called a proof. Sometimes, however, you may hypothesise something without being sure of it. You can still apply the deduction rules to get a derivation but it's not a proof until you can eliminate the hypothesis. In fact some systems of logic have very convenient rules for eliminating hypotheses.


On the other hand - what does it mean for a sentence to be true? Well for a sentence like "Fred has red hair" you can find out by checking directly. But what about compound sentences like "Fred has red hair and Jane has black hair". Here are two propositions conjoined by "and". What you can do is test each sentence individually and assign a truth value: 0 if it's false and 1 if true. In fact we can define a function v (called a valuation), acting on sentences, that maps them to 1 if true and 0 if false. We can then use the binary operation, &, defined by the following table, to determine the truth value of the conjunction:



 & | 0 1
---+-----    eg. 1 & 0 = 0
 0 | 0 0
 1 | 0 1

 | | 0 1
---+-----    eg. 0 | 1 = 1
 0 | 0 1
 1 | 1 1

 ~ | 0 1
---+-----    eg. ~0 = 1
   | 1 0



If the result is 1 then the conjunction of the two sentences is true. Formally we're saying v(A and B)=v(A) & v(B). This is taking a truth theoretical view, ie. that the truth of "A and B" depends on the truth value of A and the truth value of B. Similarly we test the truth of "A or B" using the | operator, so v(A or B)=v(A) | v(B) and v(not A)=~v(A). These operations are examples of logical connectives and the functions &, | and ~ are collectively known as boolean operations.


There is one more logical connective to mention: implication. v(A implies B) is v(A) -> v(B), where A -> B is defined to be ~(A & ~B). In other words to tell whether or not "A implies B" is true we look at the truth of A and B, and the result is false only if A is true and B is false. Note this means that if we can find any inconsistency in our logic, ie. any proof of something provably false, then we can prove anything, because a false statement implies anything, and using modus ponens we can then deduce anything we want. So it just takes one little inconsistency to bring the whole system crashing down.
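The truth tables above are small enough to check mechanically. Here is a Python sketch (the operator names are my own) that builds -> from &, | and ~ exactly as in the text, and exhaustively verifies the familiar table:

```python
from itertools import product

# The boolean operations from the truth tables above, on values 0 and 1.
AND = lambda a, b: a & b
OR  = lambda a, b: a | b
NOT = lambda a: 1 - a

# Implication defined from the other connectives, as in the text:
# A -> B is ~(A & ~B).
IMPLIES = lambda a, b: NOT(AND(a, NOT(b)))

# Exhaustive check: false only when a=1 and b=0.
for a, b in product((0, 1), repeat=2):
    print(a, b, IMPLIES(a, b))
```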


By the way, sometimes I'll be a bit lazy and, for example, use & and 'and' interchangeably. Really they are different things: one combines sentences and the other combines truth values, but it should be clear what is going on from the context.

I should also mention the quantifiers 'for all' and 'there exists'. They also have a truth theoretical interpretation but it's quite technical so I'll leave it out.


Use these connectives and throw in a few axioms and you get what's called predicate calculus. (If you leave out the quantifiers you have the simpler system propositional calculus instead).


So we have two sides to logic: the syntactical side which involves proving things by the application of mechanical rules applied to sentences and the semantic side which is about how you decide if a sentence is actually true or valid. But does proving something make it true? Well if this is the case for a type of logic it's said to be 'sound'. And if everything true is also provable it's said to be 'complete'. When logicians consider alternative ways of doing logic soundness and completeness are among the first things they look for. And while there is good reason for sticking with logics that are sound, it is well known that there are many logical systems that are incomplete. However I'll leave discussion of Gödel's incompleteness theorems for another time.


Note. My notation is a little nonstandard - I am limiting myself to a subset of HTML that I think is viewable by most browsers. If your browser is text only you may be having trouble with even this limited subset.


Multivalued Logic

How can we modify logic then? Well there's an obvious thing to do to start with. I mentioned the truth values 0 and 1. How about throwing in some alternative values? 1/2 seems like an obvious choice. This is exactly what Lukasiewicz did around 1917. One of his motivations was that he was able to sidestep Russell's paradox with it. But if you were following what I was saying above then that's just half the story - the semantic side. So in the thirties an axiomatization (ie. a set of axioms and deduction rules) for this system was also devised, introducing new connectives corresponding to necessity and possibility. We can define new logical operators called M and L with M(1/2)=1 and L(1/2)=0. If X is a proposition, MX can be interpreted as meaning "maybe X" and LX as "necessarily X".


But there are some problems with this kind of logic. Lukasiewicz defined ~x=1-x so ~(1/2)=1/2. He also defined 1/2 | 1/2 as 1/2. So we find that for some X, X | ~X is 1/2. To many people that simply doesn't feel right, they argue that X | ~X should have a truth value of 1.
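The troublesome case is easy to exhibit. A Python sketch of the three-valued connectives as described in the text (~x = 1-x, and | taking the larger value, so 1/2 | 1/2 = 1/2; the function names are mine):

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Lukasiewicz's connectives on the truth values {0, 1/2, 1},
# as described in the text.
def l_not(x): return 1 - x
def l_or(x, y): return max(x, y)
def l_and(x, y): return min(x, y)

# The problem case: X | ~X need not come out fully true.
x = HALF
print(l_or(x, l_not(x)))  # 1/2
```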


Pressing on regardless we can generalize even further by allowing truth values to take any number between zero and one. One way to do this is with Fuzzy Logic, but another approach is via Post logics. If you really want to you can even consider probability theory as a branch of logic where probabilities are real-valued truth values.


Relevance Logic

Now consider what I said above about implication. I basically said that implication has a truth theoretical interpretation - that you could determine the truth value of "A implies B" by knowing the truth values of A and B. But does that really capture what is meant by "implies"? If "A implies B" really were the same as "not both A and not B" then any false proposition would imply anything you like. For example, my car has air bags. So according to classical logic it's correct to say "if my car had no air bags the moon would be made of cheese". But what does my car have to do with the moon? There seems to be something fishy about a system of logic that allows you to make deductions from premisses that are completely irrelevant. So some logicians take issue with the idea that "A implies B" should be defined truth theoretically. In fact, some logicians argue that it's so obvious that you can't define implication truth theoretically that only several years of brainwashing (eg. a university course in mathematics) could convince someone otherwise.


This is where relevance logics come in. Relevance logic (pioneered by Dunn) attempts to define conditions under which A is relevant to B, for example by tagging the explicit use of A in the demonstration of B from A. There is an interesting consequence of saying that A has to be relevant to B to prove B: we might find that even if we have managed to prove one false statement we can't prove some others. This is different to what I said about classical logic above, where just one false statement brings the whole system crashing down. So relevance logics can be robust against inconsistency, making them what's called paraconsistent. Some logicians consider this to be an advantage. For example, at several points in history physicists have had inconsistent laws of physics to work with (for example just before a scientific revolution) and yet have still reasoned successfully. But there are also other approaches to logic that deal with this issue.


Intuitionistic Logic

You may remember I ended the Alternative Numbers article with a quote from Kronecker. He had a hard time believing in some of the constructions mathematicians were using. In particular he disliked the Law of the Excluded Middle (LEM) which says that either X or not X. In other words that either a proposition is true or its negation is true. There's no middle ground between these two possibilities. It's often used by mathematicians like this: suppose I'm having a bit of trouble proving X. Instead I can suppose that not X is true and see where that gets us. If I find that it leads to a contradiction then I know something is up because in classical logic it is assumed that you cannot prove a false statement. If I did every step of the derivation correctly then the original hypothesis must have been false. There is no "middle way" where the original hypothesis might have been half true. If it leads to contradiction it must have been false. (This is known in the trade as Reductio ad Absurdum.) But here's the issue for people like Kronecker: X might be a proposition saying that a certain mathematical object exists. Using LEM you might be able to prove it exists without actually showing how to find it. It's a bit like getting something for nothing and mathematicians use it all the time. But Kronecker didn't like it at all, it seemed like cheating. He wasn't alone.


Another mathematician, Brouwer (probably more famous for his fixed point theorem), also had trouble with LEM, and he started devising an alternative form of logic called Intuitionistic logic. Actually Brouwer was dead against formalism of the type that logicians use, but nonetheless someone came along after Brouwer and formalized his logic! It turned out that this led to a particularly elegant type of logic.


Part of the intuitionist's problem is that it's possible to prove "A or B" without having a proof of A or a proof of B. To an intuitionist a proof of "A or B" is a proof either of A or of B. In fact that's just how intuitionists define a proof of "A or B". When you do this it suddenly becomes a lot harder to 'cheat'.

Consider what is meant by "A implies B". To a classical logician it's just "not (A and not B)". But this just doesn't seem to capture the "intuition" of what implication means. A better idea might be something along these lines: if you know A implies B, then as soon as you know A then you know B. In other words the statement "A implies B" should give you the means to prove B the moment you've proved A. So you can imagine "A implies B" as a recipe for converting a proof of A into a proof of B. In fact this is exactly how intuitionists view implication. A proof of "A implies B" is a function that takes as input a proof and spits out another. If the input is a proof of A then the output should be a proof of B. This is radically different to the classical view. By making implications a type of function we connect logic with set theory and with subjects like lambda calculus. And there's another really cool thing going on: mathematicians use the symbol -> in many different ways. Among them are two: they use it to mean both implication and a function mapping (eg. f:A->B means f maps from set A to set B). In other words, by strange coincidence we've turned a proof into a function and we use the same symbol for both. It turns out there is a whole branch of mathematics that deals with arrows! It's called category theory and it provides a beautiful integration of everything I have mentioned in the last two paragraphs.
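The two intuitionistic readings above can be sketched directly in Python (the "proofs" here are just placeholder values, and all the names are mine): a proof of "A or B" is a tagged proof of one side, a proof of "A implies B" is a function, and modus ponens is nothing but function application.

```python
# An intuitionistic proof of "A or B" is a proof of A or a proof of B,
# together with a tag saying which side it proves.
def left(proof_of_a):
    return ("left", proof_of_a)

def right(proof_of_b):
    return ("right", proof_of_b)

# A proof of "A implies B" is a function from proofs of A to proofs of B,
# so modus ponens is just function application:
def modus_ponens(a_implies_b, proof_of_a):
    return a_implies_b(proof_of_a)

# Case analysis proves "(A implies C) and (B implies C) implies
# ((A or B) implies C)": inspect the tag and use the matching recipe.
def case(or_proof, a_implies_c, b_implies_c):
    tag, p = or_proof
    return a_implies_c(p) if tag == "left" else b_implies_c(p)

print(case(left(3), lambda p: p + 1, lambda p: p - 1))  # 4
```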


But before you take up intuitionist logic you should be warned. Brouwer had carried out some great mathematical work in his life. But when he became an intuitionist he regarded much of his work as being no longer of value. He had to reject things that other mathematicians thought were obviously true. Eventually other mathematicians came to see Brouwer as a little crazy! On the other hand intuitionist logic is closely tied up with Constructive Mathematics (not to be confused with a similarly named educational fad) which insists that you always construct things you claim exist. This means that it goes hand in hand with computer science where people's programs are usually expected to produce a result - not just say a result exists.


Second and Higher Order Logic

Earlier I briefly mentioned quantifiers. An obvious question about them is this: when I say "for all x", what kinds of x are we talking about? If you answer this question then you know what the basic 'objects' are that your logic deals with. For example, in mathematics we generally deal with sets and when we say "there exists an x such that..." we really mean "there exists a set x such that...". In fact mathematicians generally build everything out of sets using a set of axioms called ZF (and maybe some more axioms too - like the Axiom of Choice).
For example, as I mentioned in the Alternative Numbers story, mathematicians build the ordinal integers out of sets.


But what about properties of sets or numbers? How do we talk about those as opposed to the underlying sets themselves? Mathematicians have several different ways to talk about properties. For example, if you want to talk about primality you can just talk about the set of prime numbers. As ZF can talk about sets, by considering primality through the set of primes we have just turned primality into something ZF can talk about. But sometimes there are properties we want to discuss where we can't form a set. For example consider the property of being a set. If we were to convert this into a set then we'd need to have a set of all sets.
But ZF doesn't allow this (because of hairy problems like the Russell paradox). So how can we talk about a property like that?


One approach is to modify logic by extending our universe to include properties. This is what is known as Second Order Logic - as opposed to the usual logic that is first order.
Using Second Order Logic you can say things like "there is a property such that if two sets have this property then so does their union". You simply can't say this in First Order Logic.
But there are some difficult issues associated with Second Order Logic. Logicians like to prove things, not just in their logical systems, but about their systems, and it's actually very hard to prove anything about Second Order Logic. There are also some other tough issues, like deciding when two properties are the same. In fact Quine believed that Second Order Logic isn't even logic. And even though Second Order Logic appears to have greater expressive power, mathematicians actually get along fine with First Order Logic. Every time they make a statement about properties there's always a workaround (sometimes quite inelegant) for expressing something just as useful in First Order Logic. By the way, you can also generalize further, so for example in third order logic we can talk about properties of properties.


Modal Logic

When I was talking about multi-valued logic I mentioned the concepts of necessity and possibility. These are called 'modalities' by logicians and a logic that deals with this kind of concept is called a modal logic. There are many different types of modal logic but I'll mention four: alethic logic, deontic logic, epistemic logic and temporal logic. An alethic logic is one like that mentioned above. Operators for necessity and possibility are introduced.
These operators are very closely related: if it is not necessarily the case that X is not true then X is possibly the case, and vice versa. We can write this, using the above notation, as MX=~L~X. Let's think about the semantics of this for a moment. What does it mean to say "Bush is a Democrat" is false but not necessarily false? If it's false, it's false. Well, when logicians are thinking about the semantics of modal logic they often work with 'possible worlds'. Although it is true that Bush is a Republican we can imagine a world in which he is a Democrat (well I think I can anyway). Only when something is true in all possible worlds is it considered necessarily true, otherwise it's just plain true (this is not strictly the definition used but it's close enough for now). Can you imagine a world in which 1=2?


Deontic logic deals with what you ought to do. Introduce the symbol O that means 'ought'.
Then OX means X ought to be the case. Just as in alethic logic we can consider the 'dual' operator ~O~. ~O~X means it is not true that it ought not to be the case that X. In other words it means X isn't forbidden. Epistemic logic deals with belief and knowledge.
For example KX means X is known to be true. KX->X but the converse isn't true. Among other things epistemic logic can be used to study cryptographic protocols in a formal way. Lastly temporal logic introduces time into logic with operators like H, so that HX means hitherto X. In temporal logic whenever you ask about the truth of something you need to say what time you are talking about: HX means X was always true up to the time in question. Here's an exercise: what does ~H~ mean in English?
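The possible-worlds semantics for L and M is easy to sketch in Python. In this toy model (two made-up worlds and a made-up atom - all my own choices for illustration) "Bush is a Republican" holds at the actual world but not at every accessible world, so it is true without being necessarily true:

```python
# A toy Kripke model: worlds, an accessibility relation, and a record of
# which atoms hold at which world.
accessible = {"actual": {"actual", "other"}, "other": {"actual", "other"}}
facts = {"actual": {"bush_is_republican"}, "other": set()}

def holds(atom, w):
    return atom in facts[w]

def L(atom, w):
    # Necessity: true at every world accessible from w.
    return all(holds(atom, v) for v in accessible[w])

def M(atom, w):
    # Possibility: true at some accessible world. Note M is the dual ~L~.
    return any(holds(atom, v) for v in accessible[w])

print(holds("bush_is_republican", "actual"))  # True: plain truth
print(L("bush_is_republican", "actual"))      # False: not necessary
print(M("bush_is_republican", "actual"))      # True: possible
```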


Quantum Logic

Are our logical principles necessarily correct or are they an empirical fact that we have discovered about the universe? Once upon a time Euclidean geometry was seen as necessary. It just seemed obvious that parallel lines were always the same distance from each other. But along came Lobachevsky with the notion that there were different types of geometry and then came Einstein who showed that the universe itself was actually described by one of them. So maybe logic will go the same way. One philosopher who argued this was Hilary Putnam. Just as General Relativity changed geometry he said that Quantum Mechanics changes logic and that in fact we should be using Quantum Logic!


So how do you have to modify logic to make it Quantum? It's difficult to go into the details but I can give a taste of it. Instead of truth values 0 and 1 truth values are subspaces of vector spaces. If you don't know what that means I'll try to give an example. A vector space is a flat space that extends out to infinity and has a chosen point in it called the origin or zero.
A subspace is a space contained in that space sharing the same chosen point. For example a 2D plane can be thought of as a vector space, and the lines through the origin are examples of subspaces. We can now define how the logical connectives work: given a valuation v, v(A and B) means the intersection of v(A) and v(B); v(A or B) means the smallest subspace containing the union of v(A) and v(B); and v(not A) means the subspace perpendicular to v(A). (In particular not(V), where V is the entire vector space, is the set {0} containing just the origin, and not({0}) is the whole of V.)


So doing logic is now doing geometry. The boolean & operation, for example, is replaced by vector space intersection. If a proposition turns out to have truth value {0} then we can consider it to be false, and if the truth value is the whole of V then it can be considered true. But what about all those in-betweens? Well, they correspond to those weird situations in quantum mechanics where you simply can't know two things at the same time. For example, because of the Heisenberg uncertainty principle it simply doesn't make sense to say "this electron has position x and momentum p". Quantum Logic perfectly captures these aspects of quantum mechanics.
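The strangeness shows up already in the plane. Here is a Python sketch (my own representation: a subspace of R^2 is the origin "0", a line through the origin given by a reduced integer direction, or the whole plane "V") showing that the distributive law of classical logic fails for these connectives:

```python
from math import gcd

def line(dx, dy):
    # A line through the origin, with its direction reduced to a
    # canonical integer form so equal lines compare equal.
    g = gcd(dx, dy)
    dx, dy = dx // g, dy // g
    if dx < 0 or (dx == 0 and dy < 0):
        dx, dy = -dx, -dy
    return ("line", dx, dy)

def meet(a, b):
    # v(A and B): intersection of subspaces.
    if a == b: return a
    if a == "V": return b
    if b == "V": return a
    return "0"  # distinct lines (or anything with "0") meet only at the origin

def join(a, b):
    # v(A or B): smallest subspace containing both.
    if a == b: return a
    if a == "0": return b
    if b == "0": return a
    return "V"  # two distinct lines span the whole plane

def neg(a):
    # v(not A): the orthogonal complement.
    if a == "0": return "V"
    if a == "V": return "0"
    _, dx, dy = a
    return line(-dy, dx)

x_axis, y_axis, diag = line(1, 0), line(0, 1), line(1, 1)
# Classically, A and (B or C) = (A and B) or (A and C). Not here:
print(meet(x_axis, join(y_axis, diag)))                # the whole x-axis
print(join(meet(x_axis, y_axis), meet(x_axis, diag)))  # just the origin
```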


(Aside for physicists: truth values aren't strictly the subspaces but the operators projecting onto these subspaces. A projection, P, has the property that P^2 = P. So the eigenvalues of P are 0 and 1. This means that Quantum Logic really is the quantization of logic - we have replaced truth values with non-commuting Hermitian operators whose eigenvalues are the results we want!)


So do physicists use Quantum Logic? Well, no actually. Part of the reason is that it's a victim of its own success. It works so well and fits perfectly with quantum mechanics that it doesn't actually say much that's new. So physicists get along fine without it. But philosophers of science do occasionally make use of it. (By the way, if you thought it was weird that truth values in Quantum Logic were geometrical, then maybe I'd better not mention that one way of defining a valuation for Intuitionistic Logic is through topology!)


Linear Logic and Non-monotonic Logic

Looks like I still have space to briefly mention some other types of logic. There are still some aspects of logic that have the potential to be varied. For example in every system we've looked at above we are able to take propositions and deduce further propositions from them.
As you apply the deduction rules there is an ever increasing set of theorems that you have proved. Consider the propositions "all birds can fly" and "a robin is a bird". From these we may conclude "robins can fly". However, if we also have the proposition "a penguin is a bird" we might conclude that penguins can fly. It looks like the proposition "all birds can fly" needs to be amended. Yet it doesn't seem entirely right to completely throw away that proposition - it appears to have some value. One approach is through default reasoning. By default, in the absence of other information, birds fly. Logics that can handle this type of reasoning are called non-monotonic. The name comes from the fact that if you have more information, eg. penguins don't fly, you can end up being able to prove less, eg. you can no longer use the proposition "all birds can fly" in the default way. Default logic is of interest to practitioners of AI as it gives a way to codify some aspects of 'common sense'.
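The bird example can be sketched in a few lines of code (my own toy illustration, not a full non-monotonic logic): the default rule fires unless an exception is known, and learning a new fact shrinks the set of conclusions.

```python
birds = {"robin", "penguin"}
flightless = set()   # exceptions known so far

def flies(animal):
    # default rule: a bird flies unless we know otherwise
    return animal in birds and animal not in flightless

before = {a for a in birds if flies(a)}   # both birds 'fly' by default
flightless.add("penguin")                 # learn that penguins don't fly...
after = {a for a in birds if flies(a)}    # ...and only the robin remains

# more information, strictly fewer conclusions: non-monotonicity
assert after < before
```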


You can even 'lose' theorems in one form of logic - Linear Logic. Suppose you are describing a vending machine. You put in one dollar and get either a Coke™ or a Sprite™. If you have one dollar you can get one or the other but not both. So the proposition "I have one dollar" can be converted into either "I have a Coke™" or "I have a Sprite™", but once it has been you can't do it again unless you have another dollar. Linear Logic is something like this: certain types of deduction rule actually 'consume' their premisses as you use them. Linear Logic can be useful for describing systems with limited resources.
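Here's a resource-counting sketch of the vending machine (representing propositions as a multiset of resources is my own illustration of the flavour of Linear Logic, not the logic proper). A rule consumes its premisses, so a dollar can only be spent once:

```python
from collections import Counter

def apply_rule(resources, consumes, produces):
    # a linear rule: its premisses are used up, not reused
    if all(resources[r] >= n for r, n in consumes.items()):
        out = resources.copy()
        out.subtract(consumes)
        out.update(produces)
        return +out            # unary + drops items with zero count
    return None                # rule doesn't apply: premiss already spent

state = Counter(dollar=1)
state = apply_rule(state, {"dollar": 1}, {"coke": 1})
assert state == Counter(coke=1)
# the dollar is gone, so we can't also get a Sprite:
assert apply_rule(state, {"dollar": 1}, {"sprite": 1}) is None
```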


Laws of Form

Phew! We're coming near to the end. So to wind down I'm going to end with a very simple logic from the book Laws of Form by G Spencer-Brown. Now most mathematicians are pretty sure that this is the work of a crackpot, nonetheless the first part introduces a fun idea that people have managed to get mileage out of. To introduce this type of logic you really have to break the most fundamental rule of logic - that sentences be strings of characters.
In this type of logic there is only one symbol, a circle. (Actually, Spencer-Brown didn't use circles but this notation is equivalent and is in a similar spirit.) It doesn't have to be a precise geometric circle, it could be a squiggly circle. This is known as a distinction.


As I can't use pictures I'll use ASCII art - the net effect being that I'll be using strings of characters anyway.


Here are some circles: () and o. What can you do with a circle? Well you can put one next to the other like this:

o o.

According to Spencer-Brown the rule is that whenever you see that you can replace it with o.
I.e.
o o = o.

The other rule is that if you see one circle directly inside another they cancel.
I.e.
(o) = .

That's a circle within a circle on the left. (Use your imagination!)
Notice there's nothing at all on the right hand side of that '=' sign! Here are some example derivations:
((o)) = o and (((o)(o)o)o) = ((o)o) = (o) = .

Now you can throw in some unknowns and start proving theorems such as:
for any x and y, ((x)) = x and y y = y. It turns out that you can represent propositional calculus using these symbols.
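If you'd rather doodle in code, here is a tiny evaluator (the tuple encoding is my own): an expression is a tuple of circles, and each circle is the tuple of circles drawn inside it. An expression is 'marked' if it reduces to a single circle and 'unmarked' if it reduces to nothing.

```python
def marked(expr):
    # a circle contributes a mark iff its own contents reduce to nothing;
    # the rules o o = o and (o) = (blank) then fall out automatically
    return any(not marked(circle) for circle in expr)

o = ()                       # an empty circle
assert marked((o,))          # o      is marked
assert not marked(((o,),))   # (o)    reduces to nothing
assert marked((((o,),),))    # ((o))  reduces back to o

# the longer derivation from the text: (((o)(o)o)o) reduces to nothing
inner = ((o,), (o,), o)      # the circle containing (o)(o)o
assert not marked(((inner, o),))
```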
Have fun doodling!


Further Reading

I learned a tremendous amount from The Blackwell Guide to Philosophical Logic. When I needed more detail on classical logic I used Notes on Logic and Set Theory and Computability and Logic. Otherwise I used the many resources available on the web some of which I have linked to above. Incidentally I learned about Laws of Form from a British magazine Personal Computer World in 1982.


Final Note
I've had to make many sacrifices in order to make this article of a reasonable length. I've had to be a little vague or sloppy in places. So please use the references and links if you are interested in anything I say. Unfortunately I've probably committed the extra sin, in places, of being incorrect too.

Friday, October 07, 2005

Quantum Mechanics and the Fourier-Legendre Transform over a Semiring

Update: see also my more recent notes on this subject.

Consider the two semirings: (R,+,*) and (R',min,+) where R' is R with a positive infinity value added in. The former is just the real numbers added and multiplied in the usual way and in the latter we use the operations min (which gives the lower of its two arguments) and + instead. There are many formal similarities shared by these structures. For example we have the distributive laws a*(b+c) = a*b+a*c and a+min(b,c) = min(a+b,a+c). Zero acts as the identity for + and we have a+0 = a and infinity is the identity for min with min(a,infinity) = a.
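The parallel laws are easy to spot-check numerically; a quick sketch, with min playing the role of addition in the second semiring:

```python
INF = float("inf")

def laws_hold(a, b, c):
    distrib  = a * (b + c) == a * b + a * c        # in (R, +, *)
    tropical = a + min(b, c) == min(a + b, a + c)  # in (R', min, +)
    return distrib and tropical

assert all(laws_hold(a, b, c) for a, b, c in
           [(1.0, 2.0, 3.0), (2.5, -1.0, 4.0), (-2.0, 0.5, 7.0)])

# the two identities: 0 for +, and infinity for min
assert 3.0 + 0 == 3.0 and min(3.0, INF) == 3.0
```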

It turns out that the former semiring can be viewed as a quantum version of the latter semiring. In particular we can frequently take statements from quantum mechanics and consider them to be statements in a more general semiring rather than over (R,+,*). When we interpret these more general statements in the semiring (R',min,+) they turn out to say things about classical mechanics.

Consider for example the principle of least action in classical mechanics. This says that dynamical systems evolve in such a way that the action - the integral of the Lagrangian - is minimised. In other words, the actual path achieves the min of the action over all possible paths. In quantum mechanics systems no longer take the minimising path but, in a sense, take all paths. To compute physical quantities we must instead use the Feynman path integral to integrate over all paths, and the factor we integrate is essentially the exponential of the action. So in classical mechanics we have a min over an infinite set of paths, and in quantum mechanics we have a sum (ie. integral) over the same set. See here for a recent paper on this subject.

Another surprising analogy between these two semirings arises when we try to transfer the concept of the Fourier transform from (R,+,*) to (R',min,+). It's not obvious how to interpret the exponential function in (R',min,+), but it turns out that the natural analogue is the ordinary linear functions (in the conventional sense). If we then replace the exponentials with linear functions in the definition of the Fourier transform, and replace the integral with an infinite min, what we end up with is another familiar transform: the Legendre transform. So the Fourier transform and the Legendre transform may be interpreted as the same thing, just over different semirings.
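Numerically the recipe looks like this (a sketch under one common sign convention, f*(p) = max over x of p*x - f(x); the grid and the test function are my own choices). Replacing exp(p*x) by the linear function p*x and the integral by an extremum over all x gives the Legendre transform, and f(x) = x^2/2 comes out as its own transform:

```python
import numpy as np

def legendre(f, xs, ps):
    # the 'tropical Fourier transform': an extremum over x of p*x - f(x)
    return np.array([np.max(p * xs - f(xs)) for p in ps])

xs = np.linspace(-10.0, 10.0, 20001)   # a fine grid standing in for all of R
ps = np.linspace(-2.0, 2.0, 5)

fstar = legendre(lambda x: x**2 / 2, xs, ps)
assert np.allclose(fstar, ps**2 / 2, atol=1e-4)   # x^2/2 is self-dual
```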

The analogy carries quite far. In classical mechanics the Legendre transform converts between the Lagrangian and Hamiltonian formulations of classical mechanics. So it changes equations of motion written in terms of (generalised) position into equations written in terms of momentum and vice versa. In quantum mechanics the Fourier transform does much the same thing: the Fourier transform of a wavefunction in space gives the wavefunction in momentum space. Sean Walston has a paper on this. I'm not sure he was aware of the semiring connection when he wrote that.

In summary we have:


(R',min,+)                              (R,+,*)
min(a,infinity) = a                     a+0 = a
a+0 = a                                 a*1 = a
a+infinity = infinity                   a*0 = 0
Classical                               Quantum
minimisation                            integral
x -> k*x                                x -> exp(k*x)
Legendre transform                      Fourier transform
Hamilton's principle of least action    Feynman path integral


I never did understand the Legendre transform. It always seemed like this strange thing plucked out of nowhere. So it's amazing to see that in some sense it is the 'right' thing to study and is as natural as the Fourier transform. Fascinating stuff!

Thursday, September 22, 2005

Are Men Innately Better than Women at Mathematics?

No, argues Elizabeth Spelke. In the linked paper (which isn't yet peer reviewed) she looks at a great wealth of published data and comes to the conclusion that the differences can all be explained by socialisation. There's also an old debate between Spelke and Pinker over at Edge.

Wednesday, September 21, 2005

Tetris and matrices over semirings

Here's a position in Tetris:

  X    X
XXXX XXX X

We can describe this as a list of heights (1,1,2,1,0,1,1,2,0,1).
Here's a Tetris piece:

Y
YYY
Y

and here's what you get when you drop one on the other

 Y
 YYY
 YX    X
XXXX XXX X

We get a new list of heights (1,4,3,3,0,1,1,2,0,1). This has failed to capture the 'overhang' in the 4th column, but let's just think about Tetris pieces dropped from the top so that overhangs make no difference to later drops. What is the relationship between the old and the new list of heights?


It turns out that you can get one from the other by a single matrix 'multiplication' - the catch is that we have to work over the semiring (R union -infinity,max,+), the mirror image under negation of the semiring (R union infinity,min,+), instead of the more usual semiring (well, actually field) (R,+,*).


Let hn be the initial height in the nth column and hn' be the height after dropping the new piece. Think about how far the piece made of Ys can fall. Its left column (three Ys) can rest on column 2, which would put the piece's lowest point at height h2. Its middle column has a single Y one cell above the piece's lowest point, so contact there puts the lowest point at height h3-1; similarly the right column gives h4-1. A falling piece stops at the first obstruction, i.e. at the highest of these candidates, so the lowest point ends up at height max(h2,h3-1,h4-1). We can now write down expressions for the new heights: h2' = max(h2,h3-1,h4-1)+3 and h3' = h4' = max(h2,h3-1,h4-1)+2. We can rewrite this as:


h2' = max(h2+3,h3+2,h4+2)

h3' = max(h2+2,h3+1,h4+1)

h4' = max(h2+2,h3+1,h4+1)


Now let's switch to the semiring (R union -infinity,max,+) (with 'additive' identity OO = -infinity). Use the notation (+) = max and (*) = +. Then we can write:

h2' = 3(*)h2 (+) 2(*)h3 (+) 2(*)h4

h3' = 2(*)h2 (+) 1(*)h3 (+) 1(*)h4

h4' = 2(*)h2 (+) 1(*)h3 (+) 1(*)h4


We can now write this in matrix form h' = Mh where M is

(  0 OO OO OO ... )
( OO  3  2  2 ... )
( OO  2  1  1 ... )
( OO  2  1  1 ... )
(       ...       )
( ...           0 )

So corresponding to each Tetris piece in a particular column we have a matrix. To find the effect of dropping multiple pieces we just multiply the matrices in the semiring in turn.
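As a quick check, here is the drop above carried out in code. This is a sketch using a (max,+) reading in which a falling piece comes to rest on its highest contact point; the coefficients and the resulting heights are my own reconstruction for this piece, dropped with its tall left column in the second of ten columns.

```python
NEG_INF = float("-inf")   # the 'additive' identity OO

def maxplus_matvec(M, h):
    # (Mh)_i = max_j (M[i][j] + h[j]): ordinary + plays the role of *
    return [max(mij + hj for mij, hj in zip(row, h)) for row in M]

def piece_matrix(n, c):
    # identity rows (0 on the diagonal, OO elsewhere) leave untouched
    # columns alone; rows c..c+2 encode the piece's height-update rule
    M = [[0 if i == j else NEG_INF for j in range(n)] for i in range(n)]
    coeffs = [[3, 2, 2], [2, 1, 1], [2, 1, 1]]
    for i in range(3):
        for j in range(3):
            M[c + i][c + j] = coeffs[i][j]
    return M

h = [1, 1, 2, 1, 0, 1, 1, 2, 0, 1]
M = piece_matrix(10, 1)        # the piece's left column sits in column 2
h_new = maxplus_matvec(M, h)   # -> [1, 4, 3, 3, 0, 1, 1, 2, 0, 1]
```

Dropping a second piece is just another matrix applied to h_new, which is why sequences of drops compose by matrix multiplication in this semiring.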


Turns out this has all kinds of applications - for example resource allocation in a queueing system. I'll leave you to figure out how falling Tetris pieces might represent jobs submitted to a queueing system and how the columns might represent types of resource.


Here's an easy exercise: If M is the matrix of a piece, what piece is the transpose M^T the matrix of?


I learnt this from this paper by Gaubert and Max Plus. (I think Max Plus is a relative of Bourbaki.)


Many algorithms are 'linear' in this semiring. Another example that seems quite different is the dynamic programming algorithm for approximate DNA string matching. I tried to work out what the transpose matrix did (ie. what the adjoint algorithm was) and then used the code reversal method I mentioned earlier to see what the new algorithm was. It was another approximate string matcher, but this time it worked through the strings in the opposite direction. (Note how each operation in this description is essentially of the form a = max(some terms added to constants).)

Tuesday, September 20, 2005

The 100 Greatest Theorems

Here is the list, presented at a conference in 1999. I think that like IMDB we also need a list of the 100 worst theorems. Any suggestions for candidates?

Monday, September 19, 2005

A Certain 24th Root

This paper by Heninger, Rains and Sloane (that's Sloane as in the famous Encyclopedia) is interesting. It's about the conditions required for a formal power series with integer coefficients to be the nth power of another such series. In particular it focuses on theta functions for lattices.


The theta function of a lattice is simply the sum of x^(u.u) over all u in the lattice. Clearly it is a formal power series with integer coefficients if the lattice is such that u.u is always an integer. These are often pretty amazing functions; for certain types of lattice, for example, they are modular forms. It turns out that the theta function of the Leech lattice is the 24th power of another such series. More amazingly: this is a new result!


It's clear that the theta function is the generating function for the number of lattice elements of a given length. So if it's a 24th power you'd expect this to have a combinatorial explanation too. Unfortunately the proof isn't combinatorial. So even though the 24th root appears to be counting something, nobody knows what it is that is being counted!


Anyway, Sloane and friends had fun trawling through their sequence database trying to find other sequences that were powers. Another interesting one they found was the exponential generating function for the number of partitions of {1,...,n} into an odd number of ordered subsets. It turns out this is a perfect square. This is begging for a combinatorial explanation but it hasn't yet been found. I'll have to see if I can reproduce the square root of this series with my (unfinished) Haskell code for formal power series.


And of course this paper is interesting to me because of the prominent role of the number 24 :-)

Monday, September 12, 2005

Of Partitions and Primes

I must have been hiding under a rock or something. I've only just found out about a big mathematical result about the partition function, p(n). p(n) is the number of ways of writing n as an (unordered) sum of positive integers. p(4) is 5 as 4 = 3+1 = 2+2 = 2+1+1 = 1+1+1+1.
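p(n) is easy to compute with a standard dynamic programme (accumulate allowed part sizes one at a time), which also lets you spot-check the congruences discussed below:

```python
def partition_counts(n):
    # p[k] for all k <= n: fold in parts of each size in turn
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for k in range(part, n + 1):
            p[k] += p[k - part]
    return p

p = partition_counts(60)
assert p[4] == 5                                     # 4, 3+1, 2+2, 2+1+1, 1+1+1+1
assert all(p[5*m + 4] % 5 == 0 for m in range(12))   # Ramanujan:  5 | p(5m+4)
assert all(p[7*m + 5] % 7 == 0 for m in range(8))    #             7 | p(7m+5)
assert all(p[11*m + 6] % 11 == 0 for m in range(5))  #            11 | p(11m+6)
```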

First a digression: There's a bizarre formula for p(n) due to Rademacher and you can see it here. Note the n - 1/24 subexpression. You might wonder where that 1/24 comes from. Actually it has a neat physical interpretation. In bosonic string theory you have a string which can oscillate with different harmonics. Given a string with a certain amount of energy there are different ways the energy could be shared out between the different harmonics. In quantum mechanics the amount of energy that can go into each harmonic is 'quantised' so that it is proportional to an integer. In addition, the energy of a single quantum in the nth harmonic is proportional to n. What this means is that if you have m1 quanta in the lowest harmonic, m2 in the second and so on, then the total energy in the system, in suitable units, is E = m1 + 2*m2 + 3*m3 + ... In other words the number of different states with energy E is just the partition function p(E). In practice it's a little more complex because strings oscillate in a high dimensional background space, so in fact there are harmonics in different directions. But essentially it's a similar counting problem. What I've actually described is a type of conformal field theory. Bosonic string theory (which is a 26D theory) is basically 24 copies of this system.

The interesting twist with quantum string theory is that the minimum energy you can put into each harmonic isn't zero. You can argue this from the Heisenberg uncertainty principle: the oscillator can't be at rest because then you'd know its position and momentum simultaneously. So it must have non-zero energy. You can treat each harmonic independently, and in fact each has a minimum of half a quantum. (This is standard stuff.) So the correct formula is really E = (m1+1/2) + 2*(m2+1/2) + 3*(m3+1/2) + ... The catch is that when you try to add this up you get a divergent series. But this is no problem for physicists as you can sum the series using the methods outlined here. In fact, the 'zero-point' energy is essentially 1/2(1+2+3+...), so zeta regularisation gives -1/24. So the partition function p(n) is essentially counting the states of a conformal field theory at its nth energy level. As I've obsessed about before, this is the same 24 that is the dimensionality of the Leech lattice, the Golay code and lots of other good stuff. It's all connected mysteriously! Somehow this information about the quantum theory of strings is already encoded in the properties of the partition function discovered 70 years ago, even though the partition function is defined by the elementary properties of numbers. This is also why I think String Theory is cool even if it might be useless as a physical theory. Even if we gave up the string theory 'research programme' today it'd still be there, hidden away inside the mathematics we study, regardless of whether we think that mathematics has physical applications.

I'll stop there as I don't even pretend to understand the connections. But do check out the proof of Rademacher's formula via Farey sequences and Ford circles. It's in a few analytic number theory books (eg. Apostol's Springer text on the subject) and it's one of the more beautiful proofs out there.

Anyway, that's all irrelevant. What's relevant is that the partition function has neat congruence properties. Ramanujan proved p(5m+4) is a multiple of 5 for all natural numbers m. He also showed p(7m+5) is a multiple of 7 and p(11m+6) is a multiple of 11. Many years ago Dyson explained the 5 and 7 cases with a statistic he called the 'rank', and conjectured a 'crank' that would also explain the 11 case; the crank was later found by Andrews and Garvan. But in the last couple of months Mahlburg has proved that the crank accounts for such congruences for all primes, giving a way to produce them for any prime. This is a pretty major result. People have been investigating these congruences at least since Ramanujan's day but now we have a handle on all of them simultaneously. More detail is posted here and this brings to an end a long chapter in number theory. Note, by the way, how often the number 24 appears throughout this paper. :-)

Update: I've edited this so it makes slightly more sense.

Adjoints

I keep starting papers and not finishing them. Both reading and writing, but this time I mean writing. But my latest is short enough that the first draft must be practically the final draft so once my colleagues have suggested improvements I'll see if I can submit it to a journal before the end of the month. Anyway, the inspiration behind this paper was the realisation that you can transform code that computes linear functions into its adjoint by writing the code 'backwards' and taking the adjoint of each line. I could explain what I mean but I found a web site that explains it all. I found an application to digital filtering where I was able to transform a useful algorithm into another by essentially writing it backwards and reversing all of the loops. But the general method of computing the adjoint of code seems so useful I assume it's commonplace, but I can't find a single paper on transforming code in this way, just the web site I mentioned above. It's closely connected to reverse mode automatic differentiation but more general as it applies to any linear operator, not just the action of the derivative on the tangent space.


OK, I will explain. Consider the line of code, in C++, a += b. We can write this in matrix form as


(a)   (1 1) (a)
(b) = (0 1) (b)

If we take the transpose of the matrix we get

(a)   (1 0) (a)
(b) = (1 1) (b)

which is the same as the code b += a. More generally, any block of code that performs linear updates to variables can be converted to its adjoint by writing the steps backwards and transforming each line of code like this. A more complex example (exercise) is a = 2*a+3*b-4*c; b = 2*a which maps to a += 2*b; b = 0; b += 3*a; c -= 4*a; a *= 2. If your code has loops, eg. iterating over arrays of variables, you just run the loops backwards. The application for me is that there is a published fast algorithm for some linear operator but I need its adjoint. Surprisingly, the adjoint of the fast algorithm can actually turn out to be a whole new fast algorithm.
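Here's the exercise checked numerically. This is a sketch; the inner-product identity <f(x),y> = <x,f*(y)> is the standard way to verify that one linear map really is the adjoint of another:

```python
def f(a, b, c):
    # the original linear code
    a = 2*a + 3*b - 4*c
    b = 2*a
    return a, b, c

def f_adjoint(a, b, c):
    # lines written backwards, each line transposed
    a += 2*b; b = 0              # adjoint of 'b = 2*a'
    b += 3*a; c -= 4*a; a *= 2   # adjoint of 'a = 2*a + 3*b - 4*c'
    return a, b, c

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

x, y = (1, 2, 3), (4, 5, 6)
assert dot(f(*x), y) == dot(x, f_adjoint(*y))   # both sides equal -38
```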


Incidentally, I asked the journal editor for the expected turnaround time for the decision on whether to publish. He responded with the mean, standard deviation, extrema and quartiles. If only people like plumbers, doctors and cable installers could respond in this way. "If I attempt to fix this leak it has a 50% chance of taking 1 hour to fix, a 25% chance of taking 2 and there's a 5% chance I won't get it done today" rather than "I'm charging by the hour and it'll take as long as it takes unless there's a complication in which case it'll take longer".

Tuesday, September 06, 2005

Infinitesimals

I'm currently reading this article on the history of the infinitesimals. My interest in the subject comes from a subject I've mentioned earlier: automatic differentiation. Implementing infinitesimals as an extension to the real numbers is straightforward in modern programming languages but for some reason it's not well known. They give a nice efficient way to compute derivatives of functions and remove removable singularities from functions.


One of the earliest ways to construct the real numbers from the rationals was as an equivalence class of sequences of rational numbers, two sequences being equivalent if their difference tends to zero. In Non-Standard Analysis, developed by Robinson, this definition is tweaked slightly so that two sequences are equivalent if they agree on a 'large' set of indices. The trick is to define 'large' appropriately, leading to the concept of an ultrafilter. This results in a set *R which is an extension of the usual set of reals R but also contains infinitely small and large numbers. There's also a nice 'metatheorem' that goes with this construction showing that 'reasonable' theorems about *R are also theorems about R. This basically gives us license to use infinitesimals in our proofs and still be sure that our results make sense as propositions about real numbers. So many of Leibniz's old 'proofs' using infinitesimals, say, can now be reinterpreted as perfectly valid proofs.


Anyway, you don't need all this machinery to do automatic differentiation, just an algebraic structure containing a nonzero element whose square is zero.
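In code the 'algebraic structure' can be as small as this: a minimal sketch of dual numbers a + b*e with e^2 = 0, supporting just addition and multiplication, so that f(x + e) = f(x) + f'(x)*e and the derivative falls out automatically:

```python
class Dual:
    def __init__(self, re, eps=0.0):
        self.re, self.eps = re, eps

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.re + o.re, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # (a + b*e)(c + d*e) = ac + (ad + bc)e, since e*e = 0
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)
    __rmul__ = __mul__

def deriv(f, x):
    # seed the infinitesimal part with 1 and read off the derivative
    return f(Dual(x, 1.0)).eps

# d/dx (x^3 + 2x) at x = 2 is 3*4 + 2 = 14
assert deriv(lambda x: x*x*x + 2*x, 2.0) == 14.0
```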


Incidentally, Gödel used ultrafilters to 'prove' the existence of God.

Thursday, August 25, 2005

Caustics and Catastrophes


Caustics, at least the optical ones, are the bright regions you see when light passes through a refracting medium. Examples include the patterns you see on the bottom of a pool, the bright patches you might see on a wall opposite a curved window, the focal point of a lens when a parallel beam falls onto it, or the familiar cardioid shape you see in a cup of coffee. The twinkling of stars is a caustic effect caused by atmospheric turbulence. Caustics aren't always optical effects; for example rogue waves at sea can be caused by refraction-like effects as waves move between regions of sea with different currents.


The shape of a caustic is a complicated function of the physical system. But whatever the system you frequently end up with the same features. For example in my image above you see many regions where two twisty lines meet forming a kind of long twisty V shape where the sides of the V protrude a little past the intersection to make a kind of swallowtail shape. This shape is universal. You'll see it in caustics formed by all kinds of refracting media. You'll see it in the kind of caustics formed by the gravitational fields of astronomical bodies in microlensing events. There are other characteristic shapes that frequently come up and what makes them so interesting is that they are independent of the precise details of the physical system. The underlying explanation is catastrophe theory. Unfortunately, few people talk about catastrophe theory these days as it became tarnished through misapplication. I guess it was the chaos theory of the 70s. But like chaos theory, it has completely sound mathematical underpinnings.


Very roughly, in catastrophe theory we look at (infinitely) differentiable functions f:M->N where M and N are manifolds. The size of the set f^-1(x) may vary with x depending on how f 'folds' M into N. Obviously the map from x to the size of f^-1(x) is discontinuous if it's non-constant on a connected manifold N. Catastrophe theory is basically about the singular points where this set jumps in size. The interesting thing is that in low dimensions there are basically 7 types of 'interesting' singularity up to local diffeomorphism. This is known as Thom's theorem. These 7 types have interesting names, from the simple 'fold', through the swallowtail, to the parabolic umbilic. And you've seen the swallowtail already: it's what I pointed out in the caustic image above. The reason why these catastrophes appear in caustics is that you can think of an optical system mapping incoming rays to the destination surface on which they eventually impinge. For non-trivial systems this is a many-to-one function and it can fold back on itself in all sorts of interesting ways. Thom's Theorem therefore allows us to classify the types of shapes we may see in caustics without knowing specific details about the system.


Anyway, I was interested in finding a way to generate caustic-like images cheaply. Armed with this knowledge that caustics can be seen as folds of one manifold mapping to another, I was able to produce a really cheap algorithm. Basically I take the plane and deform it using a random function (eg. Perlin noise or something synthesised in the Fourier domain). I then render the deformed plane. I do this by tessellating the plane with triangles. Each triangle in the tessellation represents a beam with triangular cross-section. So the key thing is to ensure that the amount of light going through each triangle is the same before and after deformation. In other words, if the deformation makes a triangle twice as large I make it half as bright, and vice versa. That's it. Animate the deformation nicely and you get something that looks just like the patterns at the bottom of a pool, rendered in real time and without having to use any knowledge of optics. It looks like caustics simply because the universality of catastrophes means that a wide variety of randomly made up deformations produce the right sort of shapes. This paper does something similar except that they attempt to approximate the correct caustic for a given water surface whereas I'm just trying to capture a look. The image above is an example.
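The core of the trick fits in a few lines. Here's a toy sketch for a single triangle, with a made-up sinusoidal deformation standing in for Perlin noise:

```python
import math

def tri_area(p, q, r):
    # area of a triangle from the cross product of two edge vectors
    return 0.5 * abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1]))

def deform(p, t=0.0):
    # any smooth random-ish deformation will do; this one is sinusoidal
    x, y = p
    return (x + 0.3 * math.sin(2*y + t), y + 0.3 * math.sin(3*x - t))

# conserve the light flowing through the beam: brightness scales with the
# ratio of the triangle's area before deformation to its area after
p, q, r = (0.0, 0.0), (0.1, 0.0), (0.0, 0.1)
brightness = tri_area(p, q, r) / tri_area(deform(p), deform(q), deform(r))
```

Do this for every triangle in the mesh, advance t each frame, and the squeezed triangles glow into caustic-like filaments.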


Catastrophe theory was used in the 70s for things like modelling conflict, understanding the difference between erotica and pornography, predicting when prison riots will happen and so on. You can see why the subject fell out of favour. As I say, it was the chaos theory of its day.


Incidentally, you may wonder what deeper significance the number 7 has here. It turns out the catastrophes are each closely tied to Dynkin Diagrams. Dynkin diagrams have a bizarre habit of popping up in all sorts of branches of mathematics - quivers being a popular one right now. But at this point I really have no clue what's going on as I've barely read a few pages of my Catastrophe Theory book.

Wednesday, August 24, 2005

The Unnatural Naturals

In a comment from my previous post, vito got me thinking about the naturals. Most mathematicians use the term 'natural' to refer to numbers in the set {1,2,3,...}. However, I prefer to think of the set {0,1,2,3,...} as the naturals. As the precise definition of 'natural' is still debatable I can get away with my usage, as long as I define exactly what I mean. My gripe with the more usual definition of 'natural' is that it's not natural at all.


Consider some of the natural ways to build the natural numbers. For example there are the Peano axioms and the finite ordinals. These define sequences that start at zero, not one. In fact, both of those web pages define zero to be natural. Countless theorems are more naturally written as statements about {0,1,2,...} rather than {1,2,3,...}. I can't even see why the set {1,2,3,...} needs a name any more than the set {2,3,4,5,6,7,...}. Having said that, when I was a student at Cambridge there was a formal debate over the definition of the naturals. Things suddenly became more complicated when a faction appeared arguing for the set {2,3,4,...} on the grounds that if numbers are for counting then the number one doesn't exactly do much counting. I think I should have formed a pro-{3,4,5,6,...} faction at this point; after all, the ancient Greeks, who are noted for their wisdom, only marked their nouns as plural for numbers strictly greater than 2.


Anyway, there is some discussion of the debate on Wikipedia and the conclusion is probably the same one I would come to: the set {0,1,2,3,...} is more popular among logicians and set theorists (I'd probably add category theorists) and {1,2,3,4,...} is more popular among number theorists.


And of course, when programming, I think arrays should start at zero. You won't catch me using Fortran or Matlab. Come to think of it, as the first ordinal is zero, shouldn't we actually start counting at zero?

Sunday, August 21, 2005

The Alternative Numbers

I thought I'd recycle a couple of old posts I made to K5. There may be some broken links which I'll come back and fix. Here's the first:

Fed up of seeing the same numbers over and over again? Need a little change? Well look no further than here for a quick survey of alternative number systems and their applications.

I guess most people are pretty familiar with the natural numbers {0,1,2,3,...}, otherwise known collectively as N, the integers {...,-2,-1,0,1,2,...} called Z (from Ger. Zahl), the rationals (the fractions like 1/3 and -27/11) called Q (from Ger. Quotient) and the reals (because numbers like e and pi can't be written as fractions) called simply R. Of course quite a few of you will be pretty au fait with the complex numbers, C, too.
Start with the real numbers and throw in the number i with the property i^2 = -1. Some of you might debate all night over whether they exist but the rest of us will just get on and use them in applications such as electronic engineering, digital signal processing and computer graphics. Q, R and C are all fields - basically this means you can add, subtract, multiply and divide them in the familiar way. And if you haven't seen it before, this formula due to Euler ought to blow your mind:

e^(it) = cos(t) + i sin(t).


Talking of graphics, quaternions are pretty popular too. You've seen how to make the complex numbers by adding in a new number i, so why not try our luck again and throw in j, whose square is also -1. The catch now is: what is ij? Well, let's call it k.
In fact if we add the rules ij=-ji=k, jk=-kj=i, j^2=k^2=-1, and a few more (I'm sure you're getting the idea by now), we end up with H, the quaternions (damn! notice Q's already taken!).
Anyway, any quaternion can be represented by 4 real numbers. For example 1+2i-4j+11k is represented by (1,2,-4,11). That means we can embed good old 4 dimensional spacetime in there.
In fact that's what Hamilton originally used them for.
Well, Einstein hadn't come along yet so actually he was only using the last three components to represent points in space but you can use the first for time too. Turns out the quaternions do a really sweet job of doing geometry which is why graphics people like to use them. If you thought Euler's formula was cool that's nothing compared to what you can do with these babies. In fact their main application is in eliminating all those complicated trig formulas you get when you try to carry out rotations in 3D.
Notice that we lost something by using these numbers. Two times three is three times two but i times j isn't j times i. We've sacrificed what's known in the game as commutativity.
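You can check the multiplication table for yourself. Here's a short Python sketch (mine, purely illustrative) implementing the Hamilton product on 4-tuples:

```python
def qmul(a, b):
    # Hamilton product of quaternions (w, x, y, z) = w + xi + yj + zk
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

assert qmul(i, j) == k               # ij = k
assert qmul(j, i) == (0, 0, 0, -1)   # but ji = -k: commutativity is gone
assert qmul(i, i) == (-1, 0, 0, 0)   # i^2 = -1
```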


Well, there's a bit of a pattern going on. Start with R, move on to C and then H. The quaternions were 4-dimensional. What about 3-dimensional? Or 5-dimensional? Well, you can try making these things (exercise!) but you'll find they're not all that interesting. They just aren't well behaved.
That is, until you come to 8 dimensions. Then you find you can make the octonions, otherwise known as the Cayley numbers and labelled O. Still, you have to make more sacrifices: when you multiply three numbers a, b and c, it now matters whether you compute (ab)c or a(bc).
No longer are we guaranteed things like (2.3).5=2.(3.5). We have lost associativity. Still, there's fun to be had with octonions. You can discover the delights of spinors and triality, find that there is a well defined cross product in 7 dimensions, and if you've enough stamina you'll see they have neat connections with string theory.
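You can watch associativity fail on a computer. Here's a Python sketch of mine that builds the octonions by Cayley-Dickson doubling from the reals (the sign convention below is one standard choice) and exhibits a triple of basis elements that refuse to associate:

```python
# Cayley-Dickson doubling, starting from the reals: pairs of reals give C,
# pairs of those give the quaternions H, and pairs of those give the
# octonions O. Numbers here are floats or pairs (a, b) one level down.

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def conj(x):
    return (conj(x[0]), neg(x[1])) if isinstance(x, tuple) else x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def mul(x, y):
    if not isinstance(x, tuple):
        return x * y
    a, b = x
    c, d = y
    # (a,b)(c,d) = (ac - d*b, da + bc*)  where * is conjugation
    return (add(mul(a, c), neg(mul(conj(d), b))),
            add(mul(d, a), mul(b, conj(c))))

def from_vec(v):
    # Pack a coefficient list of length 2^n into nested pairs
    if len(v) == 1:
        return float(v[0])
    h = len(v) // 2
    return (from_vec(v[:h]), from_vec(v[h:]))

def e(n):
    # The octonion basis element e_n
    v = [0.0] * 8
    v[n] = 1.0
    return from_vec(v)

lhs = mul(mul(e(1), e(2)), e(4))   # (e1 e2) e4
rhs = mul(e(1), mul(e(2), e(4)))   # e1 (e2 e4)
assert lhs == neg(rhs) and lhs != rhs   # they differ by a sign: no associativity
```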


How long can we keep this up for? Well if you really insist you can make the 16-dimensional sedenions but they're so unruly that I just wouldn't bother if I were you!


But you might have noticed: all of these things were built from the real numbers. The whole point here is to get away from them. What other alternatives do we have? Well, you could play with the integers modulo N. Just do normal math, but every time you get an answer bigger than or equal to N, divide by N and keep only the remainder. For example, modulo 9, 8+7=6 and 3*3=...hmmm...0. That zero isn't too good - we don't want to end up with zero when we multiply perfectly respectable numbers. That's easy to fix: choose N to be prime. (Think about it!) Now we can even divide these numbers. In fact Z/NZ (that's their other name BTW) forms a field, just like R and C. That means you can add, subtract, multiply and divide by anything non-zero. Still, numbers that only go up to N-1. Sounds a bit useless. Au contraire! They're at the heart of algorithms like RSA encryption.
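A few lines of Python make this concrete (my sketch; the modular inverse via Fermat's little theorem is the standard trick):

```python
p = 7  # a prime modulus

# Addition and multiplication wrap around mod p
assert (5 + 4) % p == 2
assert (3 * 5) % p == 1

# Division: Fermat's little theorem gives a^(p-1) = 1 (mod p),
# so the inverse of a non-zero a is a^(p-2) mod p
def inv(a, p):
    return pow(a, p - 2, p)

for a in range(1, p):
    assert (a * inv(a, p)) % p == 1  # every non-zero element is invertible

# With a composite modulus this fails: 3 * 3 = 0 (mod 9), so 3 has no inverse
assert (3 * 3) % 9 == 0
```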


But we're still not being alternative enough.
After all, numbers modulo N are still, well, numbers. Can we get a bit more radical?
Try this for size: you're used to decimal expansions that go on forever like 1/7=0.14285714285...
Maybe we can make digits go infinitely far to the left of the decimal point instead.
Consider a number like ...99999999. Looks pretty big. But try adding 1. Well, 9+1=0, carry 1. Write down the zero. Now 9+1=0, carry 1. Hmmm...looks like the answer is going to be ...000000, with that 1 getting carried out to infinity and its final cashing out postponed forever. So what we have is ...9999999=-1. That might look familiar to some of you. I'll give you a clue: this is called tens complement arithmetic. Or at least I've just called it that. So it seems that arithmetic with infinite sequences of digits might just work out. The catch is that you can't always divide these things. However, if we work not with base 10 but with base p digits, for p prime, things behave much better: these numbers are called the p-adic integers, Zp, and in Zp you can divide by anything that isn't a multiple of p. Allow finitely many digits after the point as well and you get the field Qp of p-adic numbers, where you can divide by any non-zero number. They're very useful for doing number theory and of course that means they're good for cryptography. BTW, going back to the 10-adics again, try calculating directly that ...999999^2=...000001.
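You can check these 10-adic claims with ordinary integer arithmetic by truncating to k digits, i.e. working mod 10^k. A sketch of mine, with a bonus at the end showing why division goes wrong when the base isn't prime:

```python
# Truncate 10-adic numbers to k digits, i.e. work mod 10^k.
k = 12
M = 10 ** k
nines = M - 1                    # the truncation of ...999999

assert (nines + 1) % M == 0      # ...999999 + 1 = ...000000, so ...999999 = -1
assert nines * nines % M == 1    # hence ...999999^2 = ...000001

# Bonus: because 10 isn't prime, the 10-adics contain zero divisors.
# Repeated squaring of 5 converges 10-adically to an idempotent e with
# e^2 = e but e != 0, 1 -- so e(e-1) = 0 and division breaks down.
e = 5
for _ in range(60):
    e = e * e % M
assert e * e % M == e
assert e not in (0, 1)
assert e * (e - 1) % M == 0
```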


But what about infinity? Can't we make that a number? It turns out that there are lots of ways to introduce infinity to a number system. Let's start with the cardinals - numbers used to count how many elements there are in a set. For example, the set of primary colors {red,green,blue} has 3 elements; here 3 is being used as a cardinal. But what about the perfectly respectable set {0,1,2,3,...}=N - what is the size of that? Well, that's the first transfinite cardinal and it goes by the name of aleph0. (I apologize, I don't know how to do Hebrew HTML.) The next biggest cardinal is called aleph1, and you can guess the rest of the pattern. But here's an interesting conundrum: how big is the set of real numbers R?
Well it's called c and it's bigger than aleph0. But is it bigger than aleph1? Turns out it's your choice!
Whichever way you go the rest of mathematics is happy to accommodate you.



There's more than one way to make an infinity. You can play with the transfinite ordinals too. Go back to the beginning when nobody had invented any mathematics. We had nothing, in other words the empty set {}, which we'll call 0. On the next day we can try making another set - after all we now have something to put in a set, the empty set itself. So define 1={0}. Now we can define 2={0,1} and I'm sure you see the pattern n={0,...,n-1}. These are the ordinals. But why stop there, how about {0,1,2,...}?
Well, that's the first transfinite ordinal and we call it w (that's meant to be omega - I don't do Greek HTML either). Now we can make {0,1,2,...,w}. That's called w+1. How about {0,1,2,...,w,w+1,w+2,...}? Well, that has a name too: w+w, also written w·2. (Careful: ordinal multiplication isn't commutative, and 2·w is just w again.) Want to carry on? Don't let me stop you. But I'd better warn you that you might not be able to get all cardinals this way because I've only shown you how to get the accessible ones.


Maybe you're not interested in big. We could try small. In calculus you probably got fed up of always using the fact that as dx->0, dx^2 could be ignored. Why not just set dx^2 exactly equal to zero and be done with it? Infinitesimals with exactly this nilpotent property show up in the dual numbers and in smooth infinitesimal analysis; nonstandard analysis takes a slightly different route, introducing infinitesimals that are non-zero (and even invertible) but smaller than every positive real. Either way you get a rigorous theory of infinitesimal numbers. Useful if you're lazy about proofs.
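The rule dx^2=0 even runs on a computer. Here's a minimal dual-number sketch of mine; the epsilon coefficient automatically tracks derivatives, which is exactly how forward-mode automatic differentiation works:

```python
class Dual:
    """Numbers a + b*eps with eps^2 = 0; the eps part carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps: the eps^2 term vanishes
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

x = Dual(3.0, 1.0)   # x = 3 + eps
y = x * x * x        # y = x^3
assert y.a == 27.0 and y.b == 27.0  # value 27, derivative 3*x^2 = 27 at x = 3
```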


But maybe mathematics isn't your thing and none of these systems looks like fun to you. Then there is one last thing I can throw at you: combinatorial games. What do games have to do with numbers?
The theory was invented by Conway (yes, that's the Conway), and Don Knuth coined the name Surreal Numbers for the numbers that live inside it. Every position in a game gets a value. Here are some simple ones: 0, *1, *2, *3,... And here's how to add them: *a+*b=*(a xor b). That's the good old xor that's written '^' in C. If you know how to win at Nim then you know why these are good for winning games. The *n are called nimbers and, strictly speaking, they're game values rather than surreal numbers, but both are just tiny parts of the entire collection of values that positions in games can take. What's more, the surreal numbers include Z, Q and R. Hard to believe, but real numbers are all positions in games. And that's not all: the surreal numbers also include the transfinite ordinals, lots of different kinds of infinitesimals and countless other weird and wonderful numbers. If you want to beat your friends at Hackenbush, Nim, Mogul, and maybe even Go, then these are the numbers for you.
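The nimber addition rule is a one-liner, and it's the whole theory of Nim. A quick Python sketch (mine):

```python
def nim_sum(*piles):
    # Adding nimbers is bitwise xor: *a + *b = *(a xor b)
    s = 0
    for n in piles:
        s ^= n
    return s

# A Nim position is a loss for the player to move iff its nim-sum is 0
assert nim_sum(1, 2, 3) == 0   # so piles of 1, 2, 3 are a lost position
assert nim_sum(3, 5, 7) == 1   # non-zero: the player to move can win...

# ...by moving to a position with nim-sum 0, e.g. take 1 from the pile of 3:
assert nim_sum(2, 5, 7) == 0
```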

Tuesday, August 16, 2005

Toric Varieties

I keep meaning to read up on this subject but my algebraic geometry ability isn't quite there yet. One of the reasons I'd like to understand toric varieties is that they give you a concrete handle on certain types of algebraic varieties and thereby give solid interpretations of results like the Riemann-Roch theorem in terms of counting features of simple geometrical objects.


Part of the idea is this: consider a convex polytope in R^n whose vertices all have integer coordinates. A simple example is the polytope in R^2 with vertices (0,0), (1,0) and (0,1). This is also the list of integer points in this polytope. Now write out the monomials in x and y whose exponents are given by the coordinates of the integer points. In this case we get:


x^0y^0 = 1, x^1y^0 = x, x^0y^1 = y

Use these to define a mapping from C^n to C^m where m is the number of integer points in the polytope. In this case we get

(x,y) -> (1,x,y)

This defines a map from C^n to CP^(m-1) by considering (1,x,y) to be homogeneous coordinates. The closure of the image of this map is an example of a toric variety. In this case it's just all of CP^2. This looks like a highly contrived definition. But what's neat is that many natural properties of the toric variety are closely related to natural properties of the convex polytope. In particular, results like Pick's Theorem and its higher dimensional analogues are consequences of the Riemann-Roch theorem applied to the toric variety. We get a dictionary converting statements of algebraic geometry, some of them quite tricky (for me) to grasp, into elementary statements about convex polytopes.


Here's another quick example of a toric variety to illustrate the above process. Consider the convex polytope with corners (0,0), (1,0), (0,1), (1,1). This gives monomials


1,x,y,xy

so we have the mapping

(x,y) -> (p,q,r,s) = (1,x,y,xy)

It's not hard to see that the closure of the image of this map is the surface ps=qr in CP^3. So this time we have a non-trivial variety.
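If you don't want to take my word for it, a quick numerical spot-check (my illustration) confirms that points in the image satisfy ps=qr:

```python
import random

# Points in the image of (x, y) -> (p, q, r, s) = (1, x, y, xy)
# satisfy ps = qr, the equation of the quadric surface in CP^3.
for _ in range(100):
    x = random.uniform(-10.0, 10.0)
    y = random.uniform(-10.0, 10.0)
    p, q, r, s = 1.0, x, y, x * y
    assert abs(p * s - q * r) < 1e-9
```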


But I don't yet understand how Pick's theorem comes from the Riemann-Roch theorem.


I'm also interested in the connection to the ubiquitous number 12. (That is the same 12 that appears in 1+2+3+...=-1/12 in an earlier posting of mine. Again, a result of the Riemann-Roch theorem.)

Thursday, August 11, 2005

Absence of Evidence is Evidence of Absence

I know Carl Sagan said the opposite, but he was clearly wrong.


Suppose that a priori X has probability p of being true. We now look for evidence for X of a certain type. Suppose that there is a probability q that we find this evidence if X is true and probability q' that we find this evidence if X is false. We will assume p<1 (otherwise we wouldn't bother looking for evidence) and that q>q' (otherwise it couldn't be said that the evidence we're looking for is evidence for X).


So we have four possibilities:


  1. X is true and we find evidence for X: probability pq
  2. X is true and we don't find evidence for X: probability p(1-q)
  3. X is false and we find evidence for X: probability (1-p)q'
  4. X is false and we don't find evidence for X: probability (1-p)(1-q')

Theorem
Under the hypotheses above, the conditional probability that X is true given that we failed to find the evidence is p(1-q)/(p(q'-q)+1-q').

Proof
Use Bayes' Theorem.

Some elementary rearrangement shows this is always less than p given the above hypotheses. It doesn't matter if we are unable to assign an a priori probability: the result holds whatever value p has, as long as it's less than 1. And if we don't know that q>q' then we shouldn't be in the business of looking for evidence. Ideally q'=0 (a good experiment has no false positives), but as I have shown, the result holds even if we relax this condition.
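If you don't feel like doing the rearrangement by hand, here's a brute-force check of the theorem in Python (my sketch) over random parameters satisfying the hypotheses:

```python
import random

# Posterior probability that X is true given that we failed to find
# the evidence, using the formula from the theorem above.
def posterior_given_no_evidence(p, q, qp):
    return p * (1 - q) / (p * (qp - q) + 1 - qp)

# Spot-check: with p < 1 and q > q', failing to find the evidence
# never raises the probability of X.
for _ in range(10000):
    p = random.uniform(0.0, 0.999)
    q = random.uniform(0.0, 1.0)
    qp = random.uniform(0.0, 1.0)
    if q > qp:
        assert posterior_given_no_evidence(p, q, qp) <= p + 1e-12
```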


So clearly failing to find evidence for X should lower our estimate of the probability that X is true.


I wonder what made Sagan say this. I think that maybe he meant to say "absence of evidence is not proof of absence". The theorem shows that under the original hypotheses the conditional probability is never 1, and so while we have evidence of absence, we don't have a proof. But if we can look for enough independent types of evidence it's quite possible for the conditional probability to get close to 1.


Update: The first paragraph was badly written and so I've edited it slightly. The main argument is unchanged. The argument stands or falls regardless of what Sagan actually said, but read the comments below on some context for this quotation. (02 Jan 09)

Wednesday, August 10, 2005

A Neighbourhood of Infinity

A neighbourhood of infinity is a subset of the Riemann sphere that contains an open set containing the point at infinity. But it's also a song by The Fall, from their album Perverted by Language, that I was listening to long before I did my first course on Riemann surfaces. Try as I might I can't find any mathematical references in the lyrics and as far as I know no other songs by The Fall have mathematical references in their titles.

Tuesday, August 09, 2005

ars mathematica

I'm posting stuff over at ars mathematica for a bit. Of course the stuff that's not fit to print will get posted here.

Monday, August 08, 2005

The Use and Abuse of Godel's Theorem

I just finished a first reading of Gödel's Theorem: An Incomplete Guide to its Use and Abuse by Torkel Franzén.


Gödel's theorem has a habit of appearing in all sorts of literature. Hofstadter makes it a central part of his discussion of AI in the classic Gödel, Escher, Bach. Penrose uses it to argue for the specialness of human brains. Chaitin uses it as part of various discussions about irreducible complexity. It has been invoked by theologians and literary critics. And yet, it's hard to find work on Gödel's theorem that deals with these issues and is written by a logician with a good grasp of the subject. This book fills that gap. It looks closely at the various ways people have abused the theorem. And I say 'abused' because all of the extra-mathematical accounts he quotes are abuses.


One of the ways in which the theorem is abused is that it gets applied to all kinds of formal and non-formal systems regardless of applicability. Gödel's theorem is explicitly about formal systems that are capable of formalizing 'part of arithmetic'. It's not about complexity. As Franzén points out, there are very complex systems that are incapable of being used for arithmetic, and yet there are also some very simple systems that can be. Additionally, Gödel's theorem deals only with the arithmetic component of such systems. For example, if we find a way to axiomatise physics it is likely we will be able to use some of those axioms for doing arithmetic. As a result the entire system will be incomplete. Freeman Dyson has used this to argue that physics is 'inexhaustible'. But the incompleteness might only be a feature of the arithmetic part of this system and have nothing to do with the physical part. Interestingly, Dyson conceded this point when it was raised by Solomon Feferman in the New York Review of Books.


Franzén makes some very simple statements that show the falsity of some of the claims made about the theorem. For example if G is undecidable in formal system P then clearly it can be proved in P+G. So statements like "Godel's theorem demonstrates the existence of unprovable propositions" that abound in the literature are shown to be trivially false. He similarly dismisses some statements about the complexity of axiom systems made by Gregory Chaitin. Chaitin has done some great work, in particular he has a beautiful proof of Gödel's theorem using Kolmogorov complexity. But a quick visit to his web site reveals that he has a slight problem with understanding the importance of his own work. Franzén does a great job of trying to understand which parts of Chaitin's work really are significant.


Importantly Franzén is pedantic in just the right measure. When working in logic it is important to be clear about details. For example it's easy to make mistakes keeping track of which formal systems we have used to prove which results and we often need to make clear whether we use the word 'proof' to mean a string of symbols or something that convinces a mathematician (or both). But Franzén also understands that many people use Gödel's theorem, not literally, but as a metaphor or inspiration in other fields.


Franzén also devotes a good amount of space to the issue of Gödel's theorem and mathematical skepticism. He shows that the theorem can do nothing to help skeptical arguments. After all, if a formal system were able to prove its own consistency would we trust it more?


There's a reason why I mention "first reading" above. Among the statements about Gödel's theorem that Franzén dismantles are statements that I myself have been making [1]. So I now feel less confident of my understanding of the theorem than before I started the book. This is definitely not an easy book for non-specialists who really want to make sure they understand every statement made in it and it is going to require repeated readings. But any book that I immediately want to reread on completing it can't be all that bad.


[1] I've always understood Gödel's theorem as requiring a step outside of the formal system, in which we interpret the symbols of a formal system as a tool for making predictions about what happens when we manipulate symbols, allowing us to turn a formal system in on itself. Franzén explicitly argues that this view is incorrect. This has left me a little confused.

Saturday, August 06, 2005

The Resurrection of God Incarnate

You may think this is a little off topic. But I want to point out the existence of this book: The Resurrection of God Incarnate by Richard Swinburne. What makes it on topic is that he uses Bayes' Theorem to conclude that there is a 97% chance that Jesus was God incarnate who rose from the dead. I also found this paper on the subject and some related work here. There is a long tradition of using logic and mathematics to argue for the existence of God - the best known example being Pascal's wager - but these look like the most sophisticated attempts I have seen so far. I haven't yet taken the time to study either the book or the papers and so I make no comment. But maybe you'd like to.

Thursday, August 04, 2005

Wavelet Noise

I'm attending a conference this week: SIGGRAPH 2005. It's nice to see that every year the papers get a little bit more mathematical. My favourite so far was on Wavelet Noise.


For those who don't work in graphics, a 'noise' function is a pseudo-random continuous, and usually differentiable, function. The most famous example is Perlin noise. Typically a noise function is first defined by data on a lattice in R^n, e.g. through some kind of hash function (i.e. a function whose value is 'hard' to predict), and then extended to all of R^n via some kind of filter - for example by fitting a smooth Catmull-Rom spline through the data points. These functions are ubiquitous in computer graphics, where they can be used to synthesise anything from the shape of a simulated tornado to the distribution of dirt on a car body.


Unfortunately one of the problems with most noise functions is that they are hard to 'anti-alias'. When you render a 3D image you typically need to colour an individual pixel by integrating over all of the light that falls onto it. The light that falls on a single pixel is typically the integral of the colour of an object over a small part of its surface - and this colour is often given by a noise function of some sort. The problem is that integration is expensive, so people often just 'fire' a bunch of rays from the pixel into the object and average the colour of the ray-object intersections. But we know from the Nyquist theorem that if you don't sample the colour enough you may end up getting high frequency garbage 'aliasing' as a low frequency signal. With regular colour patterns like checkerboards you get moire patterns, but with noise functions you get unwanted low frequency noise. What we really need is a noise function that is band limited so that the high frequency components simply aren't present to cause aliasing. Perlin noise isn't band limited, so if you try to filter it to suppress aliasing you end up without any pseudo-random detail at all. In Wavelet Noise the authors explicitly construct a noise function that has a hard frequency cutoff. It becomes a true band limited function and hence we can get a reasonable approximation to its integral by sampling.
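Here's a minimal one-dimensional illustration of the aliasing problem (my own sketch, nothing to do with the paper's actual construction): sample a 9 Hz tone at only 10 Hz, well below its Nyquist rate of 18 Hz, and the samples are indistinguishable from a 1 Hz tone.

```python
import math

fs = 10.0   # sampling rate in Hz
f = 9.0     # signal frequency, above the Nyquist limit fs/2 = 5 Hz

for n in range(100):
    t = n / fs
    high = math.sin(2 * math.pi * f * t)
    low = -math.sin(2 * math.pi * 1.0 * t)   # the 9 Hz tone aliases to 1 Hz
    assert abs(high - low) < 1e-9
```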


The great thing about this paper was its presentation. It had a plot. As the publication deadline approached the authors realised their noise function had a flaw. Typically we're not interested in sampling the value of the noise function in R^3 but over an R^2 surface embedded in R^3. The catch is that such a 'slice' of a band limited 3D function is not a 2D band-limited function. This meant that their noise function couldn't actually be used to colour the surfaces of objects! Fortunately they figured out the solution: instead of colouring the surface simply by pulling back the value of the noise from R^3, you integrate the noise along the normal to the R^2 surface. The Fourier transform of such an integral is simply a slice through the Fourier transform of the original function on R^3 and hence is band limited. In the nick of time the authors figured this out and, amazingly, were able to implement it with a small piece of code that runs reasonably fast.


By the way, this is also where the paper on Dual Photography was presented. I mentioned that a few months ago as I copied one of their results here.


Apologies: this entry probably has many typos and other kinds of error. For some reason the Safari web browser refuses to work at my hotel and for some reason I'm having trouble editing text in Firefox.
