A Neighborhood of Infinity

Sunday, April 14, 2013

Why Heisenberg can't stop atomic collapse

TL;DR

A heuristic argument to show that hydrogen atoms are stable and have a minimum energy level is wrong. I will assume undergraduate level quantum mechanics in the discussion.

Introduction

There's a popular argument used to explain why atoms are stable. It shows there is a lower bound on the energy level of an electron in the atom that makes it impossible for electrons to keep "falling" forever all the way down to the nucleus. You'll find it not only in popular science books but in courses and textbooks on quantum mechanics.

A rough version of the argument goes like this: the closer an electron falls towards the nucleus the lower its potential energy gets. But the more closely bound to the nucleus it is, the more accurately we know its position and hence, by Heisenberg's uncertainty principle (HUP), the less accurately we know its momentum. Increased variance in the momentum corresponds to an increase in kinetic energy. Eventually the decrease in potential energy as the electron falls is balanced by an increase in kinetic energy and the electron has reached a stable state.

The problem is, this argument is wrong. It's wrong related to the kind of heuristic reasoning about wavefunctions that I've talked about before.

Before showing it's wrong, let's make the argument a bit more rigorous.

Bounding wavefunctions

The idea is to show that for any possible normalised wavefunction ψ of an electron in a Coulomb potential, the expected energy is bounded below by some constant. So we need to show that
is bounded below where
and p is momentum.
Consider a wavefunction that is confined mainly around the nucleus so

The first fact we need is that Heisenberg uncertainty principle tells us that 
(assuming we're in a frame of reference where the expected values of p and x are zero).

If the wavefunction is spread out with a standard deviation of a then the electron is mostly around a distance a from the nucleus. So the second fact is that we can roughly approximate the expected value of 1/r as 1/a.

Combine these two facts and we get, roughly, that
I hope you can see that the right hand side, as a function of a, is bounded below. The graph of the right hand side as a function of a looks like:
It's now an exercise in calculus to find a lower bound on the expected energy. You can find the details in countless places on the web. Here a link to an example from MIT, which may have come directly from Feynman's Lectures on Physics.

The problem

The above discussion assumes that the wavefunction is basically a single packet confined around a distance a from the nucleus, something like that graphed above. But if a lower energy state can be found with a different wavefunction the electron will eventually find it, or an even lower energy state. In fact, by using a wavefunction with multiple peaks we will find that the Heisenberg uncertainty principle doesn't give a lower bound at all.

We'll use a wavefunction like this:
It has a packet around the origin just like before but it also has a sharp peak around r=l. As I'm showing ψ as a function of r this means we have a shell of radius l.


Let's say

where ψ1 is normalized and peaked near the original and ψis our shell of radius l. Assume no overlap between ψ1 and ψ2.

In this case you can see that we can make
as large as we like by making l as large as we like while still leaving us free to make the central peak whatever shape we want. This means that the estimate of 
coming from HUP can be made as small as we like while making the central peak as close to a Dirac delta as we want. Informally, HUP controls of the overall spread of the wave function but not the spread of individual peaks within it.

For a large enough shell, ψcontributes little to the total expected potential energy, but ψ1 can contribute an arbitrarily low amount because we can concentrate it in areas where 1/r is as large as we want. So we can make the total expected potential energy as low as we like. And yet we can also keep the estimate of the kinetic energy given by HUP as close to zero as we like. So contrary to the original argument, the Heisenberg uncertainty principle doesn't give us a lower bound on the energy at all. The argument is wrong.

But wait, we know there is a lowest energy state...

Yes, the energy of a wavefunction in a Coulomb potential is in fact bounded below. After all, atoms are stable. But the Heisenberg uncertainty principle doesn't show it. The inequality in HUP becomes an equality when the wavefunction is a Gaussian function. It provides a good bound for functions that are roughly Gaussian, ie. that form a single "lump". But it provides only weak bounds for wavefunctions with multiple peaks and in this case it's not the appropriate tool to use.

The Heisenberg uncertainty principle is an inequality about ordinary functions interpreted in the context of quantum mechanics (QM). The field of functional analysis provides many such inequalities. A great paper by Lieb, The Stability of Matter, gives an inequality due to Sobolev that can also be interpreted in the context of QM. Sobolev's inequality is more appropriate when considering the hydrogen atom and it gives a good lower bound, demonstrating that the hydrogen atom is stable after all.

But wait, the Heisenberg uncertainty principle argument gives the right energy...

Getting a correct answer doesn't always justify the methods. I can give at least two reasons why the original method appears to work.

1. The HUP gives a good bound for wavefunctions that are roughly Gaussian. The lowest energy level for the hydrogen atom is given (very roughly) by such a function. So an estimate based on HUP should be roughly correct. However, HUP alone can't tell us that the lowest energy state is Gaussian. The argument is only useful if we can get this information from somewhere else.

2. You can get an estimate for the lowest energy level of the hydrogen atom (assuming it exists) by dimensional analysis. Invalid physical arguments that are dimensionally correct will often give the correct result because there is only one dimensionally correct expression possible.

But wait, it's just a heuristic argument...

Heuristic arguments are crucial to physics. But when similar heuristic arguments give opposite results they become problematic. In particular, it's no good saying an argument is inexact or qualitative when it gives a bound on the energy that isn't just off by an order of magnitude, but completely fails to give a bound at all. Part of the issue here is that the Coulomb potential goes to infinity as r goes to zero and so more care is required. The HUP argument above can be adapted to give good results when the potential is bounded below, for example it gives a reasonable estimate for square wells.

But there may be a clever way of using HUP to bound the energy that I haven't seen. If you can see it, please tell me.

The source

Most of what I said above I learnt from the excellent paper on the Stability of Matter by Lieb that I mentioned above.

Sunday, January 13, 2013

Aliasing and the Heisenberg uncertainty principle.

TL;DR

The Dirac comb is an example of a wavefunction whose position and momentum aren't fuzzy.

Introduction

The Heisenberg uncertainty principle says that if you have a particle in some state and observe either its momentum or its position then the products of the standard deviations of distributions of the outcomes satisfy this identity:

I think many people have a mental picture a bit like this:

You can know the position and momentum with some degree of fuzziness and you can trade the fuzziness between the two measurements as long as the product of their sizes is larger than ℏ/2.

Here's another way of thinking about that kind of picture (assuming some units I haven't specified):

position=123.4???
momentum=65?.???
The idea is that the question mark represents digits we don't know well. As you move towards the right in the decimal representation our certainty in the accuracy of the digit quickly goes downhill to the point where we can't reasonably write digits.

But this picture is highly misleading. For example, the following state of affairs is also compatible with the uncertainty principle, in suitably chosen units:

position=...???.123...
momentum=...???.654...

In other words, it's compatible with the uncertainty principle that we could know the digits beyond the decimal point to as much accuracy as we like as long as we don't know the digits before the point. It trivially satisfies Heisenberg's inequality because the variance of the position and the momentum aren't even finite quantities.

But being compatible with Heisenberg uncertainty isn't enough for something to be realisable as a physical state. Is there a wavefunction that allows us to know the digits to the right of the decimal point as far as we want for both position and momentum measurements?

Sampling audio and graphics

Maybe surprisingly, the worlds of audio and graphics can help us answer this question. Here's what a fraction of a second of music might look like when the pressure of the sound wave is plotted against time:

But if we sample this signal at regular intervals, eg. at 44.1KHz for a CD, then we can graph the resulting signal as something like this:

The red curve here is just to show what the original waveform looked like. The black vertical lines correspond to regular samples and we can represent them mathematically with Dirac delta functions multiplied by the amplitude measured at the sample.

There is a well known problem with sampling like this. If you sample a signal that is a sine wave sin(ωt) at rate f then the signal sin((ω+2πnf)t) will generate exactly the same samples for any integer n. The following illustration shows what might happen:

The two waveforms are sampled at the same regular intervals (shown by vertical lines) and give exactly the same amplitudes at those samples.

This forms the basis for the famous Nyquist-Shannon sampling theorem. You can reconstruct the original signal from regularly spaced samples only if it doesn't contain frequency components higher than half your sampling rate. Otherwise you get ambiguities in the form of high frequency parts of the signal masquerading as low frequency parts. This effect is known as aliasing. As a result, the Fourier transform of a sampled function is periodic with the "repeats" corresponding to the aliasing.

In the audio world you need to filter your sound to remove the high frequencies before you sample. This is frequently carried out with an analogue filter. In the 3D rendering world you need to do something similar. Ray tracers will send out many rays for each pixel, in effect forming a much higher resolution image than the resolution of the final result, and that high resolution image is filtered before being sampled down to the final resulting image. The "jaggies" you get from rendering polygons are an example of this phenomenon. It seems like jaggies have nothing to do with the world of Fourier transforms. But if you compute the Fourier transform of a polygonal image, remove suitable high frequency components, and then take the inverse Fourier transform before sampling you'll produce an image that's much more pleasing to the eye. In practice there are shortcuts to achieving much the same effect.

The connection to physics

Now consider a particle whose wavefunction takes the form of the Dirac comb:

This is a wavefunction that is concentrated at multiples of some quantity a, ie. ∑δ(x-an) summing over n = ...,-1,0,1,2,... If the wavefunction is ψ(x) then the probability density function for the particle position is |ψ(x)|². So the particle has a zero probability of being found at points other than those where x=na. In other words, modulo a, the particle position is given precisely.

But what about the particle momentum. Well the wavefunction has, in some sense, been sampled onto the points na, so we expect that whatever the momentum distribution is it'll be ambiguous modulo b where ab=ℏ. In fact, if we take the Fourier transform of the Dirac comb we get another Dirac comb. So in the frequency domain we get the same kind of phenomenon: the momentum is concentrated at integer multiples of b. So now we know we have a wavefunction whose uncertainty precisely fits the description I gave above. We know the position precisely modulo a and the momentum precisely modulo b. In some sense this isn't contrived: we know the momentum modulo b precisely because of the aliasing that results from knowing the position modulo a.

What this means

The message from this is that position-momentum uncertainty isn't fuzziness. At least it's not fuzziness in the ordinary sense of the word.

And in reality

I'm not very experienced in attaching numbers to results from theoretical physics so I'd find it hard to say how accurately we can create a Dirac comb state in reality. When we measure a position using interferometry techniques we automatically compute the position modulo a wavelength so this isn't an unusual thing to do. Also an electron in a periodic potential may take on a form that consists of a train of equally spaced lumps. Even if not described exactly by a Dirac comb, we can still know the position modulo a and the momentum modulo b much more accurately than you might expect from a naive interpretation of the Heisenberg uncertainty principle as fuzziness.

Exercises
1. Investigate approximations to the Dirac comb: eg. what happens if we sum only a finite number of Dirac deltas, or replace each delta with a finite width Gaussian, or both.
2. Investigate the "twisted" Dirac comb: ∑δ(x-an)exp(inθ) where θ is some constant.

Sunday, December 30, 2012

Shuffles, Bayes' theorem and continuations.

Introduction
Back in the 80s when I was a kid I came across a program for the BBC Micro that could tell what card you had picked from a deck of cards even though you'd buried your card within the deck wherever you wanted and had cut and shuffled the deck. I thought I'd try to implement a slightly more sophisticated version of the same trick that could handle multiple shuffles, and multiple types of shuffle.
The idea is that we prepare a deck of cards in some known sequence and have someone pick out a card and place it at the top of the deck. They perform some kind of randomisation procedure on the cards, eg. cut and shuffle it a couple of times, and then you get to look at the final sequence of cards. Can we tell which card was picked out?
Some probability theory
Let's formalise this a bit. Our decks will have cards. There is a small number of initial states our deck can be in, corresponding to the known sequence with a single card moved to the top. Let's label these initial states . There is a (usually large) number of permutations that could be applied through shuffling. We'll label these . We'll try to do arrange that this isn't simply the set of all permutations (though it's not necessarily a disaster if it is).
We want to figure out the initial state given some final state . In other words we want
We can use Bayes theorem to get:
Now is the sum over all ways of starting with and ending up with . So
where the sum is over all such that . I'm assuming that the shuffles are independent of the initial sequence of cards. This gives us an algorithm. We do a brute force simulation of every possible shuffle that we're considering applied to each possible initial state. After each shuffle we sum the corresponding probability for those shuffles that give our known final state .
Each shuffle is going to be built up as the product of a sequence of building blocks with each block randomly selected based on what happened before. Let's call the blocks names like . So if then . As we work through the shuffle we will accumulate the probability. After the first block we have a probability of . The probability after the second is and so on. At any point we'll call the probability accumulated so far the importance. I've borrowed that name from the world of rendering because this algorithm has a remarkable similarity to recursive ray-tracing.
Some computer science
I'd like to be able to chain a sequence of shuffles. But wait! There's a catch! Today's the day I finally want to get around to checking out the lambda expression support in C++. I've been putting this off for years. (I'm using gcc 4.7.) So I'm not going to have a Haskell non-determinism monad to make life easy.
Suppose I have two types of shuffle, type and type . I could easily write a loop to iterate over all shuffles of type , and in the innermost part of the loop I could call another loop over all shuffles of type . But then if I want to replace with I have to change the code to replace the inner part with code for . That's no good. I'd like to be able to replace the innermost part of the outer loop with any code I want without actually editing that part of the code. It's easy with lambda expressions. I write the type loop code so that it takes as argument a lambda function representing what I want done inside the loop.
There's another way of looking at this. You can skip this paragraph if you don't care about the connection to Haskell. But in Haskell you might do something like this by using a non-determinism monad, or even a probability monad. But as I pointed out a while back, you can fake every monad using the continuation monad. One way to implement continuations in C++ is to use continuation passing style. And that's what I'll do. The continuations are just the lambdas that I mentioned in the previous paragraph.
Some C++ code


> #include <iostream>
> #include <cstdlib>
> using namespace std;
You can bump this up the deck size if you have the CPU power:
> const int deck_size = 13;
A deck of cards is represented by a simple array of integers with each card being assigned a unique integer.
> struct Deck {
>     int card[deck_size];
>     bool operator==(const Deck &other) {
>         for (int i = 0; i < deck_size; ++i) {
>             if (card[i] != other.card[i]) {
>                 return false;
>             }
>         }
>         return true;
>     }
> };
The riffle shuffle works by splitting a deck into two piles and interleaving the parts onto a new destination deck. Here's a schematic diagram with the two piles coloured orange and blue:
The function riffle_helper helps loop through all possible riffles. I could assume that each card arriving at the destination is equally likely to come from the left pile or the right pile. But I observe that whenever I do a real riffle shuffle the cards seem to come in 'runs'. So if a card falls from the left pile then the next one is more likely to as well. That's just an empirical observation based on a small number of trials, you can tweak the probabilities yourself to fit reality better. (Oh, and I got this code upside-down compared to what people really do. I need to fix it when I have a moment...)

> enum Side {
>     LEFT,
>     RIGHT,
>     NO_SIDE
> };
This function shuffles together cards from the locations given by left_ptr and right_ptr in src_deck into dest_deck, eventually calling cont on each result. I use a template because I don't know the type of the lambda expression I'm passing in. (If I want to know its type I think I have to mess with decltype. It's all a bit weird.)
> template<class Cont>
> void riffle_helper(double importance, int split,
>                    int left_ptr, int right_ptr, int dest_ptr, Side oldside,
>                    const Deck &src_deck, Deck dest_deck, Cont cont) {
>     if (dest_ptr == deck_size) {
>         cont(importance, dest_deck);
>         return;
>     }
First I deal with the cases where one or other of the piles is empty so there's no choice about where the next card is coming from:
>     if (left_ptr >= split) {
>         dest_deck.card[dest_ptr] = src_deck.card[right_ptr];
>         riffle_helper(importance, split, left_ptr, right_ptr+1, dest_ptr+1, RIGHT, src_deck, dest_deck, cont);
>         return;
>     }
>     if (right_ptr >= deck_size) {
>         dest_deck.card[dest_ptr] = src_deck.card[left_ptr];
>         riffle_helper(importance, split, left_ptr+1, right_ptr, dest_ptr+1, LEFT, src_deck, dest_deck, cont);
>         return;
>     }
>     double p;
>     if (oldside == NO_SIDE) {
>         p = 0.5;
>     } else {
>         p = LEFT == oldside ? 0.75 : 0.25;
>     }
>     double new_importance = importance*p;
>     dest_deck.card[dest_ptr] = src_deck.card[left_ptr];
>     riffle_helper(new_importance, split, left_ptr+1, right_ptr, dest_ptr+1, LEFT, src_deck, dest_deck, cont);

> if (oldside == NO_SIDE) { > p = 0.5; > } else { > p = RIGHT == oldside ? 0.75 : 0.25; > } > new_importance = importance*p; > dest_deck.card[dest_ptr] = src_deck.card[right_ptr]; > riffle_helper(new_importance, split, left_ptr, right_ptr+1, dest_ptr+1, RIGHT, src_deck, dest_deck, cont); > }
The function riffle iterates over all possible riffle shuffles of src_deck calling cont on each one. Note that I assume that when the deck is split into two before shuffling together, each pile has at least 3 cards. You may want to change that assumption.
> template<class Cont>
> void riffle(double importance, const Deck &src_deck, Cont cont) {
>     double new_importance = importance/(deck_size-5);
>     for (int split = 3; split < deck_size-2; ++split) {
>         riffle_helper(new_importance, split, 0, split, 0, NO_SIDE, src_deck, Deck(), cont);
>     }
> }
Iterate over all possible cuts of src_dec calling cont on each result. I assume the cut leaves at least 3 cards in each pile.
> template<class Cont>
> void cut(double importance, const Deck &src_deck, Cont cont) {
>     double new_importance = importance/(deck_size-5);
>     for (int split = 3; split < deck_size-2; ++split) {
>         Deck new_deck;
>         for (int i = 0; i < deck_size; ++i) {
>             if (i < deck_size-split) {
>                 new_deck.card[i] = src_deck.card[i+split];
>             } else {
>                 new_deck.card[i] = src_deck.card[i-(deck_size-split)];
>             }
>         }
>         cont(new_importance, new_deck);
>     }
> }
Overhand shuffle remaining cards in src_deck to dest_deck. Here's an attempt to represent what an overhand shuffle does. It reverses the order of a deck that has been split into segments. The order within each segment is left unchanged.

> template<class Cont>
> void overhand_helper(double importance, const Deck &src_deck,
>                      int cards_left, Deck dest_deck, Cont cont) {
>     if (cards_left <= 0) {
>         cont(importance, dest_deck);
>     } else {
>         double new_importance = importance/cards_left;
>         for (int ncards = 1; ncards <= cards_left; ++ncards) {
>             //
>             // Take i cards from the source and place them at the bottom of the
>             // destination.
>             //
>             for (int j = 0; j < ncards; ++j) {
>                 dest_deck.card[cards_left-ncards+j] = src_deck.card[deck_size-cards_left+j];
>             }
>             overhand_helper(new_importance, src_deck, cards_left-ncards, dest_deck, cont);
>         }
>     }
> }
Iterate over all possible overhand shuffles of cards in src_deck calling cont on each result. In practice I often find overhand shuffles result in cards mysteriously jumping segments and messing up the algorithm, whereas poorly executed riffle shuffles still work fine. I'm also assuming that each time a pile of cards is transferred the size of the pile is chosen uniformly from the set of all possible segments at that stage.
> template<class Cont>
> void overhand(double importance, const Deck &src_deck, Cont cont) {
>     overhand_helper(importance, src_deck, deck_size, Deck(), cont);
> }
The final code doesn't bother computing the denominator from Bayes' theorem. The most likely initial state is given by the one that results in the highest score. If you normalise the scores to sum to one you'll get actual probabilities.
> int main() {
This is the array representation of the cards in the following picture:
>     Deck target = {{ 10, 11, 6, 12, 1, 13, 8, 2, 9, 3, 5, 4, 7 }};

>     Deck deck;
Our known starting sequence is just 1, 2, 3, ..., J, Q, K. We iterate over all ways to pick a card out from this sequence and place it at the top.
>     for (int k = 0; k < deck_size; ++k) {
>         deck.card[0] = k+1;
>         for (int i = 1; i < deck_size; ++i) {
>             deck.card[i] = (i > k ? i : i-1)+1;
>         }
>         double likelihood = 0.0;
Here is where I use the lambdas. For this example I'm doing an overhand shuffle followed by a riffle shuffle. (The syntax is pretty bizarre and its also weird that I kinda sorta specify the type of my lambda but that's not really what the type of the expression is. But having manually faked and lifted lambdas many times in C++ I can see why it's the way it is.) Note how I've made likelihood mutable and have given these lambda expressions write access to it.
>         overhand(1.0, deck, [&likelihood, target](double importance, Deck &deck) -> void {
>            riffle(importance, deck, [&likelihood, target](double importance, Deck &deck) -> void {
>                if (deck == target) {
We sum the probabilities for all ways of generating the target deck:
>                    likelihood += importance;
>                }
>         }); });
>         cout << "If top card = " << deck.card[0] << endl;
>         cout << "then unnormalised probability = " << likelihood << endl;
>         cout << endl;
>     }

> }
Run the above code and you get unnormalised probabilities
If top card = 4
then unnormalised probability = 5.7568e-12
If top card = 6
then unnormalised probability = 5.37301e-11
If top card = 7
then unnormalised probability = 1.791e-11
In fact, I had chosen 6. Some discussion
Don't expect it to work perfectly! It can only give probabilities but it's often surprisingly good. But there is a lot of room for improvement. Some work looking at how people actually shuffle could give a better probabilistic model.
Some exercises.
1. The code can be made orders of magnitude faster. The final shuffle is performed and then the result is compared to the target sequence. But you can start comparing cards with the target before the shuffle is finished. Most times you'll only need to look at the first card of the result of a shuffle before you know you haven't matched the target. Fixing this will give a big speed up.
2. The continuation passing style makes it easy to incorporate other sources of knowledge. For example if you 'accidentally' peek at the bottom card after the first shuffle you can incorporate that knowledge into the algorithm. Figure out how.
3. Write lots more kinds of shuffles and experiment. I'm hoping someone good with magic will come up with a sequence of operations that looks hopelessly random but allows a good probability of recovering the chosen card. You could also combine this with other techniques such as designing shuffles that maintain various kinds of invariant.
4. The code can be rewritten to work backwards from the final state to the initial states. Work out how to do this. (This parallels ray-tracing where you can work from eye to light or from light to eye.)
5. We're doing the same work over and over again. We don't need to compute all of the shuffles for each initial state. We can compute each shuffle once and reuse it on each initial state. Try implementing it.







































Sunday, November 18, 2012

A pictorial proof of the hairy ball theorem

The hairy-ball theorem says that there is no continuous non-zero vector field on the surface of a sphere. There are lots of popular accounts that tell you what this means, giving great examples. Here's a Youtube video for example:



My goal is to show why it's always true.

A simply connected domain in the plane is one with the property that any loop in it can be shrunk down to a point. Here's an example of a domain D with an example loop L being shrunk down to a point P:

Here's an example of a domain that's not simply connected. It has a hole in the middle. I've drawn a L loop around the hole. You can't shrink that loop to a point because the hole gets in the way:
Here's a simply connected domain with a vector field on it:
Think of the vectors as being drawn literally in the surface so that if we were to pick up the surface and stretch it like a piece of rubber the vectors would get stretched with it. Remember that a vector field is defined everywhere in the domain so the arrows are just a random sprinkling of examples to show what's going on. For this to be an accurate picture you want to imagine an infinity of arrows, one at every single point of the domain.

Let's put a loop, starting and ending at P, in our simply-connected domain:
Now imagine travelling along the loop, starting at P and ending at P. As you move along there's an arrow at each point in your journey. Here's what the arrows look like as you travel from P to P anti-clockwise, plotted as a kind of graph:
The vectors start off pointing to the right. They swing anti-clockwise by about 45º and then swing back to where they started. As the journey is a loop they clearly must end where they started. A different, really swirly vector field, might have resulted in arrows that that rotated around hundreds of times along your journey. But by time you reach the end of the journey they must swing back to where they started. What's slightly less obvious is that they'd also have to rotate back to cancel out the hundreds of swings. You might think "the vectors could rotate round a hundred times but as long as they make exactly 100 turns they'll return to where they started and there's no need for them to unwind". But actually, every bit of rotation in the journey must be unwound. The total amount of rotation, adding all the positive rotations, and subtracting off the negative rotations, is called the winding number for the loop. We count anti-clockwise rotation as positive and clockwise as negative. So I'm claiming that the winding number for a closed loop in a simply-connected domain is always zero.

(Note: in most books the winding number normally refers to how many times the loop itself winds around a point. I'm using it to refer to how many times the vector winds around itself you follow the loop. To help with your intuition: the hour hand of a working clock normally accumulates a winding number of -2 in one day. If it ran forward for a day, but then ran backwards for half a day, the winding number would be -1.)

Here's why the winding number for simply connected domains must be zero: firstly - it's pretty clear that the winding number for any loop must be an integer. If the winding number was a half, say, the arrow wouldn't end up pointing 180º from where it started which makes no sense for a closed loop. Now the domain is simply connected, so the loop can be shrunk to a point. Now imagine doing the shrinking really slowly and keeping track of the winding number as the loop shrinks. As the loop shrinks, the graph of the vectors along the loop must vary slowly. The total winding number depends continuously on the vectors in the graph so the winding number must vary slowly as the loop shrinks. But the winding number is an integer. It can't change really slowly, it can only change by amounts of a whole integer. So the winding number can't change at all. Every loop in a simply-connected domain must have a winding number that's the same as the winding number of a loop that is just one point ie. zero.

On to the sphere. Here's a sphere with a vector field where all of the vectors point along lines of longitude to the north pole:
(Sorry about my poor quality drawing but I'm sure you know what vectors pointing north look like.)

At this point you may be tempted to say "aha! That's a continuous vector field on the sphere that's non-zero everywhere!" Alas, it's not defined everywhere. It's a vector field everywhere except at the north and south poles. If you're at the north pole, no non-zero vector can point north. And at the south pole every non-zero vector points north with no continuous way to pick just one.

Given any vector field on the Earth we can imagine slicing the earth through the equator and flattening out the surfaces of the northern and souther hemispheres as two separate disks. Here's what you get if you do this with the north vector field (ignoring the problems at the poles for now):

You reconstruct the Earth again by gluing the two disks together according to the orange arrows, and then inflating. Any vector field on the surface of the Earth gives rise to a pair of vector fields on disks like this. But there will be a constraint. The vectors around the boundary of the two disks will match. In fact, vectors at the opposite ends of the orange arrow have to match. But they won't necessarily be equal as drawn in this diagram because the disk for the southern hemisphere corresponds to a view from below.

Suppose we start at the point P and follow a loop eastwards along the equator. That's an anti-clockwise loop round the upper disk and simultaneously a clockwise loop round the lower disk. Here are the graphs:



In the upper map the loop gives rise to winding number one. But in the lower map we get winding number minus one. So here's an important lesson: the winding number makes perfect sense for a flat domain in the plane. But on the surface of 3D objects it depends on how you flatten out your map. In this case, the winding number on the upper map is 2 more than the winding number for the lower map. (Remember, these fields aren't defined at the poles so we haven't contradicted the original theorem that the winding number is zero for any vector field defined in a simply-connected domain.)

But here's the most important thing in this proof: the winding number for the upper hemisphere loop will be two more that the winding number for the lower hemisphere loop, no matter what vector field you have. This is because if you've travelled an angle θ around the equator, the vectors at opposite ends of the orange arrows will differ by an angle of 2θ. For example, once you're 90º around the earth, the north arrow is draw as a down-arrow in the upper graph and as an up-arrow in the lower graph. They're already 180º apart. You can see this is true for north pointing vectors literally by tracing with your fingers around the loops. It's also true for vectors pointing east. I'll leave that as an exercise for you, but here's a picture of some eastward vectors to get you started:
Along the equator, every vector on the surface is a linear combination of north and east vectors. So if it's true for both the north and east vectors then it must be true for all vectors. But if the graph for one picture of the equatorial loop has vectors that are 2θ more than the vectors for another graph, the first one must complete two revolutions more than the second one. So the first has a winding number two more than the second.

If you had a continuous vector field that really was non-zero over the entire sphere, cutting the sphere in half would give a pair of continuous vector fields defined on disks. As disks are simply-connected, the theorem we started with tells us they must both have winding number zero as you loop around them. But we've also just shown that looping round one has winding number two more than looping around the other. This is a contradiction. So there is no continuous vector field that is non-zero everywhere. ∎

If you get stuck above I strongly recommend trying to draw some continuous non-zero vector fields on the sphere, transferring them to disks, and counting winding numbers.

Notice how we've done more than prove the theorem. We now know that if we have a continuous vector field on a sphere we can find out whether to look for its zeros in the northern or southern hemisphere by computing the winding numbers as above. At least one of the two winding numbers must be non-zero and that tells us which hemisphere we can be sure contains a zero. The fact that the two winding numbers differ by two, and not by just one, also tells us a bit about the nature of the zeros. But that's another story. That two is also related to the fact that the Euler characteristic of the sphere is two. It's also related to the Lefschetz index

This proof is based on proofs I studied years ago relating to Chern classes. I recently became interested in Chern classes again because they play an important role in understanding phenomena in solid state physics such as the quantum Hall effect. That argument about slowly shrinking a loop leaving its winding number unchanged tells you a lot about slowly changing certain types of quantum system.

It's possible I completely messed up. Here's an "elementary" proof. It looks much harder than what I did. But I feel like I did faithfully capture, in pictures, an argument that's buried in Lectures on Riemann surfaces. And it seems to correctly reproduce the Lefschetz number of 2.

Blog Archive