Saturday, January 21, 2012

Some parallels between classical and quantum mechanics

Introduction
This isn't really a blog post. It's more of something I wanted to interject in a discussion on Google+ but that wouldn't fit in the text box.

I've always had trouble with the way the Legendre transform is introduced in classical mechanics. I know I'm not the only one. Many mathematicians and physicists have recognised that it seems to be plucked out of a hat like a rabbit and have even written papers to address this issue. But however much an author attempts to make it seem natural, it still looks like a rabbit to me.

So I have to ask myself, what would make me feel comfortable with the Legendre transform?

The Legendre transform is an analogue of the Fourier transform that uses a different semiring from the usual one. I wrote briefly about this many years ago. So if we could write classical mechanics in a form that is analogous to another problem where I'd use a Fourier transform, I'd be happier. This is my attempt to do that.

When I wrote about Fourier transforms a little while back the intention was to immediately follow it with an analogous article about Legendre transforms. Unfortunately that's been postponed so I'm going to just assume you know that Legendre transforms can be used to compute inf-convolutions. I'll state clearly what that means below, but I won't show any detail on the analogy with Fourier transforms.

Free classical particles
Let's work in one dimension with a particle of mass \(m\), moving in a potential \(V(x)\), whose position at time \(t\) is \(x(t)\). The kinetic energy of this particle is given by \(T=\frac{1}{2}m\dot{x}^2\), so its Lagrangian is \(L=\frac{1}{2}m\dot{x}^2-V(x)\).

The action of our particle for the time from \(t_0\) to \(t_1\) is therefore
\(\int_{t_0}^{t_1}(\frac{1}{2}m\dot{x}^2-V(x))dt\)

The particle motion is that which minimises the action.

Suppose the position of the particle at time \(t_0\) is \(x_0\) and the position at time \(t_1\) is \(x_1\). Then write \(\psi(t_0,x_0;t_1,x_1)\) for the minimum action over paths from \(x_0\) to \(x_1\). So
\(\psi(t_0,x_0;t_1,x_1) = \min_x\int_{t_0}^{t_1}(\frac{1}{2}m\dot{x}^2-V(x))dt\)
where we're minimising over all paths \(x\) such that \(x(t_i)=x_i\).

Now suppose our system evolves from time \(t_0\) to \(t_2\). We can consider this to be two stages, one from \(t_0\) to \(t_1\) followed by one from \(t_1\) to \(t_2\). The action from \(t_0\) to \(t_2\) is the sum of the actions for the two subperiods, so the minimum total action for the period \(t_0\) to \(t_2\) is given by
\(\psi(t_0,x_0;t_2,x_2) = \min_{x_1}(\psi(t_0,x_0;t_1,x_1)+\psi(t_1,x_1;t_2,x_2))\)

Let me simplify that a little. I'll use \(\psi(t,x)\) where I previously used \(\psi(t_0,x_0;t,x)\), and \(\phi(x_1,x_2)\) for \(\psi(t_1,x_1;t_2,x_2)\). So that last equation becomes:
\(\psi(t_2,x_2)=\min_{x_1}(\psi(t_1,x_1)+\phi(x_1,x_2))\)

Now suppose \(\phi\) is translation-invariant in the sense that \(\phi(x+s,x'+s)=\phi(x,x')\) for all \(s\). Then, abusing notation slightly, we can write \(\phi(x_1,x_2)=\phi(x_2-x_1)\), and the minimum total action is given by
\(\psi(t_2,x_2)=\min_{x_1}(\psi(t_1,x_1)+\phi(x_2-x_1))\)

Infimal convolution is defined by
\((f\odot g)(x) = \min_s(f(s)+g(x-s))\)
so the minimum we seek is
\(\psi(t_2,x_2) = (\psi(t_1,\cdot)\odot\phi)(x_2)\)
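To make this concrete, here's a quick sketch (mine, not from the original discussion) of a brute-force inf-convolution on a grid, stepping an action function forward with the quadratic free-particle kernel derived below; \(m=1\) and a time step of \(0.1\) are just assumed values:

```python
import numpy as np

# Brute-force infimal convolution of grid-sampled f with a callable g:
#   (f ⊙ g)(x_i) = min_j ( f[j] + g(x_i - x_j) )
def inf_conv(f, g, x):
    return np.array([min(f[j] + g(xi - x[j]) for j in range(len(x)))
                     for xi in x])

# One time step of the action's evolution with the free-particle kernel
# phi(s) = m s^2 / (2 dt) derived below; m = 1 and dt = 0.1 are assumed.
m, dt = 1.0, 0.1
phi = lambda s: m * s**2 / (2 * dt)
x = np.linspace(-5.0, 5.0, 201)
psi = np.abs(x)                    # some initial action function psi(t1, .)
psi = inf_conv(psi, phi, x)        # psi(t2, .) one time step later
```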

So now it's natural to use the Legendre transform. We have the inf-convolution theorem:
\((f\odot g)^\ast=f^\ast+g^\ast\)
where \(f^\ast\) is the Legendre transform of \(f\) given by
\(f^\ast(p) = \sup_x(px-f(x))\)
and so \(\psi^\ast(t_2,p) = \psi^\ast(t_1,p)+\phi^\ast(p)\) (where \(\ast\) denotes the Legendre transform with respect to the spatial variable).
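Here's a small numerical sanity check of that theorem (again my own sketch, with made-up quadratics): computing the Legendre transform as a discrete sup, inf-convolution really does turn into addition, up to discretisation error:

```python
import numpy as np

# Discrete Legendre transform f*(p) = sup_x (p x - f(x)), as a max over a grid.
def legendre(f, x, p):
    return np.array([np.max(pi * x - f) for pi in p])

def inf_conv(f, g, x):             # brute force, as in the earlier sketch
    return np.array([min(f[j] + g(xi - x[j]) for j in range(len(x)))
                     for xi in x])

x = np.linspace(-20.0, 20.0, 401)
p = np.linspace(-2.0, 2.0, 81)     # keep the optima well inside the grid
f = x**2
g = lambda s: 2 * s**2
lhs = legendre(inf_conv(f, g, x), x, p)
rhs = legendre(f, x, p) + legendre(g(x), x, p)
print(np.max(np.abs(lhs - rhs)))   # small: only discretisation error remains
```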

Let's consider the case where from \(t_1\) onwards the particle motion is free, so \(V=0\). In this case we clearly have translation-invariance and so the time evolution is given by repeated inf-convolution with \(\phi\) and in the "Legendre domain" this is nothing other than repeated addition of \(\phi^\ast\).

Let's take a look at \(\phi\). We know that if a particle travels freely from \(x_1\) to \(x_2\) over the period from \(t_1\) to \(t_2\) then it must have followed the minimum action path and we know, from basic mechanics, this is the path with constant velocity. So
\(T = \frac{1}{2}m(x_2-x_1)^2/(t_2-t_1)^2\)
and hence, multiplying \(T\) by the elapsed time \(t_2-t_1\), the action is given by
\(\phi(s) = \frac{1}{2}ms^2/(t_2-t_1)\)
where \(s=x_2-x_1\).
So the time evolution of \(\psi\) is given by repeated inf-convolution with a quadratic function. The time evolution of \(\psi^\ast\) is therefore given by repeated addition of the Legendre transform of a quadratic function. It's not hard to prove that the Legendre transform of a quadratic function is also quadratic: for \(f(x)=ax^2\) with \(a>0\), the supremum in \(f^\ast(p)=\sup_x(px-ax^2)\) is attained at \(x=p/2a\), giving \(f^\ast(p)=p^2/4a\). Here \(a=m/2(t_2-t_1)\), so
\(\phi^\ast(p) = \frac{(t_2-t_1)p^2}{2m}\)
which is just the free-particle Hamiltonian \(p^2/2m\) multiplied by the elapsed time.
Addition is easier to work with than inf-convolution so if we wish to understand the time evolution of the action function it's natural to work with this Legendre transformed function.

So that's it for classical mechanics in this post. I've tried to look at the evolution of a classical system in a way that makes the Legendre transform natural.

Free quantum particles
Now I want to take a look at the evolution of a free quantum particle to show how similar it is to what I wrote above. In this case we have the Schrödinger equation
\(i\hbar\frac{\partial}{\partial t}\psi(t,x) = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\psi(t,x)+V(x)\psi(t,x)\)
Let's suppose that from time \(t_1\) onwards the particle is free so \(V=0\). Then we have
\(i\hbar\frac{\partial}{\partial t}\psi(t,x) = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\psi(t,x)\)
Now let's take the Fourier transform in the spatial variable. We get:
\(i\hbar\frac{\partial}{\partial t}\hat{\psi}(t,k) = -\frac{\hbar^2}{2m}(ik)^2\hat{\psi}(t,k)\)
So
\(\hat{\psi}(t,k) = \exp(-\frac{i\hbar k^2(t-t_1)}{2m})\hat{\psi}(t_1,k)\)
We can write this as
\(\hat{\psi}(t,k) = \hat{\phi}(k)\hat{\psi}(t_1,k)\)
where \(\hat{\phi}(k)=\exp(-\frac{i\hbar k^2(t-t_1)}{2m})\), whose inverse Fourier transform is, up to a normalising constant,
\(\phi(x) \propto \exp(\frac{imx^2}{2\hbar(t-t_1)})\)
Note that the exponent is \(i/\hbar\) times the classical action \(\frac{1}{2}mx^2/(t-t_1)\) we computed above.
So the time evolution of the free quantum particle is given by repeated convolution with a (complex) Gaussian, which in the Fourier domain is repeated multiplication by a Gaussian. The classical section above is nothing but a tropical version of this one: min replaces integration, addition replaces multiplication, and the Legendre transform replaces the Fourier transform.
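As a sketch of that evolution (mine, with \(\hbar=m=1\) and an arbitrary initial packet, all assumed), each time step is just a pointwise multiplication in the Fourier domain:

```python
import numpy as np

# One free-particle time step in the Fourier domain (hbar = m = 1 assumed):
# multiply psi-hat by exp(-i hbar k^2 dt / 2m) and transform back.
hbar, m, dt = 1.0, 1.0, 0.01
N, L = 1024, 40.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L/N)   # angular wavenumbers

psi = np.exp(-(x + 5.0)**2 + 2j * x)       # a Gaussian packet moving right
propagator = np.exp(-1j * hbar * k**2 * dt / (2 * m))
for _ in range(500):
    psi = np.fft.ifft(propagator * np.fft.fft(psi))
```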

Conclusion
I doubt I've said anything original here. Classical mechanics is well known to be the limit of quantum mechanics as \(\hbar\rightarrow 0\), and it's well known that in this limit occurrences of the semiring \((\mathbb{R},+,\times)\) are replaced by the tropical semiring \((\mathbb{R}\cup\{\infty\},\min,+)\). But I've never seen an article that describes classical mechanics in terms of repeated inf-convolution, even though this is close to the Hamilton-Jacobi formulation, and I've never seen an article that draws the parallel with the Schrödinger equation in this way. I'm hoping someone will now be able to say "I've seen that before" and post a relevant link below.

Note
I'm not sure how the above applies for a non-trivial potential \(V\). I wrote this little Schrödinger equation solver a while back. As might be expected, it's inconvenient to use the Fourier domain to deal with the part of the evolution due to \(V\). To simulate a time step of \(dt\), the code simulates \(dt/2\) of free evolution in the Fourier domain and then \(dt/2\) of the \(V\)-dependent evolution in the spatial domain; this is the well-known split-step method. So even in the presence of a non-trivial \(V\) it can still be useful to work with a Fourier transform. Almost the same iteration could be used to numerically compute the action for the classical case.
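For what it's worth, here's a generic sketch of such a scheme (not the solver mentioned above; it's standard Strang splitting with \(\hbar=m=1\) and an assumed harmonic potential):

```python
import numpy as np

# Split-step (Strang) integration: half a step of V in the spatial domain,
# a full kinetic step in the Fourier domain, another half step of V.
# hbar = m = 1 and a harmonic potential are assumed for illustration.
hbar, m, dt = 1.0, 1.0, 0.005
N, L = 1024, 40.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L/N)
V = 0.5 * x**2

half_V = np.exp(-1j * V * dt / (2 * hbar))
kinetic = np.exp(-1j * hbar * k**2 * dt / (2 * m))
psi = np.exp(-(x + 5.0)**2)                # initial Gaussian packet
for _ in range(2000):
    psi = half_V * psi
    psi = np.fft.ifft(kinetic * np.fft.fft(psi))
    psi = half_V * psi
```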

Saturday, January 07, 2012

Lossless decompression and the generation of random samples

Let S be some finite set with a probability distribution on it. Here's a diagram showing some example probabilities for the set S={A, B, C, D, E}:
How can we generate lots of random samples from this distribution? A popular method involves first storing the cumulative distribution function in a table like so:

We then use any of a number of popular methods to generate uniform pseudorandom numbers in the range [0,1) and for each one walk through the table until we find the first entry greater than our pseudorandom number. The symbol above the number is the one we generate. So if we generated 0.512, we'd pick symbol C. It's straightforward to prove this gives the correct probability distribution.
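Here's the method in code. The exact numbers from the diagram aren't reproduced above, so the probabilities below are assumptions chosen to be consistent with the hints in the text (D has probability 0.25, C is three times as likely as A, and 0.512 falls in C's slot):

```python
import random

# The table method: store the cumulative distribution, then scan for the
# first entry greater than a uniform pseudorandom number.
pdf = [('A', 0.1), ('B', 0.2), ('C', 0.3), ('D', 0.25), ('E', 0.15)]

symbols = [s for s, _ in pdf]
cdf, total = [], 0.0
for _, prob in pdf:
    total += prob
    cdf.append(total)              # [0.1, 0.3, 0.6, 0.85, 1.0]

def sample():
    u = random.random()            # uniform in [0, 1)
    for s, c in zip(symbols, cdf):
        if u < c:                  # first entry greater than u
            return s
```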

As described this algorithm can be slow. If the size of the table is N we may have to walk through up to N entries to generate each sample.

One approach to accelerating this algorithm is to quantise our pseudorandom number and use it to look up, in a precomputed table, a jump into our cumulative distribution table. I've used this several times in my visual effects career. But today I'm going to take a different approach.

Another natural way to speed up sample generation is to use binary search to find the appropriate point in the cumulative distribution; for example, the C++ standard library's upper_bound function will do the job.
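In Python the same idea looks like this, with bisect_right playing the role of upper_bound and the same assumed table as before:

```python
import bisect
import random

# Binary search over the cumulative table; bisect_right returns the index
# of the first entry strictly greater than u, just like upper_bound.
symbols = ['A', 'B', 'C', 'D', 'E']
cdf = [0.1, 0.3, 0.6, 0.85, 1.0]

def sample():
    u = random.random()
    return symbols[bisect.bisect_right(cdf, u)]
```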


But what is a binary search algorithm going to do? Typically it starts by comparing our pseudorandom number with the midpoint entry of the table. If our number is smaller it recursively searches the left half (looking next at the midpoint of that half), otherwise the right half, and so on. If we're generating many samples from the same distribution we're going to be repeatedly looking up the same midpoints in the table. At the end of the day, the process can be described by a decision tree. So we may as well throw away the table and build the decision tree up front. Here's what it might look like:


But maybe we can do better. For example C is three times as likely as A but they both take the same amount of time to generate as they both require walking down to depth 2. Meanwhile D has a probability of 0.25 and requires walking to depth 3. We'd like to rebalance the tree. Note also that there's no reason to list our original PDF in the order I gave. Different orderings might give different trees.


It's straightforward to describe what we want to optimise. We want to place symbol \(i\) at depth \(d_i\) so that the expected depth is as small as possible. In other words we want to minimise \(\sum_i p_id_i\). But this is precisely the problem solved by Huffman coding.
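Here's a sketch (my code, same assumed probabilities as above): build the Huffman tree with a heap, then turn it into a comparison-based decision tree by giving each leaf an interval of [0,1) whose width is its probability:

```python
import heapq
import itertools
import random

pdf = {'A': 0.1, 'B': 0.2, 'C': 0.3, 'D': 0.25, 'E': 0.15}  # assumed numbers

# Huffman's algorithm: repeatedly merge the two least probable nodes.
count = itertools.count()          # tie-breaker so heap tuples always compare
heap = [(p, next(count), sym) for sym, p in pdf.items()]
heapq.heapify(heap)
while len(heap) > 1:
    p0, _, a = heapq.heappop(heap)
    p1, _, b = heapq.heappop(heap)
    heapq.heappush(heap, (p0 + p1, next(count), (a, b)))
tree = heap[0][2]                  # leaves are symbols, internal nodes pairs

# Give each leaf an interval of [0,1) of width pdf[leaf], in leaf order;
# each internal node stores the boundary between its two subtrees.
def build(node, lo=0.0):
    if not isinstance(node, tuple):
        return node, lo + pdf[node]
    left, mid = build(node[0], lo)
    right, hi = build(node[1], mid)
    return (mid, left, right), hi

decision_tree, _ = build(tree)

def sample():
    u, node = random.random(), decision_tree
    while isinstance(node, tuple):     # node is (threshold, left, right)
        node = node[1] if u < node[0] else node[2]
    return node
```

Comparing u against stored thresholds keeps the sampling exact; the Huffman construction only determines the shape of the tree.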


And that's the conclusion I wanted to reach: Huffman coding gives the optimal decision tree for generating random samples. It also tells us this interesting consequence: with a decision tree the expected number of steps per sample is bounded below by the entropy of the probability distribution (and the Huffman tree comes within one bit of that bound). I find this connection between entropy and algorithmic complexity pretty surprising.


I learnt the above during my interview at Google!


Why is there this connection between a compression algorithm and the generation of random samples? It took me a little longer to realise why, but it's quite straightforward.


Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.

So when we're decompressing Huffman encoded data we have a machine that takes as input uniformly distributed bits and which outputs letters sampled from some probability distribution. But that's exactly the same problem that I posed above. So, at least from some perspective, decompression is precisely the same thing as generating samples from a probability distribution.

Or to put it another way: there is a class of algorithms whose purpose is to convert samples from one probability distribution into samples from another. When we use one of these algorithms to convert samples from some distribution P that's easy to sample from into samples from Q, we call it "an algorithm to sample from distribution Q". When we use one to convert from some distribution to the uniform distribution, and the function it computes is invertible, we call it "lossless compression".
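To see the correspondence directly, here's a small experiment (the prefix code is one the Huffman construction above could produce for the assumed probabilities): drive a decoder with fair coin flips and look at the output frequencies:

```python
import random
from collections import Counter

# A prefix code with the codeword lengths Huffman assigns to the assumed
# probabilities (0.3, 0.25, 0.2 get 2 bits; 0.15, 0.1 get 3 bits).
decode = {'00': 'C', '01': 'D', '10': 'B', '110': 'E', '111': 'A'}

def decompress_one(bits):
    word = ''
    while word not in decode:      # extend until we hit a codeword
        word += next(bits)
    return decode[word]

fair_bits = iter(lambda: str(random.getrandbits(1)), None)   # endless bits
print(Counter(decompress_one(fair_bits) for _ in range(100000)))
# Frequencies come out near 1/4, 1/4, 1/4, 1/8, 1/8: the dyadic distribution
# implicit in the code.  It matches the original distribution exactly only
# when the original probabilities are themselves powers of 1/2.
```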
