Saturday, June 25, 2011

An elementary way to approach Fourier transforms

Why another introduction to the Fourier transform?
There are many elementary introductions to the discrete Fourier transform on the web. But this one is going to be a little different. I hope to motivate and demonstrate an application of the Fourier transform starting from no knowledge of the subject at all. But apart from this sentence, I'll make no mention of complex numbers or trigonometric functions. That may seem weird - the standard definitions seem to make it clear that these concepts are crucial parts of the definition. But I claim that in some ways these definitions miss the point. An analogy from software engineering is appropriate: most definitions of the Fourier transform are a bit like explaining an interface to an API by reference to the implementation. That might tell you how it works at a nuts-and-bolts level, but that can obscure what the API is actually for.

There's code all the way through so if anything I've said is ambiguous, that should help resolve it. I've chosen to write the following in 'literate Octave'. I can't say I love the language but it seems like the easiest way to express what I want here computationally.

Suppose your camera has a hexagonal iris and it's out of focus. When you take a picture of a point light source (towards your lower left) you'll end up with a result like this:

The effect is known as bokeh and the Wikipedia page has a nice example with an octagonal iris.

(If you want to use the Octave code in this article, save the image above as hexagon.png.)

Suppose we take a picture of an ordinary scene instead. We can think of every visible point in the scene as a point light source, and so the out-of-focus photograph will be the sum of many hexagons, one for each point in the image, with each hexagon's brightness determined by the brightness of the point it comes from. You can also flip this around and think of the out-of-focus image as a sum of many copies of the original image, one for each point in the hexagon. We can go ahead and code this up directly in Octave.

Here's the picture that we'll apply bokeh to:

(Background: I used to work for a division of Kodak and that was the test image we frequently used. Save the image as marcie1.png.)

Let's read in our image and iris shape, converting the image to grayscale:

> I = double(imread('marcie1.png'))/256;
> I = (I(:,:,1)+I(:,:,2)+I(:,:,3))/3;
> J = double(imread('hexagon.png'))/256;

> h = size(I)(1);
> w = size(I)(2);

Now we'll apply the bokeh by looping over the entire iris, accumulating shifted copies of the original image. We're optimising a little bit. As many of the pixels in J are black we can skip over them. I'm using circshift so that we'll get wraparound at the edge. That turns out to be very convenient later.

> total = 0;
> K = zeros(h, w);
> for i = 1:h
>     i  % echo the loop counter to show progress
>     for j = 1:w
>         if J(i, j) != 0
>             K = K + J(i, j)*circshift(I, [i, j]);
>             total = total + J(i, j);
>         endif
>     endfor
> endfor

We use total to scale the overall brightness back into a reasonable range:

> imshow(K/total)
> pause(5)

Make sure you understand what that code is doing because the rest of this article depends on it. The central line in the double loop is repeatedly adding copies of the original image, shifted by [i, j], and scaled by J(i, j).
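If the double loop is hard to see at a glance, here is the same shift-and-add idea in one dimension, as a sketch in Python/NumPy (my own toy example, not the article's Octave; np.roll stands in for circshift):

```python
import numpy as np

I = np.array([1.0, 2.0, 3.0, 4.0])   # a tiny 1D "image"
J = np.array([0.5, 0.5, 0.0, 0.0])   # a tiny 1D "iris"

# Accumulate shifted copies of I, each scaled by a pixel of J,
# exactly like the inner line K = K + J(i,j)*circshift(I, [i,j]).
K = np.zeros_like(I)
for j in range(len(J)):
    if J[j] != 0:
        K += J[j] * np.roll(I, j)

print(K)  # prints [2.5 1.5 2.5 3.5]
```

Each output pixel is a mix of the original signal with a wrapped-around shift of it, which is exactly what the hexagon does to the photograph.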

There's just one problem with that code: it's incredibly slow. It's only tolerable because I know that most of J is zero, so I could optimise it with a conditional. Higher-resolution images with more general irises will leave you waiting for a long time.

The image we have computed above is known as the convolution of I and J. My goal is to show how we can use the Fourier transform to look at this in a very different way. As a side effect we will also get a much faster convolution algorithm - but the reason it runs faster is a story for another time. In this article I just want to show what Fourier transforms are and why they're relevant at all.

Fourier transforms
Central to the code above is the fact that we're shifting and adding the same image over and over again. If we could avoid that we might find a way to speed the code up. So let me define some shorthand. I'll use R to represent the function that shifts an image right one pixel. In Octave that's given by circshift(I,[0,1]). I'll use U to mean a shift up by one pixel. Rather than define *the* Fourier transform I'm going to define a family of operations, all called Fourier transforms. (More precisely, these are two-dimensional discrete Fourier transforms.)

(1) A Fourier transform is a linear operation that converts an image to another one of the same dimensions. This means that if you apply it to sums and multiples of images you get sums and multiples of the Fourier transforms. In the language of Octave, if F is such an operation, then F(A+B) == F(A)+F(B) and F(s*A) == s*F(A), for s a scalar.

(2) A Fourier transform has this property: there is a pair of images, A and B, such that for any image I, F(U(I)) = AF(I) and F(R(I)) = BF(I). (AF(I) means the pixelwise product of A and F(I)). In Octave notation:
F(circshift(I, [1,0])) == A .* F(I)
F(circshift(I, [0,1])) == B .* F(I)
Fourier transforms convert shifts to multiplications. (If you only learn one thing from this article, this should be it.)

(3) Fourier transforms are invertible, so there must be another linear transform F^-1 such that F^-1(F(I)) = I and F(F^-1(I)) = I.

Anything with these properties is an example of a Fourier transform. It's the second property that is crucially important. In jargon it's said to diagonalise translation.
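Here's a quick numerical check of properties (1) and (2), sketched in Python with NumPy (np.fft.fft2 standing in for Octave's fft2 and np.roll for circshift; the sizes and seed are arbitrary choices of mine):

```python
import numpy as np

h, w = 8, 12
rng = np.random.default_rng(0)
I = rng.random((h, w))
I2 = rng.random((h, w))
s = 3.7
F = np.fft.fft2

# Property (1): linearity.
assert np.allclose(F(I + I2), F(I) + F(I2))
assert np.allclose(F(s * I), s * F(I))

# Property (2): shifts become pixelwise multiplications.
# Recover A and B by taking ratios of transforms of a random image,
# the same trick the article uses below.
r = rng.random((h, w))
A = F(np.roll(r, 1, axis=0)) / F(r)
B = F(np.roll(r, 1, axis=1)) / F(r)
assert np.allclose(F(np.roll(I, 1, axis=0)), A * F(I))
assert np.allclose(F(np.roll(I, 1, axis=1)), B * F(I))
print("fft2 satisfies properties (1) and (2) on this example")
```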

From (2) it follows that we can compute the Fourier transform of an image shifted by [i, j] by multiplying the original Fourier transform by A.^i .* B.^j. (.^ is the pixelwise power function.)

If h is the image height, then shifting up h times should wrap us around to the beginning again. Similarly for w. So from (2) we know that A^h = 1 and B^w = 1. (That's shorthand for saying that each individual element of A raised to the power of h gives 1, and so on.)
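Both consequences can be checked directly. Here's a sketch in Python/NumPy (again with np.fft.fft2 for fft2 and np.roll for circshift). I use the closed forms A = exp(-2πi·k/h) down the rows and B = exp(-2πi·l/w) across the columns - the standard DFT shift factors, which are an assumption you can verify numerically rather than something derived in this article:

```python
import numpy as np

h, w = 8, 12
rng = np.random.default_rng(1)
I = rng.random((h, w))
F = np.fft.fft2

# Closed forms for fft2's A and B (standard DFT shift factors).
A = np.exp(-2j * np.pi * np.arange(h) / h)[:, None] * np.ones((1, w))
B = np.ones((h, 1)) * np.exp(-2j * np.pi * np.arange(w) / w)[None, :]

# Shifting by [i, j] multiplies the transform by A.^i .* B.^j.
i, j = 3, 5
shifted = np.roll(np.roll(I, i, axis=0), j, axis=1)
assert np.allclose(F(shifted), A**i * B**j * F(I))

# Wrapping around: every element of A^h, and of B^w, is 1.
assert np.allclose(A**h, 1)
assert np.allclose(B**w, 1)
print("A^h == 1 and B^w == 1, pixel by pixel")
```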

It just so happens that Octave comes supplied with a suitable function that satisfies the three conditions I listed. It's called fft2. Let's find out what the corresponding images A and B are:

> r = rand(h, w);
> A = fft2(circshift(r, [1, 0])) ./ fft2(r);
> B = fft2(circshift(r, [0, 1])) ./ fft2(r);

(Note that ./ is the pixelwise division operator. If you're unlucky your random numbers will lead to a division by zero. Just try again.)

Let's try again for another random image:

> s = rand(h, w);
> A0 = fft2(circshift(s, [1, 0])) ./ fft2(s);
> B0 = fft2(circshift(s, [0, 1])) ./ fft2(s);

We can see that A is almost exactly A0 and B is almost B0:

> abs(A-A0)
> abs(B-B0)

So now we can go back to our original convolution algorithm. Let's define

> II = fft2(I);

And now we can use the first and second properties to compute the Fourier transform of the image with bokeh applied to it. We're applying properties (1) and (2) to the central line of the double loop above:

> KK = zeros(h, w);
> for i = 1:h
>     i  % echo the loop counter to show progress
>     for j = 1:w
>         if J(i, j) != 0
>             KK = KK + J(i, j).*A.^i.*B.^j.*II;
>         endif
>     endfor
> endfor

I said that the definition of Fourier transform requires the existence of an inverse, and claimed that fft2 was a Fourier transform. So we must have an inverse. Octave conveniently provides us with the name ifft2:

> K = ifft2(KK/total);
> imshow(K)
> pause(5)

We've eliminated all that shifting, but at the end of the day this code is slower. But did you notice something curious about the innermost line of the double loop? It's always adding a multiple of II to KK, so we can completely factor II out. We can rewrite the code as:

> JJ = zeros(h, w);
> for i = 1:h
>     i  % echo the loop counter to show progress
>     for j = 1:w
>         if J(i, j) != 0
>             JJ = JJ + J(i, j).*A.^i.*B.^j;
>         endif
>     endfor
> endfor

We can leave the multiplication by II all the way to the end:

> KK = JJ .* II;
> K = ifft2(KK/total);
> imshow(K)
> pause(5)

This is pretty cool. We can precompute JJ. Any time we want to convolve an image with the hexagon we apply fft2 to the image, multiply by JJ and then apply ifft2. But there's something even better going on. Let's look more closely at that double loop above involving powers of the elements of A and B. Let's write out the function it computes in standard mathematical notation:

f(J) = Σ_{i,j} A^i B^j J_{i,j}

What is f(U(J))?

f(U(J)) = Σ_{i,j} A^i B^j J_{i-1,j} (by definition of shifting up)
f(U(J)) = Σ_{i,j} A^{i+1} B^j J_{i,j} (by summing over i+1 instead of i and using wraparound)
f(U(J)) = A f(J)

and similarly

f(R(J)) = B f(J)

In other words, f satisfies the second property of Fourier transforms. It obviously satisfies the first property. Our transform f looks like a Fourier transform. In fact, it is, and Octave's fft2 is defined this way. So now we have this procedure for convolving:

> II = fft2(I);
> JJ = fft2(J);
> KK = II .* JJ;
> K = ifft2(KK/total);

> imshow(K)
> pause(5)

We now have a fast convolution algorithm. Of course I've left out an important point: fft2 and ifft2 are fast. They don't use the obvious summation algorithm suggested by my definition of f. But that's an implementation detail. We're able to reason successfully about important properties of fft2 using the properties I listed above.
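To tie everything together, here's a sketch in Python/NumPy (my translation of the Octave, on a deliberately tiny array) that computes f straight from the double-sum definition, checks it against the library FFT, and then checks that the FFT route gives the same answer as the original shift-and-add convolution. The closed forms for A and B are the standard DFT shift factors, assumed rather than derived here:

```python
import numpy as np

h, w = 4, 6
rng = np.random.default_rng(2)
I = rng.random((h, w))
J = rng.random((h, w))
F = np.fft.fft2

# The images A and B for np.fft.fft2 (standard DFT shift factors).
A = np.exp(-2j * np.pi * np.arange(h) / h)[:, None] * np.ones((1, w))
B = np.ones((h, 1)) * np.exp(-2j * np.pi * np.arange(w) / w)[None, :]

# f(J) = sum over i,j of A^i B^j J[i,j] -- the slow definition.
def f(J):
    out = np.zeros((h, w), dtype=complex)
    for i in range(h):
        for j in range(w):
            out += J[i, j] * A**i * B**j
    return out

assert np.allclose(f(J), F(J))  # the naive sum really is fft2

# Shift-and-add convolution, as in the first Octave loop.
K_slow = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        K_slow += J[i, j] * np.roll(np.roll(I, i, axis=0), j, axis=1)

# The fast route: transform, multiply pixelwise, invert.
K_fast = np.fft.ifft2(F(I) * F(J)).real

assert np.allclose(K_slow, K_fast)
print("FFT convolution matches shift-and-add convolution")
```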

Let me recapitulate so you can see where we've been:

1. I defined Fourier transforms
2. I showed how convolution could be rewritten using one
3. I told you that fft2 was a Fourier transform, giving an alternative algorithm for convolution
4. I showed that another part of the convolution algorithm looks like a Fourier transform. (I didn't prove it had an inverse.)
5. I told you that (by amazing coincidence!) this other part is also fft2.

I haven't shown you how to implement a Fourier transform. But I have shown you how you can reason about many of their properties. Enough to get much of the way to the convolution theorem.

Approaching Fourier transforms through the properties I listed is common in the more advanced mathematical literature. But for some reason, in the elementary literature people often choose to describe things in a more complicated way. This is true of many things in mathematics.

I hope the above serves as useful motivation when you come to check out a more standard and more complete introduction to Fourier transforms. In particular, now's a good time to try to understand why the usual definition of the discrete Fourier transform satisfies the properties above and so fill in the steps I missed out.

You may need to set up Octave for graphics. On Mac OS X I started the X server and used "export GNUTERM=x11" before running Octave.


  1. So what does the test image look like with bokeh applied?

  2. @Anonymous: The post is written in "literate Octave", which means you're supposed to have Octave/Matlab and actually run the code as you read. You'll see the image if you do.

  3. Something seems fishy, here.

    The convolution theorem is that the transform of a convolution is the (pointwise) product of the transforms.

    I think a shift is convolution of image A with a special image B that has all ones in a single row (or column).

    Apply convolution theorem: transform of a shift is Transform(A) .* Transform(B that has all ones in a single row [or column]). If that's a constant .* Transform(A) (as in your axiom #2) then Transform(B that has all ones in a single row [or column]) should be a constant, but it's not.

    Where am I going wrong?

  4. I got really excited for "an elementary approach to Fourier transforms". And then I got really confused, really quickly.

    Seriously... what?

    Can someone please explain this explanation?

  5. @A Breaking Change: The transform of a constant is obviously a constant. Why do you say "but it's not"? What definition of "constant" are you using?

    ("Constant" here is the opposite of "variable". It doesn't mean a scalar, nor has sigfpe claimed that it's the same constant for every shift.)

  6. By a "constant," here, I mean an array that has the same value in all its slots. The transform of an array that has all ones in a single row or a single column and zeros elsewhere is not, in general, a constant. For example, in 1 dimension, Fourier of {1,0,0,0} is {.5,.5,.5,.5}, a constant, but Fourier of {0,1,0,0} is {.5,.5i,-.5,-.5i}, not a constant. So I'm confused about the transform of a shift, which is convolution with {0,1,0,0} or {0,0,1,0}, etc., being the product of a constant with the transform of the original image.

  7. I'm confused by the use of the word "constant" in this discussion. I didn't use the word constant and at no point did I talk about images that have the same value for each pixel.

    BTW A shift is convolution with an image that has *one* non-zero pixel, not an image with a row of non-zero pixels. The two coordinates of the pixel give the amount of shift along the two different axes.

    A and B aren't constant images in the sense you describe. They're just images. You can see what they are because I give code to compute them (in two different ways). You use Octave to examine them and see they don't have the same value for each pixel. Given any choice of Fourier transform (eg. Octave's fft2 function) there is a pair of images, A and B, that have the property I give.

    Once I figure out what the confusion is here I'll try to go back and edit the original text to try to avoid it in future.

  8. "Approaching Fourier transforms through the properties I listed is common in the more advanced mathematical literature." Could you provide some references? Any favorite textbook using this kind of approach?

  9. @m I sold all my books on Fourier transforms a while ago so I'm not sure what books to recommend.

    Many of the elementary books tend to harp on about how the Fourier transform decomposes a signal into frequencies. But this gives no intuition for why the important theorems hold. But a web search on "fourier transform diagonalizes translation" will give you lots of useful hits.

    From a pure mathematical perspective, the DFT is a special (in fact, easy) case of representation theory for finite groups:

  10. Strictly speaking: a "constant" signal (a cosine, and many other signals) does not have a Fourier transform, because it is not a finite-energy signal. But we can work around this by "taking the Fourier transform to the limit". In doing so, the Fourier transform (in the limit) of a "constant" signal is a "Dirac delta".