There are many elementary introductions to the discrete Fourier transform on the web. But this one is going to be a little different. I hope to motivate and demonstrate an application of the Fourier transform starting from no knowledge of the subject at all. But apart from this sentence, I'll make no mention of complex numbers or trigonometric functions. That may seem weird - the standard definitions seem to make it clear that these concepts are crucial parts of the definition. But I claim that in some ways these definitions miss the point. An analogy from software engineering is appropriate: most definitions of the Fourier transform are a bit like explaining an interface to an API by reference to the implementation. That might tell you how it works at a nuts-and-bolts level, but that can obscure what the API is actually for.
There's code all the way through so if anything I've said is ambiguous, that should help resolve it. I've chosen to write the following in 'literate Octave'. I can't say I love the language but it seems like the easiest way to express what I want here computationally.
Suppose your camera has a hexagonal iris and it's out of focus. When you take a picture of a point light source (towards your lower left) you'll end up with a result like this:
The effect is known as bokeh and the wikipedia page has a nice example with an octagonal iris.
(If you want to use the Octave code in this article, save the image above as hexagon.png.)
Suppose we take a picture of an ordinary scene instead. We can think of every visible point in the scene as a point light source and so the out-of-focus photograph will be the sum of many hexagons, one for each point in the image, with each hexagon's brightness determined by the brightness of the point it comes from. You can also flip this around and think about the out-of-focus image as being a sum of lots of copies of the original image, one of each point in the hexagon. We can go ahead and code this up directly in octave.
Here's the picture that we'll apply bokeh to:
(Background: I used to work for a division of Kodak and that was the test image we frequently used. Save the image as marcie1.png.)
Let's read in our image and iris shape, converting the image to grayscale:
> I = double(imread('marcie1.png'))/256; > I = (I(:,:,1)+I(:,:,2)+I(:,:,3))/3; > J = double(imread('hexagon.png'))/256; > h = size(I)(1); > w = size(I)(2);Now we'll apply the bokeh by looping over the entire iris, accumulating shifted copies of the original image. We're optimising a little bit. As many of the pixels in J are black we can skip over them. I'm using circshift so that we'll get wraparound at the edge. That turns out to be very convenient later.
> total = 0; > K = zeros(h, w); > for i = 1:h > i > for j = 1:w > if J(i, j) != 0 > K = K + J(i, j)*circshift(I, [i, j]); > total = total + J(i, j); > endif > endfor > endforWe use total to scale the overall brightness back into a reasonable range:
> imshow(K/total) > pause(5)Make sure you understand what that code is doing because the rest of this article depends on it. The central line in the double loop is repeatedly adding copies of the original image, shifted by [i, j], and scaled by J(i, j).
There's just one problem with that code. It's incredibly slow. It's only tolerable because I know that most of J is zero so I could optimize it with a conditional. More general higher resolution images will leave you waiting for a long time.
The image we have computed above is known as the convolution of I and J. My goal is to show how we can use the Fourier transform to look at this in a very different way. As a side effect we will also get a much faster convolution algorithm - but the reason it runs faster is a story for another time. In this article just want to show what Fourier transforms are and why they're relevant at all.
Central to the code above is the fact that we're shifting and adding the same image over and over again. If we could avoid that we might find a way to speed the code up. So let me define some shorthand. I'll use R to represent the function that shifts an image right one pixel. In Octave that's given by circshift(I,[0,1]). I'll use U to mean a shift up by one pixel. Rather than define *the* Fourier transform I'm going to define a family of operations, all called Fourier transforms. (More precisely, these are two-dimensional discrete Fourier transforms.)
(1) A Fourier transform is a linear operation that converts an image to another one of the same dimensions. This means that if you apply it to sums and multiples of images you get sums and multiples of the Fourier transforms. In the language of Octave, if F is such an operation, then F(A+B) == F(A)+F(B) and F(s*A) == s*F(A), for s a scalar.
(2) A Fourier transform has this property: there is a pair of images, A and B, such that for any image I, F(U(I)) = AF(I) and F(R(I)) = BF(I). (AF(I) means the pixelwise product of A and F(I)). In Octave notation:
F(circshift(I, [1,0])) == A .* F(I) F(circshift(I, [0,1])) == B .* F(I)Fourier transforms convert shifts to multiplications. (If you only learn one thing from this article, this should be it.)
(3) Fourier transforms are invertible so there must be another linear transform F-1 such that F-1(F(I)) = I and F(F-1(I)) = I.
Anything with these properties is an example of a Fourier transform. It's the second property that is crucially important. In jargon it's said to diagonalise translation.
From (2) it follows that we can compute the Fourier transform of an image shifted by [i, j] by multiplying the original Fourier transform by A.^i .* B.^j. (.^ is the pixelwise power function.)
If h is the image height, then shifting up h times should wrap us around to the beginning again. Similarly for w. So from (2) we know that Ah=1 and Bw=1. (That's shorthand for saying that each of the individual elements of A raised to the power of h give 1 and so on.)
It just so happens that Octave comes supplied with a suitable function that satisfies the three conditions I listed. It's called fft2. Let's find out what the corresponding images A and B are:
> r = rand(h, w); > A = fft2(circshift(r, [1, 0])) ./ fft2(r); > B = fft2(circshift(r, [0, 1])) ./ fft2(r);(Note that ./ is the pixelwise division operator. If you're unlucky your random numbers will lead to a division by zero. Just try again.)
Let's try again for another random image:
> s = rand(h, w); > A0 = fft2(circshift(s, [1, 0])) ./ fft2(s); > B0 = fft2(circshift(s, [0, 1])) ./ fft2(s);We can see that A is almost exactly A0 and B is almost B0:
> abs(A-A0) > abs(B-B0)So now we can go back to our original convolution algorithm. Let's define
> II = fft2(I);And now we can use the first and third properties to compute the Fourier transform of the image with bokeh applied to it. We're applying properties (1) and (2) to the central line of the double loop above:
> KK = zeros(h, w); > for i = 1:h > i > for j = 1:w > if J(i, j) != 0 > KK = KK + J(i, j).*A.^i.*B.^j.*II; > endif > endfor > endforI said that the definition of Fourier transform requires the existence of an inverse, and claimed that fft2 was a Fourier transform. So we must have an inverse. Octave conveniently provides us with the name ifft2:
> K = ifft2(KK/total); > imshow(K) > pause(5)We've eliminated all that shifting, but at the end of the day this code is slower. But did you notice something curious about the innermost line of the double loop? It's always adding the same multiple of II to KK. We can completely factor it out. So we can rewrite the code as:
> JJ = zeros(h, w); > for i = 1:h > i > for j = 1:w > if J(i, j) != 0 > JJ = JJ + J(i, j).*A.^i.*B.^j; > endif > endfor > endforWe can leave the multiplication by II all the way to the end:
> KK = JJ .* II; > K = ifft2(KK/total); > imshow(K) > pause(5)This is pretty cool. We can precompute JJ. Any time we want to convolve an image with the hexagon we apply fft2 to the image, multiply by JJ and then apply ifft2. But there's something even better going on. Let's look more closely at that double loop above involving powers of the elements of A and B. Let's write out the function it computes in standard mathematical notation:
f(J) = Σi,jAi Bj Ji,j
What is f(U(J))?
f(U(J)) = ΣAi Bj Ji-1,j (dy definition of shifting up)
f(U(J)) = ΣAi+1 Bj Ji,j (by summing over i+1 instead of i and using wraparound)
f(U(J)) = A f(J)
f(R(J)) = B f(J)
In other words, f satisfies the second property of Fourier transforms. It obviously satisfies the first property. Our transform f looks like a Fourier transform. In fact, it is, and Octave's fft2 is defined this way. So now we have this procedure for convolving:
> II = fft2(I); > JJ = fft2(J); > KK = II .* JJ; > K = ifft2(KK/total); > imshow(K) > pause(5)We now have a fast convolution algorithm. Of course I've left out an important point: fft2 and ifft2 are fast. They don't use the obvious summation algorithm suggested by my definition of f. But that's an implementation detail. We're able to reason successfully about important properties of fft2 using the properties I listed above.
Let me recapitulate so you can see
1. I defined Fourier transforms
2. I showed how convolution could be rewritten using one
3. I told you that fft2 was a Fourier transform, giving an alternative algorithm for convolution
4. I showed that another part of the convolution algorithm looks like a Fourier transform. (I didn't prove it had an inverse.)
5. I told you that (by amazing coincidence!) this other part is also fft2.
I haven't shown you how to implement a Fourier transform. But I have shown you how you can reason about many of their properties. Enough to get much of the way to the convolution theorem.
Approaching Fourier transforms through the properties I listed is common in the more advanced mathematical literature. But for some reason, in the elementary literature people often choose to describe things in a more complicated way. This is true of many things in mathematics.
I hope the above serves as useful motivation when you come to check out a more standard and more complete introduction to Fourier transforms. In particular, now's a good time to try to understand why the usual definition of the discrete Fourier transform satisfies the properties above and so fill in the steps I missed out.
You may need to set up Octave for graphics. On MacOSX I started the X server and used "export GNUTERM=x11" before running Octave.