Saturday, June 25, 2011

An elementary way to approach Fourier transforms

Why another introduction to the Fourier transform?
There are many elementary introductions to the discrete Fourier transform on the web. But this one is going to be a little different. I hope to motivate and demonstrate an application of the Fourier transform starting from no knowledge of the subject at all. But apart from this sentence, I'll make no mention of complex numbers or trigonometric functions. That may seem weird - the standard definitions make these concepts look like crucial parts of the subject. But I claim that in some ways those definitions miss the point. An analogy from software engineering is appropriate: most definitions of the Fourier transform are a bit like explaining the interface to an API by reference to its implementation. That might tell you how it works at a nuts-and-bolts level, but it can obscure what the API is actually for.

There's code all the way through so if anything I've said is ambiguous, that should help resolve it. I've chosen to write the following in 'literate Octave'. I can't say I love the language but it seems like the easiest way to express what I want here computationally.

Convolution
Suppose your camera has a hexagonal iris and it's out of focus. When you take a picture of a point light source (towards your lower left) you'll end up with a result like this:


The effect is known as bokeh and the Wikipedia page has a nice example with an octagonal iris.

(If you want to use the Octave code in this article, save the image above as hexagon.png.)

Suppose we take a picture of an ordinary scene instead. We can think of every visible point in the scene as a point light source, so the out-of-focus photograph will be the sum of many hexagons, one for each point in the image, with each hexagon's brightness determined by the brightness of the point it comes from. You can also flip this around and think of the out-of-focus image as a sum of lots of copies of the original image, one for each point in the hexagon. We can go ahead and code this up directly in Octave.

Here's the picture that we'll apply bokeh to:


(Background: I used to work for a division of Kodak and that was the test image we frequently used. Save the image as marcie1.png.)

Let's read in our image and iris shape, converting the image to grayscale:

> I = double(imread('marcie1.png'))/256;   % read the image and scale pixel values to [0, 1)
> I = (I(:,:,1)+I(:,:,2)+I(:,:,3))/3;      % average the RGB channels to get grayscale
> J = double(imread('hexagon.png'))/256;   % the iris shape


> h = size(I)(1);
> w = size(I)(2);

Now we'll apply the bokeh by looping over the entire iris, accumulating shifted copies of the original image. We're optimising a little: since many of the pixels in J are black, we can skip over them. I'm using circshift so that we get wraparound at the edges. That turns out to be very convenient later.

> total = 0;
> K = zeros(h, w);
> for i = 1:h
>     i                % printing i gives a crude progress indicator
>     for j = 1:w
>         if J(i, j) != 0
>             K = K + J(i, j)*circshift(I, [i, j]);
>             total = total + J(i, j);
>         endif
>     endfor
> endfor

We use total to scale the overall brightness back into a reasonable range:

> imshow(K/total)
> pause(5)

Make sure you understand what that code is doing because the rest of this article depends on it. The central line in the double loop is repeatedly adding copies of the original image, shifted by [i, j], and scaled by J(i, j).
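If the wraparound behaviour of circshift is unfamiliar, here's a minimal sketch (the 3×3 matrix is just an illustration, not part of the bokeh code):

> M = [1 2 3; 4 5 6; 7 8 9];
> circshift(M, [1, 0])   % shift by one along the first dimension: the row pushed off one edge wraps to the other
> circshift(M, [0, 1])   % the same along the second dimension: the last column wraps around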

There's just one problem with that code: it's incredibly slow. It's only tolerable because most of J is zero, so the conditional lets us skip the black pixels. Each shifted accumulation costs on the order of h*w operations and there can be up to h*w of them, so the cost grows like (hw)². More general, higher-resolution images will leave you waiting for a long time.

The image we have computed above is known as the convolution of I and J. My goal is to show how we can use the Fourier transform to look at this in a very different way. As a side effect we will also get a much faster convolution algorithm - but the reason it runs faster is a story for another time. In this article I just want to show what Fourier transforms are and why they're relevant at all.

Fourier transforms
Central to the code above is the fact that we're shifting and adding the same image over and over again. If we could avoid that we might find a way to speed the code up. So let me define some shorthand. I'll use R to represent the function that shifts an image right one pixel. In Octave that's given by circshift(I,[0,1]). I'll use U to mean a shift up by one pixel, circshift(I,[1,0]) in Octave. Rather than define *the* Fourier transform I'm going to define a family of operations, all called Fourier transforms. (More precisely, these are two-dimensional discrete Fourier transforms.)

(1) A Fourier transform is a linear operation that converts an image to another one of the same dimensions. This means that if you apply it to sums and multiples of images you get sums and multiples of the Fourier transforms. In the language of Octave, if F is such an operation, then F(A+B) == F(A)+F(B) and F(s*A) == s*F(A), for s a scalar.

(2) A Fourier transform has this property: there is a pair of images, A and B, such that for any image I, F(U(I)) = AF(I) and F(R(I)) = BF(I). (AF(I) means the pixelwise product of A and F(I)). In Octave notation:
F(circshift(I, [1,0])) == A .* F(I)
F(circshift(I, [0,1])) == B .* F(I)
Fourier transforms convert shifts to multiplications. (If you only learn one thing from this article, this should be it.)

(3) Fourier transforms are invertible, so there must be another linear transform F⁻¹ such that F⁻¹(F(I)) = I and F(F⁻¹(I)) = I.

Anything with these properties is an example of a Fourier transform. It's the second property that is crucially important. In jargon it's said to diagonalise translation.

From (2) it follows that we can compute the Fourier transform of an image shifted by [i, j] by multiplying the original Fourier transform by A.^i .* B.^j. (.^ is the pixelwise power function.)

If h is the image height, then shifting up h times should wrap us around to the beginning again. Similarly for w. So from (2) we know that A^h = 1 and B^w = 1. (That's shorthand for saying that each of the individual elements of A raised to the power of h gives 1, and so on.) We'll verify this numerically in a moment.

It just so happens that Octave comes supplied with a suitable function that satisfies the three conditions I listed. It's called fft2. Let's find out what the corresponding images A and B are:

> r = rand(h, w);
> A = fft2(circshift(r, [1, 0])) ./ fft2(r);
> B = fft2(circshift(r, [0, 1])) ./ fft2(r);

(Note that ./ is the pixelwise division operator. If you're unlucky your random numbers will lead to a division by zero. Just try again.)

Let's try again for another random image:

> s = rand(h, w);
> A0 = fft2(circshift(s, [1, 0])) ./ fft2(s);
> B0 = fft2(circshift(s, [0, 1])) ./ fft2(s);

We can see that A is almost exactly A0 and B almost exactly B0 (the largest absolute differences are tiny):

> max(max(abs(A-A0)))
> max(max(abs(B-B0)))
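In fact we can check all of the properties numerically (a quick sanity check rather than a proof; every one of these maximum differences should be tiny):

> s2 = rand(h, w);
> max(max(abs(fft2(r+s2) - (fft2(r)+fft2(s2)))))  % property (1): F(A+B) == F(A)+F(B)
> max(max(abs(fft2(3*r) - 3*fft2(r))))            % property (1): F(s*A) == s*F(A)
> max(max(abs(ifft2(fft2(r)) - r)))               % property (3): invertibility
> max(max(abs(fft2(circshift(r, [2, 3])) - A.^2.*B.^3.*fft2(r))))  % shifting by [2, 3] multiplies by A.^2 .* B.^3
> max(max(abs(A.^h - 1)))                         % A.^h should be all ones
> max(max(abs(B.^w - 1)))                         % and so should B.^w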

So now we can go back to our original convolution algorithm. Let's define

> II = fft2(I);

And now we can compute the Fourier transform of the image with bokeh applied to it. We're applying properties (1) and (2) to the central line of the double loop above:

> KK = zeros(h, w);
> for i = 1:h
>     i
>     for j = 1:w
>         if J(i, j) != 0
>             KK = KK + J(i, j).*A.^i.*B.^j.*II;
>         endif
>     endfor
> endfor

I said that the definition of Fourier transform requires the existence of an inverse, and claimed that fft2 was a Fourier transform. So we must have an inverse. Octave conveniently provides us with the name ifft2:

> K = ifft2(KK/total);
> imshow(K)
> pause(5)

We've eliminated all that shifting, but at the end of the day this code is slower. But did you notice something curious about the innermost line of the double loop? Every term it adds to KK is a pixelwise multiple of the same image II. So we can factor II out completely and rewrite the code as:

> JJ = zeros(h, w);
> for i = 1:h
>     i
>     for j = 1:w
>         if J(i, j) != 0
>             JJ = JJ + J(i, j).*A.^i.*B.^j;
>         endif
>     endfor
> endfor

We can postpone the multiplication by II until the very end:

> KK = JJ .* II;
> K = ifft2(KK/total);
> imshow(K)
> pause(5)

This is pretty cool. We can precompute JJ. Any time we want to convolve an image with the hexagon we apply fft2 to the image, multiply by JJ and then apply ifft2. But there's something even better going on. Let's look more closely at that double loop above involving powers of the elements of A and B. Let's write out the function it computes in standard mathematical notation:

f(J) = Σ_{i,j} A^i B^j J_{i,j}

What is f(U(J))?

f(U(J)) = Σ_{i,j} A^i B^j J_{i-1,j} (by definition of shifting up)
f(U(J)) = Σ_{i,j} A^{i+1} B^j J_{i,j} (by summing over i+1 instead of i and using wraparound)
f(U(J)) = A f(J)

Similarly,

f(R(J)) = B f(J)

In other words, f satisfies the second property of Fourier transforms. It obviously satisfies the first property. Our transform f looks like a Fourier transform. In fact, it is, and Octave's fft2 is defined this way. So now we have this procedure for convolving:

> II = fft2(I);
> JJ = fft2(J);
> KK = II .* JJ;
> K = ifft2(KK/total);

> imshow(K)
> pause(5)

We now have a fast convolution algorithm. Of course I've left out an important point: fft2 and ifft2 are fast. They don't use the obvious summation algorithm suggested by my definition of f. But that's an implementation detail. We're able to reason successfully about important properties of fft2 using the properties I listed above.
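If you want to see the speed difference for yourself, wrap the two versions in tic and toc (a rough sketch; the numbers depend on your machine, and the slow version's time also depends on how many pixels of J are nonzero):

> tic
> K = ifft2(fft2(I) .* fft2(J) / total);
> toc   % compare this with a tic/toc around the original double loop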

Conclusion
Let me recapitulate so you can see the shape of the argument:

1. I defined Fourier transforms
2. I showed how convolution could be rewritten using one
3. I told you that fft2 was a Fourier transform, giving an alternative algorithm for convolution
4. I showed that another part of the convolution algorithm looks like a Fourier transform. (I didn't prove it had an inverse.)
5. I told you that (by amazing coincidence!) this other part is also fft2.

I haven't shown you how to implement a Fourier transform. But I have shown you how you can reason about many of its properties - enough to get much of the way to the convolution theorem.

Approaching Fourier transforms through the properties I listed is common in the more advanced mathematical literature. But for some reason, in the elementary literature people often choose to describe things in a more complicated way. This is true of many things in mathematics.

I hope the above serves as useful motivation when you come to check out a more standard and more complete introduction to Fourier transforms. In particular, now's a good time to try to understand why the usual definition of the discrete Fourier transform satisfies the properties above and so fill in the steps I missed out.

Note
You may need to set up Octave for graphics. On Mac OS X I started the X server and used "export GNUTERM=x11" before running Octave.

Saturday, June 04, 2011

Simulating visual artifacts with Fourier optics

The problem

One of the fun things about working in the visual effects industry is the large number of disciplines involved, especially when working for a smaller company where everyone has to do everything. At one point many years ago we did some work on simulating camera artifacts including diffraction spikes. Although the Wikipedia page is in the context of astronomy you can get diffraction artifacts with any kind of camera, and you'll get spikes whenever the iris has a polygonal shape. I hope to sketch why later.

I noticed an intriguing question about a diffraction pattern on Quora. (For those not following the link: this photo is a picture of a reflection of the photographer and camera in an LCD screen.) My immediate thought was "how could I have faked this in a render?".



My main interest was in the 'X' shape. And the first question is this: how do I know this is a diffraction effect? It might be that there are specular reflections off periodic structures in the LCD display orthogonal to the arms of the 'X'. This would explain everything with only geometric optics.

There are two big clues I know of that diffraction is at work. First, the white light is separated into its component colours, which is typical of diffraction effects. Second, we see a discrete structure: rays directed along a discrete set of directions in addition to the arms of the 'X'. This is typical of diffraction from periodic microstructures, and again I hope to sketch why later.

So how can we explain the appearance of this photograph?

A simple model

If we consider diffraction effects we need to model light as waves. Let's start by working with just one frequency of light from the camera flash. Let's also suppose that the flash is far enough away from the monitor that the spherical wavefronts can be modelled as plane waves when they meet the monitor. Here's a picture:
The thick black vertical line is the monitor screen. The other vertical lines are the incoming plane waves. And the red arrow shows the direction of motion. Huygens' principle tells us that when the waves reflect off the monitor surface we can treat the entire monitor surface as a collection of points, each of which is emitting spherical waves equally in all directions. If you're unfamiliar with Huygens' principle it may seem wrong to you. When you shine a light at a mirror we get reflection along one particular direction, not in all directions. That's true, but we can view this fact as a consequence of interference between all of the little points on the surface emitting in all directions. Again, that's another thing that will emerge from the calculation below.

The question I now want to answer is this: how much light gets reflected in direction k? I'll assume we're observing things from far enough away that we can neglect the details of the geometry. We'll just compute the contribution in direction k (a unit vector (l, m, n)) from all of the points on our surface. We'll assume that because of variations on the surface, the intensity of the emission varies along the surface. We'll model the emission as a function I, so I(y, t) is the emission from position y at time t. We want to collect the light going in direction k through some light collector at time t. Call that J(k,t). Here's a picture:
Remember: there are waves going in all directions, but I've just drawn the waves going in direction k.

Fourier optics for beginners

Now comes the interesting bit: the light arriving at the collector is the integral of all the light emitted from the points on the monitor surface. But the time of flight from each point on the monitor surface is different. So when we perform our integral we need to build in a suitable delay. It's straightforward geometry to show that the extra time delay for waves arriving from position y, relative to waves from the origin, is my/c, where c is the speed of light and m, as defined above, is the y-component of k. So, up to a constant of proportionality, and some absolute choice of time origin, we want this integral

J(k,t) = ∫ dy I(y,t-my/c)

We assumed that the monitor surface was being struck by plane waves. Assuming they were orthogonal to the surface and coherent, the time dependence of I is proportional to exp(iωt). The time dependence of J is then the same sinusoid, so we drop it from further calculations. We can write, ignoring another constant of proportionality,

J(k) = ∫ dy I(y)exp(-iωmy/c) = ∫ dy I(y)exp(-i2πmy/λ)

where I used ω = 2πc/λ and λ is the wavelength of the light. In other words, J is the Fourier transform of I. More generally, if we modelled the z-axis we'd find that J is given by the 2D Fourier transform of I as a function of position on the 2D surface of the monitor. The actual power striking the sensor is given by the squared absolute value of J.

Ordinary reflections

So now I can pop the first of my pending explanations off the stack. Suppose that the surface is completely uniform. Then I(y) = constant. The Fourier transform of a constant function is the Dirac delta function. In other words, despite the fact that we're modelling every point on the surface as its own emitter, the resulting reflection from a perfectly smooth surface is a wave going straight back where it came from. (Exercise: modify the above treatment to show that waves striking a mirror at an angle obey the usual reflection law for mirrors.)
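There's a discrete version of this fact that's easy to check in Octave: the 2D discrete Fourier transform of a constant image is zero everywhere except at the single zero-frequency element, the discrete stand-in for the Dirac delta.

abs(fft2(ones(8, 8)))    % a single 64 in the top-left corner, zero everywhere else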

Periodic structure

Now suppose that the surface of our monitor has a periodic structure. The Fourier transform of a periodic function is concentrated at spikes corresponding to the fundamental frequency of the structure and its overtones. So we expect to see a grid-like structure in the result. I believe that's what we're seeing in the spray of quantised vectors coming from the center of the image and the fact that the arms of the 'X' look like discrete blobs of colour arranged in a line rather than the white spikes in the astronomical example I linked to above. That tells us we're probably seeing artifacts caused by the microstructure of the LCD pixels.
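Again there's an easy discrete check. A signal that repeats with period 4 has a discrete Fourier transform supported only on the fundamental frequency of the repetition and its overtones:

x = repmat([1 0 0 0], 1, 16);    % a period-4 pattern, 64 samples long
find(abs(fft(x)) > 1e-9)         % only bins 1, 17, 33 and 49: zero frequency plus the fundamental and its overtones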

An edge

Consider the function I(y,z) = -1 for y<0 and 1 for y>0. In other words, the sgn function along the y-axis and constant along the z-axis. Up to multiplication by a constant, its Fourier transform is given by the Dirac delta along the z-axis and 1/ω along the y-axis. In other words, it's a spike, starting at the origin, extending along the y-axis, but fading away as we approach infinity. This is the source of diffraction spikes. Any time we see such a spike it's likely there was a perpendicular edge somewhere in the function I. For telescopes it often comes from the struts supporting the secondary mirror. For cameras it comes from the polygonal shape of the iris. In this case, it looks like we must have pixels whose shape has edges perpendicular to the 'X'. I have to admit, when I came to this conclusion it sounded very implausible. But then someone posted this image from here:
There are those edges at the predicted angles.
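Here's a quick Octave illustration of the edge-to-spike correspondence (a toy image, not the actual pixel shape): an image that is -1 on the left half and +1 on the right half transforms to a single line of energy through the origin, perpendicular to the edge and fading with frequency.

E = ones(64, 64);
E(:, 1:32) = -1;                  % a vertical edge down the middle of the image
imagesc(abs(fftshift(fft2(E))))   % a horizontal spike through the centre, fading towards the sides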

Simulating the artifact

Now we have enough information to simulate the artifact. You can think of the remainder of this article as written in "literate Octave".

Save the above close-up of the pixels in a file and read it in.

I = imread('lcd-pattern.jpg');

I crop out one quarter as it simply repeats. This also removes the text.

I1 = I(251:500,251:500,1);

Now clean up some stray (image) pixels and pick out just one (LCD) pixel.

I1(1:250,100:250) = 0;

The photo is blurry, probably has compression artifacts and seems quite noisy to me. So I threshold it at intensity 200 to get a hard shape. I then compute the discrete Fourier transform and scale the intensity to something nice.

I2 = 0.0001*abs(fft2(I1>200).^2);

The Fourier transform puts the origin in the corner so I shift it to the centre.

I2 = circshift(I2,[125,125]);

Now we need to do the same thing for red and green. We reuse the blue image but it needs scaling in proportion to the wavelength. The numbers 1.2 and 1.4 are intended to be the ratios of the green and red wavelengths to that of blue light. They're just ballpark figures: I chose them to make the image sizes convenient when rescaled. I also crop out a piece from the centre of the rescaled images so they line up nicely. I use imresize2 from here.

Ib = I2;
Ig = imresize2(Ib,1.2,'nearest')(26:275, 26:275);
Ir = imresize2(Ib,1.4,'nearest')(51:300, 51:300);

Now I assemble the final RGB image.

J = zeros(250, 250, 3);
J(:, :, 1) = Ir;
J(:, :, 2) = Ig;
J(:, :, 3) = Ib;
imwrite(J,'cross.jpg')

Here's the result:


Analysis

Not bad, but it's not perfect. That's unsurprising. Have a look at the image straight after thresholding: it's a very poor representation of the pixel shape. It would be better to redraw the shape correctly at a higher resolution (or even work analytically, which isn't infeasible for simple shapes like these pixels). Stray (image) pixels can make a big difference in Fourier space. That explains the 'dirtiness' of my image. Nonetheless, it has the big X and it has the rows of dots near the middle. Qualitatively it's doing the right thing.
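As a sketch of that suggestion, here's how you might substitute a clean, hand-drawn aperture for the thresholded photo (the rectangle is a stand-in; you'd draw the real pixel shape):

P = zeros(250, 250);
P(100:150, 80:170) = 1;               % an idealised pixel shape: replace with the true geometry
P2 = abs(fft2(P)).^2;
imagesc(circshift(P2, [125, 125]))    % a clean diffraction pattern, free of photographic noise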

Note that the big X in the middle seems to have ghost reflections at the left and right. These are, I think, aliasing artifacts, purely a result of the digitisation.

It'd look slightly better with some multi-spectral rendering. I've treated the pixels as reflecting one wavelength but actually each one reflects a band.

There are also, almost certainly, lots more effects going on in that picture. The close-up photograph is only revealing one part of the structure. I'm sure there is 3D structure to those pixels, not just a flat surface. I suspect that's where the horizontal white line is coming from. Because it's white it suggests an artifact due to something interacting equally with all visible wavelengths. Maybe a vertical edge between pixels.

But overall I think I've shown that Fourier optics is a step in the right direction for simulating this kind of effect.

Note

Everything I know about Fourier optics I learnt from my ex-colleague Oliver James.

All the errors are mine of course.