## Friday, July 01, 2005

### Two Papers

I'm trying to get the time to write two papers at the moment.

The first is an application of a result about convex polygons, that first came out of the theory of Toric Varieties, for an image processing application. I actually submitted to a conference but it wasn't quite the right venue so I'll do what they recommended in the rejection and resubmit to another journal related to the conference. Need to add another example though.

The second is on automatic differentiation (AD). Here's the idea behind AD. Given a bit of code like

y = cos(x)
z = x+y

we can rewrite them as

dx = 1
y = cos(x)
dy = -sin(x)*dx
z = x+y
dz = dx+dy

In order to compute the derivatives w.r.t x.

But that's an ugly way to work. The code to compute the original function is interleaved with the code to compute derivatives. You have to maintain both sets of code simultaneously, you have to do the differentiation yourself, and it violates the principle of separating concerns. But a nice solution that I first learnt from a paper by Max Jerrell is to replace the float or double datatype with a new type that tracks derivatives for you. The next effect is that the original code

y = cos(x)
z = x+y

is left intact but you override +, cos and so on so that they automatically implement the chain rule for you. That way the differentiation code is neatly encapsulated. Here's a link to a paper I've already written on the subject. This approach is called forward AD. Note that this method is exact (unlike finite differences) but not symbolic algebra (which is usually a really inefficient way of computing derivatives).

Note how the code above computes derivatives in the same order as the original code. It turns out that there is a more efficient method for computing derivatives (at least in many cases) that works by reversing the order of the original code and then applying the chain rule. In effect it's like computing the matrix/vector product ABCx as A(B(Cx)) instead of as ((AB)C)x. The former is usually much faster. A solution to this is to again override +, cos and so on. But this time, instead of immediately applying the chain rule, a tree representing the computation is built. Only at the end are the numbers crunched, this time sweeping through the expression tree in reverse order. This usually requires quite different code to the method in the previous paragraph and is called reverse or adjoint AD.

But I have an approach to unifying these two algorithms. Again there is a separation of concerns issue: the second AD algorithm performs two functions - reversing the order of the computation and then computing the derivative. I've figured out a way to override the operators * and + in such a way that the reverse AD code is simply the forward AD code with the underlying float type replaced with yet another type. This simplifies AD libraries considerably, makes the code easier to maintain, and just seems like a nice way to do things.

So now you know what I'm up to.