# A Neighborhood of Infinity

## Monday, September 12, 2005

I keep starting papers and not finishing them. Both reading and writing, but this time I mean writing. But my latest is short enough that the first draft must be practically the final draft so once my colleagues have suggested improvements I'll see if I can submit it to a journal before the end of the month. Anyway, the inspiration behind this paper was the realisation that you can transform code that computes linear functions into its adjoint by writing the code 'backwards' and taking the adjoint of each line. I could explain what I mean but I found a web site that explains it all. I found an application to digital filtering where I was able to transform a useful algorithm into another by essentially writing it backwards and reversing all of the loops. But the general method of computing the adjoint of code seems so useful I assume it's commonplace, but I can't find a single paper on transforming code in this way, just the web site I mentioned above. It's closely connected to reverse mode automatic differentiation but more general as it applies to any linear operator, not just the action of the derivative on the tangent space.

OK, I will explain. Consider the line of code, in C++, a += b. We can write this in matrix form as

(a) = (1 1) (a)(b)   (0 1) (b)

If we take the transpose of the matrix we get
(a) = (1 0) (a)(b)   (1 1) (b)

which is the same as the code b += a. More generaly, any block of code that performs linear updates to variables can be converted to its adjoint by writing the steps backwards and transforming each line of code like this. A more complex example (exercise) is a = 2*a+3*b-4*c; b = 2*a which maps to a += 2*b; b = 0; b += 3*a; c -= 4*a; a *= 2. If your code has loops, eg. iterates over arrays of variables, you just run the loops backwards. The application for me is that there is a published fast algorithm for some linear operator but I need the adjoint. Surprisingly the adjoint of the fast algorithm can actually turn out to be a whole new fast algorithm.

Incidentally, I asked the journal editor for the expected turnaround time for the decision on whether to publish. He responded with the mean, standard deviation, extrema and quartiles. If only people like plumbers, doctors and cable installers could respond in this way. "If I attempt to fix this leak it has a 50% chance of taking 1 hour to fix, a 25% chance of taking 2 and there's a 5% chance I won't get it done today" rather than "I'm charging by the hour and it'll take as long as it takes unless there's a complication in which case it'll take longer".

Barak Pearlmutter said...

This really is just reverse-mode AD. With f:R^n->R^m, and J the Jacobian of f at x, forward-mode AD maps (x,x')->(f(x),J*x'), which is just the push-forward. Reverse-mode AD maps y'->J^T*y'. But note that if f is linear then f(x')=J*x', or we might just say f=J. So in this case, forward-mode AD gives the reasonably useless (x,x')->(f(x),f(x')) and reverse-mode AD gives simply the adjoint of f, which is in this case just a fancy way of saying the transpose of J.

Also ... your description of AD is a little off WRT loops. As described reverse-mode wouldn't work for iterate-to-fixedpoint loops, which are not uncommon (ahem) in numeric code. What you give works only for loops that could be in principle unrolled away, eg loops for a fixed number of iterations, or loops over the indices of an array.

There is a special transformation rule for iterate-to-fixedpoint loops, but it does not ammount to just reversing them, it is slightly more complicated. Feynman found and published the rule as his senior thesis (!) but the notation he uses is very specific so you wouldn't recognize this this is what he was doing unless you already know it.
--
Barak Pearlmutter, barak@cs.nuim.ie

sigfpe said...

This is just reverse mode AD. But I've never seen anyone else factor reverse mode AD as the product of two program transformations: forward AD followed by the adjoint. The adjoint step on its own is useful for applications besides differentiating.

My description of loops is correct, as you say, for fixed loops. For iterate-until-fixed-point, the implementations I have seen simply 'record' the forward pass and 'replay' it backwards with the same number of iterations on the reverse pass (eg. by building a tree that represents the unrolled loop during the forward run). So I don't see that my description is off. However, it seems to me that you're talking about something more interesting than this simple minded approach. Maybe you have a reference. If you don't post back here I might have to email you...