Monday, September 12, 2005

Adjoints

I keep starting papers and not finishing them. Both reading and writing, but this time I mean writing. But my latest is short enough that the first draft must be practically the final draft so once my colleagues have suggested improvements I'll see if I can submit it to a journal before the end of the month. Anyway, the inspiration behind this paper was the realisation that you can transform code that computes linear functions into its adjoint by writing the code 'backwards' and taking the adjoint of each line. I could explain what I mean but I found a web site that explains it all. I found an application to digital filtering where I was able to transform a useful algorithm into another by essentially writing it backwards and reversing all of the loops. But the general method of computing the adjoint of code seems so useful I assume it's commonplace, but I can't find a single paper on transforming code in this way, just the web site I mentioned above. It's closely connected to reverse mode automatic differentiation but more general as it applies to any linear operator, not just the action of the derivative on the tangent space.


OK, I will explain. Consider the line of code, in C++, a += b. We can write this in matrix form as


(a) = (1 1) (a)
(b) (0 1) (b)

If we take the transpose of the matrix we get

(a) = (1 0) (a)
(b) (1 1) (b)

which is the same as the code b += a. More generaly, any block of code that performs linear updates to variables can be converted to its adjoint by writing the steps backwards and transforming each line of code like this. A more complex example (exercise) is a = 2*a+3*b-4*c; b = 2*a which maps to a += 2*b; b = 0; b += 3*a; c -= 4*a; a *= 2. If your code has loops, eg. iterates over arrays of variables, you just run the loops backwards. The application for me is that there is a published fast algorithm for some linear operator but I need the adjoint. Surprisingly the adjoint of the fast algorithm can actually turn out to be a whole new fast algorithm.


Incidentally, I asked the journal editor for the expected turnaround time for the decision on whether to publish. He responded with the mean, standard deviation, extrema and quartiles. If only people like plumbers, doctors and cable installers could respond in this way. "If I attempt to fix this leak it has a 50% chance of taking 1 hour to fix, a 25% chance of taking 2 and there's a 5% chance I won't get it done today" rather than "I'm charging by the hour and it'll take as long as it takes unless there's a complication in which case it'll take longer".

Labels:

1 Comments:

Blogger sigfpe said...

This is just reverse mode AD. But I've never seen anyone else factor reverse mode AD as the product of two program transformations: forward AD followed by the adjoint. The adjoint step on its own is useful for applications besides differentiating.

My description of loops is correct, as you say, for fixed loops. For iterate-until-fixed-point, the implementations I have seen simply 'record' the forward pass and 'replay' it backwards with the same number of iterations on the reverse pass (eg. by building a tree that represents the unrolled loop during the forward run). So I don't see that my description is off. However, it seems to me that you're talking about something more interesting than this simple minded approach. Maybe you have a reference. If you don't post back here I might have to email you...

Sunday, 09 October, 2005  

Post a Comment

<< Home