OK, I will explain. Consider the line of code, in C++, a += b. We can write this in matrix form as
(a) = (1 1) (a)
(b) (0 1) (b)
If we take the transpose of the matrix we get
(a) = (1 0) (a)
(b) (1 1) (b)
which is the same as the code
b += a. More generaly, any block of code that performs linear updates to variables can be converted to its adjoint by writing the steps backwards and transforming each line of code like this. A more complex example (exercise) is a = 2*a+3*b-4*c; b = 2*a which maps to a += 2*b; b = 0; b += 3*a; c -= 4*a; a *= 2. If your code has loops, eg. iterates over arrays of variables, you just run the loops backwards. The application for me is that there is a published fast algorithm for some linear operator but I need the adjoint. Surprisingly the adjoint of the fast algorithm can actually turn out to be a whole new fast algorithm.
Incidentally, I asked the journal editor for the expected turnaround time for the decision on whether to publish. He responded with the mean, standard deviation, extrema and quartiles. If only people like plumbers, doctors and cable installers could respond in this way. "If I attempt to fix this leak it has a 50% chance of taking 1 hour to fix, a 25% chance of taking 2 and there's a 5% chance I won't get it done today" rather than "I'm charging by the hour and it'll take as long as it takes unless there's a complication in which case it'll take longer".

2 comments:
This really is just reverse-mode AD. With f:R^n->R^m, and J the Jacobian of f at x, forward-mode AD maps (x,x')->(f(x),J*x'), which is just the push-forward. Reverse-mode AD maps y'->J^T*y'. But note that if f is linear then f(x')=J*x', or we might just say f=J. So in this case, forward-mode AD gives the reasonably useless (x,x')->(f(x),f(x')) and reverse-mode AD gives simply the adjoint of f, which is in this case just a fancy way of saying the transpose of J.
Also ... your description of AD is a little off WRT loops. As described reverse-mode wouldn't work for iterate-to-fixedpoint loops, which are not uncommon (ahem) in numeric code. What you give works only for loops that could be in principle unrolled away, eg loops for a fixed number of iterations, or loops over the indices of an array.
There is a special transformation rule for iterate-to-fixedpoint loops, but it does not ammount to just reversing them, it is slightly more complicated. Feynman found and published the rule as his senior thesis (!) but the notation he uses is very specific so you wouldn't recognize this this is what he was doing unless you already know it.
--
Barak Pearlmutter, barak@cs.nuim.ie
This is just reverse mode AD. But I've never seen anyone else factor reverse mode AD as the product of two program transformations: forward AD followed by the adjoint. The adjoint step on its own is useful for applications besides differentiating.
My description of loops is correct, as you say, for fixed loops. For iterate-until-fixed-point, the implementations I have seen simply 'record' the forward pass and 'replay' it backwards with the same number of iterations on the reverse pass (eg. by building a tree that represents the unrolled loop during the forward run). So I don't see that my description is off. However, it seems to me that you're talking about something more interesting than this simple minded approach. Maybe you have a reference. If you don't post back here I might have to email you...
Post a Comment