Thursday, July 28, 2005

Automatic Differentiation


The most important equation in mathematics is x2=0
Sir Michael Atiyah


Automatic Differentiation (also known as AD) is a great way to numerically compute derivatives of functions which is 'exact' (up to machine precision) and yet is easy to implement and can run fast. In particular it's useful for computing derivatives of functions that are implemented as many lines of code. I've seen people do crazy things like try to write their hundred lines of code as one big expression and then attempt to differentiate it symbolically in Mathematica. But in general, symbolic differentiation produces expressions that are extremely inefficient.

Here's one approach to automatic differentiation:

In a programming language which supports classes we can easily implement the complex numbers. We write complex numbers as a+ib which is typically represented in memory by a pair of machine precision floats a and b. We then implement rules for things like complex multiplication by translating rules like

(a+a'i)*(b+b'i) = ab-a'b'+(ab'+a'b)i

into code like

c0 = a0*b0-a1*b1
c1 = a0*b1+a1*b0

In a language like C++ we can then overload operators like * with an implementation that acts on our complex numbers to give a transparent way to write functions that operate on complex numbers.

Now, suppose that instead of introducing a complex number i such that i2=-1 we instead introduce d such that d2=0. Numbers of the form a+bd are known as dual numbers.
We now get rules like

(a+a'd)*(b+b'd) = ab+(ab'+a'b)d

that translates into code like

c0 = a0*b0
c1 = a0*b1+a1*b0

Note the important thing: if we implement a function f on real numbers, and then use operator overloading to allow f to act on dual numbers just as we might allow it to act on complex numbers then we automatically compute the derivative. In particular, we find that for real x

f(x+d) = f(x)+f'(x)d
When we build functions by composing our overloaded operators the d2=0 rule automagically implements the chain and product rules for us.

Once we have such classes it is essentially trivial to compute derivatives using them. This is much easier than almost any of the other methods available, and is much more accurate and reliable than finite differencing methods. For more details see the web site or even my own paper (recently published in the Journal of Graphics Tools) giving an application.

I've only mentioned first derivatives above. The method generalises to nth derivatives straightforwardly and also generalises to derivatives with respect to multiple variables. The catch is that if you have a function, f, of n variables, and your code takes time T to compute f, then computing the derivative with respect to the n variables takes time proportional to nT. But there's also another trick you can do to get this back to a time proportional to T and independent of n. This is called adjoint or reverse automatic differentiation. There is more on this at the autodiff web site.


I learnt this stuff originally from a paper by Max Jerrell.

Anyway...you now have no excuse for wasting your time with symbolic differentiation. If it takes more than one sheet of paper to differentiate your expressions you should be using AD. It's provably no harder to implement than complex numbers and yet you wouldn't hesitate to use complex numbers to evaluate expressions would you? It's not exactly rocket science. (OK, I'm lying, rocket scientists use this stuff all the time to differentiate entire fluid dynamics simulations.)


BTW Dual numbers also crop up in algebraic geometry as they give a nice algebraic way to define things like tangent spaces. The tensor product of the quaternions with the dual numbers has some nice applications to Newtonian dynamics. I've also seen fluid dynamics papers where the authors use complex numbers as an approximation to the dual numbers. In fact, there are quite a few people in aerodynamics doing this - it's called the complex step method. It's a very stupid thing to do seeing as the dual numbers give better results and are easier to implement than complex numbers. But I guess if you're writing in an ancient dialect of FORTRAN and it gives you the COMPLEX type for free then maybe it's the easiest option.

10 comments:

Suresh said...

i guess I don't get it. how is

f(x + d ) = f(x) + f'(x)d ?

sigfpe said...

Hi Suresh,

That identity doesn't hold for real numbers but in an extended algebraic structure where d²=0. If you look at my own paper on the subject you'll see some actual C++ code that implements numbers in such a structure. That should bring it down to earth in a practical way.

shaneal said...

I was also confused by the
f(x + d ) = f(x) + f'(x)d thing.

I played with it a while, and this more verbose derivation may help those of us with less experience.

f(x) = c0 + c1*x + c2*x^2 + ... + cn*x^n

therfore:
f(x + x'*d) = c0 + c1*(x+x'*d) + c2*(x+x'*d)^2 + ... + cn*(x+x'*d)^n

= c0 + c1*x + c2*x^2 + ... + cn*x^n
+ c1*x'*d + 2*c2*x*x'*d + ... + n*cn*x^(n-1)*x'*d

= f(x) + f'(x)*x'*d

By the way, this is one of the best blogs I've read in quite a while. I'm basically done with my undergrad math degree, and this is interesting stuff. It's real applications of all the theory i picked up in abstract algebra.

Thanks

Porges said...

Neat. Is there something similar for integration?

sigfpe said...

porges,

There isn't a corresponding method for integration. Integration turns out to be very ugly compared to differentiation.

Porges said...

I thought so, or we wouldn't have the 800 different methods for numerical integration :P

What do you get if you *do* integrate with dual numbers? I guess I'll go have a peek at Wikipedia.

Anonymous said...

Link to your paper is broken.

Very nice blog and replies

Anonymous said...

The link to your paper is broken

Anonymous said...

The link to your paper is broken

Jeff said...

Anyone looking for the paper can find it on the Wayback Machine.

Blog Archive