Automatic Divided Differences
I've previously talked about automatic differentiation here a few times. One of the standard arguments for using automatic differentiation is that it is more accurate than numeric differentiation implemented via divided differences. We can approximate f'(x) by using (f(x)-f(y))/(x-y) with a value of y near x. Accuracy requires y to be close to x, and that requires computing the difference between two numbers that are very close. But subtracting close numbers is itself a source of numerical error when working with finite precision. So you're doomed to error no matter how close you choose x and y to be.
However, the accuracy problem with computing divided differences can itself be fixed. In fact, we can adapt the methods behind automatic differentiation to work with divided differences too.
(This paragraph can be skipped. I just want to draw a parallel with what I said here. Firstly I need to correct the title of that article. I should have said it was about *divided differences*, not *finite differences*. The idea in that article was that the notion of a divided difference makes sense for types because for a large class of functions you can define divided differences without using either differencing or division. You just need addition and multiplication. That's the same technique I'll be using here. I think it's neat to see the same trick being used in entirely different contexts.)
The Direct Approach
Firstly, here's a first attempt at divided differencing:
> diff0 f x y = (f x - f y)/(x - y)

We can try it on the function f:
> f x = (3*x+1/x)/(x-2/x)

diff0 f 1 1.000001 gives -14.0000350000029. Repeating the calculation with an arbitrary precision package (I used CReal) gives -14.000035000084000. We are getting nowhere near the precision we'd like when working with double precision floating point.
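(As a sketch of the kind of cross-check I mean: CReal from the numbers package is an instance of Fractional, so diff0 and f can be reused at that type unchanged. Something like the following in GHCi should reproduce the high precision figure above.)

ghci> :m + Data.Number.CReal
ghci> showCReal 15 (diff0 f 1 1.000001 :: CReal)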
The Indirect Approach
Automatic differentiation makes use of a number of properties of differentiation: linearity, the product rule and the chain rule. Similar rules hold for divided differences. First let me introduce some notation. If f is a function then I'll use f(x) for normal function application, but I'll use f[x,y] to mean the divided difference (f(x)-f(y))/(x-y). We have
(f+g)[x,y] = f[x,y]+g[x,y]
(fg)[x,y] = f(x)g[x,y]+f[x,y]g(y)
h[x,y] = f[g(x),g(y)]g[x,y] when h(x)=f(g(x))
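(The product rule is easy to check from the definition: f(x)g(x)-f(y)g(y) = f(x)(g(x)-g(y)) + (f(x)-f(y))g(y), and dividing through by x-y gives the rule above. The chain rule follows similarly by writing f(g(x))-f(g(y)) = f[g(x),g(y)](g(x)-g(y)).)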
We can modify the product rule to make it more symmetrical, though this isn't strictly necessary:

(fg)[x,y] = 0.5(f(x)+f(y))g[x,y] + 0.5f[x,y](g(x)+g(y))
(I got that from this paper by Kahan.)
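(It's the average of the unsymmetrical product rule above and its mirror image (fg)[x,y] = f(y)g[x,y]+f[x,y]g(x), which you get by pulling out g(x) instead of g(y) in the derivation.)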
In each case, given f evaluated at x and y, and its divided difference at [x, y], and the same for g, we can compute the corresponding quantities for the sum and product of f and g. So we can store f(x), f(y) and f[x,y] together in a single structure:
> data D a = D { fx :: a, fy :: a, fxy :: a } deriving (Eq, Show, Ord)

And now we can implement arithmetic on these structures using the rules above:
> instance Fractional a => Num (D a) where
>   fromInteger n = let m = fromInteger n in D m m 0
>   D fx fy fxy + D gx gy gxy = D (fx+gx) (fy+gy) (fxy+gxy)
>   -- the symmetrical product rule with the factor of 0.5 pulled outside
>   D fx fy fxy * D gx gy gxy = D (fx*gx) (fy*gy) (0.5*(fxy*(gx+gy) + (fx+fy)*gxy))
>   negate (D fx fy fxy) = D (negate fx) (negate fy) (negate fxy)

I'll leave as an exercise the proof that this formula for division works:
> instance Fractional a => Fractional (D a) where
>   fromRational n = let m = fromRational n in D m m 0
>   D fx fy fxy / D gx gy gxy = D (fx/gx) (fy/gy) (0.5*(fxy*(gx+gy) - (fx+fy)*gxy)/(gx*gy))

For the identity function, i, we have i(x)=x, i(y)=y and i[x,y]=1. So for any x and y, the evaluation of the identity function at x, y and [x,y] is represented as D x y 1. To compute divided differences for any function f built from addition, subtraction, multiplication and division we simply apply f to D x y 1 and pick off the divided difference from the fxy element of the structure. Here's our replacement for diff0:
> diff1 f x y = fxy $ f (D x y 1)

This is all mimicking the construction for automatic differentiation.
Evaluating diff1 f 1 1.000001 gives -14.000035000083997, much closer to the result derived using CReal. One neat thing about this is that we have a function that's well defined even in the limit as x tends to y: when we evaluate diff1 f 1 1 we get the derivative of f at 1.
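To make that concrete, here's the sort of GHCi session I'd expect (working out the derivative by hand with the quotient rule gives f'(1) = ((3-1)(1-2) - (3+1)(1+2))/(1-2)^2 = -14):

ghci> diff1 f 1 1.000001
-14.000035000083997
ghci> diff1 f 1 1
-14.0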
I thought that this was a novel approach but I found it sketched at the end of this paper by Reps and Rall. (Though their sketch is a bit vague so it's not entirely clear what they intend.)
Both the Kahan paper and the Reps and Rall paper give some applications of computing divided differences this way.
It's not clear how to deal with the standard transcendental functions. They have divided differences that are very complex compared to their derivatives.
Aside
There is a sense in which divided differences are uncomputable(!), and what we've had to do here is switch from an extensional description of functions to an intensional one in order to compute them. I'll write about this some day.
Note that the ideas here can be extended to higher order divided differences and that there are some really nice connections with type theory. I'll try to write about these too.
Update: I found another paper by Reps and Rall that uses precisely the method described here.
4 Comments:
Is there a 0.5 missing from the (*) definition for the Num instance of D?
Trevor,
I factored out a 0.5 which may be what is leading you to think the definition of (*) differs from the formula I gave a few lines earlier. So I think the code is correct.
Ah, I didn't match parentheses. Thanks!
Have you looked at differential Galois theory? It seems that could provide answers since the type derivation that makes zippers "obvious" fits the criteria for a derivation on the ring (field?) of Haskell types.