Practical Synthetic Differential Geometry
Previously I've talked about Automatic Differentiation (AD). My own formulation of the technique is more algebraic than the description that is usually given, and recently it's begun to dawn on me that all I've done is rediscover Synthetic Differential Geometry (SDG). I've looked at a variety of SDG papers but none of them seem to make the connection with AD. This is a pity because SDG allows you to do an amazing thing - write computer programs that easily manipulate objects such as vector fields and differential forms on manifolds without doing symbolic algebra. My goal here is to illustrate how the definition of a vector field in Anders Kock's book translates into a nice functional definition in Haskell, and how that definition leads naturally to the Lie bracket. Normally it takes a significant amount of mathematical machinery to define these things but in the framework of SDG it becomes very simple. (Unfortunately this is going to be very sketchy as there's a lot I could say and I only have a small amount of time while I sit at home waiting for a new TV to be delivered...)
Kock's work is based on the idea of infinitesimal numbers whose square is zero but which aren't themselves zero. Clearly we aren't talking about real numbers here, but instead an extension to the real numbers. Here's a toy implementation:
> data Dual a = D a a deriving (Show,Eq)
> instance Num a => Num (Dual a) where
>     fromInteger i = D (fromInteger i) 0
>     (D a a')+(D b b') = D (a+b) (a'+b')
>     (D a a')-(D b b') = D (a-b) (a'-b')
>     (D a a')*(D b b') = D (a*b) (a*b'+a'*b)
> e = D 0 1
They're called dual numbers because this particular algebra was apparently first described by Clifford and that's what he called them. The important point to note is that e^2==0. We can use the fact that
f(x+e)=f(x)+ef'(x)
to compute derivatives of functions. For example if we define
> f x = x^3+2*x^2-3*x+1
and evaluate f (1+e) we get D 1 4. We can directly read off f(1)=1 and f'(1)=4.
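Since this is literate Haskell, we can record that claim as an executable check (the name check1 is my addition, not part of the original exposition):

> -- check1 should evaluate to True: f(1) = 1 and f'(1) = 4.
> check1 = f (1+e) == D 1 4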
But, as Kock points out, the dual numbers don't provide 'enough' infinitesimals, we just get one, e, from which all of the others are built. So we can replace Dual with a new type:
> infixl 5 .-
> infixl 5 .+
> infixl 6 .*
> infixl 6 *.
> data R a = R a [a] deriving Show
> -- zipWith' is like zipWith but pads the shorter list: missing elements on
> -- the left are treated as l, missing elements on the right as r.
> zipWith' f l r [] [] = []
> zipWith' f _ r a [] = map (flip f r) a
> zipWith' f l _ [] b = map (f l) b
> zipWith' f l r (a:as) (b:bs) = (f a b) : zipWith' f l r as bs
> (.+) :: Num a => [a] -> [a] -> [a]
> (.+) = zipWith' (+) 0 0
> (.-) :: Num a => [a] -> [a] -> [a]
> (.-) = zipWith' (-) 0 0
> (.*) :: Num a => a -> [a] -> [a]
> (.*) a = map (a*)
> (*.) :: Num a => [a] -> a -> [a]
> (*.) a b = map (*b) a
> (.=) :: (Eq a,Num a) => [a] -> [a] -> Bool
> (.=) a b = and $ zipWith' (==) 0 0 a b
> instance (Eq a,Num a) => Eq (R a) where
>     R a a' == R b b' = a==b && a'.=b'
A slightly more extended version of the Num interface:
> instance Num a => Num (R a) where
>     fromInteger a = R (fromInteger a) []
>     (R a a') + (R b b') = R (a+b) (a'.+b')
>     (R a a') - (R b b') = R (a-b) (a'.-b')
>     (R a a') * (R b b') = R (a*b) (a.*b' .+ a'*.b)
>     negate (R a a') = R (negate a) (map negate a')
And some more functions for good measure:
> instance (Num a,Fractional a) => Fractional (R a) where
>     fromRational a = R (fromRational a) []
>     (R a a') / (R b b') = let s = 1/b in R (a*s) ((a'*.b .- a.*b') *. (s*s))
> instance Floating a => Floating (R a) where
>     exp (R a a') = let e = exp a in R e (e .* a')
>     log (R a a') = R (log a) ((1/a) .* a')
>     sin (R a a') = R (sin a) (cos a .* a')
>     cos (R a a') = R (cos a) (-sin a .* a')
The d i provide a basis for these infinitesimals.
> d n = R 0 (replicate n 0 ++ [1])
We now have an infinite set of independent dual numbers, d i, one for each natural i. Note that we have (d i)*(d j)==0 for any i and j. Apart from this, they act just like before. For example
f(x + d i) = f(x) + (d i) f'(x).
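Here's a quick executable check of the orthogonality claim above (the name is my addition):

> -- Products of the basic infinitesimals vanish, including squares.
> checkNilpotent = d 0 * d 0 == 0 && d 0 * d 1 == 0   -- True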
We can use these to compute partial derivatives. For example, consider
> g x y = (x+2*y)^2/(x+y)
Computing g (7+d 0) (0+d 1) gives R 7.0 [1.0,3.0], so we know g(7,0) = 7, ∂g/∂x(7,0) = 1 and ∂g/∂y(7,0) = 3.
I'll use R as mathematical notation for the type given by R Float. (I hope the double use of 'R' won't be confusing, but I want to be consistent-ish with Kock.)
Before we do any more calculus, consider a simple geometric figure: the unit circle centred at (0,1), sitting tangent to the x-axis. A point lies in the intersection of the circle and the line if it satisfies both equations:
> onS (x,y) = x^2+(y-1)^2==1
> onL (x,y) = y==0
Notice that in R this has many solutions. In fact, we find that onS (a*d 1,0) && onL (a*d 1,0) gives True for any Float value a. Does this mean anything? Actually, it fits some people's intuition of what the intersection should look like better than the usual idea that there is only one intersection point. In fact, Protagoras argued that there was a short line segment in common between the circle and the line and that, because Pythagoras's theorem gave only one solution, the whole of geometry was at fault! Using R instead of the reals makes sense of this intuition. The points of the form (a*d 1,0) do, indeed, form a line tangent to the circle. In fact, there is a way to construct tangent spaces and vector fields directly from this - but instead I now want to consider Kock's construction, which is slightly different.
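We can spell that tangency claim out as code. This is a sketch of my own (the name tangentCheck isn't from Kock); R a [] just injects the scalar a into R:

> -- Every point of the form (a * d 1, 0) lies on both the circle and the line.
> tangentCheck :: Float -> Bool
> tangentCheck a = let a' = R a [] in onS (a' * d 1, 0) && onL (a' * d 1, 0)

This returns True for every a, exactly as claimed.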
Let D be the subset of R consisting of numbers whose square is zero. These numbers are in some sense infinitesimal. In differential geometry a vector field is intuitively thought of as an 'infinitesimal flow' on a manifold. A finite flow on a manifold, M, is simply a function f:R×M→M such that f(0,x)=x. In f(t,x) we can think of t as time and each point x on the manifold flows along a path t→f(t,x). If f is differentiable, then as t becomes smaller, the path becomes closer and closer to a short line segment which we can draw as a vector. So the intuition is that a vector field is what you get in the limit as t becomes infinitesimal. The catch is that infinitesimal things aren't first-class objects in differential geometry (so to speak). You can use infinitesimals for intuition, and I suspect almost all differential geometers do, but actually talking about them is taboo. In fact, the definition of vector field in differential geometry is a bit of a kludge to work around this issue. When most people first meet the definition of a vector field as a differential operator it comes as quite a shock. But we have no such problem. In fact, Kock defines a vector field simply as a function
f:D×M→M such that f(0,x)=x.
In the language of SDG, a vector field is an infinitesimal flow. For example, consider
> transX d (x,y) = (x+d,y)
This simply translates along the x-axis infinitesimally. Of course this code defines a flow for non-infinitesimal values of d too, but when considering it as a vector field we're only interested in the case where d is infinitesimal. transX defines a vector field that everywhere points along the x-axis.
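Here's another example flow of my own devising (not from Kock's book): an infinitesimal scaling about the origin. It also satisfies f(0,x)=x, and the vector field it defines points radially outward:

> scaleXY d (x,y) = (x + d*x, y + d*y)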
For those familiar with the usual differential geometric definition of vector fields I can now make contact with that. In that context a vector field is a differential operator that acts on scalar functions on your manifold. In our case, we can think of it acting by infinitesimally dragging the manifold and then looking at what happens to the function that's dragged along. In other words, if we have a function, f, on the manifold, then the vector field v, applied to f, is that function vf such that f (v d x) == f x + d*vf x for infinitesimal d. For example with
> h (x,y) = x^2+y^2
h (transX (d 0) (1,2)) == R 5 [2] because h(1,2)=5 and ∂h/∂x(1,2)=2.
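We can package this up as a little helper. The following is a sketch of my own (the names infinitesimalPart and applyField aren't from Kock or from the code above): flow along v for 'time' d 0 and read off the coefficient of d 0 in the result.

> -- Extract the coefficient of d 0 from an element of R.
> infinitesimalPart :: Num a => R a -> a
> infinitesimalPart (R _ [])    = 0
> infinitesimalPart (R _ (x:_)) = x
> -- Apply the vector field v, viewed as a differential operator, to the function f.
> applyField v f p = infinitesimalPart (f (v (d 0) p))

For example, applyField transX h (1,2) gives 2, the derivative of h in the x direction at (1,2).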
I think I have enough time to briefly show how to define the Lie bracket in this context. If you have two vector fields, then intuitively you can view the Lie bracket as follows: take a point, map it using the first field by some small parameter d1, then map it using the second by d2, then the first again by -d1, and finally the second by -d2. Typically you do not return to where you started from, but somewhere 'close'. In fact, your displacement is proportional to the product of d1 and d2. In other words, if your fields are v and w, then the Lie bracket [v,w] is defined by the equation w (-d2) $ v (-d1) $ w d2 $ v d1 x == vw (d1*d2) x where vw represents [v,w]. But you might now see a problem. The way we have defined things, d1*d2 is zero if d1 and d2 are infinitesimal. Kock discusses this issue and points out that in order to compute Lie brackets we need infinitesimals whose squares are zero, but whose product isn't. I could go back and hack the original definition of R to represent a new kind of algebraic structure where this is true, but in the spirit of Geometric Algebra for Free I have a trick. Consider d1=1⊗d and d2=d⊗1 in R⊗R. Clearly d1²=d2²=0. We also have d1d2=d⊗d≠0. And using the constructor-as-tensor-product trick we can write:
> d1 = R (d 0) [] :: R (R Double)
> d2 = R 0 [1] :: R (R Double)
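A couple of sanity checks (the names are mine) that these behave as advertised:

> checkSquares = d1*d1 == 0 && d2*d2 == 0   -- True: both squares vanish
> checkProduct = d1*d2 /= 0                 -- True: but the product doesn't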
Let's test it out. Define three infinitesimal rotations:
> rotX d (x,y,z) = (x,y+d*z,z-d*y)
> rotY d (x,y,z) = (x-d*z,y,z+d*x)
> rotZ d (x,y,z) = (x+d*y,y-d*x,z)
Note that there's no need to use trig functions here. We're working with infinitesimal rotations so we can use the fact that sin d=d and cos d=1 for infinitesimal d. It is well known that [rotX,rotY]=rotZ and cyclic permutations thereof. (In fact, these are just the commutation relations for angular momentum in quantum mechanics.)
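As a quick sanity check of the sin d=d, cos d=1 claim (my own check, not from the original text), we can compare rotX with the full trigonometric rotation at an infinitesimal angle, using the Floating instance defined earlier:

> -- Full rotation about the x-axis by angle t, with the same orientation as rotX.
> rotXFull t (x,y,z) = (x, y*cos t + z*sin t, z*cos t - y*sin t)
> checkRot = rotXFull (d 1) (1,2,3) == rotX (d 1) (1,2,3)   -- True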
We can write
> commutator d1 d2 f g = (g (-d2) . f (-d1) . g d2 . f d1)
> test1 = commutator d1 d2 rotX rotY
> test2 = rotZ (d1*d2)
and compare values of test1 and test2 at a variety of points to see if they are equal. For example test1 (1,2,3) == test2 (1,2,3). I don't know about you, but I'm completely amazed by this. We can get our hands right on things like Lie brackets with the minimum of fuss. This has many applications. An obvious one is in robot dynamics simulations where we frequently need to work with the Lie algebra of the group of rigid motions. This approach won't give us new algorithms, but it gives a very elegant way to express existing algorithms. It's also, apparently, close to Sophus Lie's original description of the Lie bracket.
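Here is that comparison as an executable check (the name checkLie is mine):

> checkLie = test1 (1,2,3) == test2 (1,2,3)   -- True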
I have time for one last thing. So far the only manifolds I've considered are those of the form Rⁿ. But this stuff works perfectly well for varieties. For example, the rotations rotX, rotY and rotZ all work fine on the unit sphere. In fact, if T is the unit sphere, define
> onT (x,y,z) = x^2+y^2+z^2==1
Notice how, for example, onT (rotY (d 1) (0.6,0.8,0))==True, so rotY really does give an infinitesimal flow from the unit sphere to the unit sphere, and hence a vector field on the sphere. Compare this to the complexity of the traditional definition of a vector field on a manifold.
Kock's book also describes a really beautiful definition of differential forms in terms of infinitesimal simplices, but I've no time for that now. And I've also no time to mention the connections with intuitionistic logic and category theory making up much of his book (which is a good thing because I don't understand them).
PS The D⊗D trick, implemented in C++, is essentially what I did in my paper on photogrammetry. But there I wasn't thinking geometrically.
(I don't think I've done a very good job of writing this up. Partly I feel horribly constricted by having to paste back and forth between two text editors to get this stuff in a form suitable for blogger.com as well as periodically having to do some extra munging to make sure it is still legal literate Haskell. And diagrams - what a nightmare! So I hit the point where I couldn't face improving the text any more without climbing up the walls. What is the best solution to writing literate Haskell, with embedded equations, for a blog?)
Labels: haskell, mathematics
16 Comments:
What's the preferred way to do higher derivatives? I found that e2 = D e 1 works, but it also calculates the first derivative twice.
Ummmm, I seem to be running into problems quite early on. I defined the duals following your exposition closely (I added a signum and an abs because ghci was complaining about those), and then got
*DiffGeo> let f x = x^3+2*x^2-3*x+1
*DiffGeo> f (1+e)
D 1 4
which seems to contradict your calculation of that very example.
For the second derivative you need an element such that d^3=0 but d^2≠0. Such an element can be found in R⊗R, as you've discovered. But, as you point out, you get the 1st derivative twice and it gets worse with higher derivatives. You can implement an appropriate algebra directly. One paper that does this is here. (That implements exactly the right thing, but doesn't give an algebraic description.) For arbitrary higher derivatives you'll probably need power series code and there are many implementations of that in Haskell, at least for the single-variable case.
Michi, the code's correct, the 'comments' weren't. Now fixed.
A reference to AD:
http://www.bcl.hamilton.ie/~bap/papers/popl2007-multi-forward-AD.pdf
augustss,
Thanks for the link. That's the only paper I've seen that formulates AD the way I do.
Judging by the date of that paper, I think that my approach to Lie algebras must be novel so maybe I should write a paper on it. (It's slightly different to what I wrote here because Kock's 'functional' definition of a vector field is pretty, but less useful for calculation.)
Michael Shulman has an introductory lecture on synthetic differential geometry that starts out with dual numbers and eventually describes a Lie bracket. I don't know enough about Lie groups to know how his formulation compares to yours.
I'd ask how SDG compares to geometric calculus, but I don't think I'd understand the answer
Soooo... your Lie bracket is "just" the commutator of d⊗1 and 1⊗d in the tensor product, right?
Is this just me being WAY too used to the algebraic point of view, thinking that this is almost tautological?
Michi,
d⊗1 and 1⊗d commute so their commutator is zero. In fact, both of the algebraic structures I define are commutative. The Lie bracket comes from the non-commutativity of the vector field functions, not the underlying algebras.
I'd ask how SDG compares to geometric calculus...
Geometric calculus is what you get when you combine calculus with geometric algebra. SDG provides an alternative approach to calculus. So the two should happily coexist. In fact, the code here can be combined with this latest code without problem and I guess that some of the language of geometric calculus turns directly into usable code in a nice way.
...because Pythogoras's theorem gave only one solution...
Is this a spelling mistake, is *Pythagoras* spelled like this or is this just another guy?
Hello. Several comments.
Kock's book (nice link, btw -- I have a first edition, but did not know there existed a second) defines higher order nilpotent (?) differentials (or these little dual number guys, whatever you want to call them).
And I WILL ask the question: What does this stuff have to do with "Geometric Algebra", Clifford Algebras, Grassmann Algebras, Supersymmetry, anything you can handle ... ?
Did I succeed in leaving a comment or not? I guess I'll have to wait and see.
I hope (?) I might get notified by email if I did succeed in leaving a comment ... ?
Actually you get higher derivatives for free; you don't need to do the trick explicitly! Just define
> im (D _ a) = a
> derivative f x = im $ f (D x 1)
Then
derivative (\x -> (x-1)^3 ) 0 == 3
and you can immediately calculate higher derivatives!
(derivative . derivative) (\x -> (x-1)^3 ) 0 == -6
(derivative . derivative . derivative) (^5) 1 == 60
I'm not sure this is a correct definition of vf: "f (f d x)==f x + d*vf x" Especially because f is supposed to map into R.