This article assumes knowledge of Conor McBride's work on the differentiation and dissection of types, even if the early parts look deceptively simple.
Tables of Divided Differences
Given a sequence of numbers like 2, 6, 12, 20, 30 we can use the well known method of finite differences to predict the next in the series. We write the numbers in a column. In the next column we write the differences between the numbers in the previous column and iterate like so:
2 | |||
4 | |||
6 | 2 | ||
6 | 0 | ||
12 | 2 | ||
8 | 0 | ||
20 | 2 | ||
10 | |||
30 |
On the assumption that the rightmost column is zero all the way down we can extend this table to:
2 | |||
4 | |||
6 | 2 | ||
6 | 0 | ||
12 | 2 | ||
8 | 0 | ||
20 | 2 | ||
10 | 0 | ||
30 | 2 | ||
12 | |||
42 |
We can think of this as an exercise in polynomial fitting. We have some x-values, in this case: x0=1, x2=2,..., x4=5, and some y-values: y0=2, y1=6,..., y4=30. We hope to fit a polynomial f so that f(xi)=yi. The table of finite differences has the property that the (n+1)-th column becomes constant if a degree n polynomial fits the data.
But what happens if we want to fit data to x-values that aren't equally spaced? Then we can use the *divided* difference table instead. In this case we don't record the differences between the y-values, but the quotient between the y-differences and the x-differences. For x0,1,2,3 = 1, 2, 4, 6 and y0,1,2,3 = 3, 6, 18, 138 we get:
1 | 3 | |||
(6-3)/(2-1)=3 | ||||
2 | 6 | (6-3)/(4-1)=1 | ||
(18-6)/(4-2)=6 | (1-1)/(6-1)=0 | |||
4 | 18 | (10-6)/(6-2)=1 | ||
(38-18)/(6-4)=10 | ||||
6 | 38 |
Again we reach zero. This is because yi = f(xi) = xi2+2 so we have a quadratic again.
Let's assume yi = f(xi) for some set of points. We'll leave x, y and f unknown and fill out the table:
x0 | f(x0) | |||
(f(x1)-f(x0))/(x1-x0)= f[x0,x1] | ||||
x1 | f(x1) | (f[x1,x2]-f[x0,x1])/(x2-x0)= f[x0,x1,x2] | ||
(f(x2)-f(x2))/(x2-x1)= f[x1,x2] | (f[x1,x2,x3]-f[x0,x1,x2])/(x3-x0)= f[x0,x1,x2,x3] | |||
x2 | f(x2) | (f[x2,x3]-f[x1,x2])/(x3-x1)= f[x1,x2,x3] | ||
(f(x3)-f(x2))/(x3-x2)= f[x2,x3] | ||||
x3 | f(x3) |
Note that for any f, this table also defines the generalised divided differences f[x,y], f[x,y,z] and so on. (Compare with the notation at Wikipedia).
Divided Differences of Types
Now suppose that f isn't a function of real numbers, but is a container type. Then as I noted here, the second column corresponds to Conor McBride's dissection types. f[x,y] is an f-container with a hole, with everything to the left of the hole containing an x, and everything to the right of the hole containing a y.
So what's the next column? Consider f(x)=x4. Then f[x,y] = (y4-x4)/(y-x) = x3+x2y+xy2+y3. This corresponds to the description I just gave involving a hole. Now consider f[x,y,z] = (f[y,z]-f[x,y])/(z-x) = z2+xz+x2+y2+yz+yx. This is the type consisting of a 4-tuple, with two holes, and with x left of the first hole, y between the first and second holes, and z to the right of the second hole. In other words, it's a trisection of a container. More generally, f[x0,...,xn] is the type corresponding to n holes and blocks of elements of xi between them. I've only shown this for the type x4 but it works for all the same types in Conor's paper. But there's a catch: I've used a definition involving subtraction, and that makes no sense for types. Don't worry, I'll address that later.
We can give trisections and so on a computational interpretation like in Conor's paper. Dissections correspond to tail recursion elimination. They make explicit the state of a function traversing and transforming a recursive type. (I recommend Conor's description in the paper so I won't reproduce it here.) Trisections correspond to a pair of coroutines. The first is transforming a recursive type. The second is transformed the result of the first routine. The second one can (optionally) start consuming pieces of tree as soon as the first has started producing them. At any moment in time we have 3 pieces: (1) the part of tree that is as yet untouched, (2) the part that the first coroutine has produced, but which the second hasn't seen yet, and (3) the output from the second coroutine. Make that explicit, and you get a trisection. The deal's much the same for quadrisections and so on.
Rediscovering Newton
A consequence of this is that we can now give a computational interpretation to much of the description of divided differences at Wikipedia. Among other things, when Conor derives a type theoretical analogue of the Taylor series, he's actually rediscovering a form of the Newton polynomial.
Type Matrices
One interesting property of divided differences is that they have a description in terms of matrices. In particular, Wikipedia describes a homomorphism from the ring of functions of x to the ring of matrices whose elements are functions of n unknowns. We know that types don't form a ring, but they do form a semiring. So we can form an analogous semiring homomorphism from the set of container types to the set of matrices whose entries are *types*. Matrices of types aren't something we see every day. How can we put this notion to work?
Let's use the notation from Wikipedia and consider the matrices we get when n = 1. (At this point I recommend reading all of the Wikipedia article on divided differences.) Let's define ourselves a tree container:
> data F a = Leaf a | Form (F a) (F a)We'll call the homomorphism T so that
T(f)(x0,x1)= | f(x0) | f[x0,x1] |
0 | f(x1) |
Our tree type satisfies the equation f(x)=x+f(x)2. As we have a homomorphism, we also expect this to hold:
T(f)(x0,x1) = T(i)(x0,x1)+T(f)(x0,x1)2 (Equation 1)where i is the identity function i(x) = x.
We can easily compute T(i)(x0,x1) directly using (y-x)/(y-x)=1. We get
T(i)(x0,x1)= | x0 | 1 |
0 | x1 |
Multiplying out the matrices in equation 1 now gives us a bunch of equations:
f(x0) = x0 + f(x0)2The first two equations are just the definition of our tree. The third line, in Haskell, is:
f(x1) = x1 + f(x1)2
f[x0,x1] = 1 + f(x0)f[x0,x1]+f[x0,x1]f(x1)
> data F' x0 x1 = Empty | ForkL (F x0) (F' x0 x1) | ForkR (F' x0 x1) (F x1)This is essentially the application of Conor's version of the Leibniz law to trees. F' is the dissected tree. And note how we don't have any subtraction. By using the matrix formulation of divided differences we only have matrix multiplications and sums to deal with.
The image of f(x) = x+f(x)2 under the homomorphism T yields the simultaneous definitions of f and its dissection. More generally, if we had chosen to work with larger n we'd get the simultaneous definition of trees, their dissections, their trisections and so on. And it'll work for any recursive container type.
So matrices of types are a meaningful concept. They give a way to organise the mutually recursive definitions of higher order divided differences. If you look at this hard enough you may also see that the matrix I defined above is playing the same role as the D type in my previous article. With a little template Haskell I think we could in fact implement automatic (higher order) divided differences at the type level.
But this is all just scratching the surface. The matrix and homomorphism defined above don't just apply to divided differences. Matrices of types have another deeper and surprising interpretation that will allow me to unify just about everything I've ever said on automatic differentiation, divided differences, and derivatives of types as well as solve a wide class of problems relating to building data types with certain constraints on them. I'll leave that for my next article.
Exercise
In preparation for the next installment, here's a problem to think about: consider the tree type above. We can easily build trees whose elements are of type A or of type B. We just need f(A+B). We can scan this tree from left to right building a list of elements of type A+B, ie. whose types are each either A or B. How can we redefine the tree so that the compiler enforces the constraint that at no point in the list, the types of four elements in a row spell the word BABA? Start with a simpler problem, like enforcing the constraint that AA never appears.
Apology
Sorry about the layout above. My literate Haskell->HTML program doesn't support tables yet so there was a lot of manual HTML. This meant that I didn't write stuff out as fully as I could have and it may be a bit sketchy in places. I may have to switch to PDF for my next post. (Or use Wordpress, but I had trouble there too. or I could use one of the Javascript TeX renderers, but I don't like the external dependency.)
I can't wait to see the next installment.
ReplyDeleteJesus, I never imagined that numerical analysis course I had to take would have applications to Haskell type theory, of all things...
ReplyDelete-illissius
illissius,
ReplyDeleteThere are quite a few places in discrete computational mathematics where connections to numerical methods pop up. For example, all kinds of numerical stuff appears in the theory of automata. But it's in disguise. Jacobians and derivatives, Taylor-like series, approximation with linear terms and so on.
I've attempted to change Connor's code so repeated dissection is possible: http://gist.github.com/514469
ReplyDeleteIt compiles fine, but I'm not sure it is correct.
Mind-blowing stuff! Here's my attempt at the exercise, although I suspect you had something even cooler in mind. I look forward to the next installment!
ReplyDelete