## Saturday, August 14, 2010

### Constraining Types with Regular Expressions

Structures with Constraints
Here's a picture of an element of a binary tree type:

The leaves correspond to elements and the letters indicate the type of those elements. If we read the leaves from left to right they match the regular expression A*B*. We can define a binary type whose leaves always match this string by making use of dissections. Similarly we can construct all kinds of other structures, such as lists and more general trees, whose leaves match this expression.

But can we make structures that match other regular expressions? Here are some examples:

1. Structures that match the expression An for some fixed n. These are structures, like trees, that have a fixed size. For the case n=5 we need a type that includes elements like:

2. Structures that match (AA)*. These are structures like trees with an even number of elements. This is easy for lists. We can just use a list of pairs. But it's more complex for trees because subtrees may have odd or even size even though the overall structure is even in size. We could also generalise to ((AA)n)* for some fixed n.

3. Structures that match A*1A*. Here's an example:

The element marked 1 is of unit type (). It's a 'hole'. So this regular expression is a derivative type. We can also construct a variety of types like those matching A*BA*CB* by using a mixture of dissection and differentiation:

4. A bytestring- or rope-like type used to represent a string that is statically guaranteed to match a given regular expression.

5. Many kinds of other constraints you could imagine on a datastructure. Like trees which are guaranteed not to have two neighbouring leaves of the same type, or whose sequence of leaves never contain a certain subsequence.

So the challenge is this: can we implement a uniform way of building any container type to match any regular expression we want?

We'll need some extensions:

```> {-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
> {-# LANGUAGE TypeFamilies, EmptyDataDecls, UndecidableInstances, TypeOperators #-}

```
First Example: Differentiating Lists
Let l(x) be the type of lists of elements of type x. We can define a list type through

```data List x = Nil | Cons x (List x)
```

Algebraically we can write this as
l(x) = 1+x l(x)
Let's try approaching the derivative of a list from the perspective of regular expressions. We know from Kleene's Theorem that the set of strings (ie. the language) matching a regular expression is precisely the language accepted by a finite state automaton. Let's consider the case x*1x*. This is the language we get by considering all paths from state 0 to state 1 in the following automaton:

Note that I have overloaded the numeral '1' to mean both the type with one element (when labelling an edge) and to mean the state numbered 1 (when labelling a vertex).

Define the Lij(x) to be the type of lists whose sequence of elements correspond to all possible paths from state i to state j. A list in Lij(x) is either an empty list, or constructed from a first element and a list. If we combine an element and a list, the combination has to match a possible path from i to j. There are a number of ways we could do this. The element could correspond to a transition from i to k. But if this is the case, then the remainder of the list must correspond to a path from k to j. So we must replace Cons with something that for its first argument takes a type corresponding to a single automaton step from i to k. For its second argument it must take an element of Lkj(x). The set of paths from i to j that correspond to a single step is the (i,j)-th element of the transition matrix for the automaton. This is Xij(x) where

 X(x) = (Xij(x)) = (x 1) (0 x)

On the other hand, we also need to replace Nil with a version that respects transitions too. As Nil takes no arguments, it must correspond to paths of length zero in the automaton. The only such paths are zero length paths from a state to itself. So the matrix for such paths is:

 I = (1 0) (0 1)

Let's also define the matrix L(x)= (Lij(x)).

The words above boil down to:
Lij(x) = Iijk Xik(x)Lkj(x)
where the sum is over all the places we might visit on the first step of the journey from i to j.

We can rewrite this using standard matrix notation:
L(x) = I + X(x)L(x)
Compare with the definition of ordinary lists given above. We get the type of constrained lists by taking the original definition of a list and replacing everything with matrices. We replace 1 with I. We replace x with the transition matrix of the automaton. And we replace the structure we're trying to define with a family of types - one for each pair of start and end states for the automaton. We can describe this replacement more formally: it's a homomorphism from the set of types to the set of matrices of types. (Actually, it's a bit more subtle than that. This isn't quite the usual semiring of types. For one thing, the order of multiplication matters.) And it doesn't just apply to lists. We can apply this rule to any container type. For example, suppose we wish to repeat the above for trees. Then we know that for ordinary binary trees, t(x), we have
t(x) = x+t(x)2
We replace this with
T(x) = X(x)+T(x)2

We started this section by considering the specific case of the pattern x*1x* with a corresponding matrix X(x). Because X10=0 and X00=X11 it's not hard to see that any type T we constrain using this regular expression will also have similar 'shape', ie. T10=0 and T00=T11. So we can write

 T = (t(x) t'(x)) (0 t(x))

where by definition, t(x)=T00(x) and t'(x)=T01(x). Suppose we have two such collections of types, (Sij) and (Tij). Now consider the types of pairs where the first element is of type Sij and the second of Tjk. Then the leaves of the pair structure correspond to a path from i to k. So we have

 (st(x) (st)'(x)) = (s(x) s'(x)) (t(x) t'(x)) (0 st(x)) (0 s(x)) (0 t(x))

Multiply out and we find that
(st)'(x) = s(x)t'(x)+s'(x)t(x)
In other words, the usual Leibniz rule for differentiation is nothing more than a statement about transitions for the automaton I drew above. To get a transition 0→1 you either go 0→0→1 or 0→1→1.

Although I talked specifically about differentiation, much of what I said above applies for any finite state automaton whose edges are labelled by types. The best thing now is probably to put together some code to see how this all looks.

A Specification
If you haven't checked Brent Yorgey's solution to my problem last week, now is a good opportunity. My code is a generalisation of that but it may be helpful to look at Brent's specialisation first.

The goal is to be able to define a transition matrix like this. ('K' is an abbreviation for 'constant' matrix.)

```> type D a = K22 (a,    ())
>                (Void, a )

```
And then define a functor like this:

```> type ListF = I :+: (X :*: Y)

```
Think of ListF being a bifunctor taking arguments X and Y.

We'd like then to able to form the matrix of fixed points of Y = ListF X Y. In this case, ordinary lists should appear as the element at position (0,0) in the matrix:

```> type List x = Fix (I0, I0) (D x) ListF

```
I'm using In to represent the integer n at the type level.

Derivatives of lists should appear at (0,1) so we want

```> type List' x = Fix (I0, I1) (D x) ListF

```
But Fix is intended to be completely generic. So it needs to be defined in a way that also works for trees:

```> type TreeF = X :+: (Y :*: Y)
> type Tree  = Fix (I0, I0) (D Int) TreeF
> type Tree' = Fix (I0, I1) (D Int) TreeF

```
And of course it needs to work with other transition matrices. For example x*1y*1z* has the following transition diagram and matrix:

```> type E x y z = K33 (x,    (),   Void)
>                    (Void, y,    ()  )
>                    (Void, Void, z   )

```
So we'd expect

```> type DDTree x y z = Fix (I0, I1) (E x y z) TreeF

```
to define the second divided difference of trees.

The Implementation

We'll need a type with no elements:

```> data Void

```
And some type level integers:

```> data Zero
> data S a
> type I0 = Zero
> type I1 = S I0
> type I2 = S I1
> type I3 = S I2

```
Now we'll need some type-level matrices. For any square matrix, we need a type-level function to give its dimension and another to access its (i, j)-th element:

```> type family Dim x :: *
> type family (:!) x ij :: *

```
(Thanks to Brent's tutorial that code is much better than how it used to look.)

We can now define matrix addition through pointwise addition of the elements:

```> data (:+) m n
> type instance Dim (m :+ n) = Dim m
> type instance (m :+ n) :! ij = Either (m :! ij) (n :! ij)

```
And similarly we can define multiplication. I'm using the type-level function Product' to perform the loop required in the definition of matrix multiplication:

```> data (:*) m n
> type instance Dim (m :* n) = Dim m
> type instance (m :* n) :! ij = Product' I0 (Dim m) m n :! ij

> data Product' i k m n
> type instance Product' p I1 m n :! (i, j) = (m :! (i, p), n :! (p, j))
> type instance Product' p (S (S c)) m n :! (i, j) = Either
>    (m :! (i, p), n :! (p, j))
>    (Product' (S p) (S c) m n :! (i, j))
>

```
(Weird seeing all that familiar matrix multiplication code at the type level.)

Now we need some types to represent our functors:

```> data I
> data X
> data Y
> data K n
> data (f :+: g)
> data (f :*: g)
> data F m f

```
(I think phantom empty data types should be called ethereal types.)

To turn these into usable types we need to implement the homomorphism I described above. So here are the rules laid out formally:

```> type Id = K22 ((),   Void)
>               (Void, ())

> type family   Hom self m f :: *
> type instance Hom self m I = Id
> type instance Hom self m X = m
> type instance Hom self m Y = self
> type instance Hom self m (K n) = n
> type instance Hom self m (f :+: g) = Hom self m f :+ Hom self m g
> type instance Hom self m (f :*: g) = Hom self m f :* Hom self m g
> type instance Dim (F m f) = Dim m

> data Fix ij m f = Fix (Hom (F m f) m f :! ij)
> type instance (:!) (F m f) ij = Fix ij m f

```
That's more or less it. We can now go ahead and try to construct some elements. We could (as Brent suggests) write some smart constructors to make our life easier. But for now I'm writing everything explicitly so you can see what's going on:

```> x0 = Fix (Left ())                         :: List Int
> x1 = Fix (Right (Left (1, Fix (Left ())))) :: List Int

```
x0 is the empty list. x1 is the list [1]. The Left and Right get a bit tedious to write. But this is intended as a proof that the concept works rather than a user-friendly API.

We can explicitly implement the isomorphism with the more familiar list type:

```> iso1 :: [x] -> List x
> iso1 []     = Fix (Left ())
> iso1 (a:as) = Fix (Right (Left (a, iso1 as)))

> iso1' :: List x -> [x]
> iso1' (Fix (Left ()))              = []
> iso1' (Fix (Right (Left (a, as)))) = a : iso1' as
> iso1' (Fix (Right (Right a)))      = error "Can't be called as a is void"

```
So that's it! If we can write our container as the fixed point of a polynmomial functor, and if we can convert our regular expression to a finite state automaton, then Fix completely automatically builds the constrained container type.

What have we learnt?

We haven't just solved the original problem. We've shown that derivatives and dissections are special cases of a more general operation. Take a look at the definition of D x again. we can think of it as xI+Delta where Delta is the matrix

```> type Delta = K22 (Void, ()  )
>                  (Void, Void)

```
This matrix has the property that its square is zero. It's the 'infinitesimal type' I described here. In other words, this is type-level automatic differentiation. We've also been doing type-level automatic divided differencing.

We can now go back and look at the matrix form of divided differences on wikipedia. I hope you can now see that the matrix Tid(x0,...,xn-1) defined there is nothing other than a transition matrix for this automaton:

In fact, we can use what we've learnt about regular expressions here to solve some numerical problems. But I won't write about that until the next article.

By the way, I think what I've described here can be viewed as an application of what Backhouse talks about in these slides.

I think that for any automaton we have a 2-category. The 0-cells are states, the 1-cells are the types associated with paths from one state to another, and the 2-cells are functions between types that respect the constraint. I haven't worked out the details however. The 2-category structure is probably important. As things stand, I've just shown how to make the types. But we don't yet have an easy way to write functions that respect these constraints. I suspect 2-categories give a language to talk about these things. But that's just speculation right now.

By the way, I couldn't write a working Show instance for Fix. Can you write one? And an implementation of arbitrary for QuickCheck?

And I hope you can now solve my problem from last week.

Leftover bits of code

K22 and K33 are constructors for 2×2 and 3×3 matrices. It would probably have been better to have used lists like Brent did.

```> data K22 row0 row1
> type instance Dim (K22 row0 row1) = I2
>
> type instance (:!) (K22 (m00, m01) row1) (I0, I0) = m00
> type instance (:!) (K22 (m00, m01) row1) (I0, I1) = m01
> type instance (:!) (K22 row0 (m10, m11)) (I1, I0) = m10
> type instance (:!) (K22 row0 (m10, m11)) (I1, I1) = m11
>
> data K33 row0 row1 row2
> type instance Dim (K33 row0 row1 row2) = I3
>
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I0) = m00
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I1) = m01
> type instance (:!) (K33 (m00, m01, m02) row1 row2) (I0, I2) = m02
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I0) = m10
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I1) = m11
> type instance (:!) (K33 row0 (m10, m11, m12) row2) (I1, I2) = m12
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I0) = m20
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I1) = m21
> type instance (:!) (K33 row0 row1 (m20, m21, m22)) (I2, I2) = m22
```
Update: I neglected to mention that there is a bit of subtlety with the issue of being able to create the same string by different walks through the automaton. I'll leave that as an exercise :-)

## Saturday, August 07, 2010

### Divided Differences and the Tomography of Types

Health Warning
This article assumes knowledge of Conor McBride's work on the differentiation and dissection of types, even if the early parts look deceptively simple.

Tables of Divided Differences
Given a sequence of numbers like 2, 6, 12, 20, 30 we can use the well known method of finite differences to predict the next in the series. We write the numbers in a column. In the next column we write the differences between the numbers in the previous column and iterate like so:

 2 4 6 2 6 0 12 2 8 0 20 2 10 30

On the assumption that the rightmost column is zero all the way down we can extend this table to:

 2 4 6 2 6 0 12 2 8 0 20 2 10 0 30 2 12 42

We can think of this as an exercise in polynomial fitting. We have some x-values, in this case: x0=1, x2=2,..., x4=5, and some y-values: y0=2, y1=6,..., y4=30. We hope to fit a polynomial f so that f(xi)=yi. The table of finite differences has the property that the (n+1)-th column becomes constant if a degree n polynomial fits the data.

But what happens if we want to fit data to x-values that aren't equally spaced? Then we can use the *divided* difference table instead. In this case we don't record the differences between the y-values, but the quotient between the y-differences and the x-differences. For x0,1,2,3 = 1, 2, 4, 6 and y0,1,2,3 = 3, 6, 18, 138 we get:

 1 3 (6-3)/(2-1)=3 2 6 (6-3)/(4-1)=1 (18-6)/(4-2)=6 (1-1)/(6-1)=0 4 18 (10-6)/(6-2)=1 (38-18)/(6-4)=10 6 38

Again we reach zero. This is because yi = f(xi) = xi2+2 so we have a quadratic again.

Let's assume yi = f(xi) for some set of points. We'll leave x, y and f unknown and fill out the table:

 x0 f(x0) (f(x1)-f(x0))/(x1-x0)=f[x0,x1] x1 f(x1) (f[x1,x2]-f[x0,x1])/(x2-x0)=f[x0,x1,x2] (f(x2)-f(x2))/(x2-x1)=f[x1,x2] (f[x1,x2,x3]-f[x0,x1,x2])/(x3-x0)=f[x0,x1,x2,x3] x2 f(x2) (f[x2,x3]-f[x1,x2])/(x3-x1)=f[x1,x2,x3] (f(x3)-f(x2))/(x3-x2)=f[x2,x3] x3 f(x3)

Note that for any f, this table also defines the generalised divided differences f[x,y], f[x,y,z] and so on. (Compare with the notation at Wikipedia).

Divided Differences of Types
Now suppose that f isn't a function of real numbers, but is a container type. Then as I noted here, the second column corresponds to Conor McBride's dissection types. f[x,y] is an f-container with a hole, with everything to the left of the hole containing an x, and everything to the right of the hole containing a y.

So what's the next column? Consider f(x)=x4. Then f[x,y] = (y4-x4)/(y-x) = x3+x2y+xy2+y3. This corresponds to the description I just gave involving a hole. Now consider f[x,y,z] = (f[y,z]-f[x,y])/(z-x) = z2+xz+x2+y2+yz+yx. This is the type consisting of a 4-tuple, with two holes, and with x left of the first hole, y between the first and second holes, and z to the right of the second hole. In other words, it's a trisection of a container. More generally, f[x0,...,xn] is the type corresponding to n holes and blocks of elements of xi between them. I've only shown this for the type x4 but it works for all the same types in Conor's paper. But there's a catch: I've used a definition involving subtraction, and that makes no sense for types. Don't worry, I'll address that later.

We can give trisections and so on a computational interpretation like in Conor's paper. Dissections correspond to tail recursion elimination. They make explicit the state of a function traversing and transforming a recursive type. (I recommend Conor's description in the paper so I won't reproduce it here.) Trisections correspond to a pair of coroutines. The first is transforming a recursive type. The second is transformed the result of the first routine. The second one can (optionally) start consuming pieces of tree as soon as the first has started producing them. At any moment in time we have 3 pieces: (1) the part of tree that is as yet untouched, (2) the part that the first coroutine has produced, but which the second hasn't seen yet, and (3) the output from the second coroutine. Make that explicit, and you get a trisection. The deal's much the same for quadrisections and so on.

Rediscovering Newton
A consequence of this is that we can now give a computational interpretation to much of the description of divided differences at Wikipedia. Among other things, when Conor derives a type theoretical analogue of the Taylor series, he's actually rediscovering a form of the Newton polynomial.

Type Matrices
One interesting property of divided differences is that they have a description in terms of matrices. In particular, Wikipedia describes a homomorphism from the ring of functions of x to the ring of matrices whose elements are functions of n unknowns. We know that types don't form a ring, but they do form a semiring. So we can form an analogous semiring homomorphism from the set of container types to the set of matrices whose entries are *types*. Matrices of types aren't something we see every day. How can we put this notion to work?

Let's use the notation from Wikipedia and consider the matrices we get when n = 1. (At this point I recommend reading all of the Wikipedia article on divided differences.) Let's define ourselves a tree container:

```> data F a = Leaf a | Form (F a) (F a)

```
We'll call the homomorphism T so that

 T(f)(x0,x1)= f(x0) f[x0,x1] 0 f(x1)

Our tree type satisfies the equation f(x)=x+f(x)2. As we have a homomorphism, we also expect this to hold:
T(f)(x0,x1) = T(i)(x0,x1)+T(f)(x0,x1)2 (Equation 1)
where i is the identity function i(x) = x.

We can easily compute T(i)(x0,x1) directly using (y-x)/(y-x)=1. We get

 T(i)(x0,x1)= x0 1 0 x1

Multiplying out the matrices in equation 1 now gives us a bunch of equations:
f(x0) = x0 + f(x0)2
f(x1) = x1 + f(x1)2
f[x0,x1] = 1 + f(x0)f[x0,x1]+f[x0,x1]f(x1)
The first two equations are just the definition of our tree. The third line, in Haskell, is:

```> data F' x0 x1 = Empty | ForkL (F x0) (F' x0 x1) | ForkR (F' x0 x1) (F x1)

```
This is essentially the application of Conor's version of the Leibniz law to trees. F' is the dissected tree. And note how we don't have any subtraction. By using the matrix formulation of divided differences we only have matrix multiplications and sums to deal with.

The image of f(x) = x+f(x)2 under the homomorphism T yields the simultaneous definitions of f and its dissection. More generally, if we had chosen to work with larger n we'd get the simultaneous definition of trees, their dissections, their trisections and so on. And it'll work for any recursive container type.

So matrices of types are a meaningful concept. They give a way to organise the mutually recursive definitions of higher order divided differences. If you look at this hard enough you may also see that the matrix I defined above is playing the same role as the D type in my previous article. With a little template Haskell I think we could in fact implement automatic (higher order) divided differences at the type level.

But this is all just scratching the surface. The matrix and homomorphism defined above don't just apply to divided differences. Matrices of types have another deeper and surprising interpretation that will allow me to unify just about everything I've ever said on automatic differentiation, divided differences, and derivatives of types as well as solve a wide class of problems relating to building data types with certain constraints on them. I'll leave that for my next article.

Exercise
In preparation for the next installment, here's a problem to think about: consider the tree type above. We can easily build trees whose elements are of type A or of type B. We just need f(A+B). We can scan this tree from left to right building a list of elements of type A+B, ie. whose types are each either A or B. How can we redefine the tree so that the compiler enforces the constraint that at no point in the list, the types of four elements in a row spell the word BABA? Start with a simpler problem, like enforcing the constraint that AA never appears.

Apology
Sorry about the layout above. My literate Haskell->HTML program doesn't support tables yet so there was a lot of manual HTML. This meant that I didn't write stuff out as fully as I could have and it may be a bit sketchy in places. I may have to switch to PDF for my next post. (Or use Wordpress, but I had trouble there too. or I could use one of the Javascript TeX renderers, but I don't like the external dependency.)