A Neighborhood of Infinity: June 2008

Saturday, June 28, 2008

A blessed man's formula for holey containers

I love the way derivatives of types tell you about holes in containers. It works the other way too and holes can give insight into derivatives.

Suppose S and T are containers so that S(X) and T(X) are containers of elements of type X. Then S(T(X)) is an S-container of T-containers of X's. If we draw S as a square (rectangle actually) and T as a triangle then we can draw a picture of an example of such a thing:

Taking the nth derivative of a type gives the type with (an ordered sequence of) n holes. Here's that previous container with 3 holes made in it:

As the holes have an ordering I've numbered them from 1 to 3.

An S-container of T-containers with holes is essentially an S-container containing both ordinary T-containers, and some T-containers with holes. If we excise the T-containers with holes, we're left with an S-container containing just T-containers, and some holes. Here's a picture of that:

But there are a couple of problems with that. Once we've excised the holey sub T-containers we don't know which S-holes to plug them back into and we don't know how to reconstruct the original numbering of the holes. We need to keep a tiny bit more information. That's the set I wrote down: {{1,3},{2}}. Call each element of this set a block. I've written the elements of the blocks in ascending order and I've written the blocks in ascending order of their lowest elements. The first block corresponds to hole 1 in the S-container and the second block corresponds to hole 2. Similarly, we write the elements of the blocks into the holes in the T-containers. And that allows us to reconstruct the original ST-container.

Think about the general case. We make n holes in an ST-container. So up to n of the T-containers acquire holes. Let's say it's m holes. We can think of the S-container having m holes and these holes being filled by T-containers with b₁, b₂,...,b_m holes where the sum of the b_i is n. We're essentially just partitioning the original n holes {1,2,...,n} into m sets. So each ST-container with n holes gives a partition of {1,...,n}, an S-container with n holes, and m T-containers where the ith has b_i holes. Writing containers with n holes as nth derivatives we get

dⁿ(F(G(X))/dXⁿ = sum over each partition P of {1,...,n} of d^|P|F/dG^|P| times the product over each block B in P of d^|B|F/dX^|B|

Note that the equality above isn't just a numerical equality, it's an isomorphism of types. In fact, it's the type version of the 'combinatorial form' of the Faà di Bruno formula. Although that wikipedia page describes the formula as 'forbidding', if I've done my job right then I think the above picture make it seem almost trivial. I find it much easier to think of this version of the chain rule in terms of holes.

And the explanation for the title of this post: in 1988 Pope John Paul II beatified Faà di Bruno.

Saturday, June 14, 2008

Categories of polynomials and comonadic plumbing

Suppose you have a Haskell program and you want to introduce a new global constant into your program. There are at least two approaches you could take:

Simply introduce a new global constant. You could name it x and write something like x=1.23456 and refer to x throughout your code. This has the advantage of being easy to implement.
Write all of your code in monadic style and make use of the reader monad. This is intrusive in the sense that you may have to make many changes to your code to support it. But it has the advantage that all of your functions now explicitly become functions of your global constant.

Now I’m going to roughly sketch a more categorical view of both of these approaches. So let’s restrict ourselves to the subset of Haskell that corresponds to typed lambda calculus without general recursion so that we know all of our functions will be total and correspond to the mathematical notion of a function. Then all of our functions become arrows in the category that we’ll call Hask.

Firstly consider approach (1). Suppose we want to introduce a new constant, x, of type A. Category theory talks about arrows rather than elements of objects, so instead of introducing x of type A, introduce the function x:1->A where 1 is the terminal object in Hask, normally called (). An element of A is the same thing as an element of 1->A, but in the latter case we have an arrow in the category Hask.

Before continuing, let me digress to talk about polynomials. Suppose we have a ring (with an identity) R. We define R[x], where x is an indeterminate, to be the ring of polynomials in x. Another way to describe that is to say that R[x] is the smallest ring containing R and an indeterminate x, that makes no assumptions about x other than those required to make R[x] a ring. For example we know that (1+x)(1-x)=1-x², because that must hold in any ring. Given a polynomial p in R[x] we can think of it as a function f_p from R to R. f_p(a) is the value we get when substituting the value of a for x in p. So a polynomial in R[x] is the same as a function from R to R that can be written in terms of elements of R, multiplication and addition.

We can do the same with category theory. Given a category A we can ask for the smallest category extending A and containing an indeterminate arrow x:1 -> A. Just as with polynomials we have to allow all possible arrows that can be made by composing arrows of A with x. The resulting expressions for arrows will contain x as a free variable, just like the way x appears in polynomials. In fact, by analogy we can call the resulting category, A[x], the category of polynomials in x:1->A. In the special case A=Hask, you can see that Hask[x] is the category of Haskell functions extended by a new constant of type x:1->A but assuming no equations other than those necessary to make Hask[x] a category. Just as an arrow in Hask is a Haskell function, an arrow in Hask[x] is a Haskell function making use of an as yet undefined constant x.

(I've glossed over some subtleties. Just as we need a suitable equivalence relation to ensure that (1+x)(1-x)=1-x² in R[x], we need suitable equivalence relations in our category. I'll be showing you where to find the missing details later.)

Here's the implementation of a function, h, making use of a constant x:

(Note that I'll be using Edward Kmett's category-extras shortly so I need some imports)



> import Control.Monad.Reader
> import Control.Comonad
> import Control.Comonad.Reader

> x = 1.23456

> f a = 2*a+x
> g a = x*a
> h a = f (g a)

> test1 = h 2

Now consider the second approach. The easiest thing is to just give an implementation of the above using the reader monad:



> f' a = do
>    x <- ask
>    return $ 2*a+x

> g' a = do
>    x <- ask
>    return $ x*a

> h' a = return a >>= g' >>= f'

> test2 = runReader (h' 2) 1.23456

Note how, as is typical in monadic code, I have to plumb f' and g' together using >>= so that 1.23456 is passed through f' and g'. Previously I've described another way to think about the composition of monadic functions. Using >>= we can compose functions of type a->m b and b->m c to make a function of type a->m c. The result is that given a monad we can form the Kleisli category of the monad. The objects are the same as in Hask, but an arrow from a->b in the Kleisli category is an arrow of type a->m b in Hask. It's not hard to show this satisfies all of the axioms of a category. When we program in the reader monad it's a bit like we've stopped using Hask and switched to the Kleisli category of the reader monad. It's not quite like that because we used functions like +. But in theory we could use lifted versions of those functions too, and then we'd be programming by composing things in the Kleisli category. If we call the reader monad R then we can call the corresponding Kleisli category Hask^R. (Strictly speaking that R needs a subscript telling is the type of the value we intend to ask for.)

So here's the important point: Hask[x] is the same category as Hask^R. In both cases the arrows are things, which when supplied a value of the right type (like 1.23456), give arrows in Hask from their head object to their tail object.

But there's another way to do this. We can use the reader comonad:



> f'' a = 2*extract a+askC a
> g'' a = extract a*askC a
> h'' a = a =>> g'' =>> f''

> test3 = runCoreader (h'' (Coreader 1.23456 2))

In a similar way, we're dealing with arrows of the form wa -> b and we can compose them using =>>. These arrows form the coKleisli category of the reader comonad, S, which we can write Hask_S. So we must have

Hask[x]≅Hask^R≅Hask_S.

Now some back story. Over 20 years ago I was intrigued by the idea that logic might form a category with logical ‘and’ and ‘or’ forming a product and coproduct. I came across the book Introduction to Higher Order Categorical Logic by Lambek and Scott for ₤30.00. That’s ₤60.00 at today's prices, or about $120.00. On a student grant? What was I thinking? And as it bore no relation to anything I was studying at the time, I barely understood a word of it. I was probably fairly applied at that point doing courses in stuff like solid state physics and electromagnetism as well as a bit of topology and algebra. I doubt I'd heard of lambda calculus though I could program in BASIC and APL. So there it sat on my bookshelf for 22 years. Periodically I’d look at it, realise that I still didn’t understand enough of the prerequisites, and put it back on the shelf. And then a month or so ago I picked it up again and realised that the first third or so of it could be interpreted as being about almost trivial Haskell programs. For example, on page 62 was

Proposition 7.1
The category A[x] of all polynomials in the indeterminate x:1->A over the cartesian or cartesian closed category A is isomorphic to the Kleisli category A_A=A_{S_A} of the cotriple (S_A,&epsilon_A,δ_A).

The language is a little different. Lambek and Scott used the term cotriple instead of comonad and Kleisli category where I’d say coKleisli category. δ and ε are cojoin and coreturn. And Lambek and Scott's theorem applies to any cartesian closed category. But after staring at this claim for a while it dawned on me that all it was really saying was this: here are two ways to introduce new constants into a category. But there’s no way I would have seen that without having practical experience of programming with monads. Learning Haskell has finally paid off. It’s given me enough intuition about category theory for me to get some return on my ₤30.00 investment paid to Heffers all those years ago. I expected to take this book to my deathbed, never having read it.

Anyway, for the details I left out above, especially the correct equivalence relation on Hask[x], you'll just have to read the book yourself.

Also, note the similarity to the deduction theorem. This theorem says that if we can prove B, assuming A, then we can deduce A implies B without making any assumptions. It unifies two way to introduce a proposition A, either as a hypothesis, or as an antecedent in an implication. In fact, the above theorem is just a categorical version of the deduction theorem.

Also note the connection with writing pointfree code. In fact, the pointfree lambdabot plugin makes use good use of the reader monad to eliminate named parameters from functions.

I’m amazed by seeing a book from 1986 that describes how to use a comonad to plumb a value through some code. As far as I know, this predates the explicit use of the reader monad in a program, Wadler and Moggi’s papers on monads, and certainly Haskell. Of course monads and comonads existed in category theory well before this date, but not, as far as I know, for plumbing computer programs. I’d love to hear from anyone who knows more about the history these ideas.

Friday, June 06, 2008

Drugs, Kate Moss, and Intuitionistic Logic

Before make the post I promised I'd make I thought I'd make a digression to point out a connection between Intuitionistic Logic and a recent news story.

Intuitionistic logic is what we get when we take ordinary everyday classical logic and drop the Law of the Excluded Middle (LEM), in other words, we drop the law that says that for any proposition, either it or its negation is true. This is such an ingrained notion that it's hard to imagine giving it up. For example, it seems obvious that either it's raining or it's not raining. But there are good reasons for not taking it for granted in mathematics. The main issue is that sometimes when we use LEM we find that we can prove the existence of something, but have no way of constructing it. Mathematics is full of such proofs. The best known example is probably the proof that an irrational number raised to the power of an irrational number can be rational. Some mathematicians (and many computer scientists) only like constructive proofs, ie. proofs that actually exhibit the thing whose existence is being proved. Using intuitionistic logic is a good way to always force this to be true. So, for example, in classical logic, you might find you have proved "A or B", but not have a proof of A or a proof of B. But in intuitionistic logic, if you prove "A or B", then you must have proved A or you must have proved B.

Now (British) criminal law is more lax than mathematical logic when it comes to proof. You only need to prove something beyond reasonable doubt, rather than providing a rigorous derivation from axioms. (Though admittedly mathematicians rarely do this in practice.) But criminal law does have one place where it has higher standards of proof than classical logic: you can't necessarily convict someone of "A or B" unless you have a proof of A or a proof of B. A proof of "A or B" will not do.

It seems that there is enough evidence to show that Kate Moss recently used a controlled substance of class A or class B. Unfortunately, the law requires either a proof that she had been using a class A drug, or a proof that she had been using a class B drug. A proof that she had been using one or the other will not do. And hence Kate Moss cannot be prosecuted. In this regard, the law is Intuitionistic. If you're going to have a legal system that doesn't recognise LEM then you really need to carve up crime-space as a semilattice so you can charge people with the join of two crimes :-)