Tuesday, October 13, 2009

"What Category do Haskell Types and Functions Live In?"

The question in my title is one that is often raised by Haskell programmers and it's a difficult one to answer rigorously and satisfyingly. But you may notice that I've put the question in quotes. This is because I'm not asking the question myself. Instead I want to argue that often there's a better question to ask.

Superficially Haskell looks a lot like category theory. We have types that look like objects and functions that look like arrows. Given two functions we can compose them just how arrows compose in a category. We also have things that look like products, coproducts, other kinds of limit including infinite ones, natural transformations, Kan extensions, monads and quite a bit of 2-categorical structure.

So what goes wrong? (Besides the obvious problem that on a real computer, composing two working functions might result in a non-working function because you run out of memory.)

Among other things, Haskell functions can fail to terminate because of things like infinite loops. Computer scientists often use the notation ⊥ to represent a non-terminating computation. So when we talk of the Haskell integers, say, we don't just mean the values 0, 1, 2, ... but we also have to include ⊥. Unfortunately, when we do this we break a few things. For one thing we no longer have coproducts. But people find it useful to talk about algebraic datatypes as constructing types using products and coproducts and that would be a hard thing to give up.

So we could restrict ourselves to considering only the category theory of computable functions. But that's not a trivial thing to do either, and it doesn't reflect what real Haskell programs do.

But even if we did manage to tweak this and that to get a bona fide category out of Haskell, all we'd get is a custom tailored category that serves just one purpose. One theme running through much of my blog is that Haskell can be used to gain an understanding of a nice chunk of elementary category theory in general. Showing that Haskell simply gives us one example of a category really isn't that interesting. When I talked about the Yoneda Lemma I felt like I was talking about more than just one property of some obscure category that I can't actually define and that most category theorists have never even heard of.

So what's going on? Why does it feel like Haskell is so naturally category theoretical while the details are so messy?

Going back to my Yoneda lemma code, consider my definition of `check`
`> check a f = fmap f a`

It's straightforward to translate this into standard category theoretical notation that applies to any category. Even though the code is implemented in a specific programming language there's nothing about it that prevents it being translated for use in any category. So it doesn't matter what category Haskell corresponds to. What matters is that this bit of code is written in language suitable for any category. And the proof I give can be similarly translated.

Consider this standard problem given to category theory students: prove that (A×B)×C is isomorphic to A×(B×C). In Haskell we could construct the isomorphism as:
`> iso :: ((a,b),c) -> (a,(b,c))> iso ((x,y),z) = (x,(y,z))`

But now we hit a problem. We can straightforwardly translate this into mathematical notation and it will give a valid isomorphism in the category of sets, Set. But `iso` is written to accept arguments which are elements of some type. Not all objects in categories have elements, and arrows might not correspond to functions. And even if they did, if we were working with (certain types of) topological spaces we'd be giving a construction for the isomorphism, and our proof would show the underlying function had an inverse, but we'd be failing to show it's continuous. It looks like writing Haskell code like this only tells us about a particularly limited type of category.

But not so. Type `cabal install pointfree` to install pointfree and then run `pointfree 'iso ((x,y),z) = (x,(y,z))'` and it responds with
`> iso = uncurry (uncurry ((. (,)) . (.) . (,)))`

pointfree rewrites a function in point-free style. There are no x's, y's or z's in the written version, only `uncurry`, composition `(.)`, and the product function `(,)`. These exist in all Cartesian closed categories (CCC). So our original function definition, despite apparently referring to elements, can be mechanically turned into a definition valid for any CCC. We can now reinterpret the meaning of x, y and z in the original definition as not referring to elements at all, but as labels indicating how a bunch of fairly general categorically defined primitives are to be assembled together.

(Incidentally, my first foray into pure functional programming was to write a SASL compiler. It was little more than a bunch of rewrite rules to convert SASL code into point-free compositions of S, K and I, among other combinators.)

What we have here is an example of an internal language at work. I'm not sure what a precise definition of "internal language" is, but it's something like this: take a formal system and find a way to translate it to talk about categories in such a way that true propositions in one are turned into true propositions in the other. The formal system now becomes an internal language for those categories.

The best known example is topos theory. A topos is a category that has a bunch of properties that make it a bit like Set. We take a subset of the language of set theory that makes use of just these properties. Our propositions that look like set theory can now be mechanically translated into statements valid of all toposes. This means we can happily write lots of arguments referring to elements of objects in a topos and get correct results.

In their book Introduction to Higher-Order Categorical Logic, Lambek and Scott showed that "pure typed λ-calculus" is the internal language of CCCs. Even though expressions in the λ-calculus contain named variables these can always be eliminated and replaced by point-free forms. Theorems about typed λ-calculus are actually theorems about CCCs. When we write Haskell code with 'points' in it, we don't need to interpret these literally.

So despite not knowing which category Haskell lives in, much of the code I've written in these web pages talks about a wide variety of categories because Haskell is essentially a programming language based on an internal language (or a bunch of them). Despite the fact that even a function like `iso` might have quite complex semantics when run on a real computer, the uninterpreted programs themselves often represent completely rigorous, and quite general pieces of category theory.

So the question to ask isn't "what category does Haskell live in?" but "what class of category corresponds to the internal language in which I wrote this bit of code?". I partly answer this question for do-notation (a little internal language of its own) in an earlier post. Haskell (and various subsets and extensions) is essentially a way to give semantics to internal languages for various classes of category. However complicated and messy those semantics might get on a real world computer, the language itself is a thing of beauty and more general than might appear at first.

BTW This trick of reinterpreting what look like variables as something else was used by Roger Penrose in his abstract index notation. Just as we can sanitise variables by reinterpreting them as specification for plumbing in some category, Penrose reinterpreted what were originally indices into arrays of numbers as plumbing in another category. Actually, this isn't just an analogy. With a little reformatting abstract index notation is very close to the way I've been using monads to work with vector spaces so that abstract index notation can be viewed as a special case of an internal language for categories with monads.

Jason Dusek said...

Thank you for this post. I've often wondered whether Haskell fits easily into a category and what the value of category theory is to Haskell if it doesn't.

I believe there is a missing quote mark here:

In their book Introduction to Higher-Order Categorical Logic, Lambek and Scott showed that "pure typed λ-calculus is the internal language of CCCs. Even though expressions in the λ-calculus contain named variables these can always be eliminated and replaced by point-free forms. Theorems about typed λ-calculus are actually theorems about CCCs.

Saturday, 17 October, 2009
Saizan said...

So the idea is that in this context we should think of haskell as something even more general than what the conventional denotational semantics would suggest.

It shouldn't be a so weird notion to haskell programmers, when we write typeclass polymorphic context we already have to avoid thinking only in terms of specific instances if we want to understand the code in its full scope.

This is taking it a step further, and realizing that things like seq aren't part of what we should overload.

am i getting the point of this post?

Saturday, 17 October, 2009
sigfpe said...

Saizan,

Yes, just like type classes, but they're sort of implicit depending on which features of Haskell you use.

Saturday, 17 October, 2009
Marc Hamann said...

Having thought about the problem for quite a long time, I'm inclined to go a step further and claim that there can be no category that is exactly equivalent to a Turing-complete language.

The intuitive reason is that any category worthy of the name is defined by its arrows: those it has by definition or those that it can be inferred to have.

Since there is no a priori way to determine what functions are computable, we can't have a satisfactory definition for the arrows in a category of all (and only) computable functions.

Why do functions in Haskell still show category-theoretic properties so often?

In practice, we tend to write programs that are:

- provably terminating
- provably non-terminating under-known circumstances
- provably "correct" if we were allowed to non-terminate in a well-behaved way.

In general, such functions do tend to be arrows in some well-defined category that is either an embedded sub-space of the computable functions (e.g. typed λ-calculus) or a conservative super-space of them (e.g. Set).

Does this match your thinking in suggesting a "better question"?

Saturday, 17 October, 2009
sigfpe said...

Marc,

Even if we can't determine what programs terminate, computability is a well-defined mathematical concept. In fact, there are perfectly good categories for representing functions that might not terminate (eg. the category of CPOs). But these categories don't have the properties that people take for granted - like that the type (a,b) the product of a and b.

Saturday, 17 October, 2009
Existential Type said...

There's nothing special about Haskell as an internal language for a class of categories; one could just as well use any other decently functional language for the same purpose.

Saturday, 17 October, 2009
sigfpe said...

Existential Type,

Yes, any strongly typed pure functional language would do. But one advantage Haskell has over strict languages is that because of laziness, many more mathematical expressions give rise to terminating programs.

Saturday, 17 October, 2009
Marc Hamann said...

True, CPO can provide semantic models for the execution of programs.

In the context of your post, where you are referring to the category of the type system for a programming language, the implication is we are talking about properties of syntactic representations of programs.
It is this latter sort of category I'm suggesting doesn't exist.

The intuition I'm promoting is that the very reason that categories which include non-termination lose the nice "language" properties, such as coproducts, is because of the loss of information that is introduced by non-termination.

The only option to regain these nice "language" properties in the presence of termination is to over-extend the model to include obviously excessive arrows as well.

Saturday, 17 October, 2009
Existential Type said...

How so? Haskell is readily transliterated into ML (but not conversely). Every expression in Haskell literally _is_ an expression in ML (but not conversely).

Saturday, 17 October, 2009
Neel Krishnaswami said...

No, this isn't correct. In the presence of recursion, laziness means you lose the eta-rule for coproducts, and strictness means you lose the eta-rule for products and function types.

Ie, in a strict language with general recursion, the equation (f e == case e of Inl x -> f (Inl x) | Inr y -> f (Inr y)) holds. This is because e will always be evaluated, and so this eta-expansion will not change the termination behavior of the program. In a lazy language, e might not be evaluated, and the eta-expansion can cause it to be run.

There's a dual property for products; it should be the case that e == (fst e, snd e) in a lazy language. Unfortunately, this doesn't hold in Haskell, because of seq -- Haskell tuples are not categorical products.

It's easiest to see this with the unit type; it ought to be the case that () == bot at unit type -- but seq lets you distinguish those two cases.

Sunday, 18 October, 2009
yitz said...

But the pointfree version of "iso" is not the same as the pointed version - it is less strict. So I still don't see how your "mechanical" translation is any different than the usual tweaks - either ignore strictness, or use strict function composition.

Sunday, 18 October, 2009
sigfpe said...

Existential Type,

You're right. Any time you have a mechanism for translating from (a fragment of) one formal system to another, in a way that carries over proofs and intuition, you have a useful tool. It doesn't matter whether the source is Haskell, ML, ZFC or anything else.

I think this is an interestingly non-trivial thing in practice. In the case of typed lambda calculus you make repeated use of the deduction lemma to turn lambda expressions into the language of CCC and it's not obvious at first that this can be done.

Sunday, 18 October, 2009
sigfpe said...

yitz,

Not sure what you mean by "usual tweaks". My main goal here is to justify using Haskell to illustrate ideas from category theory. I have done lots of this, but usually it takes the form "here's what happens when we specialise this theorem to Hask (whatever that means) and by analogy you can see how it might work for more general categories". Now I can say "this purely Haskell based argument can be mechanically converted into a genuine proof for some family of categories".

Sunday, 18 October, 2009
miles said...

Good stuff!

Your comments on "plumbing" and index notation reminded me of the results on TQFT and Frobenius algebras. I don't know how (or if) Penrose's index notation is related, but I've seen Einstein summation notation turned into something that looks very much like a plumbing diagram - or, if you prefer, a game of PipeMania :-) See Joachim Kock's book on the subject if you're not familiar with it.

Wednesday, 21 October, 2009
Dan Doel said...

seq (the primitive) isn't necessary to keep Haskell tuples from being products. Pattern matching is enough. Take the eta rule for pairs, for instance:

p = (fst p, snd p)

This is invalidated by the function:

f :: (Int, Int) -> Int
f (_,_) = 1

for p = _|_. In fact, you can implement seq for any algebraic type using case analysis. Merely making tuples strict doesn't work, either, because we're supposed to have commutativity rules like:

f : A -> B
g : A -> C

fst . <f,g> = f
snd . <f,g> = g

but if we have element-strict tuples, then:

f _ = _|_
g _ = 1

snd . <f,g> = \_ -> _|_

so, an actual categorical product would seem to be an unlifted tuple, where

(f,g) = _|_ iff f = _|_ & g = _|_

seq does break the eta rule for functions, which normally wouldn't be possible, though, which means that haskell-types-and-functions technically don't even meet the definition of a category, because:

id . _|_ = \x -> id (_|_ x)
_|_ . id = \x -> _|_ (id x)

both of which seq recognizes as non-bottom, when according to the definition of a category, they should be bottom (and they are, extensionally, but seq doesn't respect expected extensional equality).

Saturday, 24 October, 2009
sigfpe said...

Dan,

Good example of Haskell functions and types not even forming a category. I'm liking the internal language view more and more as a way of making sense of CT talk about Haskell programs.

Saturday, 24 October, 2009
Dan Doel said...

Yeah, mentioning Haskell as an internal language of various sorts of categories is good if only because it's a fairly interesting fact that such languages, which look rather concrete, in that we're talking about values that inhabit types, which is a no-no in category theory, can actually be made sense of in categories that aren't so concrete.

However, I don't want to oversell my last point. It's true that Haskell's semantics ruin it categorically, but that probably doesn't matter. We typically don't intend to write programs where functions end up as _|_; that'd be a bug that'd need to be fixed (the usual time it shows up intentionally, instead of as a semantic necessity, is during type hackery). So it's probably the case that:

id . f = f = f . id

for all fs that we actually care about. And similarly for eta rules for data.

And this point is made more rigorous in the (relatively) famous paper Fast and Loose Reasoning is Morally Correct. In fact, at the end, they even give a category with which one can reason about the portion of Haskell we care about behaving categorically.

Saturday, 24 October, 2009
Neel Krishnaswami said...

Extending Dan's comments: there are actually two sorts of pair, which both Haskell and ML (unfortunately) conflate. There's the strict pair, whose elimination is by pattern matching, and the lazy pair, whose elimination is by projection.

These two notions of pair coincide for total functional languages, but they are naturally different in linear logic -- where they show up as the tensor and with connectives, with entirely different linearity behavior. The distinction becomes visible again in impure functional languages (such as Haskell and ML) where nontermination lets you distinguish them. The strict pair gives rises to symmetric monoidal structure, and the lazy pair gives you Cartesian product structure. (Both of these are closed, so there's a strict and lazy function space, as well.)

It's this ambiguity that Dan is exploiting when he observes that seq can be defined using pattern matching for any algebraic datatype -- you can inductively follow the structure to figure out where to use the "wrong" elim-form. Seq is even worse than that, though, because it lets you distinguish between \x. bot and bot.

Tuesday, 27 October, 2009
Cory said...

Neel, Your point about the two types of pairs seems interesting, but I don't quite understand it. Would you be so kind as to explain it at a slightly lower level?

Wednesday, 28 October, 2009
Dan Doel said...

Hopefully Neel won't mind if I answer some.

Linear logic tracks resource usage explicitly (aside from the modalities that let you express ordinary logical statements in it). If you think about linear logic through the lens of type systems, then something with type:

A -o B

is a function that uses its argument A exactly once to produce a B, -o being the linear implication connective. So:

\_ -> b

is not acceptable, nor is:

\a -> f a a

But, what about something with type:

A * B -> C

There are two ways you can think about eliminating products:

\(a,b) -> ... a ... b ...
\p -> ... fst p ...
\p -> ... snd p ...

In Haskell, those two are mixed, because you can do:

\p -> ... fst p ... snd p ...
\(a,b) -> ... a ...
\(a,b) -> ... b ...

But those are not acceptable in linear logic, because the first uses p twice, and the others leave one thing unused. So, linear logic has to types of conjunction/product to distinguish these two cases:

f, g : A & B -o C
f p = ... fst p ...
g p = ... snd p ...

h : A x B -o C
h (a,b) = ... a ... b ...

A & B can be used to prove/obtain either an A or a B, but not both, while A x B provides both A and B, and you must use both. There's also A + B which is similar to the sum types in Haskell (classical linear logic also has "multiplicative disjunction" ⅋, but I'm not familiar enough with the classical stuff to tell you about it).

Anyhow, back in total type theory, we're allowed to use values as many times as we wish, so the only difference between matching and using fst/snd would be the order in which things are evaluated (if that), and in a total language, normal forms are reached regardless of evaluation order.

However, in a language like Haskell, you can again distinguish the two usages, because evaluation order can make a difference as to whether or not things terminate. Using pairs as if they were A & B is noticeably different from using them as if they were A x B. If you were dedicated to keeping them separate (and not writing fst for A x B even though you can in Haskell, etc.), the types could give you (vague, probably) information about the strictness/evaluation order of your program.

Or, of course, you could go whole hog, and build a type system based on linear logic, and program in that (which has been done; one reason people are interested in such type systems is that they track precisely when you're allowed to deallocate things, so you can be, in theory, more eager in that regard than a garbage collector; closer to manual memory management, if that's what you're after).

Thursday, 29 October, 2009
Cory said...

Thanks, Dan, that makes perfect sense. Linear logic is one of those things that's right on the edge of the horizon of things I need to learn.

Thursday, 05 November, 2009
sigfpe said...

Cory,

(A suitable fragment of) linear logic is another nice example of an internal language. This time for closed symmetric monoidal categories. And again it gives rise to a fragment of a programming language.

Thursday, 05 November, 2009