Superficially Haskell looks a lot like category theory. We have types that look like objects and functions that look like arrows. Given two functions we can compose them just how arrows compose in a category. We also have things that look like products, coproducts, other kinds of limit including infinite ones, natural transformations, Kan extensions, monads and quite a bit of 2-categorical structure.
So what goes wrong? (Besides the obvious problem that on a real computer, composing two working functions might result in a non-working function because you run out of memory.)
Among other things, Haskell functions can fail to terminate because of things like infinite loops. Computer scientists often use the notation ⊥ to represent a non-terminating computation. So when we talk of the Haskell integers, say, we don't just mean the values 0, 1, 2, ... but we also have to include ⊥. Unfortunately, when we do this we break a few things. For one thing we no longer have coproducts. But people find it useful to talk about algebraic datatypes as constructing types using products and coproducts and that would be a hard thing to give up.
So we could restrict ourselves to considering only the category theory of computable functions. But that's not a trivial thing to do either, and it doesn't reflect what real Haskell programs do.
But even if we did manage to tweak this and that to get a bona fide category out of Haskell, all we'd get is a custom tailored category that serves just one purpose. One theme running through much of my blog is that Haskell can be used to gain an understanding of a nice chunk of elementary category theory in general. Showing that Haskell simply gives us one example of a category really isn't that interesting. When I talked about the Yoneda Lemma I felt like I was talking about more than just one property of some obscure category that I can't actually define and that most category theorists have never even heard of.
So what's going on? Why does it feel like Haskell is so naturally category theoretical while the details are so messy?
Going back to my Yoneda lemma code, consider my definition of
> check a f = fmap f a
It's straightforward to translate this into standard category theoretical notation that applies to any category. Even though the code is implemented in a specific programming language there's nothing about it that prevents it being translated for use in any category. So it doesn't matter what category Haskell corresponds to. What matters is that this bit of code is written in language suitable for any category. And the proof I give can be similarly translated.
Consider this standard problem given to category theory students: prove that (A×B)×C is isomorphic to A×(B×C). In Haskell we could construct the isomorphism as:
> iso :: ((a,b),c) -> (a,(b,c))
> iso ((x,y),z) = (x,(y,z))
But now we hit a problem. We can straightforwardly translate this into mathematical notation and it will give a valid isomorphism in the category of sets, Set. But
isois written to accept arguments which are elements of some type. Not all objects in categories have elements, and arrows might not correspond to functions. And even if they did, if we were working with (certain types of) topological spaces we'd be giving a construction for the isomorphism, and our proof would show the underlying function had an inverse, but we'd be failing to show it's continuous. It looks like writing Haskell code like this only tells us about a particularly limited type of category.
But not so. Type
cabal install pointfreeto install pointfree and then run
pointfree 'iso ((x,y),z) = (x,(y,z))'and it responds with
> iso = uncurry (uncurry ((. (,)) . (.) . (,)))
pointfree rewrites a function in point-free style. There are no x's, y's or z's in the written version, only
(.), and the product function
(,). These exist in all Cartesian closed categories (CCC). So our original function definition, despite apparently referring to elements, can be mechanically turned into a definition valid for any CCC. We can now reinterpret the meaning of x, y and z in the original definition as not referring to elements at all, but as labels indicating how a bunch of fairly general categorically defined primitives are to be assembled together.
(Incidentally, my first foray into pure functional programming was to write a SASL compiler. It was little more than a bunch of rewrite rules to convert SASL code into point-free compositions of S, K and I, among other combinators.)
What we have here is an example of an internal language at work. I'm not sure what a precise definition of "internal language" is, but it's something like this: take a formal system and find a way to translate it to talk about categories in such a way that true propositions in one are turned into true propositions in the other. The formal system now becomes an internal language for those categories.
The best known example is topos theory. A topos is a category that has a bunch of properties that make it a bit like Set. We take a subset of the language of set theory that makes use of just these properties. Our propositions that look like set theory can now be mechanically translated into statements valid of all toposes. This means we can happily write lots of arguments referring to elements of objects in a topos and get correct results.
In their book Introduction to Higher-Order Categorical Logic, Lambek and Scott showed that "pure typed λ-calculus" is the internal language of CCCs. Even though expressions in the λ-calculus contain named variables these can always be eliminated and replaced by point-free forms. Theorems about typed λ-calculus are actually theorems about CCCs. When we write Haskell code with 'points' in it, we don't need to interpret these literally.
So despite not knowing which category Haskell lives in, much of the code I've written in these web pages talks about a wide variety of categories because Haskell is essentially a programming language based on an internal language (or a bunch of them). Despite the fact that even a function like
isomight have quite complex semantics when run on a real computer, the uninterpreted programs themselves often represent completely rigorous, and quite general pieces of category theory.
So the question to ask isn't "what category does Haskell live in?" but "what class of category corresponds to the internal language in which I wrote this bit of code?". I partly answer this question for do-notation (a little internal language of its own) in an earlier post. Haskell (and various subsets and extensions) is essentially a way to give semantics to internal languages for various classes of category. However complicated and messy those semantics might get on a real world computer, the language itself is a thing of beauty and more general than might appear at first.
BTW This trick of reinterpreting what look like variables as something else was used by Roger Penrose in his abstract index notation. Just as we can sanitise variables by reinterpreting them as specification for plumbing in some category, Penrose reinterpreted what were originally indices into arrays of numbers as plumbing in another category. Actually, this isn't just an analogy. With a little reformatting abstract index notation is very close to the way I've been using monads to work with vector spaces so that abstract index notation can be viewed as a special case of an internal language for categories with monads.