Monads from Algebra and the the Gray Code from Groups
There is a close association between algebraic structures and monads. I've mentioned this in passing before but I think it's interesting to work out some details. Among other things, it gives us a way of converting your favourite algebraic structures into domain specific languages embedded in Haskell.
First an aside. Haskell monads are in some sense only an approximation to mathematical monads. In fact, Haskell is only an approximation to mathematics. It's easy to define a Haskell function f, say, such that x == y but f x /= f y. Once you do such a thing you can no longer reason about Haskell functions with the assumption that == represents equality in the mathematical sense. (For an example, see Saizan's comment here.) So in the following I'm going to assume that we've limited ourselves to functions such that x == y implies that f x == f y, x < y and y < z imply x < z and so on. I'd love to see some way, in Haskell, to make explicit 'promises' about the properties of functions. (I guess that's what they call programming by contract.) But for the following we'll just assume it holds.
The first abstract algebraic structure that people study is usually the group, but I'm going to start with something simpler, the monoid. A monoid is a set M (called the underlying set of the monoid) with a binary operator, ·:M×M→M and identity element e such that e·x = x·e = x and x ·(y·z) = (x·y)·z for all x,y,z in M. By abuse of notation we'll often use M for the name of the monoid as well as the underlying set. Monoids are easy to come by and there are lots of examples. Any time you have a sequence of operations that can be applied, or not, in any order, to some system, you have a monoid. Sequences of operations form the underlying set of the monad and the binary operator means "do what's on the right" followed by "do what's on the left". For example the set of uniform scaling operations we can perform on a circle form a monoid. Write a scaling by a factor of x as s(x). If we double the size of a circle, and then triple it we have s(2) followed by s(3), written s(3)·s(2). Note that we have s(m)·s(n) = s(mn) and that s(1)·s(m)=s(m)·s(1)=s(m) so s(1) plays the role of the identity, e. Also note that for all m apart from 0 we have s(m)·s(1/m)=e so every scaling apart from s(0) has in inverse. Note however that this monoid is special because for all x and y in it, x·y=y·x. This doesn't always hold in monoids and when it does hold our monoid is said to be commutative.
If we have a type of algebraic structure then we can often form "free" versions of that structure. For monoids we get free monoids. There are many different ways of looking at "freeness" and I'm going to go through some of them informally:
The important thing here is that the operator F, ie. the "free monoid generated by" operator, forms a monad. It's tempting to think that monads got their name from monoids for this reason, but I don't think it is because just about any "free X generated by" operator forms a monad. Haskell monads give rise to DSLs via do-notation. So this means that algebraic structures give rise to monads, which may give rise to Haskell monads, and hence DSLs.
So now to explain why F forms a monad. Note that just about everything I say here works with other kinds of algebraic structures ranging from monoids through groups to vector spaces. Consider our set S={x,y} above. For convenience, let's drop the · and write the binary operator as multiplication in the usual way. Given an element of S we can easily get an element of FS. In fact, we have an embedding i:S→FS with i(x) = x. This is an abuse of notation. The element on the LHS is an element of S and the x on the RHS is a variable labeled with the symbol x, but we also write this as x because vx, or something like that, would be a pain in the ass to keep writing.
Now think about the elements of F(FS). These are strings of elements of FS, ie. strings of monoid expressions in x and y. We could write a typical member as (xyx)(e)(xxy) where the xyx, e and xxy are each elements of FS and I'm using parentheses to make clear which bits they are. It's tempting to say that this is simply xyxxxy, but that would be wrong. The former is an element of F(FS) and the latter is an element of FS. It would be clearer that these weren't equal if we used the v notation to write vxyxvevxxy. But in this case the oscurity is to our advantage. Even though (xyx)(e)(xxy) doesn't equal xyxxxy, the notation strongly suggests that we define a function that maps the first to the second. In fact, we can define a function m:F(FS)→FS that essentially just erases parentheses.
So we have functions i:S→FS and m:F(FS)→FS. Looks suspiciously like a monad. In fact, i and m satisfy the monad laws (exercise!) and make F into a monad.
So what monad is this? It's probably not hard to guess: elements of FS are finite strings of elements of S. So it's essentially the List monad. Unfortunately, Haskell allows you to form infinite lists and so the correspondence isn't 100% precise. Nonetheless, it's good enough that if you hadn't already invented the List monad (as in Haskell monad), you probably would if you had considered the free monoid monad (as in mathematical monad). i is just the embedding \x -> [x] and m is concat, which essentially just erases brackets. So you can think of [x,y,z] has the Haskell way to write the monoid element xyz.
In fact, if you repeat the above discussion with other algebraic structures that I and others have discussed you'll get other familiar monads (modulo details like being able to create infinite lists). Here's a table (with apologies for bad formatting caused by blogger.com):
(Note that the above table is approximate in the sense that sometimes you may need to restrict to instances of Eq and tweak the code slightly to make the Haskell monad behave like the mathematical one.)
Notice the gap in the "group" row. A group is a monoid in which every element has a left and right inverse. The monad is the "free group generated by" monad and I'll call it G. It doesn't correspond to any of the standard Haskell monads. So firstly - if free monoids give lists, what does the underlying datatype for free groups look like? Go back to our set S = {x,y}. GS contains e, x and y as well as all of the products of x and y that appear in FS. But additionally GS contains the inverses of x and y, x-1 and y-1. And of course you need all strings of x, y and their inverses. But do you need more? What about (xy)-1y? Well we can expand out the inverse using (xy)-1=y-1x-1. The net effect is that the free group contains precisely all (possibly empty) strings of x, y, x-1 and y-1, where substrings like xx-1 and x-1x are removed. We can model this in Haskell using the following type:
We're using Left x to represent x and Right x to represent x-1. We implement >>= so that it uses the inverse of product formula above. It also ought to cancel out terms like xx-1 but to do that requires that we restrict the monad to instances of Eq and if I do that I risk castigation from Saizan. So I'll leave out the cancellation to get the following Haskell monad:
But what purpose might this serve? Try these:
The first does something almost identical to what you might expect from the ordinary list monad. But note the ordering in the second. This is the Gray code monad! By using Left and Right we can control the direction in which future combinatorial searches are carried out. Admittedly not the most useful monad, but it's curious that we do get something recognisable from this abstract nonsense. And maybe you spotted the "beautiful" Haskell implementation of the power set function on reddit recently. Here's a cute twist on that:
The first gives a result similar to the original, but the second lists the subsets in such a way that each element differs from the previous one by one element.
I wonder what other algebarically motivated monads are waiting to be discovered.
And I hope to write something about deriving comonads from coalgebras, as soon as I've read this. It looks really intriguing. Problem is, I'm having trouble making sense of it.
And sorry, this is a bit incoherent today. I only get a short period of time to write this stuff up and I blew most of mine this week in traffic jams. As usual, if the text makes no sense, you can be sure the code above works as I just tested it. But ask questions and complain about errors...
* The tropical semiring is the proper name for the (R,min,+) semiring I talked about earlier.
Update: Saizan's doing a great job of keeping me honest. He noticed that I'd omitted a step. I was originally finding the inverse of a free group element by simply reversing the elements in the product. But I was failing to flip the polarities of each of the elements (so to speak) despite having spelled out in the text exactly what I needed to do. The code is now fixed.
First an aside. Haskell monads are in some sense only an approximation to mathematical monads. In fact, Haskell is only an approximation to mathematics. It's easy to define a Haskell function f, say, such that x == y but f x /= f y. Once you do such a thing you can no longer reason about Haskell functions with the assumption that == represents equality in the mathematical sense. (For an example, see Saizan's comment here.) So in the following I'm going to assume that we've limited ourselves to functions such that x == y implies that f x == f y, x < y and y < z imply x < z and so on. I'd love to see some way, in Haskell, to make explicit 'promises' about the properties of functions. (I guess that's what they call programming by contract.) But for the following we'll just assume it holds.
The first abstract algebraic structure that people study is usually the group, but I'm going to start with something simpler, the monoid. A monoid is a set M (called the underlying set of the monoid) with a binary operator, ·:M×M→M and identity element e such that e·x = x·e = x and x ·(y·z) = (x·y)·z for all x,y,z in M. By abuse of notation we'll often use M for the name of the monoid as well as the underlying set. Monoids are easy to come by and there are lots of examples. Any time you have a sequence of operations that can be applied, or not, in any order, to some system, you have a monoid. Sequences of operations form the underlying set of the monad and the binary operator means "do what's on the right" followed by "do what's on the left". For example the set of uniform scaling operations we can perform on a circle form a monoid. Write a scaling by a factor of x as s(x). If we double the size of a circle, and then triple it we have s(2) followed by s(3), written s(3)·s(2). Note that we have s(m)·s(n) = s(mn) and that s(1)·s(m)=s(m)·s(1)=s(m) so s(1) plays the role of the identity, e. Also note that for all m apart from 0 we have s(m)·s(1/m)=e so every scaling apart from s(0) has in inverse. Note however that this monoid is special because for all x and y in it, x·y=y·x. This doesn't always hold in monoids and when it does hold our monoid is said to be commutative.
If we have a type of algebraic structure then we can often form "free" versions of that structure. For monoids we get free monoids. There are many different ways of looking at "freeness" and I'm going to go through some of them informally:
- Given any monoid there are equations satisfied by its elements. From the above example we have that x·y=y·x and also that x·e·x=x·x. But notice how there is a big difference between these equations. The former doesn't hold in all monads, but the latter does. A monoid is said to be free when the only equations that hold follow from equations that hold in every monoid. You can think of a free monoid as being a generic monoid. It has no special properties above and beyond being a monoid. Given a set S, the free monoid generated by S is the smallest free monoid containing the set S and with no equations relating the elements of S. Write this monoid as FS. For example, suppose S = {x,y}. Then we know that e, x and y are all in FS. We also know that x·x, x·y, y·x and y·y are all in S. Importantly we know that all 4 of these elements are distinct because we know there can be no equations connecting them beyond those that define a monoid. In fact, it's not hard to see that the elements of FS are essentially just the (possibly zero length) strings of x's and y's.
- Given a set S, consider its elements to be 'labels' for unknown variables. In fact, just use the elements of S as variables. Then we can also consider the free monoid generated by S to be the set of "monoid expressions" in these unknowns. By "monoid expression" we just mean functions of these variables that can be written using the language of monoids, ie. the symbols e and ·. If S = {x,y} then examples of such expressions are e, x, y, x·y and so on. I hope it's not hard to see that this is simply another description of the same thing as in the previous paragraph.
- Another way to think of free monoids requires a tiny bit more algebra. Define a monoid homomorphism from one monoid, M, to another, N, to be a function f:M→N on the underlying sets such that f(e)=e and f(x·y)=f(x)·f(y). A bijective monoid homomorphism is called an isomorphism. If there is an isomorphism between two monoids then in some sense they are the same monoid. Note that e and · are being overloaded here - we have two different monoids and these symbols have different meanings in each one. Now we can define the free monoid generated by S to be the monoid, M, such that (1) there is a function i:S→M (2) given any monoid N, and any function f:S→N, then f can be factored as f' o i where f' is a monoid homomorphism.
- Very vaguely, the set S lives in the category of sets. But FS lives in the category of monoids. FS is the closest approximation to S in the category of monoids. The previous property gives a sense of what this means: anything that can be said about a function from S to a monoid N can be said in the language of monoids using a homomorphism from FS to N. Don't worry if this is too vague as I won't be using it below - but it may click in some people's minds.
The important thing here is that the operator F, ie. the "free monoid generated by" operator, forms a monad. It's tempting to think that monads got their name from monoids for this reason, but I don't think it is because just about any "free X generated by" operator forms a monad. Haskell monads give rise to DSLs via do-notation. So this means that algebraic structures give rise to monads, which may give rise to Haskell monads, and hence DSLs.
So now to explain why F forms a monad. Note that just about everything I say here works with other kinds of algebraic structures ranging from monoids through groups to vector spaces. Consider our set S={x,y} above. For convenience, let's drop the · and write the binary operator as multiplication in the usual way. Given an element of S we can easily get an element of FS. In fact, we have an embedding i:S→FS with i(x) = x. This is an abuse of notation. The element on the LHS is an element of S and the x on the RHS is a variable labeled with the symbol x, but we also write this as x because vx, or something like that, would be a pain in the ass to keep writing.
Now think about the elements of F(FS). These are strings of elements of FS, ie. strings of monoid expressions in x and y. We could write a typical member as (xyx)(e)(xxy) where the xyx, e and xxy are each elements of FS and I'm using parentheses to make clear which bits they are. It's tempting to say that this is simply xyxxxy, but that would be wrong. The former is an element of F(FS) and the latter is an element of FS. It would be clearer that these weren't equal if we used the v notation to write vxyxvevxxy. But in this case the oscurity is to our advantage. Even though (xyx)(e)(xxy) doesn't equal xyxxxy, the notation strongly suggests that we define a function that maps the first to the second. In fact, we can define a function m:F(FS)→FS that essentially just erases parentheses.
So we have functions i:S→FS and m:F(FS)→FS. Looks suspiciously like a monad. In fact, i and m satisfy the monad laws (exercise!) and make F into a monad.
So what monad is this? It's probably not hard to guess: elements of FS are finite strings of elements of S. So it's essentially the List monad. Unfortunately, Haskell allows you to form infinite lists and so the correspondence isn't 100% precise. Nonetheless, it's good enough that if you hadn't already invented the List monad (as in Haskell monad), you probably would if you had considered the free monoid monad (as in mathematical monad). i is just the embedding \x -> [x] and m is concat, which essentially just erases brackets. So you can think of [x,y,z] has the Haskell way to write the monoid element xyz.
In fact, if you repeat the above discussion with other algebraic structures that I and others have discussed you'll get other familiar monads (modulo details like being able to create infinite lists). Here's a table (with apologies for bad formatting caused by blogger.com):
Algebraic Structure | DSL |
Monoid | Combinatorial search |
M-Set | "Write-only" side effects |
Vector Space | Probability theory/Quantum Mechanics |
Semirings | Two player game tree search |
Modules over tropical semiring* | Min-sum algorithm |
Group | ??? |
(Note that the above table is approximate in the sense that sometimes you may need to restrict to instances of Eq and tweak the code slightly to make the Haskell monad behave like the mathematical one.)
Notice the gap in the "group" row. A group is a monoid in which every element has a left and right inverse. The monad is the "free group generated by" monad and I'll call it G. It doesn't correspond to any of the standard Haskell monads. So firstly - if free monoids give lists, what does the underlying datatype for free groups look like? Go back to our set S = {x,y}. GS contains e, x and y as well as all of the products of x and y that appear in FS. But additionally GS contains the inverses of x and y, x-1 and y-1. And of course you need all strings of x, y and their inverses. But do you need more? What about (xy)-1y? Well we can expand out the inverse using (xy)-1=y-1x-1. The net effect is that the free group contains precisely all (possibly empty) strings of x, y, x-1 and y-1, where substrings like xx-1 and x-1x are removed. We can model this in Haskell using the following type:
> import Control.Monad
> data Group a = G [Either a a] deriving Show
We're using Left x to represent x and Right x to represent x-1. We implement >>= so that it uses the inverse of product formula above. It also ought to cancel out terms like xx-1 but to do that requires that we restrict the monad to instances of Eq and if I do that I risk castigation from Saizan. So I'll leave out the cancellation to get the following Haskell monad:
> instance Monad Group where
> return x = G [Left x]
> G x >>= f = G $ concatMap g x where
> g (Left y) = let G u = f y in u
> g (Right z) = let G u = f z in reverse (map (either Right Left) u)
But what purpose might this serve? Try these:
> test1 = sequence $ replicate 4 (G [Left 0,Left 1])
> test2 = sequence $ replicate 4 (G [Left 0,Right 1])
The first does something almost identical to what you might expect from the ordinary list monad. But note the ordering in the second. This is the Gray code monad! By using Left and Right we can control the direction in which future combinatorial searches are carried out. Admittedly not the most useful monad, but it's curious that we do get something recognisable from this abstract nonsense. And maybe you spotted the "beautiful" Haskell implementation of the power set function on reddit recently. Here's a cute twist on that:
> powerset1 = filterM (const $ G [Left False,Left True])
> powerset2 = filterM (const $ G [Left False,Right True])
The first gives a result similar to the original, but the second lists the subsets in such a way that each element differs from the previous one by one element.
I wonder what other algebarically motivated monads are waiting to be discovered.
And I hope to write something about deriving comonads from coalgebras, as soon as I've read this. It looks really intriguing. Problem is, I'm having trouble making sense of it.
And sorry, this is a bit incoherent today. I only get a short period of time to write this stuff up and I blew most of mine this week in traffic jams. As usual, if the text makes no sense, you can be sure the code above works as I just tested it. But ask questions and complain about errors...
* The tropical semiring is the proper name for the (R,min,+) semiring I talked about earlier.
Update: Saizan's doing a great job of keeping me honest. He noticed that I'd omitted a step. I was originally finding the inverse of a free group element by simply reversing the elements in the product. But I was failing to flip the polarities of each of the elements (so to speak) despite having spelled out in the text exactly what I needed to do. The code is now fixed.