A Neighborhood of Infinity

Saturday, April 07, 2012

Generalised entropy

Introduction
The entropy of a probability distribution can be seen as a measure of its uncertainty or a measure of the diversity of samples taken from it. Over the years I've talked lots about how probability theory gives rise to a monad. This suggests the possibility that maybe the notion of entropy can be generalised to monads other than probability. So here goes...

> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE FunctionalDependencies, TypeSynonymInstances #-}

> import Control.Monad
> import Control.Monad.Writer hiding (lift)


Shannon entropy
I've talked in the past about how there is some trickiness with defining the probability monad in Haskell because a good implementation requires use of the Eq typeclass, and hence restricted monads. Restricted monads are possible through a bunch of methods, but this time I don't want them.

It's common to represent probability distributions on finite sets as lists of pairs where each pair (p, x) means x has a probability p. But I'm going to allow lists without the restriction that each x appears once and make my code work with these generalised distributions. When I compute the entropy, say, it will only be the usual entropy in the case that each x in the list is unique.

So here's our type and some instances for it:

> data P a = P [(a, Float)] deriving Show

> instance Functor P where
>      fmap f (P xs) = P [(f a, p) | (a, p) <- xs]

> instance Monad P where
>      return x = P [(x, 1)]
>      P xss >>= f = P [(y, p*q) | (pxs, p) <- xss, let P ys = f pxs, (y, q) <- ys]

We can easily compute the expected value of a distribution, and its entropy, like this:

> expectation0 (P xs) = sum [x*p | (x, p) <- xs]
> entropy0 (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]

An important property of entropy is known as the grouping property which can be illustrated through an example tree like this:


The entropy for the probability distribution of the final leaves is the sum of two components: (1) the entropy of the branch at the root of the tree and (2) the expected entropy of the subtrees. Here's some corresponding code. First simple bernoulli trials:

> bernoulli p a b = P [(a, p), (b, 1-p)]

Now the branch at the root of the tree:

> root = bernoulli 0.3 False True

We can compute the entropy for the distrbution on the leaves:

> test1 = entropy0 $ do
>   x <- root
>   if x
>     then bernoulli 0.2 3 4
>     else bernoulli 0.4 5 6

Or the sum of the root entropy and the expected subtree entropy:

> test2 = entropy0 root + (expectation0 $ do
>   x <- root
>   if x
>     then return $ entropy0 (bernoulli 0.2 3 4)
>     else return $ entropy0 (bernoulli 0.4 5 6))

You can confirm for yourself that test1 == test2.

We can rewrite that a little. We're drawing True or False from root only to decide which distribution to use at the next stage. But we may as will pick the distribution itself at random. So define:

> dist = bernoulli 0.3 (bernoulli 0.4 5 6) (bernoulli 0.2 3 4)

And now we expect the equality of test3 and test4:

> test3 = entropy0 $ do
>   x <- dist
>   x

> test4 = entropy0 dist + (expectation0 $ do
>   x <- dist
>   return $ entropy0 x)

There's a more elegant way of writing this. Define:

> left0 dist = entropy0 (join dist)
> right0 dist = entropy0 dist+expectation0 (fmap entropy0 dist)

Now we expect left0 dist and right0 dist to always be equal. We've almost generalised to something that makes sense in the context of monads other than probability.

The algebra of a monad
Here are a couple of important properties of expectation0:

1. expectation0 (return d) = d
2. expectation0 (join d) = expectation0 (fmap expectation d)

In English: the expectation of certainty is just the certain value, and the expectation of an expectation is just the expectation. But these rules are precisely the conditions that define an -algebra, where is a monad.

So let's define a type class:

> class Algebra m a | m -> a where
>      expectation :: m a -> a

We'll assume that when m is a monad, any instance satisfies the two laws above. Here's the instance for probability:

> instance Algebra P Float where
>      expectation (P xs) = sum [x*p | (x, p) <- xs]

In keeping with the notion that entropy measure diversity let's also define:

> class Diverse m r | m -> r where
>      entropy :: m x -> r

with the instance:

> instance Diverse P Float where
>      entropy (P xs) = -sum [if p==0 then 0 else p*log p/log 2.0 | (_, p) <- xs]

It's not clear what laws we need but for now we'll assume a generalised entropy satisfies left dist == right dist :

> left dist = entropy (join dist)
> right dist = entropy dist+expectation (fmap entropy dist)

We'll call that the generalised grouping law.

Binary trees
It's not hard to find other structures that satisfy these laws if we cheat and use alternative structures to represent probabilities. For example We can make Tree an instance by assuming Fork represents a 50/50 chance of going one way or another:

> data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving Show

> instance Functor Tree where
>      fmap f (Leaf a) = Leaf (f a)
>      fmap f (Fork l r) = Fork (fmap f l) (fmap f r)

> instance Monad Tree where
>      return x = Leaf x
>      Leaf a >>= f = f a
>      Fork l r >>= f = Fork (l >>= f) (r >>= f)

> instance Algebra Tree Float where
>      expectation (Leaf a) = a
>      expectation (Fork l r) = 0.5*expectation l+0.5*expectation r

> instance Diverse Tree Float where
>      entropy (Leaf a) = 0
>      entropy (Fork l r) = 1+0.5*entropy l+0.5*entropy r

Lists
We could make non-empty lists into an instance by assuming a uniform distribution on the list. But another way to measure the diversity is simply to count the elements. We subtract one so that [x] corresponds to diversity zero. This subtraction gives us a non-trivial instance:

> newtype L a = L [a] deriving (Show, Monad, Functor)

> instance Algebra L Int where
>      expectation (L xs) = sum xs

> instance Diverse L Int where
>      entropy (L xs) = length xs-1

Tsallis entropy
There are measures of diversity for probability distributions that are distinct from Shannon entropy. An example is Tsallis entropy. At this point I'd like a family of types parametrised by reals but Haskell doesn't support dependent types. So I'll just fix a real number q and we can define:

> q = 2.5

> data T a = T [(a, Float)] deriving Show

> instance Functor T where
>      fmap f (T xs) = T [(f a, p) | (a, p) <- xs]

> instance Monad T where
>      return x = T [(x, 1)]
>      T xss >>= f = T [(y, p*q) | (pxs, p) <- xss, let T ys = f pxs, (y, q) <- ys]

> instance Algebra T Float where
>      expectation (T xs) = sum [x*p**q | (x, p) <- xs]

> instance Diverse T Float where
>      entropy (T xs) = (1-sum [p**q | (_, p) <- xs])/(q-1)

And again we find our generalised grouping rule for entropy holds.

Operads
This is all derived from Tom Leinster's post last year at the n-category cafe. As I talked about here there's a close relationship between monads and operads. Operads area a bit like container monads where the containers don't contain anything, but just have holes where contents could be placed. This makes operads a better place to work because you don't have the awkward issue I started with: having to disallow lists of value/probability pairs where the same value can appear more than once. Nonetheless, in (unrestricted) Haskell monads you don't have Eq available so you can't actually have definitions of return or >>= that can notice the equality of two elements. If such definitions were possible, the grouping law would no longer work as stated above.

Crossed homomorphisms
The generalised grouping law even makes sense for very different monads. For the Reader monad the law gives the definition of a crossed homomorphism. It's pretty weird seeing a notion from group cohomology emerge like this and I recommend skipping to the final section unless you care about this sort of thing. But if you do, this is related to research I did a long time ago. This is to test that the Schwarzian derivative really does give rise to a crossed homomorphism.

Firstly let me set up some automatic differentiation code:

> data D a = D { re::a, im::a } deriving (Show, Ord, Eq)

> instance Num a => Num (D a) where
>      fromInteger n = D (fromInteger n) 0
>      D a a'+D b b' = D (a+b) (a'+b')
>      D a a'*D b b' = D (a*b) (a*b'+a'*b)
>      D a a'-D b b' = D (a-b) (a'-b')

> instance Fractional a => Fractional (D a) where
>      fromRational n = D (fromRational n) 0
>      D a a'/D b b' = let q = 1/b in D (a*q) ((-a*b'+a'*b)*q*q)

> lift x = D x 0

> d f x = im (f (D x 1))

> raised f = re . f . lift
> raised2 = raised . raised
> raised3 = raised2 . raised

The Cn are the n-times (automatically) differentiable functions. Unfortunately the Endo defined in Data.Monoid acts the wrong way round from what I want so I need a Dual:

> type C1 = Dual (Endo (D Double))
> type C3 = Dual (Endo (D (D (D Double))))
> type C4 = Dual (Endo (D (D (D (D Double)))))

> instance Eq (Endo (D Double))
> instance Ord (Endo (D Double))

A silly Show instance that simply evaluates a function at a number I chose randomly: 1.234.

> instance Show (Endo (D Double)) where
>         show (Endo f) = show (f 1.234)

> instance Num C1 where
>      fromInteger n = Dual (Endo (\x -> fromInteger n))
>      Dual (Endo  f)+Dual (Endo  g) = Dual (Endo  (\x -> f x + g x))
>      Dual (Endo  f)-Dual (Endo  g) = Dual (Endo  (\x -> f x - g x))
>      Dual (Endo  f)*Dual (Endo  g) = Dual (Endo  (\x -> f x * g x))

> instance Fractional C1 where
>      fromRational n = Dual (Endo (\x -> fromRational n))
>      Dual (Endo f)/Dual (Endo g) = Dual (Endo (\x -> f x / g x))

> newtype Q a = Q (Writer C4 a) deriving (Monad, Functor)

We can give Q a a geometrical interpretation. The underlying type is a pair (a, C4). If we think of elements of C4 as charts charts on a piece of Riemann surface then for any , an element of (a, C4) represents a local piece of a section of the th tensor power of the canonical bundle. Ie. we can think of it as representing . I'll concentrate on the case which gives quadratic differentials. We can think of an element of ((a, C4), C4) as forms where we're composing two charts. We can collapse down to an ordinary chart by using the chain rule. Here's the code:

> instance Algebra Q C1 where
>      expectation (Q ma) = let (Dual (Endo a), Dual (Endo f)) = runWriter ma
>                           in Dual (Endo (\x -> a (raised3 f x)*(raised2 (d f) x)^2))

Now we can define the Schwarzian derivative:

> schwarzian f x = let f0 = raised3 f x
>                      f1 = raised2 (d f) x
>                      f2 = raised (d $ d f) x
>                      f3 = (d $ d $ d f) x
>                  in f3/f1-1.5*(f2/f1)^2

And somwehat bizarrely, we now have a generalised entropy:

> instance Diverse Q C1 where
>      entropy (Q ma) = let (_, Dual (Endo f)) = runWriter ma
>                       in Dual (Endo (\x -> schwarzian f x))

This is the construction that gives rise to the Virasoro algebra which plays such an important role in String Theory.

Some tests
And here's a bunch of tests. I'd have used QuickCheck but it won't install for me today...

> test :: (Algebra m t, Diverse m t, Num t, Functor m, Monad m) => m (m x) -> IO ()
> test x = do
>      print (left x, right x)

> main = do
>      test $ L [L [1, 2, 3], L [2, 3, 4], L [1], L [5], L [2, 7::Int]]
>      test $ P [(P [(0, 0.5), (1, 0.5)], 0.5), (P [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]
>      test $ T [(T [(0, 0.5), (1, 0.5)], 0.5), (T [(2, 0.5), (3::Int, 0.5)], 0.5::Float)]
>      test $ Leaf (Leaf 1 `Fork` Leaf 2) `Fork` Leaf (Leaf 3 `Fork` (Leaf 4 `Fork` Leaf 5))
>      test $ (Q (writer
>               (Q (writer (Dual (Endo (\x -> x)),
>                           Dual (Endo (\x -> x^2+1)))),
>                           Dual (Endo (\x -> (2+x)/(3+x*x))))) :: Q (Q C3))

Saturday, March 17, 2012

Overloading Python list comprehension

Introduction
Python is very flexible in the way it allows you to overload various features of its syntax. For example most of the binary operators can be overloaded. But one part of the syntax that can't be overloaded is list comprehension ie. expressions like [f(x) for x in y].

What might it mean to overload this notation? Let's consider something simpler first, overloading the binary operator +. The expression a+b is interpreted as a.__add__(b) if a is of class type. So overloading + means nothing more than writing a function. So if we can rewrite list comprehensions in terms of a function (or functions) then we can overload the notation by providing alternative definitions for those functions. Python doesn't provide a facility for doing this directly, but we can at least think about what it might mean to do this. Later we'll see how to tweak the Python interpreter to make it possible.

map
Consider the expression
[a for x in y]
Here the single letter variables are 'metavariables' representing fragments of Python code. To a good approximation this is equal to:
map(lambda x: a, y)
(BTW Everything I say here is "to a good approximation". Python is an incredibly complex language and I'm not good enough at it to make any categorical statements about when one fragment of code is the same as another.)

So it's tempting to see list comprehensions as syntactic sugar for map, in which case one approach to overloading comprehension is to consider interpreting it in terms of replacements for map. But this isn't a very powerful overloading. It just gives us a slightly different way to write something that's already straightforward.

concatMap
Another reason for not simply seeing list comprehension in terms of map is that nested list comprehensions need another operation. Consider
[(y, z) for y in [1, 2] for z in ['a', 'b']]
This isn't quite the same as
[[(y, z) for z in ['a', 'b']] for y in [1, 2]]
but it's close. The latter produces nested lists whereas the first gives one flat list. We can think of nested comprehensions as applying a flattening operation. Let's use list comprehension to implement flattening:
def concat(xs):
 return [y for x in xs for y in x]
We now write our nested comprehension as:
concat([[(y, z) for z in ['a', 'b']] for y in [1, 2]])
We know how to write non-nested comprehensions using map so we get:
concat(map(lambda y: [(y, z) for z in ['a', 'b']], [1, 2]))
And rewriting the inner comprehension we get:
concat(map(lambda y: map(lambda z: (y, z), ['a', 'b']), [1, 2]))
Every time we add another level of nesting we're going to need another concat. But the innermost map doesn't have a concat. Purely for reasons of symmetry we can ensure every map has a concat by enclosing the innermost element as a singleton list:
concat(map(lambda y: concat(map(lambda z: [(y, z)], ['a', 'b'])), [1, 2]))
Every map has a concat so we can simplify slightly. Let's define:
def concatMap(f, xs):
 return [f(y) for x in xs for y in x]

def singleton(x):
 return [x]
Our expression becomes:
concatMap(lambda y: concatMap(lambda z: singleton((y, z)), ['a', 'b']), [1, 2])
Importantly we've completely rewritten the comprehension in terms of concatMap and singleton. By changing the meaning of these functions we can change the meaning of comprehension notation, or at least we could if the Python interpreter defined comprehension this way. It doesn't, but we can still reason about it. Although any comprehension that doesn't use ifs can be rewritten to use these functions, I won't give a formal description of the procedure. Instead I'll provide code to perform the rewrite later. While I'm at it, I'll also handle the ifs.

Laws
Freely redefining singleton and concatMap to redefine comprehension could get weird. If we're going to redefine them we should at least try to define them so that list comprehension still has some familiar properties. For example, for y a list we usually expect:
y == [x for x in y]
In other words
y == concatMap(lambda x: singleton(x), y)
At this point I could give a whole bunch more laws but it's time to own up.

Monads
A pair of functions singleton and concatMap, along with a bunch of laws, are essentially the same thing as a monad. In Haskell, concatMap is usually called bind and singleton is called return. What I've done here is show how Wadler's Comprehending Monads paper might look like in Python. Haskell has specialised monad notation built into its grammar. But what's less well known is that so does Python! The catch is that although the grammar is right, the semantics can't be generalised beyond lists.

Monad-Python
One great thing about Python is that there seem to be libraries for working with every aspect of Python internals. So it's fairly easy to write a simple Python interpreter that rewrites list comprehensions to use singleton and concatMap. I've placed the source on github. Use mpython.py instead of python as your interpreter. I've tested it with Python 2.6 and 2.7.

When using mpython, list comprehension uses whatever definitions of __mapConcat__ and __singleton__ are currently in scope. By default they are the definitions I gave above so we get something close to the usual list comprehension.

An example of the kind of code you can run with mpython.py is:
import math

def __concatMap__(k, m):
  return lambda c:m(lambda a:k(a)(c))

def __singleton__(x):
  return lambda f:f(x)

def callCC(f):
  return lambda c:f(lambda a:lambda _:c(a))(c)

def __fail__():
  raise "Failure is not an option for continuations"

def ret(x):
  return __singleton__(x)

def id(x):
  return x

def solve(a, b, c):
  return callCC(lambda throw: [((-b-d)/(2*a), (-b+d)/(2*a))
                               for a0 in (throw("Not quadratic") if a==0 else ret(a))
                               for d2 in ret(b*b-4*a*c)
                               for d in (ret(math.sqrt(d2)) if d2>=0 else throw("No roots"))
                              ])

print solve(1, 0, -9)(id)
print solve(1, 1, 9)(id)
print solve(0, 1, 9)(id)
I have defined our functions so that comprehension syntax gives us the continuation monad. This makes continuation passing style relatively painless in Python. (At least easier than chaining many lambdas.) I have then defined callCC to be similar to its definition in Haskell. There are many uses for callCC including the implementation of goto. Above I use it in a trivial way to throw exceptions.

Conclusion
My script mpython.py is a long way from an industrial strength interpreter and I'm not proposing the above as an extension to Python. My goal was simply to show how Haskell-style monads are not as alien to Python as you might think. In fact, it's reasonable to say that Python already supports one flavour of specialised monad syntax. Most users don't realise it as such because it has been hard-wired to work with just one monad, lists.

BTW if you attempt to implement all of the other Haskell monads you'll find that Haskell behaves a little differently because of its laziness. You can recover some of that laziness by careful use of continuations in Python. But I've no time to go into that now.

Saturday, February 11, 2012

Using Lawvere theories to combine effects

> {-# LANGUAGE MultiParamTypeClasses, ExplicitForAll, RankNTypes, FlexibleInstances, FlexibleContexts, TypeSynonymInstances #-}

> import Data.Monoid
> import Data.Functor.Identity
> import Control.Monad.Writer

In an earlier post I talked about how monads arise from free algebras. Let me recap a bit.

In Part 1 I described algebras. They're sets with operations on them satisfying some laws. We can build new elements of an algebra from old ones by using its operations. Eg. if x and y are in an algebra then x `mappend` y must be in it too. Starting with a bunch of symbols, thought of as leaves, we can consider the set of all expressions trees we can build from them. If we consider pairs of trees to be equivalent if the laws say the corresponding expressions are equal, then the set of trees itself forms an algebra known as a free algebra (for the given theory).

Let's start with some code. This type class says that the type b has leaves of type a:

> class Free a b where
>   leaf :: a -> b

Effects from monoids
Now we can make the type of all trees built from Monoid operations and including all leaves of type a:

> data FreeMonoid a = FreeMonoid (forall b. (Monoid b, Free a b) => b)

And we have:

> instance Monoid (FreeMonoid a) where
>   mempty = FreeMonoid mempty
>   FreeMonoid a `mappend` FreeMonoid b = FreeMonoid (a `mappend` b)

Unfortunately elements like e1 and e2 two ought to be equal but Haskell doesn't know this:

> e1, e2 :: FreeMonoid Char
> e1 = FreeMonoid (leaf 'a' `mappend` (leaf 'b' `mappend` leaf 'c'))
> e2 = FreeMonoid ((leaf 'a' `mappend` leaf 'b') `mappend` leaf 'c')

Instead we can manually construct a type that does respect equality in monoids. Elements of FreeMonoid are binary trees with a `mappend` at each node. Associativity means that we can always replace a tree with an equivalent one where the left branch is a leaf. We can also use the laws to eliminate any occurrence of mempty. So every element of FreeMonoid a is equivalent to one of the form:
Leaf x1 `mappend` (Leaf x2 `mappend` (... mempty))


In other words, free monoids are lists. We can make this explicit. The standard prelude already makes [] an instance of Monoid so we just need:

> instance Free a [a] where
>       leaf x = [x]

Here's the isomorphism (modulo tree equivalence):

> iso1 :: FreeMonoid a -> [a]
> iso1 (FreeMonoid x) = x

> iso1' :: [a] -> FreeMonoid a
> iso1' [] = FreeMonoid mempty
> iso1' (a : as) = let FreeMonoid r = iso1' as
>                  in FreeMonoid (leaf a `mappend` r)

As I talked about in that earlier article, free algebras give monads and the trees representing expressions in the algebra can be thought of as abstract syntax trees for domain specific languages. In this case it's the usual list monad. So the Monoid type class gives us a language for talking about non-determinism. The operation mappend gives us a way to "fork" a process and mempty gives as a way to "kill a thread". Here's an example using non-determinism to search for some Pythagorean triples:

> test1 :: [(Int, Int, Int)]
> test1 = do
>   a <- return 3 `mappend` return 4
>   b <- return 4 `mappend` return 5
>   c <- return 5 `mappend` return 6
>   if a*a+b*b==c*c then return (a, b, c) else mempty

Effects form M-sets
We can do exactly the same for -sets.

> class Monoid m => MSet m s where
>       act :: m -> s -> s

> data FreeMSet w a = FreeMSet (forall b. (MSet w b, Free a b) => b)

> instance Monoid w => MSet w (FreeMSet w a) where
>   m `act` FreeMSet b = FreeMSet (m `act` b)

Again we have the problem that FreeMSet doesn't automatically make equivalent elements equal. But it's not hard to see that every element of FreeMSet is equivalent to one of the form:
m `act` (leaf x)
So the free -set on the set of variables is simply the set of pairs . This is the basis of Haskell's writer monad:

> instance Monoid w => MSet w (Writer w a) where
>   act w1 m = let (a, w2) = runWriter m in WriterT (Identity (a, w1 `mappend` w2))

> instance Monoid w => Free a (Writer w a) where
>   leaf x = return x

Here's the isomorphism (again treating equivalent elements of FreeMSet as equal):

> iso2 :: Monoid w => FreeMSet w a -> Writer w a
> iso2 (FreeMSet x) = x

> iso2' :: Writer w a -> FreeMSet w a
> iso2' m = let (a, w) = runWriter m in FreeMSet (act w (leaf a))

And now the -set operation gives us an interface to an effect. This time the side effect of accumulating in a monoid:

> test2 :: Writer String Int
> test2 = do
>   act "foo" (return ())
>   a <- return 2
>   act "bar" (return ())
>   b <- return (10*a)
>   return b

Combining effects

And now we can finally combine the two effects of non-determinism and accumulation. We make the free algebra that is both a monoid and an -set:

> data FreeMMonoid w a = FreeMMonoid (forall b. (Monoid b, MSet w b, Free a b) => b)

> instance Monoid w => Monoid (FreeMMonoid w a) where
>   mempty = FreeMMonoid mempty
>   FreeMMonoid a `mappend` FreeMMonoid b = FreeMMonoid (a `mappend` b)

> instance Monoid w => MSet w (FreeMMonoid w a) where
>   m `act` FreeMMonoid b = FreeMMonoid (m `act` b)

Again we have the problem that equivalent elements aren't recognised as equal so we have to manually find a suitable type. For this we need to use the compatibility notion I introduced in Part 1. We can take 2 variables and and write them in a 1 by 2 array:

Apply mappend horizontally and act vertically to get:
m `act` (x `mappend` y)
Now apply act vertically and then mappend horizontally to get:
(m `act` x) `mappend` (m `act` y)
The law we want is:
m `act` (x `mappend` y) == (m `act` x) `mappend` (m `act` y)
Given an arbitrary tree in FreeMMonoid we can use this law to "push" all occurrences of act inwards. Ultimately every element can be written uniquely in the form:
act m1 (leaf x1) `mappend` (act m2 (leaf x2) `mappend` (... mempty)


We can then use the same argument as above to show that we end up with a list of pairs of elements of . This is exactly what we get if we apply the WriterT monad transformer to []. Here are the relevant instances:

> instance Monoid w => Monoid (WriterT w [] a) where
>   mempty = WriterT []
>   WriterT xs `mappend` WriterT ys = WriterT (xs ++ ys)

> instance Monoid w => MSet w (WriterT w [] a) where
>   m `act` WriterT xs = WriterT $ map (\(x, w) -> (x, m `mappend` w)) xs

> instance Monoid w => Free a (WriterT w [] a) where
>   leaf x = return x

Here's the isomorphism though we won't use it:

> iso3 :: Monoid w => FreeMMonoid w a -> WriterT w [] a
> iso3 (FreeMMonoid x) = x

> iso3' :: Monoid w => WriterT w [] a -> FreeMMonoid w a
> iso3' m = let xws = runWriterT m in FreeMMonoid $
>     foldr mappend mempty $ map (\(x, w) -> act w (leaf x)) xws

The monad WriterT (Product Float) [] is in fact the probability monad. Here's an example of its use:

> coin :: (Monoid a, MSet (Product Float) a, Free Bool a) => a
> coin = act (Product 0.5 :: Product Float) (leaf False)
>            `mappend`
>        act (Product 0.5 :: Product Float) (leaf True)

Compute unnormalised conditional probability distribution on a pair of coin tosses given that first coin can't be True unless second one is:

> test3 :: WriterT (Product Float) [] (Bool, Bool)
> test3 = do
>   coin1 <- coin
>   coin2 <- coin
>   if coin1>coin2 then mempty else return (coin1, coin2)

(Compare with Eric Kidd's article that also 'refactors' probability theory.)

What just happened?
Something miraculous just happened though it may have been lost in the details. We combined the list monad and the writer monad to get a new monad. We did it without using monad transformers and without specifying an order for the two monads. It just so happens in this case that the result was the same as using a monad transformer.

M-set with M-set
We can try other products of theories. It's tricky to deal with a theory combined with itself because repeating a type class in a context doesn't do anything. We need to make another type class that looks exactly like MSet but with different names. The result is that the product of the theory of -sets and the theory of -sets is the theory of -sets. This agrees with what we'd get from using monad transformers. It also agrees with intuition. -sets correspond to the effect of accumulating data in a monoid. The product theory corresponds to using two accumulators simultaneously.

(This makes me think type classes should take as arguments the name of the operations within them. That way a type can be an instance of the same type class in multiple ways. Compare with Agda modules.)

Monoid with monoid
This example illustrates why we can't expect a programming language to use the above method to combine theories. If an algebra has two multiplication operators with identities on it, and the two operators are compatible, then something surprising happens. The multiplications turn out to be the same operation. What's more, the operation is commutative. So the product of the theory of monoids with itself is the theory of commutative monoids. A free commutative monoid is a multiset. Multisets require a very different implementation to lists and I doubt any automatic algebra combiner in the near future could discover one. (The Eckmann-Hilton argument also appears here.)

The compatibility condition
To form the product of two theories we add in extra laws to ensure commutativity. If we don't add in such laws we get the sum of two theories. For the example theories I used here these theories can lead to quite complex types. For example the sum of the theory of -sets and -sets is, I think, the theory of -sets where is the "free product" of monoids. I this is a bit of a messy object from the perspective of types. Other effects, however, may behave nicely with respect to . I haven't yet investigated.

Conclusion
If you don't mind computing the relevant types by hand there are perfectly good alternative to monad transformers for combining effects. But it seems very difficult to automatically combine theories. In fact, I expect finding canonical forms for the elements of free algebras for a product theory isn't even computable. So this approach isn't going to replace monad transformers any time soon.

Exercise
Make a multiplication table showing the result of forming the product of algebras for lots of useful effects.

Sunday, February 05, 2012

Lawvere theories made a bit easier

Introduction
I still don't think anyone has found a completely satisfactory way to combine effects in Haskell. (That's computational effects, not visual effects.) Monads are great for one kind of effect at a time, but it's not clear how to combine two arbitrary monads. Instead of monads we can work with monad transformers, but they are tricky both to implement and to use.

I want to sketch a different, though incomplete approach to combining effects. There are a bunch of papers that describe this approach, and even some code that implements part of it. Almost everything I say is from a paper by Hyland and Powers that I read a few years ago though I recently ran into this helpful answer by Andrej Bauer on mathoverflow. Even if we don't get code we can immediately use, we still get a good way to think about and analyse effects.

I'll get onto the effects part in another post. This one concentrates on what are known as Lawvere theories.

Monoids

> {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, ExplicitForAll #-}
> import Data.Monoid

The (finite) lists of integers form a monoid. Here are some functions that operate on such lists:

> f1 xs = map (+1) $ reverse xs
> f2 xs = xs `mappend` xs

They can both be given signature [Integer] -> [Integer]. But there's one very important difference between these functions. f2 has been written using only operations from the type class Monoid. It's a sort of universal function that can be applied to any monoid. On the other hand, the function f1 can only applied to very specific monoids. In fact, the type signature of f2 can be written as:

> f2 :: forall a. Monoid a => a -> a

That type signature essentially says that we're not going to do anything with elements of a except use the interface defined by the Monoid type class.

(Although Haskell type classes can't enforce them, we also assume that any instance of Monoid satisfies the axioms for a monoid.)

We can also define functions on tuples of monoids. Here are some examples:

> g1 :: forall a. Monoid a => (a, a, a) -> (a, a)
> g1 (xs, ys, zs) = (mempty, ys `mappend` xs)

> g2 :: forall a. Monoid a => (a, a) -> (a, a)
> g2 (xs, ys) = (xs `mappend` ys, ys `mappend` xs)

Notice that we can compose these functions. So we have
g2 . g1 :: forall a. Monoid a => (a, a, a) -> (a, a)

We also have have identity functions for tuples. Armed with functions, identities and compositions can now form a category that I'll call . I'll call the (distinct) objects of this category , , and so on. The arrows from to are the total functions of type . So, for example, g1 is an arrow from to . Note that it doesn't matter what the objects are (as long as they're distinct). They're just placeholders between which we can string our arrows. Note how because of the universal quantifier, the functions we use have a type not of the form A -> B. So we can't represent the objects of our category as types in the usual way. We can think of mempty as a 0-ary operator, ie. an element of forall a. Monoid a => () -> a.

But there's one more detail I want. I'll consider two arrows to be equal if they can be proved equal using the axioms for monoids. For example, these two Haskell functions represent the same arrow in because of the associativity law:

> h1 (x, y, z) = (x `mappend` y) `mappend` z
> h2 (x, y, z) = x `mappend` (y `mappend` z)

We now have a bona fide category. Note that contains lots of arrows. Anything you can build using mempty and mappend as well as all of the projections and permutations between tuples. completely captures everything that can be deduced using the axioms for monoids. For example, the associativity law is contained in the fact that h1 and h2 represent the same arrow.

The category also has products. In fact it is given by with the projections back to the factors being represented by the obvious projection functions. serves as the product of no factors.

So captures the properties shared by all monoids. But what is its relationship to actual monoids? It's pretty nice. A monoid is a functor that preserves products.

Let's unpack that. First must take objects in to sets. But we've stipulated that so is completely determined on objects once we know . In fact will be the carrier for our monoid and is its th power. takes arrows in to functions on sets. So, for example, it gives concrete realisations of mempty and mappend. Because, for example, h1 and h2 represent the same arrow in , and must be equal. So associativity must hold for these realisations. The same goes for all of the other laws, and everything we can deduce from them. So the requirement of functoriality makes into a monoid with identity F(mempty) and product F(mappend).

Given an instance of the Monoid type class we can immediately apply an arrow from to it. The functor is applied implicitly by the Haskell compiler. For example h1 can be applied to ("a", "b", "c") :: (String, String, String).

The object in is weird. It's a lot like the universal monoid sharing all of the properties you expect to hold simultaneously in all monoids. Except for one important one: it's not a monoid itself.

Note that in a sense the category isn't anything new. It's just a convenient 'datastructure' into which we can pack everything we can deduce about monoids.

M-sets
Before generalising, let's try a similar treatment for another algebraic structure, the horribly named -set. An -set is a structure on a set that assumes we already have some choice of monoid . It defines an action of on the set. An action is simply a function defined for each element of and which is compatible with the operations on . Here's a suitable type class:

> class Monoid m => MSet m s where
> act :: m -> s -> s

and the laws are
act mempty x == x
act a (act b x) = act (a `mappend` b) x
We're thinking of act not as an operation on two arguments but instead thinking of act a being an operation on s for each element a. Note how that second law is really a lot of laws, one for every pair (a, b) in our monoid.

A simple example is the way scalars act on vectors.

> data Vector a = V a a a deriving Show
> instance Monoid a => MSet a (Vector a) where
>    act w (V x y z) = V (w `mappend` x) (w `mappend` y) (w `mappend` z)

Given any particular monoid (eg. String) we can define functions like:

> j1 :: forall a. MSet String a => (a, a) -> (a, a)
> j1 (x, y) = ("x" `act` x, "y" `act` y)
> j2 :: forall a. MSet String a => (a, a) -> a
> j2 (x, y) = "abc" `act` x

We can form a category in exactly the same way as for monoids. As the objects were just placeholders we may as well reuse them. The arrows from to are the functions on tuples we can make from repeated applications of act, ie. . For each choice of we get a new category encoding everything we want to know about -sets. In much the same way, any functor gives an -set. We have and maps act a to the actual action on the set.

Lawvere theories
So what do these two categories have in common?

Let's start with the simplest possible algebraic structure: a plain old set with no operations on it. The corresponding category will have objects . The arrows from to will be represented by functions of the form . That includes functions like the projections onto elements of tuples or permutations of tuples. This category is called .

Both and defined earlier contain all of the objects and arrows of . But both and also contain arrows corresponding to all of the operations you can perform in their respective algebraic structures. So we can define a Lawvere theory as nothing more than a category that is with extra arrows. The category defined earlier is the "theory of monoids" and is the "theory of -sets". A 'model' of , or a -algebra, is a product-preserving functor .

Product theories
Suppose we have two theories and . There's a straightforward way to combine them into one theory. Form the category that (like all Lawvere theories) shares the same objects as and but has all of the arrows from both. (Obviously we'll have to keep just one identity, not an identity from and another from .) The catch is that if is an arrow in and is an arrow in we'll need the composition too. We simply take the smallest set of arrows that contains all of the arrows from and and contains all of their compositions modulo all of the laws in and .

Thus far we already have a new Lawvere theory built from and . But sometimes it's useful to ensure the operations from are 'compatible', in some sense, with those from . We want them to commute with each other. I'll describe what that commutativity means now:

Suppose we have an arrows and on and respectively. If we have a set that is both a -algebra and a -algebra, then gives an operation and gives an operation . Write out an array of variables:

We can apply across the rows to get the array:

and now we can apply down the columns to get an array.

We could also have done this by applying down the columns first and then across the rows. We now throw in extra commutativity laws stating that these two operations give equal results, whatever the operations and .

For example, if the theory has a binary multiplication operator and the theory has binary multiplication operator then commutativity requires . This is the usual interchange law. In this example the commutativity law is therefore an assertion about the equality of two arrows in .

The result of combining the two theories and , additionally throwing in these commutativity laws, is known as the product theory .

Product theories in Haskell
Although we can't get Haskell to enforce the commutativity rules we can get part way to defining product theories using type classes. We make our type an instance of both classes. For and defined as the theories of monoids and -sets above, a -algebra is given by a type that is an instance of both Monoid and MSet. Unfortunately we can't make Haskell automatically enforce the rule that elements be equal if the laws say they should be.

Coming soon
We've defined Lawvere theories. I've explained previously how many kinds of effects can be modelled by algebraic theories. I've defined a product on Lawvere theories. Which means we now have a way to combine effects. But for that you'll have to wait for the sequel.

Blog Archive