# A Neighborhood of Infinity

## Monday, January 09, 2017

### Building free arrows from components

Introduction

Gabriel Gonzalez has written quite a bit about the practical applications of free monads. And "haoformayor" wrote a great stackoverflow post on how arrows are related to strong profunctors. So I thought I'd combine these and apply them to arrows built from profunctors: free arrows. What you get is a way to use arrow notation to build programs, but defer the interpretation of those programs until later.

Heteromorphisms

Using the notation here I'm going to call an element of a type P a b, where P is a profunctor, a heteromorphism.

A product that isn't much of a product

As I described a while back you can compose profunctors. Take a look at the code I used, and also Data.Functor.Composition.

data Compose f g d c = forall a. Compose (f d a) (g a c)


An element of Compose f g d c is just a pair of heteromorphisms, one from each of the profunctors, f and g, with the proviso that the "output" type of one is compatible with the "input" type of the other. As products go it's pretty weak in the sense that no composition happens beyond the two objects being stored with each other. And that's the basis of what I'm going to talk about. The Compose type is just a placeholder for pairs of heteromorphisms whose actual "multiplication" is being deferred until later. This is similar to the situation with the free monoid, otherwise known as a list. We can "multiply" two lists together using mappend but all that really does is combine the elements into a bigger list. The elements themselves aren't touched in any way. That suggests the idea of using profunctor composition in the same way that (:) is used to pair elements and lists.

Free Arrows

Here's some code:

> {-# OPTIONS -W #-}
> {-# LANGUAGE ExistentialQuantification #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Monoid

> infixr :-

> data FreeA p a b = PureP (a -> b)
>                  | forall x. p a x :- FreeA p x b


First look at the second line of the definition of FreeA. It says that a FreeA p a b might be a pair consisting of a head heteromorphism whose output matches the input of another FreeA. There's also the PureP case which is acting like the empty list []. The reason we use this is that for our composition, (->) acts a lot like the identity. In particular Composition (->) p a b is isomorphic to p a b (modulo all the usual stuff about non-terminating computations and so on). This is because an element of this type is a pair consisting of a function a -> x and a heteromorphism p x b for some type x we don't get to see. We can't project back out either of these items without information about the type of x escaping. So the only thing we can possibly do is use lmap to apply the function to the heteromorphism giving us an element of p a b.

Here is a special case of PureP we'll use later:

> nil :: Profunctor p => FreeA p a a
> nil = PureP id


So an element of FreeA is a sequence of heteromorphisms. If heteromorphisms are thought of as operations of some sort, then an element of FreeA is a sequence of operations waiting to be composed together into a program that does something. And that's just like the situation with free monads. Once we've build a free monad structure we apply an interpreter to it to evaluate it. This allows us to separate the "pure" structure representing what we want to do from the code that actually does it.

The first thing to note is our new type is also a profunctor. We can apply lmap and rmap to a PureP function straightforwardly. We apply lmap directly to the head of the list and we use recursion to apply rmap to the PureP at the end:

> instance Profunctor b => Profunctor (FreeA b) where
>     lmap f (PureP g) = PureP (g . f)
>     lmap f (g :- h) = (lmap f g) :- h
>     rmap f (PureP g) = PureP (f . g)
>     rmap f (g :- h) = g :- (rmap f h)


We also get a strong profunctor by applying first' all the way down the list:

> instance Strong p => Strong (FreeA p) where
>     first' (PureP f) = PureP (first' f)
>     first' (f :- g) = (first' f) :- (first' g)


We can now concatenate our lists of heteromorphisms using code that looks a lot like the typical implementation of (++):

> instance Profunctor p => Category (FreeA p) where
>     id = PureP id
>     g . PureP f = lmap f g
>     k . (g :- h) = g :- (k . h)


Note that it's slightly different to what you might have expected compared to (++) because we tend to write composition of functions "backwards". Additionally, there is another definition of FreeA we could have used that's analogous to using snoc lists instead of cons lists.

And now we have an arrow. I'll leave the proofs that the arrow laws are obeyed as an exercise :-)

> instance (Profunctor p, Strong p) => Arrow (FreeA p) where
>     arr = PureP
>     first = first'


The important thing about free things is that we can apply interpreters to them. For lists we have folds:

foldr :: (a -> b -> b) -> b -> [a] -> b


In foldr f e we can think of f as saying how (:) should be interpreted and e as saying how [] should be interpreted.

Analogously, in Control.Monad.Free in the free package we have:

foldFree :: Monad m => (forall x . f x -> m x) -> Free f a -> m a
foldFree _ (Pure a)  = return a
foldFree f (Free as) = f as >>= foldFree f


Given a natural transformation from f to m, foldFree extends it to all of Free f.

> foldFreeA :: (Profunctor p, Arrow a) =>
>              (forall b c.p b c -> a b c) -> FreeA p b c -> a b c
> foldFreeA _ (PureP g) = arr g
> foldFreeA f (g :- h) = foldFreeA f h . f g


It's a lot like an ordinary fold but uses the arrow composition law to combine the interpretation of the head with the interpretation of the tail.

"Electronic" components

Let me revisit the example from my previous article. I'm going to remove things I won't need so my definition of Circuit is less general here. Free arrows are going to allow us to define individual components for a circuit, but defer exactly how those components are interpreted until later.

I'll use four components this time: a register we can read from, one we can write from and a register incrementer, as well as a "pure" component. But before that, let's revisit Gabriel's article that gives some clues about how components should be built. In particular, look at the definition of TeletypeF:

data TeletypeF x
= PutStrLn String x
| GetLine (String -> x)
| ExitSuccess


We use GetLine to read a string, and yet the type of GetLine k could be TeletypeF a for any a. The reason is that free monads work with continuations. Instead of GetLine returning a string to us, it's a holder for a function that says what we'd like to do with the string once we have it. That means we can leave open the question of where the string comes from. The function foldFree can be used to provide the actual string getter.

Free arrows are like "two-sided" free monads. We don't just provide a continuation saying what we'd like to do to our output. We also get to say how we prepare our data for input.

There's also some burden put on us. Free arrows need strong profunctors. Strong profunctors need to be able to convey extra data alongside the data we care about - that's what first' is all about. This means that even though Load is functionally similar to GetLine, it can't simply ignore its input. So we don't have Load (Int -> b), and instead have Load ((a, Int) -> b. Here is our component type:

> data Component a b = Load ((a, Int) -> b)
>                    | Store (a -> (b, Int))
>                    | Inc (a -> b)


The Component only knows about the data passing through, of type a and b. It doesn't know anything about how the data in the registers is stored. That's the part that will be deferred to later. We intend for Inc to increment a register. But as it doesn't know anything about registers nothing in the type of Inc refers to that. (It took a bit of experimentation for me to figure this out and there may be other ways of doing things. Often with code guided by category theory you can just "follow your nose" as there's one way that works and type checks. Here I found a certain amount of flexibility in how much you store in the Component and how much is deferred to the interpreter.)

I could implement the strong profunctor instances using various combinators but I think it might be easiest to understand when written explicitly with lambdas:

> instance Profunctor Component where
>     lmap f (Load g) = Load $\(a, s) -> g (f a, s) > lmap f (Store g) = Store (g . f) > lmap f (Inc g) = Inc (g . f) > rmap f (Load g) = Load (f . g) > rmap f (Store g) = Store$ \a -> let (b, t) = g a
>                                      in  (f b, t)
>     rmap f (Inc g) = Inc (f . g)

> instance Strong Component where
>     first' (Load g) = Load $\((a, x), s) -> (g (a, s), x) > first' (Store g) = Store$ \(a, x) -> let (b, t) = g a
>                                           in  ((b, x), t)
>     first' (Inc g) = Inc (first' g)


And now we can implement individual components. First a completely "pure" component:

> add :: Num a => FreeA Component (a, a) a
> add = PureP $uncurry (+)  And now the load and store operations. > load :: FreeA Component () Int > load = Load (\(_, a) -> a) :- nil > store :: FreeA Component Int () > store = Store (\a -> ((), a)) :- nil > inc :: FreeA Component a a > inc = Inc id :- nil  Finally we can tie it all together in a complete function using arrow notation: > test = proc () -> do > () <- inc -< () > a <- load -< () > b <- load -< () > c <- add -< (a, b) > () <- store -< c > returnA -< ()  At this point, the test object is just a list of operations waiting to be executed. Now I'll give three examples of semantics we could provide. The first uses a state arrow type similar to the previous article: > newtype Circuit s a b = C { runC :: (a, s) -> (b, s) } > instance Category (Circuit s) where > id = C id > C f . C g = C (f . g) > instance Arrow (Circuit s) where > arr f = C$ \(a, s) -> (f a, s)
>     first (C g) = C $\((a, x), s) -> let (b, t) = g (a, s) > in ((b, x), t)  Here is an interpreter that interprets each of our components as an arrow. Note that this is where, among other things, we provide the meaning of the Inc operation: > exec :: Component a b -> Circuit Int a b > exec (Load g) = C$ \(a, s) -> (g (a, s), s)
> exec (Store g) = C $\(a, _) -> g a > exec (Inc g) = C$ \(a, s) -> (g a, s+1)


Here's a completely different interpreter that is going to make you do the work of maintaining the state used by the resgisters. You'll be told what to do! We'll use the Kleisli IO arrow to do the I/O.

> exec' :: Component a b -> Kleisli IO a b
> exec' (Load g) = Kleisli $\a -> do > putStrLn "What is your number now?" > s <- fmap read getLine > return$ g (a, s)
> exec' (Store g) = Kleisli $\a -> do > let (b, t) = g a > putStrLn$ "Your number is now " ++ show t ++ "."
>     return b
> exec' (Inc g) = Kleisli $\a -> do > putStrLn "Increment your number." > return$ g a


The last interpreter is simply going to sum values associated to various components. They could be costs in dollars, time to execute, or even strings representing some kind of simple execution trace.

> newtype Labelled m a b = Labelled { unLabelled :: m }

> instance Monoid m => Category (Labelled m) where
>     id = Labelled mempty
>     Labelled a . Labelled b = Labelled (a mappend b)

> instance Monoid m => Arrow (Labelled m) where
>     arr _ = Labelled mempty
>     first (Labelled m) = Labelled m

> exec'' (Load _) = Labelled (Sum 1)
> exec'' (Store _) = Labelled (Sum 1)
> exec'' (Inc _) = Labelled (Sum 2)


Note that we can't assign non-trivial values to "pure" operations.

And now we execute all three:

> main = do
>     print $runC (foldFreeA exec test) ((), 10) > putStrLn "Your number is 10." >> runKleisli (foldFreeA exec' test) () > print$ getSum $unLabelled$ foldFreeA exec'' test



Various thoughts

I don't know if free arrows are anywhere near as useful as free monads, but I hope I've successfully illustrated one application. Note that because arrow composition is essentially list concatenation it may be more efficient to use a version of Hughes lists. This is what the Cayley representation is about in the monoid notions paper. But it's easier to see the naive list version first. Something missing from here that is essential for electronics simulation is the possibility of using loops. I haven't yet thought too much about what it means to build instances of ArrowLoop freely.

Profunctors have been described as decategorised matrices in the sense that p a b, with p a profunctor, is similar to the matrix . Or, if you're working in a context where you distinguish between co- and contravariant vectors, it's similar to . The Composition operation is a lot like the definition of matrix product. From this perspective, the FreeA operation is a lot like the function on matrices that takes to . To work with ArrowLoop we need a trace-like operation.

One nice application of free monads is in writing plugin APIs. Users can write plugins that link to a small library based on a free monad. These can then be dynamically loaded and interpreted by an application at runtime, completely insulating the plugin-writer from the details of the application. You can think of it as a Haskell version of the PIMPL idiom. Free arrows might give a nice way to write plugins for dataflow applications.

People typically think of functors as containers. So in a free monad, each element is a container of possible futures. In a free arrow the relationship between the current heteromorphism and its "future" (and "past") is a bit more symmetrical. For example, for some definitions of P, a heteromorphism P a b can act on some as to give us some bs. But some definitions of P can run "backwards" and act on elements of b -> r to give us elements of a -> r. So when I use the words "input" and "output" above, you might not want to take them too literally.

## Saturday, January 07, 2017

### Addressing Pieces of State with Profunctors

Attempted segue

Since I first wrote about profunctors there has been quite a bit of activity in the area so I think it's about time I revisited them. I could just carry on from where I left off 5 years ago but there have been so many tutorials on the subject that I think I'll have to assume you've looked at them. My favourite is probably Phil Freeman's Fun with Profunctors. What I intend to do here is solve a practical problem with profunctors.

The problem

Arrows are a nice mechanism for building circuit-like entities in code. In fact, they're quite good for simulating electronic circuits. Many circuits are very much like pieces of functional code. For example an AND gate like this



can be nicely modelled using a pure function: c = a && b. But some components, like flip-flops, have internal state. What comes out of the outputs isn't a simple function of the inputs right now, but depends on what has happened in the past. (Alternatively you can take the view that the inputs and outputs aren't the current values but the complete history of the values.)

We'll use (Hughes) arrows rather than simple functions. For example, one kind of arrow is the Kleisli arrow. For the case of Kleisli arrows built from the state monad, these are essentially functions of type a -> s -> (b, s) where s is our state. We can write these more symmetrically as functions of type (a, s) -> (b, s). We can think of these as "functions" from a to b where the output is allowed to depend on some internal state s. I'll just go ahead and define arrows like this right now.

First the extensions and imports:

> {-# OPTIONS -W #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Tuple


And now I'll define our stateful circuits. I'm going to make these slightly more general than I described allowing circuits to change the type of their state:

> newtype Circuit s t a b = C { runC :: (a, s) -> (b, t) }

> instance Category (Circuit s s) where
>     id = C id
>     C f . C g = C (f . g)

> instance Arrow (Circuit s s) where
>     arr f = C $\(a, s) -> (f a, s) > first (C g) = C$ \((a, x), s) -> let (b, t) = g (a, s)
>                                       in  ((b, x), t)


This is just a more symmetrical rewrite of the state monad as an arrow. The first method allows us to pass through some extra state, x, untouched.

Now for some circuit components. First the "pure" operations, a multiplier and a negater:

> mul :: Circuit s s (Int, Int) Int
> mul = C $\((x, y), s) -> (x*y, s) > neg :: Circuit s s Int Int > neg = C$ \(x, s) -> (-x, s)


And now some "impure" ones that read and write some registers as well as an accumulator:

> store :: Circuit Int Int Int ()
> store = C $\(x, _) -> ((), x) > load :: Circuit Int Int () Int > load = C$ \((), s) -> (s, s)

> accumulate :: Circuit Int Int Int Int
> accumulate = C $\(a, s) -> (a, s+a)  I'd like to make a circuit that has lots of these components, each with its own state. I'd like to store all of these bits of state in a larger container. But that means that each of these components needs to have a way to address its own particular substate. That's the problem I'd like to solve. Practical profunctor optics In an alternative universe lenses were defined using profunctors. To find out more I recommend Phil Freeman's talk that I linked to above. Most of the next paragraph is just a reminder of what he says in that talk and I'm going to use the bare minimum to do the job I want. Remember that one of the things lenses allow you to do is this: suppose we have a record s containing a field of type a and another similar enough kind of record t with a field of type b. Among other things, a lens gives a way to take a rule for modifying the a field to a b field and extend it to a way to modify the s record into a t record. So we can think of lenses as giving us functions of type (a -> b) -> (s -> t). Now if p is a profunctor then you can think of p a b as being a bit function-like. Like functions, profunctors typically (kinda, sorta) get used to consume (zero or more) objects of type a and output (zero or more) objects of type b. So it makes sense to ask our lenses to work with these more general objects too, i.e. we'd like to be able to get something of type p a b -> p s t out of a lens. A strong profunctor is one that comes pre-packed with a lens that can do this for the special case where the types s and t are 2-tuples. But you can think of simple records as being syntactic sugar for tuples of fields, so strong profunctors also automatically give us lenses for records. Again, watch Phil's talk for details. So here is our lens type: > type Lens s t a b = forall p. Strong p => p a b -> p s t  Here are lenses that mimic the well known ones from Control.Lens: > _1 :: Lens (a, x) (b, x) a b > _1 = first' > _2 :: Lens (x, a) (x, b) a b > _2 = dimap swap swap . first'  (Remember that dimap is a function to pre- and post- compose a function with two others.) Arrows are profunctors. So Circuit s s, when wrapped in WrappedArrow, is a profunctor. So now we can directly use the Circuit type with profunctor lenses. This is cool, but it doesn't directly solve our problem. So we're not going to use this fact. We're interested in addressing the state of type s, not the values of type a and b passed through our circuits. In other words, we're interested in the fact that Circuit s t a b is a profunctor in s and t, not a and b. To make this explicit we need a suitable way to permute the arguments to Circuit: > newtype Flipped p s t a b = F { unF :: p a b s t }  (It was tempting to call that ComedyDoubleAct.) And now we can define: > instance Profunctor (Flipped Circuit a b) where > lmap f (F (C g)) = F$ C $\(a, s) -> g (a, f s) > rmap f (F (C g)) = F$ C $\(a, s) -> let (b, t) = g (a, s) > in (b, f t) > instance Strong (Flipped Circuit a b) where > first' (F (C g)) = F$ C $\(a, (s, x)) -> let (b, t) = g (a, s) > in (b, (t, x))  Any time we want to use this instance of Profunctor with a Circuit we have to wrap everything with F and unF. The function dimap gives us a convenient way to implement such wrappings. Let's implement an imaginary circuit with four bits of state in it.   Here is the state: > data CPU = CPU { _x :: Int, _y :: Int, _z :: Int, _t :: Int } deriving Show  As I don't have a complete profunctor version of a library like Control.Lens with its template Haskell magic I'll set things up by hand. Here's a strong-profunctor-friendly version of the CPU and a useful isomorphism to go with it: > type ExplodedCPU = (Int, (Int, (Int, Int))) > explode :: CPU -> ExplodedCPU > explode (CPU u v w t) = (u, (v, (w, t))) > implode :: ExplodedCPU -> CPU > implode (u, (v, (w, t))) = CPU u v w t  And now we need adapters that take lenses for an ExplodedCPU and (1) apply them to a CPU the way Control.Lens would... > upgrade :: Profunctor p => > (p a a -> p ExplodedCPU ExplodedCPU) -> > (p a a -> p CPU CPU) > upgrade f = dimap explode implode . f > x, y, z, t :: Flipped Circuit a b Int Int -> Flipped Circuit a b CPU CPU > x = upgrade _1 > y = upgrade$ _2 . _1
> z = upgrade $_2 . _2 . _1 > t = upgrade$ _2 . _2 . _2


...and (2) wrap them so they can be used on the flipped profunctor instance of Circuit:

> (!) :: p s t a b -> (Flipped p a b s t -> Flipped p a b s' t') ->
>        p s' t' a b
> x ! f = dimap F unF f x


After all that we can now write a short piece of code that represents our circuit. Notice how we can apply the lenses x, ..., t directly to our components to get them to use the right pieces of state:

> test :: Circuit CPU CPU () ()
> test = proc () -> do
>     a  <- load ! x       -< ()
>     b  <- load ! y       -< ()
>     c  <- mul            -< (a, b)
>     d  <- neg            -< c
>     e  <- accumulate ! t -< d
>     () <- store ! z      -< e

>     returnA              -< ()

> main :: IO ()
> main = do
>     print \$ runC test ((), CPU 2 30 400 5000)


Of course with a suitable profunctor lens library you can do a lot more, like work with traversable containers of components.

Note that we could also write a version of all this code using monads instead of arrows. But it's easier to see the symmetry in Flipped Circuit when using arrows, and it also sets the scene for the next thing I want to write about...

## Sunday, October 16, 2016

### Expectation-Maximization with Less Arbitrariness

Introduction

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

1. Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
2. Substituting in the middle of an expression. Again, you can use just about anywhere. Why choose this at this time? Similarly I found derivations that insert a into an expression.
3. Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.

First assume no hidden variables. We assume there is a vector of parameters that defines some model. We make some observations . We have a probability density that depends on . The likelihood of given the observations is . The maximum likelhood estimator for is the choice of that maximises for the we have observed.

Now suppose there are also some variables that we didn't get to observe. We assume a density . We now have

where we sum over all possible values of . The MLE approach says we now need to maximise
One of the things that is a challenge here is that the components of might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of , then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood
Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise
One reason this would be to our advantage is that often takes the form where is a simple function to optimise. In addition, may break up as a sum of terms, each with its own block of 's. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the for each is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.

Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose is a differentiable function in a neighbourhood of . Then around we have

We can try optimising with respect to within a neighbourhood of . If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing because it matches both the value and derivatives of at . We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.

We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.

Write

The are constants we'll determine. We want to match the derivatives on either side of the at :
On the other hand we have

To achieve equality we want to make these expressions match. We choose

Our desired proxy function is:

So the procedure is to take an estimated and obtain a new estimate by optimising this proxy function with respect to . This is the standard EM algorithm.

It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise too far'' and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.

Afterword

As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

you can iterate, at each step computing
where is the previous iteration. If the take a convenient form then this may turn out to be much easier.

Note

This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.