### A tail we don't need to wag

**Introduction**

I've been reading a little about concentration inequalities recently. I thought it would be nice to see if you can use the key idea, if not the actual theorems, to reduce the complexity of computing the probability distribution of the outcome of stochastic simulations. Examples might include random walks, or queues.

The key idea behind concentration inequalities is that very often most of the probability is owned by a small proportion of the possible outcomes.
For example, if we toss a fair coin enough (say ) times we expect the number of heads to lie within of the mean about 99.99% of the time despite there being different total numbers possible.
The probable outcomes tend to concentrate around the expectation.
On the other hand, if we consider not the total number of heads, but the possible sequences of tosses, there are possibilities, all equally likely.
In this case there is no concentration.
So a key ingredient here is a reduction operation: in this case reducing a sequence of tosses to a count of the number that came up heads.
This is something we can use in a computer program.

I (and many others) have written about the "vector space" monad that can be used to compute probability distributions of outcomes of simulations and I'll assume some familiarity with that.
Essentially it is a "weighted list" monad which is similar to the list monad except that in addition to tracking all possible outcomes, it also propagates a probability along each path.
Unfortunately it needs to follow through every possible path through a simulation.
For example, in the case of simulating coin tosses it needs to track different possiblities, even though we're only interested in the possible sums.
If, after each bind operation of the monad, we could collect together all paths that give the same total then we could make this code much more efficient.
The catch is that to collect together elements of a type the elements need to be comparable, for example instances of `Eq` or `Ord`. This conflicts with the type of `Monad` which requires that we can use the `>>= :: m a -> (a -> m b) -> m b` and `return :: a -> m a` functions with any types `a` and `b`.

I'm going to deal with this by adapting a technique presented by Oleg Kiselyov for efficiently implementing the Set monad.
Instead of `Set` I'm going to use the `Map` type to represent probability distributions.
These will store maps saying, for each element of a type, what the probability of that element is.
So part of my code is going to be a direct translation of that code to use the `Map` type instead of the `Set` type.

> {-# LANGUAGE GADTs, FlexibleInstances #-} > {-# LANGUAGE ViewPatterns #-}The following code is very similar to Oleg's. But for first reading I should point out some differences that I want you to ignore. The type representing a probability distribution is

> module Main where

> import Control.Monad > import Control.Arrow > import qualified Data.Map as M > import qualified Data.List as L

`P`:

> data P p a where > POrd :: Ord a => p -> M.Map a p -> P p a > PAny :: p -> [(a, p)] -> P p aBut note how the constructors take two arguments - a number that is a probability, in addition to a weighted

`Map`or list. For now pretend that first argument is zero and that the functions called

`trimXXX`act similarly to the identity:

> instance (Ord p, Num p) => Functor (P p) where > fmap = liftMThis is a translation of Oleg's crucial function that allows us to take a weighted list of probability distributions and flatten them down to a single probability distribution:

> instance (Ord p, Num p) => Applicative (P p) where > pure = return > (<*>) = ap

> instance (Ord p, Num p) => Monad (P p) where > return x = PAny 0 [(x, 1)] > m >>= f = > let (e, pdf) = unP m > in trimAdd e $ collect $ map (f *** id) pdf

> returnP :: (Ord p, Num p, Ord a) => a -> P p a > returnP a = POrd 0 $ M.singleton a 1

> unP :: P p a -> (p, [(a, p)]) > unP (POrd e pdf) = (e, M.toList pdf) > unP (PAny e pdf) = (e, pdf)

> fromList :: (Num p, Ord a) => [(a, p)] -> M.Map a p > fromList = M.fromListWith (+)

> union :: (Num p, Ord a) => M.Map a p -> M.Map a p -> M.Map a p > union = M.unionWith (+)

> scaleList :: Num p => p -> [(a, p)] -> [(a, p)] > scaleList weight = map (id *** (weight *))

> scaleMap :: (Num p, Ord a) => p -> M.Map a p -> M.Map a p > scaleMap weight = fromList . scaleList weight . M.toList

> collect :: Num p => [(P p a, p)] -> P p a > collect [] = PAny 0 [] > collect ((POrd e0 pdf0, weight) : rest) = > let wpdf0 = scaleMap weight pdf0 > in case collect rest of > POrd e1 pdf1 -> POrd (weight*e0+e1) $ wpdf0 `union` pdf1 > PAny e1 pdf1 -> POrd (weight*e0+e1) $ wpdf0 `union` fromList pdf1 > collect ((PAny e0 pdf0, weight) : rest) = > let wpdf0 = scaleList weight pdf0 > in case collect rest of > POrd e1 pdf1 -> POrd (weight*e0+e1) $ fromList wpdf0 `union` pdf1 > PAny e1 pdf1 -> PAny (weight*e0+e1) $ wpdf0 ++ pdf1But now I really must explain what the first argument to

`POrd`and

`PAny`is and why I have all that "trimming".

Even though the `collect` function allows us to reduce the number of elements in our PDFs, we'd like to take advantage of concentration of probability to reduce the number even further.
The `trim` function keeps only the top probabilities in a PDF, discarding the rest.
To be honest, this is the only point worth taking away from what I've written here :-)

When we throw away elements of the PDF our probabilities no longer sum to 1.
So I use the first argument of the constructors as a convenient place to store the amount of probability that I've thrown away.
The `trim` function keeps the most likely outcomes and sums the probability of the remainder.
I don't actually need to keep track of what has been discarded.
In principle we could reconstruct this value by looking at how much the probabilities in our trimmed partial PDFs fall short of summing to 1.
But confirming that our discarded probability and our partial PDF sums to 1 gives a nice safety check for our code and can give us some warning if numerical errors start creeping in.
I'll call the total discarded probability the *tail* probability.

Here is the core function to keep the top values.
In this case is given by a global constant called `trimSize`.
(I'll talk about how to do this better later.)

> trimList :: (Ord p, Num p) => [(a, p)] -> (p, [(a, p)]) > trimList ps = > let (keep, discard) = L.splitAt trimSize (L.sortOn (negate . snd) ps) > in (sum (map snd discard), keep)And now some functions representing textbook probability distributions. First the uniform distribution on a finite set. Again this is very similar to Oleg's

> trimAdd :: (Ord p, Num p) => p -> P p a -> P p a > trimAdd e' (POrd e pdf) = > let (f, trimmedPdf) = trimList (M.toList pdf) > in POrd (e'+e+f) (M.fromList trimmedPdf) > trimAdd e' (PAny e pdf) = > let (f, trimmedPdf) = trimList pdf > in PAny (e'+e+f) trimmedPdf

> runP :: (Num p, Ord a) => P p a -> (p, M.Map a p) > runP (POrd e pdf) = (e, pdf) > runP (PAny e pdf) = (e, fromList pdf)

`chooseOrd`function apart from the fact that it assigns weights to each element:

> chooseP :: (Fractional p, Ord p, Ord a) => > [a] -> P p a > chooseP xs = let p = 1/fromIntegral (length xs) > in POrd 0 $ fromList $ map (flip (,) p) xsAnd the Bernoulli distribution, i.e. tossing a

`Bool`coin that comes up

`True`with probability :

> bernoulliP :: (Fractional p, Ord p) => > p -> P p Bool > bernoulliP p = POrd 0 $ fromList $ [(False, 1-p), (True, p)]Now we can try a random walk in one dimension. At each step we have a 50/50 chance of standing still or taking a step to the right:

> random_walk1 :: Int -> P Double Int > random_walk1 0 = returnP 0 > random_walk1 n = do > a <- random_walk1 (n-1) > b <- chooseP [0, 1] > returnP $ a+bBelow in

`main`we take 2048 steps but only track 512 probabilities. The tail probability in this case is about . So only tracking 1/4 of the outcomes has had almost no impact on the numbers. This also illustrates why it is good to track the tail probabilities rather than inferring them from the missing probabilities in the bulk of the PDF - they can be so small they vanish compared to floating poimnt errors. We can afford to track a lot fewer than 512 (out of 2049 possible) outcomes and still have a good representative PDF.

Now here's a two-dimensional random walk for 32 steps. The tail probability is about so we are getting a reasonably representative PDF. We have to run fewer steps than before, however, because the space of possible outcomes spans two dimensions, meaning that reduction doesn't help as much as it does in one dimension.

> random_walk2 :: Int -> (Int, Int) -> P Double (Int, Int) > random_walk2 0 (x, y) = returnP (x, y) > random_walk2 n (x, y) = do > (x',y') <- random_walk2 (n-1) (x, y) > dx <- chooseP [-1, 1] > dy <- chooseP [-1, 1] > returnP (x'+dx, y'+dy)One last simulation. This is a queing scenario. Tasks come in once every tick of the clock. There are four queues a task can be assigned to. A task is assigned to the shortest queue. Meanwhile each queue as a 1/4 probability of clearing one item at each tick of the clock. We build the PDF for the maximum length any queue has at any time.

The first argument to

`queue`is the number of ticks of the clock. The second argument is the list of lengths of the queues. It returns a PDF, not just on the current queue size, but also on the longest queue it has seen.

> queue :: Int -> [Int] -> P Double (Int, [Int]) > queue 0 ls = returnP (maximum ls, ls) > queue n ls = do > (longest, ls1) <- queue (n-1) ls > ls2 <- forM ls1 $ \l -> do > served <- bernoulliP (1/4) > returnP $ if served && l > 0 then l-1 else l > let ls3 = L.sort $ head ls2+1 : tail ls2 > returnP (longest `max` maximum ls3, ls3)For the queing simulation the tail probability is around despite the fact that we have discarded a vast possible set of possible outcomes.

It's a little ugly that

`trimSize`is a global constant:

> trimSize = 512The correct solution is probably to separate the probability "syntax" from its "semantics". In other words, we should implement a free monad supporting the language of probability with suitable constructors for

`bernoulliP`and

`choiceP`. We can then write a separate interpreter which takes a

`trimSize`as argument. This has another advantage too: the

`Monad`above isn't a true monad. It uses a greedy approach to discarding probabilities and different rearrangements of the code, that ought to give identical results, may end up diferent. By using a free monad we ensure that our interface is a true monad and we can put the part of the code that breaks the monad laws into the interpreter. The catch is that my first attempt at writing a free monad resulted in code with poor performance. So I'll leave an efficient version as an exercise :-)

> main = do > print $ runP $ random_walk1 2048 > print $ runP $ random_walk2 32 (0, 0) > print $ runP $ do > (r, _) <- queue 128 [0, 0, 0, 0] > returnP r