Tuesday, June 06, 2017

A relaxation technique


Introduction

Sometimes you want to differentiate the expected value of something. I've written about some tools that can help with this. For example you can use Automatic Differentiation for the derivative part and probability monads for the expectation. But the probability monad I described in that article computes the complete probability distribution for your problem. Frequently this is intractably large. Instead people often use Monte Carlo methods. They'll compute the "something" many times, substituting pseudo-random numbers for the random variables, and then average the results. This provides an estimate of the expected value and is ubiquitous in many branches of computer science. For example it's the basis of ray-tracing and path-tracing algorithms in 3D rendering, and plays a major role in machine learning when used in the form of stochastic gradient descent.


But there's a catch. Suppose we want to compute $E[f(X_1,\ldots,X_n)]$ where each of the $X_i$ belong to the Bernoulli distribution $B(p)$. I.e. each $X_i$ has a probability $p$ of being 1 and probability $1-p$ of being 0. If we compute this using a Monte Carlo approach we'll repeatedly generate pseudo-random numbers for each of the $X_i$. Each one will be 0 or 1. This means that our estimate depends on $p$ via subexpressions that can't meaningfully be differentiated with respect to $p$. So how can we use automatic differentiation with the Monte Carlo method? I'm proposing an approach that may or may not already be in the literature. Whether it is or not, I think it's fun to get there by combining many of the things I've previously talked about here, such as free monads, negative probabilities and automatic differentiation. I'm going to assume you're familiar with using dual numbers to compute derivatives as I've written about this before and wikipedia has the basics.



A probability monad


I want to play with a number of different approaches to using monads with probability theory. Rather than define lots of monads I think that the easiest thing is to simply work with one free monad and then provide different interpreters for it.


First some imports:


> import Control.Monad
> import qualified System.Random as R
> import qualified Data.Map.Strict as M


I'm going to use a minimal free monad that effectively gives us a DSL with a new function that allows us to talk about random Bernoulli variables:


> data Random p a = Pure a | Bernoulli p (Int -> Random p a)


The idea is that Pure a represents the value a and Bernoulli p f is used to say "if we had a random value x, f x is the value we're interested in". The Random type isn't going to do anything other than represent these kinds of expressions. There's no implication that we actually have a random value for x yet.
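
As a small illustration (this value isn't in the original post, and the name handBuilt is made up), here is one of these expressions written out by hand, without any monad machinery:

> -- Not part of the original post: a hand-built Random value meaning "sample a
> -- Bernoulli(1/2) variable and add one to it". Nothing is sampled yet; this is
> -- just a description of a computation that mentions one random variable.
> handBuilt :: Random Float Int
> handBuilt = Bernoulli 0.5 (\x -> Pure (x + 1))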


> instance Functor (Random p) where
>     fmap f (Pure a) = Pure (f a)
>     fmap f (Bernoulli p g) = Bernoulli p (fmap f . g)


> instance Applicative (Random p) where
>     pure = return
>     (<*>) = ap


> instance Monad (Random p) where
>     return = Pure
>     Pure a >>= f = f a
>     Bernoulli p g >>= f = Bernoulli p (\x -> g x >>= f)


We'll use bernoulli p to represent a random Bernoulli variable drawn from $B(p)$.


> bernoulli :: p -> Random p Int
> bernoulli p = Bernoulli p return


So let's write our first random expression:


> test1 :: Random Float Float
> test1 = do
>     xs <- replicateM 4 (bernoulli 0.75)
>     return $ fromIntegral $ sum xs


It sums 4 Bernoulli random variables drawn from $B(3/4)$ and converts the result to a Float. The expected value is 3.


We don't yet have a way to do anything with this expression. So let's write an interpreter that can substitute pseudo-random values for each occurrence of bernoulli p:


It's essentially interpreting our free monad as a state monad where the state is the random number seed:


> interpret1 :: (Ord p, R.Random p, R.RandomGen g) => Random p a -> g -> (a, g)
> interpret1 (Pure a) seed = (a, seed)
> interpret1 (Bernoulli prob f) seed = 
>     let (r, seed') = R.random seed
>         b       = if r <= prob then 1 else 0
>     in interpret1 (f b) seed'


You can use the expression R.getStdRandom (interpret1 test1) if you want to generate some random samples for yourself.


We're interested in the expected value, so here's a function to compute that:


> expect1 :: (Fractional p, Ord p, R.Random p, R.RandomGen g) => Random p p -> Int -> g -> (p, g)
> expect1 r n g = 
>     let (x, g') = sum1 0 r n g
>     in (x/fromIntegral n, g')


> sum1 :: (Ord p, Num p, R.Random p, R.RandomGen g) => p -> Random p p -> Int -> g -> (p, g)
> sum1 t r 0 g = (t, g)
> sum1 t r n g =
>     let (a, g') = interpret1 r g
>     in sum1 (t+a) r (n-1) g'


You can test it out with R.getStdRandom (expect1 test1 1000). You should get values around 3.


We can try completely different semantics for Random. This time we compute the entire probability distribution:


> interpret2 :: (Num p) => Random p a -> [(a, p)]
> interpret2 (Pure a) = [(a, 1)]
> interpret2 (Bernoulli p f) =
>     scale p (interpret2 (f 1)) ++ scale (1-p) (interpret2 (f 0))


> scale :: Num p => p -> [(a, p)] -> [(a, p)]
> scale s = map (\(a, p) -> (a, s*p))


You can try it with interpret2 test1.


Unfortunately, as it stands it doesn't collect together multiple occurrences of the same value. We can do that with this function:


> collect :: (Ord a, Num b) => [(a, b)] -> [(a, b)]
> collect = M.toList . M.fromListWith (+)


And now you can use collect (interpret2 test1).


Let's compute some expected values:


> expect2 :: (Num p) => Random p p -> p
> expect2 r = sum $ map (uncurry (*)) (interpret2 r)


The value of expect2 test1 should be exactly 3. One nice thing about interpret2 is that it is differentiable with respect to the Bernoulli parameter when this is meaningful. Unfortunately it has one very big catch: the value of interpret2 can be a very long list. Even a small simulation can result in lists too big to store in the known universe. But interpret1 doesn't produce differentiable results. Is there something in-between these two interpreters?
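
To get a feel for the blow-up (this check isn't in the original post, and the name branchCount is made up), note that interpret2 produces one entry per branch, so twenty Bernoulli draws already give $2^{20}$ entries before collect merges anything:

> -- Not part of the original post: a count of the branches interpret2 explores.
> -- Every bernoulli doubles the number of entries, so this evaluates to 2^20.
> branchCount :: Int
> branchCount = length (interpret2 (replicateM 20 (bernoulli (0.5 :: Float))))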



Importance sampling

Frequently in Monte Carlo sampling it isn't convenient to sample from the distribution you want. For example it might be intractably hard to do so, or you might have proven that the resulting estimate has a high variance. So instead you can sample from a different, but possibly related distribution. This is known as importance sampling. Whenever you do this you must keep track of how "wrong" your probability was and patch up your expectation estimate at the end. For example, suppose a coin comes up heads 3/4 of the time. Instead of simulating a coin toss that comes up heads 3/4 of the time you could simulate one that comes up heads half of the time. Suppose at one point in the simulation it does come up heads. Then you used a probability of 1/2 when you should have used 3/4. So when you compute the expectation you need to scale the contribution from this sample by (3/4)/(1/2) = 3/2. You need to scale appropriately for every random variable used. A straightforward way to see this for the case of a single Bernoulli variable is to note that

$p\,f(1) + (1-p)\,f(0) = q\,\frac{p}{q}\,f(1) + (1-q)\,\frac{1-p}{1-q}\,f(0)$.

We've replaced probabilities $p$ and $1-p$ with $q$ and $1-q$ but we had to scale appropriately in each of the cases $X=1$ and $X=0$ to keep the final value the same. I'm going to call the scale value the importance. If we generate $n$ random numbers in a row we need to multiply all of the $n$ importance values that we generate. This is a perfect job for the Writer monad using the Product monoid. (See Eric Kidd's paper for some discussion about the connection between Writer and importance sampling.) However I'm just going to write an explicit interpreter for our free monad to make it clear what's going where.
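
Here's a tiny standalone sketch of that coin example (not in the original post; coinImportance is a made-up name): we sample a fair coin but weight heads by (3/4)/(1/2) and tails by (1/4)/(1/2), and the average converges to 3/4, the expected value of a $B(3/4)$ variable.

> -- Not part of the original post: estimate E[X] for X ~ B(3/4) while sampling
> -- a fair coin, re-weighting each outcome by (true prob)/(sampling prob).
> coinImportance :: Int -> IO Float
> coinImportance n = do
>     us <- replicateM n (R.randomRIO (0, 1 :: Float))
>     let weighted u = if u <= 0.5 then 1 * (0.75 / 0.5) else 0 * (0.25 / 0.5)
>     return (sum (map weighted us) / fromIntegral n)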


This interpreter is going to take an additional argument as input. It'll be a rule saying what probability we should sample with when handling a variable drawn from $B(p)$. The probability should be a real number in the interval $[0,1]$.


> interpret3 :: (Fractional p, R.RandomGen g) =>
>               (p -> Float) -> Random p a -> g -> ((a, p), g)
> interpret3 rule (Pure a) g = ((a, 1), g)
> interpret3 rule (Bernoulli p f) g = 
>     let (r, g') = R.random g
>         prob = rule p
>         (b, i)  = if (r :: Float) <= prob
>           then (1, p/realToFrac prob)
>           else (0, (1-p)/realToFrac (1-prob))
>         ((a, i'), g'') = interpret3 rule (f b) g'
>     in ((a, i*i'), g'')


Here's the accompanying code for the expectation:


> expect3 :: (Fractional p, R.RandomGen g) =>
>            (p -> Float) -> Random p p -> Int -> g -> (p, g)
> expect3 rule r n g = 
>     let (x, g') = sum3 rule 0 r n g
>     in (x/fromIntegral n, g')


> sum3 :: (Fractional p, R.RandomGen g) =>
>         (p -> Float) -> p -> Random p p -> Int -> g -> (p, g)
> sum3 rule t r 0 g = (t, g)
> sum3 rule t r n g =
>     let ((a, imp), g') = interpret3 rule r g
>     in sum3 rule (t+a*imp) r (n-1) g'


For example, you can estimate the expectation of test1 using unbiased coin tosses by evaluating R.getStdRandom (expect3 (const 0.5) test1 1000).



Generalising probability

Did you notice I made my code slightly more general than seems to be needed? Although I use probabilities of type Float to generate my Bernoulli samples, the argument to the function bernoulli can be of a more general type. This means that we can use importance sampling to compute expected values for generalised measures that take values in a more general algebraic structure than the interval [0,1]. For example, we could use negative probabilities. An Operational Interpretation of Negative Probabilities and No-Signalling Models by Abramsky and Brandenburger gives a way to interpret expressions involving negative probabilities. We can implement it using interpret3 and the rule \p -> abs p/(abs p+abs (1-p)). Note that it is guaranteed to produce values in the range [0,1] (if you start with dual numbers with real parts that are ordinary probabilities) and reproduces the usual behaviour when given ordinary probabilities.
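
Written out as a named function (not in the original post; signedRule is a made-up name), the rule looks like this:

> -- Not part of the original post: the sampling rule for signed probabilities
> -- written out as a named function. It always lands in [0,1] and equals p
> -- whenever 0 <= p <= 1.
> signedRule :: Float -> Float
> signedRule p = abs p / (abs p + abs (1 - p))

With this name, R.getStdRandom (expect3 signedRule test2 1000) is the same estimate as the expression given below. For the "probability" 2 the rule samples a 1 with probability 2/3, and interpret3 then attaches an importance of 2/(2/3) = 3 to a 1 and (1-2)/(1-2/3) = -3 to a 0, so the negative weight survives into the final average.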


Here's a simple expression using a sample from "$B(2)$":


> test2 = do
>     a <- bernoulli 2
>     return $ if a==1 then 2.0 else 1.0


Its expected value is 3. We can get this exactly using expect2 test2. For a Monte Carlo estimate use


R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)


Note that estimates involving negative probabilities can have quite high variances so try a few times until you get something close to 3 :-)


We don't have to stick with real numbers. We can use this approach to estimate with complex probabilities (aka quantum mechanics) or other algebraic structures.



Discrete yet differentiable

And now comes the trick: automatic differentiation uses the algebra of dual numbers. It's not obvious at all what a probability like $p+\epsilon$ means when $\epsilon$ is infinitesimal. However, we can use interpret3 to give it meaningful semantics.


Let's define the dual numbers in the usual way first:


> data Dual a = D { real :: a, infinitesimal :: a }


> instance (Ord a, Num a) => Num (Dual a) where
>     D a b + D a' b' = D (a+a') (b+b')
>     D a b * D a' b' = D (a*a') (a*b'+a'*b)
>     negate (D a b) = D (negate a) (negate b)
>     abs (D a b) = if a > 0 then D a b else D (-a) (-b)
>     signum (D a b) = D (signum a) 0
>     fromInteger a = D (fromInteger a) 0


> instance (Ord a, Fractional a) => Fractional (Dual a) where
>     fromRational a = D (fromRational a) 0
>     recip (D a b) = let ia = 1/a in D ia (-b*ia*ia)


> instance Show a => Show (Dual a) where
>     show (D a b) = show a ++ "[" ++ show b ++ "]"


Now we can use the rule real to extract a real-valued probability from a dual number. The function expect3 will push the infinitesimal part into the importance value so it doesn't get forgotten about. And now expect3 gives us an estimate that is differentiable despite the fact that our random variables are discrete.
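
To see concretely where the infinitesimal goes (a check that isn't in the original post; dualImportances is a made-up name), consider a single bernoulli (D 0.5 1) sampled with probability real (D 0.5 1) = 0.5. The importance for a 1 is (D 0.5 1)/(D 0.5 0) = D 1 2 and for a 0 it is (1 - D 0.5 1)/(D 0.5 0) = D 1 (-2): the real parts are 1, and the infinitesimal parts are the values of d/dp log p and d/dp log (1-p) at p = 1/2.

> -- Not part of the original post: the two importance weights that interpret3
> -- attaches to a single Bernoulli sample with dual parameter D 0.5 1, for the
> -- outcomes 1 and 0 respectively. They evaluate to D 1 2 and D 1 (-2).
> dualImportances :: (Dual Float, Dual Float)
> dualImportances = (D 0.5 1 / D 0.5 0, (1 - D 0.5 1) / D 0.5 0)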


Let's try an expression:


> test3 p = do
>     a <- bernoulli p
>     b <- bernoulli p
>     return $ if a == 1 && b == 1 then 1.0 else 0.0


The expected value is $p^2$ and the derivative is $2p$. We can evaluate at $p=\frac{1}{2}$ with expect2 (test3 (D 0.5 1)). And we can estimate it with


R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)


What's neat is that we can parameterise our distributions in a more complex way and we can freely mix with conventional expressions in our parameter. Here's an example:


> test4 p = do
>     a <- bernoulli p
>     b <- bernoulli (p*p)
>     return $ p*fromIntegral a*fromIntegral b


Try evaluating expect2 (test4 (D 0.5 1)) and
R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)


I've collected the above examples together here:


> main = do
>     print =<< R.getStdRandom (interpret1 test1)
>     print $ collect $ interpret2 test1
>     print =<< R.getStdRandom (expect1 test1 1000)
>     print (expect2 test1)
>     print =<< R.getStdRandom (expect3 id test1 1000)
>     print =<< R.getStdRandom (expect3 (const 0.5) test1 1000)
>     print "---"
>     print $ expect2 test2
>     print =<< R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)
>     print "---"
>     print $ expect2 (test3 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)
>     print "---"
>     print $ expect2 (test4 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)



What just happened?

You can think of a dual number as a real number that has been infinitesimally slightly deformed. To differentiate something we need to deform something. But we can't deform 0 or 1 and have them stay 0 or 1. So the trick is to embed probability sampling in something "bigger", namely importance sampling, where samples carry around an importance value. This bigger thing does allow infinitesimal deformations. And that allows differentiation. This process of turning something discrete into something continuously "deformable" is generally called relaxation.



Implementation details

I've made no attempt to make my code fast. However I don't think there's anything about this approach that's incompatible with performance. There's no need to use a monad. Instead you can track the importance value through your code by hand and implement everything in C. Additionally, I've previously written about the fact that for any trick involving forward mode AD there is another corresponding trick you can use with reverse mode AD. So this method is perfectly compatible with back-propagation. Note also that the dual number importances always have real part 1 which means you don't actually need to store them.
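
As a sketch of that monad-free point (not in the original post; handRolled is a made-up name), here is the two-coin example from test3 with the importance threaded through by hand:

> -- Not part of the original post: test3 with the importance tracked by hand.
> -- Each sample multiplies the running importance by p/real p or
> -- (1-p)/(1-real p), just as interpret3 does with the rule real.
> handRolled :: R.RandomGen g => Dual Float -> g -> (Dual Float, g)
> handRolled p g0 = (value * ia * ib, g2) where
>     (a, ia, g1) = draw g0
>     (b, ib, g2) = draw g1
>     value = if a == 1 && b == 1 then 1.0 else 0.0
>     draw g = let (u, g') = R.random g
>                  keep = u <= real p
>                  imp = if keep
>                          then p / realToFrac (real p)
>                          else (1 - p) / realToFrac (1 - real p)
>              in ((if keep then 1 else 0) :: Int, imp, g')

R.getStdRandom (handRolled (D 0.5 1)) draws one importance-weighted sample; averaging many of them estimates the same dual number as expect3 real (test3 (D 0.5 1)).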


The bad news is that the derivative estimate can sometimes have a high variance. Nonetheless, I've used it successfully for some toy optimisation problems. I don't know if this approach is effective for industrial strength problems. Your mileage may vary :-)



Alternatives

Sometimes you may find that it is acceptable to deform the samples from your discrete distribution. In that case you can use the Concrete relaxation.



Continuous variables

The above method can be adapted to work with continuous variables. There is a non-trivial step which I'll leave as an exercise but I've tested it in some Python code. I think it reproduces a standard technique and it gives an alternative way to think about that trick. That article is also useful for ways to deal with the variance issues. Note also that importance sampling is normally used itself as a variance reduction technique. So there are probably helpful ways to modify the rule argument to interpret3 to simultaneously estimate derivatives and keep the variance low.



Personal note

I've thought about this problem a couple of times over the years. Each time I've ended up thinking "there's no easy way to extend AD to work with random variables so don't waste any more time thinking about it". So don't listen to anything I say. Also, I like that this method sort of comes "for free" once you combine methods I've described previously.



Acknowledgements

I think it was Eric Kidd's paper on building probability monads that first brought to my attention that there are many kinds of semantics you can use with probability theory - i.e. there are many interpreters you can write for the Random monad. I think there is an interesting design space worth exploring here.



Answer to exercise

I set the continuous case as an exercise above. Here is a solution.


Suppose you're sampling from a distribution parameterised by $\theta$ with pdf $p(x;\theta)$. To compute the derivative with respect to $\theta$ you need to consider sampling from $p(x;\theta+\epsilon)$ where $\epsilon$ is an infinitesimal. We have

$p(x;\theta+\epsilon) = p(x;\theta)+\epsilon\frac{\partial}{\partial\theta}p(x;\theta) = p(x;\theta)\left(1+\epsilon\frac{\partial}{\partial\theta}\log p(x;\theta)\right)$.

As we don't know how to sample from a pdf with infinitesimals in it, we instead sample using $p(x;\theta)$ as usual, but use an importance of

$1+\epsilon\frac{\partial}{\partial\theta}\log p(x;\theta)$.

The coefficient of the $\epsilon$ gives the derivative. So we need to compute the expectation, scaling each sample with this coefficient. In other words, to estimate $\frac{\partial}{\partial\theta}E[f(X)]$ we use

$\frac{1}{N}\sum_{i=1}^{N}f(X_i)\frac{\partial}{\partial\theta}\log p(X_i;\theta)$,

where the $X_i$ are drawn from the original distribution. This is exactly what is described at Shakir Mohamed's blog.
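
As a quick sanity check of that formula (not in the original post; scoreEstimate and normal are made-up names), take $X\sim N(\theta,1)$ and $f(X)=X$, so the true derivative of $E[f(X)]$ with respect to $\theta$ is 1, and the score is $\frac{\partial}{\partial\theta}\log p(x;\theta)=x-\theta$:

> -- Not part of the original post: the estimator above for X ~ N(theta, 1)
> -- and f(X) = X, where the exact derivative d/dtheta E[f(X)] is 1 and the
> -- score is d/dtheta log p(x; theta) = x - theta.
> scoreEstimate :: Double -> Int -> IO Double
> scoreEstimate theta n = do
>     xs <- replicateM n (normal theta)
>     return (sum [x * (x - theta) | x <- xs] / fromIntegral n)
>   where
>     -- Box-Muller transform; u1 is kept away from 0 so that log u1 is finite.
>     normal mu = do
>         u1 <- R.randomRIO (1e-9, 1)
>         u2 <- R.randomRIO (0, 1)
>         return (mu + sqrt (-2 * log u1) * cos (2 * pi * u2))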



Final word

I managed to find the method in the literature. It's part of the REINFORCE method. For example, see equation (5) there.

Sunday, February 05, 2017

Logarithms and exponentials of functions


Introduction

A popular question in mathematics is this: given a function $f$, what is its "square root" $g$ in the sense that $g\circ g = f$? There are many questions about this on mathoverflow but it's also a popular subject in mathematics forums for non-experts. This question seems to have a certain amount of notoriety because it's easy to ask but hard to answer fully. I want to look at an approach that works nicely for formal power series, following from the Haskell code I wrote here. There are some methods for directly finding "functional square roots" for formal power series that start as $f(z) = z+a_2z^2+\ldots$, but I want to approach the problem indirectly. When working with real numbers we can find square roots, say, by using $\sqrt{x} = \exp(\frac{1}{2}\log x)$. I want to use an analogue of this for functions. So my goal is to make sense of the idea of the logarithm and exponential of a formal power series as composable functions. Warning: the arguments are all going to be informal.



Notation

There's potential for a lot of ambiguous notation here, especially as the usual mathematical notation for $n$th powers of trig functions is so misleading. I'm going to use $\circ$ for composition of functions and power series, and I'm going to use the notation $f^{\circ n}$ to mean the $n$th iterate of $f$. So $f^{\circ 2} = f\circ f$ and $f^{\circ(n+1)} = f\circ f^{\circ n}$. As I'll be working mostly in the ring of formal power series $R[[z]]$ for some ring $R$, I'll reserve the variable $z$ to refer only to the corresponding element in this ring. I'll also use formal power series somewhat interchangeably with functions. So $z$ can be thought of as representing the identity function. To make sure we're on the same page, here are some small theorems in this notation:

  1. $\sin^{\circ 2}(z) = \sin(\sin(z))$
  2. $(z+1)^{\circ n} = z+n$.
That last one simply says that adding one $n$ times is the same as adding $n$.


As I'm going to have ordinary logarithms and exponentials sitting around, as well as functional logarithms and exponentials, I'm going to introduce the notation $\mathrm{flog}$ for functional logarithm and $\mathrm{fexp}$ for functional exponentiation.



Preliminaries

The first goal is to define a non-trivial function $\mathrm{flog}$ with the fundamental property that

$\mathrm{flog}(f^{\circ n}) = n\,\mathrm{flog}(f)$.


First, let's note some basic algebraic facts. The formal power series form a commutative ring with operations $+$ and $\times$ (ordinary multiplication) and with additive identity $0$ and multiplicative identity $1$. The formal power series also form a ring-like algebraic structure with operation $+$ and partial operation $\circ$ with additive identity $0$ and multiplicative identity $z$. But it's not actually a ring or even a near-ring. Composition isn't defined for all formal power series and even when it's defined, we don't have distributivity. For example, in general $f\circ(g+h) \ne f\circ g+f\circ h$, after all there's no reason to expect $\sin(g+h)$ to equal $\sin(g)+\sin(h)$. We do have right-distributivity however, i.e.

$(f+g)\circ h = f\circ h+g\circ h$,

because

$(f+g)(h(z)) = f(h(z))+g(h(z))$,

more or less by definition of $+$.
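
Here's a quick numerical check of right-distributivity (not in the original post; rdistCheck is a made-up name), using the power series machinery defined at the end of this post. All of the printed coefficients should be zero:

> -- Not part of the original post: check that (f+g) ○ h matches (f ○ h) + (g ○ h)
> -- to twenty coefficients for a couple of arbitrary series.
> rdistCheck = do
>   let f = sin z
>       g = z * z
>       h = sin z
>   let lhs = (f + g) ○ h
>   let rhs = (f ○ h) + (g ○ h)
>   mapM_ print $ take 20 (lhs - rhs)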



We can't use power series on our power series

There's an obvious approach, just use power series of power series. So we might tentatively suggest that

$\mathrm{flog}(f) = (f-z)-\frac{1}{2}(f-z)^{\circ 2}+\frac{1}{3}(f-z)^{\circ 3}-\ldots$

Note that I consider $f-z$ rather than $f-1$ because $z$ is the multiplicative identity in our ring-like structure.


Unfortunately this doesn't work. The reason is this: if we try to use standard reasoning to show that the resulting function has the fundamental property we seek we end up using distributivity. We don't have distributivity.



Sleight of hand

There's a beautiful trick I spotted on mathoverflow recently that allows us to bring back distributivity. (I can't find the trick again, but when I do I'll come back and add a link and credit here.) Consider the function $R_f$ defined by $R_f(g) = g\circ f$. In other words $R_f$ is right-composition by $f$. (Ambiguity alert, I'm using $R$ here to mean right. It has nothing to do with the ring $R$ underlying our formal power series.) Because we have right-distributivity, $R_f$ is a bona fide linear operator on the space of formal power series. If you think of formal power series as being infinitely long vectors of coefficients then $R_f$ can be thought of as an infinitely sized matrix. This means that as long as we have convergence, we can get away with using power series to compute $\log R_f$ with the property that $\log(R_f^n) = n\log R_f$. Define:

$\mathrm{flog}(f) = (\log R_f)(z)$.

We have:

$\log R_f = (R_f-I)-\frac{1}{2}(R_f-I)^2+\frac{1}{3}(R_f-I)^3-\ldots$

where I'm using $I$ to mean the identity linear operator. And now we have:

$\mathrm{flog}(f) = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(R_f-I)^n(z)$.

But does it converge? Suppose $f$ is of the form $f = z+a_2z^2+a_3z^3+\ldots$. Then $(R_f-I)(g) = g\circ f-g$. The leading term in $g\circ f$ is the same as the leading term in $g$. So $R_f-I$ kills the first term of whatever it is applied to, which means that when we sum the terms in the series for $\mathrm{flog}(f)$, we only need the first $n$ terms to get a power series correct to $n$ coefficients. Reusing my code from here, I call this function flog. Here is its implementation:


> import Data.Ratio


> flog :: (Eq a, Fractional a) => [a] -> [a]
> flog f@(0 : 1 : _) =
>     flog' 1 (repeat 0) (0 : 1 : repeat 0)
>     where flog' n total term = take (n+1) total ++ (
>               drop (n+1) $
>               let pz = p term
>               in flog' (n+1) (total-map (((-1)^n / fromIntegral n) *) pz) pz)
>           p total = (total ○ f) - total


The take and drop are how I tell Haskell when the first $n+1$ coefficients have been exactly computed and so no more terms are necessary.


Does it work?


Here's an example using the twice iterated sin function:


> ex1 = do
>   let lhs = flog (sin (sin z))
>   let rhs = 2*flog (sin z)
>   mapM_ print $ take 20 (lhs-rhs)


Works to 20 coefficients. Dare we try an inverse function?


> ex2 = do
>   let lhs = flog (sin z)
>   let rhs = flog (asin z)
>   mapM_ print $ take 20 (lhs+rhs)


Seems to work!
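
Here's one more check of the fundamental property (not in the original post; exA is a made-up name), using a composition we know in closed form: $\frac{z}{1-z}$ composed with itself is $\frac{z}{1-2z}$, so their flogs should differ by a factor of two:

> exA = do
>   let lhs = flog (z/(1-2*z))
>   let rhs = 2*flog (z/(1-z))
>   mapM_ print $ take 20 (lhs-rhs)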



Exponentials

It's no good having logarithms if we can't invert them. One way to think about the exponential function is that

$\exp(x) = \lim_{n\to\infty}\left(1+\frac{x}{n}\right)^n$.

We get better and better approximations by writing the expression inside the limit as a product of more and more terms. We can derive the usual power series for $\exp$ from this, but only if right-distributivity holds. So let's try to use the above expression directly and get

$\mathrm{fexp}(f) = \lim_{n\to\infty}\left(z+\frac{f}{n}\right)^{\circ n}$.

Unfortunately, even though right-composition by $z+\frac{f}{n}$ is linear, $z+\frac{f}{n}$ itself isn't. So it's going to take some extra work to raise $z+\frac{f}{n}$ to the compositional power of $n$.


The good news is that we're dealing with the special case where $\frac{f}{n}$ is something small. We have

$g\circ\left(z+\frac{f}{n}\right) = g+\frac{1}{n}f\frac{dg}{dz}+O\!\left(\frac{1}{n^2}\right)$.

So $R_{z+\frac{f}{n}}$ is actually $I+\frac{1}{n}f\frac{d}{dz}$ modulo higher order terms. This gives us

$\mathrm{fexp}(f) = \lim_{n\to\infty}\left(I+\frac{1}{n}f\frac{d}{dz}\right)^n(z) = \exp\left(f\frac{d}{dz}\right)(z)$.

This is something we can implement using the power series for ordinary $\exp$:

$\mathrm{fexp}(f) = \sum_{n=0}^{\infty}\frac{1}{n!}\left(f\frac{d}{dz}\right)^n(z)$.
In code that becomes:


> fexp f@(0 : 0 : _) = fexp' f 0 z 1
> fexp' f total term n = take (n-1) total ++ drop (n-1)
>           (fexp' f (total+term) (map (/fromIntegral n) (f*d term)) (n+1))


Note how when we differentiate a power series we shift the coefficients down by one place. To counter the effect of that so as to ensure convergence we need $f$ to look like $a_2z^2+a_3z^3+\ldots$. Luckily this is exactly the kind of series $\mathrm{flog}$ gives us.


But does it successfully invert $\mathrm{flog}$? Let's try:


> ex3 = do
>   let lhs = sin z
>   let rhs = fexp (flog (sin z))
>   mapM_ print $ take 20 (lhs-rhs)


Now we can start computing fractional iterates. Square root first:


> ex4 = do
>   mapM_ print $ take 20 $ fexp (flog (sin z)/2)


That matches the results at A048602 and A048603.


Cube root:


> ex5 = do
>   mapM_ print $ take 20 $ fexp (flog (sin z)/3)


Matches A052132 and A052135.


And this gives an alternative to Lagrange inversion for computing power series for inverse functions:


> ex6 = do
>   let lhs = fexp (-flog (sin z))
>   let rhs = asin z
>   mapM_ print $ take 20 (lhs-rhs)



What's really going on with $\mathrm{fexp}$?

Let's approach $\mathrm{fexp}$ in a slightly different way. In effect, $\mathrm{fexp}(f)$ is the composition of lots of copies of $z+\frac{f}{n}$ with each other. So let's try composing these one at a time, with one composition every $\frac{1}{n}$ seconds. After one second we should have our final result. We can write this as:

$x_{t+\frac{1}{n}} = \left(z+\frac{f}{n}\right)\circ x_t$

and

$x_{t+\frac{1}{n}} = x_t+\frac{1}{n}\,f\circ x_t$ to first order.

So we're solving the differential equation:

$\frac{dx_t}{dt} = f(x_t)$

and

$x_0 = z$,

with $\mathrm{fexp}(f) = x_1$.


So $\mathrm{fexp}(f)$ is the function that solves one of the most fundamental differential equations. This also means I can use Mathematica to solve symbolically and check my results. For example, Mathematica says that the solution to

$\frac{dx}{dt} = \sin^2(x)$

and

$x(0) = z$

at $t=1$ is

$x(1) = \arctan\left(\frac{\tan z}{1-\tan z}\right)$,

so let's check:


> ex7 = do
>   let lhs = fexp ((sin z)^2)
>   let rhs = atan (tan z/(1-tan z))
>   mapM_ print $ take 20 (lhs-rhs)


I like this example because it leads to the generalized Catalan numbers A004148:


> ex8 = do
>     mapM_ print $ take 20 $ fexp (z^2/(1-z^2))


That suggests this question: what does $\mathrm{fexp}$ mean combinatorially? I don't have a straightforward answer but solving this class of differential equation motivated the original introduction, by Cayley, of the abstract notion of a tree. See here.



What is going on geometrically?

For those who know some differential geometry, the differential equation

$\frac{dx}{dt} = f(x)$

and

$x(0) = z$

describes a flow on the real line (or complex plane). You can think of $f$ as being a one-dimensional vector field describing how points move from time $t$ to $t+dt$. When we solve the differential equation we get integral curves that these points follow, and $\mathrm{fexp}(f)$ tells us where the points end up after one unit of time. So $\mathrm{fexp}$ is the exponential map. In fact, $\mathrm{fexp}(f)$ is essentially the exponential of the vector field $f\frac{d}{dz}$, where we're now using the differential geometer's notion of a vector field as a differential operator.
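
As a small illustration of this picture (not in the original post; exB is a made-up name), the vector field $f(x) = x^2$ has flow $x_t = \frac{z}{1-tz}$, so after one unit of time the points have moved according to $\frac{z}{1-z}$, and fexp (z*z) should reproduce that series:

> exB = do
>   let lhs = fexp (z*z)
>   let rhs = z/(1-z)
>   mapM_ print $ take 20 (lhs-rhs)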



Final word

Unfortunately the power series you get from using $\mathrm{flog}$ and $\mathrm{fexp}$ don't always have good convergence properties. For example, I'm not sure but I think the series for the functional square root of $\sin$ has radius of convergence zero. If you truncate the series you get a half-decent approximation to a square root in the vicinity of the origin, but the approximation gets worse, not better, if you use more terms.



And the rest of the code


> (*!) _ 0 = 0
> (*!) a b = a*b
> (!*) 0 _ = 0
> (!*) a b = a*b
> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b


> ~(a:as) ⊗ (b:bs) = (a *! b):
>     ((map (a !*) bs) ^+ (as ⊗ (b:bs)))
> (○) (f:fs) (0:gs) = f:(gs ⊗ (fs ○ (0:gs)))
> inverse (0:f:fs) = x where
>     x = map (recip f *) (0:1:g)
>     _:_:g = map negate ((0:0:fs) ○ x)
> invert x = r where
>     r = map (/x0) ((1:repeat 0) ^- (r ⊗ (0:xs)))
>     x0:xs = x


> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a ⊗ (invert b)


> z :: [Rational]
> z = 0:1:repeat 0


> d (_:x) = zipWith (*) (map fromInteger [1..]) x


> integrate x = 0 : zipWith (/) x (map fromInteger [1..])


> instance (Eq r, Num r) => Num [r] where
>     x+y = zipWith (+) x y
>     x-y = zipWith (-) x y
>     ~x*y = x ⊗ y
>     fromInteger x = fromInteger x:repeat 0
>     negate x = map negate x
>     signum (x:_) = signum x : repeat 0
>     abs (x:xs) = error "Can't form abs of a power series"


> instance (Eq r, Fractional r) => Fractional [r] where
>     x/y = x ^/ y
>     fromRational x = fromRational x:repeat 0


> sqrt' x = 1 : rs where
>     rs = map (/2) (xs ^- (rs ⊗ (0:rs)))
>     _ : xs = x
> instance (Eq r, Fractional r) => Floating [r] where
>     sqrt (1 : x) = sqrt' (1 : x)
>     sqrt _ = error "Can only find sqrt when leading term is 1"
>     exp x = e where e = 1+integrate (e * d x)
>     log x = integrate (d x/x)
>     sin x = integrate ((cos x)*(d x))
>     cos x = [1] ... negate (integrate ((sin x)*(d x)))
>     asin x = integrate (d x/sqrt(1-x*x))
>     atan x = integrate (d x/(1+x*x))
>     acos x = error "Unable to form power series for acos"
>     sinh x = integrate ((cosh x)*(d x))
>     cosh x = [1] ... integrate ((sinh x)*(d x))
>     asinh x = integrate (d x/sqrt(1+x*x))
>     atanh x = integrate (d x/(1-x*x))
>     acosh x = error "Unable to form power series for acosh"
>     pi = error "There is no formal power series for pi"


> lead [] x = x
> lead (a:as) x = a : (lead as (tail x))
> a ... x = lead a x


> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a


A direct functional square root that doesn't use $\mathrm{flog}$ and $\mathrm{fexp}$:


> fsqrt (0 : 1 : fs) =
>     let gs = (fs-(0 : gs*((0 : delta gs gs)+((2 : gs)*(gs*g)))))/2
>         g = 0 : 1 : gs
>         delta (g : gs) h = let g' = delta gs h
>                    in (0 : ((1 : h) * g')) + gs
>     in g