Saturday, October 31, 2009

Buffon's Needle, the Easy Way

Buffon's needle is a popular probability problem. Rule lines on the floor a distance d apart and toss a needle of length l<d onto the floor. What is the probability that the needle crosses a line? A solution is described at Wikipedia but it involves a double integral and some trigonometry. Nowhere does it mention that there is a less familiar but much simpler proof, though if you follow the links you'll find it. In addition, the usual solution involves π but gives little intuition as to why π appears. The simpler proof reveals that it appears naturally as the ratio of the circumference of a circle to its diameter. I've known this problem since I was a kid and yet I hadn't seen the simpler proof until a friend sold me his copy of Introduction to Geometric Probability for $5 a few days ago.

So instead of solving Buffon's needle problem we'll solve what appears to be a harder problem: when a rigid wire of length l, curved within a plane (with no restriction on l), is thrown, what is the expected number of times it crosses our ruled lines d apart? Here's an example of one of these 'noodles'. It crosses the ruled lines three times:

Expectation is linear in the sense that E(A+B) = E(A)+E(B). So if we imagine the wire divided up into N very short segments of length l/N, the expectation for the whole wire must be the sum of the expectations for all of the little pieces. If the wire is well behaved, for N large enough the segments are close to identical straight line segments. Here's a zoomed-in view of a piece of our noodle:

For a small straight line segment the expectation must simply be a function of the length of the segment. The expectation for the whole wire is the expectation for one segment multiplied by the number of segments. In other words, the expectation is proportional to the length of the wire and we can write E(l)=kl for some constant k.

Now that we know it's proportional to the length, we need to find the constant of proportionality, k. We need to 'calibrate' by thinking of a noodle shape where we know in advance exactly how many times it will cross the ruled lines. The following picture gives the solution:

A circle of diameter d will almost always cross the lines in exactly two places. The length of this wire is πd so E(πd)=2 and k=2/(πd).

The expected number of crossings for a wire of length l is therefore 2l/(πd). A needle of length l<d can cross a line either zero times or once, so its expected number of crossings is exactly the probability that it crosses a line. The solution is 2l/(πd).
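As a sanity check on E(l)=2l/(πd), here is a minimal Haskell sketch that averages crossing counts over a grid of rotations and vertical offsets. It assumes the noodle can be approximated by a polyline, and every name in it (crossings, noodleCrossings, expectedCrossings and so on) is made up for this illustration. It is a numerical spot check, not a proof.

```haskell
-- Numerical check of E(l) = 2l/(pi d) for a polyline "noodle".
-- All names here are illustrative; nothing below comes from a library.

type Point = (Double, Double)

-- Number of ruled lines y = k*d (k an integer) crossed by a straight
-- segment whose endpoints sit at heights y1 and y2.
crossings :: Double -> Double -> Double -> Int
crossings d y1 y2 = abs (floor (y2 / d) - floor (y1 / d))

-- Crossings for the whole polyline after rotating it by theta and
-- shifting it vertically by s. Only heights matter, so we track just those.
noodleCrossings :: Double -> [Point] -> Double -> Double -> Int
noodleCrossings d pts theta s = sum (zipWith (crossings d) hs (drop 1 hs))
  where
    hs = [x * sin theta + y * cos theta + s | (x, y) <- pts]

-- Average the crossing count over an n-by-n midpoint grid of
-- rotations in [0, 2 pi) and vertical shifts in [0, d).
expectedCrossings :: Int -> Double -> [Point] -> Double
expectedCrossings n d pts =
  fromIntegral (sum [noodleCrossings d pts t s | t <- thetas, s <- shifts])
    / fromIntegral (n * n)
  where
    mids   = [(fromIntegral i + 0.5) / fromIntegral n | i <- [0 .. n - 1]]
    thetas = map (* (2 * pi)) mids
    shifts = map (* d) mids

-- Total length of the polyline.
polylineLength :: [Point] -> Double
polylineLength pts = sum (zipWith dist pts (drop 1 pts))
  where
    dist (a, b) (c, e) = sqrt ((c - a) * (c - a) + (e - b) * (e - b))

-- The prediction from the argument above: 2l/(pi d).
predicted :: Double -> [Point] -> Double
predicted d pts = 2 * polylineLength pts / (pi * d)

-- Two sample shapes: a needle shorter than d = 1, and a zigzag longer than d.
needle, zigzag :: [Point]
needle = [(0, 0), (0.5, 0)]
zigzag = [(0, 0), (1, 0.7), (2, -0.3), (2.5, 1.2)]
```

In GHCi, comparing expectedCrossings 300 1 needle with predicted 1 needle (which is 1/π, about 0.318) should show agreement to within the grid's resolution, and the same comparison works for zigzag even though it is longer than d.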

No integrals needed.

The expected number of crossings is an example of an invariant measure, something I've talked about before. There are only a certain number of functions of a noodle that are additive and invariant under rotations and just knowing these facts is almost enough to pin down the solution.


Now I can leave you with a puzzle to solve. In the UK, a 50p coin is a 7-sided curvilinear polygon of constant width. Being of constant width means a vending machine can consistently measure its width no matter how the coin is oriented in its plane. Can you use a variation of the argument above to compute the circumference of a 50p coin as a function of its width?

Tuesday, October 13, 2009

"What Category do Haskell Types and Functions Live In?"

The question in my title is one that is often raised by Haskell programmers and it's a difficult one to answer rigorously and satisfyingly. But you may notice that I've put the question in quotes. This is because I'm not asking the question myself. Instead I want to argue that often there's a better question to ask.

Superficially Haskell looks a lot like category theory. We have types that look like objects and functions that look like arrows. Given two functions we can compose them just as arrows compose in a category. We also have things that look like products, coproducts, other kinds of limit including infinite ones, natural transformations, Kan extensions, monads and quite a bit of 2-categorical structure.

So what goes wrong? (Besides the obvious problem that on a real computer, composing two working functions might result in a non-working function because you run out of memory.)

Among other things, Haskell functions can fail to terminate because of things like infinite loops. Computer scientists often use the notation ⊥ to represent a non-terminating computation. So when we talk of the Haskell integers, say, we don't just mean the values 0, 1, 2, ... but we also have to include ⊥. Unfortunately, when we do this we break a few things. For one thing we no longer have coproducts. But people find it useful to talk about algebraic datatypes as constructing types using products and coproducts and that would be a hard thing to give up.

So we could restrict ourselves to considering only the category theory of computable functions. But that's not a trivial thing to do either, and it doesn't reflect what real Haskell programs do.

But even if we did manage to tweak this and that to get a bona fide category out of Haskell, all we'd get is a custom tailored category that serves just one purpose. One theme running through much of my blog is that Haskell can be used to gain an understanding of a nice chunk of elementary category theory in general. Showing that Haskell simply gives us one example of a category really isn't that interesting. When I talked about the Yoneda Lemma I felt like I was talking about more than just one property of some obscure category that I can't actually define and that most category theorists have never even heard of.

So what's going on? Why does it feel like Haskell is so naturally category theoretical while the details are so messy?

Going back to my Yoneda lemma code, consider my definition of check:

> check a f = fmap f a

It's straightforward to translate this into standard category theoretical notation that applies to any category. Even though the code is implemented in a specific programming language there's nothing about it that prevents it being translated for use in any category. So it doesn't matter what category Haskell corresponds to. What matters is that this bit of code is written in a language suitable for any category. And the proof I give can be similarly translated.
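For readers without the earlier Yoneda code to hand, here is a minimal sketch of how check might be typed and paired with an inverse. The type signatures and the name uncheck are assumptions made for this sketch, not a quotation of that code.

```haskell
{-# LANGUAGE RankNTypes #-}

-- check turns a value inside a functor into a function that is
-- polymorphic in its result type b.
check :: Functor f => f a -> (a -> b) -> f b
check a f = fmap f a

-- uncheck (a name assumed for this sketch) goes the other way by
-- feeding the polymorphic function the identity.
uncheck :: (forall b. (a -> b) -> f b) -> f a
uncheck t = t id
```

Because fmap id is the identity for any lawful Functor, uncheck (check x) gives back x. Nothing in either definition mentions elements of a particular type, which is what makes the translation to an arbitrary category plausible.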

Consider this standard problem given to category theory students: prove that (A×B)×C is isomorphic to A×(B×C). In Haskell we could construct the isomorphism as:

> iso :: ((a,b),c) -> (a,(b,c))
> iso ((x,y),z) = (x,(y,z))

But now we hit a problem. We can straightforwardly translate this into mathematical notation and it will give a valid isomorphism in the category of sets, Set. But iso is written to accept arguments which are elements of some type. Not all objects in categories have elements, and arrows might not correspond to functions. And even if they did, if we were working with (certain types of) topological spaces we'd be giving a construction for the isomorphism, and our proof would show the underlying function had an inverse, but we'd be failing to show it's continuous. It looks like writing Haskell code like this only tells us about a particularly limited type of category.

But not so. Type cabal install pointfree to install pointfree and then run pointfree 'iso ((x,y),z) = (x,(y,z))' and it responds with

> iso = uncurry (uncurry ((. (,)) . (.) . (,)))

pointfree rewrites a function in point-free style. There are no x's, y's or z's in the rewritten version, only uncurry, composition (.), and the product function (,). These exist in all Cartesian closed categories (CCC). So our original function definition, despite apparently referring to elements, can be mechanically turned into a definition valid for any CCC. We can now reinterpret the meaning of x, y and z in the original definition as not referring to elements at all, but as labels indicating how a bunch of fairly general categorically defined primitives are to be assembled together.
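To see that the two forms really are the same function, the pointed and point-free definitions can be put side by side. The names isoPF and isoInv are made up for this sketch, and the inverse is written by hand rather than produced by pointfree.

```haskell
-- The pointed definition from the text.
iso :: ((a, b), c) -> (a, (b, c))
iso ((x, y), z) = (x, (y, z))

-- The point-free form reported by pointfree, given the same type.
isoPF :: ((a, b), c) -> (a, (b, c))
isoPF = uncurry (uncurry ((. (,)) . (.) . (,)))

-- A hand-written inverse, witnessing that iso is an isomorphism.
isoInv :: (a, (b, c)) -> ((a, b), c)
isoInv (x, (y, z)) = ((x, y), z)
```

Applying iso and isoPF to the same nested pair gives the same result, and isoInv undoes either of them. The fact that the type checker accepts the same signature for isoPF is already a small piece of evidence that the variables in iso were only ever describing plumbing.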

(Incidentally, my first foray into pure functional programming was to write a SASL compiler. It was little more than a bunch of rewrite rules to convert SASL code into point-free compositions of S, K and I, among other combinators.)

What we have here is an example of an internal language at work. I'm not sure what a precise definition of "internal language" is, but it's something like this: take a formal system and find a way to translate it to talk about categories in such a way that true propositions in one are turned into true propositions in the other. The formal system now becomes an internal language for those categories.

The best known example is topos theory. A topos is a category that has a bunch of properties that make it a bit like Set. We take a subset of the language of set theory that makes use of just these properties. Our propositions that look like set theory can now be mechanically translated into statements valid of all toposes. This means we can happily write lots of arguments referring to elements of objects in a topos and get correct results.

In their book Introduction to Higher-Order Categorical Logic, Lambek and Scott showed that "pure typed λ-calculus" is the internal language of CCCs. Even though expressions in the λ-calculus contain named variables these can always be eliminated and replaced by point-free forms. Theorems about typed λ-calculus are actually theorems about CCCs. When we write Haskell code with 'points' in it, we don't need to interpret these literally.

So despite not knowing which category Haskell lives in, much of the code I've written in these web pages talks about a wide variety of categories because Haskell is essentially a programming language based on an internal language (or a bunch of them). Despite the fact that even a function like iso might have quite complex semantics when run on a real computer, the uninterpreted programs themselves often represent completely rigorous, and quite general pieces of category theory.

So the question to ask isn't "what category does Haskell live in?" but "what class of category corresponds to the internal language in which I wrote this bit of code?". I partly answer this question for do-notation (a little internal language of its own) in an earlier post. Haskell (and various subsets and extensions) is essentially a way to give semantics to internal languages for various classes of category. However complicated and messy those semantics might get on a real world computer, the language itself is a thing of beauty and more general than might appear at first.

BTW This trick of reinterpreting what look like variables as something else was used by Roger Penrose in his abstract index notation. Just as we can sanitise variables by reinterpreting them as specifications for plumbing in some category, Penrose reinterpreted what were originally indices into arrays of numbers as plumbing in another category. Actually, this isn't just an analogy. With a little reformatting, abstract index notation is very close to the way I've been using monads to work with vector spaces, so that abstract index notation can be viewed as a special case of an internal language for categories with monads.

Saturday, October 10, 2009

Vectors, Invariance, and Math APIs

Many software libraries, especially those for physics or 3D graphics, are equipped with tools for working with vectors. I'd like to point out how in these libraries the functions for manipulating vectors sometimes have special and useful properties that make it worthwhile to separate them out into their own interface.

Two types of Function

Suppose an object of mass m is moving with velocity v and we apply force f to it for time t. What is the final velocity? This is given by g:

g(m,t,f,v) = v+(t/m)*f

Now suppose that R is a rotation operation, typically represented by a matrix. What happens if we apply it to both of the vector arguments of g?

g(m,t,Rf,Rv) = Rv+(t/m)*Rf = Rg(m,t,f,v)

In other words, rotating the vector arguments is the same as rotating the vector result.

Another example: Consider the function that gives the force on an electric charge as a function of its velocity and the magnetic field:

f(e,v,B) = ev×B

It's essentially just the cross product. If you rotate both of the arguments to the cross product then the result is rotated too. The result is that

f(e,Rv,RB) = Rf(e,v,B)

On the other hand, many 3D APIs come with a function to perform componentwise multiplication of vectors. Writing a vector x as a triple (x0,x1,x2), and so on, we can write such a function as:

f(x,y) = (x0y0,x1y1,x2y2)

You can show that this doesn't have a similar property.
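The three examples can be spot checked in a few lines of Haskell. This is only a sketch: vectors are plain triples, R is fixed to one particular rotation (a quarter turn about the z axis), and all of the names are made up for illustration. Passing the check for one rotation doesn't prove invariance, but failing it for one rotation does disprove it.

```haskell
-- Vectors as plain triples; the names below are illustrative only.
type V3 = (Double, Double, Double)

add :: V3 -> V3 -> V3
add (a, b, c) (x, y, z) = (a + x, b + y, c + z)

scale :: Double -> V3 -> V3
scale k (x, y, z) = (k * x, k * y, k * z)

cross :: V3 -> V3 -> V3
cross (a, b, c) (x, y, z) = (b * z - c * y, c * x - a * z, a * y - b * x)

-- One particular rotation R: a quarter turn about the z axis.
rotZ90 :: V3 -> V3
rotZ90 (x, y, z) = (-y, x, z)

-- Final velocity after applying force f for time t to mass m: v + (t/m) f.
g :: Double -> Double -> V3 -> V3 -> V3
g m t f v = add v (scale (t / m) f)

-- Force on a charge e moving with velocity v in magnetic field b: e v x B.
lorentz :: Double -> V3 -> V3 -> V3
lorentz e v b = scale e (cross v b)

-- Componentwise multiplication of two vectors.
cmul :: V3 -> V3 -> V3
cmul (a, b, c) (x, y, z) = (a * x, b * y, c * z)

-- Sample inputs, chosen arbitrarily.
f0, v0 :: V3
f0 = (4, 5, 6)
v0 = (1, 2, 3)

-- Does rotating the arguments agree with rotating the result?
commutesG, commutesLorentz, commutesCmul :: Bool
commutesG       = g 2 3 (rotZ90 f0) (rotZ90 v0) == rotZ90 (g 2 3 f0 v0)
commutesLorentz = lorentz 5 (rotZ90 v0) (rotZ90 f0) == rotZ90 (lorentz 5 v0 f0)
commutesCmul    = cmul (rotZ90 v0) (rotZ90 f0) == rotZ90 (cmul v0 f0)
```

Here commutesG and commutesLorentz are True while commutesCmul is False: rotating the inputs of cmul gives (10,4,18), whereas rotating its output gives (-10,4,18).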

Rotational Invariance

To make things easy, let's restrict ourselves to functions of scalars and vectors. And when I say vector, I'm talking strictly about vectors representing magnitude and direction, but not positions. Examples of such vectors are velocities, accelerations, angular velocities, magnetic fields, and the difference between two positions. A function is said to be rotationally invariant if applying a rotation R to all of its vector arguments results in the same thing as applying R to all of the vectors in its value. This allows you to have a function that returns multiple vectors, like a tuple or array.

The first two functions I described above were rotationally invariant but the third wasn't. Notice how the first two examples also described physical processes. This is the important point: as far as we know, all of the laws of physics are rotationally invariant. If you write down an equation describing a physical process then replacing all of the vectors in it by their rotated counterparts must also result in a valid equation. So if you're describing a physical process with a computer program, and you end up with a function that isn't rotationally invariant, you've made a mistake somewhere.

Vector APIs

Vector APIs frequently come with all manner of functions. Some have the invariance property and some don't. If you write code that you'd like to be rotationally invariant, but it turns out that it isn't, you usually have to examine the code to find the bug. But if you separate the invariant functions into their own interface, and then write code using just that interface, the code is guaranteed to be invariant. If your programming language has reasonably strict types then you may even be able to arrange things so that the type signature of the function alone is enough to tell you that the function is invariant. In effect you are able to make the compiler prove that your function is invariant.

(As an aside, this is an example of why a good type system does much more than you might at first have guessed. They don't just stop you making typos, they can do things like prove that your programs satisfy certain geometrical properties.)

So what functions would you have in such an API? Among the essential rotationally invariant functions are:

1. Multiplication of a vector by a scalar
2. Addition of vectors
3. Dot product
4. Cross product

In terms of these you can build functions such as

1. Vector length
2. Vector normalization
3. Rotation of one vector around an axis specified by another vector
4. Linear interpolation between vectors
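Here is one way the derived functions in this list might be written using nothing but the four primitives and ordinary scalar arithmetic. It is a sketch: the class mirrors the Vector class from the Example Code section, the names vlength, normalize, lerp and rotateAbout are assumptions, and the rotation uses Rodrigues' formula, which is a choice made here rather than anything the text prescribes.

```haskell
-- The four primitives, mirroring the Vector class in the Example Code section.
class Vector v where
  (.+)  :: v -> v -> v
  (.*)  :: Float -> v -> v
  dot   :: v -> v -> Float
  cross :: v -> v -> v

-- Length, from the dot product.
vlength :: Vector v => v -> Float
vlength v = sqrt (dot v v)

-- Unit vector in the same direction (undefined for the zero vector).
normalize :: Vector v => v -> v
normalize v = (1 / vlength v) .* v

-- Linear interpolation: t = 0 gives a, t = 1 gives b.
lerp :: Vector v => Float -> v -> v -> v
lerp t a b = ((1 - t) .* a) .+ (t .* b)

-- Rotate v by angle theta about the given axis, via Rodrigues' formula:
-- v cos(theta) + (k x v) sin(theta) + k (k . v) (1 - cos(theta)),
-- where k is the normalized axis.
rotateAbout :: Vector v => v -> Float -> v -> v
rotateAbout axis theta v =
  (cos theta .* v)
    .+ ((sin theta .* cross k v) .+ (((1 - cos theta) * dot k v) .* k))
  where
    k = normalize axis

-- A concrete instance for trying the functions out.
data Triple = T Float Float Float deriving (Show, Eq)

instance Vector Triple where
  T x y z .+ T u v w = T (x + u) (y + v) (z + w)
  a .* T x y z = T (a * x) (a * y) (a * z)
  dot (T x y z) (T u v w) = x * u + y * v + z * w
  cross (T x y z) (T u v w) = T (y * w - v * z) (z * u - x * w) (x * v - y * u)
```

For example, rotateAbout (T 0 0 1) (pi/2) (T 1 0 0) comes out as approximately T 0 1 0, up to Float rounding. None of the derived functions pattern match on T, so they never see coordinates.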

What kinds of functions would be excluded?

1. Constructing a vector from three scalars, i.e. f(x,y,z) = (x,y,z).
2. Constructing a vector from a single scalar, i.e. f(x) = (x,x,x).
3. Extracting the ith component of a vector, i.e. f(i,(x0,x1,x2)) = xi.
4. Pointwise multiplication of vectors.
5. Computing the elementwise cosine of a vector.

On seeing the first excluded example above you might ask "how am I supposed to construct vectors?" The point is that you don't program exclusively with an invariant API, you simply use it whenever you need to prove invariance.

Coordinate System Invariance

One purpose of writing to a particular interface is that it allows the API to hide implementation details from the user. Using a rotationally invariant API has a role to serve here. For example, many 3D renderers allow you to write shaders. These are essentially functions that compute the colour of a piece of geometry that needs rendering. You write a shader and the renderer then calls your shader as needed when a fragment of geometry passes through its pipeline. Frequently these are used for lighting calculations but there are all kinds of other things that may be computed in shaders.

In a 3D renderer different parts of the computation are often performed in different coordinate systems. For example it may be convenient to perform lighting calculations in a coordinate system oriented with the direction of the light. But the author of a renderer doesn't want to be committed to a particular choice. In order to do this, it is essential to be able to write shaders that are agnostic about which coordinate system is being used. If we work with rotationally invariant functions, our shaders are guaranteed to be agnostic in this way (assuming that the only kind of coordinate change that takes place is a rotation).

Lots More Types

I've concentrated on just one type of invariance, rotational invariance. If we consider more types of invariance then more types of interface naturally emerge. It would take too long to cover all of the details here so I'm just going to briefly sketch the beginnings of the more general case. So just read this section as a list of pointers to further reading.

For example, some functions are invariant under translations. These can be thought of as functions of points in space. If we allow more general linear transformations then we find that some common functions transform 'oppositely' to vectors. In particular, normals to surfaces transform in this way. In fact, Pixar's RenderMan has three distinct types (vectors, points and normals) that capture these different invariances.

If we go back to rotations again but now extend these by allowing reflections then we find an interesting new phenomenon. For example, consider the result of reflecting in the x-y-plane, followed by reflecting in the y-z-plane, followed by reflecting in the x-z-plane. This simply multiplies vectors by -1. Dot product is invariant under this: (-x)·(-y)=x·y. But cross product isn't, because (-x)×(-y)=x×y. Even though the cross product is apparently a vector, it doesn't get multiplied by -1. When we start considering invariance under reflection we find that some vectors behave differently. These are the pseudovectors and in effect they have their own separate type and interface. Interestingly, nature likes to keep pseudovectors and vectors separate except in parity violating phenomena. There are even pseudoscalars.
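The sign behaviour is easy to see concretely. In this small sketch, invert stands for the composite of the three reflections, vectors are plain triples, and all names are made up for illustration.

```haskell
type V3 = (Double, Double, Double)

add :: V3 -> V3 -> V3
add (a, b, c) (x, y, z) = (a + x, b + y, c + z)

dot :: V3 -> V3 -> Double
dot (a, b, c) (x, y, z) = a * x + b * y + c * z

cross :: V3 -> V3 -> V3
cross (a, b, c) (x, y, z) = (b * z - c * y, c * x - a * z, a * y - b * x)

-- The composite of the three reflections: multiply every component by -1.
invert :: V3 -> V3
invert (x, y, z) = (-x, -y, -z)

-- Sample inputs, chosen arbitrarily.
p, q :: V3
p = (1, 2, 3)
q = (4, 5, 6)

-- Addition behaves like a true vector: inverting inputs inverts the output.
sumFlips :: Bool
sumFlips = add (invert p) (invert q) == invert (add p q)

-- The dot product is a scalar and is unchanged.
dotUnchanged :: Bool
dotUnchanged = dot (invert p) (invert q) == dot p q

-- The cross product is unchanged too, so it fails to flip like a vector.
crossUnchanged, crossFlips :: Bool
crossUnchanged = cross (invert p) (invert q) == cross p q
crossFlips     = cross (invert p) (invert q) == invert (cross p q)
```

For these inputs sumFlips, dotUnchanged and crossUnchanged are True and crossFlips is False, which is the pseudovector behaviour described above.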

Incidentally, if you consider invariance under scaling you're led to the idea of encoding dimensions in your types.
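A minimal sketch of what encoding dimensions in types can look like uses phantom type parameters. Everything here (Quantity, Length, Time, addQ, scaleQ, metres, seconds) is a hypothetical name for this illustration, and it only covers addition and dimensionless scaling, both of which commute with a change of units.

```haskell
-- Phantom tags for dimensions; they have no values of their own.
data Length
data Time

-- A number tagged with its dimension. The tag exists only at the type level.
newtype Quantity dim = Quantity Double deriving (Show, Eq)

-- Quantities can only be added when their dimensions agree.
addQ :: Quantity dim -> Quantity dim -> Quantity dim
addQ (Quantity a) (Quantity b) = Quantity (a + b)

-- Scaling by a dimensionless number keeps the dimension.
scaleQ :: Double -> Quantity dim -> Quantity dim
scaleQ k (Quantity a) = Quantity (k * a)

-- Constructors that fix the dimension.
metres :: Double -> Quantity Length
metres = Quantity

seconds :: Double -> Quantity Time
seconds = Quantity

-- addQ (metres 1) (seconds 2) is rejected by the type checker,
-- because Length and Time are different types.
```

Multiplying and dividing quantities of different dimensions needs more type-level machinery than this, which is where the real design work lies.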


If you're writing a vector API think about the invariance properties that your functions may have. If any are useful then it may be worth placing those in a separate interface. The more distinct types you have, the more properties you can make your compiler prove. Obviously this needs to be balanced against practicality, complexity for users and what you actually need. To some extent, many existing APIs make some of these distinctions with varying degrees of strictness. The main point I want to make clear is that these distinctions are based on invariance properties, something that not all developers of such APIs are aware of.

At some point I hope to return to this topic and enumerate all of the common vector-like types in a single framework. Unfortunately it's a big topic and I've only been able to scratch the surface here. In particular there are some subtle interplays between dimensions and types.

On a deeper level, I think there must be some type theoretical framework in which these invariance properties are free theorems.

Update: I believe some of this material is covered in Jim Blinn's Vectors and Geometry and Objects, Oh My!, but I don't have access to that. I suspect that there is one big difference in my presentation: I'm not so interested here in vectors (or normals or whatever) in themselves but as defining interfaces to functions with invariance properties. It's like the way category theorists shift the focus from objects to arrows. It makes a difference because it immediately gives theorems that our code is guaranteed to satisfy. It's the invariance property of the cross product (say) that is useful here, not the fact that the components of a vector transform a certain way when we change coordinates (because I might not even want to refer to coordinates in my code).

Example Code

To show that I'm talking about something very simple, but still powerful, here's some Haskell code:

> data Triple = T Float Float Float deriving Show

> class Vector v where
>     (.+)  :: v -> v -> v
>     (.*)  :: Float -> v -> v
>     dot   :: v -> v -> Float
>     cross :: v -> v -> v

> instance Vector Triple where
>     T x y z .+ T u v w = T (x+u) (y+v) (z+w)
>     a .* T x y z = T (a*x) (a*y) (a*z)
>     dot (T x y z) (T u v w) = x*u+y*v+z*w
>     cross (T x y z) (T u v w) = T
>         (y*w-v*z)
>         (z*u-x*w)
>         (x*v-y*u)

You can freely apply the four primitive functions to elements of type Triple but if you have a function of, say, signature

> f :: Vector v => (v,v,Float) -> [(v,v)]

you are guaranteed it is invariant.
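To make this concrete, here is a sketch of one hypothetical f with exactly that signature, together with a spot check against a single rotation. The block repeats the class and instance so that it can be loaded on its own. The body of f, and the helpers rotZ90 and rotateAll, are invented for the example. The guarantee is a claim about every function of this type, whereas the check below only exercises one function at one rotation.

```haskell
-- The interface and instance, repeated so this block stands alone.
class Vector v where
  (.+)  :: v -> v -> v
  (.*)  :: Float -> v -> v
  dot   :: v -> v -> Float
  cross :: v -> v -> v

data Triple = T Float Float Float deriving (Show, Eq)

instance Vector Triple where
  T x y z .+ T u v w = T (x + u) (y + v) (z + w)
  a .* T x y z = T (a * x) (a * y) (a * z)
  dot (T x y z) (T u v w) = x * u + y * v + z * w
  cross (T x y z) (T u v w) = T (y * w - v * z) (z * u - x * w) (x * v - y * u)

-- A hypothetical function with the signature from the text. It can only
-- combine its arguments through the four primitives, because it knows
-- nothing about v beyond the Vector constraint.
f :: Vector v => (v, v, Float) -> [(v, v)]
f (a, b, s) = [(a .+ b, s .* a), (cross a b, dot a b .* b)]

-- One particular rotation: a quarter turn about the z axis. This lives
-- outside the invariant interface because it inspects coordinates.
rotZ90 :: Triple -> Triple
rotZ90 (T x y z) = T (-y) x z

-- Apply the rotation to every vector in the result of f.
rotateAll :: [(Triple, Triple)] -> [(Triple, Triple)]
rotateAll = map (\(p, q) -> (rotZ90 p, rotZ90 q))

-- Rotating the vector arguments should agree with rotating the results.
invariantAtSample :: Bool
invariantAtSample = f (rotZ90 a, rotZ90 b, 2) == rotateAll (f (a, b, 2))
  where
    a = T 1 2 3
    b = T 4 5 6
```

Here invariantAtSample evaluates to True. Trying to write a non-invariant f, for instance one that extracts the first component of a, fails to type check because f has no way to look inside a value of type v.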