Wednesday, June 28, 2006

Laws of Form: An Opinion

I just reread George Spencer-Brown's Laws of Form. I mentioned this book a year ago but at that time I was talking from memory and from second hand descriptions. I finally got hold of a copy for around $20 and worked my way through it. You can find out about a little of the content here.

But after this rereading I think I'm now ready for the final verdict: is this a crackpot book or not?

Well - in the first few chapters it develops a kind of boolean algebra using 'forms' built out of Spencer-Brown's 'distinction' symbol. It's cute because he's constructing an algebra that has just the one symbol. Everything he has to say here seems more or less fine.

In the following chapters variables are introduced. It gets a little confusing. At first the variables are metavariables in the sense that they are used by Spencer-Brown to represent his forms. But later on he introduces variables within his system at which point we have something equivalent to propositional calculus. The earlier boolean style algebra can now be interpreted as a model for this calculus. But it's all very confusing because he doesn't make it clear when an expression is to interpreted in the algebra or the calculus. Nonetheless, he then proceeds to prove the kinds of theorems you'd expect a logician to prove, viz. soundness and completeness.

He then introduces imaginary values. This is the point at which I became lost in my first reading years ago. Interpreting the 'distinction' as boolean 'not', he's introducing a logical value that is a solution to x = not x. Clearly neither x=true nor x=false do the job so Spencer-Brown claims to introduce imaginary values that are solutions. It seems like a lot of nonsense, but in my second reading of the book it made perfect sense. I don't think that he says enough to completely disambiguate the meaning but I was able to provide an interpretation, in Haskell, that fits his description. Quite simply, the logical values in Spencer-Brown's system need to be interpreted as streams of boolean values, ie. as type [Bool]. The distinction operator (call it cross) can now be defined as

cross :: [Bool] -> [Bool]
cross a = False : map not a

In other words, it's just a not with a delay. But there is a catch, the text talks vaguely about a delay, but not what the initial value of the stream should be. So I actually define cross by

cross :: Bool -> [Bool] -> [Bool]
cross init a = init : map not a

and provide my own values of init as required by the context.

So how do I know this is the correct interpretation? Well the book contains a counter circuit that is supposed to take logical signals as inputs and output a signal that flips value half as often as the input. I was able to take his circuit and transcribe it directly into Haskell (making some choices of init values) and amazingly it performed exactly as described. (This tool goes further than my code, but it still has to deal with the ambiguity issue.)

Here's the code:

(#) = zipWith (||)
cross a = False : map not a
i = cross i
m = True : m
n = False : n
delay n x = replicate n False ++ x
pulse m = replicate m True ++ n

b0 = i
b1 = [False,False,True,True]++b1
b2 = [False,False,False,False,True,True,True,True]++b2

b n = replicate n False ++ map not (b n)

trace a = let b = take 120 a in
let t x = if x then "-" else " " in
putStr $ unlines [b >>= t,b >>= (t.not)]

gate a b = let f = cross (cross (f # a) # b) in f

nor init x y = init : map not (x # y)

count a = let b = nor False a f
c = nor True b d
d = nor False b j
e = nor False a h
g = nor False e j
h = nor False f c
j = nor True d e
f = nor True g h
in f

test = count $ map not $ (b 4)

main = do
trace (map not (b 4))
trace test

a # b means a written next to b. cross a means a inside a distinction or crossing. Instead of building the circuit using cross I use nor, corresponding to the shorthand GSB uses. The definition of count is neat. It's a mutualy recursive definition that essentially allows me to label various points in the circuit and wire them up. It's really amazing that Haskell allows you do do this. Anyway, run main and it'll draw the two 'oscilloscope' traces. (You'll need a w-i-d-e terminal window.) There's a brief discussion of the circuit here.

So, Laws of Form succeeds in defining a boolean style algebra and propositional style calculus. It then shows how to build circuits using logic gates. And that, as far as I can see, is the complete content of the book. It's fun, it works, but it's not very profound and I don't think that even in its day it could have been terribly original. (Who first proved NAND and NOT gates are universal? Sheffer? Peirce?) In my view this makes GSB's mathematics not of the crackpot variety, despite his talk of imaginary logical values.

But...GSB's style is classic crackpot stuff. His writing borrows more from the Tao Te Ching than from conventional mathematics texts. He uses delberately obfuscated language and makes pronouncements that read like the writings of mystics. It's no surprise that when there was a conference on GSB's work it took place at Esalen.

So my final opinion, for all of the two cents that it's worth, is that GSB is a little on the crackpot side, but that his mathematics in Laws of Form is sound, fun, cute, but, despite the trappings, not terribly profound.


Monday, June 26, 2006

k-calculus and Special Relativity

I've been familiar with Special Relativity since childhood - I think I was able to give Einstein's old trains and light rays argument for time dilation when I was barely out of elementary school. (Unfortunately my brain only went downhill from there...) So I was pretty surprised when I started reading Bondi's introduction to Relativity for the layman, Relativity and Common Sense, published in the fifties, and found that I was learning something new.

At one point in the book, Bondi shows that different observers, who appear to have synchronised watches, end up having watches showing different times. Amazingly he seems to do all of this in what looks like a Newtonian framework. It took me several rereadings of the paragraphs to eventually realise where it was that he'd sneaked in the relativity assumption. I felt like I'd just seen one of those puzzles where you rearrange a square as a rectangle which appears to have a lower area. But as a result of this, he makes Special Relativity seem like the most natural thing in the world. And the really nice thing is that in his presentation he doesn't need to talk about watches that use light pulses (because the sceptic might say that time dilation only applies to watches that use laser pulses) but instead his argument applies to any kind of watch.

Anyway, the key idea is this: suppose A and B have zero relative velocity. If A signals B at one minute intervals then B will receive the signals at one minute intervals. But if A and B move apart with constant velocity, B will receive A's signals less frequently simply because the signals have to cover an increasing distance as B moves from A. This much is trivial and applies to any non-instantaneous signal in a Newtonian or Einsteinian framework.

Now suppose B receives signals every k minutes. The key step is this: according to relativity, k can only be a function of the relative velocity of A and B, not a function of the absolute velocities of A and B. Using this assumption, and some other far less radical ones that few people would have problems with, the rest of Special Relativity follows. But you'll have to read the book for the details (or google for k-calculus).

It's all very elementary stuff. But this k-factor argument seems like it would be much more convincing to the layman than Einstein's original thought experiment. In fact, I did a bit of googling and it seems that the k-factor argument has been presented in a few different places under the name "k-calculus".

Bondi also stresses that watches are like odometers. They measure something that is personal to the path that you have taken rather than some absolute observer independent quantity. This is something that is hard to get from other elementary accounts I have read. Other introductions present time dilation as some kind of influence that somehow makes your clocks go 'wrong'. Just the name 'time dilation' gives the impression that what you measure is somehow dilated from the 'correct' time, a completely erroneous view. Just as there is no 'distance' dilation for driving north-east and then north-west to get to a place due north.

Anyway, if you have some doubting Thomas relative (no pun intended) who thinks the whole Relativity thing is bogus, but they do have a bit of numerical aptitude, this might be the book to get them. You don't even need to compute a square root to compute the discrepancy between the watches in Bondi's first example. All in all, an excellent book!

Anyway, my reason for reading this book is that I really don't like the way some physicists talk about Relativity. So I thought I'd read a few elementary accounts to get an idea of what trends there are in explaining the subject.

(Sadly, Bondi passed away last year. And maybe I've finally got over my resentment of Bondi for giving me low marks for the cosmology questions in exams at Cambridge one year...)

Wednesday, June 21, 2006

How to Divide by Three

I just came across this paper by Peter Doyle and John Conway (though it seems that Conway disowns it!).

Ostensibly it proves something simple: that if there a bijection from A×3 to B×3 (where 3 = {0,1,2}), then there is a bijection from A to B. The catch is that the authors choose not to use the axiom of choice and so must explicitly describe the bijection. It turns out this is a hard problem. Lindenbaum claimed a proof in 1926 but it was 'lost'. Tarski published a proof in 1949 but Conway and Doyle think their proof is probably a rediscovery of Lindenbaum's original.

It's very curious. A priori I'd never have guessed this was a tricky problem.

It seems that the ubiquitous John Baez mentioned this a hundred 'weeks' ago here. Does that man have to have already written on everything that interests me in mathematics!? :-)


Friday, June 16, 2006

A Taylor Series for Types

There's a result that I'm surprised isn't mentioned in Derivatives of Containers. It's pretty obvious considering what's in that paper but it's still interesting to mention, so maybe it's well known and too obvious for anyone else to state explicitly. Basically this

F[X+Y]=F[X]+Y F'[X]+Y²/2! F''[X]+Y³/3! F'''[X]+...

is true when interpreted as a statement about types.

But first I have to say what I mean by dividing a type by n!. That's explained at the end of the above paper, but first some notation: Let n be the set {0,...,n-1}. Then use the suggestive notation X! to mean the group of permutations of X. Clearly #(n!)=n!, where the on the right hand side I'm using ! to mean factorial. We now allow groups to act on containers by permuting the elements within them. For example a triple of X's is written X³. This datatype has a notion of an ordering of the elements ie. (1,2,3) is not the same as (3,2,1). The group 3! acts in a natural way on triples by permuting the three elements. If we take the quotient by this group then we get the 3-element bag where order doesn't matter. So Xn/n! is the n-element bag. (As pointed out in the paper, we can't define sets in this way because sets require a notion of equality.)

The paper also shows that F'[X] is the type of F-containers with one of the X's punched out to leave a hole. So F(n)[X] is an F-container with n holes punched out. But note that it still retains information about the order in which the holes were punched out. For example, consider pairs with F[X]=X². F''[X]=2. This corresponds to the fact that there are two orders in which you can punch out the two elements of a pair.

Now consider a Yn/n! F(n)[X]. This is an F-container of X's where n of the X's have been punched out, combined with n Y's. We can insert the n Y's into the holes in the order specified by the F(n)[X]. So Yn/n! F(n)[X] is an F-container of X's with n of the X's replaced by Y's.

Now look at the right hand side of the Taylor series. This is basically an F-container of X's where some (or zero) of the X's have been replaced by Y's.

And finally note what the left hand side is: it's an F-container of X+Y's. Ie. each object in the F-container is either an X or a Y. And it's pretty obvious that's the same thing as a container of X's where some of the X's are replaced by Y's.

I like to write it as F[X+Y]=exp(Y d/dX)F[X].

Hey! I just discovered there's a Container Type blog!

Wednesday, June 14, 2006

Fun with Derivatives of Containers

This is a neat paper.

So more thinking out loud...

Consider functions that map streams of X's to streams of Y's. As streams are a linear ordering of elements we can label positions in the output stream using natural numbers starting at zero. So a function that maps streams of X's to streams of Y's can be viewed as a function that maps a stream of X's, and a natural number, to a Y. We assemble the output stream by evaluating this function on the input stream for each natural number.

We could imagine generalising this construction to other container datatypes besides streamns. But streams have this particularly convenient way to address their elements. Each element of a stream is the nth element, for some n. For other structures such as trees it becomes more complicated to address particular elements in the tree. For a binary tree, say, we can specify an element by listing all of the binary decisions, left or right, that you need to make when descending to that element from the root of the tree.

Look at a slightly different construction. Instead of trying to address a particular element of a container consider the type which is containers of X's with one of the X's replaced by a 'hole'. For example a pair of X's can be written as X×X=X². A pair with a 'hole' is either something like (x,ˆ) or (ˆ,x) where ˆ is the 'hole'. So a pair with a hole is simply of type X+X=2.X. More generally, suppose for a given type of container F is the functor that takes a type X to the type F[X] of containers of X's. Then F'[X] is the type of containers of X's with one of the X's replaced by a hole, where F' is the derivative of F. By 'derivative' we mean derivative in the ordinary sense of calculus in that it's computed using the usual Leibniz rules. This is a completely amazing fact - I think it's one of the most amazing facts in all of computer science. I wrote about it in more detail here. That's loosely based on this paper, but my document is completely non-rigorous and probably full of errors.

There are all kinds of interesting algebraic properties that relate F and F'. For example if F'[X] is a container with a hole then X.F'[X] is a container with a hole and an X that could potentially be put into the hole. If we put the X into the hole then we don't just get an F[X] again because we still have the information about where the hole was. In other words, X.F'[X] is an F[X] together with the address of one of the X's within it. (Pity we can't write X.F'[X]-F[X] to mean just the address, but '-' makes no sense in this context.) In fact we have a projection map throwing away the address: X.F'[X]→F[X]. I think this map is a natural transformation from X.d/dX to 1.

Anyway, in the paper I started off with, the authors show that the functor X→X.F'[X] is a comonad. One of the key results in the paper can be summarised as defining a function called 'run' with this type signature:

run :: X.dF[X]/dX → Y → F[X] → F[Y]

In other words, given a function that maps a container, and the address of an X within it, to a Y, we can construct a function that maps containers of X's to containers of Y's. Intuitively this is obvious form what I said in the third paragraph above: we're specifying how for each location in an F[Y] its value should be computed from an F[X]. You can't construct any old function F[X]→F[Y] in this way. These are only the functions obtained by replacing each X-value in the original container with a Y-value. In other words they are maps F[X]→F[Y] that preserve the structure of the original container. For example, if F[X] is lists of X's, then we are talking about functions that map lists of length n to lists of length n. In other words, X.dF[X]/dX→Y is isomorphic to the type of 'structure-preserving' maps F[X]→F[Y]. I have to say, X.dF[X]/dX→Y is not what I would have expected this to look like a priori. In fact, it's pretty amazing that you can express the concept of a structure-preserving map as a regular type.

Anyway, the paper does something more useful, it shows how when F is a tree-structure then the function X.dF[X]/dX→Y can be defined 'locally' giving a really beautiful way of implementing attribute grammars. In this case the objects in F'[X] are called 'zippers'. In fact, the paper makes no reference to functors F and their derivatives, that's my generalisation. I'm not interested in the attribute grammar thing but instead I'm fascinated by the fact that there are these natural maps:
X.dF[X]/dX → F[X] and X.dF[X]/dX → Y → F[X] → F[Y]

and wonder if there is a more general theory saying when such maps might exist.

Oh well, that's enough for now. I'm hoping to post some very high resolution pictures of quaternionic Julia sets some time soon - just gotta get permission to display them from my employer whose machines were used to render them.

Saturday, June 10, 2006

Monads, Kleisli Arrows, Comonads and other Rambling Thoughts

(Below I use 'arrow' to mean an arrow in a category (or an ordinary Haskell function) but an Arrow (with capital A) is one of the objects defined by Hughes. I'm not really going to say much about Arrows.)

A while back I looked at a paper on comonads and streams but couldn't really see what comonads were offering. Just recently, I was thinking about a design for a dataflow language for microcontrollers and found that Arrows captured some of the functionality I wanted. But after writing some Haskell code I realised that I was writing the same 'glue' over and over again. So it was becoming clear that arrows were too general and I needed a class that filled in the glue for me automatically. I wrote the code and it looked familiar. I realised that I had rediscovered comonads and they now seemed entirely natural.

First, let me review monads. There are countless web sites that purport to be introductions to monads but my way of looking at them seems a little different to most of those accounts. (It's not in any way unusual, just not in the beginner's guides.) I like to think in terms of Kleisli arrows. I find this perspective unifies the different applications of monads such as encapsulating side effects or doing simple logic programming using the list monad.

A Kleisli arrow is simply an arrow (ie. in Haskell, a function) a→m b where m is a monad. What monads allow you to do is compose these things. So given f::a→m b and g::b→m c there is an arrow a→m c. To me this is the raison d'être of monads but as far as I can see, the standard interface defined in Control.Monad doesn't actually provide a name for this composition function. (Though the Control.Arrow interface does provide a Kleisli arrow composition function.)

So here's why I like to think in terms of Kleisli arrows: Consider what is almost the prototypical example of a Haskell monad - the Writer monad. Suppose you have something that is conceptually a function f::a→b but you want it to output a list of items to a log as a side effect. In a functional programming language there is no way out - you're pretty well forced to return the thing you want to log along with the object of type b. If your log is a list of objects of type d, then you need to modify f to return an object of type (b,[d]) instead of b. But here we have a catch, if we have f::a→(b,[d]) and g::b→(c,[d]) (ie. conceptually a function of type b→c producing a log of type [d]) then we want to compose these things. But the argument to g is no longer the return type of f. We need some plumbing to concatenate these functions. In this case the plumbing needs to take the output of f, split off the log keeping it for later, pass the remainder to g, and then concatenate the log from f before the log of g. And this is what monads do, they provide the plumbing. (If you knew nothing about monads and wrote the obvious code to plumb these things together, concatenating the logs, probably the first thing you wrote would look just like part of the definition of the Writer monad, except the Writer monad is generalised to monoids instead of lists.)

Let's work through the details of composing Kleisli arrows: we want to compose f::a→m b and g::b→m c. The obvious thing to do is add a gluing map between them m b→b. But that's uninteresting as it just throws away the fanciness. Instead we use the fact that m is a functor (part of the mathematician's definition of monad) to lift g to a function m b→m (m c). This now composes nicely with f but the catch is that we end up with something twice as fancy. However, part of the definition of monads is that there is a map m (m c)→m c. (Twice as fancy is still just fancy.) And that can now be used to finish off the composition.

Consider another example of a monad, the list monad. The idea is that we want a function to return multiple values, not just one. So instead of f::a→b we have f::a→[b]. But suppose we have another one of these things, g::b→[c]. How do we concatenate these? Conceptually what we want to do is run g on each of the return values of f in turn and then collect up all of the results in one big list. This is exactly what the list monad does, it provides the plumbing to concatenate functions in this way.

In both cases we have f::a→m b and g::b→m c and we get a function a→m c. Or more informally, monads give a way to compose functions that map ordinary types to fancy types providing the glue that allows the fancy output of one function to connect to the on-fancy input of the next. And I like to view things this way because a functional program is a concatenation of lots of functions - so it's natural to think about monads as simply a new way of building programs by concatenation of functions.

Anyway, I was thinking about stream functions. A stream function from a to b is simply a function [a]→[b]. (Strictly speaking we're only considering infinite lists - streams.) This doesn't quite fit the pattern of non-fancy→fancy, it's more like fancy→fancy. And that's what the Arrow interface allows us to do. But I'm not going to talk about Arrows here except to say that I started using them to write some stream code. But then I noticed that I was only interested in causal stream functions. This is a function where the nth element of the output depends only on the first n values of the input. This pattern fits many types of processing in dataflow applications such as audio processing. In order to compute the nth element of the output of a causal f::[a]→[b] we need only compute a function f::[a]→b. To compute the entire stream we repeatedly use this function to generate each element in turn. So, for example, if the input is the stream [x1,x1,x2,...] then the output is [f [x1],f[x1,x2],f[x1,x2,x3],...]. In other words a stream function is really a function f::[a]→b but we need special glue to concatenate them because the nth element of the output concatenation should look like g [f [x1],f[x1,x2],f[x1,x2,x3],...,f [x1,...,xn]].

If you followed that then you may have noticed the pattern. We want to compose two functions that map fancy types to non-fancy types to produce a new function that maps fancy types to non-fancy types. It's the exact opposite of what monads do. And this is exactly what comonads are about: they are the correct abstraction to use when writing glue for fancy-to-non-fancy functions. It all seems so natural I'm astonished to find that Control.Comonad isn't a part of the standard Haskell distributions.

Let's look at the details more closely. Let's still use m to represent a comonad. We need to compose f::m a→b and g::m b→c. m is a functor (by definition of comonad) so we can lift f to a function of type m (m a)→m b. This composes directly with g. And to finish it off we use the function m a→m (m a) provided by the definition of a comonad.

And in even more detail for the case of (lists considered as) streams. The lift operation is simply given by the usual map function. You lift a function f by applying it to each element in the stream in turn and returning the stream of results. The function m a→m (m a) is more interesting. It maps [x1,x2,x3,...] to [[x1],[x1,x2],[x1,x2,x3],...]. In other words it maps a stream to its list of 'histories'. My use of the loaded word 'history' should be a hint about where causality comes in. If we lift a function f::[a]→b to act on this list of histories we get [f [x1],f[x1,x2],f[x1,x2,x3],...]. In other words, a comonad gives exactly what we need to work with streams.

Anyway, one of the cool things about monads is the special syntactic sugar provided by Haskell that allows us to write what looks like imperative code even though it's actually functional. I've been trying to figure out what similar sugar might look like for comonads. But I can't quite figure it out. I can see roughly what it'd look like. You'd be able to write lines of code like

b <- 2*(head a) -- double the volume
c <- 0.5*head b+0.5*head (tail b) -- simple FIR filter

so that even though it 'looks' like b is merely twice the head of a, the compiler would produce the appropriate glue to make b actuually be the stream whose head is 2*(head a). In fact, you can do something a bit like this using Arrow syntax. But I can't quite fill in the details in such a way that it nicely parallels the syntax for monads.

(Silly me...I think I've just figured it out now. The 'codo' block is different from a 'do' block because it needs to define a coKleisli arrow, not an element of the comonad. Hmmm...)

And just some final words: I believe Arrows are the wrong approach to functional reactive programming. Comonads are much more appropriate because they model causal functions much more closely - and causal stream functions are what FRP is all about.

Saturday, June 03, 2006

A Monadic Lightswitch

So I finally got onto the next stage of my embedded functional programming project. After scouring the web, the best price/performance ratio I could find for a microcontroller was given by the ET-ARM Stamp made by a company in Thailand. $24.90. Pretty impressive considering the actual microcontroller is a Philips LPC2119 which is a 32-bit ARM deice with 128K Flash ROM, 16K RAM and dozens of analogue and digital IO ports. It's very self-contained - the microcontroller itself requires nothing more than power, a crystal and a couple of capacitors to be up and running, and that's what the ET-ARM board provides. I also bought a small development board with solderless breadboard, switches, LEDs and so on.

So, my first project was to implement a light switch. Here it is:

You can see my finger pressing the button and the LED switched on.

And here's the code running on the microcontroller:

loop = do IO {
a <- iopin0;
if band 1024 a>0 then
ioset0 2048
ioclr0 2048;

start = do IO {
pinsel0 0;
iodir0 2048;
} 666.

The programming language is my as yet unnamed lazy pure functional language described here.

The code is compiled into C via an intermediate bytecode representation. It's inefficient. Although each 'word' in the code typically compiles to one or two bytecodes, each bytecode expands to around half a dozen 32 bit instructions. (And that can grow if there's lots of lambda lifting going on.) By switching the compiler to thumb mode I got that down to 16 bits per intruction so it's still about a dozen or more bytes per bytecode. Still, with 128K to burn, who's counting? The complete executable is about 48K in size (that's mostly the library, the code itself ). The stack and heap appear to be well behaved. I've managed to run simple logic programs using the list monad without filling the 16K of RAM.

So a quick bit of explanation of the code: the 'main' function is called 'start'. pinsel0 is the C primitive function that sets a bunch of microcontroller pins to function as general I/O ports. iodir0 uses the supplied bitmask to set a bunch of ports to be output ports. (Port 11 in this case.) iopin0 is the function to return the inputs from the I/O ports and ioset0 and ioclr0 set or clear the appropriate I/O ports. Port 10 is connected to the switch and port 11 to the LED. 'band' is my binary 'and' function.

Unlike Haskell, my language is untyped so the 'do' function (it is a function) takes an argument, in this case IO, to specify which monad we're in. I'm using the IO monad to sequence the events (ie. I want the input to be read before the LED state is updates).

And the last thing that needs explaining is the '666'. My IO monad is basically a state monad that makes use of a sequence function 'seq' to ensure that things are evaluated in the right order. 666 is the state that's passed through the monad. It's a dummy value (any value would do) but it's presence is vital as a kind of token that represents the value of the world. For example, if you were to unpack the machinery behind the monad (when I release the code) you'd see that ioclr0 implicitly tries to evaluate the token implicitly returned by iopin0 before executing, ensuring that the switch state is read before the LED state is updated.

Note there's no 'real time' support in the sense that the garbage collector is nothing fancy so it can cause stalls. I make no claim that this is a sensible way to program a microcontroller. :-)

I should mention that I used the free evaluation version of the Keil development system under Windows. It was the slickest microcontroller development environment I've used. (Though my experience of these things isn't great...)

And here's a close up of the microcontroller itself:

Hard to believe that tiny little black square is busy reducing lambda expressions and garbage collecting!

I think my next project will be to design a language better suited to this kind of hardware. I'm thinking of a stream based language like Supercollider. Strongly typed but with guarantees on memory usage deducible at compile time so no garbage collection. I may also make it functional with a degree of support for closures (but in a way that still limits memory usage) and maybe I'll throw S4 into the type system to support a degree of staged computation, very useful when you really need to lighten the load on the target CPU.

Thursday, June 01, 2006

Solar Astronomy

I mentioned recently that I'd been volunteering at the local observatory/science museum. You might think there wasn't much astronomy to be done during the day. It turns out you can view Venus. But during the day, if you're armed with the right filters you can also observe one of the most amazing and dynamic astronomical phenomena that are visible: the Sun.

Normally the sun looks like a bright disc of uniform colour. But view it through a hydrogen-α filter and it suddenly transforms into a bizarre alien object. And amazingly you don't need a billion dollar satellite to view the sun this way. A $500 telescope is good enough. Follow that link and you'll see pictures actually taken with that telescope (and subsequently image processed...). You can clearly see solar granulation, filaments, prominences and some of the structure around sunspots.

The problem with solar viewing is that the Sun pumps out vast quantities of energy fairly indiscriminately following a Planck distribution. This tends to overwhelm phenomena that emit at a specific wavelength. So you need an incredibly narrow band pass filter that only allows light at a given wavelength to be transmitted. In this case we were using a filter whose bandwidth was less than 0.1 nanometre to view emissions at 656.3nm caused by electrons dropping from the third to the second energy level in hydrogen atoms.

Making filters this good is itself is an amazing feat. What material could possibly have the correct properties to allow that degree of selectivity? As far as I can make out from the sales blurb, the filter on this telescope is in fact a Fabry-Pérot etalon. Essentially it's two highly reflective (but still slightly transmissive) parallel layers. Consider light travelling perpendicularly through such layers. It might travel straight through. Or it might bounce 2n times before continuing on its way. So the light we see is the sum of light that has bounced 0, 2, 4, 6, ... times. If all of these reflected rays are in phase they'll constructively interfere. Otherwise the infinite sum will be over widely varying phases resulting in destructive interference. So by choosing the correct spacing between the layers we can ensure that only one wavelength (and higher harmonics of it) will be transmitted. You can imagine the difficulty in getting the precise layer separation and that's why these filters can cost many thousands of dollars. And fabrication isn't the only problem - it also needs to be stable even though it has solar energy falling directly onto it.

Anyway, I'm so taken with this solar astronomy stuff I'm going to read this book. And I'll leave you with this link to some even more amazing pictures of the Sun.

(BTW In lieu of having a way to attach my camera to the solar telescope I borrowed the picture from here. Note that the comment is now out of date, the solar neutrino problem has since been solved.)