Comments on A Neighborhood of Infinity: Lossless decompression and the generation of random samples

How about prime number distribution. What's the...

2013-07-26T15:32:37.674-07:00

How about the prime number distribution? What's the entropy, and how could the primes be stored with minimal bits?

(The question might seem simple, but it's quite fundamental math.)

Very interesting insight. This is essentially an a...

2012-02-13T14:08:33.861-08:00

Very interesting insight. This is essentially an application of the source coding theorem, which says that one can find a code whose expected length is arbitrarily close to the entropy (which is also the lower bound).

Computing the entropy of p (= 1.54 bits in this case) then tells us that, using this method, we have to do roughly 1.5 evaluations on the tree, on average, to generate one sample.
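To make the bound concrete, here is a minimal sketch that compares the Shannon entropy of a distribution with the expected code length of a Huffman code built for it. The distribution `probs` is a made-up dyadic example (not the `p` from the post), chosen so that the two numbers coincide exactly; the helper names are hypothetical.

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman code length of each symbol."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # merged symbols end up one level deeper
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical distribution
lengths = huffman_lengths(probs)
expected_length = sum(p * l for p, l in zip(probs, lengths))
```

For non-dyadic distributions the expected length is at least the entropy and less than the entropy plus one bit.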

2012-02-13T13:55:23.762-08:00

This comment has been removed by the author.

I don't think Huffman coding yields suitable t...

2012-02-06T00:50:19.077-08:00

I don't think Huffman coding yields suitable trees for this problem, since the trees for random variate generation have the additional requirement of being ordered. Optimal alphabetic binary trees or optimal BSTs require a different algorithm that is not a variation of Huffman coding:

"This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal binary alphabetic problem, which has some similarities to Huffman algorithm, but is not a variation of [Huffman coding]. These optimal alphabetic binary trees are often used as binary search trees."

As I read the problem statement I immediately want...

2012-01-23T13:43:06.560-08:00

As I read the problem statement I immediately wanted to mention Optimal Binary Search Trees (http://en.wikipedia.org/wiki/Binary_search_tree#Optimal_binary_search_trees).

As it turns out, the publication "Darts, Dice, and Coins" already does it. Argh!

Hi I hate to jump to a different subject. But a wh...

2012-01-20T11:37:37.624-08:00

Hi, I hate to jump to a different subject, but a while back you had some wonderful notes on Target Enumeration and a Haskell implementation.
My question is: how could you extend what you did so that the boundary points lie on real-number coordinates? To refer to the pixel examples, our area of coverage would contain the counted 'holes' plus the partially contained 'holes', or say pixel areas. How could we get the exact coverage based on the way you implemented it? I appreciate any information, as I know this was a while back, but I JUST found the blog. If you have any resources, please feel free to contact me at:
jcarrola@swri.org. Thanks for any time you can spare. Regards

I came to this same realization when I started stu...

2012-01-16T12:25:17.618-08:00

I came to this same realization when I started seriously studying data compression (almost 30 years ago, yikes!). Build a decoder for the appropriate arithmetic code, and feed it random bits. You get your output distributed according to the probability model used to build the arithmetic code.
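A rough sketch of that idea, under simplifying assumptions: it decodes a single symbol by narrowing an interval with fair bits until the interval fits inside one symbol's slice of the cumulative distribution. A real arithmetic decoder would carry the leftover interval over to the next symbol; the function name and the example model are hypothetical.

```python
import random
from fractions import Fraction
from itertools import accumulate

def sample_symbol(probs, next_bit):
    """Decode one symbol from a stream of fair bits.

    probs: list of Fraction probabilities summing to 1.
    next_bit: zero-argument callable returning 0 or 1.
    Terminates with probability 1 when the bits are fair.
    """
    cdf = [Fraction(0)] + list(accumulate(probs))
    lo, width = Fraction(0), Fraction(1)
    while True:
        # Does [lo, lo + width) lie inside a single symbol's interval?
        for i in range(len(probs)):
            if cdf[i] <= lo and lo + width <= cdf[i + 1]:
                return i
        # Otherwise consume one more bit and keep the matching half.
        width /= 2
        if next_bit():
            lo += width

model = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical model
symbol = sample_symbol(model, lambda: random.getrandbits(1))
```

Symbol `i` is returned with probability `probs[i]`, because the set of bit strings that land in its interval has exactly that measure.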

Bear in mind that if this code is a bottleneck, br...

2012-01-14T18:39:00.310-08:00

Bear in mind that if this code is a bottleneck, branch instructions whose conditions depend on random numbers are effectively unpredictable, and hence expensive.

Here's a neat little trick you should know. If the number of symbols is a power of 2, there's also this solution:

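/* Assumes a[1..numberOfSymbols-1] holds cumulative-probability thresholds
   in breadth-first (heap) order, with the root at a[1]. */
double x = random_number();   /* uniform in [0, 1) */
unsigned s = 1;               /* start at the root; starting at 0 would revisit a[0] */
while (s < numberOfSymbols)   /* runs log2(numberOfSymbols) times */
{
    s = (s << 1) | (a[s] < x);   /* branch-free step: go right if x is above the threshold */
}
return s - numberOfSymbols;   /* convert leaf index to symbol index */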
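For anyone who wants to experiment with this, here is a small Python sketch of the same kind of comparison-driven descent. It assumes the thresholds are cumulative probabilities stored in breadth-first order with the root at index 1, a layout the comment does not spell out; `build_thresholds` and `sample` are hypothetical names.

```python
def build_thresholds(probs):
    """Thresholds in breadth-first (heap) order; len(probs) must be a power of 2."""
    n = len(probs)
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p)
    a = [0.0] * n  # a[0] is unused

    def fill(node, lo, hi):
        if hi - lo == 1:
            return  # a single symbol: this is a leaf, no threshold needed
        mid = (lo + hi) // 2
        a[node] = cdf[mid]  # cumulative probability of all symbols before mid
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid, hi)

    fill(1, 0, n)
    return a

def sample(a, x):
    """Map a uniform x in [0, 1) to a symbol index without data-dependent branches."""
    n = len(a)
    s = 1
    while s < n:
        s = (s << 1) | (a[s] < x)  # the comparison result becomes the next path bit
    return s - n

thresholds = build_thresholds([0.1, 0.2, 0.3, 0.4])  # made-up distribution
picks = [sample(thresholds, x) for x in (0.05, 0.2, 0.5, 0.9)]
```

The loop count depends only on the number of symbols, so the only data-dependent work is the comparison itself.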

What if we are sampling/compression observations f...

2012-01-09T01:30:27.549-08:00

What if we are sampling/compressing observations from a latent variable model? The bits-back method makes a connection. See p353 of MacKay's textbook for a sketch and references.

Neat! Had this problem before and wasn't sure ...

2012-01-08T20:32:51.852-08:00

Neat! Had this problem before and wasn't sure how to solve it. Thanks for the explanation.

Neat! Had this problem before and wasn't sure ...

2012-01-08T20:32:05.956-08:00

Neat! Had this problem before and wasn't sure how to solve it efficiently.

Another elegant solution (Walker's method) is ...

2012-01-08T11:04:28.614-08:00

Another elegant solution (Walker's method) is described by Mihai here: http://infoweekly.blogspot.com/2011/09/follow-up-sampling-discrete.html

jkff, You use the tree given by the Huffman algor...

2012-01-08T08:14:50.162-08:00

jkff,

You use the tree given by the Huffman algorithm but you don't have a 50/50 decision at each branch. You derive the probability of going each way from the sum of the probabilities in each branch below.
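A minimal sketch of that descent, with hypothetical names: each internal node stores the total probability of its subtree, and the walk goes left with probability equal to the left subtree's share of that total.

```python
import heapq
import random

def build_huffman(probs):
    """Build a Huffman tree from a dict of symbol -> probability.

    Leaves are (weight, symbol); internal nodes are (weight, left, right).
    """
    heap = [(p, i, (p, sym)) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)  # unique tie-breaker so trees are never compared
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, counter, (w1 + w2, t1, t2)))
        counter += 1
    return heap[0][2]

def sample(tree, rand=random.random):
    """Walk from the root, choosing each branch in proportion to its weight."""
    node = tree
    while len(node) == 3:  # internal node
        weight, left, right = node
        node = left if rand() < left[0] / weight else right
    return node[1]

tree = build_huffman({"a": 0.4, "b": 0.6})
```

With this rule the probability of reaching a leaf is the product of the branch ratios along its path, which telescopes to the leaf's own probability.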

I see how this tells us about the optimal number o...

2012-01-07T22:41:18.669-08:00

I see how this tells us about the optimal number of decisions you have to make while generating a random number, but we still cannot generate the numbers by just taking a random bit string and decompressing it with the Huffman tree over the desired distribution. E.g. consider a distribution {a:0.4, b:0.6} - it will have a Huffman tree with just 2 leaves, and decompression will give {a:0.5, b:0.5}. Am I missing something?
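The objection can be checked in a couple of lines. When a prefix code is decoded from fair random bits, a codeword of length n is produced with probability 2^-n, whatever distribution the code was built for. The function name below is hypothetical.

```python
def implied_distribution(code):
    """Probability of each symbol when a prefix code is decoded from fair bits."""
    return {sym: 2.0 ** -len(word) for sym, word in code.items()}

# Any Huffman code for {a: 0.4, b: 0.6} has two one-bit codewords.
two_symbols = implied_distribution({"a": "0", "b": "1"})

# For a dyadic distribution such as {x: 1/2, y: 1/4, z: 1/4} the match is exact.
dyadic = implied_distribution({"x": "0", "y": "10", "z": "11"})
```

So plain decompression of fair bits reproduces the target only when every probability is a power of 1/2.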

@sigfpe, I don't believe so, it's a rather...

2012-01-07T13:51:25.109-08:00

@sigfpe, I don't believe so, it's a rather clever way of doing random sampling. That said, I haven't looked at the literature beyond the article linked.

@Justin, I've had links to that article on bo...

2012-01-07T13:49:24.879-08:00

@Justin,

I've had links to that article on both G+ and Twitter now! Looks like I missed out on a nice article a few weeks ago. But does the alias method imply anything interesting about compression algorithms?

There's a faster algorithm for selecting elem...

2012-01-07T13:45:43.216-08:00

There's a faster algorithm for selecting elements at random from a finite distribution, known as the Alias Method, that doesn't suffer from the O(log n) worst case. Keith Schwarz has a good description written up called Darts, Dice, and Coins: Sampling from a Discrete Distribution.
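For reference, a compact sketch of the alias method in the table-building style usually attributed to Vose. This is an illustration written for this thread, not code from the linked article; the function names and the `rand`/`randint` parameters are hypothetical conveniences for testing.

```python
import random

def build_alias(probs):
    """Build the probability and alias tables in O(n)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]  # column s keeps its own mass...
        alias[s] = l         # ...and is topped up from column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers equal 1 up to rounding error
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def sample(prob, alias, rand=random.random, randint=random.randrange):
    """Draw one index in O(1): pick a column, then flip a biased coin."""
    i = randint(len(prob))
    return i if rand() < prob[i] else alias[i]

prob, alias = build_alias([0.5, 0.25, 0.125, 0.125])  # made-up distribution
```

Each draw costs one uniform index and one biased coin flip, regardless of how many symbols there are.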

@lvps1000vm I really need to read that book! (PS...

2012-01-07T12:34:20.913-08:00

@lvps1000vm

I really need to read that book!

(PS I was a student at the same time and place as MacKay. It was interesting suddenly seeing his name everywhere years later.)

This concept of a decoder as a random sample gener...

2012-01-07T12:31:40.868-08:00

This concept of a decoder as a random sample generator is explained in MacKay's book, my personal reference book on the subject.

You can download it for free at
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

It appears in section 6.3, page 118 of the book (page 130 of the PDF file).

Nice! This is similar to a Galois tech talk I gave...

2012-01-07T11:58:08.556-08:00

Nice! This is similar to a Galois tech talk I gave in May:

http://corp.galois.com/blog/2011/5/16/tech-talk-video-empirical-sampling-with-haskell.html

That's a really great insight! It goes to show...

2012-01-07T11:28:56.266-08:00

That's a really great insight! It goes to show the value in asking the right question :) At last I know when a raven is like a writing desk.