Saturday, May 28, 2011

Fast forwarding lrand48()

A break from abstract nonsense to answer a question I've seen asked online a number of times. It requires nothing more than elementary modular arithmetic and it ends in some exercises.

Given a pseudo-random number generator, say BSD Unix lrand48(), is there a quick way to jump forward a billion numbers in the sequence, say, without having to work through all of the intermediate numbers? The method is no secret, but I couldn't find explicit code online so I thought I'd put some here. Literate Haskell of course.

> {-# LANGUAGE ForeignFunctionInterface #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}

On MacOSX, if you type 'man lrand48', you'll see the function lrand48() returns a sequence of 31 bit non-negative integers defined using the sequence rn+1 = arn+c mod m where

> a = 25214903917
> c = 11
> m = 2^48

The actual returned value is the floor of rn/217 and r0 = 20017429951246.

We can compute the nth element in the sequence the hard way by importing lrand48 and looping n times:

> foreign import ccall "lrand48" lrand48 :: IO Int

> nthrand 1 = lrand48
> nthrand n = lrand48 >> nthrand (n-1)

But there is a better way. If we iterate twice we get that rn+2 = a(arn+c)+c mod m = a2rn+ac+c mod m. Note how two applications of the iteration give you back another iteration in the same form: a multiplication followed by an addition modulo m. We can abstract this a bit. Given two function f(x) = ax+c mod m and g(x) = a'x+c' mod m we get g(f(x)) = (a'*a)*x + a'*c+c' mod m. We can represent functions of this type using a simple Haskell type:

> data Affine = Affine { multiply :: Integer, add :: Integer } deriving (Show, Eq, Ord)

We can now write a function to compose these functions. I'm going to use the operator * to represent composition:

> instance Num Affine where
>    Affine a' c' * Affine a c = Affine (a'*a `mod` m) ((a'*c+c') `mod` m)

To skip forward n steps we just need to multiply n of these together, ie. raise Affine a c to the power of n using ^. We then need to apply this function to r0:

> initial = Affine 0 20017429951246

> nthrand' n = (add $ Affine a c ^ n * initial) `div` (2^17)

Now try firing up ghci and comparing the outputs of nthrand 1000000 and nthrand' 1000000. Don't run nthrand more than once without resetting the seed, eg. by restarting ghci. (I know someone will post a reply below that it doesn't work...)

There are lots of papers on how to do this with other kinds of random number generator. My example is probably the easiest. The main application I can see is for jumping straight to that annoying regression test failure without going through all of the intermediates.

Exercises.
1. Read the corresponding man page for Linux. Port the above code to work there. Or any other OS you feel like. Or any other random number generator.
2. Can you split lrand48() into two? Ie. can you make two random generators that produce sequences si and ti so that s0, t0, s1, t1, ... form the sequence given by lrand48().
3. I've neglected to mention some special sauce in the code above. Why does it actually run so fast? (Clue: why did I use Num?)

10 comments:

  1. I've used this trick, in speeding up lossy audio codec. It's a little strange to find a random number generator in an audio codec, but they're there for a perfectly good reason - one piece of white noise sounds much like another, so the codec will simply transmit noise-like pieces of the audio as 'there is some noise here'. In order to pass the conformance tests, however, you need to generate the same sequence of random numbers.

    ReplyDelete
  2. For number 3:

    By having Affine be an instance of Num, we can use the efficient implementation of (^), which does better than repeated multiplication.

    http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/GHC-Real.html#%5E

    It takes advantage of the fact that x^2y = (x*x)^y and x^(2y+1) = x(x*x)^y.

    ReplyDelete
  3. For number 3:

    When Affine is an instance of Num we are able to use the efficient implementation of (^), which does better than repeatedly multiplying through.

    http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/GHC-Real.html#%5E

    It take advantage of the fact that x^2y = (x*x)^y and x^(2y+1) = x(x*x)^y.

    ReplyDelete
  4. @Alexey,

    Yes. The correct way to implement this would be through Monoid but I think Data.Monoid lacks the binary power algorithm. Maybe it appears in another library.

    ReplyDelete
  5. Fun fun!

    pow :: Integral a => Affine -> a -> Affine
    x0 `pow` y0
    | y0 < 0 = error "Negative exponent"
    | y0 == 0 = 1
    | otherwise = f x0 y0
    where
    -- f : x0 ^ y0 = x ^ y
    f x y
    | even y = f (x `mappend` x) (y `quot` 2)
    | y == 1 = x
    | otherwise = g (x `mappend` x) ((y - 1) `quot` 2) x
    -- g : x0 ^ y0 = (x ^ y) `mappend` z
    g x y z
    | even y = g (x `mappend` x) (y `quot` 2) z
    | y == 1 = x `mappend` z
    | otherwise = g (x `mappend` x) ((y - 1) `quot` 2) (x `mappend` z)


    (Haven't compiled it.)

    ReplyDelete
  6. @Alexey,

    At the very least you can generalise your type signature. If you do that then you can reuse the code to do other stuff fast. For example, you can use the ideas here http://goo.gl/AjHPM to perform very fast regexp matches against strings that are very long repeating patterns.

    ReplyDelete
  7. Alexey, I assume you are my Alexey :-)

    I can't believe I re-implemented binary exponentiation for my Collatz monoid when I could have just used this trick!

    Of course if the preludes class hierarchy made more sense it'd probably have been more obvious....

    ReplyDelete
  8. First exercise, c++

    #include
    using namespace std;

    typedef unsigned long ul;
    typedef unsigned long long ull;
    typedef unsigned short us;

    ull _m = 281474976710656; // 2^48, 0x1000000000000
    ull _a = 25214903917; // 0x5DEECE66DL
    us _c = 11; // 0xBL

    struct affine {
    ull a, c;
    };

    // Affine a' c' * Affine a c = Affine (a'*a `mod` m) ((a'*c+c') `mod` m)
    affine compose(affine A1, affine A2)
    {
    affine a = { A1.a*A2.a % _m, A1.a*A2.c + A1.c % _m};
    return a;
    }

    // Affine a c ^ n
    affine composen(affine x, ull n)
    {
    affine res = { 1, 0 };
    while (n > 0) {
    if (n & 1) res = compose(res, x);
    x = compose(x,x);
    n = n>>1;
    }
    return res;
    }

    affine initial = {0, 20017429951246}; // 0x1234abcd330e

    // nthrand' n = (add $ Affine a c ^ n * initial) `div` (2^17)
    ul nthrand(ull n) {
    affine x = compose(composen({_a, _c},n), initial);
    return (x.a + x.c) >> 17 & 0x7FFFFFFF;
    }

    int main()
    {
    ull n,rand;
    cin >> n;
    rand = nthrand(n);
    cout << std::hex << rand << endl;
    return 0;
    }

    ReplyDelete
  9. Second exercise

    //Can you split lrand48() into two? Ie.
    //can you make two random generators that produce sequences
    //si and ti so that s0, t0, s1, t1, ...
    //from the sequence given by lrand48().


    #include
    #include
    #include
    #include

    using namespace std;

    typedef unsigned long ul;
    typedef unsigned long long ull;
    typedef unsigned short us;

    ull _m = 281474976710656; // 2^48, 0x1000000000000
    ull _a = 25214903917; // 0x5DEECE66DL
    us _c = 11; // 0xBL

    struct affine {
    ull a, c;
    };

    // Affine a' c' * Affine a c = Affine (a'*a `mod` m) ((a'*c+c') `mod` m)
    affine compose(affine A1, affine A2)
    {
    affine a = { A1.a*A2.a % _m, A1.a*A2.c + A1.c % _m};
    return a;
    }

    // Affine a c ^ n
    affine composen(affine x, ull n)
    {
    affine res = { 1, 0 };
    while (n > 0) {
    if (n & 1) res = compose(res, x);
    x = compose(x,x);
    n = n>>1;
    }
    return res;
    }

    typedef std::vector afvector;
    // v[0] = s stream, v[1] = t stream
    // initial affine on position 0
    afvector v[2];

    // set initials for n elements
    void initseq(unsigned n) {
    v[0].resize(n); v[0].assign(n, {0,0});
    affine initial0 = {0, 20017429951246}; // 0x1234abcd330e
    v[0][0] = initial0;

    v[1].resize(n); v[1].assign(n, {0,0});
    affine initial1 = compose(composen({_a, _c},n), initial0);
    v[1][0] = initial1;
    }

    // get rand from affine
    ul getrand(affine x) {
    return (x.a + x.c) >> 17 & 0x7FFFFFFF;
    }

    // nth affine
    affine nthaffine(ull n, us seq) {
    affine initial = v[seq][0];
    affine x = compose(composen({_a, _c},n), initial);
    return x;
    }

    // nth rand
    ul nthrand(ull n, us seq) {
    affine x = nthaffine(n,seq);
    return getrand(x);
    }

    // next affine
    affine nextaffine(ull n, us seq) {
    affine x,y;
    if(n>0) {
    y = v[seq][n-1];
    x = compose({_a, _c}, y);
    }
    else x = v[seq][n];
    return x;
    }

    // next rand
    ul nextrand(ull n, us seq) {
    affine x = nextaffine(n, seq);
    return getrand(x);
    }

    pthread_t threads[2];
    pthread_attr_t attr;
    void *thread_status;

    // param is the sequence
    void *threaded_nthrand(void *param) {
    int seq = (int)param;
    for(ull i=0; i<v[seq].size(); i++)
    v[seq][i] = nthaffine(i, seq);
    pthread_exit(NULL);
    }

    void *threaded_nextrand(void *param) {
    int seq = (int)param;
    for(ull i=0; i<v[seq].size(); i++)
    v[seq][i] = nextaffine(i, seq);
    pthread_exit(NULL);
    }

    int main()
    {
    ull i,s,t,n;
    int seq;
    cout<<"Sequence size=";
    cin>>n;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    cout << "Threaded nthrand\n";
    initseq(n);
    for(seq=0; seq<2; seq++)
    pthread_create(&threads[seq], &attr, threaded_nthrand, (void *)seq );
    for(seq=0; seq<2; seq++)
    pthread_join(threads[seq], &thread_status);
    for(i=0; i<n; i++) {
    s = getrand(v[0][i]);
    t = getrand(v[1][i]);
    cout << "s" << i << " = ";
    cout << std::hex << s ;
    cout << " t" << i << " = ";
    cout << std::hex << t << endl;
    }

    cout << endl << "Threaded nextrand\n";
    initseq(n);
    for(seq=0; seq<2; seq++)
    pthread_create(&threads[seq], &attr, threaded_nextrand, (void *)seq );
    for(seq=0; seq<2; seq++)
    pthread_join(threads[seq], &thread_status);
    for(i=0; i<n; i++) {
    s = getrand(v[0][i]);
    t = getrand(v[1][i]);
    cout << "s" << i << " = ";
    cout << std::hex << s ;
    cout << " t" << i << " = ";
    cout << std::hex << t << endl;
    }
    pthread_exit(NULL);
    return 0;
    }

    ReplyDelete