Discovered in 2013 by then-UPMC-student Jérémie Lumbroso, and independently re-discovered by an Apple programmer last Fall (with much ado on Twitter and HN), the basic idea is as follows:

1. Take your (infinite) random bitstream as the binary digits ("fractional part") of random real $r\in\left[0,1\right)$
Specifically: $r=\sum_{i=1}^{\infty}{\text{bit}_i 2^{-i}}$
2. Read out only as many digits of $r$ as are required to determine the value of $\left\lfloor rn\right\rfloor$
3. You now (clearly) have fairly chosen $\leftarrow_{$}\left\{0,1,\cdots,n-1\right\}\$
4. For the next number, resume reading bits from the stream where this number left off

While it's easily-shown that no single-draw algorithm can be perfectly optimal, this algorithm is noticeably more efficient than rejection sampling in the single-draw case (and Lumbroso reminds us that this it is, obviously, also perfectly compatible with arithmetic-coded batching of the draws (left as an exercise for the reader)).

from fractions import Fraction
from random import getrandbits
def educational_randbelow(n):
# performs VERY poorly CPU-wise --
# for illustration only!
if n <= 1:
# A random integer "below" 1 must be 0
return 0
k = (n-1).bit_length()
if n == 2 ** k:
# Ironically, this interpretation
# of the algorithm
# doesn't handle the
# trivial case well...
# handle it specially:
return getrandbits(k)
# i represents the current level of "detail" we have on r,
# telling us how MANY bits of it we have uncovered so far:
i = 1
# next is the value r (from the equation):
# (it starts out VERY coarse, just i=1 bit of detail!)
# (*technically, it is a lower bound on r --
#  it tells us AT LEAST how big r is known
#  to be based on what we've seen of it.)
r = Fraction(getrandbits(1), 2**i)
# this next number is how big we KNOW
# that r can NEVER QUITE reach; if
# 100% of our "random" bits from here on out
# were high, it would APPROACH that value:
r_upperbound = r + Fraction(1, 2**i)
# if int(n * r) != int(n * r_upperbound)
# then that means that we don't yet have
# enough detail about our random number
# to see exactly what the true integer
# value of n*r should be
while int(n * r) != int(n * r_upperbound):
# Just keep resolving more
# bits of 0 <= r < 1
# until a number is
# homed-in-on unambiguously
i += 1
r += Fraction(getrandbits(1), 2**i)
r_upperbound = r + Fraction(1, 2**i)
# Once int(n * r) == int(n * r_upperbound),
# the while-loop will break, since resolving
# more bits of r would be pointless
# -- time to return the random number!
return int(n * r)

With one helping of mathematical intuition, two helpings of high-school algebra, and an invested afternoon, you can eliminate all the Fraction objects in the above example to MASSIVELY improve performance:

def randbelow(n):
if n <= 1:
return 0
k = (n - 1).bit_length()
a = getrandbits(k)
b = 2 ** k
if n == b:
return a
while (n * a // b) != (n * (a + 1) // b):
a = (a * 2) | getrandbits(1)
b = b * 2
return n * a // b

Below here is an adaptation from the paper itself; it runs far faster even than the above optimization (and has integrated handling of powers-of-2), but I couldn't possibly explain to you at this time just how or why it is equivalent to the above code. Perhaps I'll understand it after more pondering, someday:

Python

def randbelow(n):
"""Fast Dice Roller, Lumbroso, 2013"""
a = 0
b = 1
while True:
a = (a * 2) | getrandbits(1)
b = b * 2
if b >= n:
if a < n:
return a
a = a - n
b = b - n

C

uintmax_t randbelow(uintmax_t n, bool (*flip)())
// Fast Dice Roller
// Lumbroso, 2013
{
uintmax_t a = 0;
uintmax_t b = 1;
while (1) {
a = (a << 1) | flip();
b = b << 1;
if ( b >= n ) {
if ( a < n ) return a;
else
{
a = a - n;
b = b - n;
}
}
}
}

bool flip()
// example fair coin-flip function
{
return (rand() > RAND_MAX/2);
}

JS

function randbelow(n, getrandbit=()=>(Math.random() > 1/2)) {
// Fast Dice Roller
// Lumbroso, 2013
const _int = m => n.__proto__.constructor(m);
const [_0, _1] = [_int(0), _int(1)];
let a = _0;
let b = _1;
while( true ) {
a = (a << _1) + _int(getrandbit());
b = b << _1;
if ( b >= n ) {
if ( a < n ) return a;
else
{
a = a - n;
b = b - n;
}
}
}
}