It’s probably the case that all math is trivial. So the statement that “[some particular mathematical result] is trivial” would, then, be trivially true — and useless! It might be better to say that “all math is trivial — if you have the intuition for it”. And this problem was not at all trivial (for me) to figure out, however inevitable and even obvious it looks (to me) in hindsight.

The problem I had today was: I needed to to create a bijection between the natural numbers and all finite strings from a given finite alphabet. Here’s basically how I worked through that, with a code sample at the end.

(The “alphabet” will be $A=\left\{c_1, c_2, \dots, c_k\right\}$. For example, you might set English strings in a 27-character alphabet with $c_1=\text{‘A’}$, $c_{26}=\text{‘Z’}$, and $c_{27}=\text{Space}$; or Morse Code strings in a 2-character alphabet with $c_1=\text{‘·’}$ and $c_2=\text{‘‒’}$; or octet strings with $k=2^8, c_x=x-1$.)

Remember, the goal is to create a bijection, which means that every string (even weird ones with trailing — or leading — spaces!) needs to be represented, and we can’t “cheat” by transmitting the length separately; the bijection must be between $\mathbb{N}\leftrightarrow A_k^\ast$.

I couldn’t figure out how to make this, so I just started pairing elements up by hand like an idiot, hopeful I'd spot the pattern:

  • Epoch 0 (elements of $A^0$):
    • $\left(\right) \mapsto 1$
  • Epoch 1 (elements of $A^1$):
    • ($1$ integer has been assigned before this point.)
    • $\left(c_1\right) \mapsto 2$
    • $\left(c_2\right) \mapsto 3$
    • $\left(c_k\right) \mapsto k+1$
  • Epoch 2 (elements of $A^2$):
    • ($k+1$ integers have been assigned before this point.)
    • Strings starting with $\left(c_1, \dots\right)$:
      • $\left(c_1, c_1\right) \mapsto k+2$
      • $\left(c_1, c_2\right) \mapsto k+3$
      • $\left(c_1, c_{k-1}\right) \mapsto k+k$
      • $\left(c_1, c_k\right) \mapsto k+\left(k+1\right)$
    • Strings starting with $\left(c_2, \dots\right)$:
      • $\left(c_2, c_1\right) \mapsto 2k+2$
      • $\left(c_2, c_1\right) \mapsto 2k+3$
      • $\left(c_2, c_{k-2}\right) \mapsto 2k+k$
      • $\left(c_2, c_{k-1}\right) \mapsto k+\left(k+1\right)$
      • $\left(c_2, c_k\right) \mapsto k+\left(k+2\right)$
  • Epoch 3 …:

Then it hit me: instead of continuing to melt my brain counting these darn offsets on my fingers and hoping that, if I did enough of them, I’d be able to scry a fully general formula that fits them into infinity, I could just quarantine off each “epoch” from each other, and see if that makes it easier (it did!)

So now I end up with this much easier starting point:

  • Epoch 0 has $k^0=1$ strings. (I have $k$ symbols, and I build my string up out of exactly $0$ of them; I have only one string I can create: the empty string.)
    • $1$ integer has been assigned before this point.
  • Epoch 1 has $k^1$ strings. (I have $k$ symbols, and I build my string up out of exactly $1$ of them; I have $k$ strings I could create.)
    • $1+k$ integers have been assigned before this point: the integer that went to the string in Epoch 0, plus the integers that went to the strings in Epoch 1.
  • Epoch 2 has $k^2$ strings. (I have $k$ symbols, and I build my string up out of exactly $1$ of them: I have $k$ choices for the first symbol times $k$ choices for the second symbol, so I have $k^2$ possibilities.)
    • $1+k+k^2$ integers have been assigned before this point: the integer that went to the string in Epoch 0, plus the integers that went to the strings in Epoch 1, plus the integers that went to the strings in Epoch 2.
  • Epoch $n-1$ has ${({n-1})}^2$ elements.
    • $\sum_{i=0}^{n-1}{k^i}$ integers have been assigned before this point.

So, say that my string $s$ has $n$ elements. All I need to do is calculate a ranking of my string within that epoch, then add $\sum_{j=0}^{n-1}{k^j}$ to that, and I’ll be guaranteed not to step on the toes of any previous epochs! (And I won’t collide with larger epochs, either, by the same logic!)

Now that I don’t have to keep the whole universe of $A_k^\ast$ in my head anymore, intra-epoch ranking is pretty easy: just treat the string as a base-$k$ number — $f\left(s\right): A_k^n \to \mathbb{N} = \sum_{i=1}^{n}{k^{i-1} s_i}$.

Then to wrap it up:

$$f\left(s\right): A_k^\ast \to \mathbb{N} = \left(\sum_{i=1}^{\left|s\right|}{s_i k^{i-1}}\right)+\left(\sum_{j=0}^{\left|s\right|-1}{k^j}\right)$$

…now, inverting that formula isn’t very clean. (I scrapped the Mathjax render of it.)

But it comes out OK in Python, anyway:

def rank(s, k, symbol_rank=ord):
	i = 0
	offset = 0
	for j, c in enumerate(map(symbol_rank, s)):
		assert 0 <= c < k
		offset = offset + k**j
		i = i*k + c
	return i + offset

def unrank(i, k, return_cast=lambda it: str().join(map(chr, it))):
	offset, next_offset = 0, 1
	j = 0
	while next_offset < i:
		offset = next_offset
		j += 1
		next_offset += k**j
	i -= offset
	l = []
	for j in range(length):
		i, c = divmod(i, k)
		l.append(c)
	assert i == 0
	l.reverse()
	return return_cast(l)

Leave a Reply

Your email address will not be published. Required fields are marked *

Warning: This site uses Akismet to filter spam. Until or unless I can find a suitable replacement anti-spam solution, this means that (per their indemnification document) all commenters' IP addresses will be sent to Automattic, Inc., who may choose to share such with 3rd parties.
If this is unacceptable to you, I highly recommend using an anonymous proxy or public Wi-Fi connection when commenting.