{"id":2773,"date":"2023-06-11T00:16:04","date_gmt":"2023-06-11T05:16:04","guid":{"rendered":"http:\/\/www.ishygddt.xyz\/~blog\/?p=2773"},"modified":"2023-06-11T00:19:12","modified_gmt":"2023-06-11T05:19:12","slug":"ranking-strings","status":"publish","type":"post","link":"http:\/\/www.ishygddt.xyz\/~blog\/2023\/06\/ranking-strings","title":{"rendered":"[DRAFT] Bijection between strings and positive integers"},"content":{"rendered":"<p>It\u2019s probably the case that <em>all<\/em> math is trivial. So the statement that \u201c[some particular mathematical result] is trivial\u201d would, then, be trivially true \u2014 and useless! It might be better to say that \u201call math is trivial \u2014 if you have the intuition for it\u201d. And this problem was not <em>at all<\/em> trivial (for me) to figure out, however inevitable and even obvious it looks (to me) <em>in hindsight<\/em>.<\/p>\n<p>The problem I had today was: I needed to to create a bijection between the natural numbers and all finite strings from a given finite alphabet. Here\u2019s basically how I worked through that, with a code sample at the end.<\/p>\n<p>(The \u201calphabet\u201d will be <span lang=\"x-mathjax\">$A=\\left\\{c_1, c_2, \\dots, c_k\\right\\}$<\/span>. For example, you might set English strings in a 27-character alphabet with <span lang=\"x-mathjax\">$c_1=\\text{\u2018A\u2019}$<\/span>, <span lang=\"x-mathjax\">$c_{26}=\\text{\u2018Z\u2019}$<\/span>, and <span lang=\"x-mathjax\">$c_{27}=\\text{Space}$<\/span>; or Morse Code strings in a 2-character alphabet with <span lang=\"x-mathjax\">$c_1=\\text{\u2018\u00b7\u2019}$<\/span> and <span lang=\"x-mathjax\">$c_2=\\text{\u2018\u2012\u2019}$<\/span>; or octet strings with <span lang=\"x-mathjax\">$k=2^8, c_x=x-1$<\/span>.)<\/p>\n<p>Remember, the goal is to create a bijection, which means that <strong>every<\/strong> string (even weird ones with trailing \u2014 or leading \u2014 spaces!) needs to be represented, and we can\u2019t \u201ccheat\u201d by transmitting the length separately; the bijection must be between <span lang=\"x-mathjax\">$\\mathbb{N}\\leftrightarrow A_k^\\ast$<\/span>.<\/p>\n<p>I couldn\u2019t figure out how to make this, so I just started pairing elements up by hand like an idiot, hopeful I'd spot the pattern:<\/p>\n<ul>\n<li>Epoch 0 (elements of <span lang=\"x-mathjax\">$A^0$<\/span>):\n<ul>\n<li><span lang=\"x-mathjax\">$\\left(\\right) \\mapsto 1$<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Epoch 1 (elements of <span lang=\"x-mathjax\">$A^1$<\/span>):\n<ul>\n<li>(<span lang=\"x-mathjax\">$1$<\/span> integer has been assigned before this point.)<\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_1\\right) \\mapsto 2$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_2\\right) \\mapsto 3$<\/span><\/li>\n<li>\u2026<\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_k\\right) \\mapsto k+1$<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Epoch 2 (elements of <span lang=\"x-mathjax\">$A^2$<\/span>):\n<ul>\n<li>(<span lang=\"x-mathjax\">$k+1$<\/span> integers have been assigned before this point.)<\/li>\n<li>Strings starting with <span lang=\"x-mathjax\">$\\left(c_1, \\dots\\right)$<\/span>:\n<ul>\n<li><span lang=\"x-mathjax\">$\\left(c_1, c_1\\right) \\mapsto k+2$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_1, c_2\\right) \\mapsto k+3$<\/span><\/li>\n<li>\u2026<\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_1, c_{k-1}\\right) \\mapsto k+k$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_1, c_k\\right) \\mapsto k+\\left(k+1\\right)$<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Strings starting with <span lang=\"x-mathjax\">$\\left(c_2, \\dots\\right)$<\/span>:\n<ul>\n<li><span lang=\"x-mathjax\">$\\left(c_2, c_1\\right) \\mapsto 2k+2$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_2, c_1\\right) \\mapsto 2k+3$<\/span><\/li>\n<li>\u2026<\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_2, c_{k-2}\\right) \\mapsto 2k+k$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_2, c_{k-1}\\right) \\mapsto k+\\left(k+1\\right)$<\/span><\/li>\n<li><span lang=\"x-mathjax\">$\\left(c_2, c_k\\right) \\mapsto k+\\left(k+2\\right)$<\/span><\/li>\n<li>\u2026<\/li>\n<\/ul>\n<\/li>\n<li>\u2026<\/li>\n<\/ul>\n<\/li>\n<li>Epoch 3 \u2026:<\/li>\n<\/ul>\n<p>Then it hit me: instead of continuing to melt my brain counting these darn offsets on my fingers and hoping that, if I did enough of them, I\u2019d be able to scry a <em>fully general<\/em> formula that fits them into infinity, I could just quarantine off each \u201cepoch\u201d from each other, and see if that makes it easier (it did!)<\/p>\n<p>So now I end up with this much easier starting point:<\/p>\n<ul>\n<li>Epoch 0 has <span lang=\"x-mathjax\">$k^0=1$<\/span> strings. (I have <span lang=\"x-mathjax\">$k$<\/span> symbols, and I build my string up out of exactly <span lang=\"x-mathjax\">$0$<\/span> of them; I have only one string I can create: the empty string.)\n<ul>\n<li><span lang=\"x-mathjax\">$1$<\/span> integer has been assigned before this point.<\/li>\n<\/ul>\n<\/li>\n<li>Epoch 1 has <span lang=\"x-mathjax\">$k^1$<\/span> strings. (I have <span lang=\"x-mathjax\">$k$<\/span> symbols, and I build my string up out of exactly <span lang=\"x-mathjax\">$1$<\/span> of them; I have <span lang=\"x-mathjax\">$k$<\/span> strings I could create.)\n<ul>\n<li><span lang=\"x-mathjax\">$1+k$<\/span> integers have been assigned before this point: the integer that went to the string in Epoch 0, <em>plus<\/em> the integers that went to the strings in Epoch 1.<\/li>\n<\/ul>\n<\/li>\n<li>Epoch 2 has <span lang=\"x-mathjax\">$k^2$<\/span> strings. (I have <span lang=\"x-mathjax\">$k$<\/span> symbols, and I build my string up out of exactly <span lang=\"x-mathjax\">$1$<\/span> of them: I have <span lang=\"x-mathjax\">$k$<\/span> choices for the first symbol <em>times<\/em> <span lang=\"x-mathjax\">$k$<\/span> choices for the second symbol, so I have <span lang=\"x-mathjax\">$k^2$<\/span> possibilities.)\n<ul>\n<li><span lang=\"x-mathjax\">$1+k+k^2$<\/span> integers have been assigned before this point: the integer that went to the string in Epoch 0, <em>plus<\/em> the integers that went to the strings in Epoch 1, <em>plus<\/em> the integers that went to the strings in Epoch 2.<\/li>\n<\/ul>\n<\/li>\n<li>\u2026<\/li>\n<li>Epoch <span lang=\"x-mathjax\">$n-1$<\/span> has <span lang=\"x-mathjax\">${({n-1})}^2$<\/span> elements.\n<ul>\n<li><span lang=\"x-mathjax\">$\\sum_{i=0}^{n-1}{k^i}$<\/span> integers have been assigned before this point.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>So, say that my string <span lang=\"x-mathjax\">$s$<\/span> has <span lang=\"x-mathjax\">$n$<\/span> elements. All I need to do is calculate a ranking of my string <strong>within that epoch<\/strong>, then add <span lang=\"x-mathjax\">$\\sum_{j=0}^{n-1}{k^j}$<\/span> to that, and I\u2019ll be guaranteed not to step on the toes of any previous epochs! (And I won\u2019t collide with larger epochs, either, by the same logic!)<\/p>\n<p>Now that I don\u2019t have to keep the whole universe of <span lang=\"x-mathjax\">$A_k^\\ast$<\/span> in my head anymore, intra-epoch ranking is pretty easy: just treat the string as a base-<span lang=\"x-mathjax\">$k$<\/span> number \u2014 <span lang=\"x-mathjax\">$f\\left(s\\right): A_k^n \\to \\mathbb{N} = \\sum_{i=1}^{n}{k^{i-1} s_i}$<\/span>.<\/p>\n<p>Then to wrap it up:<\/p>\n<div lang=\"x-mathjax\">$$f\\left(s\\right): A_k^\\ast \\to \\mathbb{N} = \\left(\\sum_{i=1}^{\\left|s\\right|}{s_i k^{i-1}}\\right)+\\left(\\sum_{j=0}^{\\left|s\\right|-1}{k^j}\\right)$$<\/div>\n<p>\u2026now, inverting that formula isn\u2019t very clean. (I scrapped the Mathjax render of it.<!--ChatGPT claims \u201cthere is no simple closed-form inverse function for this particular series\u201d; I haven\u2019t confirmed if that\u2019s true yet, or what the nearest \u201cnon closed form\u201d alternative would be.-->)<\/p>\n<p>But it comes out OK in Python, anyway:<\/p>\n<pre><code class=\"language-python\" data-line=\"\">def rank(s, k, symbol_rank=ord):\n\ti = 0\n\toffset = 0\n\tfor j, c in enumerate(map(symbol_rank, s)):\n\t\tassert 0 &lt;= c &lt; k\n\t\toffset = offset + k**j\n\t\ti = i*k + c\n\treturn i + offset\n\ndef unrank(i, k, return_cast=lambda it: str().join(map(chr, it))):\n\toffset, next_offset = 0, 1\n\tj = 0\n\twhile next_offset &lt; i:\n\t\toffset = next_offset\n\t\tj += 1\n\t\tnext_offset += k**j\n\ti -= offset\n\tl = []\n\tfor j in range(length):\n\t\ti, c = divmod(i, k)\n\t\tl.append(c)\n\tassert i == 0\n\tl.reverse()\n\treturn return_cast(l)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>It\u2019s probably the case that all math is trivial. So the statement that \u201c[some particular mathematical result] is trivial\u201d would, then, be trivially true \u2014 and useless! It might be better to say that \u201call math is trivial \u2014 if you have the intuition for it\u201d. And this problem was not at all trivial (for &hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-2773","post","type-post","status-publish","format-standard","hentry","category-drafts"],"_links":{"self":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/comments?post=2773"}],"version-history":[{"count":75,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2773\/revisions"}],"predecessor-version":[{"id":2850,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2773\/revisions\/2850"}],"wp:attachment":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/media?parent=2773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/categories?post=2773"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/tags?post=2773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}