{"id":2579,"date":"2023-05-17T10:34:07","date_gmt":"2023-05-17T15:34:07","guid":{"rendered":"http:\/\/www.ishygddt.xyz\/~blog\/?p=2579"},"modified":"2023-05-18T15:02:03","modified_gmt":"2023-05-18T20:02:03","slug":"ec-point-serialization-formats","status":"publish","type":"post","link":"http:\/\/www.ishygddt.xyz\/~blog\/2023\/05\/ec-point-serialization-formats","title":{"rendered":"EC point serialization formats"},"content":{"rendered":"<p>I looked into this when trying to make a piece of toy software to serialize and de-serialize *all* OpenSSH-supported public key formats.<\/p>\n<p>While OpenSSH uses standardized formats for private keys, the <code class=\"\" data-line=\"\">ssh-* AAAA\/3NzaC0\u2026<\/code> format you're used to pasting into remote servers is actually\u00a0a \"proprietary\" (though with freely-licensed spec and implementation) encoding\u2014it's not JSON, and not any standardized BER\/DER codec; instead, it's\u00a0<em>mostly<\/em> a Length-Value encoding (think <a href=\"https:\/\/en.wikipedia.org\/wiki\/Type-length-value\">TLV<\/a> without the T) with fixed-length length fields. 
(Technically, <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc4251#section-5\">the underlying encoding<\/a> is basically ad-hoc, but in practice the only two of its data types anyone ever really uses are <code class=\"\" data-line=\"\">string<\/code>, which holds octet strings of runtime-variable length, up to <span class=\"language-mathjax\">$2^{32}-1$<\/span> bytes long; and <code class=\"\" data-line=\"\">mpint<\/code>, which holds two's-complement integers in range <span class=\"language-mathjax\">${[{{-2^{{({2^{32}-1})}\\times 8 - 1}},{2^{{({2^{32}-1})}\\times 8 - 1}}})}$<\/span>; both of which are, obviously, encoded as a <code class=\"\" data-line=\"\">uint32<\/code> length followed by the raw value.)<\/p>\n<p>You can see from the following code fragment (which I wrote, and which is correct as regards OpenSSH-generated <code class=\"language-bash\" data-line=\"\">~\/.ssh\/id_*.pub<\/code> files) that, while OpenSSH actually encodes RSA's (and DSS's) key values itself, it just stashes elliptic-curve keys of all species as foreign \"data blobs\" that <em>have<\/em> to be parsed out by the respective functions <code class=\"language-javascript\" data-line=\"\">unpack_bernstein_compressed_point<\/code> and <code class=\"language-javascript\" data-line=\"\">unpack_sec1ec_point<\/code>:<\/p>\n<pre><code class=\"language-javascript\" data-line=\"\">function _ossh2obj(buf) {\n\tlet reader = new _OsshReader(buf);\n\tlet type = reader.readString();\n\tswitch (type) {\n\n\t\tcase &quot;ssh-rsa&quot;: {\n\t\t\tlet e = reader.readMpint();\n\t\t\tlet n = reader.readMpint();\n\n\t\t\treturn {type: &#039;rsa&#039;, value: {n, e}};\n\t\t}\n\n\t\tcase &quot;ssh-dss&quot;: {\n\t\t\tlet p = reader.readMpint();\n\t\t\tlet q = reader.readMpint();\n\t\t\tlet g = reader.readMpint();\n\t\t\tlet y = reader.readMpint();\n\n\t\t\treturn {type: &#039;dss&#039;, value: {p, q, g, y}};\n\t\t}\n\n\t\tcase &quot;ssh-ed25519&quot;:\n\t\tcase &quot;ssh-ed448&quot;: {\n\n\t\t\t\/\/ 1. 
Curve\n\t\t\tlet identifier = type.match(\/^ssh-(ed\\S+)$\/)[1];\n\t\t\tlet params = getEdDSAParams(identifier);\n\n\t\t\t\/\/ 2. Point\n\t\t\tlet A = reader.readBytes();\n\t\t\tif (A.length !== get_bernstein_compressed_length(params))\n\t\t\t\tthrow new Error(`Invalid key (wrong ${type} length).`);\n\t\t\tlet P = unpack_bernstein_compressed_point(A, params);\n\t\t\t\/\/ let {x, y} = P;\n\n\t\t\treturn {type: &#039;eddsa&#039;, value: {identifier, point: P}};\n\t\t}\n\n\t\tcase &quot;ecdsa-sha2-nistp256&quot;:\n\t\tcase &quot;ecdsa-sha2-nistp384&quot;:\n\t\tcase &quot;ecdsa-sha2-nistp521&quot;: {\n\t\t\t\/\/ https:\/\/www.rfc-editor.org\/rfc\/rfc5656#section-3.1\n\n\t\t\t\/\/ 1. Curve\n\t\t\tlet [hash_name, expected_identifier] = type.match(\/^ecdsa-(\\w+)-(\\w+)$\/).slice(1);\n\t\t\tlet identifier = reader.readString();\n\t\t\tif (identifier !== expected_identifier)\n\t\t\t\tthrow new Error(&quot;Invalid key (mismatched type field and SEC 1 identifier).&quot;);\n\t\t\tlet params = getECDSAParams(identifier);\n\n\t\t\t\/\/ 2. Point\n\t\t\tlet Q = reader.readBytes();\n\t\t\tlet P = unpack_sec1ec_point(Q, params);\n\t\t\t\/\/ let {x, y} = P;\n\n\t\t\treturn {type: &#039;ecdsa&#039;, value: {identifier, h: hash_name, point: P}};\n\t\t}\n\n\t\tdefault:\n\t\t\tthrow new Error(`Unsupported OpenSSH key type: ${type}`);\n\t};\n}<\/code><\/pre>\n<p>These formats (<a href=\"https:\/\/www.secg.org\/sec1-v2.pdf#subsubsection.2.3.3\">SEC 1<\/a>, <a href=\"https:\/\/www.rfc-editor.org\/rfc\/rfc8032.html#section-3.1\">Bernstein<\/a>) were both \"borrowed\" wholesale from the non-OpenSSH elliptic-curve software ecosystem.<\/p>\n<p>What's interesting to me (and why I wrote this post) is how <em>nearly<\/em> similar these 2 formats are. Actually, instead of saying how they're similar, I'll just enumerate <em>comprehensively<\/em> their differences:<\/p>\n<ul>\n<li>Bernstein encoding lacks the header byte, and supports <em>only<\/em> compressed format. 
Bernstein-encoded elliptic-curve points are always exactly <span class=\"language-mathjax\">$\\lceil{{({{\\lceil{\\text{log2}{({p})}}\\rceil}+{1}})}\\div 8}\\rceil$<\/span> octets long; in contrast to SEC 1, Bernstein encoding\u00a0<em>defines<\/em> octet-strings of length <span class=\"language-mathjax\">$0$<\/span> and <span class=\"language-mathjax\">${{\\lceil{({{\\text{log2}{({p})}}-1})} \\div 8}\\rceil} \\times {2}$<\/span>* to be invalid.<\/li>\n<li>For point compression, Bernstein encoding truncates the <em>first<\/em> co-ordinate, <span class=\"language-mathjax\">$x$<\/span>, rather than SEC 1's truncation of <span class=\"language-mathjax\">$y$<\/span>.\n<ul>\n<li>Technically, the Bernstein signature protocols (EdDSA) are defined over twisted Edwards curves (curves of form <span class=\"language-mathjax\">$ax^2 + y^2 = 1 + dx^2y^2$<\/span>), while the general EC protocols are defined over short Weierstrass curves (with form <span class=\"language-mathjax\">$y^2 = x^3 + ax + b$<\/span>), so this is a bit of an apples-to-oranges comparison. But it still bears noting if you're trying to make actual serializers\/deserializers of these codecs.<\/li>\n<\/ul>\n<\/li>\n<li>Bernstein encoding stores the truncated co-ordinate's leftover bit as-is, while SEC 1 maps it through a transformation, <span class=\"language-mathjax\">$0 \\mapsto 2; 1 \\mapsto 3$<\/span>, into the header byte.<\/li>\n<li>Bernstein encoding concatenates <span class=\"language-mathjax\">${Y} \\mathbin\\Vert {X}$<\/span>\u00a0<strong>bitwise<\/strong>; this allows saving an octet for some curves. 
This contrasts with SEC 1, which serializes the co-ordinates independently into whole \"padded\" octet strings before concatenation.<\/li>\n<li>Bernstein encoding serializes the non-truncated co-ordinate into one <strong>fewer<\/strong> bit than would be required to encode <span class=\"language-mathjax\">$p-1$<\/span>, making a leap of faith for the extremely conservative assumption that the truncated co-ordinate will be <span class=\"language-mathjax\">$\\ge 2$<\/span>.\n<ul>\n<li>*Technically, the legal lengths for SEC 1 encoded points are <span class=\"language-mathjax\">$\\{{1}, {1 + \\lceil{{\\text{log2}{({p})}} \\div 8}\\rceil}, {1 + \\lceil{{\\text{log2}{({p})}} \\div 8}\\rceil \\times {2}}\\}$<\/span> (including the header byte, not pinching a bit off non-compressed co-ordinates, and not pinching an octet in bitwise concatenations). The formulas named earlier are what <em>would<\/em> be the lengths if you generalized Bernstein encoding to support uncompressed points and the point at infinity.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>[NOTE FROM EDITOR: The above formulas are all based on prime <span class=\"language-mathjax\">$p\\neq 2$<\/span>. 
While no Bernstein protocol is <em>yet<\/em> based on a characteristic-<span class=\"language-mathjax\">$2$<\/span> field, for SEC 1, some of these formulas are actually different for characteristic-<span class=\"language-mathjax\">$2$<\/span>.]<\/p>\n<p>The first point is the most significant difference; it was a design choice to create the following \"knock-on\" effects:<\/p>\n<ul>\n<li><span class=\"language-mathjax\">$\\mathcal{O}$<\/span> (the point-at-infinity) has no representation; it simply cannot be encoded\u2014and, therefore, cannot be produced during decoding!<\/li>\n<li>Since compressed format is forced, you have to solve for the missing co-ordinate during deserialization; this means that no element of <span class=\"language-mathjax\">$\\mathbb{Z}_p^2 \\setminus E$<\/span> (\"points-<em>off<\/em>-the-curve\") can be produced during decoding!<\/li>\n<li>The only supported format has a fixed length, which allows simplifying codepaths and ossifying netcode.<\/li>\n<\/ul>\n<p>These are security features because trying to treat with the point at infinity or points-off-the-curve can leak your private key or otherwise damage the security of the handshake if the software you're using wasn't specifically designed to handle those inputs (see <a href=\"https:\/\/neilmadden.blog\/2022\/04\/19\/psychic-signatures-in-java\/\">CVE-2022-21449<\/a> and <a href=\"https:\/\/github.com\/advisories\/GHSA-rvj9-8cvx-3vq9\">CVE-2017-16007<\/a> for representative examples); it's very easy for library developers to just <em>assume<\/em> that the public key you're trying to treat with is a fully legitimate one. 
By categorically preventing the deserializer from emitting them, the library developer is saved totally from being required to <strong>even think about<\/strong> these cases, and the application (and, therefore, the user) which uses a Bernstein protocol is saved totally from library developers who neglected to think about these cases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I looked into this when trying to make a piece of toy software to serialize and de-serialize *all* OpenSSH-supported public key formats. While OpenSSH uses standardized formats for private keys, the ssh-* AAAA\/3NzaC0\u2026 format you're used to pasting into remote servers is actually\u00a0a \"proprietary\" (though with freely-licensed spec and implementation) encoding\u2014it's not JSON, and not &hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[97],"tags":[],"class_list":["post-2579","post","type-post","status-publish","format-standard","hentry","category-original-content"],"_links":{"self":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/comments?post=2579"}],"version-history":[{"count":40,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2579\/revisions"}],"predecessor-version":[{"id":2636,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/2579\/revisions\/2636"}],"wp:attachment":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/media?parent=2579"}],"wp:term":[{"taxonomy":"category","emb
eddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/categories?post=2579"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/tags?post=2579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}