{"id":918,"date":"2021-03-22T02:20:56","date_gmt":"2021-03-22T02:20:56","guid":{"rendered":"http:\/\/www.ishygddt.xyz\/~blog\/?p=918"},"modified":"2022-03-08T14:03:56","modified_gmt":"2022-03-08T20:03:56","slug":"python-parse-http-date","status":"publish","type":"post","link":"http:\/\/www.ishygddt.xyz\/~blog\/2021\/03\/python-parse-http-date","title":{"rendered":"Parsing the HTTP \"Date\" header in Python"},"content":{"rendered":"<p>Per <a href=\"https:\/\/tools.ietf.org\/html\/rfc7231#section-7.1.1.1\">RFC 7231 \u00a77.1.1.1<\/a>:<\/p>\n<blockquote><p>A recipient that parses a timestamp value in an HTTP header field MUST accept all three HTTP-date formats.<\/p><\/blockquote>\n<p>The three formats are as follows, converted for this post into <a href=\"https:\/\/man7.org\/linux\/man-pages\/man3\/strftime.3.html\"><code class=\"\" data-line=\"\">strftime(3)<\/code><\/a> syntax (the first is the <span style=\"text-decoration: underline\">only<\/span> preferred format; the latter two are designated \"obsolete\"):<\/p>\n<ol>\n<li><code class=\"\" data-line=\"\">%a, %d %b %Y %H:%M:%S %Z<\/code><br \/>\nwith the timezone always <em>given as<\/em> \"GMT\", but to be <em>interpreted as<\/em> UTC.<\/li>\n<li><code class=\"\" data-line=\"\">%A, %d-%b-%y %H:%M:%S %Z<\/code><br \/>\nwhere the timezone may be equal to any of an array of \"standard\" abbreviations from <a href=\"https:\/\/tools.ietf.org\/html\/rfc850#section-2.1.4\">RFC 850 \u00a72.1.4<\/a>.<\/li>\n<li><code class=\"\" data-line=\"\">%a %b %e %H:%M:%S %Y<\/code><br \/>\nwhere the day of the month is space-padded and you simply pray\/assume that the remote server is operating in UTC.<\/li>\n<\/ol>\n<p>However, Python's <a href=\"https:\/\/docs.python.org\/3\/library\/datetime.html#datetime.datetime.strptime\"><code class=\"language-python\" data-line=\"\">strptime<\/code><\/a> function does not support timezones: it <em>eats<\/em> them with <code class=\"\" data-line=\"\">%Z<\/code>, but does not actually <em>use<\/em> them. 
Therefore, we will have to hack this support in ourselves. The <a href=\"http:\/\/pythonhosted.org\/pytz\"><code class=\"language-python\" data-line=\"\">pytz<\/code><\/a> module is <span style=\"text-decoration: underline\">indispensable<\/span> for interpreting timezone names, so we will appreciate\/use it. (We additionally have to crack open <a href=\"https:\/\/docs.python.org\/3\/library\/re.html\"><code class=\"language-python\" data-line=\"\">re<\/code><\/a> because, depressingly, <code class=\"language-python\" data-line=\"\">strptime<\/code> does not even <em>make available to us<\/em> that which it matched as <code class=\"\" data-line=\"\">%Z<\/code>, and it only matches \"UTC\", \"GMT\", and the local machine's timezone names in the first place.)<\/p>\n<p>So, a Python function to parse an HTTP Date header into a datetime object would be something like:<\/p>\n<pre><code class=\"language-python\" data-line=\"\">from datetime import datetime\nfrom pytz import timezone, utc, UnknownTimeZoneError\nimport re\ndef parse_http_date(date):\n\ttry:\n\t\timf1 = &#039;%a, %d %b %Y %H:%M:%S GMT&#039;\n\t\treturn datetime.strptime(date, imf1).replace(tzinfo=utc)\n\texcept ValueError:\n\t\ttry:\n\t\t\t# strptime only accepts a few names for %Z, so split the timezone off with a regex\n\t\t\trfc850 = &#039;%A, %d-%b-%y %H:%M:%S&#039;\n\t\t\tm = re.fullmatch(r&#039;(\\w+, \\d+-\\w+-\\d+ \\d+:\\d+:\\d+) (.+)&#039;, date)\n\t\t\ttzname = m.group(2)  # AttributeError if the regex did not match\n\t\t\tif tzname == &#039;GMT&#039;: tzname = &#039;UTC&#039;\n\t\t\t# localize() is the pytz-safe way to attach a zone to a naive datetime\n\t\t\treturn timezone(tzname).localize(datetime.strptime(m.group(1), rfc850))\n\t\texcept (ValueError, AttributeError, UnknownTimeZoneError):\n\t\t\tpass\n\t\ttry:\n\t\t\t# asctime pads the day with a space, which strptime accepts for %d\n\t\t\tasctime = &#039;%a %b %d %H:%M:%S %Y&#039;\n\t\t\treturn datetime.strptime(date, asctime).replace(tzinfo=utc)\n\t\texcept ValueError:\n\t\t\tpass\n\t\t# Neither of the &quot;obsolete&quot; formats worked, so re-raise original strptime error from preferred format\n\t\traise\n<\/code><\/pre>\n<p>If you don't care about parsing obsolete formats, this can be reduced to:<\/p>\n<pre><code class=\"language-python\" data-line=\"\">from datetime import datetime\nfrom datetime import timezone as tz\ndef parse_http_date(date):\n\timf1 = &#039;%a, %d 
%b %Y %H:%M:%S GMT&#039;\n\treturn datetime.strptime(date, imf1).replace(tzinfo=tz.utc)<\/code><\/pre>\n<p>\u2026which only uses the standard library!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Per RFC 7231 \u00a77.1.1.1<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[101],"tags":[72,44],"class_list":["post-918","post","type-post","status-publish","format-standard","hentry","category-writeups","tag-parsing","tag-python"],"_links":{"self":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/comments?post=918"}],"version-history":[{"count":18,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/918\/revisions"}],"predecessor-version":[{"id":2088,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/posts\/918\/revisions\/2088"}],"wp:attachment":[{"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/media?parent=918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/categories?post=918"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.ishygddt.xyz\/~blog\/wp-json\/wp\/v2\/tags?post=918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}