Per RFC 7231 §7.1.1.1:

A recipient that parses a timestamp value in an HTTP header field MUST accept all three HTTP-date formats.

RFC 7231 then describes these formats as follows (the first is the preferred format; the latter two are designated "obsolete"), converted here into strftime(3) syntax:

  1. %a, %d %b %Y %H:%M:%S %Z
    with the timezone always given as "GMT", but to be interpreted as UTC.
  2. %A, %d-%b-%y %H:%M:%S %Z
    where the timezone may be any of the "standard" abbreviations listed in RFC 850 §2.1.4.
  3. %a %b %e %H:%M:%S %Y
    where you simply pray/assume that the remote server is operating in UTC.
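As a quick sanity check, the sketch below applies Python strptime() equivalents of the three formats to the example timestamps RFC 7231 gives for them. It assumes the default C (English) LC_TIME locale, since %a, %A and %b are locale-dependent; the dictionary names are purely illustrative. The asctime pattern uses plain %d, which in Python accepts an unpadded day, while a space in the format matches any run of whitespace.

```python
from datetime import datetime

# The example timestamps from RFC 7231 §7.1.1.1; all denote the same instant.
EXAMPLES = {
    'IMF-fixdate': 'Sun, 06 Nov 1994 08:49:37 GMT',
    'rfc850-date': 'Sunday, 06-Nov-94 08:49:37 GMT',
    'asctime-date': 'Sun Nov  6 08:49:37 1994',
}

# Python strptime() patterns for each format (illustrative names).
# "GMT" is matched literally here; no tzinfo is attached yet.
FORMATS = {
    'IMF-fixdate': '%a, %d %b %Y %H:%M:%S GMT',
    'rfc850-date': '%A, %d-%b-%y %H:%M:%S GMT',
    'asctime-date': '%a %b %d %H:%M:%S %Y',
}

for name, text in EXAMPLES.items():
    # Each result is a naive datetime for 1994-11-06 08:49:37
    print(name, datetime.strptime(text, FORMATS[name]))
```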

However, Python's strptime does not really support time zones: %Z consumes a zone name (and only accepts "UTC", "GMT", or the local names in time.tzname), but the resulting datetime is still naive. Therefore, we will have to hack this support in ourselves. The pytz module is indispensable for looking the names up, so we will use it. (We additionally have to crack open re because, depressingly, strptime does not even make available to us that which it matched as %Z.)
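Both halves of that complaint are easy to see in isolation. This minimal sketch parses the RFC 850 example with %Z, shows that no tzinfo comes back, and then recovers the zone name with a regular expression (the pattern is an illustrative assumption, not taken from any library).

```python
import re
from datetime import datetime

stamp = 'Sunday, 06-Nov-94 08:49:37 GMT'

# %Z consumes "GMT", but the returned datetime is still naive.
parsed = datetime.strptime(stamp, '%A, %d-%b-%y %H:%M:%S %Z')
print(parsed.tzinfo)  # None

# strptime does not report what %Z matched, so capture it ourselves.
match = re.fullmatch(r'(\w+, \d{2}-\w{3}-\d{2} \d{2}:\d{2}:\d{2}) (\S+)', stamp)
print(match.group(2))  # GMT
```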

So, a Python function to parse an HTTP Date header into a datetime object would look something like this:

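from datetime import datetime
import re
from pytz import timezone, utc, UnknownTimeZoneError

def parse_http_date(date):
	try:
		# Preferred IMF-fixdate format: always "GMT", interpreted as UTC
		imf1 = '%a, %d %b %Y %H:%M:%S GMT'
		return datetime.strptime(date, imf1).replace(tzinfo=utc)
	except ValueError:
		# Obsolete RFC 850 format: split the zone name off ourselves, because
		# strptime's %Z only accepts UTC, GMT and the local zone names
		match = re.fullmatch(r'(\w+, \d+-\w+-\d+ \d+:\d+:\d+) (\S+)', date)
		if match:
			rfc850 = '%A, %d-%b-%y %H:%M:%S'
			tzname = match.group(2)
			if tzname == 'GMT': tzname = 'UTC'
			try:
				naive = datetime.strptime(match.group(1), rfc850)
				# pytz zones must be attached with localize(), not replace()
				return timezone(tzname).localize(naive)
			except (ValueError, UnknownTimeZoneError):
				pass
		try:
			# Obsolete asctime format; %d also accepts the space-padded day ("Nov  6")
			asctime = '%a %b %d %H:%M:%S %Y'
			return datetime.strptime(date, asctime).replace(tzinfo=utc)
		except ValueError:
			pass
		# Neither of the "obsolete" formats worked, so re-raise the original strptime error from the preferred format
		raise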
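One caveat with leaning on pytz here: it only knows zone names from the IANA database, which (as far as I can tell) has no entries for several US abbreviations such as EDT or PST, so timezone() raises UnknownTimeZoneError for them. Since these abbreviations denote fixed offsets anyway, a standard-library-only alternative is a small lookup table. The sketch below is a hypothetical helper, parse_rfc850_date; its offsets are the ones RFC 822 §5 assigns to its named zones (plus "UTC"), and military single-letter zones are deliberately left out.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical table: fixed UTC offsets (hours) for RFC 822's named zones,
# plus "UTC". Military single-letter zones are not handled.
ZONE_OFFSETS = {
    'UT': 0, 'UTC': 0, 'GMT': 0,
    'EST': -5, 'EDT': -4,
    'CST': -6, 'CDT': -5,
    'MST': -7, 'MDT': -6,
    'PST': -8, 'PDT': -7,
}

RFC850_RE = re.compile(r'(\w+, \d{2}-\w{3}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Za-z]+)')

def parse_rfc850_date(date):
    """Parse an RFC 850-style date into an aware datetime (hypothetical helper)."""
    match = RFC850_RE.fullmatch(date)
    if match is None:
        raise ValueError(f'not an RFC 850 date: {date!r}')
    stamp, zone = match.groups()
    try:
        hours = ZONE_OFFSETS[zone.upper()]
    except KeyError:
        raise ValueError(f'unknown time zone: {zone!r}') from None
    naive = datetime.strptime(stamp, '%A, %d-%b-%y %H:%M:%S')
    # A fixed offset needs no DST lookup, so replace() is safe here
    return naive.replace(tzinfo=timezone(timedelta(hours=hours)))
```

Because each abbreviation maps to a single fixed offset, attaching a datetime.timezone with replace() is correct here; the localize() step is only needed for pytz's DST-aware zones.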

If you don't care about parsing obsolete formats, this can be reduced to:

from datetime import datetime
from datetime import timezone as tz
def parse_http_date(date):
	imf1 = '%a, %d %b %Y %H:%M:%S GMT'
	return datetime.strptime(date, imf1).replace(tzinfo=tz.utc)

…which only uses the standard library!
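To check the reduced version, the sketch below repeats it (so the block is self-contained) and round-trips RFC 7231's example date through email.utils.format_datetime, whose usegmt=True mode writes the same IMF-fixdate layout and requires an aware UTC datetime. The strptime side still assumes the default C (English) LC_TIME locale.

```python
from datetime import datetime
from datetime import timezone as tz
from email.utils import format_datetime

def parse_http_date(date):
    imf1 = '%a, %d %b %Y %H:%M:%S GMT'
    return datetime.strptime(date, imf1).replace(tzinfo=tz.utc)

header = 'Sun, 06 Nov 1994 08:49:37 GMT'
parsed = parse_http_date(header)

# Formatting the parsed value back out reproduces the original header text
print(format_datetime(parsed, usegmt=True))  # Sun, 06 Nov 1994 08:49:37 GMT
```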
