As per RFC 7231 §7.1.1.1:

A recipient that parses a timestamp value in an HTTP header field MUST accept all three HTTP-date formats.

The RFC then describes these formats, of which only the first is preferred (the latter two are designated "obsolete"). Converted for this post into strftime(3) syntax, they are:

  1. %a, %d %b %Y %H:%M:%S %Z
    with the timezone always given as "GMT", but to be interpreted as UTC.
  2. %A, %d-%b-%y %H:%M:%S %Z
    where the timezone may be equal to any of an array of "standard" abbreviations from RFC 850 §2.1.4.
  3. %a %b %e %H:%M:%S %Y
    where the day of the month is space-padded and there is no timezone at all, so you simply assume (as the RFC itself does) that the remote server is operating in UTC.
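To make the three layouts concrete, here is a small sketch that renders the RFC's own example instant in each of them. The variable names are mine, and it assumes the C/English locale (otherwise %a, %A and %b produce localised names). Python's datetime.ctime() is used for the third layout because it yields the C asctime() form with a space-padded day.

```python
from datetime import datetime, timezone

# The example instant used in RFC 7231 itself
stamp = datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)

# 1. Preferred format
imf_fixdate = stamp.strftime('%a, %d %b %Y %H:%M:%S GMT')
# 2. Obsolete RFC 850 format
rfc850_date = stamp.strftime('%A, %d-%b-%y %H:%M:%S GMT')
# 3. Obsolete asctime() format (day of month is space-padded)
asctime_date = stamp.ctime()

print(imf_fixdate)
print(rfc850_date)
print(asctime_date)
```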

However, Python's strptime function does not really support timezones: %Z eats a timezone name (and only recognises UTC, GMT and the local machine's zone names), but the datetime it returns is still naive. Therefore, we will have to hack this support in ourselves. The pytz module is indispensable for interpreting the abbreviations, so we will use it. (We additionally have to crack open re because, depressingly, strptime does not even make available to us that which it matched as %Z.)
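A quick way to see the problem for yourself, as a minimal sketch assuming the C/English locale: %Z happily consumes "GMT", yet the result has no tzinfo, so the zone name has to be captured separately. The regular expression here is only an illustration of that capture, not a full HTTP-date grammar.

```python
from datetime import datetime
import re

header = 'Sunday, 06-Nov-94 08:49:37 GMT'

# %Z consumes "GMT", but the parsed datetime carries no timezone
naive = datetime.strptime(header, '%A, %d-%b-%y %H:%M:%S %Z')
print(naive.tzinfo)

# Capture the trailing zone name ourselves, since strptime will not hand it over
stamp_text, tzname = re.fullmatch(r'(.+) (\S+)', header).groups()
print(stamp_text)
print(tzname)
```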

So, a Python function to parse an HTTP Date header into a datetime object would be something like:

from datetime import datetime
from pytz import timezone, utc
import re
def parse_http_date(date):
	try:
		imf1 = '%a, %d %b %Y %H:%M:%S GMT'
		return datetime.strptime(date, imf1).replace(tzinfo=utc)
	except ValueError:
		try:
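			# strptime's %Z only recognises UTC, GMT and the local zone names, so split the zone off with re
			rfc850 = '%A, %d-%b-%y %H:%M:%S'
			stamp, tzname = re.fullmatch(r'(\w+, \d+-\w+-\d+ \d+:\d+:\d+) (.+)', date).groups()
			if tzname == 'GMT': tzname = 'UTC'
			# pytz zones must be attached with localize(), not replace(tzinfo=...)
			return timezone(tzname).localize(datetime.strptime(stamp, rfc850))
		except (ValueError, TypeError, AttributeError, KeyError):
			# AttributeError: the regex did not match; KeyError: pytz does not know the zone name
			pass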
		try:
			# Python's strptime rejects %-d; plain %d also accepts a space-padded day
			asctime = '%a %b %d %H:%M:%S %Y'
			return datetime.strptime(date, asctime).replace(tzinfo=utc)
		except ValueError:
			pass
		# Neither of the "obsolete" formats worked, so re-raise original strptime error from preferred format
		raise

If you don't care about parsing obsolete formats, this can be reduced to:

from datetime import datetime
from datetime import timezone as tz
def parse_http_date(date):
	imf1 = '%a, %d %b %Y %H:%M:%S GMT'
	return datetime.strptime(date, imf1).replace(tzinfo=tz.utc)

…which only uses the standard library!
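For illustration, here is how the reduced version behaves on two of the RFC's example strings. This is just a usage sketch (again assuming the C/English locale): the preferred format parses to an aware UTC datetime, while an obsolete format is rejected with ValueError.

```python
from datetime import datetime
from datetime import timezone as tz

def parse_http_date(date):
    imf1 = '%a, %d %b %Y %H:%M:%S GMT'
    return datetime.strptime(date, imf1).replace(tzinfo=tz.utc)

# Preferred format: parsed into a timezone-aware datetime
parsed = parse_http_date('Sun, 06 Nov 1994 08:49:37 GMT')
print(parsed.isoformat())

# Obsolete RFC 850 format: not handled by this reduced version
try:
    parse_http_date('Sunday, 06-Nov-94 08:49:37 GMT')
    rejected = False
except ValueError as err:
    rejected = True
    print('rejected:', err)
```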
