Log in

When "when" is too hard a question: SQLAlchemy, Python datetime, and ISO8601

A new programmer asked on a work chat room how timezones are handled in databases. He asked if it was a good idea to store things in UTC. The senior programmers all laughed as we told some of our horror stories with timezones. Yes, UTC is great; if only it were that simple.
About a week later I was designing the schema for a blue sky project I'm implementing. I had to confront time in all its Pythonic horror.
Let's start with the datetime.datetime class. Datetime objects optionally include a timezone. If no timezone is present, several methods such as timestamp treat the object as a local time in the system's timezone. The timezone method returns a POSIX timestamp, which is always expressed in UTC, so knowing the input timezone is important. The now method constructs such an object from the current time.
However other methods act differently. The utcnow method constructs a datetime object that has the UTC time, but is not marked with a timezone. So, for example datetime.fromtimestamp(datetime.utcnow().timestamp()) produces the wrong result unless your system timezone happens to have the same offset as UTC.
It's also possible to construct a datetime object that includes a UTC time and is marked as having a UTC time. The utcnow method never does this, but you can pass the UTC timezone into the now method and get that effect. As you'd expect, the timestamp method returns the correct result on such a datetime.
Now enter SQLAlchemy, one of the more popular Python ORMs. Its DATETIME type has an argument that tries to request a column capable of storing a a timezone from the underlying database. You aren't guaranteed to get this though; some databases don't provide that functionality. With PostgreSQL, I do get such a column, although something in SQLAlchemy is not preserving the timezones (although it is correctly adjusting the time). That is, I'll store a UTC time in an object, flush it to my session, and then read back the same time represented in my local timezone (marked as my local timezone). You'd think this would be safe.
Enter SQLite. SQLite makes life hard for people wanting to store time; it seems to want to store things as strings. That's fairly incompatible with storing a timezone and doing any sort of comparisons on dates. SQLAlchemy does not try to store a timezone in SQLite. It just trims any timezone information from the datetime. So, if I do something like
d = datetime.now(timezone.utc)
obj.date_col = d
assert obj.date_col == d # fails
assert obj.date_col.timestamp() == d.timestamp() # fails
assert d == obj.date_col.replace(tzinfo = timezone.utc) # finally succeeds

There are some unfortunate consequences of this. If you mark your datetimes with timezone information (even if it is always the same timezone), whether two datetimes representing the same datetime compare equal depends on whether objects have been flushed to the session yet. If you don't mark your objects with timezones, then you may not store timezone information on other databases.
At least if you use only the methods we've discussed so far, you're reasonably safe if you use local time everywhere in your application and don't mark your datetimes with timezones. That's undesirable because as our new programmer correctly surmised, you really should be using UTC. This is particularly true if users of your database might span multiple timezones.
You can use UTC time and not mark your objects as UTC. This will give the wrong data with a database that actually does support timezones, but will sort of work with SQLite. You need to be careful never to convert your datetime objects into POSIX time as you'll get the wrong result.
It turns out that my life was even more complicated because parts of my project serialize data into JSON. For that serialization, I've chosen ISO 8601. You've probably seen that format: '2017-04-09T18:17:27.340410+00:00. Datetime provides the convenient isoformat method to print timestamps in the ISO 8601 format. If the datetime has a timezone indication, it is included in the ISO formatted string. If not, then no timezone indication is included. You know how I mentioned that datetime takes a string without a timezone marker as local time? Yeah, well, that's not what 8601 does: UTC all the way, baby! And at least the parser in the iso8601 module will always include timezone markers. So, if you use datetime to print a timestamp without a timezone marker and then read that back in to construct a new datetime on the deserialization side, then you'll get the wrong time. OK, so mark things with timezones then. Well, if you use local time, then the time you get depends on whether you print the ISO string before or after session flush (before or after SQLAlchemy trims the timezone information as it goes to SQLite).
It turns out that I had the additional complication of one side of my application using SQLite and one side using PostgreSQL. Remember how I mentioned that something between SQLAlchemy and PostgreSQL was recasting my times in local timezone (although keeping the time the same)? Well, consider how that's going to work. I serialize with the timezone marker on the PostgreSQL side. I get a ISO8601 localtime marked with the correct timezone marker. I deserialize on the SQLite side. Before session flush, I get a local time marked as localtime. After session flush, I get a local time with no marking. That's bad. If I further serialize on the SQLite side, I'll get that local time incorrectly marked as UTC. Moreover, all the times being locally generated on the SQLite side are UTC, and as we explored, SQLite really only wants one timezone in play.
I eventually came up with the following approach:

  1. If I find myself manipulating a time without a timezone marking, assert that its timezone is UTC not localtime.

  2. Always use UTC for times coming into the system.

  3. If I'm generating an ISO 8601 time from a datetime that has a timezone marker in a timezone other than UTC, represent that time as a UTC-marked datetime adjusting the time for the change in timezone.

This is way too complicated. I think that both datetime and SQLAlchemy's SQLite time handling have a lot to answer for. I think SQLAlchemy's core time handling may also have some to answer for, but I'm less sure of that.



timezone flag

storing datetimes in the DB with timezones is generally considered to be an antipattern, and you should always store things in UTC. The conversion between UTC and a local timezone is only when you first bring data in, and later when you display it.

that said, the timezone=True flag can be implemented for SQLite but per https://bitbucket.org/zzzeek/sqlalchemy/issues/1985/enable-timezone-in-sqlite-datetime-type there's been no interest in this, no pull requests or anything. I will gladly accept a well-written PR to provide support for the timezone flag on SQLite.

Re: timezone flag

I'd be all for only storing times in UTC. My issue is that I want the timestamp and isoformat methods to produce correct results. I actually think a better solution than implementing the timezone flag on sqlite would be to take an argument indicating what timezone things are stored in, and to have the bind param processor store in that timezone if the datetime has a timezone object and the row result processor to associate that timezone with the datetime object on the way in.


PyYAML too

This week I discovered that PyYAML converts incoming ISO8601 timestamps to datetime in UTC (but with no timezone attached). (I was able to override this, thankfully.)

April 2017

Powered by LiveJournal.com