About a week later I was designing the schema for a blue sky project I'm implementing. I had to confront time in all its Pythonic horror.
Let's start with the
datetime.datetimeclass. Datetime objects optionally include a timezone. If no timezone is present, several methods such as
timestamptreat the object as a local time in the system's timezone. The timezone method returns a POSIX timestamp, which is always expressed in UTC, so knowing the input timezone is important. The
nowmethod constructs such an object from the current time.
However other methods act differently. The
utcnowmethod constructs a datetime object that has the UTC time, but is not marked with a timezone. So, for example
datetime.fromtimestamp(datetime.utcnow()produces the wrong result unless your system timezone happens to have the same offset as UTC.
It's also possible to construct a datetime object that includes a UTC time and is marked as having a UTC time. The
utcnowmethod never does this, but you can pass the UTC timezone into the
nowmethod and get that effect. As you'd expect, the
timestampmethod returns the correct result on such a datetime.
Now enter SQLAlchemy, one of the more popular Python ORMs. Its
DATETIMEtype has an argument that tries to request a column capable of storing a a timezone from the underlying database. You aren't guaranteed to get this though; some databases don't provide that functionality. With PostgreSQL, I do get such a column, although something in SQLAlchemy is not preserving the timezones (although it is correctly adjusting the time). That is, I'll store a UTC time in an object, flush it to my session, and then read back the same time represented in my local timezone (marked as my local timezone). You'd think this would be safe.
Enter SQLite. SQLite makes life hard for people wanting to store time; it seems to want to store things as strings. That's fairly incompatible with storing a timezone and doing any sort of comparisons on dates. SQLAlchemy does not try to store a timezone in SQLite. It just trims any timezone information from the datetime. So, if I do something like
d = datetime.now(timezone.utc) obj.date_col = d session.add(obj) session.flush() assert obj.date_col == d # fails assert obj.date_col.timestamp() == d.timestamp() # fails assert d == obj.date_col.replace(tzinfo = timezone.utc) # finally succeeds
There are some unfortunate consequences of this. If you mark your datetimes with timezone information (even if it is always the same timezone), whether two datetimes representing the same datetime compare equal depends on whether objects have been flushed to the session yet. If you don't mark your objects with timezones, then you may not store timezone information on other databases.
At least if you use only the methods we've discussed so far, you're reasonably safe if you use local time everywhere in your application and don't mark your datetimes with timezones. That's undesirable because as our new programmer correctly surmised, you really should be using UTC. This is particularly true if users of your database might span multiple timezones.
You can use UTC time and not mark your objects as UTC. This will give the wrong data with a database that actually does support timezones, but will sort of work with SQLite. You need to be careful never to convert your datetime objects into POSIX time as you'll get the wrong result.
It turns out that my life was even more complicated because parts of my project serialize data into JSON. For that serialization, I've chosen ISO 8601. You've probably seen that format: '2017-04-09T18:17:27.340410+00:00. Datetime provides the convenient
isoformatmethod to print timestamps in the ISO 8601 format. If the datetime has a timezone indication, it is included in the ISO formatted string. If not, then no timezone indication is included. You know how I mentioned that datetime takes a string without a timezone marker as local time? Yeah, well, that's not what 8601 does: UTC all the way, baby! And at least the parser in the
iso8601module will always include timezone markers. So, if you use datetime to print a timestamp without a timezone marker and then read that back in to construct a new datetime on the deserialization side, then you'll get the wrong time. OK, so mark things with timezones then. Well, if you use local time, then the time you get depends on whether you print the ISO string before or after session flush (before or after SQLAlchemy trims the timezone information as it goes to SQLite).
It turns out that I had the additional complication of one side of my application using SQLite and one side using PostgreSQL. Remember how I mentioned that something between SQLAlchemy and PostgreSQL was recasting my times in local timezone (although keeping the time the same)? Well, consider how that's going to work. I serialize with the timezone marker on the PostgreSQL side. I get a ISO8601 localtime marked with the correct timezone marker. I deserialize on the SQLite side. Before session flush, I get a local time marked as localtime. After session flush, I get a local time with no marking. That's bad. If I further serialize on the SQLite side, I'll get that local time incorrectly marked as UTC. Moreover, all the times being locally generated on the SQLite side are UTC, and as we explored, SQLite really only wants one timezone in play.
I eventually came up with the following approach:
- If I find myself manipulating a time without a timezone marking, assert that its timezone is UTC not localtime.
- Always use UTC for times coming into the system.
- If I'm generating an ISO 8601 time from a datetime that has a timezone marker in a timezone other than UTC, represent that time as a UTC-marked datetime adjusting the time for the change in timezone.
This is way too complicated. I think that both datetime and SQLAlchemy's SQLite time handling have a lot to answer for. I think SQLAlchemy's core time handling may also have some to answer for, but I'm less sure of that.