Every programmer will eventually find themselves writing a feature that requires handling dates and times. And if you need to deal in any way with things like multiple time zones, daylight saving time offsets, etc., you quickly realize that it’s substantially trickier than it might initially seem. For many applications, you can get by with being less than totally rigorous (how many social media users will care if the timestamp on their vacation photos is off by a couple hours?). But at Redfin, we need to deal with dates and times of things like open houses and tour appointments, deadlines for documents related to real estate transactions, etc. So complete rigor and precision are pretty important (the last thing you want is an agent waiting around at an open house where no one shows up because we’re displaying a time that’s off by a couple hours). And we’ve found that one notable potential pitfall is the problematic (but somewhat common) practice of representing dates as timestamps:
Most programmers will be familiar with the notion of a “date-time” (also often referred to as a “timestamp”): a representation, in some syntax or another, of a specific instant in time. Some common representations are:
- A string with date and time, plus time zone, such as “2016-06-20 17:30:45 PDT”
- A number of seconds or milliseconds since the “Unix epoch” (Jan. 1, 1970 UTC)
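These two representations are interchangeable descriptions of the same instant. As a quick sketch (in Python, using the standard zoneinfo module available since Python 3.9), converting the example string above into its epoch-milliseconds form:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# "2016-06-20 17:30:45 PDT" expressed as a time-zone-aware datetime
instant = datetime(2016, 6, 20, 17, 30, 45,
                   tzinfo=ZoneInfo("America/Los_Angeles"))

# The same instant as milliseconds since the Unix epoch
epoch_millis = int(instant.timestamp() * 1000)
print(epoch_millis)  # 1466469045000
```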
And most programming languages have a data type, either built-in or from a well-known third-party library, for representing this concept: Boost for C++ has the local_date_time class, Joda-Time for Java has the DateTime class, Python’s datetime module has the datetime class, SQL has the timestamp type, and so on.
If I parse “2015-04-15 00:00:00” into, say, the corresponding Python datetime object, how is code downstream going to know that this object is supposed to represent the date “April 15, 2015” (*just* the date, irrespective of time), and not the specific instant of midnight on that day? I’m relying on every developer who writes code consuming that object to “just know” (somehow) that it represents the date generically and not the specific instant. This might seem like a trivial nit to pick, but in fact this exact issue can cause quite significant bugs, as the following real-world example demonstrates (yes, this really happened):
- I have a value that’s “just a date” of Oct. 10, 2015, and store it in the database as a SQL timestamp of “2015-10-10 00:00:00”
- A select in the application server returns this as a Java object representing “2015-10-10 00:00:00 PDT” (PDT, i.e. Pacific Daylight Time, because the servers and JVMs are all set to the Pacific time zone).
- I have a serialization layer that serializes all such date-time objects by deriving the Unix timestamp (milliseconds since the Unix epoch), so my object is returned to a browser Ajax call as the integer 1444460400000 (i.e. the milliseconds-from-epoch value corresponding to “2015-10-10 00:00:00 PDT”)
- If the browser’s time zone is Pacific (same as my server time zone) then the resulting object will correctly display as “2015-10-10 00:00:00”. My UI is only interested in the date part of this, and will show “2015-10-10” as desired.
- If the browser time zone is *east* of Pacific time, say Eastern time, the resulting object will display as “2015-10-10 03:00:00”, but that’s still OK since the UI logic is only showing the date part.
- If the browser time zone is *west* of Pacific time, say Hawaii, the resulting object will display as “2015-10-09 21:00:00” — note how the date part got shifted *back* to the *previous* day. Oops! Application logic showing the date portion will show the wrong day!
If this date were some critical deadline (e.g. “submissions must be postmarked no later than Oct. 10, 2015”), this would be a severe bug indeed!
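The whole round trip above can be reproduced in a few lines of Python (a sketch using the standard zoneinfo module; the real system was Java, but the arithmetic is identical):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The "just a date" comes back from the database as midnight
# in the server's Pacific time zone
pacific = ZoneInfo("America/Los_Angeles")
stored = datetime(2015, 10, 10, 0, 0, 0, tzinfo=pacific)

# The serialization layer ships it as milliseconds since the epoch
millis = int(stored.timestamp() * 1000)
print(millis)  # 1444460400000

# Each browser reconstructs a local datetime from that integer
for tz_name in ("America/Los_Angeles", "America/New_York", "Pacific/Honolulu"):
    local = datetime.fromtimestamp(millis / 1000, tz=ZoneInfo(tz_name))
    print(tz_name, local.date(), local.time())
# America/Los_Angeles 2015-10-10 00:00:00
# America/New_York 2015-10-10 03:00:00
# Pacific/Honolulu 2015-10-09 21:00:00  <-- date shifted to the previous day
```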
Why did this happen? Because I’m using the wrong tool for the job. Downstream code receives my date-time object (“2015-10-10 00:00:00 PDT”) and takes it at face value, treating it as a specific instant in time (exactly midnight on Oct. 10 PDT) whose component fields (day, hour, etc.) are subject to shifts depending on the time zone of the client consuming it.
A better choice, when my data is intrinsically “just a date” with no time portion meaningfully attributable to it, is to use a data type that correctly models that concept: for example, the new java.time package available in Java 8 has a LocalDate class that internally is just three integers (year, month, day), Joda-Time has a similar LocalDate class, Python’s datetime module has the date class, etc. For relational databases, the SQL standard defines the DATE datatype for exactly this purpose. For languages that have no such “date only” class available, it is relatively trivial to implement your own (three integers: year, month, day).
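In Python, for instance, the date-only modeling looks like this; the SimpleDate class below is just an illustrative sketch of the trivial hand-rolled version for languages without a built-in date-only type:

```python
from dataclasses import dataclass
from datetime import date

# The built-in date type: just year/month/day, no instant and no
# time zone, so there is nothing for a client's time zone to shift
deadline = date(2015, 10, 10)
print(deadline.isoformat())  # 2015-10-10

# A minimal hand-rolled equivalent (three integers) -- illustrative only
@dataclass(frozen=True)
class SimpleDate:
    year: int
    month: int
    day: int

    def __str__(self) -> str:
        return f"{self.year:04d}-{self.month:02d}-{self.day:02d}"

print(SimpleDate(2015, 10, 10))  # 2015-10-10
```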