N3272
Timezones and the strftime function

Issue,

Author:
Project:
ISO/IEC 9899 Programming Languages — C, ISO/IEC JTC1/SC22/WG14
Proposal Category:
Liaison Question
Target:
C2y/C3a

Abstract

Can strftime access unspecified fields in the broken-down time?

1. Changelog

1.1. Revision 1 - May 30th, 2024

1.2. Revision 2 - Jun 6th, 2024

2. Introduction

The strftime function lists the tm_isdst member of a broken-down struct tm that is used (along with the LC_TIME category of the current locale) when calculating the output for %z and %Z conversions. However, additional members may appear in struct tm, and it is not clear whether these members can also be used. This matters because many timezones cannot be represented by the tm_isdst field alone.

In particular, in POSIX-based systems %z and %Z behavior also depends on timezone information outside of tm_isdst and LC_CTIME. This timezone information is derived from the contents of the TZ environment variable specified by POSIX, and TZ affects %z and %Z indirectly via one of two mechanisms.

In some POSIX-based systems (e.g., Solaris) %z and %Z use only global state; in others (e.g., GNU/Linux) %z and %Z use only tm_gmtoff and tm_zone; and in still others (e.g., macOS) %z and %Z use a mixture of the two.

One interpretation of the C standard (call it A) is that the characters generated for %z and %Z are completely determined by tm_isdst and LC_TIME. Because interpretation (A) prohibits any dependency on the TZ environment variable, it prohibits all POSIX-based systems, which is so implausible that it will not be discussed further here.

Another interpretation (call it B) is that %z and %Z’s behavior can also depend on information outside the scope of the C standard. Interpretation (B) would allow all POSIX-based systems.

A third interpretation (call it C) is like (B), except that it prohibits %z and %Z’s behavior from depending on struct tm members like tm_gmtoff and tm_zone that are outside the scope of the C standard. Interpretation (C) would allow systems like Solaris, but would prohibit systems like GNU/Linux and macOS.

The following program illustrates differences between (B) and (C).

// Program 1
#include <stdio.h>
#include <time.h>

int
main ()
{
    time_t epoch = {0};
    char gbuf[100], lbuf[100];
    struct tm gtm = *   gmtime (&epoch);
    struct tm ltm = *localtime (&epoch);
    strftime (gbuf, sizeof gbuf, "%%z=%z %%Z=%Z", &gtm);
    strftime (lbuf, sizeof lbuf, "%%z=%z %%Z=%Z", &ltm);
    printf ("   gmtime %s tm_isdst=%d\n", gbuf, gtm.tm_isdst);
    printf ("localtime %s tm_isdst=%d\n", lbuf, ltm.tm_isdst);
}

Suppose Program 1 is executed on a POSIX.1-2024 system with TZ="Europe/London" in the environment. At the epoch (1970-01-01 00:00:00 UTC), London observed British Standard Time, one hour ahead of UTC. Interpretation (B) would allow the following output, indicating that the epoch’s time zone is +0000 (UTC) when interpreted via gmtime, and is +0100 (BST) when interpreted via localtime:

   gmtime %z=+0000 %Z=UTC tm_isdst=0
localtime %z=+0100 %Z=BST tm_isdst=0

However, interpretation (C) would prohibit this output because (C) requires %z and %Z to generate the same characters in both strftime calls, as they both have the same tm_isdst values. Instead, (C) would require output like this:

   gmtime %z=+0000 %Z=GMT tm_isdst=0
localtime %z=+0000 %Z=GMT tm_isdst=0

or this:

   gmtime %z=+0100 %Z=BST tm_isdst=0
localtime %z=+0100 %Z=BST tm_isdst=0

or this:

   gmtime %z=-0001 %Z=LMT tm_isdst=0
localtime %z=-0001 %Z=LMT tm_isdst=0

depending on which standard time the POSIX implementation happens to choose, as London has observed each of these three forms of standard time at some point.

When localtime or gmtime is used, interpretation (B) is obviously better. However, when neither localtime nor gmtime is involved, interpretation (C) can be better. For example:

// Program 2
#include <stdio.h>
#include <time.h>

int
main ()
{
    char buf[100];
    struct tm tm;
    tm.tm_isdst = 0;
    strftime (buf, sizeof buf, "%%z=%z %%Z=%Z", &tm);
    puts (buf);
}

Suppose Program 2 is executed on a POSIX.1-2024 system with TZ="Europe/London" in the environment. Interpretation (B) says strftime can access the uninitialized tm_gmtoff and tm_zone members, leading to undefined behavior; systems like GNU/Linux and macOS behave this way and have undefined behavior up to and including core dumps. Interpretation (C) says the output is “%z=+0000 %Z=GMT”, “%z=+0100 %Z=BST”, or “%z=-0001 %Z=LMT” depending on which of London’s three standard times the implementation chooses, so there is no undefined behavior; systems like Solaris behave this way. An application developer might prefer interpretation (C) for Program 2 even though the output is indeterminate, as it at least avoids undefined behavior.

In practice, code like Program 1 is far more common than uses like Program 2. This has been confirmed by an informal survey of public code on GitHub, and this suggests that if no interpretation can be compatible with both Program 1 and Program 2, compatibility with Program 1 should be preferred.

There is one additional related issue. The C standard’s struct tm members do not suffice to specify times unambiguously, even when tm_isdst is specified. For example:

// Program 3
#include <stdio.h>
#include <time.h>

int
main ()
{
    char buf[100];
    struct tm tm;
    tm.tm_year = 2007 - 1900;
    tm.tm_mon = 11;
    tm.tm_mday = 9;
    tm.tm_hour = 2;
    tm.tm_min = 45;
    tm.tm_sec = 0;
    tm.tm_isdst = 0;
#ifdef POSIX
    tm.tm_gmtoff = 4 * 60 * 60;
#endif
    strftime (buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z", &tm);
    puts (buf);
}

On a platform taking interpretation (C) when TZ="America/Caracas" Program 3 can output either “2007-12-09 02:45:00 -0430” or “2007-12-09 02:45:00 -0400” because both timestamps are equally plausible: Venezuela standard time was adjusted at 03:00 that day by moving the clocks backwards 30 minutes, and there is no way for the application to specify which of the two timestamps is desired. A POSIX.1-2014 implementation taking interpretation (B) has no problem with ambiguity, as it can inspect tm_gmtoff.

So, the question is: is POSIX extending strftime in an allowed way? This paper presents three options:

  1. Explicitly permit additional members to be used in %z and %Z conversions.

  2. Extend the broken-down time structure to handle %z and %Z conversions, and extend localtime and gmtime accordingly, where localtime returns a null pointer if no time zone is determinable.

  3. Like Option 2, except localtime instead sets tm_zone to an empty string if no time zone is determinable.

Option 1 is the most conservative: it merely clarifies and/or changes the C standard to allow common behavior on POSIX-based systems.

Options 2 and 3 would move POSIX’s tm_gmtoff and tm_zone into the C standard. The distinction between Options 2 and 3 does not matter for POSIX-based systems where the time zone is always determinable; it matters only for systems that cannot determine the time zone, where Option 2 would likely cause misbehavior in applications that expect localtime to succeed on everyday timestamps, while Option 3 would let these applications continue to behave as before.

3. Proposed wording

For both options, the wording is relative to the published ISO/IEC 9899:2023 standard.

3.1. Option 1

Make the following changes to §7.29.3.5 The strftime function

%z is replaced by the offset from UTC in the ISO 8601 format “-0430” (meaning 4 hours 30 minutes behind UTC, west of Greenwich), or by no characters if no time zone is determinable. [tm_isdst] Behavior is undefined if the broken-down time structure does not have a value that could be returned by localtime or gmtime. [all members, including any non-standard additional members]

%Z is replaced by the locale’s time zone name or abbreviation, or by no characters if no time zone is determinable. [tm_isdst] Behavior is undefined if the broken-down time structure does not have a value that could be returned by localtime or gmtime. [all members, including any non-standard additional members]

3.2. Option 2

§7.29.1 Components of time, para 6

The tm structure shall contain at least the following members, in any order.389) The semantics of the members and their normal ranges are expressed in the comments.
int tm_sec; // seconds after the minute -- [0, 60]
int tm_min; // minutes after the hour -- [0, 59]
int tm_hour; // hours since midnight -- [0, 23]
int tm_mday; // day of the month -- [1, 31]
int tm_mon; // months since January -- [0, 11]
int tm_year; // years since 1900
int tm_wday; // days since Sunday -- [0, 6]
int tm_yday; // days since January 1 -- [0, 365]
int tm_isdst; // Daylight Saving Time flag
long tm_gmtoff; // Seconds east of UTC.
const char *tm_zone; // Timezone abbreviation.

§7.29.3.3 The gmtime functions

The gmtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as UTC. They set the broken-down time’s tm_zone member to a pointer to a string "UTC" with static storage duration.

§7.29.3.4 The localtime functions

Description

The localtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as local time. They set the broken-down time’s tm_zone member to a pointer to a string with lifetime that extends to the end of the program. (Footnote: Implementations may shorten the lifetime of a tm_zone string when a program uses extensions to the C standard, for example, by setting the TZ environment variable. )

Returns

The localtime functions return a pointer to the broken-down time, or a null pointer if the specified time cannot be converted to local time or if no time zone is determinable .

§7.29.3.5 The strftime function

%z is replaced by the offset from UTC in the ISO 8601 format "-0430" (meaning 4 hours 30 minutes behind UTC, west of Greenwich), or by no characters if no time zone is determinable. [ tm_isdst tm_gmtoff ]

%Z is replaced by the locale’s time zone name or abbreviation, or by no characters if no time zone is determinable. [ tm_isdst tm_zone ]

3.3. Option 3

This is the same as Option 2, except for “§7.29.3.4 The localtime functions” where the changes are as follows instead, with no change needed to the Returns paragraph:

Description

The localtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as local time. They set the broken-down time’s tm_zone member to a pointer to a string with lifetime that extends to the end of the program. (Footnote: Implementations may shorten the lifetime of a tm_zone string when a program uses extensions to the C standard, for example, by setting the TZ environment variable. ) If no time zone is determinable, these functions set the tm_zone member to an empty string and the tm_gmtoff member to 0.

4. Acknowledgements

Thanks to Paul Eggert and Geoff Clare for helping to formulate the wording.