1. Changelog
1.1. Revision 1 - May 30th, 2024
1.2. Revision 2 - Jun 6th, 2024
- Initial version
2. Introduction
The strftime function lists the tm_isdst member of a broken-down time that is used (along with the LC_TIME category of the current locale) when calculating the output for %z and %Z conversions.
However, additional members may appear in struct tm, and it is not clear whether these members can also be used.
This matters because many timezones cannot be represented by the tm_isdst field alone.
In particular, in POSIX-based systems %z and %Z behavior also depends on timezone information outside of tm_isdst and LC_TIME.
This timezone information is derived from the contents of the TZ environment variable specified by POSIX, and affects %z and %Z indirectly via one of two mechanisms.
- In POSIX.1-2017 and earlier, timezone information is in global state updated by POSIX’s tzset function, which strftime calls.
- In POSIX.1-2024, timezone information is also in the struct tm members tm_gmtoff and tm_zone, which are extensions to the C standard.
In some POSIX-based systems (e.g., Solaris) %z and %Z use only global state; in others (e.g., GNU/Linux) %z and %Z use only tm_gmtoff and tm_zone; and in still others (e.g., macOS) %z and %Z use a mixture of the two.
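As an illustration of the second mechanism, the following sketch shows how the characters for %z could be derived from an offset such as the one held in tm_gmtoff. The helper name format_utc_offset is hypothetical and is not part of C or POSIX; the sketch assumes the offset is given in seconds east of UTC and simply drops any sub-minute part.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a C or POSIX function): render an offset,
   given in seconds east of UTC, in the "+hhmm" / "-hhmm" form used by %z.
   Any sub-minute part of the offset is dropped.
   Returns the value returned by snprintf. */
static int format_utc_offset(long gmtoff, char *buf, size_t size)
{
    char sign = gmtoff < 0 ? '-' : '+';
    long minutes = labs(gmtoff) / 60;   /* whole minutes of offset */

    return snprintf(buf, size, "%c%02ld%02ld",
                    sign, minutes / 60, minutes % 60);
}
```

For example, an offset of 3600 yields “+0100” and an offset of -16200 yields “-0430”.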
One interpretation of the C standard (call it A) is that the characters generated for %z and %Z are completely determined by tm_isdst and LC_TIME.
Because interpretation (A) prohibits any dependency on the TZ environment variable, it prohibits all POSIX-based systems, which is so implausible that it will not be discussed further here.
Another interpretation (call it B) is that %z and %Z’s behavior can also depend on information outside the scope of the C standard.
Interpretation (B) would allow all POSIX-based systems.
A third interpretation (call it C) is like (B), except that it prohibits %z and %Z’s behavior from depending on struct tm members like tm_gmtoff and tm_zone that are outside the scope of the C standard.
Interpretation (C) would allow systems like Solaris, but would prohibit systems like GNU/Linux and macOS.
The following program illustrates differences between (B) and (C).
    // Program 1
    #include <stdio.h>
    #include <time.h>

    int main()
    {
        time_t epoch = { 0 };
        char gbuf[100], lbuf[100];
        // Copy the results, since gmtime and localtime may share storage.
        struct tm gtm = *gmtime(&epoch);
        struct tm ltm = *localtime(&epoch);
        strftime(gbuf, sizeof gbuf, "%%z=%z %%Z=%Z", &gtm);
        strftime(lbuf, sizeof lbuf, "%%z=%z %%Z=%Z", &ltm);
        printf("   gmtime %s tm_isdst=%d\n", gbuf, gtm.tm_isdst);
        printf("localtime %s tm_isdst=%d\n", lbuf, ltm.tm_isdst);
    }
Suppose Program 1 is executed on a POSIX.1-2024 system with TZ set to London’s time zone (for example, TZ="Europe/London") in the environment.
At the epoch (1970-01-01 00:00:00 UTC), London observed British Standard Time, one hour ahead of UTC.
Interpretation (B) would allow the following output, indicating that the epoch’s time zone is +0000 (UTC) when interpreted via gmtime, and is +0100 (BST) when interpreted via localtime:
       gmtime %z=+0000 %Z=UTC tm_isdst=0
    localtime %z=+0100 %Z=BST tm_isdst=0
However, interpretation (C) would prohibit this output because (C) requires %z and %Z to generate the same characters in both strftime calls, as they both have the same tm_isdst values.
Instead, (C) would require output like this:
       gmtime %z=+0000 %Z=GMT tm_isdst=0
    localtime %z=+0000 %Z=GMT tm_isdst=0
or this:
       gmtime %z=+0100 %Z=BST tm_isdst=0
    localtime %z=+0100 %Z=BST tm_isdst=0
or this:
       gmtime %z=-0001 %Z=LMT tm_isdst=0
    localtime %z=-0001 %Z=LMT tm_isdst=0
depending on which standard time the POSIX implementation happens to choose, as London has observed each of these three forms of standard time at some point.
When gmtime or localtime is used, interpretation (B) is obviously better.
However, when neither gmtime nor localtime is involved, interpretation (C) can be better.
For example:
    // Program 2
    #include <stdio.h>
    #include <time.h>

    int main()
    {
        char buf[100];
        struct tm tm;   // deliberately uninitialized, apart from tm_isdst
        tm.tm_isdst = 0;
        strftime(buf, sizeof buf, "%%z=%z %%Z=%Z", &tm);
        puts(buf);
    }
Suppose Program 2 is executed on a POSIX.1-2024 system with TZ set to London’s time zone (for example, TZ="Europe/London") in the environment.
Interpretation (B) says strftime can access the uninitialized tm_gmtoff and tm_zone members, leading to undefined behavior; systems like GNU/Linux and macOS behave this way and have undefined behavior up to and including core dumps.
Interpretation (C) says the output is “%z=+0000 %Z=GMT”, “%z=+0100 %Z=BST”, or “%z=-0001 %Z=LMT” depending on which of London’s three standard times the implementation chooses, so there is no undefined behavior; systems like Solaris behave this way.
An application developer might prefer interpretation (C) for Program 2 even though the output is indeterminate, as it at least avoids undefined behavior.
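Applications that want to avoid the indeterminate members in code like Program 2 can start from a structure returned by localtime, so that every member, including any extension members, has a value. The following is a minimal sketch of that pattern; the helper name format_zone_at is hypothetical, and the output remains implementation-dependent.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format the time zone in effect at calendar time t.
   Returns the number of characters written, or 0 on failure. */
static size_t format_zone_at(time_t t, char *buf, size_t size)
{
    const struct tm *lt = localtime(&t);
    if (lt == NULL)
        return 0;

    /* Structure assignment copies every member, including any
       non-standard additional members the implementation provides. */
    struct tm tm = *lt;

    return strftime(buf, size, "%%z=%z %%Z=%Z", &tm);
}
```

Because the structure passed to strftime is a copy of a value produced by localtime, no member is read while indeterminate under either interpretation (B) or (C).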
In practice, code like Program 1 is far more common than code like Program 2; an informal survey of public code on GitHub is consistent with this. This suggests that if no interpretation can be compatible with both Program 1 and Program 2, compatibility with Program 1 should be preferred.
There is one additional related issue.
The C standard’s struct tm members do not suffice to specify times unambiguously, even when tm_isdst is specified.
For example:
    // Program 3
    #include <stdio.h>
    #include <time.h>

    int main()
    {
        char buf[100];
        struct tm tm;
        tm.tm_year = 2007 - 1900;
        tm.tm_mon = 11;             // December
        tm.tm_mday = 9;
        tm.tm_hour = 2;
        tm.tm_min = 45;
        tm.tm_sec = 0;
        tm.tm_isdst = 0;
    #ifdef POSIX
        tm.tm_gmtoff = -4 * 60 * 60;    // seconds east of UTC, so -0400
    #endif
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z", &tm);
        puts(buf);
    }
On a platform taking interpretation (C) when TZ is set to Venezuela’s time zone (for example, TZ="America/Caracas"), Program 3 can output either “2007-12-09 02:45:00 -0400” or “2007-12-09 02:45:00 -0430” because both timestamps are equally plausible: Venezuela standard time was adjusted at 03:00 that day by moving the clocks backwards 30 minutes, and there is no way for the application to specify which of the two timestamps is desired. A POSIX.1-2024 implementation taking interpretation (B) has no problem with ambiguity, as it can inspect tm_gmtoff.
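The ambiguity can be checked with simple arithmetic. The sketch below uses a hypothetical helper, utc_seconds_of_day, which is an illustration only and not part of any standard; it converts a wall-clock reading and a UTC offset (in seconds east of UTC) into seconds since 00:00 UTC on the same day.

```c
/* Hypothetical helper: seconds since 00:00 UTC for the wall-clock time
   hour:min, observed at utc_offset seconds east of UTC. */
static long utc_seconds_of_day(int hour, int min, long utc_offset)
{
    long local_seconds = (hour * 60L + min) * 60L;

    return local_seconds - utc_offset;  /* UTC = local time - offset */
}
```

For the wall-clock time 02:45, an offset of -4 hours gives 24300 (06:45 UTC), while an offset of -4 hours 30 minutes gives 26100 (07:15 UTC). The two candidate timestamps are therefore 30 minutes apart even though the members set by Program 3 without tm_gmtoff are identical.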
So, the question is: is POSIX extending struct tm in an allowed way?
This paper presents three options:
- Explicitly permit additional members to be used in %z and %Z conversions.
- Extend the broken-down time structure to handle %z and %Z conversions, and extend localtime and gmtime accordingly, where localtime returns a null pointer if no time zone is determinable.
- Like Option 2, except localtime instead sets tm_zone to an empty string if no time zone is determinable.
Option 1 is the most conservative: it merely clarifies and/or changes the C standard to allow common behavior on POSIX-based systems.
Options 2 and 3 would move POSIX’s tm_gmtoff and tm_zone into the C standard.
The distinction between Options 2 and 3 does not matter for POSIX-based systems where the time zone is always determinable; it matters only for systems that cannot determine the time zone, where Option 2 would likely cause misbehavior in applications that expect localtime to succeed on everyday timestamps, while Option 3 would let these applications continue to behave as before.
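From an application’s point of view, the difference between Options 2 and 3 is whether a missing time zone surfaces as a null pointer from localtime. The sketch below shows one defensive pattern that would tolerate either outcome; the helper name format_local_or_utc and the fallback to gmtime are assumptions made for illustration, not something this paper proposes.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format t as local time, falling back to UTC when
   localtime reports failure. Returns the number of characters written,
   or 0 on failure. */
static size_t format_local_or_utc(time_t t, char *buf, size_t size)
{
    const struct tm *tm = localtime(&t);

    /* Under Option 2, a null pointer could also mean that no time zone
       is determinable, so fall back to UTC (an illustrative choice). */
    if (tm == NULL)
        tm = gmtime(&t);
    if (tm == NULL)
        return 0;

    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

Applications that dereference the result of localtime without such a check are the ones that Option 2 would be most likely to affect.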
3. Proposed wording
For all three options, the wording is relative to the published ISO/IEC 9899:2023 standard.
3.1. Option 1
Make the following changes to §7.29.3.5 The strftime function.
In each conversion below, the sentence beginning “Behavior is undefined” is added, and the bracketed member list replaces the existing [tm_isdst].
%z is replaced by the offset from UTC in the ISO 8601 format “-0430” (meaning 4 hours 30 minutes behind UTC, west of Greenwich), or by no characters if no time zone is determinable. Behavior is undefined if the broken-down time structure does not have a value that could be returned by localtime or gmtime. [all members, including any non-standard additional members]
%Z is replaced by the locale’s time zone name or abbreviation, or by no characters if no time zone is determinable. Behavior is undefined if the broken-down time structure does not have a value that could be returned by localtime or gmtime. [all members, including any non-standard additional members]
3.2. Option 2
§7.29.1 Components of time, para 6
The tm structure shall contain at least the following members, in any order.389) The semantics of the members and their normal ranges are expressed in the comments.
    int tm_sec;           // seconds after the minute -- [0, 60]
    int tm_min;           // minutes after the hour -- [0, 59]
    int tm_hour;          // hours since midnight -- [0, 23]
    int tm_mday;          // day of the month -- [1, 31]
    int tm_mon;           // months since January -- [0, 11]
    int tm_year;          // years since 1900
    int tm_wday;          // days since Sunday -- [0, 6]
    int tm_yday;          // days since January 1 -- [0, 365]
    int tm_isdst;         // Daylight Saving Time flag
    long tm_gmtoff;       // Seconds east of UTC.
    const char *tm_zone;  // Timezone abbreviation.
§7.29.3.3 The gmtime functions
The gmtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as UTC. They set the broken-down time’s tm_zone member to a pointer to a string "UTC" with static storage duration.
§7.29.3.4 The localtime functions
Description
The localtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as local time. They set the broken-down time’s tm_zone member to a pointer to a string with lifetime that extends to the end of the program. (Footnote: Implementations may shorten the lifetime of a tm_zone string when a program uses extensions to the C standard, for example, by setting the TZ environment variable.)
Returns
The localtime functions return a pointer to the broken-down time, or a null pointer if the specified time cannot be converted to local time or if no time zone is determinable.
§7.29.3.5 The strftime function
In the bracketed member lists below, the member shown replaces the existing tm_isdst.
%z is replaced by the offset from UTC in the ISO 8601 format "-0430" (meaning 4 hours 30 minutes behind UTC, west of Greenwich), or by no characters if no time zone is determinable. [tm_gmtoff]
%Z is replaced by the locale’s time zone name or abbreviation, or by no characters if no time zone is determinable. [tm_zone]
3.3. Option 3
This is the same as Option 2, except for “§7.29.3.4 The localtime functions” where the changes are as follows instead, with no change needed to the Returns paragraph:
Description
The localtime functions convert the calendar time pointed to by timer into a broken-down time, expressed as local time. They set the broken-down time’s tm_zone member to a pointer to a string with lifetime that extends to the end of the program. (Footnote: Implementations may shorten the lifetime of a tm_zone string when a program uses extensions to the C standard, for example, by setting the TZ environment variable.) If no time zone is determinable, these functions set the tm_zone member to an empty string and the tm_gmtoff member to 0.
4. Acknowledgements
Thanks to Paul Eggert and Geoff Clare for helping to formulate the wording.