NB comments by AFNOR for ballot on ISO/IEC 9899:2023

document:	PL1/SC22/WG14 N3072
date:	2022-12-23
editor:	Jens Gustedt

So the text in the introduction should not refer to one

There are mis-classifications of identifiers in the index and in Annex J

This NB comment is not meant as a criticism towards the editors, who have done an incredibly good job, but merely to give them the permission and opportunity to correct such small glitches.

Missing entry in Annex B for `<uchar.h>`

The type char8_t is missing in the listing of symbols added to <uchar.h>.

The magic constants for `__has_embed` should have symbolic names

Please avoid magic constants even during preprocessing and introduce symbolic names to the effect of

#define __STDC_EMBED_NOT_FOUND__ 0
#define __STDC_EMBED_FOUND__     1
#define __STDC_EMBED_EMPTY__     2

For our users the easiest would be to change 6.10.1 p7

The resource (6.10.3.1) identified by the header-name preprocessing token sequence in each contained has_embed expression is searched for as if those preprocessing tokens were the pp-tokens in a #embed directive, except that no further macro expansion is performed. Such a directive shall satisfy the syntactic requirements of a #embed directive. The has_embed expression evaluates to the same value as the following mandatory macros (6.10.9.1):

– ~~0~~ __STDC_EMBED_NOT_FOUND__ if the search fails or if any of the embed parameters in the embed parameter sequence specified are not supported by the implementation for the #embed directive; or,

– ~~1~~ __STDC_EMBED_FOUND__ if the search for the resource succeeds and all embed parameters in the embed parameter sequence specified are supported by the implementation for the #embed directive and the resource is not empty; or,

– ~~2~~ __STDC_EMBED_EMPTY__ if the search for the resource succeeds and all embed parameters in the embed parameter sequence specified are supported by the implementation for the #embed directive and the resource is empty.

Add an item to 6.10.9.1

__STDC_EMBED_NOT_FOUND__, __STDC_EMBED_FOUND__, and __STDC_EMBED_EMPTY__

expand to the values 0, 1 and 2, respectively.

If that is not possible, please consider adding such symbolic names to a C library header, perhaps <stddef.h>.
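As a rough illustration of why named values read better than bare 0, 1 and 2, the following sketch classifies the result of `__has_embed` applied to the current source file. The `SKETCH_` macros and both function names are hypothetical stand-ins, not part of any proposal. The value -1 is an assumption added here for implementations that do not provide `__has_embed` at all.

```c
#include <assert.h>

/* Local stand-ins for the proposed macro names; the values 0, 1 and 2
   come from the text above, the SKETCH_ names are hypothetical. */
#define SKETCH_EMBED_NOT_FOUND   0
#define SKETCH_EMBED_FOUND       1
#define SKETCH_EMBED_EMPTY       2
#define SKETCH_EMBED_UNSUPPORTED (-1) /* assumption: no __has_embed at all */

#ifdef __has_embed
#  if __has_embed(__FILE__) == SKETCH_EMBED_FOUND
#    define SELF_EMBED_STATUS SKETCH_EMBED_FOUND
#  elif __has_embed(__FILE__) == SKETCH_EMBED_EMPTY
#    define SELF_EMBED_STATUS SKETCH_EMBED_EMPTY
#  else
#    define SELF_EMBED_STATUS SKETCH_EMBED_NOT_FOUND
#  endif
#else
#  define SELF_EMBED_STATUS SKETCH_EMBED_UNSUPPORTED
#endif

/* Reports what the preprocessor decided for this translation unit. */
int self_embed_status(void) { return SELF_EMBED_STATUS; }

/* Maps a status value to a printable name. */
const char *embed_status_name(int status) {
    assert(status >= SKETCH_EMBED_UNSUPPORTED && status <= SKETCH_EMBED_EMPTY);
    switch (status) {
    case SKETCH_EMBED_NOT_FOUND: return "not found";
    case SKETCH_EMBED_FOUND:     return "found";
    case SKETCH_EMBED_EMPTY:     return "empty";
    default:                     return "unsupported";
    }
}
```

With the proposed predefined macros, the three `SKETCH_` definitions for 0, 1 and 2 would no longer be needed in user code.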

It seems that `#embed` offers multiple ways to express the same feature

Please re-synchronize with WG21 and the version of #embed that has been adopted there, to avoid too much implementation complexity and ensure forward compatibility. In particular, WG21 expressed a slight preference for a version of embed without is_empty/suffix/prefix (which was presented as an optional feature), but there is also a desire to use the same feature as WG14. Please reconsider the necessity of that optional part of the adopted proposal.

The status of optional identifiers could be clarified

WG14 does not stick to their announced policies concerning addition of identifiers. Proposed-C23 is quite ambiguous in that respect: on the one hand it clarifies the rules by introducing the term “potentially reserved identifier”; on the other hand, in direct violation of the policy announced in ISO/IEC 9899:2018, it newly reserves hundreds of unprefixed identifiers with marginal use in the library clause. Some of these are even short abbreviations or common English words, which have an increased risk of collision with identifiers in the application realm.

This could be made less worrisome if identifiers that are added but are optional would be “potentially reserved” and not “reserved”. For example, this would in particular ease the pain for free-standing environments that do not intend to implement the decimal floating point option.

Alternatively, these new decimal floating point functions could use a prefix such as stdc_ to avoid clashing with existing code.

Optional identifiers that the implementation does not implement

The reading of the new paragraph about “potentially reserved identifiers” 6.4.2.1 p10 is not clear in how optional identifiers fit in. Do they become reserved because an implementation implements the feature or are they reserved upfront? An answer to that is not completely trivial, because for example optional macros are often used as feature tests (so they should always be reserved) whereas optional library interfaces (such as atomics, decimal float, or Annex K) only interfere on implementations that choose to implement them.

Proposed change in 6.4.2.1 p10:

Some identifiers may be potentially reserved. A potentially reserved identifier is an identifier which is not reserved unless made so by an implementation providing the identifier (7.1.3) but is anticipated to become reserved by an implementation or a future version of this document. An identifier that this document describes as optional:

If it is defined as a macro it is reserved.

Otherwise, if the definition is given in clauses 1 to 6 it is reserved.

Otherwise, it is potentially reserved.

Most implementations probably already behave according to this; not many of them warn for optional identifiers they don’t implement. Otherwise, more sophisticated tools may be mildly impacted in that they’d have to change their diagnostic for misuse of these identifiers from “reserved” to “potentially reserved”.
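The distinction drawn above rests on optional macros acting as feature tests. A minimal sketch of that usage follows, based on the conditional feature macros `__STDC_NO_ATOMICS__`, `__STDC_IEC_60559_DFP__` and `__STDC_LIB_EXT1__`. The enumeration and the function name are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for three optional features named in the text. */
enum optional_feature { FEATURE_ATOMICS, FEATURE_DECIMAL_FP, FEATURE_ANNEX_K };

/* Tells whether the implementation announces the feature via its macro.
   The macros must be testable whether or not the feature is implemented,
   which is why the proposal keeps optional macros reserved. */
bool feature_announced(enum optional_feature f) {
    switch (f) {
    case FEATURE_ATOMICS:
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_ATOMICS__)
        return true;
#else
        return false;
#endif
    case FEATURE_DECIMAL_FP:
#ifdef __STDC_IEC_60559_DFP__
        return true;
#else
        return false;
#endif
    case FEATURE_ANNEX_K:
#ifdef __STDC_LIB_EXT1__
        return true;
#else
        return false;
#endif
    }
    assert(!"unknown feature");
    return false;
}
```

Library identifiers belonging to a feature for which this function returns false are the ones that the proposed wording would leave only potentially reserved.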

There is a new ambiguity for the initialization of `union`s if they are redeclared.

If there is no designated initializer, unions are initialized as if for their first member. In the proposed wording, redeclaration of a union makes it ambiguous and scope dependent which member would be considered first. Thus initialization of unions becomes fragile and may, for example, change when the include order of headers is changed.

Please fix this to either

require that the first member under redeclaration has always the same type, or
require that the declaration order of members has to be the same for all declarations in the same TU.

The latter is the preferred solution by AFNOR because it is also much easier to implement. Whichever requirement is chosen, it should be made a constraint, and not only made UB. This could e.g. be done in 6.7.2.3 p1

Where two declarations that use the same tag declare the same type, they shall both use the same choice of struct, union, or enum. If two declarations of the same type have a member-declaration or enumerator-list, one shall not be nested within the other and both declarations shall fulfill all requirements of compatible types (6.2.7) with the additional requirement that corresponding members of structure or union types shall appear in the same order and shall have the same (and not merely compatible) types.

Then, the provided examples also need to be adapted

In Example 1 of 6.7.2.3 move the two lines with unions named bar to Example 2 and add a comment to the second line

    union bar { int x; float y; };
    union bar { float y; int x; };  // members are ordered differently
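To make the first-member rule behind this comment concrete, here is a small sketch. It uses two distinct tags, `int_first` and `float_first` (hypothetical names), because redeclaring one tag with reordered members is exactly what the proposed constraint would reject. The same initializer `{ 1 }` initializes a different member in each case.

```c
#include <assert.h>

union int_first   { int x; float y; };
union float_first { float y; int x; };

/* Without a designator, { 1 } initializes the first member: here x. */
int init_int_first(void) {
    union int_first u = { 1 };
    return u.x;
}

/* Same initializer, but the first member is now y, so 1 is converted to float. */
float init_float_first(void) {
    union float_first u = { 1 };
    return u.y;
}

/* Documents the expected outcome of both initializations. */
void check_first_member_rule(void) {
    assert(init_int_first() == 1);
    assert(init_float_first() == 1.0f);
}
```

If both member orders were visible under one tag, the meaning of `{ 1 }` would depend on which declaration is in scope, which is the fragility described above.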

There is a missing case for the comparison of pointers with type `nullptr_t`

When introducing nullptr and nullptr_t one case was overlooked for equality comparison in 6.5.9 p2 (constraints) last item:

– one operand is a pointer and the other is a null pointer constant or has type nullptr_t.

Similarly, the corresponding prose in p6 (semantics) should be adapted

Otherwise, at least one operand is a pointer. If one operand is a pointer and the other is a null pointer constant or has type nullptr_t, they compare equal if the former is a null pointer~~, the null pointer constant is converted to the type of the pointer~~. If one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void, the former is converted to the type of the latter.

Misspecification of the time conversion functions 7.29.3

The whole paragraph p2 and in particular its first sentence makes no sense in this version because there are no functions asctime_r or ctime_r that are defined in the document.

Modify as follows:

Functions ~~asctime, ctime,~~ gmtime, and localtime are the same as their counterparts suffixed with _r. In place of the parameter buf, they use a pointer to ~~an object and return it:~~ one or two broken-down time structures ~~(for gmtime and localtime) or~~. Similarly, an array of char ~~(~~is commonly used by asctime and ctime~~)~~. Execution of any of the functions that return a pointer to one of these objects may overwrite the information returned from any previous call to one of these functions that uses the same object. These functions are not reentrant and are not required to avoid data races with each other. Accessing the returned pointer after the thread that called the function that returned it has exited results in undefined behavior. The implementation shall behave as if no other library functions call these functions.
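Because a later call may overwrite the object that an earlier call returned, portable code usually copies the result right away. A minimal sketch, with hypothetical function names, and assuming for the usage example that `time_t` counts seconds since 1970-01-01 UTC (true for POSIX systems, not guaranteed by the C standard):

```c
#include <assert.h>
#include <time.h>

/* Copies the broken-down UTC time out of the object that gmtime points to,
   so that a later call cannot change the caller's value. */
struct tm utc_snapshot(time_t t) {
    struct tm *shared = gmtime(&t);
    assert(shared != NULL);
    return *shared;
}

/* Returns nonzero if two consecutive calls returned the same object.
   Implementations are allowed, but not required, to do so. */
int gmtime_reuses_object(void) {
    time_t a = 0;
    time_t b = 86400;
    struct tm *pa = gmtime(&a);
    struct tm *pb = gmtime(&b);
    return pa == pb;
}
```

Two snapshots taken for the values 0 and 86400 stay distinct even where `gmtime_reuses_object` reports that the underlying object is shared.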

Allow the bitprecise types `_BitInt(`N`)` to be implemented as macros

The mandatory addition of _BitInt(N) is quite a stretch for small C parsers. Introducing this as a simple identifier with a “functional” argument would make this addition much more friendly for implementations that already have some form of operator overloading (including suffixes); using macros, _Generic and the new typeof feature would permit implementing the whole feature as a library.

We should allow implementations to use all the machinery they already have, without imposing changes to their parsers. This could easily be achieved by adding _BitInt to the list of keywords that may expand to different forms when handled by # and ## during preprocessing. Modify the last sentence of 6.4.1 (Keywords) p3 that talks about such exceptions:

The spelling of these keywords, their alternate forms, and of _BitInt, false and true inside expressions that are subject to the # and ## preprocessing operators is unspecified.⁷⁵⁾

Diverging policies for version numbering

The proposed document follows two different strategies concerning version numbers

The new __STDC_VERSION_…H__ macros for headers all expand to 202311L, regardless of the effective intermediate document by which they were introduced. This value is the same as __STDC_VERSION__.
The expansion of the new feature test __has_c_attribute is different for different attributes.

Because attributes changed at some points, this distinction might have been interesting for implementations and early adopters during the development phase of this revision. Now that the dust settles, it is of minor importance and should disappear for the benefit of simplicity and our general users.

AFNOR prefers that any use of __has_c_attribute(ID) for a standard attribute ID yields the same value as __STDC_VERSION__, namely 202311L. We’d also like to suggest that this policy be maintained for the work on future revisions. It should be easier to maintain and to follow a single version number that reflects a possible working draft than to know these numbers for all features that might change during the elaboration of a revision.
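To see the two numbering strategies side by side on a given implementation, a sketch along the following lines may help. It compares the value of `__has_c_attribute(deprecated)` with `__STDC_VERSION__` during preprocessing. The `DEPRECATED_STATUS` macro, the function name and the status codes are hypothetical.

```c
#include <assert.h>

/* Status codes (hypothetical): -1 no __has_c_attribute, 0 attribute unknown,
   1 supported with its own version value, 2 value equals __STDC_VERSION__. */
#if defined(__has_c_attribute)
#  if __has_c_attribute(deprecated) == 0
#    define DEPRECATED_STATUS 0
#  elif defined(__STDC_VERSION__) \
        && __has_c_attribute(deprecated) == __STDC_VERSION__
#    define DEPRECATED_STATUS 2
#  else
#    define DEPRECATED_STATUS 1
#  endif
#else
#  define DEPRECATED_STATUS (-1)
#endif

/* Reports how the attribute's version value relates to __STDC_VERSION__. */
int deprecated_attribute_status(void) {
    int status = DEPRECATED_STATUS;
    assert(status >= -1 && status <= 2);
    return status;
}
```

Under the policy that AFNOR prefers, a conforming C23 implementation would always end up with status 2 for every standard attribute.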

Colors for code snippets

ISO/IEC policy has changed such that now they allow highlighting colors for code snippets. Please consider using that feature not only in working drafts but also for final documents.

In the working draft, colors of identifiers distinguish the status of the underlying definition; keywords are black, macros are red, types are blue etc. Knowing this color code improves readability of the standard and this should not be kept from our end users.

Missing mention of `timegm`

This function is added to <time.h> but a reference to that change is missing in Annex M.

A reference to the corresponding paper N2833 is also missing in the abstract but this would be removed anyhow for the published document.

`PRI` and `SCN` macros are missing for new format specifiers

The new format specifiers %b (and optionally %B) for printf and scanf should be as useful as the existing ones. For that they should also have equivalent macro definitions in <inttypes.h>, as do the other format specifiers. Therefore the lines

PRIbN PRIbLEASTN PRIbFASTN PRIbMAX PRIbPTR

and

SCNbN SCNbLEASTN SCNbFASTN SCNbMAX SCNbPTR

should be added to 7.8.1 p3 and p5, respectively.
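For comparison, this is how the existing macros are used today with a fixed-width type. The sketch below relies only on `PRIx32`, which is already in <inttypes.h>. The function name `format_hex32` is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Formats a 32-bit value in hexadecimal without guessing which basic type
   uint32_t maps to; PRIx32 supplies the matching conversion specifier. */
int format_hex32(char *buf, size_t size, uint32_t value) {
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRIx32, value);
}
```

With the requested additions, the same pattern would carry over to binary output by writing `"%" PRIb32` in place of `"%" PRIx32`.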

Make `printf` format specifier `%B` optional.

The specifier %B is only recommended and not required because it might be in conflict with existing extensions on some implementations. It would be good to change this recommendation into a proper option such that semantics of this specifier can be tested and then relied upon.

Make `%B` optional.

Modify in 7.23.6.1 and 7.31.2.1.

Describe the feature:

#

The result is converted to an “alternative form”. For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed). For b conversion, a nonzero result has 0b prefixed to it. For the optional B conversion as described below, a nonzero result has 0B prefixed to it. For x (or X) conversion, a nonzero result has 0x (or 0X) prefixed to it. …

Add it to the possible specifiers:

b, B, o, u, x, X

The unsigned int argument is converted to unsigned binary (b or B), unsigned octal (o), unsigned decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; the letters abcdef are used for x conversion and the letters ABCDEF for X conversion. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it is expanded with leading zeros. The default precision is 1. The result of converting a zero value with a precision of zero is no wide characters. The specifier B is optional and provides the same functionality as b, except for the # flag as specified above. The macro PRIBPTR from <inttypes.h> shall only be defined if the implementation follows the specification as given here.

Change the text in the “recommended practice”

14 ~~An~~The uppercase B format specifier is ~~not covered~~made optional by the description above, because it used to be available for extensions in previous versions of this standard.

~~{linebreak}~~
Implementations that did not use an uppercase B as their own extension before are encouraged to implement it ~~similar to conversion specifier b as standardized above, with the alternative form (#B) generating 0B as prefix for nonzero values~~ as an option as described above.
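The intended output of the `#` flag for b and for the optional B can be sketched without relying on library support for either specifier. The function below is a hypothetical emulation for `unsigned int` with the default precision of 1. It is not taken from any implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Emulates "%#b" (upper == 0) or "%#B" (upper != 0) for an unsigned int.
   Returns the number of characters written, or 0 if buf is too small. */
size_t format_alt_binary(char *buf, size_t size, unsigned value, int upper) {
    assert(buf != NULL);
    char digits[sizeof(unsigned) * CHAR_BIT]; /* least significant bit first */
    size_t n = 0;
    unsigned v = value;
    do {
        digits[n++] = (char)('0' + (v & 1u));
        v >>= 1;
    } while (v != 0);

    size_t prefix = (value != 0) ? 2 : 0; /* only nonzero results get a prefix */
    if (n + prefix + 1 > size) return 0;

    size_t pos = 0;
    if (prefix != 0) {
        memcpy(buf, upper ? "0B" : "0b", 2);
        pos = 2;
    }
    while (n > 0) buf[pos++] = digits[--n];
    buf[pos] = '\0';
    return pos;
}
```

For the value 6 this yields `0b110` or `0B110` depending on `upper`, and for the value 0 it yields a single `0` without prefix.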

Add optional macros `PRIB`N and similar

Add to 7.8.1 after p3

3’ The following printf macros for unsigned integer types are optional:

PRIBN PRIBLEASTN PRIBFASTN PRIBMAX PRIBPTR

They shall be defined if the implementation supports the B specifier as indicated in 7.23.6.1 and 7.31.2.1; otherwise they shall not be defined.

Add this feature to future library directions

Add to 7.33.6 (<inttypes.h>) p1:

Macros that begin with either PRI or SCN, and either a lowercase letter, B, or X are potentially reserved identifiers and may be added to the macros defined in the <inttypes.h> header.

Add to 7.33.14 p1 (<stdio.h>) and analogously to 7.33.20 (<wchar.h>) p2:

Lowercase letters may be added to the conversion specifiers and length modifiers in fprintf and fscanf. Other characters may be used in extensions. The specifier B for printf may become mandatory in future versions of this document.

There are semantic changes and inconsistencies for `strtol`, `scanf` and similar functions

Binary constants are now accepted

Integer conversion functions strtol and similar now accept the new binary integer constants if they encounter the prefix 0b or 0B. This can happen if the number base is explicitly provided as 2 or implicitly by providing a 0. This is a semantic change: for example the following code

unsigned long res = strtoul("0b1", 0, 2);

has res ≡ 0 for C17 and res ≡ 1 for C23, because for the first the interpretation stops before the b. This semantic change concerns strtol and similar functions for base 0 and 2. Because scanf always uses an explicit format specifier and C17 had no specifier for base 2, it is not affected by this change.
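The difference can be observed on a given C library by also inspecting the end pointer. The helper below is a hypothetical sketch. It reports the converted value together with the number of characters that `strtoul` consumed for base 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Converts s in base 2 and stores how many characters were consumed. */
unsigned long parse_base2(const char *s, size_t *consumed) {
    assert(s != NULL && consumed != NULL);
    char *end = NULL;
    unsigned long value = strtoul(s, &end, 2);
    *consumed = (size_t)(end - s);
    return value;
}
```

For the input `"0b1"`, a library with C17 semantics returns 0 and consumes one character, whereas a library with C23 semantics returns 1 and consumes all three.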

It is not clear how C libraries are supposed to handle this:

Should they provide one such function that when linked (possibly dynamically) to existing code changes semantics of an execution after an update of the C library?
Should they provide two functions, an old C17 one and a new C23 one? How should a call resolve:
- At compile time of the TU, e.g by providing a macro (depending on __STDC_VERSION__) ?
- At link time by providing weak aliases?

Stringification results in invalid constants for `strtol` or `scanf`

For C17, the integer constants that are accepted as literals and those that are accepted by tools such as strtol or scanf are the same. Therefore code such as the following works for any implementation where a valid C17 integer literal is supplied for ULLONG_MAX.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define STRINGIFY_(X) #X
#define STRINGIFY(X) STRINGIFY_(X)

unsigned long long what(char const* s) {
    return strtoull(s, 0, 0);
}

char const elements[] = STRINGIFY(ULLONG_MAX);

int main(int argc, char* argv[argc+1]) {
    char const* p = (argc > 1) ? argv[1] : elements;
    if (what(p) < 127) {
        printf("unusual platform with %s max\n", p);
    }
}

Here the initializer of elements would typically be a string such as "0xffffffffffffffffLL" or "18446744073709551615", which would then be correctly recognized as a number, so that the call to printf is skipped.

In C23, if the implementation chooses to change 18446744073709551615 to 18'446'744'073'709'551'615 (such as to improve readability) the call to strtoull would only see the leading 18 and run into the branch with the call to printf.
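The truncation can be reproduced directly by passing a string with digit separators to `strtoull`. The helper name `parse_auto` is hypothetical. The sketch only reports the value and the number of characters consumed for base 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Converts s with automatic base detection and stores how many characters
   were consumed; a ' separator is not part of the accepted syntax. */
unsigned long long parse_auto(const char *s, size_t *consumed) {
    assert(s != NULL && consumed != NULL);
    char *end = NULL;
    unsigned long long value = strtoull(s, &end, 0);
    *consumed = (size_t)(end - s);
    return value;
}
```

Applied to `"18'446'744'073'709'551'615"`, the conversion stops at the first separator and returns 18, while the same digits without separators convert in full.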

The corresponding semantic change affects integer and floating point literals and extends to all variants of strto* and scanf. Also, this problem equally concerns strings that are formed from literals that are part of the implementation or any other third party code base that is maintained independently. The potential semantic change of user code makes the new extended syntax for number literals with digit separators non-portable and even security critical.

Assessment of the induced difficulties for applications

It is difficult to tell the impact that these changes will have in the field. Problematic input sequences composed with the pairs 0b or 0B that would lead to changes in behavior are difficult to detect or predict, in particular for programs that treat large inputs from non-sanitized sources dynamically. Because of this, such programs may malfunction for a long time silently before problems would be detected.

The second problem would probably have the consequence that the new digit separator will be banned by coding styles and secure coding guidelines, which undermines the usefulness of this new feature.

Possible adjustments

Revert
- Remove the 0b and 0B prefixes for integer literals from C23.
- Remove the ' digit separator from C23.
Embrace
- Leave the specifications of all the strto* and scanf functions as they are in C17 and mark them as [[deprecated]] (but not necessarily as obsolescent).
- Introduce new functions stdc_str*, stdc_scanf and similar that additionally to their C17 counterparts allow the 0b and 0B prefixes and that deal with the ' digit separator.
For example something along the lines of

  #if __STDC_VERSION__ >= 202311L
  [[deprecated("strings with a '0b' or '0B' prefix may have changed meaning in C23, consider using stdc_strtol")]]
  long int strtol(const char*restrict, char **restrict, int);
  long int stdc_strtol(const char *restrict nptr, char **restrict endptr, int base);
  #else
  long int strtol(const char*restrict, char **restrict, int);
  #endif
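What such a separate function could do can be sketched on top of the existing `strtoul`. The name `sketch_strtoul` is hypothetical, and the code is an illustration under simplifying assumptions rather than a proposed implementation. It recognizes an unsigned `0b`/`0B` prefix itself for base 0 or 2 and leaves every other input, including signed input, to the library.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Accepts an optional 0b/0B prefix for base 0 or 2, independently of
   whether the underlying strtoul already does; digit separators are not
   handled in this sketch. */
unsigned long sketch_strtoul(const char *nptr, char **endptr, int base) {
    assert(nptr != NULL);
    const char *p = nptr;
    while (isspace((unsigned char)*p)) ++p;

    int binary_prefix = (base == 0 || base == 2)
        && p[0] == '0' && (p[1] == 'b' || p[1] == 'B')
        && (p[2] == '0' || p[2] == '1');
    if (!binary_prefix)
        return strtoul(nptr, endptr, base); /* library semantics apply */

    unsigned long value = 0;
    int overflow = 0;
    for (p += 2; *p == '0' || *p == '1'; ++p) {
        if (value > ULONG_MAX / 2) overflow = 1; /* doubling would wrap */
        value = value * 2 + (unsigned long)(*p - '0');
    }
    if (endptr != NULL) *endptr = (char *)p;
    if (overflow) {
        errno = ERANGE;
        return ULONG_MAX;
    }
    return value;
}
```

Code that calls the unchanged C17 function keeps its old behavior, and only callers that opt in to the new name see the binary prefix.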

`va_start` becomes too permissive

This macro previously had the name of the last named argument as second parameter. This is changed to ... with the intent that this parameter can now be omitted. Unfortunately this allows any number of arguments of any kind, and with the proposed text implementations would not even be allowed to diagnose such code.

We think that the proposed specification goes too far in that it not only guarantees that these arguments are not evaluated (which should be maintained) but also that they are not expanded. We propose to change as follows.

Only the first argument passed to va_start is evaluated. Any additional arguments are not used ~~by the macro~~other than for possible diagnostics and will not be ~~expanded or~~ evaluated for any reason.

Alternatively the last sentence could just be suppressed.

Only the first argument passed to va_start is evaluated. ~~Any additional arguments are not used by the macro for other purpose that possible diagnostics and will not be expanded or evaluated for any reason.~~
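For reference, the traditional two-argument form that the relaxed specification has to remain compatible with looks as follows. The function `sum_ints` is a hypothetical example. The second argument to `va_start` is the last named parameter, as C17 requires.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums count values of type int passed after the named parameter. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list ap;
    va_start(ap, count); /* C17 form: second argument names the last parameter */
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Under the proposed C23 text the call could also be written `va_start(ap)`, and the concern above is about what else may legally appear after `ap`.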

There is no (and will not be) a related “Rationale” document
