1. Revision History
1.1. Revision 2 - July 6th, 2025
- Add wording for making sure default argument promotion of arguments to not only conversion specifiers, but precision modifiers, are appropriately shuttled through.
  - What happens after an explicit conversion to the internal value is not specified for the conversion specifier, and so it’s not precision’s problem either (unspoken UB).
- Fix bad plural "An asterisks" in wording.
- Additional typo fixes ("wether" -> "whether" in prose).
1.2. Revision 1 - June 15th, 2025
- Change from using u* to ^ to avoid ambiguities in parsing with .u* as the modifier for unsigned integer precision modifiers for length. Previous discussion around u* is moved to § 3.3 Why .u*s and .zu*s Fails.
- Alternative syntaxes are discussed in § 3.4 Syntax Alternatives. If any of them seem appealing, one of them can be voted on.
1.3. Revision 0 - May 27th, 2025
- Initial release. ✨
2. Introduction & Motivation
It is impossible to use anything other than an int for the precision (size) of a string specifier. Normally, this should not be a problem because fprintf, its many sibling functions, and other I/O functions in C only ever return int. The problem is, most:
- containers
- strings
- size calculations
- stream offsets
- large buffer indices
- countof(...) and sizeof(...) operations
and so much more are not int-typed. This results in a lot of excessive (and, in some ways, dangerous) casting for working with the I/O output functions. The simple, easy-integration fix is to allow precision with * to include a size modifier, such that while "%.*s" is a string sized by an int, "%.z*s" represents a string sized by (the signed version of) a size_t.
It is also important for strings that are not null terminated, such as those produced by substring functionality and parsing/searching. Needing to make sure things are null terminated is a huge burden, and while the precision modifier helps, the constant casting hides potential overflow errors from high quality-of-implementation libraries and makes its use dubious.
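As an illustration of the status quo described above, the following is a minimal sketch of printing a substring that is not null terminated using today's "%.*s". The helper name format_substring is hypothetical and exists only for this example; the cast is the part this proposal aims to make unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the proposal): formats the first `len`
   bytes of `data`, which need not be null terminated, into `out`. */
static int format_substring(char *out, size_t out_size,
                            const char *data, size_t len) {
    assert(out != NULL && data != NULL);
    /* The cast is required because '*' expects an int argument. It is only
       correct when len <= INT_MAX, which nothing here verifies. */
    return snprintf(out, out_size, "%.*s", (int)len, data);
}
```

Calling format_substring(buf, sizeof buf, text + 7, 3) with text pointing at "hello, world" writes "wor" into buf. The size_t length has to be narrowed by hand at every such call site.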
This proposal is to allow the typical integer length modifiers (hh, h, l, ll, j, z, t, wN, and wfN) to be applied to the precision modifier when the precision modifier uses an asterisk (i.e., .*). This proposal also adds a new precision argument modifier ^ to replace *, for indicating an unsigned type for a precision modifier. We choose this character due to it not being burdened by existing implementation extensions and being able to scale to other locations (e.g., as the field width).
3. Design
Given the following grammar (using the notation from POSIX, where things enclosed in [ ] are optional):

    % [argument$] [flags] [width] [. precision] [length modifier] conversion-specifier

(argument$ is a POSIX extension), then the logical place in the grammar to place the length modifier that applies specifically to the precision argument is:

    % [argument$] [flags] [width] [. [length modifier] precision] [length modifier] conversion-specifier

This is the easiest place for this to be where it won’t be ambiguous. In particular, placing it in other locations could have it confused for a width, and putting it ahead of the width or the . but having it apply to the precision itself means that we would preclude having such a modifier on the width itself. (This paper does not propose this for the width, just for asterisk-based and the newly proposed caret-based precisions.)
Therefore, this design slots it into the one place it can have no negative impact and would be unambiguous: after the ., but before the * or ^ of the precision:
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    extern size_t big_honkin_number;

    int main() {
        char* str = malloc(big_honkin_number);
        // ...
        int result = printf("%.z^s", big_honkin_number, str); // no cast needed
        // ...
        free(str);
        return 0;
    }
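To make the placement in the grammar concrete, here is a rough sketch of how a format-string parser might recognize the optional length modifier between the period and the * or ^. Everything in it (the precision_spec type, the parse_precision function, the enumerators) is hypothetical and invented for this illustration; it is not taken from any existing implementation, and it leaves out wN and wfN to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of the proposal. */
enum precision_kind {
    PRECISION_NONE,     /* no '.' present */
    PRECISION_DIGITS,   /* '.' followed by zero or more decimal digits */
    PRECISION_ASTERISK, /* signed argument supplies the precision */
    PRECISION_CARET     /* unsigned argument supplies the precision */
};

struct precision_spec {
    enum precision_kind kind;
    char length_modifier[3]; /* "", "hh", "h", "l", "ll", "j", "z", or "t" */
};

/* Parses "[ . [ length modifier ] precision ]" starting at `p` and returns
   a pointer to the first character after the precision. */
static const char *parse_precision(const char *p, struct precision_spec *out) {
    assert(p != NULL && out != NULL);
    out->kind = PRECISION_NONE;
    out->length_modifier[0] = '\0';
    if (*p != '.') {
        return p;
    }
    ++p;

    /* Tentatively read a length modifier; it only counts as part of the
       precision if a '*' or '^' follows it. */
    const char *start = p;
    if ((p[0] == 'h' && p[1] == 'h') || (p[0] == 'l' && p[1] == 'l')) {
        p += 2;
    } else if (*p != '\0' && strchr("hljzt", *p) != NULL) {
        p += 1;
    }
    size_t modifier_length = (size_t)(p - start);

    if (*p == '*' || *p == '^') {
        memcpy(out->length_modifier, start, modifier_length);
        out->length_modifier[modifier_length] = '\0';
        out->kind = (*p == '*') ? PRECISION_ASTERISK : PRECISION_CARET;
        return p + 1;
    }

    /* Otherwise the characters belong to the conversion specification
       itself (e.g. the h in "%.hd"), so back up and read plain digits. */
    p = start;
    while (*p >= '0' && *p <= '9') {
        ++p;
    }
    out->kind = PRECISION_DIGITS;
    return p;
}
```

Given ".z^s", this sketch reports a caret precision with the z modifier and stops at the s. Given ".hd", it reports a plain (empty) precision and leaves the h to be read as the length modifier of the d conversion, which matches current behavior.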
3.1. "But fprintf and friends only return int, isn’t this a problem?"
Thankfully, this is actually less of a problem than was previously surmised. In fact, this proposal actively makes it less of a problem than the cast-based solution. Consider the existence of a file that can be written to and this program:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        enum { COUNT = 10, BYTESIZE = INT_MAX / COUNT };
        char* str = (char*)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE* f = fopen("/dev/null", "w+");
        [[maybe_unused]] int write_value = fprintf(f, "%.*s", BYTESIZE, str);
        // 11 strings of BYTESIZE bytes each: the running total exceeds INT_MAX
        [[maybe_unused]] int large_write_value = fprintf(f,
            "%.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s",
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str, BYTESIZE, str,
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str, BYTESIZE, str,
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str);
        free(str);
        assert(write_value == BYTESIZE); // Well.
        assert(large_write_value < 0);   // ... Okay.
        return 0;
    }
For both write_value and large_write_value, the individual sizes of the strings are not what is ultimately the problem here. In fact, each of these is an int-typed value (as per the rules for enum constants and their values in both old and new C) and is fully within the bounds of int. But, large_write_value effectively creates a situation where, over the course of the 11 strings written, the last write is large enough that it triggers overflow.
While there is no hard requirement in any standard that mandates rigorous checking, most implementations do check if the write will eventually overflow the int and either return an appropriate error value or some other negative value. There is no constraint or recommended practice to check for overflow, but glibc, musl-libc, and many more can and do check for this case and report it. We see here that even with purely int-typed writes, we get the same error on these platforms: all of them return a negative integer value.
What this means, ultimately, is that it is not the type of the length that matters more, but the actual value!
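The kind of value-based check described above can be sketched as follows. This is only an illustration of the idea under the assumption that an implementation keeps a running int count of bytes written; the helper name add_to_written_count is hypothetical and the code is not taken from glibc, musl-libc, or any other library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds `bytes` to the running count `*total` that a
   formatted output function would eventually return as an int.
   Returns false and leaves *total unchanged if the sum would exceed INT_MAX. */
static bool add_to_written_count(int *total, size_t bytes) {
    assert(total != NULL && *total >= 0);
    size_t remaining = (size_t)INT_MAX - (size_t)*total;
    if (bytes > remaining) {
        return false; /* the caller would report a negative return value */
    }
    *total += (int)bytes;
    return true;
}
```

The check depends only on the value of bytes, not on whether the caller originally held it in an int or a size_t. Ten writes of INT_MAX / 10 bytes succeed, and an eleventh fails, mirroring the program above.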
This proposal cannot change the return value’s type for fprintf, printf, or any of their family of functions (as that is an ABI break), but allowing a size_t type for the length modifier is actually an improvement to security. Since most implementations are doing value/overflow checking here, a (too-large) size_t can be passed in directly, letting the overflow checks inherent in most implementations catch it and return a negative number. For example, observe the following (too large) string being written, but written in the "typical" way that string sizes get passed to formatted I/O functions like fprintf:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        const size_t BYTESIZE = ((size_t)INT_MAX) + 1;
        char* str = (char*)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE* f = fopen("/dev/null", "w+");
        [[maybe_unused]] int write_value = fprintf(f, "%.*s", (int)BYTESIZE, str);
        free(str);
        assert(write_value < 0); // might not trigger, actually!
        return 0;
    }
This is an error. But we will never see it as an error: the explicit cast inserted into the code for the express purpose of matching the int type means that the error is now hidden from us. Compilers cannot warn on it (except by using tracing analysis which flags BYTESIZE as being truncated by the cast) without generating excessive false positives, as casting is seen as the way to get around this problem and as intentional on the part of the user. Thus, by allowing the size_t value to be passed directly, we can avoid hard-to-detect truncation errors in such code. Rather than (erroneously) casting and truncating the value of a size_t into an int type or similar, it will instead be actually checked by printf, fprintf, and similar.
This is a notable improvement because a cast is seen as an explicit choice on the part of the developer, made to silence warnings or other diagnostics. Casting is too big of a hammer and too large of a club for this feature set; supplying the size without truncation directly to the function allows for existing quality of implementation to catch this error:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        const size_t BYTESIZE = ((size_t)INT_MAX) + 1;
        char* str = (char*)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE* f = fopen("/dev/null", "w+");
        [[maybe_unused]] int write_value = fprintf(f, "%.z^s", BYTESIZE, str);
        free(str);
        assert(write_value < 0); // triggers on high quality-of-implementation again!!
        return 0;
    }
The paper aims to allow high quality library implementations to catch this class of errors and let them error on it, while making things less cumbersome for the user. This forms the basis of this proposal.
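For comparison, this is roughly the check a careful caller has to write by hand today before using "%.*s" with a size_t length. The helper name precision_from_size is hypothetical and used only for this sketch; with the proposed syntax, the equivalent check could instead live inside the library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: converts a size_t length into an int precision.
   Returns false instead of truncating when the value does not fit. */
static bool precision_from_size(size_t len, int *precision) {
    assert(precision != NULL);
    if (len > (size_t)INT_MAX) {
        return false; /* a bare (int) cast would have hidden this case */
    }
    *precision = (int)len;
    return true;
}
```

A plain cast carries out the same narrowing without the comparison, which is why a too-large length goes unnoticed in the cast-based program above.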
3.2. Other Positions?
There were a couple of other choices for where to put the "length modifier". Unfortunately, for all of these:
- (1) "%z.^s"
- (2) "%.^zs"
there can be minor conflicts in the grammar or ambiguity of application. For (1), it’s unclear whether that is meant to apply to a potential width argument or the desired precision argument (which determines whether it should be a formatting error or not). This could block future improvements or modifications to the syntax that would allow for different types for the width argument. It is not being proposed in this paper, however; this paper is concerned mostly with enabling the use case of typical string and substring data.
For (2), the problem is that it’s unclear when parsing certain things, such as a z placed directly after the *, whether it’s a modifier on the size for the * or it’s the traditional, current meaning: a precision of type int for a size_t-typed conversion (e.g., *-specified padding on a size_t argument). Given the grammar, having it appear before the * or ^ is both the most grammatically safe and implementable choice (without disambiguation and backwards-compatibility break rules). It also appears before what it modifies -- the precision -- which allows a future where some other position can be chosen to modify a potential width modifier or other extensions.
3.3. Why .u*s and .zu*s Fails
This proposal previously used "%.zu*s" to specify a string with a size_t length, or "%.u*s" for a string with an unsigned int length. The problem is that, while mnemonically u was really good for "unsigned numeric precision argument", it has an immediate problem: the sequence "%.u*s" can be parsed as BOTH:
- a "%.u" formatted integer with "*s" text;
- or, a "%.u*s" formatted const char* string with unsigned int length.
This is a problem due to u not being a modifier but a conversion specifier in C. Design-wise, if u simply modified d or i to make it go from being a "signed" value to an "unsigned" value, this could never happen. But C did not go down that path: instead, any u is treated as a wholly separate conversion specifier and thus as a terminating sequence for a simple grammar. So, this proposal chooses ^ because it is a single character and, in the author’s opinion, aesthetically pleasing (see § 3.4 Syntax Alternatives). No known extensions seem to be using ^ as a character either, so this clears any ambiguity while leaving the character in a deliberate location that, for implementations which are thorough in their checking, can produce formatting violations for it.
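The first of the two readings is what a conforming implementation already does with this format string today. A small sketch showing it (the function name format_with_old_u_syntax is made up for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats `value` with the format string "%.u*s" under today's rules. */
static int format_with_old_u_syntax(char *out, size_t out_size,
                                    unsigned int value) {
    assert(out != NULL);
    /* "%.u" is already a complete conversion specification: the u
       conversion with an empty precision, which is taken as zero.
       The trailing "*s" is therefore ordinary text that is copied
       through unchanged. */
    return snprintf(out, out_size, "%.u*s", value);
}
```

With a value of 42, the buffer receives the four characters 42*s. Giving the same sequence a second meaning would silently change the behavior of existing, valid format strings.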
3.4. Syntax Alternatives
The alternatives here are (potentially) viable choices that can be chosen instead. Briefly, the pros and cons of such characters are discussed. If they are popular, we plan to poll WG14 about them.
- "%.z**s";
- "%.z^s";
- or, "%.z+s".
3.4.1. ^
The caret is what this proposal uses. It’s a single character so it doesn’t require any changes to a potential lookahead buffer for handling this sort of thing like ** might do by itself. It’s also aesthetically pleasing, at least in the author’s opinion (or, rather, it’s the least ugly). It can be used in the field width position as well without creating ambiguities, which leaves this as something that can be used in the future if more creative uses of the field width modifiers are considered.
The negative is that it’s a new character. While there’s currently no reported collision with existing practice and extensions, new characters mean there’s less room to maneuver in the future.
3.4.2. **
The double asterisk is a potential choice. It does not use a new character but instead doubles up the current one in the current position. It can be used as a field width as well. It’s novel and does not provide for ambiguities. As an already-used character, it does not provide any chance to clash with existing practice or extensions.
The negative is that it’s a double-character sequence and (in the author’s opinion) a bit ugly to look at.
3.4.3. +
The plus is a potential choice. It’s a single character so it doesn’t require any changes to a potential lookahead buffer for handling this sort of thing like ** might do by itself. It’s aesthetically pleasing and somewhat resembles the idea of an "unsigned" (positive) value. As an already-used character, it does not provide any chance to clash with existing practice or extensions.
The negative is that it is very questionable whether or not this can be a field width, as "%+" is already a valid introductory sequence and it is ambiguous whether that is the old + for adding a sign to an integer or a new + for allowing an integer of a different (unsigned) type to be the field width. This means that the concept of a "length modifier" for the precision cannot be extended to the field width in the future without contorting the concept or once again creating new rules for that specific situation.
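The existing meaning of + that causes this ambiguity can be shown in a few lines. The function name below is hypothetical and only serves this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats `value` with the existing '+' flag and a field width of 5. */
static int format_signed_with_width(char *out, size_t out_size, int value) {
    assert(out != NULL);
    /* Here '+' is the long-standing flag that forces a sign to be
       printed; it says nothing about the type of a width argument. */
    return snprintf(out, out_size, "%+5d", value);
}
```

For the value 42 this produces two spaces followed by +42. Any new use of + near the field width would have to coexist with this flag.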
4. Implementation Concerns
A few developers at a large C standard library vendor stated that they could not implement this for fear of backwards compatibility breaks for their existing users. This goes against the understanding of the engineering held by the authors of this paper. A standard way to solve the problem would be Transparent Aliases, which have not yet been standardized. But, as the Transparent Aliases paper states, there are several dozen ways to solve this problem in a way where old, already-compiled code is backwards-compatible, including a way that works across almost all target platforms (and, with a macro, absolutely does work across all target platforms).
Assembly labels -- __asm labels -- provide a way to add new runtime and compile-time changes to existing functions both inside and outside the standard while maintaining traditional behavior. Here is a short, simple, compilable example of having 2 different functions with differing behaviors: one which caters to older, already-compiled programs under the exact binary name that old binaries expect (_fprintf); and, one which uses the binary name _fprintf_v2. A macro (_USE_OLD_FPRINTF) is used to determine which one the C source code identifier fprintf connects to.
    /* fprintf.h */
    typedef struct FILE FILE;

    #if _USE_OLD_FPRINTF
    extern __attribute__((visibility("default")))
    int fprintf(FILE* restrict __stream, const char* restrict __format, ...)
        __asm("_fprintf");
    #else
    extern __attribute__((visibility("default")))
    int fprintf(FILE* restrict __stream, const char* restrict __format, ...)
        __asm("_fprintf_v2");
    #endif

    #include <fprintf.h>

    extern __attribute__((visibility("default")))
    int fprintf(FILE* restrict __stream, const char* restrict __format, ...) {
        /* COMPATIBILITY implementation here... */
        return 0;
    }

    #include <fprintf.h>

    extern __attribute__((visibility("default")))
    int fprintf_v2(FILE* restrict __stream, const char* restrict __format, ...) {
        /* BREAKING implementation here... */
        return 0;
    }
All of these are compiled into the same shared object / dynamic link library, and it solves the problem. Therefore, we do not consider this an implementation impediment, especially since the only platforms which will have this problem are larger platforms that attempt to maintain backwards compatibility and can afford the extra function definition for impactful changes.
5. Wording
The following wording is against the latest draft of the C standard.
5.1. Modify §7.23.6.2 "The fprintf function"
7.24.6.2 The fprintf function
Synopsis
    #include <stdio.h>
    int fprintf(FILE * restrict stream, const char * restrict format, ...);
Description
...
...
Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
...
An optional precision that gives the minimum number of digits to appear for the b, B, d, i, o, u, x, and X conversions, the number of digits to appear after the decimal-point character for a, A, e, E, f, and F conversions, the maximum number of significant digits for the g and G conversions, or the maximum number of bytes to be written for s conversions. The precision takes the form of a period (.) optionally followed [deleted: either by an asterisk * (described later) or by an optional nonnegative decimal integer;] by one of:
- an optional length modifier followed by an asterisk * (described later);
- an optional length modifier followed by a caret ^ (described later);
- or, a nonnegative decimal integer.
If only the period is specified, the precision is taken as zero. If a precision appears with any other conversion specifier, the behavior is undefined.
...
As noted previously, a field width [deleted: , or precision, or both,] may be indicated with an asterisk. A precision may be indicated with an asterisk or a caret. [deleted: In this case] An asterisk means an int argument supplies the field width [deleted: or precision]. If the precision is an asterisk, an int argument or an argument of signed integer type (indicated by an optional length modifier) supplies the precision. If the precision is a caret, an unsigned int argument or an argument of unsigned integer type (indicated by an optional length modifier) supplies the precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted. A negative field width argument is taken as a - flag followed by a positive field width. A negative precision argument is taken as if the precision were omitted.
...
The length modifiers and their meanings are:
hh  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument. If it is followed by an asterisk, then it specifies that the corresponding argument is of type signed char. If it is followed by a caret, it specifies that the corresponding argument is of type unsigned char.
h  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument. If it is followed by an asterisk then it specifies that the corresponding argument is of type short int. If it is followed by a caret, it specifies that the corresponding argument is of type unsigned short int.
l (ell)  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier. If it is followed by an asterisk then it specifies that the corresponding argument is of type long int. If it is followed by a caret, it specifies that the corresponding argument is of type unsigned long int.
ll (ell-ell)  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a long long int or unsigned long long int argument; or that a following n conversion specifier applies to a pointer to a long long int argument. If it is followed by an asterisk then it specifies that the corresponding argument is of type long long int. If it is followed by a caret, it specifies that the corresponding argument is of type unsigned long long int.
j  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to an intmax_t or uintmax_t argument; or that a following n conversion specifier applies to a pointer to an intmax_t argument. If it is followed by an asterisk then it specifies that the corresponding argument is of type intmax_t. If it is followed by a caret, it specifies that the corresponding argument is of type uintmax_t.
z  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a size_t or the corresponding signed integer type argument; or that a following n conversion specifier applies to a pointer to a signed integer type corresponding to size_t argument. If it is followed by an asterisk then it specifies that the corresponding argument is of the corresponding signed type of size_t. If it is followed by a caret, then it specifies that the corresponding argument is of type size_t.
t  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a ptrdiff_t or the corresponding unsigned integer type argument; or that a following n conversion specifier applies to a pointer to a ptrdiff_t argument. If it is followed by an asterisk then it specifies that the corresponding argument is of type ptrdiff_t. If it is followed by a caret, then it specifies that the corresponding argument is of the corresponding unsigned type of ptrdiff_t.
wN  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to an integer argument with a specific width where N is a positive decimal integer with no leading zeros (the argument will have been promoted according to the integer promotions, but its value shall be converted to the unpromoted type); or that a following n conversion specifier applies to a pointer to an integer type argument with a width of N bits. If it is followed by an asterisk then it specifies that the corresponding argument is of N-bit integer type. If it is followed by a caret, it specifies that the corresponding argument is of N-bit unsigned integer type. All minimum-width integer types (7.22.2.3) and exact-width integer types (7.22.2.2) defined in the header <stdint.h> shall be supported. Other supported values of N are implementation-defined.
wfN  Specifies that a following b, B, d, i, o, u, x, or X conversion specifier applies to a fastest minimum-width integer argument with a specific width where N is a positive decimal integer with no leading zeros (the argument will have been promoted according to the integer promotions, but its value shall be converted to the unpromoted type); or that a following n conversion specifier applies to a pointer to a fastest minimum-width integer type argument with a width of N bits. If it is followed by an asterisk then it specifies that the corresponding argument is of N-bit fastest minimum-width integer type. If it is followed by a caret, it specifies that the corresponding argument is of N-bit fastest minimum-width unsigned integer type. All fastest minimum-width integer types (7.22.2.4) defined in the header <stdint.h> shall be supported. Other supported values of N are implementation-defined.
If a length modifier appears with any conversion specifier other than as specified previously, the behavior is undefined.
...
If a conversion specification is invalid, the behavior is undefined.336)
fprintf shall behave as if it uses va_arg with a type argument naming the type resulting from applying the default argument promotions to the type corresponding to the conversion specification or precision (from an asterisk or a caret) and then converting the result of the va_arg expansion to the type corresponding to the conversion specification.
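The "as if" rule above can be illustrated with an ordinary variadic function. This is a minimal sketch, not an implementation of fprintf: the function read_unsigned_char_argument is hypothetical and stands in for the handling of a precision argument introduced by hh^. It assumes the usual case where int can represent every unsigned char value, so the promoted type is int.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function that expects exactly one unsigned char
   argument after `marker`, standing in for an "hh^" precision argument. */
static unsigned char read_unsigned_char_argument(int marker, ...) {
    va_list args;
    va_start(args, marker);
    /* An unsigned char passed through "..." undergoes the default argument
       promotions, so va_arg has to name the promoted type (int here),
       not unsigned char. */
    int promoted = va_arg(args, int);
    va_end(args);
    /* Convert the result back to the type the length modifier names. */
    return (unsigned char)promoted;
}
```

Calling read_unsigned_char_argument(0, value) with an unsigned char value of 200 yields 200 again: the value is retrieved using the promoted type and then converted.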