Date: 2022-06-17
Thomas Köppe <tkoeppe@google.com>
ISO/IEC JTC1 SC22 WG14 N2994
To: WG14, WG21 liaison (SG22)
__VA_OPT__
from P00306R0/N2034 to be clear about
balanced parentheses and resulting tokens; added alternative “#define F(X...)
”.__VA_OPT__
to make
replacement better defined. Not addressed to WG14; a future revision will be addressed to WGs 14 and 21.
Note: The essential content of this paper has already been applied to C++ (as part of C++20). After an initial presentation of this paper to WG14, the earlier version N2034 has been added to SD3 as a potential future extension for C. The wording of this paper is taken from the final wording in C++ (comprising P0306R4 and P1042R1), adapted for the C standard.
This is a proposal to make variadic macros easier to use with no arguments by adding a new special functional macro __VA_OPT__.
Function-style macros that can have variable arguments suffer from a number of ill-specified corner cases. Consider the following macro definitions:
Invocations of these macros are surprising:
Invocation | Effect | Notes |
---|---|---|
F(a, b, c) | f(10, a, b, c) | variable arguments are “b, c” |
F(a, ) | f(10, a, ) | variable arguments contain zero tokens; syntax error |
F(a) | ill-formed | violates 16.3p12 (no variable arguments) |
G(a) | f(10, a) | |
H(a, b, c) | f(10, a, b, c) | variable arguments are “a, b, c” |
H(a) | f(10, a) | variable arguments are “a” |
H() | f(10, ) | variable arguments are “”; syntax error |
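The well-formed rows of the table can be reproduced with definitions along the following lines. The three definitions are an assumption inferred from the Effect column of the table; STRINGIZE, EXPANDED and table_examples are hypothetical names added only to make the replacement text observable.

```c
#include <assert.h>
#include <string.h>

/* Assumed definitions, inferred from the "Effect" column of the table. */
#define F(X, ...) f(10, X, __VA_ARGS__)
#define G(X)      f(10, X)
#define H(...)    f(10, __VA_ARGS__)

/* Two-level stringification: the argument is macro-expanded first and
   only then turned into a string literal. */
#define STRINGIZE(...) #__VA_ARGS__
#define EXPANDED(...)  STRINGIZE(__VA_ARGS__)

static void table_examples(void)
{
    assert(strcmp(EXPANDED(F(a, b, c)), "f(10, a, b, c)") == 0);
    assert(strcmp(EXPANDED(G(a)),       "f(10, a)")       == 0);
    assert(strcmp(EXPANDED(H(a, b, c)), "f(10, a, b, c)") == 0);
    assert(strcmp(EXPANDED(H(a)),       "f(10, a)")       == 0);
    /* F(a) is left out because it is ill-formed under the current rules;
       F(a, ) and H() expand to calls that end in a stray comma. */
}
```

The function f is never declared here, since its name only ever appears inside a string literal.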
There are two problems:
For a macro whose parameter list ends in an ellipsis (...), the invocation must contain at least as many commas as the macro has mandatory parameters. This makes the invocation F(a) invalid.
However, it is quite natural for a macro invocation with variable arguments to degenerate to the case where there are no arguments. In the example, we would like F(a) to be replaced with f(10, a). A more realistic example is a custom diagnostic facility such as the following:
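A minimal sketch of such a facility, using only __VA_ARGS__, might look as follows. The names ERROR, last_error and report_examples are hypothetical and not taken from this paper; the sketch only illustrates why the degenerate case is awkward.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic sink: the last formatted message is kept here. */
static char last_error[128];

/* Hypothetical facility. The comma before __VA_ARGS__ is always part of
   the replacement text, whether or not variable arguments are supplied. */
#define ERROR(fmt, ...) \
    snprintf(last_error, sizeof last_error, "[error] " fmt, __VA_ARGS__)

static void report_examples(void)
{
    /* Works: the variable arguments are "42". */
    ERROR("bad value %d", 42);
    assert(strcmp(last_error, "[error] bad value 42") == 0);

    /* ERROR("out of memory");    ill-formed: no variable arguments     */
    /* ERROR("out of memory", );  trailing comma in the call: a syntax
                                  error after macro replacement         */
}
```

A message without format arguments is the most common diagnostic of all, yet it is exactly the case this macro cannot express.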
The complication arises when we consider H(). We may perhaps wish it to be replaced with f(10). However, we may also wish to have a macro such as ADD_COMMA, which always produces a comma, even when invoked with no arguments. The difference is that we consider H() to have zero arguments, whereas we consider ADD_COMMA() to have one, empty argument.
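A macro of the ADD_COMMA kind can be sketched as follows. Only the name ADD_COMMA comes from the text; the macro body and the surrounding array definitions are assumptions chosen so that the unconditional comma is harmless.

```c
#include <assert.h>

/* Assumed definition: the comma is emitted regardless of the arguments. */
#define ADD_COMMA(...) , __VA_ARGS__

/* A trailing comma is permitted in an initializer list, so both uses compile. */
static const int with_args[]    = { 1 ADD_COMMA(2, 3) };  /* { 1 , 2, 3 } */
static const int without_args[] = { 1 ADD_COMMA() };      /* { 1 , }      */

static void add_comma_examples(void)
{
    assert(sizeof with_args / sizeof with_args[0] == 3);
    assert(sizeof without_args / sizeof without_args[0] == 1);
}
```

Here ADD_COMMA() is a valid invocation today, because the empty parentheses supply one argument that happens to contain no tokens.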
We would like to make the preprocessor more expressive to allow users to write macros for all of the situations described above. This requires two distinct changes, one simple and the other complex.
Goal 1. Allow the omission of the comma before the variable arguments in the invocation (i.e. allow F(a) rather than requiring F(a, )).
Goal 2. Provide a mechanism to express a replacement text that contains the variable arguments but which contains a separating comma only if the variable arguments are non-empty (i.e. allow both f(10, a) and f(10, a, b) as possible replacements of F). At the same time, continue to provide a mechanism that unconditionally contains a comma before the (possibly empty) variable arguments, like ADD_COMMA above.
The behaviour of Goal 1 is already supported by many popular compilers as a non-conforming extension. It is a non-breaking change, since the current syntax F(a) is ill-formed. Goal 2 is much harder to solve, since there is no single simple enhancement of the existing semantics that satisfies all possible use cases.
We will step through a series of possible solutions (inspired by existing vendor extensions) and analyse their shortcomings, before presenting the proposed solution.
This approach does not add any new syntax. It merely solves Goal 1 above by allowing a variadic macro invocation to not contain any variable arguments. However, under this approach, the absence of variable arguments is taken as a request to delete an existing comma immediately preceding the __VA_ARGS__ token:
This is a minimal, unsurprising extension. However, it suffers from the major drawback that it offers no mechanism to delete a trailing comma from a variadic macro with zero mandatory parameters.
A variant of this extension is currently provided by MSVC++ and Embarcadero compilers, which always delete the comma, even in the case of zero mandatory arguments. Another possible extension is to provide those semantics under a new name (e.g. __VA_ARGS_FOO__).
This approach also allows the omission of the variable arguments, and in addition it reuses the concatenation operator ## to control comma deletion explicitly:
This extension is somewhat difficult to explain, but it generally Does What You Want. The complete omission of variable arguments is required for comma deletion (compare F2(a, ) and F2(a)), though omission of the variable arguments alone is not enough to delete the comma (compare F1(a, ) and F1(a)). The case of zero mandatory parameters is special: there, the mere absence of tokens from the variable arguments enables the comma deletion when the ## operator is used.
The downside of this extension is three-fold: 1) Parsing this syntax requires look-ahead, adding cost to the translation. 2) The extension reuses an unrelated piece of syntax, muddling the language. 3) The extension hides its dependency on the presence or absence of the variable arguments and whether the variable arguments contain tokens in subtle and non-explicit ways.
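One widely documented form of this extension is the GNU spelling , ## __VA_ARGS__, accepted by GCC and Clang. The sketch below assumes such a compiler in its default (GNU) mode; FORMAT, message and gnu_comma_examples are hypothetical names, and the code is not standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char message[64];  /* hypothetical output buffer */

/* GNU extension (assumption: GCC or Clang): placing ## between the comma
   and __VA_ARGS__ drops the comma when the variable arguments are omitted. */
#define FORMAT(fmt, ...) \
    snprintf(message, sizeof message, fmt, ## __VA_ARGS__)

static void gnu_comma_examples(void)
{
    FORMAT("value %d", 7);    /* comma kept: the call receives fmt and 7 */
    assert(strcmp(message, "value 7") == 0);

    FORMAT("no arguments");   /* comma deleted: the call receives fmt only */
    assert(strcmp(message, "no arguments") == 0);
}
```

Nothing in the spelling of the macro tells the reader that ## is acting as a comma-deletion request rather than as token concatenation, which is the second downside listed above.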
A rather different approach abandons the use of C99’s __VA_ARGS__ token in favour of something like #define F(X, Args...) or #define F(X, ...Args). GCC has long provided the former (where the replacement text would use Args for the variable arguments, and , ##Args (with mandatory whitespace after the comma!) requests comma deletion). The template-pack-like syntax ...Args does not appear to be used by any preprocessor and may provide a less obstructed extension route (e.g. one could say that x, y, ...Args always has comma deletion semantics).
However, all these approaches seem undesirable. First off, they are a departure, and perhaps even a regression, from the direction taken by C99 and its __VA_ARGS__ token. Second, this design would only satisfy those needs that require comma deletion, leaving use cases like the above ADD_COMMA to use the existing syntax. Thus there would be two parallel but dissimilar constructions living side by side, which seems inelegant and wasteful.
Note: This idea came up during the discussion of N2034 in WG14.
We could use syntax like #define F(X ...) f(10, __VA_ARGS__) to request comma deletion (note the absence of a comma before the ellipsis in the definition). This approach does not degenerate to the case of macros with no named parameters, though. Moreover, WG14 felt that this was too clever and too subtle, whereas the proposed solution below is highly visible and explicit. Also, an unrelated difference between C++ and C is that C++ allows omitting the final comma from the parameter list of a variadic function declaration. While this has nothing to do with the preprocessor, the semantics of the optional comma of that feature are the opposite of this present consideration, which is unnecessarily confusing.
All of the considered extensions so far have in common that they end up creating a parallel set of constructions which are identical to the existing macro facilities except when the macro is invoked with no variable arguments, and they all provide some automatic mechanism to determine when to delete a comma. However, none of them are quite explicit about what they are doing.
For the next idea, we consider adding a new token. Let us call it __VA_ARGS_OPT__, with the semantics that wherever it appears in the replacement text, it is replaced with the variable arguments (just like __VA_ARGS__), but additionally, whenever the variable arguments do contain tokens, a comma is prepended:
In this approach, we have separated Goals 1 and 2 entirely; whether a leading (!) comma is inserted now only depends on whether the variable arguments contain tokens, not on whether they are present at all.
We already said that solutions 1, 2 and 3 are ultimately inelegant, since they create a redundant structure that replicates existing facilities and only differs in subtle details. Solution 4 (a new token) feels cleaner and more orthogonal. In the words of Richard Smith:
“I remain unconvinced that implicitly adding or removing a comma is a good idea. We need the user to tell us which behavior they want.”
We can do a little better than solution 4. Our proposal is to add a new, special kind of functional macro __VA_OPT__. This macro may only be used in the replacement text of a variadic macro:
The semantics are as follows: If the variable arguments contain no tokens, then __VA_OPT__(content) is replaced by no tokens (more precisely, by a placemarker). Otherwise, it is replaced by content, which can contain any admissible replacement text, including __VA_ARGS__.
The canonical use case of __VA_OPT__ is for an optional separator:
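A minimal sketch of the separator idiom, assuming a compiler that already implements __VA_OPT__ (recent GCC and Clang do). LOG, log_line and log_examples are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char log_line[128];  /* hypothetical sink for the formatted text */

/* The comma is emitted only when the variable arguments contain tokens. */
#define LOG(fmt, ...) \
    snprintf(log_line, sizeof log_line, "[log] " fmt __VA_OPT__(,) __VA_ARGS__)

static void log_examples(void)
{
    LOG("x = %d, y = %d", 3, 4);   /* separator present */
    assert(strcmp(log_line, "[log] x = 3, y = 4") == 0);

    LOG("started");                /* variable arguments omitted */
    assert(strcmp(log_line, "[log] started") == 0);

    LOG("stopped", );              /* variable arguments present but empty */
    assert(strcmp(log_line, "[log] stopped") == 0);
}
```

Both the omitted and the empty form produce a call without a trailing comma, because the decision depends only on whether the variable arguments contain tokens.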
However, this mechanism allows other constructions, too:
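As one illustration of a use beyond separators, the operand of __VA_OPT__ can be an entire optional initializer. This is a sketch under the assumption of a compiler with __VA_OPT__ support; DEFINE_POINT, struct point and point_examples are hypothetical names.

```c
#include <assert.h>

struct point { int x, y; };

/* Hypothetical macro: the whole "= { ... }" part is optional,
   not just a comma. */
#define DEFINE_POINT(name, ...) \
    static struct point name __VA_OPT__(= { __VA_ARGS__ })

DEFINE_POINT(origin);        /* static struct point origin;            */
DEFINE_POINT(corner, 3, 4);  /* static struct point corner = { 3, 4 }; */

static void point_examples(void)
{
    /* No initializer was emitted, so static storage is zero-initialized. */
    assert(origin.x == 0 && origin.y == 0);
    assert(corner.x == 3 && corner.y == 4);
}
```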
The proposal is a pure extension of the preprocessor. Syntax that was previously not allowed becomes admissible under the proposed changes.
The proposed extension to allow omission of the variable arguments has been implemented by many compilers. Faisal Vali implemented a version of the proposed __VA_OPT__ extension in Clang.
Change paragraph 6.10.3p4 as follows.
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments (including those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal the number of parameters in the macro definition. Otherwise, there shall be ~~more~~ at least as many arguments in the invocation ~~than~~ as there are parameters in the macro definition (excluding the ...). There shall exist a ) preprocessing token that terminates the invocation.
Change paragraph 6.10.3p5 as follows.
The ~~identifier~~ identifiers __VA_ARGS__ and __VA_OPT__ shall occur only in the replacement-list of a function-like macro that uses the ellipsis notation in the parameters.
Change paragraph 6.10.3p12 as follows.
If there is a ... in the identifier list in the macro definition, then the trailing arguments (if any), including any separating comma preprocessing tokens, are merged to form a single item: the variable arguments. The number of arguments so combined is such that, following merger, the number of arguments is one more than the number of parameters in the macro definition (excluding the ...), except that if there are as many arguments as named parameters, the macro invocation behaves as if a comma token had been appended to the argument list such that variable arguments are formed that contain no pp-tokens.
Insert a new piece of syntax at the beginning of 6.10.3.1.
Syntax
va-opt-replacement:
    __VA_OPT__ ( pp-tokens_opt )
Subsection 6.10.3.1 currently lacks “Constraints” and “Semantics” partitions and is therefore somewhat unclear about the distinction. We propose to add this structure in passing, as well as a description. First, insert a new “Description” part, and a new paragraph, immediately after the above new syntax.
Description
Argument substitution is a process during macro expansion in which identifiers corresponding to the parameters of the macro definition and the special constructs __VA_ARGS__ and __VA_OPT__ are replaced with token sequences from the arguments of the macro invocation and possibly of the argument of the feature __VA_OPT__. The latter process allows to control a substitute token sequence that is only expanded if the argument list that corresponds to a trailing ... of the parameter list is present and has a non-empty substitution.
Next, insert a new “Constraints” part, and two new paragraphs. Please see below for an optional edit suggestion.
Constraints
The identifier __VA_OPT__ shall always occur as part of the preprocessing token sequence va-opt-replacement; its closing ) is determined by skipping intervening pairs of matching left and right parentheses in its pp-tokens. The pp-tokens of a va-opt-replacement shall not contain __VA_OPT__. The pp-tokens shall form a valid replacement list for the current function-like macro.
A va-opt-replacement is treated as if it were a parameter.
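The rule for locating the closing parenthesis can be illustrated with a small sketch, again assuming a compiler that implements __VA_OPT__. TWICE_OR_ZERO, twice and balanced_examples are hypothetical names.

```c
#include <assert.h>

static int twice(int x) { return 2 * x; }

/* The operand of __VA_OPT__ ends at the parenthesis that balances its
   opening one; the two nested pairs inside it are skipped over. */
#define TWICE_OR_ZERO(...) (0 __VA_OPT__(+ twice((__VA_ARGS__))))

static void balanced_examples(void)
{
    assert(TWICE_OR_ZERO() == 0);     /* replaced by (0)               */
    assert(TWICE_OR_ZERO(21) == 42);  /* replaced by (0 + twice((21))) */
}
```

Writing a second __VA_OPT__ inside the operand would violate the constraint above, so the sketch nests only ordinary parentheses.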
Next, add a new part “Semantics” above the existing wording, and modify as follows. Change paragraph 6.10.3.1p1 (now p4) as follows.
Semantics
After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place.
~~A~~ For each parameter in the replacement list~~, unless~~ that is neither preceded by a # or ## preprocessing token ~~or~~ nor followed by a ## preprocessing token ~~(see below), is replaced by~~, the preprocessing tokens naming the parameter are replaced by a token sequence determined as follows:
Drafting note: The above clarifies that __VA_OPT__ expansions are not rescanned prior to rescanning of the replacement list containing the instance of __VA_OPT__.
Append an example after 6.10.3.1p1 (now p4).
Example:
Change paragraph 6.10.3.1p2 (now p5) as follows.
An identifier __VA_ARGS__ that occurs in the replacement list ~~shall be~~ is treated as if it were a parameter, and the variable arguments shall form the preprocessing tokens used to replace it.
Optional suggestion: Now that we have distinct sections for syntax and semantics, we could split this paragraph, and turn the first part into a new paragraph at the end of the “Constraints” section, “An identifier __VA_ARGS__ that occurs in the replacement list is treated as if it were a parameter.”, and change the remaining p4 into “The identifier __VA_ARGS__ is replaced by the tokens of the variable arguments.”
Then append new paragraphs to subsection 6.10.3.1 as follows.
The preprocessing token sequence for the corresponding argument of a va-opt-replacement is defined as follows. If a (hypothetical) substitution of __VA_ARGS__ as neither an operand of # nor ## consists of no preprocessing tokens, the argument consists of a single placemarker preprocessing token (6.10.3.3, 6.10.3.4). Otherwise, the argument consists of the results of the expansion of the contained pp-tokens as the replacement list of the current function-like macro before removal of placemarker tokens, rescanning, and further replacement.
Note: The placemarker tokens are removed before stringization (6.10.3.2), and can be removed by rescanning and further replacement (6.10.3.4).
Example:
Change paragraph 6.10.3.2p2 as follows.
If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument (excluding placemarker tokens). Let the stringizing argument be the preprocessing token sequence for the corresponding argument with placemarker tokens removed. Each occurrence of white space between the stringizing argument’s preprocessing tokens becomes a single space character in the character string literal. White space before the first preprocessing token and after the last preprocessing token comprising the stringizing argument is deleted. Otherwise, the original spelling of each preprocessing token in the stringizing argument is retained in the character string literal, except for special handling for producing the spelling of string literals and character literals: a \ character is inserted before each " and \ character of a character literal or string literal (including the delimiting " characters), except that it is implementation-defined whether a \ character is inserted before the \ character beginning a universal character name. If the replacement that results is not a valid character string literal, the behavior is undefined. The character string literal corresponding to an empty stringizing argument is "". […]
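The white-space and escaping rules in this paragraph can be observed with the ordinary # operator. The sketch below relies only on long-standing stringification behaviour; STR and stringize_examples are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x

static void stringize_examples(void)
{
    /* Runs of white space between tokens collapse to one space;
       leading and trailing white space is dropped. */
    assert(strcmp(STR(   a   +   b   ), "a + b") == 0);

    /* Inside a string literal, each double quote and each backslash
       gains a preceding backslash. */
    assert(strcmp(STR("hi\n"), "\"hi\\n\"") == 0);

    /* An empty argument yields the empty string literal. */
    assert(strcmp(STR(), "") == 0);
}
```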
The entire proposal (rationale, implementation experience and wording) applies almost verbatim to the C++ language as well. Indeed, this paper originated from a C++ proposal, which was eventually adopted (as WG21 P0306R4, amended by WG21 P1042R1) following feedback from WG14 and subsequent editorial improvements.
An earlier version of this proposal (N2034) was presented to WG14 at the 2016 London meeting and received favourably, resulting in an entry in SD3 to solve the same problem in a future revision of the C language. We would like to ask the WG14 liaison to present this updated revision to WG14 for inclusion in the next revision of C.
Many thanks to Dawn Perchik, David Krauss, Hubert S. Tong and Richard Smith for valuable discussion, guidance, suggestions, examples and review, to Aaron Ballman and Jens Gustedt for significant help in adapting the wording for WG14, and to Faisal Vali for implementing the feature and clarifying several important details! Thanks also go to the members of WG14 for their hospitality and a very productive discussion.