2024-08-05
discussion for possible integration into IS ISO/IEC 9899:202y
document number | date | comment |
---|---|---|
n3307 | 202408 | original proposal |
CC BY, see https://creativecommons.org/licenses/by/4.0
In view of the addition of __VA_OPT__
first to
C++ and now to C23, there had been interest in extending the C
preprocessor to include recursion. The basic idea would be that __VA_OPT__
can be
used as a test within a macro such as in the following
Indeed, the __VA_OPT__
clause
only inserts its contents if the variable arguments (corresponding to
the ...
part
of the parameter list) are not empty. So at a first glance we may think
that this should be fine, and the above will replace an invocation AddUp(a, b, c)
by (a + (b + (c)))
.
There are two reasons why this doesn’t work well.
The first reason is simple: the above syntax is already valid but
does something else than we might expect. In fact, in current a
recursive use of a macro never expands but keeps the name of the macro
intact. So the result of the macro as written would be (a + AddUp(b, c))
,
which is not very helpful in this case.
This property for recursive invocation can not be changed easily, because a lot of existing code would become invalid. So to enable real recursion we have to invent a new syntax, for example
the eĿlipsis preprocessor implementation now has a feature that
emulates this behavior by defining a stack of macros that call each
other. It is slightly more complicated than to write the #define
directive
above, but it is easy enough to play around and test it:
https://gustedt.gitlabpages.inria.fr/ellipsis/ellipsis-recursive_8dirs.html
This approach works well up to a limited size of the argument list, but is doomed to fail when you would try to have it scale to long lists, say a couple of hundred. Indeed, the second reason why recursion over the argument list does not work well in C is the following observation
C preprocessor recursion over variadic argument lists has quadratic complexity.
This is due to the fact that the C preprocessor always expands
arguments. If we’d pass a list of n
items into AddUp
the preprocessor sees
these n
items plus n-1
comma tokens, so in total 2n-1
tokens. The overall work that is done on one recursion level is thus
proportional to 2n-1
.
If we add that up over all recursion levels, we have a total work of
1 + 3 + 5 + ... + 2n-1
which is n²
.
As this grows quite rapidly with the size of the argument list, the potential of such a recursion feature is a bit limited.
eĿlipsis has a second feature that completely avoids this, tail recursion
https://gustedt.gitlabpages.inria.fr/ellipsis/extensions.html#tail
It works similar to the above but avoids to expand the __VA_ARGS__
parameter
Here the construct __VA_TAIL__()
stands in for a recursive call of the surrounding macro, only that __VA_ARGS__
is not
expanded but passed through directly to the call. Since the elements of
the list are not touched individually, each recursion level has constant
complexity and the overall complexity of a recursive call is linear.
Also, again because none of the elements of the list is touched
individually, we cannot copy the argument list and therefore a second
__VA_TAIL__()
construct would not work and thus results in an empty list.
In fact, __VA_TAIL__()
also admits another syntax where we place an identifier inside the
parenthesis to name the macro that is to be tail called. In the example
above we could in fact have used __VA_TAIL__(AddUp)
with the same result. Here is a more sophisticated example
#define LEFT(X, ...) X __VA_OPT__( + (__VA_TAIL__(RIGHT)))
#define RIGHT(X, ...) __VA_OPT__((__VA_TAIL__(LEFT)) + ) X
This does indeed shuffle the arguments to the left or to the right according to the parity of their position in the list.
eĿlipsis implements this feature quite efficiently: the tail recursion resolves into an iteration over the argument list:
AddUp
this would be (a + (b + ...
AddOn
this would just consist of closing
parenthesis ...))
.At the end of the iteration these two result lists are then just spliced together to form the overall result.
results in
Here the first part of the definition, X;
, appends a
semicolon to the first argument. The construct __VA_TAIL__(MAKE_SEMI)
then calls the same macro again with the remaining argument list, only
that for doing so, that argument list is not expanded again, but just
passed through.
To simplify writing such a macro and to possibly accelerate its expansion a bit, the name of the macro can be omitted. The construct then refers to the macro where it is found. So an equivalent definition of the above could be:
where the redundant information of the name is omitted from the tail call.
results in the three lines
This is a bit more complicated. First, if this macro is called with
just one argument, the whole part within the __VA_OPT__
construct
is not used; the definition is then as if it were just "(X)"
and
so the first invocation results in "(A)"
.
Now, if there is an additional argument, this is as if it had been
defined (X+__VA_TAIL__(AddUp))
.
So there is prefix "(X+"
before the tail call and a suffix ")"
after
it. The result is then nicely appending the prefixes to the left (three
times with arguments A
, B
and C
)
and stacking the suffixes to the right, here resulting in a sequence of
closing parenthesis.
results in the three lines
The identifier that appears in the __VA_TAIL__
construct needs not to be the same as the name of the macro that is
defined. For example here are two macros LEFT
and RIGHT
that call each other.
#define LEFT(X, ...) X __VA_OPT__( + (__VA_TAIL__(RIGHT)))
#define RIGHT(X, ...) __VA_OPT__((__VA_TAIL__(LEFT)) + ) X
RIGHT(0, 1, 2, 3, 4, 5);
The result is a ping pong pattern where the odd numbers are pushed to the left and the even numbers to the right.
Removals are in stroke-out red, additions in
underlined green.
Add __VA_TAIL__
to the
constraint.
change to
Otherwise, the replacement preprocessing tokens are the preprocessing tokens of the corresponding argument after all macros contained therein have been expanded.
TheUnless the invocation is a recursive tail invocation (see~6.10.5.5′), the argument’s preprocessing tokens are completely macro replaced before being substituted as if they formed the rest of the preprocessing file with no other preprocessing tokens being available.
6.10.5.5′ Recursive tail invocation
Syntax
va-tail-invocation:
__VA_TAIL__
(
identifieropt)
Description
Recursive tail invocation is a process during macro expansion in which
the construct __VA_TAIL__
is
replaced by the resulting token sequence of a special recursive
invocation of a macro (named by the identifier) with arguments that
correspond to the variable arguments of the current invocation. If the
identifier is omitted the recursive tail invocation refers to the
current macro.
Constraints
After argument substitution (including#
,##
and__VA_OPT__
processing) has taken place, the identifier__VA_TAIL__
shall always occur as part of the preprocessing token sequence va-tail- invocation. The macro to which the recursive tail invocation refers shall have at least one named parameter.
Semantics
After the arguments for the invocation of a function-like macro have been identified, argument expansion, argument substitution (including
#
,##
and__VA_OPT__
processing) has taken place, va-tail-invocation recursively processes the variable argument list without expanding named macros in the arguments any further. The resulting token sequence of the va-tail-invocation itself does not undergo rescanning and further replacement (6.10.5.6) and is then inserted in place of the va-tail-invocation.
If the variable arguments are empty, each va-tail-invocation is replaced by a single placemarker preprocessing token, as are additional va-tail-invocations if there is more than one va-tail-invocation.
NOTE Recursive tail invocation provides syntax for
recursive macro invocation that is different from the usual macro
invocation syntax identifier( tokens )
.
By imposing that arguments are not expanded recursively and that at
least the first argument is matched to a named parameter on each
recursion level, tail invocation ensures that the variable arguments are
reduced at each recursion step. Thus, the recursion depth is limited by
the number of tokens in the top-most tail invocation.
EXAMPLE Consider the following macros
An invocation tailer(A, B, C)
is processed as in the following table.
prefix | replacement | suffix | X |
remaining | ||
---|---|---|---|---|---|---|
macro invocation | (__VA_OPT__(+)X __VA_TAIL__()) |
A, B, C |
||||
argument identification | (__VA_OPT__(+)X __VA_TAIL__()) |
A |
B, C |
|||
argument expansion | (__VA_OPT__(+)X __VA_TAIL__()) |
35 |
B, 7 |
|||
argument substitution (incl. # , ## , __VA_OPT__ ) |
(+35 __VA_TAIL__()) |
B, 7 |
||||
tail identification | (+35 |
__VA_TAIL__() |
) |
B, 7 |
||
tail invocation | (+35 |
(__VA_OPT__(+)X __VA_TAIL__()) |
) |
B, 7 |
||
argument identification | (+35 |
(__VA_OPT__(+)X __VA_TAIL__()) |
) |
B |
7 |
|
argument substitution (incl. # , ## , __VA_OPT__ ) |
(+35 |
(+B __VA_TAIL__()) |
) |
7 |
||
tail identification | (+35(+B |
__VA_TAIL__() |
)) |
7 |
||
tail invocation | (+35(+B |
(__VA_OPT__(+)X __VA_TAIL__()) |
)) |
7 |
||
argument identification | (+35(+B |
(__VA_OPT__(+)X __VA_TAIL__()) |
)) |
7 |
||
argument substitution (incl. # , ## , __VA_OPT__ ) |
(+35(+B |
(7 __VA_TAIL__()) |
)) |
|||
tail identification | (+35(+B(7 |
__VA_TAIL__() |
))) |
|||
end of tail invocation | (+35(+B(7))) |
|||||
rescanning | (+35(+B(7))) |
|||||
result | (+35(+3*(7))) |
Note that the steps of “argument expansion” and “rescanning” appear only for the start and end of the usual macro invocation. Each recursion level of the tail recursion consist of four steps each: “tail identification”, “tail invocation”, “argument identification”, and “argument substitution”.
Modify as follows
After all parameters in the replacement list have been substituted,
#
and##
processing has taken place, and recursive tail invocation has been performed, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
There is a branch “tail” on the WG14 git repository which interested parties may consult to see how the above proposed changes would play out in place.