0xEE+23 0x7E+macro 0x100E+value-macro
are preprocessing numbers, and as such a conforming C compiler would be required to generate an error when it failed to convert them to actual C language number tokens. The solution is simply to restrict the inclusion of [eE][+-] within a pp-number to situations where the e or E is the first non-digit in the character sequence composing the preprocessing number. This can easily be implemented in a variety of ways; the informal description above gives perhaps a better guide to efficient implementation than the following revised grammar:

It is unbelievable that a standards committee could so lose sight of its objective that it would, in full awareness, make simple expressions illegal.
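To see the effect concretely, here is a minimal sketch of the kind of simple expression that the pp-number grammar makes illegal; the variable names are illustrative only:

    int y = 0xE + 2;   /* fine: three tokens "0xE", "+", "2"; y == 16       */
    int x = 0xE+2;     /* ill-formed: "0xE+2" is scanned as one pp-number,  */
                       /* which cannot be converted to any valid C token    */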
To illustrate the absurdity of the rationale document's claim that the faulty grammar was felt to be easier to implement, why not adopt the following grammar for a pp-number and really make life simple; after all, who wants to have their preprocessor slowed down by checking whether the + or - was preceded by an e or an E?
pp-number:
    digit
    .
    pp-number digit
    pp-number .
    pp-number non-digit
    pp-number sign
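Under such a grammar, of course, ordinary arithmetic would be swallowed whole. A sketch of the consequence (the names are illustrative only):

    y = x-1;   /* still fine: begins with a non-digit, so "x" is an identifier */
    y = 1-x;   /* "1-x" would now be a single preprocessing number, via the    */
               /* productions "pp-number sign" and "pp-number non-digit"       */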
Response
The Committee reasserts that the grammar and/or semantics of preprocessing as they appear in the standard are as intended. We are attaching a copy of the previous responses to this item from David F. Prosser. The Committee endorses the substance of these responses, which follow:

In response to your first suggested grammar: This grammar doesn't include all valid numeric constants, nor does it exclude other important tokens. For example, . by itself is derivable. But let's assume that you intended something like
pp-number:
    pp-float
    digit
    pp-number digit
    pp-number non-digit

pp-float:
    pp-real
    pp-float E sign
    pp-float e sign
    pp-float digit
    pp-float .
    pp-float non-digit

pp-real:
    digit
    . digit
    pp-real digit
    pp-real .
This grammar is certainly more complicated than the one-level construction in the C Standard, and consequently harder to understand. That's a strike against it. Another strike is that, while it does mimic the two major numeric categories, it still doesn't include all sequences covered by the existing grammar, save those that would otherwise be valid by the stricter tokenization rules. For example, 0b0101e+17 might be someone's future notion of a binary floating constant. Finally, it suffers from a great many reduce/reduce conflicts, making the specification less likely to be understood and implemented as intended.

In response to your second suggested grammar: This could have been done. But the Committee chose a compromise at a different point, one that restricts the inappropriate gobbling of characters to + and - immediately after E or e. This was all that was necessary to cover all valid numeric constants in as simple a grammar as was possible.

For more background, you'd need to know the state of the proposed standard a few years before this grammar was voted in. The Committee had stated its intent that ``garbage'' character sequences that began like a numeric constant were to be tokenized as a single sequence. This was to prevent situations in which this ``garbage'' would be turned into valid C code through obscure macro replacements, among more minor reasons. This was, unfortunately, very poorly stated in the draft. As I recall, it was placed in the constraints for subclause 6.1. It was something like ``Each pair of adjacent tokens that are both keywords, identifiers, and/or constants must be separated by white space.'' [As ``improved'' for the May 1, 1986 draft proposed standard, subclause 6.1 Constraints consisted of the single sentence: ``Each keyword, identifier, or constant shall be separated by some white space from any otherwise adjacent keyword, identifier, or constant.''] As you can see, this constraint neither presented the intent of the Committee nor caused implementations to behave in any sort of consistent manner with respect to tokenization.
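A sketch of the kind of rescue through macro replacement that the Committee wanted to rule out; the macro name Ex is hypothetical:

    #define Ex +3
    int n = 1Ex;   /* "1Ex" is a single pp-number: never split, never macro-  */
                   /* replaced, so a diagnostic is required                   */
    /* Were "1Ex" instead tokenized as "1" followed by the identifier "Ex",   */
    /* macro replacement would quietly turn it into the valid "int n = 1 +3;" */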
Finally, a letter writer understood the issue well enough to suggest a grammar along the lines of the current subclause 6.1.8. It, contrary to your opening remarks on this topic, is not a ``loose description,'' and it finally stated in a precise way the intent of the tokenization rules. The benefits of this construction were that all tokenization for all implementations would now be the same, no ``garbage'' character sequences would be able to be converted to valid C code, skipped blocks of code could silently be scanned without generating needless tokenization errors, the preprocessing tokenization of numeric tokens would be greatly simplified, and room for future expansion of C's numeric tokens was reserved. That's a lot of good.

The down side was that certain sequences now would require some white space to cause them to be tokenized as the programmer intended. As noted in the rationale document, there are other parts of C in which white space is required for tokenization to be controlled, and this was found to be one more. Since the ``mistokenization'' of such sequences must result in some diagnostic noise from the compiler, and since the fix is so mild, the Committee agreed that the proposed standard is still much better with this grammar than with any of the other suggestions.

Personally, I think that the biggest surprise ``win'' was the reservation of future numeric token ``name space.'' I would not be at all surprised to find binary constants (that begin with 0b) in newer C implementations.
Question 2
Subclause 6.8.3: Macro substitutions, tokenization, and white space

In general I think it is a good guiding principle that a C implementation should be able to be based around completely disjoint preprocessing and lexical scanning passes of the compiler. As such the rules on tokenizing need to be emphasized with the following paragraph (possibly placed after paragraph 1 of subclause 6.8.3.1):
    All macro substitutions and expanded macro argument substitutions will result in an additional space token being inserted before and after the replacement token sequence where such a space token is not already present and there is a corresponding preceding or subsequent token in the target token sequence.

Naturally such a step can be treated as purely conceptual by a tokenized implementation with combined preprocessing and lexical analysis, except for the purposes of argument stringizing, where the added spacing may be essential for unambiguous identification of the preprocessing tokens involved. Such a statement is not a substantive change, as it merely clarifies the tokenization rules; and given that Standard C has already changed the definition of the preprocessor substantially from K&R (re macro argument expansion before substitution), such an additional explicit change from K&R C should cause comparatively little difficulty, except to those who had not appreciated just how different the preprocessing rules already are. Examples which are clarified by this change are:
The last token of every macro argument has no subsequent token at the time of its initial macro argument expansion, and similarly a macro parameter that is the last token of a replacement token list has no subsequent token at the time of that parameter's substitution. Similarly for first tokens and preceding tokens.
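Since argument stringizing is where the conceptual spacing becomes observable, a short sketch of that behavior follows; the macro name STR is illustrative only:

    #define STR(x) #x
    STR( a + b )   /* yields "a + b": white space between the argument's */
                   /* tokens is retained as a single space               */
    STR(a+b)       /* yields "a+b": no separating space to preserve      */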
Question 3

    The number of arguments in an invocation of a function-like macro shall agree with the number of parameters in the macro definition, ...

...or is this an undefined, implementation-dependent program (subclause 6.8.3, Semantics, paragraph 5):
    If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.

In connection with the above I would request that the Committee make a much stronger statement as to whether empty arguments are to be treated as valid arguments or are to be treated as errors. They can have their uses, but if that is recognized then it should be standardized; if not, it should not be allowed.

Response
These empty arguments all have ``shadows'' that cause the sentence you reference in subclause 6.8.3 (page 90, lines 4-5) to be clearly in effect.
The only uncertain case is one in which an empty argument in an
invocation of a one-parameter function-like macro mimics a ``no
arguments'' invocation. (It should also be noted that the definition of
a macro argument from clause 3 does not preclude an empty sequence.)
Thus the standard is clear that the behavior is undefined in the example
from your request. If an implementation does not choose to allow empty
arguments, a diagnostic will probably be emitted; otherwise, the
invocation will most likely be replaced by a preprocessing token
sequence in which the parameter is replaced with no tokens. But because
the standard does not define this, other than as a common extension,
there are no guarantees.
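A sketch of the two situations discussed above; the macro names are illustrative only:

    #define PAIR(a, b) (a + b)
    PAIR(, 2)    /* the empty first argument has a ``shadow'' (the comma),   */
                 /* so this is the undefined case the standard describes     */
    #define SOLO(a) (a)
    SOLO()       /* the uncertain case: one empty argument, or an invocation */
                 /* with no arguments at all?                                */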
Question 4
Subclause 6.8.3: Preprocessor directives within actual macro arguments

It is a guiding principle that a macro function and an actual function should be invokable in as similar a fashion as possible. In the latter case, it is not uncommon to find code with variations of arguments subject to conditional compilation. This should also compile correctly if an appropriate macro definition is made for the function.
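A sketch of the construct in question; the function name f and the macro WIDE are illustrative only. This is valid when f is a true function, but if f is replaced by a function-like macro, the directives fall inside the macro's argument list:

    extern int f(int, int);

    int r = f(1,
    #ifdef WIDE
              64
    #else
              32
    #endif
             );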
While conditional compilation within function arguments is not necessarily a programming style that I would condone, I feel that it is in the interests of the C programming community at large for such constructs to be well formed, even if forbidden, and as such I make the following requests:

I would like the Committee to change subclause 6.8.3 to state that #if, #ifdef, #ifndef, #else, #elif, and #endif preprocessing directives are allowed within actual macro arguments (not necessarily cleanly nested). Conversely, I would like #define and #undef to be formally forbidden within macro invocations, as these can result in effects that are dependent on the particular implementation of the macro expansions.

Response