______________________________________________________________________

  2   Lexical conventions                                          [lex]

  ______________________________________________________________________

1 The text of the program is kept in units called source files  in  this
  International  Standard.   A source file together with all the headers
  (_lib.headers_) and source files included (_cpp.include_) via the pre-
  processing directive #include, less any source lines skipped by any of
  the conditional inclusion (_cpp.cond_)  preprocessing  directives,  is
  called  a  translation  unit.   [Note:  a C++  program need not all be
  translated at the same time.  ]

2 [Note: previously translated translation units and instantiation units
  can  be preserved individually or in libraries.  The separate transla-
  tion units of a program communicate (_basic.link_)  by  (for  example)
  calls  to functions whose identifiers have external linkage, manipula-
  tion of objects whose identifiers have external linkage, or  manipula-
  tion  of  data  files.  Translation units can be separately translated
  and then later linked to produce an executable program
  (_basic.link_).  ]

  2.1  Phases of translation                                [lex.phases]

1 The  precedence  among the syntax rules of translation is specified by
  the following phases.1)

    1 Physical  source file characters are mapped, in an implementation-
      defined manner, to the basic  source  character  set  (introducing
      new-line  characters  for  end-of-line  indicators)  if necessary.
      Trigraph sequences (_lex.trigraph_) are replaced by  corresponding
      single-character  internal representations.  Any source file char-
      acter not in the basic source  character  set  (_lex.charset_)  is
      replaced  by  the  universal-character-name  that  designates that
      character.  (An implementation may use any internal  encoding,  so
      long  as  an  actual  extended character encountered in the source
      file, and the same extended character expressed in the source file
      as  a  universal-character-name  (i.e. using the \uXXXX notation),
      are handled equivalently.)

    2 Each instance of a new-line character and an immediately preceding
      backslash  character is deleted, splicing physical source lines to
      form logical source lines.  If, as a result, a character  sequence
      that matches the syntax of a universal-character-name is produced,
      the behavior is undefined.  If a source file  that  is  not  empty
      does  not end in a new-line character, or ends in a new-line char-
      acter immediately preceded by a backslash character, the  behavior
      is undefined.
  _________________________
  1) Implementations must behave as if these separate phases occur,  al-
  though in practice different phases might be folded together.

    3 The   source   file   is   decomposed  into  preprocessing  tokens
      (_lex.pptoken_) and sequences of white-space characters (including
      comments).  A source file shall not end in a partial preprocessing
      token or partial comment2).  Each comment is replaced by one space
      character.    New-line  characters  are  retained.   Whether  each
      nonempty sequence of white-space characters other than new-line is
      retained  or  replaced  by  one space character is implementation-
      defined.  The process of dividing a source file's characters  into
      preprocessing tokens is context-dependent.  [Example: see the han-
      dling of < within a #include preprocessing directive.  ]

    4 Preprocessing directives are executed and  macro  invocations  are
      expanded.   If  a  character sequence that matches the syntax of a
      universal-character-name  is  produced  by   token   concatenation
      (_cpp.concat_), the behavior is undefined.  A #include preprocess-
      ing directive causes the named header or source file  to  be  pro-
      cessed from phase 1 through phase 4, recursively.

    5 Each  source  character set member, escape sequence, or universal-
      character-name in character literals and string literals  is  con-
      verted  to  a  member  of the execution character set (_lex.ccon_,
      _lex.string_).

    6 Adjacent ordinary string literal tokens are  concatenated.   Adja-
      cent wide string literal tokens are concatenated.

    7 White-space  characters  separating  tokens are no longer signifi-
      cant.   Each  preprocessing  token  is  converted  into  a   token
      (_lex.token_).   The resulting tokens are syntactically and seman-
      tically analyzed and translated.  [Note: Source files, translation
      units  and  translated  translation  units need not necessarily be
      stored as files, nor need there be any  one-to-one  correspondence
      between  these  entities  and  any  external  representation.  The
      description is conceptual only, and does not specify any  particu-
      lar implementation.  ]

    8 Translated  translation units and instantiation units are combined
      as follows: [Note: some or all of these may  be  supplied  from  a
      library.   ]  Each translated translation unit is examined to pro-
      duce a list of required instantiations.  [Note: this  may  include
      instantiations    which    have    been    explicitly    requested
      (_temp.explicit_).  ] The definitions of  the  required  templates
      are  located.   It is implementation-defined whether the source of
      the translation units containing these definitions is required  to
      be  available.   [Note:  an implementation could encode sufficient
      information into the translated translation unit so as  to  ensure
      the  source  is  not required here.  ] All the required instantia-
      tions are performed to produce instantiation units.  [Note:  these
      are similar to translated translation units, but contain no refer-
      ences to uninstantiated templates and no template definitions.   ]
      The program is ill-formed if any instantiation fails.
  _________________________
  2) A partial preprocessing token would arise from a source file ending
  in the first portion of a multi-character token that requires a termi-
  nating sequence of characters, such as a header-name that  is  missing
  the  closing " or >.  A partial comment would arise from a source file
  ending with an unclosed /* comment.

    9 All external object and function references are resolved.  Library
      components are linked to satisfy external references to  functions
      and  objects  not  defined  in  the current translation.  All such
      translator output is collected into a program image which contains
      information needed for execution in its execution environment.
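
  As an informal illustration of phases 2 and 6 only, the sketch  below
  models  line splicing on a string held in memory and then shows string
  literal concatenation as a program observes it.  The names splice_lines
  and  concatenated_length  are hypothetical; nothing here prescribes how
  a translator is organized.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical model of phase 2: delete each backslash that is
// immediately followed by a new-line, together with that new-line.
std::string splice_lines(const std::string& physical) {
    std::string logical;
    for (std::size_t i = 0; i < physical.size(); ++i) {
        if (physical[i] == '\\' && i + 1 < physical.size() &&
            physical[i + 1] == '\n') {
            ++i;  // skip the new-line as well
            continue;
        }
        logical += physical[i];
    }
    return logical;
}

// Phase 6 seen from inside a program: the two adjacent ordinary string
// literals form one literal of four characters plus the terminator.
std::size_t concatenated_length() {
    return std::strlen("ab" "cd");
}

// Usage: two physical lines become one logical line.
void check_phases() {
    assert(splice_lines("ab\\\ncd\n") == "abcd\n");
    assert(concatenated_length() == 4);
}
```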

  2.2  Character sets                                      [lex.charset]

1 The  basic  source  character set consists of 96 characters: the space
  character, the control characters representing horizontal tab,  verti-
  cal  tab,  form  feed,  and  new-line, plus the following 91 graphical
  characters:3)
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  0 1 2 3 4 5 6 7 8 9
  _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

2 The universal-character-name construct provides a way  to  name  other
  characters.
  hex-quad:
          hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

  universal-character-name:
          \u hex-quad
          \U hex-quad hex-quad
  The character designated by the universal-character-name \UNNNNNNNN is
  that  character  whose  character  short  name  in  ISO/IEC  10646  is
  NNNNNNNN;  the  character  designated  by the universal-character-name
  \uNNNN is that character whose character short name in  ISO/IEC  10646
  is  0000NNNN.  If the hexadecimal value for a universal character name
  is less than 0x20 or in the range 0x7F-0x9F  (inclusive),  or  if  the
  universal  character  name  designates a character in the basic source
  character set, then the program is ill-formed.
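
  The restriction in the last sentence can be sketched as  a  predicate.
  This is an illustration only: the name ucn_value_permitted is hypothet-
  ical, and the sketch assumes that the basic source  character  set  is
  encoded  as  in  ASCII,  which  this  clause  leaves   implementation-
  defined.

```cpp
#include <cassert>

// Returns true if the hexadecimal value of a universal-character-name
// passes the restriction stated above.
// Assumption: basic source characters have their ASCII values.
bool ucn_value_permitted(unsigned long value) {
    if (value < 0x20) return false;                    // below 0x20
    if (value >= 0x7F && value <= 0x9F) return false;  // 0x7F to 0x9F
    if (value < 0x7F) {
        // Candidates left are 0x20 to 0x7E. Under the ASCII assumption
        // only '$', '@' and '`' lie outside the basic source set.
        return value == 0x24 || value == 0x40 || value == 0x60;
    }
    return true;
}

// Usage: an accented letter is fine, a basic source letter is not.
void check_ucn_values() {
    assert(ucn_value_permitted(0x00E9));
    assert(!ucn_value_permitted(0x0041));
}
```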

  _________________________
  3) The glyphs for the members of the basic source  character  set  are
  intended to identify characters from the subset of ISO/IEC 10646 which
  corresponds to the ASCII character set.  However, because the  mapping
  from  source file characters to the source character set (described in
  translation phase 1) is specified as implementation-defined, an imple-
  mentation  is required to document how the basic source characters are
  represented in source files.

3 The basic execution character set and the basic execution wide-charac-
  ter set shall each contain all the members of the basic source charac-
  ter set, plus control characters representing  alert,  backspace,  and
  carriage  return, plus a null character (respectively, null wide char-
  acter), whose representation has all zero bits.  For each basic execu-
  tion  character  set,  the values of the members shall be non-negative
  and distinct from one another.  The execution character  set  and  the
  execution  wide-character  set  are  supersets  of the basic execution
  character set and the  basic  execution  wide-character  set,  respec-
  tively.  The values of the members of the execution character sets are
  implementation-defined, and any  additional  members  are  locale-spe-
  cific.

  2.3  Trigraph sequences                                 [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the
  following sequences of  three  characters  ("trigraph  sequences")  is
  replaced by the single character indicated in Table 1.

                       Table 1--trigraph sequences

  +-----------------------+------------------------+------------------------+
  |trigraph   replacement | trigraph   replacement | trigraph   replacement |
  +-----------------------+------------------------+------------------------+
  |  ??=           #      |   ??(           [      |   ??<           {      |
  +-----------------------+------------------------+------------------------+
  |  ??/           \      |   ??)           ]      |   ??>           }      |
  +-----------------------+------------------------+------------------------+
  |  ??'           ^      |   ??!           |      |   ??-           ~      |
  +-----------------------+------------------------+------------------------+

2 [Example:
  ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
  becomes
  #define arraycheck(a,b) a[b] || b[a]
   --end example]

3 No other trigraph sequence exists.  Each ?  that does not begin one of
  the trigraphs listed above is not changed.
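
  The replacement described in Table 1 can be modelled on a string  held
  in  memory.   The sketch below is illustrative only; the function names
  are hypothetical.  The test input is spelled with the \? escape so that
  the  source  text of the sketch itself contains no trigraph sequences.

```cpp
#include <cassert>
#include <string>

// Returns the replacement for the trigraph whose third character is c,
// or 0 if Table 1 lists no such trigraph.
char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return 0;
    }
}

// Replaces every trigraph in the input; any other question mark is
// copied unchanged.
std::string replace_trigraphs(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '?' && i + 2 < in.size() && in[i + 1] == '?') {
            const char r = trigraph_replacement(in[i + 2]);
            if (r != 0) { out += r; i += 3; continue; }
        }
        out += in[i++];
    }
    return out;
}

// Usage: the arraycheck example from the text.
void check_trigraphs() {
    assert(replace_trigraphs(
               "\?\?=define arraycheck(a,b) a\?\?(b\?\?) \?\?!\?\?! b\?\?(a\?\?)")
           == "#define arraycheck(a,b) a[b] || b[a]");
}
```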

  2.4  Preprocessing tokens                                [lex.pptoken]
  preprocessing-token:
          header-name
          identifier
          pp-number
          character-literal
          string-literal
          preprocessing-op-or-punc
          each non-white-space character that cannot be one of the above

1 Each preprocessing token that is converted to  a  token  (_lex.token_)
  shall have the lexical form of a keyword, an identifier, a literal, an
  operator, or a punctuator.

2 A preprocessing token is the minimal lexical element of  the  language
  in  translation  phases  3 through 6.  The categories of preprocessing
  token are: header names, identifiers, preprocessing numbers, character
  literals,  string  literals, preprocessing-op-or-punc, and single non-
  white-space characters that do not lexically match the  other  prepro-
  cessing  token  categories.   If a ' or a " character matches the last
  category, the behavior is undefined.  Preprocessing tokens can be sep-
  arated  by  white space; this consists of comments (_lex.comment_), or
  white-space characters (space, horizontal tab, new-line, vertical tab,
  and  form-feed),  or  both.   As described in clause _cpp_, in certain
  circumstances during translation phase 4, white space (or the  absence
  thereof)  serves  as  more than preprocessing token separation.  White
  space can appear within a preprocessing token only as part of a header
  name  or  between  the  quotation characters in a character literal or
  string literal.

3 If the input stream has been parsed into preprocessing tokens up to  a
  given  character, the next preprocessing token is the longest sequence
  of characters that could constitute a  preprocessing  token,  even  if
  that would cause further lexical analysis to fail.

4 [Example: The program fragment 1Ex is parsed as a preprocessing number
  token (one that is not a valid floating  or  integer  literal  token),
  even though a parse as the pair of preprocessing tokens 1 and Ex might
  produce a valid expression (for example, if Ex were a macro defined as
  +1).  Similarly, the program fragment 1E1 is parsed as a preprocessing
  number (one that is a valid floating literal token), whether or not  E
  is a macro name.  ]

5 [Example:  The  program  fragment  x+++++y  is  parsed as x ++ ++ + y,
  which, if x and y are of built-in  types,  violates  a  constraint  on
  increment  operators,  even though the parse x ++ + ++ y might yield a
  correct expression.  ]
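
  The longest-match rule can also be observed in  code  that  is  well-
  formed.  The following sketch uses hypothetical function names and as-
  sumes that the two arguments of spaced_parse refer to  distinct  ob-
  jects.

```cpp
#include <cassert>

// a+++b is split by the longest-match rule into a ++ + b, that is,
// (a++) + b. The sum uses the old value of a; a is then incremented.
int post_increment_then_add(int& a, int b) {
    return a+++b;
}

// Explicit white space selects the parse x ++ + ++ y, which the rule
// does not find on its own. Without the spaces, x+++++y is rejected
// when x and y are ints.
int spaced_parse(int& x, int& y) {
    return x++ + ++y;
}

// Usage.
void check_longest_match() {
    int a = 1;
    assert(post_increment_then_add(a, 2) == 3);
    assert(a == 2);
}
```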

  2.5  Alternative tokens                                  [lex.digraph]

1 Alternative token representations are provided for some operators  and
  punctuators4).

2 In all respects of the language, each alternative  token  behaves  the
  same,  respectively,  as its primary token, except for its spelling5).
  The set of alternative tokens is defined in Table 2.
  _________________________
  4)  These  include "digraphs" and additional reserved words.  The term
  "digraph" (token consisting of two characters) is  not  perfectly  de-
  scriptive,  since  one of the alternative preprocessing-tokens is %:%:
  and of course several primary tokens contain two characters.  Nonethe-
  less, those alternative tokens that aren't lexical keywords are collo-
  quially known as "digraphs".
  5)  Thus the "stringized" values (_cpp.stringize_) of [ and <: will be
  different, maintaining the source spelling, but the tokens can  other-
  wise be freely interchanged.

                       Table 2--alternative tokens

  +----------------------+-----------------------+-----------------------+
  |alternative   primary | alternative   primary | alternative   primary |
  +----------------------+-----------------------+-----------------------+
  |    <%           {    |     and         &&    |   and_eq        &=    |
  +----------------------+-----------------------+-----------------------+
  |    %>           }    |    bitor         |    |    or_eq        |=    |
  +----------------------+-----------------------+-----------------------+
  |    <:           [    |     or          ||    |   xor_eq        ^=    |
  +----------------------+-----------------------+-----------------------+
  |    :>           ]    |     xor          ^    |     not          !    |
  +----------------------+-----------------------+-----------------------+
  |    %:           #    |    compl         ~    |   not_eq        !=    |
  +----------------------+-----------------------+-----------------------+
  |   %:%:         ##    |   bitand         &    |                       |
  +----------------------+-----------------------+-----------------------+
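
  As an illustration of Table 2, the sketch below writes the same  func-
  tion  once  with primary tokens and once with alternative tokens.  The
  function names are hypothetical, and the sketch assumes an implementa-
  tion that accepts the alternative tokens without any additional header
  or option.

```cpp
#include <cassert>

// Written with primary tokens.
bool in_range_primary(const int (&bounds)[2], int v) {
    return !(v < bounds[0]) && !(v > bounds[1]);
}

// The same function written with alternative tokens only.
bool in_range_alternative(const int (&bounds)<:2:>, int v) <%
    return not (v < bounds<:0:>) and not (v > bounds<:1:>);
%>

// Alternative spellings of the bitwise operators.
unsigned low_byte(unsigned v) <% return v bitand 0xFFu; %>
unsigned flip_low_bit(unsigned v) <% return v xor 1u; %>

// Usage: both spellings behave the same.
void check_alternative_tokens() {
    const int bounds[2] = {1, 5};
    assert(in_range_primary(bounds, 3) == in_range_alternative(bounds, 3));
    assert(in_range_primary(bounds, 6) == in_range_alternative(bounds, 6));
}
```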

  2.6  Tokens                                                [lex.token]
  token:
          identifier
          keyword
          literal
          operator
          punctuator

1 There are five kinds of  tokens:  identifiers,  keywords,  literals,6)
  operators,  and  other  separators.   Blanks,  horizontal and vertical
  tabs, newlines, formfeeds, and comments (collectively, "white space"),
  as  described  below,  are  ignored  except  as they serve to separate
  tokens.  [Note: Some white space is  required  to  separate  otherwise
  adjacent  identifiers,  keywords,  numeric  literals,  and alternative
  tokens containing alphabetic characters.  ]

  2.7  Comments                                            [lex.comment]

1 The characters /* start a comment, which terminates with  the  charac-
  ters  */.  These comments do not nest.  The characters // start a com-
  ment, which terminates with the next new-line character.  If there  is
  a form-feed or a vertical-tab character in such a comment, only white-
  space characters shall appear between it and the new-line that  termi-
  nates  the  comment;  no  diagnostic  is required.  [Note: The comment
  characters //, /*, and */ have no special meaning within a //  comment
  and  are  treated  just like other characters.  Similarly, the comment
  characters // and /* have no special meaning within a /* comment.  ]
  _________________________
  wise be freely interchanged.
  6) Literals include strings and character and numeric literals.

  2.8  Header names                                         [lex.header]
  header-name:
          <h-char-sequence>
          "q-char-sequence"
  h-char-sequence:
          h-char
          h-char-sequence h-char
  h-char:
          any member of the source character set except
                  new-line and >
  q-char-sequence:
          q-char
          q-char-sequence q-char
  q-char:
          any member of the source character set except
                  new-line and "

1 Header name preprocessing tokens shall only appear within  a  #include
  preprocessing  directive (_cpp.include_).  The sequences in both forms
  of header-names are mapped  in  an  implementation-defined  manner  to
  headers   or   to   external   source   file  names  as  specified  in
  _cpp.include_.

2 If either of the characters  '  or  \,  or  either  of  the  character
  sequences  /* or // appears in a q-char-sequence or a h-char-sequence,
  or the character " appears  in  a  h-char-sequence,  the  behavior  is
  undefined.7)

  2.9  Preprocessing numbers                              [lex.ppnumber]
  pp-number:
          digit
          . digit
          pp-number digit
          pp-number nondigit
          pp-number e sign
          pp-number E sign
          pp-number .

1 Preprocessing number tokens lexically  include  all  integral  literal
  tokens (_lex.icon_) and all floating literal tokens (_lex.fcon_).

2 A  preprocessing  number  does not have a type or a value; it acquires
  both after a successful conversion (as part of  translation  phase  7,
  _lex.phases_)  to  an  integral  literal  token  or a floating literal
  token.
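
  The pp-number grammar above accepts more spellings than the  integral
  and  floating  literal  grammars  do.  The following recognizer is a
  sketch of that grammar under two stated  simplifications:  the  name
  is_pp_number  is hypothetical, and a nondigit is taken to be a letter
  or underscore only (universal-character-names are not handled).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the whole string matches the pp-number grammar.
bool is_pp_number(const std::string& s) {
    std::size_t i = 0;
    if (i < s.size() && s[i] == '.') ++i;  // optional leading period
    if (i >= s.size() ||
        !std::isdigit(static_cast<unsigned char>(s[i]))) {
        return false;                      // a digit must come next
    }
    for (++i; i < s.size(); ++i) {
        const char c = s[i];
        if (c == '+' || c == '-') {
            // A sign may only directly follow e or E.
            if (s[i - 1] != 'e' && s[i - 1] != 'E') return false;
        } else if (!std::isalnum(static_cast<unsigned char>(c)) &&
                   c != '_' && c != '.') {
            return false;
        }
    }
    return true;
}

// Usage: 1Ex is one preprocessing number, 1+2 is not.
void check_pp_numbers() {
    assert(is_pp_number("1Ex"));
    assert(!is_pp_number("1+2"));
}
```

  Under this sketch, 12.3.4 is also accepted, although it  can  never  be
  converted to a valid literal token in phase 7.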

  _________________________
  7)  Thus, sequences of characters that resemble escape sequences cause
  undefined behavior.

  2.10  Identifiers                                           [lex.name]
  identifier:
          nondigit
          identifier nondigit
          identifier digit
  nondigit: one of
          universal-character-name
          _ a b c d e f g h i j k l m
            n o p q r s t u v w x y z
            A B C D E F G H I J K L M
            N O P Q R S T U V W X Y Z
  digit: one of
          0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters  and  digits.
  Each universal-character-name in an identifier shall designate a char-
  acter whose encoding in ISO 10646 falls into one of the ranges  speci-
  fied  in  Annex _extendid_.  Upper- and lower-case letters are differ-
  ent.  All characters are significant.8)

2 In addition, some identifiers are reserved for use by C++  implementa-
  tions  and  standard  libraries  (_lib.global.names_) and shall not be
  used otherwise; no diagnostic is required.

  _________________________
  8) On systems in which linkers cannot accept extended  characters,  an
  encoding  of the universal-character-name may be used in forming valid
  external identifiers.  For example, some otherwise unused character or
  sequence  of  characters  may be used to encode the \u in a universal-
  character-name.  Extended characters may produce a long external iden-
  tifier,  but  C++  does  not  place a translation limit on significant
  characters for external identifiers.  In C++,  upper-  and  lower-case
  letters are considered different for all identifiers, including exter-
  nal identifiers.

  2.11  Keywords                                               [lex.key]

1 The identifiers shown in Table 3 are  reserved  for  use  as  keywords
  (that is, they are unconditionally treated as keywords in phase 7):

                            Table 3--keywords

  +----------------------------------------------------------------------+
  |asm          do             if                 return        typedef  |
  |auto         double         inline             short         typeid   |
  |bool         dynamic_cast   int                signed        typename |
  |break        else           long               sizeof        union    |
  |case         enum           mutable            static        unsigned |
  |catch        explicit       namespace          static_cast   using    |
  |char         export         new                struct        virtual  |
  |class        extern         operator           switch        void     |
  |const        false          private            template      volatile |
  |const_cast   float          protected          this          wchar_t  |
  |continue     for            public             throw         while    |
  |default      friend         register           true                   |
  |delete       goto           reinterpret_cast   try                    |
  +----------------------------------------------------------------------+

2 Furthermore, the alternative representations shown in Table 4 for cer-
  tain operators and punctuators (_lex.digraph_) are reserved and  shall
  not be used otherwise:

                   Table 4--alternative representations

            +------------------------------------------------+
            |and      and_eq   bitand   bitor   compl    not |
            |not_eq   or       or_eq    xor     xor_eq       |
            +------------------------------------------------+

  2.12  Operators and punctuators                        [lex.operators]

1 The  lexical  representation of C++ programs includes a number of pre-
  processing tokens which are used in the syntax of the preprocessor  or
  are converted into tokens for operators and punctuators:
  preprocessing-op-or-punc: one of
  {       }       [       ]       #       ##      (       )
  <:      :>      <%      %>      %:      %:%:    ;       :       ...
  new     delete  ?       ::      .       .*
  +       -       *       /       %       ^       &       |       ~
  !       =       <       >       +=      -=      *=      /=      %=
  ^=      &=      |=      <<      >>      >>=     <<=     ==      !=
  <=      >=      &&      ||      ++      --      ,       ->*     ->
  and     and_eq  bitand  bitor   compl   not     not_eq  or      or_eq
  xor     xor_eq
  Each preprocessing-op-or-punc is converted to a single token in trans-
  lation phase 7 (_lex.phases_).

  2.13  Literals                                           [lex.literal]

1 There are several kinds of literals.9)
  literal:
          integer-literal
          character-literal
          floating-literal
          string-literal
          boolean-literal

  2.13.1  Integer literals                                    [lex.icon]
  integer-literal:
          decimal-literal integer-suffixopt
          octal-literal integer-suffixopt
          hexadecimal-literal integer-suffixopt
  decimal-literal:
          nonzero-digit
          decimal-literal digit
  octal-literal:
          0
          octal-literal octal-digit
  hexadecimal-literal:
          0x hexadecimal-digit
          0X hexadecimal-digit
          hexadecimal-literal hexadecimal-digit
  nonzero-digit: one of
          1  2  3  4  5  6  7  8  9
  octal-digit: one of
          0  1  2  3  4  5  6  7
  hexadecimal-digit: one of
          0  1  2  3  4  5  6  7  8  9
          a  b  c  d  e  f
          A  B  C  D  E  F
  integer-suffix:
          unsigned-suffix long-suffixopt
          long-suffix unsigned-suffixopt
  unsigned-suffix: one of
          u  U
  long-suffix: one of
          l  L

1 An integer literal is a sequence of digits that has no period or expo-
  nent part.  An integer literal may have a prefix  that  specifies  its
  base  and a suffix that specifies its type.  The lexically first digit
  of the sequence of digits is the most significant.  A decimal  integer
  literal  (base ten) begins with a digit other than 0 and consists of a
  sequence of decimal digits.  An octal  integer  literal  (base  eight)
  begins with the digit 0 and consists of a sequence of octal digits.10)
  A hexadecimal integer literal (base sixteen) begins with 0x or 0X  and
  consists  of a sequence of hexadecimal digits, which include the deci-
  mal digits and the letters a through f and A through  F  with  decimal
  values  ten through fifteen.  [Example: the number twelve can be writ-
  ten 12, 014, or 0XC.  ]
  _________________________
  9) The term "literal"  generally  designates,  in  this  International
  Standard, those tokens that are called "constants" in ISO C.
  10) The digits 8 and 9 are not octal digits.

2 The type of an integer literal depends on its form, value, and suffix.
  If it is decimal and has no suffix, it has the first of these types in
  which its value can be represented: int, long int; if the value cannot
  be  represented  as  a  long int, the behavior is undefined.  If it is
  octal or hexadecimal and has no suffix, it  has  the  first  of  these
  types  in  which its value can be represented: int, unsigned int, long
  int, unsigned long int.  If it is suffixed by u or U, its type is  the
  first  of  these types in which its value can be represented: unsigned
  int, unsigned long int.  If it is suffixed by l or L, its type is  the
  first  of these types in which its value can be represented: long int,
  unsigned long int.  If it is suffixed by ul, lu, uL, Lu, Ul,  lU,  UL,
  or LU, its type is unsigned long int.
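
  The effect of prefix and suffix can be observed with overload  resolu-
  tion.   The sketch below is illustrative; the function name type_of is
  hypothetical.  Only small values are used, so the results do not depend
  on the implementation-defined ranges of the types.

```cpp
#include <cassert>
#include <string>

// Overloads that report which of the four types named above the
// argument has.
std::string type_of(int)           { return "int"; }
std::string type_of(unsigned int)  { return "unsigned int"; }
std::string type_of(long)          { return "long int"; }
std::string type_of(unsigned long) { return "unsigned long int"; }

// The number twelve in decimal, octal and hexadecimal form.
bool twelve_three_ways() { return 12 == 014 && 014 == 0XC; }

// Usage: the suffix, not the prefix, changes the type here.
void check_integer_literals() {
    assert(twelve_three_ways());
    assert(type_of(0XC) == "int");
    assert(type_of(12u) == "unsigned int");
    assert(type_of(12L) == "long int");
    assert(type_of(12LU) == "unsigned long int");
}
```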

3 A  program  is  ill-formed if one of its translation units contains an
  integer literal that cannot be  represented  by  any  of  the  allowed
  types.

  2.13.2  Character literals                                  [lex.ccon]
  character-literal:
          'c-char-sequence'
          L'c-char-sequence'
  c-char-sequence:
          c-char
          c-char-sequence c-char
  c-char:
          any member of the source character set except
                  the single-quote ', backslash \, or new-line character
          escape-sequence
          universal-character-name
  escape-sequence:
          simple-escape-sequence
          octal-escape-sequence
          hexadecimal-escape-sequence
  simple-escape-sequence: one of
          \'  \"  \?  \\
          \a  \b  \f  \n  \r  \t  \v
  octal-escape-sequence:
          \ octal-digit
          \ octal-digit octal-digit
          \ octal-digit octal-digit octal-digit
  hexadecimal-escape-sequence:
          \x hexadecimal-digit
          hexadecimal-escape-sequence hexadecimal-digit

1 A  character  literal  is  one  or  more characters enclosed in single
  quotes, as in 'x', optionally preceded by the letter L, as in L'x'.  A
  character  literal that does not begin with L is an ordinary character
  literal, also referred to as a narrow-character literal.  An  ordinary
  character  literal  that  contains a single c-char has type char, with

  value equal to the numerical value of the encoding of  the  c-char  in
  the  execution character set.  An ordinary character literal that con-
  tains more than one c-char is a multicharacter literal.  A  multichar-
  acter literal has type int and implementation-defined value.

2 A  character literal that begins with the letter L, such as L'x', is a
  wide-character literal.  A wide-character literal has type wchar_t.11)
  A  wide-character  literal  containing  a  single c-char has a value
  equal  to  the  numerical  value of the encoding of the c-char in
  the execution wide-character set.  The value of a wide-character  lit-
  eral containing multiple c-chars is implementation-defined.
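
  The types given in the two paragraphs above can be checked with  over-
  load  resolution.   This  is a sketch; the function name kind_of is hy-
  pothetical.  Multicharacter literals are left out because their  value
  is implementation-defined.

```cpp
#include <cassert>
#include <string>

// Overloads that report the type of the argument.
std::string kind_of(char)    { return "char"; }
std::string kind_of(wchar_t) { return "wchar_t"; }
std::string kind_of(int)     { return "int"; }

// Usage: an ordinary literal with one c-char is a char, and the L
// prefix selects wchar_t.
void check_character_literals() {
    assert(kind_of('x') == "char");
    assert(kind_of(L'x') == "wchar_t");
    assert(sizeof('x') == 1);
}
```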

3 Certain nongraphic characters, the single quote ', the double quote ",
  the question mark ?, and the backslash \, can be represented according
  to Table 5.

                        Table 5--escape sequences

                   +----------------------------------+
                   |new-line          NL (LF)   \n    |
                   |horizontal tab    HT        \t    |
                   |vertical tab      VT        \v    |
                   |backspace         BS        \b    |
                   |carriage return   CR        \r    |
                   |form feed         FF        \f    |
                   |alert             BEL       \a    |
                   |backslash         \         \\    |
                   |question mark     ?         \?    |
                   |single quote      '         \'    |
                   |double quote      "         \"    |
                   |octal number      ooo       \ooo  |
                   |hex number        hhh       \xhhh |
                   +----------------------------------+
  The  double  quote  "  and  the  question mark ? can be represented as
  themselves or by the escape sequences \" and \?  respectively, but the
  single  quote ' and the backslash \ shall be represented by the escape
  sequences \' and \\ respectively.  If the character following a  back-
  slash  is  not  one of those specified, the behavior is undefined.  An
  escape sequence specifies a single character.

  _________________________
  11) They are intended for character sets where a  character  does  not
  fit into a single byte.

4 The escape \ooo consists of the backslash followed  by  one,  two,  or
  three  octal digits that are taken to specify the value of the desired
  character.  The escape \xhhh consists of the backslash followed  by  x
  followed  by  one or more hexadecimal digits that are taken to specify
  the value of the desired character.  There is no limit to  the  number
  of digits in a hexadecimal sequence.  A sequence of octal or hexadeci-
  mal digits is terminated by the first character that is not  an  octal
  digit  or a hexadecimal digit, respectively.  The value of a character
  literal is implementation-defined if it falls outside of the implemen-
  tation-defined  range  defined  for  char  (for  ordinary literals) or
  wchar_t (for wide literals).
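
  The difference between the bounded octal escape and the unbounded hex-
  adecimal escape shows up in the length of a string.  The sketch  below
  uses  hypothetical function names and checks only lengths and posi-
  tions, so it does not depend on the values  of  the  execution  char-
  acter set.

```cpp
#include <cassert>
#include <cstring>

// An octal escape takes at most three octal digits, so this literal
// holds two characters: the one with octal value 101, then '7'.
std::size_t octal_escape_length() { return std::strlen("\1017"); }

// A hexadecimal escape has no length limit. Ending the first literal
// ends the escape after the digits 41; the 4 in the second literal is
// a separate character.
std::size_t hex_escape_length() { return std::strlen("\x41" "4"); }

// Simple escapes and the characters they stand for.
bool simple_escapes_match() {
    return '\"' == '"' && '\?' == '?' && std::strlen("\\") == 1;
}

// Usage.
void check_escapes() {
    assert(octal_escape_length() == 2);
    assert(hex_escape_length() == 2);
    assert(simple_escapes_match());
}
```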

5 A universal-character-name is translated to the encoding, in the  exe-
  cution  character  set,  of  the character named.  If there is no such
  encoding, the universal-character-name is translated to an implementa-
  tion-defined  encoding.   [Note:  in translation phase 1, a universal-
  character-name is introduced whenever an actual extended character  is
  encountered  in  the  source text.  Therefore, all extended characters
  are described in terms  of  universal-character-names.   However,  the
  actual  compiler  implementation may use its own native character set,
  so long as the same results are obtained.  ]

  2.13.3  Floating literals                                   [lex.fcon]
  floating-literal:
          fractional-constant exponent-partopt floating-suffixopt
          digit-sequence exponent-part floating-suffixopt
  fractional-constant:
          digit-sequenceopt . digit-sequence
          digit-sequence .
  exponent-part:
          e signopt digit-sequence
          E signopt digit-sequence
  sign: one of
          +  -
  digit-sequence:
          digit
          digit-sequence digit
  floating-suffix: one of
          f  l  F  L

1 A floating literal consists of an integer part,  a  decimal  point,  a
  fraction  part,  an e or E, an optionally signed integer exponent, and
  an optional type suffix.  The integer and fraction parts both  consist
  of  a  sequence of decimal (base ten) digits.  Either the integer part
  or the fraction part (not both) can be  omitted;  either  the  decimal
  point  or the letter e (or E) and the exponent (not both) can be omit-
  ted.  The integer part, the optional decimal point  and  the  optional
  fraction  part form the significant part of the floating literal.  The
  exponent, if present, indicates the power of 10 by which the  signifi-
  cant  part  is  to  be scaled.  If the scaled value is in the range of
  representable values for its type, the result is the scaled  value  if
  representable,  else the larger or smaller representable value nearest
  the scaled value, chosen in  an  implementation-defined  manner.   The
  type  of a floating literal is double unless explicitly specified by a
  suffix.  The suffixes f and F specify float,  the  suffixes  l  and  L
  specify  long double.  If the scaled value is not in the range of rep-
  resentable values for its type, the program is ill-formed.
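
  As a non-normative sketch, the effect of the suffixes and of the
  optional parts can be observed through overload resolution.  The names
  literal_type and check_floating_literals are hypothetical and exist
  only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical overload set: reports which floating type an argument has.
const char* literal_type(float)       { return "float"; }
const char* literal_type(double)      { return "double"; }
const char* literal_type(long double) { return "long double"; }

// Illustrative checks; the function name is an assumption of this sketch.
void check_floating_literals() {
    assert(std::strcmp(literal_type(1.5),  "double") == 0);  // no suffix
    assert(std::strcmp(literal_type(1.5f), "float") == 0);
    assert(std::strcmp(literal_type(1.5L), "long double") == 0);
    assert(.5 == 0.5);      // integer part omitted
    assert(1. == 1.0);      // fraction part omitted
    assert(15e-1 == 1.5);   // decimal point omitted; 15 scaled by 10 to the -1
}
```

  The values compared above are exactly representable in binary floating
  point, so the implementation-defined choice of nearest value does not
  affect them.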

  2.13.4  String literals                                   [lex.string]
  string-literal:
          "s-char-sequenceopt"
          L"s-char-sequenceopt"
  s-char-sequence:
          s-char
          s-char-sequence s-char
  s-char:
          any member of the source character set except
                  the double-quote ", backslash \, or new-line character
          escape-sequence
          universal-character-name

1 A  string  literal  is  a  sequence  of  characters  (as  defined   in
  _lex.ccon_) surrounded by double quotes, optionally beginning with the
  letter L, as in "..." or L"...".  A string literal that does not begin
  with  L  is  an  ordinary string literal, also referred to as a narrow
  string literal.  An ordinary string literal has type "array of n const
  char"  and  static storage duration (_basic.stc_), where n is the size
  of the string as defined below, and  is  initialized  with  the  given
  characters.   A string literal that begins with L, such as L"asdf", is
  a wide string literal.  A wide string literal has  type  "array  of  n
  const wchar_t" and has static storage duration, where n is the size of
  the string as defined below, and is initialized with the given charac-
  ters.
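
  The two kinds of string literal can be told apart by their element
  type, as the following non-normative sketch suggests.  The names
  is_wide and check_string_literal_types are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical overloads: select on the element type of the literal after
// the array-to-pointer conversion.
bool is_wide(const char*)    { return false; }
bool is_wide(const wchar_t*) { return true; }

// Illustrative checks; the function name is an assumption of this sketch.
void check_string_literal_types() {
    assert(!is_wide("asdf"));    // ordinary (narrow) string literal
    assert(is_wide(L"asdf"));    // wide string literal
    // Four characters plus the terminating null character in each case.
    assert(sizeof("asdf") == 5);
    assert(sizeof(L"asdf") == 5 * sizeof(wchar_t));
}
```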

2 Whether  all  string  literals  are  distinct  (that is, are stored in
  nonoverlapping objects)  is  implementation-defined.   The  effect  of
  attempting to modify a string literal is undefined.
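
  Because modifying a string literal has undefined effect, a program
  that needs modifiable text can initialize its own array from the
  literal instead.  A minimal sketch, in which the name modify_copy is a
  hypothetical helper:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: edits a copy of the characters, never the literal.
bool modify_copy() {
    char copy[] = "asdf";   // a separate array initialized from the literal
    copy[0] = 'A';          // well-defined: changes the copy only
    return std::strcmp(copy, "Asdf") == 0;
}
```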

3 In translation phase 6 (_lex.phases_), adjacent narrow string literals
  are concatenated and adjacent wide string literals  are  concatenated.
  If  a narrow string literal token is adjacent to a wide string literal
  token, the behavior is undefined.  Characters in concatenated  strings
  are kept distinct.  [Example:
  "\xA" "B"
  contains the two characters '\xA' and 'B' after concatenation (and not
  the single hexadecimal character '\xAB').  ]

4 After  any   necessary   concatenation,   in   translation   phase   7
  (_lex.phases_),  '\0' is appended to every string literal so that pro-
  grams that scan a string can find its end.
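
  The concatenation example above, together with the appended '\0', can
  be checked with a short non-normative sketch.  The names joined and
  check_concatenation are assumptions made for the illustration.

```cpp
#include <cassert>

// Two adjacent narrow literals, concatenated in translation phase 6.
const char joined[] = "\xA" "B";

// Illustrative checks; the function name is an assumption of this sketch.
void check_concatenation() {
    assert(sizeof(joined) == 3);   // '\xA', 'B' and the appended '\0'
    assert(joined[0] == '\xA');    // the escape ends at the first literal
    assert(joined[1] == 'B');      // 'B' stays a separate character
    assert(joined[2] == '\0');     // terminator appended in phase 7
}
```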

5 Escape sequences and universal-character-names in string literals have
  the  same  meaning  as in character literals (_lex.ccon_), except that
  the single quote ' is representable either by itself or by the  escape
  sequence  \',  and  the double quote " shall be preceded by a \.  In a
  narrow string literal, a universal-character-name may map to more than
  one char element due to multibyte encoding.  The size of a wide string
  literal is the total number of escape sequences,  universal-character-
  names,  and other characters, plus one for the terminating L'\0'.  The
  size of a  narrow  string  literal  is  the  total  number  of  escape
  sequences  and  other  characters, plus at least one for the multibyte
  encoding  of  each  universal-character-name,  plus   one   for   the
  terminating '\0'.
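
  The size rules can be sketched as follows.  This is a non-normative
  illustration; element_count and check_literal_sizes are hypothetical
  names, and the exact narrow size of a universal-character-name depends
  on the execution character set, so only a lower bound is checked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: number of array elements in a string literal,
// including the terminating null character.
template <std::size_t N>
std::size_t element_count(const char (&)[N]) { return N; }
template <std::size_t N>
std::size_t element_count(const wchar_t (&)[N]) { return N; }

// Illustrative checks; the function name is an assumption of this sketch.
void check_literal_sizes() {
    // Wide: 'a', the escape \n, the escape \x41, plus L'\0'.
    assert(element_count(L"a\n\x41") == 4);
    // Wide: one universal-character-name, plus L'\0'.
    assert(element_count(L"\u00e9") == 2);
    // Narrow: 'a', the escape \n, the escape \x41, plus '\0'.
    assert(element_count("a\n\x41") == 4);
    // Narrow: at least one element for the name, plus '\0'.
    assert(element_count("\u00e9") >= 2);
}
```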

  2.13.5  Boolean literals                                    [lex.bool]
  boolean-literal:
          false
          true

1 The  Boolean  literals are the keywords false and true.  Such literals
  have type bool.  They are not lvalues.
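
  That the Boolean literals have type bool, rather than an integer type,
  can be observed through overload resolution.  A non-normative sketch,
  in which is_bool and check_boolean_literals are hypothetical names:

```cpp
#include <cassert>

// Hypothetical overloads: an argument of type bool selects the first one.
bool is_bool(bool) { return true; }
bool is_bool(int)  { return false; }

// Illustrative checks; the function name is an assumption of this sketch.
void check_boolean_literals() {
    assert(is_bool(true));
    assert(is_bool(false));
    assert(!is_bool(1));                    // an integer literal is an int
    assert(sizeof(true) == sizeof(bool));
    // Writing &true would be ill-formed, since the literal is not an lvalue.
}
```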