char8_t backward compatibility remediation

Document Number:	P1423R3
Date:	2019-07-18
Audience:	Library Evolution Working Group, Library Working Group
Reply-to:	Tom Honermann <tom@honermann.net>

Code	C++17	C++20 with P0482R6	C++20 with this proposal
`const char *p = u8"text";`	Initializes `p` with the address of the UTF-8 encoded string.	Ill-formed.	Ill-formed.
`char a[] = u8"text";`	Initializes `a` with the UTF-8 encoded string.	Ill-formed.	Ill-formed.
`int operator ""_udl(const char*, unsigned long); int v = u8"text"_udl;`	Initializes `v` with the result of calling `operator ""_udl` with the UTF-8 encoded string literal.	Ill-formed.	Ill-formed.
`std::string s(u8"text");`	Initializes `s` with the UTF-8 encoded string.	Ill-formed.	Ill-formed.
`std::filesystem::path p = ...; std::string s = p.u8string();`	Initializes `s` with the UTF-8 encoded representation of the file path stored in `p`.	Ill-formed.	Ill-formed.
`std::cout << u8'x'; std::cout << u8"text";`	Writes a sequence of UTF-8 code units as characters to stdout. (mojibake if the execution character encoding is not UTF-8)	Writes an integer or pointer value to stdout. (consistent with handling of char16_t and char32_t)	Ill-formed. (for all of char8_t, char16_t, and char32_t)
`std::filesystem::u8path(u8"filename");`	Constructs a `std::filesystem::path` object from the UTF-8 encoded string.	Ill-formed.	Constructs a `std::filesystem::path` object from the UTF-8 encoded string.

Searched for	Debian packages (out of ~18000)
`char8_t`	spring (emulates its own `char8_t` support)
`u8string`	libopenmpt (defines a `mpt::u8string` typedef of `std::basic_string` with a custom `char_traits`); spring (defines a `std::u8string` class that derives from `std::string`)
`u8string_view`	<none>
`u8streampos`	<none>
`mbrtoc8`	<none>
`c8rtomb`	<none>
`u8path(u8"text")`	<none>
`u8"text"`	chromium (~10310 hits, ~102 files); firefox (97 hits, 1 file, 3 test files); icu (83 hits, 1 file, 5 test files); qbs (56 hits, 1 file, 1 test file); mongodb (30 hits, 2 test files); aseba (28 hits, 1 file); monero (26 hits, 1 file, 1 test file); nlohmann-json (21 hits, 1 file, 3 test files); bambootracker (20 hits, 5 files); capnproto (18 hits, 1 file, 1 test file); lgogdownloader (11 hits, 1 file); libosmium (10 hits, 3 test files); cbmc (8 hits, 3 test files); maim (8 hits, 1 file); octave-ltfat (8 hits, 1 file); praat (8 hits, 2 files); slop (8 hits, 1 file); mame (7 hits, 3 files); nlohmann-json3 (7 hits, 3 test files); sdcc (7 hits, 1 test file); antlr4 (3 hits, 1 file); keyman-keyboardprocessor (3 hits, 2 test files); scram (3 hits, 2 test files); tesseract (3 hits, 1 test file); boost1.67 (2 hits, 1 test file); freeorion (2 hits, 1 file); supertux (2 hits, 1 file); cjs (1 hit, 1 test file); cpp-hocon (1 hit, 1 test file); efl (1 hit, 1 file); gjs (1 hit, 1 test file); kate4 (1 hit, 1 example file); kodi (1 hit, 1 test file); libtcod (1 hit, 1 test file); retroarch (1 hit, 1 file); rtags (1 hit, 1 test file)
`u8R"(text)"`	chromium (940 hits, 2 files, 2 test files); kate4 (2 hits, 1 example file, 1 test file); nlohmann-json (2 hits, 1 test file); nlohmann-json3 (2 hits, 1 test file); ksyntax-highlighting (1 hit, 1 test file); ktexteditor (1 hit, 1 test file)

Before	After
`int ft(const char*); ft(u8"text");`	`int ft(const char*); #if defined(__cpp_char8_t) int ft(const char8_t*); #endif ft(u8"text"); // C++17 or C++20`
`int operator ""_udl(const char*, unsigned long); int v = u8"text"_udl;`	`int operator ""_udl(const char*, unsigned long); #if defined(__cpp_char8_t) int operator ""_udl(const char8_t*, unsigned long); #endif int v = u8"text"_udl; // C++17 or C++20`
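The overload approach in the table above can be sketched as a complete translation unit. The table only declares `ft` and `operator ""_udl`; the function bodies below are invented for illustration, as is the helper `demo_overloads`. The sketch also spells the length parameter as `std::size_t`, which is the type a string literal operator is required to take; `unsigned long` coincides with it only on some platforms.

```cpp
#include <cstddef>
#include <cstring>

// Existing C++17 interface (bodies are hypothetical stand-ins).
inline int ft(const char* s) {
    return static_cast<int>(std::strlen(s));
}
constexpr int operator""_udl(const char*, std::size_t n) {
    return static_cast<int>(n);
}

#if defined(__cpp_char8_t)
// Added overloads, declared only when u8 literals have type char8_t.
inline int ft(const char8_t* s) {
    // Reading the code units through a char pointer is permitted.
    return ft(reinterpret_cast<const char*>(s));
}
constexpr int operator""_udl(const char8_t*, std::size_t n) {
    return static_cast<int>(n);
}
#endif

// Compiles unchanged as C++17 or C++20.
inline int demo_overloads() {
    return ft(u8"text") + u8"text"_udl;
}
```

Guarding the new overloads with `__cpp_char8_t` keeps the header valid for C++17 consumers, where the `char8_t` keyword does not exist.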

Before	After
`u8"\u00E1"`	`"\xC3\xA1" // U+00E1`
`u8"á" (assuming source encoding is UTF-8)`	`"\xC3\xA1" // U+00E1 (works with any source encoding)`
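As a small check of the substitution above, the sketch below compares the escaped ordinary literal against the `u8` literal it replaces, byte for byte. The names `a_acute`, `decor`, and `matches_u8_literal` are hypothetical and exist only for this illustration.

```cpp
#include <cstring>

// U+00E1 spelled as explicit UTF-8 code units in an ordinary literal.
// The array type is const char[3] in both C++17 and C++20.
constexpr char a_acute[] = "\xC3\xA1";

// Compares the escaped literal with the u8 literal it replaces.
inline bool matches_u8_literal() {
    static_assert(sizeof(u8"\u00E1") == sizeof(a_acute),
                  "two code units plus the terminating null");
    return std::memcmp(a_acute, u8"\u00E1", sizeof(a_acute)) == 0;
}

// A hexadecimal escape consumes every hex digit that follows it, so text
// beginning with a hex digit ('c' here) goes in a separate adjacent literal.
constexpr char decor[] = "d\xC3\xA9" "cor";  // U+00E9 between 'd' and "cor"
```

The split in `decor` matters in practice: written as a single literal, `\xA9c` would be read as one escape sequence rather than as the code unit `0xA9` followed by the letter `c`.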

Before	After
`const char &r = u8'x';`	`const char &r = reinterpret_cast<const char &>(u8'x'); // C++17 or C++20`
`const char *p = u8"text";`	`const char *p = reinterpret_cast<const char *>(u8"text"); // C++17 or C++20`
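A minimal sketch of the string case from the table above, written so that the same source compiles in both language modes. The function names `utf8_text` and `utf8_text_length` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// In C++17 the cast converts const char* to its own type; in C++20 it
// converts const char8_t* to const char*. The literal has static storage
// duration, so the returned pointer stays valid.
inline const char* utf8_text() {
    return reinterpret_cast<const char*>(u8"text");
}

// The result can be passed to any char-based interface.
inline std::size_t utf8_text_length() {
    return std::strlen(utf8_text());
}
```

Since `reinterpret_cast` is not permitted in constant expressions, this approach does not cover `constexpr` variables.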

Before	After
`constexpr const char &r = u8'x';`	`constexpr const char &r = u8'x'_as_char; // C++20 only`
`constexpr const char *p = u8"text";`	`constexpr const char *p = u8"text"_as_char; // C++20 only`
`constexpr const char (&r)[] = u8"text"; // Standard C++ doesn't permit conversion to arrays of unknown bound, but versions of gcc prior to 9.0.1 did.`	`constexpr const char (&r)[] = u8"text"_as_char; // C++20 with P0388R2; ok if P0388R2 ^[P0388R2] is adopted`

Before	After
`constexpr const char &r = u8'x';`	`constexpr const char &r = U8('x'); // C++17 or C++20`
`constexpr const char *p = u8"text";`	`constexpr const char *p = U8("text"); // C++17 or C++20`
`constexpr const char (&r)[] = u8"text"; // Standard C++ doesn't permit conversion to arrays of unknown bound, but versions of gcc prior to 9.0.1 did.`	`constexpr const char (&r)[] = U8("text"); // C++17 or C++20 with P0388R2; ok if P0388R2 ^[P0388R2] is adopted`

Before	After
`char a[] = u8"text";`	`char_array a = u8"text"; // Ok, initialized with "text\0"`
`constexpr char a[] = u8"text";`	`constexpr char_array a = u8"text"; // Ok, initialized with "text\0"`
`constexpr char a[3] = u8"text"; // ill-formed`	`constexpr char_array<3> a = u8"text"; // ill-formed (too many initializers)`
`constexpr char a[6] = u8"text";`	`constexpr char_array<6> a = u8"text"; // Ok, initialized with "text\0\0"`
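The table above uses a `char_array` class template without showing a definition. The following is one possible shape for such a class, assuming C++17 class template argument deduction; it is a sketch and not necessarily the definition the paper intends. The members `size`, `operator[]`, and the private helper `pick` are assumptions added for illustration.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template <std::size_t N>
struct char_array {
    char data[N];

    // From an ordinary literal, or from a u8 literal in C++17.
    template <std::size_t P, typename = std::enable_if_t<(P <= N)>>
    constexpr char_array(const char (&r)[P])
        : char_array(r, std::make_index_sequence<N>()) {}

#if defined(__cpp_char8_t)
    // From a u8 literal in C++20.
    template <std::size_t P, typename = std::enable_if_t<(P <= N)>>
    constexpr char_array(const char8_t (&r)[P])
        : char_array(r, std::make_index_sequence<N>()) {}
#endif

    constexpr std::size_t size() const { return N; }
    constexpr char operator[](std::size_t i) const { return data[i]; }

private:
    // Yields source code unit i, or a null for positions past the source.
    template <typename CharT, std::size_t P>
    static constexpr char pick(const CharT (&r)[P], std::size_t i) {
        return i < P ? static_cast<char>(r[i]) : '\0';
    }

    // Fills all N elements: P copied code units, then N - P nulls.
    template <typename CharT, std::size_t P, std::size_t... I>
    constexpr char_array(const CharT (&r)[P], std::index_sequence<I...>)
        : data{pick(r, I)...} {}
};

// Deduction guides so that `char_array a = u8"text";` picks N from the literal.
template <std::size_t N>
char_array(const char (&)[N]) -> char_array<N>;
#if defined(__cpp_char8_t)
template <std::size_t N>
char_array(const char8_t (&)[N]) -> char_array<N>;
#endif
```

With this sketch, `char_array<3> a = u8"text";` is rejected because the `P <= N` constraint removes the converting constructor, so the diagnostic differs from the "too many initializers" wording in the table.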

namespace std {

  […]

  // [ostream.inserters.character], character inserters
  template<class charT, class traits>
    basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, charT);
  template<class charT, class traits>
    basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, char);
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char);

  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, signed char);
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, unsigned char);

  // The following deleted overloads prevent formatting character values as numeric values.
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, wchar_t) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char8_t) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char16_t) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char32_t) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char8_t) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char16_t) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char32_t) = delete;

  template<class charT, class traits>
    basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const charT*);
  template<class charT, class traits>
    basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const char*);
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char*);

  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const signed char*);
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const unsigned char*);

  // The following deleted overloads prevent formatting strings as pointer values.
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const wchar_t*) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char8_t*) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char16_t*) = delete;
  template<class traits>
    basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char32_t*) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char8_t*) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char16_t*) = delete;
  template<class traits>
    basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char32_t*) = delete;
}
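With the deleted overloads in the synopsis above, inserting a `char8_t` character or string into a `char`-based stream no longer compiles. Code that wants the C++17 behavior of writing the UTF-8 code units unchanged can make the conversion explicit. The sketch below shows one way to do so; `write_utf8`, `demo_write`, and the alias `u8_code_unit` are hypothetical names, not part of the proposed wording.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical alias: the code unit type of u8 literals in the current mode.
#if defined(__cpp_char8_t)
using u8_code_unit = char8_t;
#else
using u8_code_unit = char;
#endif

// Hypothetical helper: writes UTF-8 code units to a char-based stream
// unchanged, selecting the ordinary const char* inserter explicitly.
inline std::ostream& write_utf8(std::ostream& os, const u8_code_unit* s) {
    return os << reinterpret_cast<const char*>(s);
}

// Usage: the same call compiles as C++17 and as C++20.
inline std::string demo_write() {
    std::ostringstream os;
    write_utf8(os, u8"text");
    return os.str();
}
```

As in C++17, the output is only displayed correctly if the consumer of the stream expects UTF-8.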

^[N4820]	"Working Draft, Standard for Programming Language C++", N4820, 2019. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/n4820.pdf
^[P0388R2]	Robert Haberlach, "Permit conversions to arrays of unknown bound", P0388R2, 2018. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0388r2.html
^[P0482R6]	Tom Honermann, "char8_t: A type for UTF-8 characters and strings (Revision 6)", P0482R6, 2018. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
^[P0732R2]	Jeff Snyder and Louis Dionne, "Class Types in Non-Type Template Parameters", P0732R2, 2018. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0732r2.pdf
^[SD-8]	Titus Winters, "SD-8: Standard Library Compatibility", SD-8, 2018. https://isocpp.org/std/standing-documents/sd-8-standard-library-compatibility

Changes since P1423R2

Introduction

Examples

Anticipated impact

Remediation approaches

Disable `char8_t` support

Add overloads

Change `u8` literals to ordinary literals with escape sequences

reinterpret_cast `u8` literals to `char`

Emulate C++17 `u8` literals

Substitute class types for C arrays initialized with `u8` string literals

Use explicit conversion functions

Tooling

Options considered to reduce backward compatibility impact

1) Reinstate `u8` literals as type `char` and introduce a new literal prefix for `char8_t`

2) Allow implicit conversions from `char8_t` to `char`

3) Allow initializing an array of `char` with a `u8` string literal

4) Allow initializing an array with a reference to an array

5) Allow `std::string` to be initialized with `char8_t` based types

6) Allow implicit conversions from `std::u8string` to `std::string`

7) Add deleted ostream inserters for `char8_t`, `char16_t`, and `char32_t`

8) Allow `std::filesystem::u8path` to accept ranges and iterators with `char8_t` value types

Proposal

Wording

Library wording

Annex C Compatibility wording

Annex D Compatibility features wording

References
