ISO/IEC JTC1 SC22 WG21
N3599
Richard Smith
2013-03-13

Literal operator templates for strings

Overview

N2765 added the ability for users to define their own literal suffixes. Several forms of literal operators are available, with one notable omission: there is no template form of literal operator for character and string literals. N2750 justifies this restriction based on two factors:

Neither of these is still true, and we now have evidence that a literal operator template for string literals would be valuable; indeed, in one codebase where literal operators are not yet permitted, this form of literal operator has been requested more frequently than any of the forms which C++11 permits.

Examples

Type-safe printf

With literal operator templates, it is possible to write a type-safe printf facility:

// A tuple of types.
template<typename ...Ts> struct types {
  template<typename T> using push_front = types<T, Ts...>;
  template<template<typename...> class F> using apply = F<Ts...>;
};

// Select a type from a format character.
template<char K> struct format_type_impl;
template<> struct format_type_impl<'d'> { using type = int; };
template<> struct format_type_impl<'f'> { using type = double; };
template<> struct format_type_impl<'s'> { using type = const char *; };
// ...
template<char K> using format_type = typename format_type_impl<K>::type;

// Build a tuple of types from a format string.
template<char ...String>
struct format_types;
template<>
struct format_types<> : types<> {};
template<char Char, char ...String>
struct format_types<Char, String...> : format_types<String...> {};
template<char ...String>
struct format_types<'%', '%', String...> : format_types<String...> {};
template<char Fmt, char ...String>
struct format_types<'%', Fmt, String...> :
  format_types<String...>::template push_front<format_type<Fmt>> {};

// Typed printf-style formatter.
template<typename ...Args> struct formatter {
  int operator()(Args ...a) {
    return std::printf(str, a...);
  }
  const char *str;
};

template<typename CharT, CharT ...String>
typename format_types<String...>::template apply<formatter>
operator""_printf() {
  static_assert(std::is_same<CharT, char>(), "can only use printf on narrow strings");
  static const CharT data[] = { String..., 0 };
  return { data };
}

void log_bad_guess(const char *name, int guess, int actual) {
  "Hello %s, you guessed %d which is too %s\n"_printf(
    name, guess, guess < actual ? "low" : "high");
}

This is not possible with the existing support for string literal operators, because the type of the literal cannot depend on the contents of the string.
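
The type computation in this example can be exercised in C++11 as it stands by spelling out the character pack by hand. The following is a minimal sketch under that assumption: the pack '%', 's', '=', '%', 'd' stands in for what the proposed operator would receive for "%s=%d", the formatter is reduced to an empty stub, and the alias name deduced is introduced only for illustration.

```cpp
#include <type_traits>

// A tuple of types, as in the example above.
template<typename ...Ts> struct types {
  template<typename T> using push_front = types<T, Ts...>;
  template<template<typename...> class F> using apply = F<Ts...>;
};

// Select a type from a format character (only two conversions here).
template<char K> struct format_type_impl;
template<> struct format_type_impl<'d'> { using type = int; };
template<> struct format_type_impl<'s'> { using type = const char *; };
template<char K> using format_type = typename format_type_impl<K>::type;

// Build a tuple of types from a format string.
template<char ...String> struct format_types;
template<> struct format_types<> : types<> {};
template<char Char, char ...String>
struct format_types<Char, String...> : format_types<String...> {};
template<char ...String>
struct format_types<'%', '%', String...> : format_types<String...> {};
template<char Fmt, char ...String>
struct format_types<'%', Fmt, String...> :
  format_types<String...>::template push_front<format_type<Fmt>> {};

// Stub standing in for the formatter class; only its type matters here.
template<typename ...Args> struct formatter {};

// Hand-written pack for "%s=%d" (an assumption: the proposed operator
// would supply this pack from the literal itself).
using deduced = format_types<'%', 's', '=', '%', 'd'>::apply<formatter>;
static_assert(std::is_same<deduced, formatter<const char *, int>>::value,
              "one argument type per conversion, in order");
```

The "%%" specialization is selected over the generic '%' one because it is more specialized, so an escaped percent sign contributes no argument type.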

Compiler-validated string literals

By a similar mechanism to the type-safe printf, literal operator templates allow the user to validate that a string literal conforms to a specific syntax or structure during translation.

class SpecialString {
public:
  constexpr static bool IsValidString(const char *str) { /* ... */ }
  explicit SpecialString(const char *str) : str(str) { assert(IsValidString(str)); }
  const char *get() { return str; }

private:
  struct Checked {};
  SpecialString(Checked, const char *str) : str(str) {}
  template<typename CharT, CharT ...> friend SpecialString operator""_special();
  const char *str;
};

template<typename CharT, CharT ...String> SpecialString operator""_special() {
  constexpr static CharT data[] = { String..., 0 };
  static_assert(SpecialString::IsValidString(data), "not a valid string");
  return SpecialString(SpecialString::Checked(), data);
}

Again, this is not possible with the existing support for string literal operators, because the literal's value is not available in constant expressions within the literal operator.
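
The validation step can be tried in C++11 by passing the characters as an explicit pack. This is a sketch only: the rule (non-empty, lowercase ASCII letters) and the names is_valid_string, checked and example are hypothetical, and the function template checked stands in for the proposed literal operator template.

```cpp
// Hypothetical rule for illustration: lowercase ASCII letters only.
constexpr bool is_lower(char c) { return c >= 'a' && c <= 'z'; }

// C++11-style recursive check over a null-terminated array.
constexpr bool all_lower(const char *s) {
  return *s == '\0' || (is_lower(*s) && all_lower(s + 1));
}

// A valid string is non-empty and passes the character check.
constexpr bool is_valid_string(const char *s) {
  return *s != '\0' && all_lower(s);
}

// Stand-in for the proposed operator: the pack is materialized as a
// constexpr array, which can then be inspected in a static_assert.
template<char ...String>
const char *checked() {
  constexpr static char data[] = { String..., '\0' };
  static_assert(is_valid_string(data), "not a valid string");
  return data;
}

// Instantiating checked<'a', 'B', 'c'>() instead would fail to compile.
inline const char *example() { return checked<'a', 'b', 'c'>(); }
```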

String obfuscation

Some commercial applications desire to obfuscate some of their string literals, so that (for instance) running the Unix strings command on their binary does not reveal potentially-sensitive information, such as features the customer has not paid for, or diagnostic messages which are specific to another customer. With a literal operator template, this is possible without disrupting the flow or readability of the client code.

template<typename CharT> struct encoded_string {
  operator std::basic_string<CharT>() { /* ... decode ... */ }
  // ...
};
namespace {
  template<typename CharT, CharT ...String> struct encode {
    static constexpr CharT data[] = { static_cast<CharT>(String ^ 0xa3)..., 0 };
  };
  template<typename CharT, CharT ...String> const CharT encode<CharT, String...>::data[];
  template<typename CharT, CharT ...String> static encoded_string<CharT> operator""_hidden() {
    return encode<CharT, String...>::data;
  }
}

void report_secret_thing() {
  my_ostream << "secret thing happened"_hidden << std::endl;
}
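
The encode and decode steps elided above might look as follows. This is a sketch under stated assumptions: it handles narrow strings only, the names xor_key, xor_unit and decode_hidden are hypothetical, and the pack is written out by hand in place of the proposed operator. The decoder relies on the pack size rather than a terminator, because a code unit equal to the key would encode to zero.

```cpp
#include <cstddef>
#include <string>

// Same key as in the example above.
constexpr unsigned char xor_key = 0xa3;

// XOR one code unit with the key; applying it twice restores the input.
constexpr char xor_unit(char c) {
  return static_cast<char>(static_cast<unsigned char>(c) ^ xor_key);
}

// Stand-in for the proposed operator plus the decoding conversion.
template<char ...String>
std::string decode_hidden() {
  // The array holds the XOR-ed code units rather than the plaintext.
  static constexpr char encoded[] = { xor_unit(String)..., '\0' };
  std::string plain;
  for (std::size_t i = 0; i != sizeof...(String); ++i)
    plain.push_back(xor_unit(encoded[i]));
  return plain;
}
```

For instance, decode_hidden<'h', 'i'>() stores two encoded code units and rebuilds the original text on each call.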

String interning

Access to the contents of a string literal as a template parameter pack allows string data to be deduplicated during translation, which in turn permits value comparisons to be performed rapidly by comparing the addresses of strings. Without literal operator templates, this requires either runtime overhead to perform the interning, or for the programmer to explicitly construct an object to hold the canonical value of a string. These costs can be avoided with a literal operator template:

std::map<std::string, const char*> intern_map;
template<char ...String> struct register_intern {
  static constexpr char intern[] = { String..., 0 };
  static register_intern register_;
  register_intern() { intern_map[intern] = intern; }
};
template<char ...String> register_intern<String...> register_intern<String...>::register_;

template<typename CharT, CharT ...String> constexpr const char *operator""_intern() {
  static_assert(std::is_same<CharT, char>(), "can only intern narrow strings");
  return (&register_intern<String...>::register_,
          register_intern<String...>::intern);
}

static_assert("foo"_intern == "foo"_intern, "");
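
The address identity that this example relies on can be checked in C++11 with an explicit pack. The sketch below leaves out the runtime intern_map registration and keeps only the deduplication; the names interned, foo_a and foo_b are hypothetical.

```cpp
#include <cstddef>

// One array object exists per distinct character pack.
template<char ...String> struct interned {
  static constexpr char value[sizeof...(String) + 1] = { String..., '\0' };
};
// Out-of-class definition, needed before C++17 when the member is odr-used.
template<char ...String>
constexpr char interned<String...>::value[sizeof...(String) + 1];

// Two separate uses of the same pack name the same object.
constexpr const char *foo_a = interned<'f', 'o', 'o'>::value;
constexpr const char *foo_b = interned<'f', 'o', 'o'>::value;
static_assert(foo_a == foo_b, "identical strings share one address");
```

Different packs name different specializations, and so different objects, which is what allows value comparison to be replaced by pointer comparison.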

String literal canonicalization

Qt defines the macros SIGNAL and SLOT, which encode a method signature in order to allow it to be dynamically invoked:

#define SIGNAL(x) "1" #x
#define SLOT(x) "2" #x
// ...
  QObject::connect(sender, SIGNAL(thingHappened(int)),
                   receiver, SLOT(onThingHappened(int)));

Before the results of the SIGNAL and SLOT macros can be used, they must first be canonicalized (by removing spaces, canonicalizing the location of the const keyword, and so on). With a literal operator template, this canonicalization can be performed during translation.

#define SIGNAL(x) #x ## _qt_signal
#define SLOT(x) #x ## _qt_slot
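
As an illustration of canonicalization during translation, the following C++11 sketch removes spaces from a character pack. It covers only that one step (it does not move const or perform any other normalization), and the names chars and strip_spaces are hypothetical rather than taken from Qt.

```cpp
#include <type_traits>

// A compile-time list of characters.
template<char ...Cs> struct chars {};

// strip_spaces<chars<Out...>, In...>: copy In to Out, dropping spaces.
template<typename Out, char ...In> struct strip_spaces;

// No input left: the accumulated output is the result.
template<char ...Out>
struct strip_spaces<chars<Out...>> { using type = chars<Out...>; };

// Ordinary character: append it to the output.
template<char ...Out, char C, char ...In>
struct strip_spaces<chars<Out...>, C, In...>
  : strip_spaces<chars<Out..., C>, In...> {};

// Space: skip it (more specialized than the case above).
template<char ...Out, char ...In>
struct strip_spaces<chars<Out...>, ' ', In...>
  : strip_spaces<chars<Out...>, In...> {};

// "f( int )" written as a pack canonicalizes to "f(int)".
using canonical =
  strip_spaces<chars<>, 'f', '(', ' ', 'i', 'n', 't', ' ', ')'>::type;
static_assert(
  std::is_same<canonical, chars<'f', '(', 'i', 'n', 't', ')'>>::value,
  "spaces removed during translation");
```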

Proposal

Add a new form of literal operator template for a cooked string literal:

template<typename CharT, CharT ...String>

This form will be used if a non-template literal operator for the string literal is not available. The first template argument will be the element type of the string, and the remaining arguments are the code units in the string literal (excluding its terminating null character).
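
The shape of the template arguments can be shown with a class template that has the same template parameter list. This is a sketch: literal_info is a hypothetical stand-in, not the proposed operator, and the two aliases spell out by hand the arguments that "ab" and u"ab" would produce under this proposal.

```cpp
#include <cstddef>
#include <type_traits>

// Same template parameter list as the proposed operator form.
template<typename CharT, CharT ...String>
struct literal_info {
  using element_type = CharT;
  // The terminating null character is not part of the pack.
  static constexpr std::size_t length = sizeof...(String);
};

// Arguments corresponding to "ab" and u"ab" respectively.
using narrow = literal_info<char, 'a', 'b'>;
using utf16 = literal_info<char16_t, u'a', u'b'>;

static_assert(narrow::length == 2, "two code units, no terminator");
static_assert(std::is_same<utf16::element_type, char16_t>::value,
              "first argument is the element type");
```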

Raw forms

N2750 expressed a concern that users may wish to use a raw form of string literal. The form proposed herein is a cooked literal operator; no raw form is proposed. If users wish to capture the contents of a string literal as written, a literal operator template can be combined with a raw string literal:

R"(.*\.\(org\|com\|net\))"_regexp
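
The difference between the two spellings is visible in standard C++11 without any literal operator. In this small sketch (the variable names are illustrative only), the cooked literal contains a tab character, while the raw literal keeps the backslash and the letter t as separate code units, which is the sequence a literal operator template would receive under this proposal.

```cpp
#include <cstddef>

// Cooked: a, TAB, b.
constexpr std::size_t cooked_units = sizeof("a\tb") - 1;
// Raw: a, backslash, t, b.
constexpr std::size_t raw_units = sizeof(R"(a\tb)") - 1;

static_assert(cooked_units == 3, "escape sequence is translated");
static_assert(raw_units == 4, "raw literal keeps the backslash");
static_assert(R"(a\tb)"[1] == '\\', "second code unit is the backslash");
```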

Character literals

No literal operator template is proposed for character literals. The author does not wish to encourage the use of multi-byte character literals, and for single-byte character literals, the feature would have extremely limited utility. Indeed, no use cases are known for this feature, and any possible cases could be addressed by using a string literal instead of a character literal.

Proposed wording

The term of art literal operator template is split into two terms, numeric literal operator template and string literal operator template. The term literal operator template is retained and refers to either form.

Replace literal operator template with numeric literal operator template in [lex.ext] (2.14.8)/3 and [lex.ext] (2.14.8)/4:

[...] Otherwise, S shall contain a raw literal operator or a numeric literal operator template (13.5.8) but not both. [...] Otherwise (S contains a numeric literal operator template), L is treated as a call of the form [...]

Change in [lex.ext] (2.14.8)/5:

If L is a user-defined-string-literal, let C be the element type of the string literal as determined by its encoding-prefix, let str be the literal without its ud-suffix, and let len be the number of code units in str (i.e., its length excluding the terminating null character). If S contains a literal operator with parameter types const C * and std::size_t, the literal L is treated as a call of the form

  operator "" X(str, len)

Otherwise, S shall contain a string literal operator template (13.5.8), and L is treated as a call of the form

  operator "" X<C, e's1', e's2', ... e'sk'>()

where e is empty when the encoding-prefix is u8 and is otherwise the encoding-prefix of the string literal, and str contains the sequence of code units s1s2...sk (excluding the terminating null character).

Change in [over.literal] (13.5.8)/5:

A numeric literal operator template is a literal operator template whose template-parameter-list has a single template-parameter that is a non-type template parameter pack (14.5.3) with element type char. A string literal operator template is a literal operator template whose template-parameter-list comprises a type template-parameter C followed by a non-type template parameter pack with element type C. The declaration of a literal operator template shall have an empty parameter-declaration-clause and shall declare either a numeric literal operator template or a string literal operator template.
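
For comparison, the numeric form is already valid in C++11. In the sketch below the suffix _digits is chosen only for illustration; the operator has a single pack of char and an empty parameter list, and it receives the source characters of a numeric literal. The string form described above differs in having a leading type parameter that is also the element type of the pack.

```cpp
#include <cstddef>

// Numeric literal operator template: one non-type pack of char.
template<char ...Digits>
constexpr std::size_t operator""_digits() { return sizeof...(Digits); }

// 123_digits is treated as operator""_digits<'1', '2', '3'>().
static_assert(123_digits == 3, "three source characters");
// The decimal point is one of the source characters.
static_assert(1.5_digits == 3, "three source characters");
```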