Doc no: P1072R0

Date: 2018-05-04

Reply to: Chris Kennelly (ckennelly@google.com), Mark Zeren (mzeren@vmware.com)

Audience: LEWG/LWG/SG16

Default Initialization for basic_string

Summary

Motivation

Proposal

Option A

Option B

Related Work

Summary

Extend basic_string to allow access to default-initialized elements.

We propose similar changes to vector in P1010R0.

Motivation

Performance-sensitive code is impacted by the cost of manipulating strings: when streaming data into a basic_string, a programmer is forced to choose between extra initialization (resize, then copy directly in) and extra copies (copy into a temporary buffer, then append).
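The two choices can be sketched as follows. ProduceBytes stands in for any producer that writes into a caller-supplied buffer (a decoder, a socket read, and so on); it and the two Append helpers are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical producer that writes exactly n bytes into out.
void ProduceBytes(char* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    out[i] = static_cast<char>('a' + i % 26);
  }
}

// Choice 1, extra initialization: resize() value-initializes the n new
// characters to '\0', and ProduceBytes then overwrites every one of them.
void AppendByResize(std::string& s, std::size_t n) {
  const std::size_t old_size = s.size();
  s.resize(old_size + n);
  ProduceBytes(&s[old_size], n);
}

// Choice 2, extra copy: produce into a temporary buffer (new char[n] leaves
// it uninitialized), then append() copies the same n bytes a second time.
void AppendViaTemporary(std::string& s, std::size_t n) {
  std::unique_ptr<char[]> tmp(new char[n]);
  ProduceBytes(tmp.get(), n);
  s.append(tmp.get(), n);
}
```

Both helpers leave the string with the same contents; each pays for one extra pass over the new bytes, either a zero-fill or a second copy.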

Consider writing a pattern several times into a string:

#include <cstddef>
#include <string>

std::string GeneratePattern(const std::string& pattern, std::size_t count) {
  std::string ret;
  ret.reserve(pattern.size() * count);
  for (std::size_t i = 0; i < count; i++) {
    ret.append(pattern);  // BAD: Extra bookkeeping
  }
  return ret;
}

Alternatively, we could resize the output string to its final size up front, avoiding the bookkeeping in append at the cost of extra initialization:

#include <cstddef>
#include <cstring>
#include <string>

std::string GeneratePattern(const std::string& pattern, std::size_t count) {
  std::string ret;
  const auto step = pattern.size();
  ret.resize(step * count);  // BAD: Extra initialization
  for (std::size_t i = 0; i < count; i++) {
    // GOOD: No bookkeeping
    std::memcpy(ret.data() + i * step, pattern.data(), step);
  }
  return ret;
}

We propose adding an interface to basic_string to avoid this tradeoff:

#include <cstddef>
#include <cstring>
#include <string>

std::string GeneratePattern(const std::string& pattern, std::size_t count) {
  std::string ret;
  const auto step = pattern.size();
  // GOOD: No initialization (proposed member, Option A)
  ret.resize_uninitialized(step * count);
  for (std::size_t i = 0; i < count; i++) {
    // GOOD: No bookkeeping
    std::memcpy(ret.data() + i * step, pattern.data(), step);
  }
  return ret;
}

Google has implemented resize_uninitialized (Option A) in its standard library. It is used in performance-critical sections of code such as:

Proposal

Option A

Add to string.capacity [24.3.2.4]:

void resize_uninitialized(size_type n);

The “null terminator” invariant of basic_string [24.3.2] is unchanged.
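Code can be written against Option A before every library provides it. The following is a minimal sketch, assuming C++17: the hypothetical helper ResizeUninitialized calls the proposed resize_uninitialized member when the string type has one and otherwise falls back to resize(), which value-initializes the new characters. With a standard library that lacks the member, the fallback branch is the one taken. Either way the null terminator is in place afterwards, matching the unchanged invariant above.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Detects a member resize_uninitialized(size_type), as proposed in Option A.
template <typename String, typename = void>
struct HasResizeUninitialized : std::false_type {};

template <typename String>
struct HasResizeUninitialized<
    String,
    std::void_t<decltype(std::declval<String&>().resize_uninitialized(
        std::declval<typename String::size_type>()))>> : std::true_type {};

// Hypothetical helper: sets s.size() to n, skipping initialization of the new
// characters when the proposed member exists, value-initializing otherwise.
template <typename String>
void ResizeUninitialized(String& s, typename String::size_type n) {
  if constexpr (HasResizeUninitialized<String>::value) {
    s.resize_uninitialized(n);  // proposed Option A member
  } else {
    s.resize(n);  // fallback: new characters become '\0'
  }
}

// The GeneratePattern example from the Motivation, written against the helper.
std::string GeneratePattern(const std::string& pattern, std::size_t count) {
  std::string ret;
  const auto step = pattern.size();
  ResizeUninitialized(ret, step * count);
  for (std::size_t i = 0; i < count; i++) {
    std::memcpy(ret.data() + i * step, pattern.data(), step);
  }
  return ret;
}
```

Callers keep one code path; only the helper changes once the member is available.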

Option B

While there is implementation experience for Option A, Option B may be friendlier to hardware memory prefetchers on modern processors, particularly when a long string is filled in order (front to back) through uninitialized_data().

Add to string.accessors [24.3.2.7.1]:

charT* uninitialized_data() noexcept;

Add:

basic_string& insert_from_capacity(size_type n);
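Only the signatures are given for Option B. One reading, which we assume here rather than take from proposed wording, is that uninitialized_data() points just past the last element (data() + size()) and insert_from_capacity(n) adopts the next n characters already written there, growing size() by n and rewriting the null terminator. ToyString below is a hypothetical stand-in, not basic_string, that models that reading so the reserve, write, commit sequence can be run:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>

// Hypothetical toy string modeling one assumed reading of Option B.
// Invariant: buf_ holds capacity_ + 1 chars and buf_[size_] == '\0'.
class ToyString {
 public:
  ToyString() : buf_(new char[1]) { buf_[0] = '\0'; }

  void reserve(std::size_t cap) {
    if (cap <= capacity_) return;
    // new char[] default-initializes: the spare capacity stays uninitialized.
    std::unique_ptr<char[]> bigger(new char[cap + 1]);
    std::memcpy(bigger.get(), buf_.get(), size_ + 1);  // contents + '\0'
    buf_ = std::move(bigger);
    capacity_ = cap;
  }

  // Assumed semantics: start of spare capacity, i.e. data() + size().
  // Writing here overwrites the terminator until insert_from_capacity runs.
  char* uninitialized_data() noexcept { return buf_.get() + size_; }

  // Assumed semantics: treat the next n chars of spare capacity as written,
  // grow size() by n, and restore the null terminator.
  ToyString& insert_from_capacity(std::size_t n) {
    assert(n <= capacity_ - size_);
    size_ += n;
    buf_[size_] = '\0';
    return *this;
  }

  std::string_view view() const { return {buf_.get(), size_}; }

 private:
  std::unique_ptr<char[]> buf_;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};

// GeneratePattern in the Option B style: reserve, write in order, commit.
ToyString GeneratePatternB(std::string_view pattern, std::size_t count) {
  ToyString ret;
  ret.reserve(pattern.size() * count);
  char* out = ret.uninitialized_data();
  for (std::size_t i = 0; i < count; i++) {
    std::memcpy(out + i * pattern.size(), pattern.data(), pattern.size());
  }
  ret.insert_from_capacity(pattern.size() * count);
  return ret;
}
```

All writes go front to back through a single pointer before one commit, which is the in-order access pattern the prefetcher argument for Option B relies on.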

Related Work