<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 08/03/2018 02:00 PM, Dave Abrahams
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <br class="">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">On Aug 2, 2018, at 10:26 PM, Tom Honermann &lt;<a
              href="mailto:tom@honermann.net" class=""
              moz-do-not-send="true">tom@honermann.net</a>&gt; wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <div class="moz-cite-prefix">Thank you Michael and Dave! 
                I appreciate the time and detail.  All of your answers
                look to confirm our expectations, so I interpret this as
                a good sign we're thinking about the right things.<br
                  class="">
                <br class="">
                I added a few inline comments/clarifications below.<br
                  class="">
                <br class="">
                We had tentatively planned to meet Wednesday of next
                week, but it turns out that two of our core SG16 members
                are going to be on vacation so, at a minimum, I'd like
                to postpone.  I'm also feeling pretty content with the
                responses that we got from you and I think it would
                suffice for us to just follow up with any remaining
                thoughts via email.  While I'd love for any of you to
                attend one (or more) of our meetings (any time), I want
                to be sensitive to productive use of your time.  So, how
                about we play it by ear for now?<br class="">
              </div>
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        Works for me</div>
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <div class="moz-cite-prefix"> <br class="">
                On 08/02/2018 05:18 PM, Dave Abrahams wrote:<br class="">
              </div>
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <br class="">
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">On Aug 1, 2018, at 12:04 PM, Michael
                      Ilseman &lt;<a href="mailto:milseman@apple.com"
                        class="" moz-do-not-send="true">milseman@apple.com</a>&gt;
                      wrote:</div>
                    <br class="Apple-interchange-newline">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">Hello, I am the current maintainer
                          of Swift’s String, and can speak to my
                          thoughts on the status quo and future
                          directions. Dave, who is on this thread, is
                          much more familiar with the history behind
                          this and can likely provide deeper insight
                          into the reasoning.</div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  Michael has done very well here; I only have a few
                  things to add.</div>
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">On Jul 23, 2018, at 7:39
                                    PM, Tom Honermann &lt;<a
                                      href="mailto:tom@honermann.net"
                                      class="" moz-do-not-send="true">tom@honermann.net</a>&gt;
                                    wrote:<br class="">
                                    <font class="" color="#00c8fa"><br
                                        class="">
                                    </font>SG16 is seeking input from
                                    Swift and WebKit representatives to
                                    help inform our work towards
                                    enhancing support for Unicode in the
                                    C++ standard.  In particular, we
                                    recognize the significant amount of
                                    effort that went into the design of
                                    the Swift String type and would like
                                    to better understand the motivations
                                    that contributed to its current
                                    design and any pressures that might
                                    encourage further evolution or
                                    refinement; especially for any
                                    concerns that would be deemed
                                    significant enough to warrant
                                    backward incompatible changes.<br
                                      class="">
                                    Though most of these questions
                                    specifically mention Swift, that is
                                    an artifact of our being more
                                    familiar with Swift than the
                                    internal workings of WebKit.  Many
                                    of these questions would be
                                    applicable to any string type
                                    designed to support Unicode.  We are
                                    therefore also interested in hearing
                                    about the string types used by
                                    WebKit, the motivations that guided
                                    their design, and the trade-offs
                                    that have been made.  Of particular
                                    interest would be the results of
                                    design decisions that contrast
                                    with the design of Swift's String
                                    type.<br class="">
                                    Thank you in advance for any time
                                    and expertise you are willing and
                                    able to share with us.<br class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">The
                                          Swift string manifesto is
                                          about 1 1/2 years old. What
                                          have you learned since writing
                                          it?  What would you change? 
                                          What have you changed?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>We haven’t really diverged from
                              that manifesto. Some things are still in
                              progress, minor details were tweaked, but
                              the core arguments are still relevant.</div>
                            <div class=""><br class="">
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class=""><br
                                            class="">
                                          Swift strings are extended
                                          grapheme cluster (EGC) based. 
                                          What have been the best and
                                          worst consequences of this
                                          choice?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>I’ll use “grapheme” casually to
                              mean EGC. Swift’s Character type
                              represents a grapheme cluster,
                              Unicode.Scalar represents a Unicode scalar
                              value (non-surrogate code point).<br
                                class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Cocoa APIs are UTF-16 code unit
                              oriented, and thus there’s always caution
                              (via documentation) about making sure such
                              indices align to grapheme boundaries. This
                              is a frequent source of bugs, especially
                              as part of internationalization. By making
                              Swift strings be grapheme-based by
                              default, developers first reach for the
                              correct APIs.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Another good consequence is that
                              people picking up Swift and playing with
                              strings, e.g. in a REPL or Playground, see
                              that Swift’s notion of characters aligns with
                              what is displayed. This includes complex
                              multi-component emoji such as family emoji
                              (👨‍👨‍👧‍👧), which is a single Character
                              composed of 7 Unicode.Scalars.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>This does have downsides. What is
                              and is not a grapheme cluster changes with
                              each version of Unicode, and thus grapheme
                              breaking is inherently a run-time concern
                              and can’t be checked at compile time.
                              Another is that while code units can be
                              random-access, graphemes cannot, which is
                              confusing to developers used to UTF-16
                              code unit access mostly working (until
                              their users use non-BMP scalars or emoji,
                              that is). </div>
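                            <div class=""><br class="">
                              A minimal sketch of the family emoji example
                              above, assuming a Swift 4 or later toolchain.
                              The variable names are illustrative, and the
                              emoji is spelled with scalar escapes so the
                              sequence is unambiguous.</div>
<pre class="">
```swift
// Family emoji spelled as escapes: four person scalars joined by three
// ZERO WIDTH JOINER (U+200D) scalars. Variable names are illustrative.
let family = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}\u{200D}\u{1F467}"

let characterCount = family.count              // extended grapheme clusters
let scalarCount = family.unicodeScalars.count  // Unicode scalar values
let utf16Count = family.utf16.count            // UTF-16 code units
let utf8Count = family.utf8.count              // UTF-8 code units

print(characterCount, scalarCount, utf16Count, utf8Count)
```
</pre>
                            <div class="">The scalar, UTF-16, and UTF-8
                              counts (7, 11, and 25) are fixed by the
                              encoding. The Character count of 1 depends on
                              the grapheme-breaking rules available at run
                              time, which is the version dependence noted
                              above.</div>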
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  <div class="">I'd say the biggest downside is that
                    there are users who simply refuse to accept what we
                    consider to be the fundamental non-random-access
                    character of any efficient string representation.
                     They are upset that they can't index a string
                    directly with an integer, and can't be talked out of
                    it.  I still think we made the right decision in
                    this regard; you'd have the same problem if your
                    strings were unicode-scalar-based.</div>
                </div>
              </blockquote>
              <br class="">
              Are there common scenarios where programmers tend to be
              frustrated by lack of random access?  Perhaps most often
              when they are working with inputs known to be ASCII only? 
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        Those people can just use the UTF-16 or UTF-8 views and be done.</div>
    </blockquote>
    <br>
    I think I may have misunderstood Michael's initial response.  The
    concern is less about (O(1)) random access and more about the
    ability to index with an integer rather than having to use
    String.Index.  Though, that is the case for String.UTF8View and
    String.UTF16View as well, isn't it?<br>
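    <br>
    A minimal sketch of what I mean, assuming Swift 4 or later (the
    variable names are illustrative).  Both the Character view and the
    UTF-8 view are subscripted with a String.Index obtained by offsetting
    from startIndex; an Int subscript such as word[3] does not compile for
    either view.<br>
<pre>
```swift
let word = "cafe\u{301}"   // "café" with a decomposed final character

// Character view: advance a String.Index instead of writing word[3].
let lastCharacter = word[word.index(word.startIndex, offsetBy: 3)]

// UTF-8 view: same String.Index style, even though the elements are bytes.
let bytes = word.utf8
let thirdByte = bytes[bytes.index(bytes.startIndex, offsetBy: 2)]

print(lastCharacter, thirdByte)
```
</pre>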
    <br>
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">Or is this
              mostly an education issue and these programmers are having
              a difficult time accepting that they've spent most of
              their career thus far writing bugs? :)<br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        IMO it's a combination of the latter and the fact that we don't
        yet have good APIs for the higher-level operations they really
        mean when they want to write code that involves (usually
        constant) integer indices, which is usually pattern
        matching/parsing code.</div>
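      <div class=""><br class="">
        A minimal sketch of the kind of parsing code in question, written
        without integer indices.  It assumes Swift 4.2 or later for
        firstIndex(of:), and splitKeyValue is a hypothetical helper, not
        an existing API.</div>
<pre class="">
```swift
// Hypothetical helper: split "key=value" at the first "=".
func splitKeyValue(_ line: String) -> (key: Substring, value: Substring)? {
    // firstIndex(of:) returns a String.Index, so no integer offset is needed.
    guard let separator = line.firstIndex(of: "=") else { return nil }
    let key = line.prefix(upTo: separator)
    let value = line.suffix(from: line.index(after: separator))
    return (key: key, value: value)
}

if let pair = splitKeyValue("charset=utf-8") {
    print(pair.key, pair.value)
}
```
</pre>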
    </blockquote>
    <br>
    Ok, that makes sense and I think aligns with my new understanding
    above.<br>
    <br>
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> <br
                class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class="">Furthermore, few existing
                              specifications are phrased in terms of
                              grapheme clusters, so something like a
                              validator wouldn’t want to run on
                              grapheme-segmented text, but at a lower
                              abstraction level.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Also, graphemes can be funky. A
                              string containing only U+0301 (COMBINING
                              ACUTE ACCENT) has one grapheme, but it
                              modifies the prior grapheme upon
                              concatenation. Such degenerate graphemes
                              violate algebraic reasoning in these
                              corner cases. </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  <div class="">We are not aware of generic algorithms
                    that rely on concatenation of collections conserving
                    element counts, so we decided to simply document
                    this quirk rather than saying that string is-not-a
                    collection.</div>
                </div>
              </blockquote>
              <br class="">
              SG16 has previously discussed cases like this and I'm
              happy to hear you haven't had to do anything special for
              it.  This is a good example of why we asked about
              inappropriate use of the String count property:
              programmers assuming s1.count + s2.count ==
              (s1 + s2).count.<br class="">
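              <br class="">
              A minimal sketch of that corner case, assuming Swift 4 or
              later (the variable names are illustrative): a lone
              combining accent counts as one Character, but it merges
              with the preceding Character on concatenation.<br class="">
<pre class="">
```swift
let base = "e"
let accent = "\u{301}"          // COMBINING ACUTE ACCENT on its own
let combined = base + accent    // renders as a single accented "e"

let sumOfCounts = base.count + accent.count   // each operand counted alone
let countOfSum = combined.count               // count after concatenation

print(sumOfCounts, countOfSum)
```
</pre>
              Here the sum of the counts is 2 while the count of the
              concatenation is 1.<br class="">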
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class="">Unicode defines properties and
                              most operations on scalars or code points,
                              and very little on top of graphemes.<br
                                class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">When
                                          porting code unit or code
                                          point based code to Swift
                                          strings (e.g., when rewriting
                                          Objective-C code, or rewriting
                                          Swift code to use String
                                          instead of NSString), has
                                          profiling revealed performance
                                          regressions due to the switch
                                          to EGC based processing?  If
                                          so, what action was taken to
                                          correct it?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>We have many fast-paths in
                              grapheme-breaking to identify common
                              situations surrounding single-scalar
                              graphemes. If a developer wants to work
                              with Unicode at a lower level, String
                              provides a UTF8View, a UTF16View, and a
                              UnicodeScalarView. Those views lazily
                              transcode/decode upon access.<br class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
              <br class="">
              Cool, it sounds like the answer to any such regressions
              was 1) optimization in terms of fast-paths, and 2) fall
              back to code unit/point processing otherwise.<br class="">
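              <br class="">
              As a minimal sketch of that fallback, assuming Swift 4.2 or
              later for allSatisfy: a validator that works on the UTF-8
              view and never performs grapheme breaking.  The function
              isASCIIDigits is hypothetical and only for illustration.<br
                class="">
<pre class="">
```swift
// Hypothetical validator: true only for a non-empty run of ASCII digits.
func isASCIIDigits(_ text: String) -> Bool {
    guard !text.isEmpty else { return false }
    let digits = UInt8(ascii: "0")...UInt8(ascii: "9")
    // Work on UTF-8 code units; no grapheme breaking is involved.
    return text.utf8.allSatisfy { digits.contains($0) }
}

print(isASCIIDigits("2018"), isASCIIDigits("1\u{301}"))
```
</pre>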
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>There are also performance concerns
                              and annoyances when working with ICU, but
                              this is an implementation detail. If
                              you’re interested in using ICU, we can
                              discuss further what has worked best for
                              us.<br class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  I think you're interested in (at least optionally)
                  using ICU unless you have evidence of major investment
                  in another open-source implementation of Unicode
                  algorithms and tables.  Otherwise, C++ implementors
                  could not afford to develop standard libraries.</div>
              </blockquote>
              <br class="">
              Yes, definitely.  For the foreseeable future, I think we
              need to ensure that any interfaces we propose can be
              reasonably implemented using ICU.  However, Zach Laine has
              made impressive progress implementing many of the Unicode
              algorithms without use of ICU in his proposed Boost.Text
              library.  See <a class="moz-txt-link-freetext"
                href="https://github.com/tzlaine/text"
                moz-do-not-send="true">https://github.com/tzlaine/text</a>
              and <a class="moz-txt-link-freetext"
                href="https://tzlaine.github.io/text/doc/html/index.html"
                moz-do-not-send="true">https://tzlaine.github.io/text/doc/html/index.html</a>.<br
                class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        W00t!  Go Zach!<br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class=""><br
                                            class="">
                                          Swift strings do not enforce
                                          storage in any particular
                                          Unicode normalization form. 
                                          Was consideration given to
                                          forcing storage in a
                                          particular form such as FCC or
                                          NFC?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Swift strings now sort with NFC
                              (currently UTF-16 code unit order, but
                              likely to change to Unicode scalar value
                              order). We didn’t find FCC significantly
                              more compelling in practice. Since NFC is
                              far more frequent in the wild (why waste
                              space if you don’t have to), strings are
                              likely to already be in NFC. We have
                              fast-paths to detect already-normalized
                              sections of strings on the fly (e.g. all
                              ASCII, all &lt; U+0300, NFC_QC=yes, etc.). We
                              lazily normalize portions of a string during
                              comparison when needed.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>As far as enforcing on creation,
                              no. We do want to add an option to perform
                              a linear scan to set a performance flag,
                              perhaps at creation, so that comparison
                              can take the memcmp-like fast-path.<br
                                class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
              <br class="">
              Ok, my takeaway from this is that fast-pathing has been
              sufficient for lazy normalization (when needed) to not be
              (much of) a performance concern.  At least, not enough to
              want to take the normalization cost on every string
              construction up front.<br class="">
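              <br class="">
              A minimal sketch of one such fast-path check, assuming
              Swift 4.2 or later.  This is not the standard library's
              implementation; isTriviallyNFC is a hypothetical name.  It
              relies on the observation quoted above that text whose
              scalars are all below U+0300 is already in NFC.<br class="">
<pre class="">
```swift
// Hypothetical pre-check, not the standard library's code.
// true  means every scalar is below U+0300, so the text is already NFC.
// false means only that the slower normalizing path is required.
func isTriviallyNFC(_ text: String) -> Bool {
    return text.unicodeScalars.allSatisfy { 0x300 > $0.value }
}

print(isTriviallyNFC("caf\u{E9}"), isTriviallyNFC("cafe\u{301}"))
```
</pre>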
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Swift
                                          strings support comparison via
                                          normalization.  Has use of
                                          canonical string equality been
                                          a performance issue?  Or been
                                          a source of surprise to
                                          programmers?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>This was a big performance issue on
                              Linux, where we used to do UCA+DUCET based
                              comparisons. We switched to lexicographical
                              order of NFC-normalized UTF-16 code units
                              (future: scalar values), and saw a very
                              significant speed up there. The remaining
                              performance work revolves around checking
                              and tracking whether a string is known to
                              already be in a normal form, so we can
                              just memcmp.<br class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
              <br class="">
              This is very helpful, thank you.  We've suspected that
              full collation (with or without tailoring) would be too
              expensive for use as a default comparison operator, so it
              is good to hear that confirmed.<br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        More importantly, such collation is not actually useful without
        a locale.  Strings being used for machine processing don't need
        to be ordered according to "human rules" and once human rules do
        come into play you want to account for language/region.  We
        think it <i class="">is</i> important that the machine doesn't
        distinguish between the different ways of writing "é", if
        nothing else to prevent invisible distinctions in literals in
        source code, which is why we normalize.</div>
    </blockquote>
    <br>
    That makes perfect sense.<br>
    <br>
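    To make the "é" case concrete, here is a rough sketch in Python
    (chosen only because its standard library ships unicodedata; the
    helper names are mine and this is not Swift's implementation).  It
    compares by NFC normalization and orders by the UTF-16 code units of
    the normalized form, as Michael described:<br>
<pre>
```python
import unicodedata


def canonically_equal(a, b):
    """Equality after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfc_utf16_key(s):
    """Sort key: big-endian UTF-16 bytes of the NFC form.

    Comparing big-endian bytes lexicographically gives the same order as
    comparing the 16-bit code units lexicographically.
    """
    return unicodedata.normalize("NFC", s).encode("utf-16-be")


precomposed = "\u00e9"   # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True

words = ["e\u0301b", "\u00e9a"]
# Raw code point order puts the decomposed spelling first ("e" is U+0065).
print(sorted(words) == ["e\u0301b", "\u00e9a"])                     # True
# After normalization both start with U+00E9, so "a" sorts before "b".
print(sorted(words, key=nfc_utf16_key) == ["\u00e9a", "e\u0301b"])  # True
```
</pre>
    <br>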
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> <br
                class="">
              I'm curious why this was a larger performance issue for
              Linux than for (presumably) macOS and/or iOS.<br class="">
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Swift
                                          strings are not locale
                                          sensitive.  Was any
                                          consideration given to
                                          creation of a distinct locale
                                          sensitive string type?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>This is still up for debate and
                              hasn’t been settled yet, but we think it
                              makes a lot of sense. If an array of
                              strings is sorted, we certainly don’t want
                              a locale-change to violate programmer
                              invariants. A distinct type from string
                              could avoid a lot of common errors here,
                              including forgetting to localize before
                              presenting to a user as part of a UI.<br
                                class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Swift
                                          strings provide a count
                                          property as required to
                                          satisfy the Collection
                                          protocol.  How often do
                                          programmers use count (the
                                          number of EGCs in the string)
                                          inappropriately?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>I’m not sure what would constitute
                              inappropriate usage here. We do not
                              currently provide access to the underlying
                              stored code units, though this is a
                              frequent request and we likely will in the
                              future. I haven’t seen anyone baking in
                              the assumption that count is the same for
                              String and across all of String’s views
                              (UTF-8, UTF-16, Unicode scalars).<br
                                class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                </div>
                <div class="">One thing to consider is that as long as
                  String is not random-access, count will be a
                  worst-case O(N) operation.  An inappropriate usage
                  might involve computing the length once per loop
                  iteration.</div>
              </blockquote>
              <br class="">
              In addition to the above and prior mention of algebraic
              concerns, other potential abuses we had in mind were using
              it to determine field widths for display or code
              unit/point based storage.<br class="">
              <br class="">
              C++ container requirements specify that .size() be O(1). 
              For us to meet container requirements would require
              computing and caching the count during construction and
              mutation operations.  </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        You could also just not supply .size().  I don't know if .size()
        is required by container these days, but unless things have
        changed since I was watching (and I'm sure they have) the
        container concepts were not actually useful for generic
        programming.</div>
    </blockquote>
    <br>
    .size() is required for containers, but is not required for ranges. 
    The ranges TS provides concepts for both sized and non-sized ranges.<br>
    <br>
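    As an example of the storage concern: the counts at the different
    levels diverge quickly, so an EGC count says little about how much
    code unit or code point storage is needed.  A small Python sketch
    (Python's len() counts code points; its standard library has no
    grapheme segmentation, so the EGC count is only noted in a comment):<br>
<pre>
```python
# "e" + COMBINING ACUTE ACCENT, followed by one emoji (U+1F600).
# A reader sees 2 extended grapheme clusters here.
text = "e\u0301\U0001F600"

scalars = len(text)                                # code points
utf8_units = len(text.encode("utf-8"))             # 8-bit code units
utf16_units = len(text.encode("utf-16-le")) // 2   # 16-bit code units

print(scalars)      # 3
print(utf8_units)   # 7  (1 + 2 + 4 bytes)
print(utf16_units)  # 4  (1 + 1 + a surrogate pair)
```
</pre>
    <br>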
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class="">We could
              potentially get by just meeting range requirements though.<br
                class="">
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class="">I mentioned degenerate
                              graphemes breaking algebraic properties of
                              the Collection protocol, but this hasn’t
                              been a huge issue in practice so far.<br
                                class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class=""><br
                                            class="">
                                          Swift strings support several
                                          memory unsafe initializers and
                                          methods.  How frequently are
                                          these used incorrectly?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Many of these initializers come
                              from NSString originally, and developers
                              migrating correct code to Swift maintain
                              that correctness. Rust has a similar
                              situation, though they do validation at
                              creation-time and from_utf8_unchecked()
                              voids memory-safety if the contents are
                              invalid.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">The
                                          Swift manifesto discussed
                                          three approaches to handling
                                          substrings and Swift 4 changed
                                          from "same type, shared
                                          storage" to "different type,
                                          shared storage".  Any regrets?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Having two types can be a bit of a
                              pain, but we still think it was the right
                              thing to do. This is consistent with Swift
                              treating slices as a distinct type from
                              the base collection.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class=""><br
                                            class="">
                                          How often do you find
                                          programmers doing work at the
                                          EGC level that would be better
                                          performed at the code unit or
                                          code point level?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Often, if a developer has strict
                              requirements, they know what they’re doing well
                              enough to operate at one of those lower
                              levels.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Not being able to random-access
                              graphemes in a string is a common source
                              of frustration and confusion amongst new
                              users.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Likewise,
                                          how often do you find
                                          programmers working with
                                          unicodeScalars, utf8, or utf16
                                          views to do work better
                                          performed at the EGC level? 
                                          For what reasons does this
                                          occur?  Perhaps to work around
                                          differences in EGC boundaries
                                          across Unicode versions or the
                                          underlying version of ICU in
                                          use?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>This was very prevalent in Swift’s
                              early days. String wasn’t a collection of
                              graphemes by default prior to Swift 4,</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div class=""><br class="">
                  </div>
                  Well, it was.  And then in Swift 2 or 3 it wasn't, due
                  to the algebraic reasoning issue.  Now it is again.</div>
                <div class=""><br class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""> so without guidance many
                              developers wrote code against the unicode
                              scalars view. We also didn’t have any
                              fast-paths for common-case situations back
                              then, which further encouraged them to use
                              one of the other views.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>This is still done sometimes for
                              performance-sensitive usage, or by someone
                              wanting to handle Unicode themselves.
                              However, as mentioned previously, we don’t
                              (yet) provide direct access to the actual
                              storage.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>We haven’t seen much desire for
                              reconciling behavior across Unicode
                              versions. This may be due to Swift being
                              primarily an application-level
                              programming language for devices which
                              only have one version of Unicode that’s
                              relevant (the current one).<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Has
                                          consideration been given to
                                          exposing Unicode character
                                          database properties?
                                          CharacterSet exposes some of
                                          these properties, but have
                                          more been requested?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>Yes, this was recently added to the
                              language: <a
href="https://github.com/apple/swift-evolution/blob/master/proposals/0211-unicode-scalar-properties.md"
                                class="" moz-do-not-send="true">https://github.com/apple/swift-evolution/blob/master/proposals/0211-unicode-scalar-properties.md</a>.
                              We surface much of the UCD via ICU.<br
                                class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
              <br class="">
              Ah, nice.  All kinds of fun to be had with that :)<br
                class="">
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">How
                                          firmly is the Swift string
                                          implementation tied to ICU? 
                                          If the C++ standard library
                                          were to add suitable Unicode
                                          support, what would motivate
                                          reimplementing Swift strings
                                          on top of it?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <div class=""><br class="">
                              </div>
                              Swift’s tie to ICU is less firm than it
                              used to be. We use ICU for the following:<br
                                class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>1. Grapheme breaking<br class="">
                              2. Normalization<br class="">
                              3. Accessing UCD properties<br class="">
                              4. Case conversion<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>None of these is too tightly
                              entwined with string; they’re cordoned-off
                              as a couple of shims called on fallback
                              slow-paths.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>If the C++ standard library
                              provided these operations, sufficiently
                              up-to-date with Unicode version and
                              comparable to or better than ICU in
                              performance, we would be willing to
                              switch. A big pain in interacting with ICU
                              is its limited support for UTF-8. Some
                              users would like to use a
                              “lighter-weight” Swift and are unhappy at
                              having to link against ICU, as it’s fairly
                              large, and it can complicate security
                              audits.<br class="">
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
              <br class="">
              Got it.  Increasing the size of the C++ standard library
              is a definite concern for us as well.  We imagine some C++
              users would be similarly unhappy if their standard library
              suddenly required linking against ICU.<br class="">
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class="">
                  <blockquote type="cite" class="">
                    <div class="">
                      <div style="word-wrap: break-word;
                        -webkit-nbsp-mode: space; line-break:
                        after-white-space;" class="">
                        <div class="">
                          <div class="">
                            <div class=""><font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">Do
                                          Swift programmers tend to
                                          prefer string interpolation or
                                          string formatting functions?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <div class=""><br class="">
                              </div>
                              Users tend to prefer string interpolation.
                              However, Swift currently does not have
                              much in the way of formatting control in
                              interpolations, and this is something
                              we’re currently working on.<br class="">
                              <font class="" color="#8886ff"><br
                                  class="">
                              </font>
                              <blockquote type="cite" class="">
                                <div class="" style="word-wrap:
                                  break-word; -webkit-nbsp-mode: space;
                                  line-break: after-white-space;">
                                  <div class="">
                                    <blockquote type="cite" class="">
                                      <div class="">
                                        <div text="#000000"
                                          bgcolor="#FFFFFF" class="">What
                                          enhancements would you most
                                          like to see in C++ to improve
                                          Unicode support?</div>
                                      </div>
                                    </blockquote>
                                  </div>
                                </div>
                              </blockquote>
                              <div class=""><br class="">
                              </div>
                              Swift’s string is perhaps geared as a
                              higher-level construct than what you may
                              want for C++, and Swift has
                              Cocoa-interoperability concerns where
                              everything is UTF-16. Rust might provide a
                              closer model to what you’re looking for:<br
                                class="">
                            </div>
                          </div>
                          <div class=""><br class="">
                          </div>
                          <div class="">
                            <ul class="MailOutline">
                              <li class="">Strings are a sequence of
                                (valid) UTF-8 code units</li>
                              <ul class="">
                                <li class="">Validation is done on
                                  creation</li>
                                <li class="">Invalid contents (e.g.
                                  Windows file paths) can be handled via
                                  something like WTF-8, which is not
                                  intended for interchange</li>
                              </ul>
                            </ul>
                          </div>
                          <div class="">
                            <ul class="MailOutline">
                              <li class="">String provides bidirectional
                                iterators for:</li>
                              <ul class="">
                                <li class="">Transcoded and/or
                                  normalized code units</li>
                                <li class="">Unicode scalar values
                                  (their “character” type)</li>
                                <li class="">Grapheme clusters</li>
                              </ul>
                            </ul>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br class="">
                </div>
                <div class="">Michael, I think you're not answering the
                  question asked.  They are asking what Swift would want
                  from C++, e.g., to allow us to decouple from ICU.
                   Wouldn't we like to be able to do that?</div>
              </blockquote>
              <br class="">
              This question was intended to ask you, as expert C++
              programmers and independent of Swift, what additions to
              C++ you think would be most helpful to improve our (very
              lacking) Unicode support.  So, Michael's response is on
              point (thank you; we'll take a closer look at Rust), as
              are any comments regarding what would benefit Swift
              specifically.  Michael's earlier comments regarding what
              Swift currently uses ICU for are suggestive of what Swift
              might want from C++.  But I imagine the form in which
              those features are provided would matter greatly; devils
              and details.<br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        OK, sorry for the misunderstanding!</div>
    </blockquote>
    <br>
    Not a misunderstanding; the question was just (intentionally, but
    clearly overly) vague :)<br>
    <br>
    Tom.<br>
    <br>
    <blockquote type="cite"
      cite="mid:2D0C499E-0196-415D-AB68-D48578D53057@apple.com">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> <br
                class="">
              Tom.<br class="">
              <br class="">
              <blockquote type="cite"
                cite="mid:A9CC2CEA-2102-4473-93A3-455C4AF66365@apple.com"
                class="">
                <div class=""><br class="">
                </div>
                <div class="">-Dave</div>
                <div class=""><br class="">
                </div>
                <br class="">
              </blockquote>
              <p class=""><br class="">
              </p>
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <p><br>
    </p>
  </body>
</html>