Document Number: | P2217R0 |
---|---|
Date: | 2020-08-29 |
Audience: | SG16 |
Reply-to: | Tom Honermann <tom@honermann.net> |
Summaries of SG16 meetings are maintained at https://github.com/sg16-unicode/sg16-meetings. This paper contains a snapshot of select meeting summaries from that repository.
Previously published SG16 meeting summary papers:
SF | F | N | A | SA |
---|---|---|---|---|
3 | 8 | 0 | 0 | 1 |
SF | F | N | A | SA |
---|---|---|---|---|
2 | 5 | 1 | 2 | 0 |
member of a set of elements used for the organization, control, or representation of textual data
Note 1 - A graphic symbol can be represented by a sequence of one or several coded characters.
specified set of characters that are represented in a coded character set
value in the UCS codespace
association between a character and a code point
minimal bit combination that can represent a unit of encoded text for processing or interchange
Note 1 - Examples of code units are octets (8-bit code units) used in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form.
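To make the code unit examples in the note concrete, the following is a minimal sketch that encodes one UCS scalar value as UTF-8 and UTF-16 code units. The helper names `encode_utf8` and `encode_utf16` are hypothetical and are not taken from any paper or library discussed here; both assume the input is already a valid scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one scalar value as UTF-8 code units (octets).
// Assumes cp <= 0x10FFFF and cp is not a surrogate code point.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one scalar value as UTF-16 code units.
// Values above U+FFFF become a high-surrogate/low-surrogate pair.
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}

// Usage: U+1234 needs three 8-bit code units or one 16-bit code unit.
void code_unit_examples() {
    assert(encode_utf8(0x1234).size() == 3);
    assert(encode_utf16(0x1234).size() == 1);
}
```

In UTF-32 every scalar value occupies exactly one 32-bit code unit, so no encoder is sketched for it.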
element of interchanged information that is specified to consist of a sequence of code units, in accordance with one or more identified standards for coded character sets
Note 1 - Such sequence can contain code units associated with any type of code points.
Note 2 - Since its second edition: ISO/IEC 10646:2011, this International Standard does not use implementation levels. Its definition of code unit sequence corresponds to the former unrestricted implementation level 3. Other definitions of code unit sequence, previously known as level 1 and 2, are deprecated. To maintain compatibility with these previous editions, in the context of identification of coded representation in International Standards such as ISO/IEC 8824 and ISO/IEC 8825, the concept of implementation level can still be referenced as ‘Implementation level 3’. See Annex N.
form that determines how each UCS code point for a UCS character is to be expressed as one or more code units used by the encoding form
Note 1 - This International Standard specifies UTF-8, UTF-16, and UTF-32.
encoding scheme:
scheme that specifies the serialization of the code units from the encoding form into octets
Note 1 - Some of the UCS encoding schemes have the same labels as the UCS encoding form. However, they are used in different contexts. UCS encoding forms refer to in-memory and application interface representation of textual data. UCS encoding schemes refer to octet-serialized textual data.
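The distinction between an encoding form and an encoding scheme can be illustrated with UTF-16. The sketch below serializes the same in-memory code units into octets in two byte orders, corresponding to the UTF-16BE and UTF-16LE encoding schemes. The names `ByteOrder` and `serialize_utf16` are hypothetical and introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical selector for the byte order used during serialization.
enum class ByteOrder { big_endian, little_endian };

// Serialize UTF-16 code units (encoding form) into octets (encoding scheme).
std::vector<std::uint8_t> serialize_utf16(const std::u16string& units,
                                          ByteOrder order) {
    std::vector<std::uint8_t> octets;
    octets.reserve(units.size() * 2);
    for (char16_t u : units) {
        const auto hi = static_cast<std::uint8_t>(u >> 8);
        const auto lo = static_cast<std::uint8_t>(u & 0xFF);
        if (order == ByteOrder::big_endian) {
            octets.push_back(hi);
            octets.push_back(lo);
        } else {
            octets.push_back(lo);
            octets.push_back(hi);
        }
    }
    return octets;
}

// Usage: one code unit, two different octet sequences.
void encoding_scheme_examples() {
    const std::u16string text{char16_t{0x1234}};
    assert((serialize_utf16(text, ByteOrder::big_endian) ==
            std::vector<std::uint8_t>{0x12, 0x34}));
    assert((serialize_utf16(text, ByteOrder::little_endian) ==
            std::vector<std::uint8_t>{0x34, 0x12}));
}
```

The code unit sequence is identical in both calls; only the octet order produced by the serialization differs.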
codespace consisting of the integers from 0 to 10FFFF (hexadecimal) available for assigning the repertoire of the UCS characters.
UCS scalar value:
any UCS code point except high-surrogate and low-surrogate code points
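The codespace and scalar value definitions translate directly into range checks. The following is a minimal sketch; the function names are hypothetical, and the surrogate range D800 to DFFF (hexadecimal) is taken from the Unicode and ISO/IEC 10646 specifications rather than from the text above.

```cpp
#include <cassert>
#include <cstdint>

// A value is a UCS code point if it lies in the codespace 0..10FFFF.
constexpr bool is_code_point(std::uint32_t v) {
    return v <= 0x10FFFF;
}

// High-surrogate (D800..DBFF) and low-surrogate (DC00..DFFF) code points.
constexpr bool is_surrogate(std::uint32_t v) {
    return v >= 0xD800 && v <= 0xDFFF;
}

// A scalar value is any code point that is not a surrogate.
constexpr bool is_scalar_value(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}

static_assert(is_scalar_value(0x1234), "U+1234 is a scalar value");
static_assert(!is_scalar_value(0xD800), "surrogates are not scalar values");
static_assert(!is_code_point(0x110000), "outside the UCS codespace");
```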
UCS code unit sequence that purports to be in a UCS encoding form which conforms to the specification of that encoding form and contains no ill-formed code unit sequence subset
minimal well-formed code unit sequence:
well-formed code unit sequence that maps to a single UCS scalar value
UCS code unit sequence that purports to be in a UCS encoding form which does not conform to the specification of that encoding form
EXAMPLE - An unpaired surrogate code unit is an ill-formed code unit sequence.
ill-formed code unit sequence subset:
non-empty subset of a code unit sequence X which does not contain any code unit which also belongs to any minimal well-formed code unit sequence subset of X
Note 1 - An ill-formed code unit sequence subset cannot overlap with a minimal well-formed code unit sequence.
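As an illustration of the well-formed and ill-formed definitions, the sketch below checks a UTF-16 code unit sequence for unpaired surrogate code units, the case named in the example above. It is restricted to UTF-16, and the function names are hypothetical; it reports only whether the whole sequence is well-formed and does not identify the ill-formed subsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Returns true if every surrogate code unit in s is part of a
// high-surrogate/low-surrogate pair, i.e. s is well-formed UTF-16.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i])) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 == s.size() || !is_low_surrogate(s[i + 1])) {
                return false;
            }
            ++i;  // Skip the low surrogate that completes the pair.
        } else if (is_low_surrogate(s[i])) {
            // A low surrogate without a preceding high surrogate is unpaired.
            return false;
        }
    }
    return true;
}

// Usage: a lone high surrogate is an ill-formed code unit sequence.
void well_formedness_examples() {
    assert(is_well_formed_utf16(std::u16string{char16_t{0xD83D}, char16_t{0xDE00}}));
    assert(!is_well_formed_utf16(std::u16string{char16_t{0xD83D}}));
}
```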
SF | F | N | A | SA |
---|---|---|---|---|
3 | 3 | 1 | 2 | 0 |
SF | F | N | A | SA |
---|---|---|---|---|
0 | 1 | 6 | 2 | 0 |
Yes | No |
---|---|
1 | 8 |
SF | F | N | A | SA |
---|---|---|---|---|
1 | 2 | 2 | 1 | 2 |
Yes | No |
---|---|
1 | 8 |
Yes | No |
---|---|
0 | 9 |
SF | F | N | A | SA |
---|---|---|---|---|
8 | 1 | 0 | 0 | 0 |
"\u1234" == 0x1234
__try__("\u1234") == 0x1234 // :)
if ('\u1234' == 0x73) { return '\u1234'; } else { return 'X'; }