[SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

Tom Honermann tom at honermann.net
Mon Sep 9 22:25:44 CEST 2019


On 9/9/19 4:04 PM, Corentin Jabot wrote:
>
>
> On Mon, 9 Sep 2019 at 21:44, Tom Honermann <tom at honermann.net> wrote:
>
>     On 9/9/19 2:48 PM, Corentin Jabot wrote:
>>     Character Repertoire. The collection of characters included in a
>>     character set.
>>     Character Set. A collection of elements used to represent textual
>>     information.
>>     Coded Character Set. A character set in which each character is
>>     assigned a numeric code point. Frequently abbreviated as
>>     character set, charset, or code set; the acronym CCS is also used.
>>     Abstract Character. A unit of information used for the
>>     organization, control, or representation of textual data.
>     Where did the above terms come from?
>
>
> Sorry, I should quote my sources:
> https://unicode.org/glossary/
>
>>
>>     I will admit I am confused. It's either Character Set or
>>     Character Repertoire.
>
>     I suppose the above definitions could be read such that a
>     character set may include members that cannot exist in any
>     character repertoire.  For example, escape characters or other
>     not-really-a-character things like variation selectors.
>
> That does make sense

Another interpretation is that a character set might contain only 'A' 
U+0041 { LATIN CAPITAL LETTER A } and ' ́' U+0301 { COMBINING ACUTE 
ACCENT }, but its character repertoire contains 'A' and 'Á' because both 
can be represented using the elements of the character set.
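
As a rough illustration of that interpretation (a sketch only, not 
something taken from the Unicode Standard; the names `character_set` and 
`representable` are made up for this example), canonical decomposition 
can be used to check whether a character is expressible using only the 
elements of a given character set:

```python
import unicodedata

# Hypothetical two-element character set:
# U+0041 LATIN CAPITAL LETTER A and U+0301 COMBINING ACUTE ACCENT.
character_set = {"\u0041", "\u0301"}


def representable(text: str) -> bool:
    """Return True if every code point in the canonical decomposition (NFD)
    of `text` is an element of `character_set`."""
    return all(cp in character_set for cp in unicodedata.normalize("NFD", text))


print(representable("A"))       # 'A' is itself an element of the set
print(representable("\u00C1"))  # 'Á' decomposes to U+0041 followed by U+0301
print(representable("B"))       # 'B' cannot be built from the set's elements
```

Under this reading, the repertoire is the set of characters for which 
`representable` holds, which is larger than the character set itself.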

Tom.

>     Tom.
>
>>
>>
>>
>>     On Mon, 9 Sep 2019 at 20:37, Zach Laine
>>     <whatwasthataddress at gmail.com> wrote:
>>
>>         On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann
>>         <tom at honermann.net> wrote:
>>
>>             On 9/8/19 12:02 PM, Steve Downey wrote:
>>>             Character repertoire sounds good, and I will eventually
>>>             learn to spell it. Character set is
>>>             definitely terminology from the pre-unicode times, and
>>>             unfortunately tends to merge the repertoire and
>>>             encoding,
>>>             https://www.iana.org/assignments/character-sets/character-sets.xhtml
>>
>>             I think I was a little overzealous earlier in stating
>>             that Unicode uses "character repertoire" as I described. 
>>             I looked again and don't find that term formally defined
>>             in the standard. However, "repertoire" is used throughout
>>             the standard in ways that I believe are consistent with
>>             my description.  I wasn't able to find an alternative
>>             formal term.
>>
>>         I fully endorse overzealousness as applied to Unicode discussions.
>>
>>             The way I've been thinking about it is that a "character
>>             repertoire" describes a set of /abstract characters/ (a
>>             formal Unicode term) and a "character set" describes a
>>             set of /encoded characters/ (a formal Unicode term) that
>>             associate each /abstract character/ member of a
>>             "character repertoire" with a /code point/ (a formal
>>             Unicode term) within a /codespace/ (a formal Unicode
>>             term).  See sections 2.4 and 3.4 of Unicode 12 and uses
>>             of the word "repertoire" within those chapters.  The
>>             Unicode standard does use the term "character set", but I
>>             didn't find a formal definition.
>>
>>         I think I follow, except that I don't see whether there is a
>>         distinction between "character repertoire" and "abstract
>>         characters".  Is there?  I'm asking because if there is not,
>>         I'd prefer to standardize the formally described term, which
>>         sounds like it is "abstract characters".
>>         Zach
>>
>
