[SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

Thiago Macieira thiago at macieira.org
Wed Aug 14 21:41:31 CEST 2019


On Wednesday, 14 August 2019 00:54:28 PDT Peter Dimov wrote:
> - what file names use, per filesystem, there can be more than one (*)

There's some work in Linux to create a per-directory setting that configures 
the character set and case sensitiveness (and I'm going to guess locale too, 
as soon as Turkish users are involved). I don't think this is ready.

> - what file contents use

Here we can make an easy distinction: text files and binary files. Text files 
are always encoded in the locale-provided runtime execution encoding, whereas 
everything else is binary. If you want to interpret those bytes, you need to 
use some library to convert from bytes to text.

Some libraries can provide an extension to fopen() that automatically does 
this for you. glibc does: 
	fopen(name, "r,ccs=latin1")
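
To make the glibc extension concrete, here is a minimal sketch that opens a file with a ",ccs=" mode suffix and reads one decoded character. It assumes glibc; other C libraries may ignore or reject the suffix. The helper name first_char_decoded is hypothetical, and the accepted charset names are whatever the system's conversion tables know about.

```c
#include <stdio.h>
#include <wchar.h>

/* Read the first character of a text file, letting the C library decode it.
 * `charset` is appended to the mode string as ",ccs=<charset>", which is a
 * glibc extension (assumption: the program runs against glibc).
 * Returns the decoded character, or WEOF on any failure. */
wint_t first_char_decoded(const char *path, const char *charset)
{
    char mode[64];
    int n = snprintf(mode, sizeof mode, "r,ccs=%s", charset);
    if (n < 0 || (size_t)n >= sizeof mode)
        return WEOF;            /* charset name too long for the buffer */

    FILE *f = fopen(path, mode);
    if (!f)
        return WEOF;            /* missing file, or charset not available */

    /* The stream is wide-oriented, so fgetwc() returns a converted
     * character rather than a raw byte. */
    wint_t wc = fgetwc(f);
    fclose(f);
    return wc;
}
```

Calling first_char_decoded(name, "latin1") corresponds to the mode string shown above. The conversion is tied to the stream, so the caller reads wide characters with fgetwc() or fgetws() instead of converting a byte buffer afterwards.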

> - what the console/the terminal uses

That should also be the locale runtime encoding, under any sane configuration. 
You can have a misconfigured terminal application -- this used to happen in 
2004 quite often. But that's a mistake, not the expected behaviour.
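
For reference, a program can ask which encoding the locale implies using the POSIX nl_langinfo() interface. This is only a sketch: the helper name locale_codeset is hypothetical, and the string returned depends entirely on the user's environment and the C library.

```c
#include <langinfo.h>
#include <locale.h>

/* Return the name of the character encoding of the current locale.
 * The returned string belongs to the C library and may be overwritten
 * by later setlocale() or nl_langinfo() calls. */
const char *locale_codeset(void)
{
    /* Adopt the user's environment (LC_ALL, LC_CTYPE, LANG). If this
     * fails, the program stays in the "C" locale and that locale's
     * codeset is reported instead. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

On a correctly configured system, this is the encoding the terminal is expected to use for both input and output.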

The terminal may be capable of showing more than the locale expects, but 
that's an implementation-defined extension. For example, Unix terminals have 
been capable of switching to UTF-8 mode with an escape sequence, but no one 
uses that nowadays; the Windows console technically receives the data via the 
wide-char API.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products
