SC22/WG20 N871R
REPORT ON CHARACTER SET POLICY VERSION 1.1
1998-10-12, corr. 2001-09-21
J. W. van Wingen
POLICY OF THE NETHERLANDS GOVERNMENT AND RELATED INSTITUTIONS REGARDING THE USE OF CODED CHARACTER SETS
This document illustrates the policy of the Netherlands Government and related institutions regarding the use of coded character sets. It consists of an extract from NEN 1888, General Personal Data - Definition, character sets and interchange formats, at present at the stage of approved committee draft, translated into English.
Only the material related to the use of character sets is included here. For normative issues, refer to the original Dutch version of NEN 1888 (when finally approved). The author alone is responsible for the correctness of the translation.
4 Character sets
4.1 The following character sets are distinguished
4.1.1 Character set EDIFACT-level B (82 characters and SPACE) This set includes 26 capital letters and 26 small letters, 10 digits, SPACE and 20 special characters, and is specified in the "directories" attached to ISO/IEC 9735-1 (EDIFACT) as the UN/ECE level B character set.
4.1.2 Character set ASCII (94 characters and SPACE) This set includes 26 capital letters and 26 small letters, 10 digits, SPACE and 32 special characters, and is specified in ISO/IEC 646, International Reference Version (registered as ISO-IR 6). The set is a superset of EDIFACT Level B. For the 12 special characters not in 4.1.1, see Annex D.3.
4.1.3 Character set Latin1 (190 characters and SPACE) This set includes ASCII and an additional 96 characters: extra letters and letters with diacritics, which are required for writing most Western European languages correctly, together with special characters, as specified in ISO/IEC 8859-1 (registered as ISO-IR 100).
4.1.4 Character set Latin5 (190 characters and SPACE) This set includes ASCII and an additional 96 characters: extra letters and letters with diacritics, which are required for writing most Western European languages (but not Icelandic) and Turkish correctly, together with special characters, as specified in ISO/IEC 8859-9 (registered as ISO-IR 148).
4.1.5 Character set GBA (292 characters and SPACE) This set includes letters with which all European languages using Latin script can be written, including those of Latin1 and Latin5, 10 digits and a number of special characters. For the specification of the GBA set, see Annex C.
NOTE The GBA set contains the same letters and digits as specified in the Repertoire of ISO/IEC 6937, with the exclusion of the IJ and ij as single characters, and a number of special characters.
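The difference between Latin1 and Latin5 described in 4.1.3 and 4.1.4 can be illustrated with a minimal Python sketch. It is not part of NEN 1888. It assumes that Python's built-in codecs "iso8859_1" and "iso8859_9" implement ISO/IEC 8859-1 and ISO/IEC 8859-9; the helper name can_encode and the sample words are chosen for this illustration only.

```python
# Illustrative sketch only: test whether a text fits a given 8-bit set.
# Assumption: Python's "iso8859_1" and "iso8859_9" codecs correspond to
# Latin1 (ISO/IEC 8859-1) and Latin5 (ISO/IEC 8859-9).

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

ICELANDIC = "\u00feing"   # starts with LATIN SMALL LETTER THORN
TURKISH = "\u015fey"      # starts with LATIN SMALL LETTER S WITH CEDILLA

for word in (ICELANDIC, TURKISH):
    for codec in ("iso8859_1", "iso8859_9"):
        # !a prints non-ASCII characters as escapes, so the output is
        # readable on any terminal.
        print(f"{word!a:12} {codec}: {can_encode(word, codec)}")
```

The Icelandic sample fits Latin1 only and the Turkish sample fits Latin5 only, which is why data that may contain both kinds of letters need a larger repertoire such as the GBA set.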
4.2 Coding of character sets
If an ISO coding system is applied to a character set of 4.1, the following coding shall be used.
EDIFACT-level B: ISO/IEC 646, at 7-bit coding
                 ISO/IEC 8859-1 or 8859-9, at 8-bit coding
ASCII:           ISO/IEC 646, at 7-bit coding
                 ISO/IEC 8859-1 or 8859-9, at 8-bit coding
Latin1:          ISO/IEC 8859-1
Latin5:          ISO/IEC 8859-9
GBA:             ISO/IEC 6937 or
                 ISO/IEC 10646-1
NOTES at GBA:
1. ISO/IEC 6937 codes characters with diacritic marks with two bytes, all others with a single byte. It is assumed that hardware and software are capable of treating such characters as if they occupied no more than a single position. The standard was intended for text communication, where this aspect is not much of a problem. In text processing, by contrast, the coding method of ISO/IEC 6937 makes severe demands, which may result in higher costs.
2. ISO/IEC 10646-1 offers a choice between uniform coding with two bytes (UCS-2) and mixed coding with a variable number of bytes (UTF-8), which is one or two bytes for nearly all characters of the GBA set; see Annex C.
3. The coding system "Unicode", maintained by the Unicode Consortium, includes the GBA set, with the same codes for its characters as ISO/IEC 10646-1.
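As an illustration of note 2, the following Python sketch encodes a few characters in both coding forms of ISO/IEC 10646-1. It is not part of NEN 1888. It assumes that Python's "utf-16-be" codec yields the UCS-2 code, which holds only for characters in the Basic Multilingual Plane; the function names and the sample characters are chosen for this illustration.

```python
# Illustrative sketch only: compare UCS-2 and UTF-8 codes of a character.
# Assumption: "utf-16-be" equals UCS-2 for the sample characters, all of
# which lie in the Basic Multilingual Plane.

def ucs2_hex(ch: str) -> str:
    """Return the two-byte UCS-2 code of one character as hex digits."""
    return ch.encode("utf-16-be").hex().upper()

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 code of one character as hex digits."""
    return ch.encode("utf-8").hex().upper()

# a, e with acute, l with stroke, ohm sign
SAMPLES = ["a", "\u00e9", "\u0142", "\u2126"]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}  UCS-2 {ucs2_hex(ch)}  UTF-8 {utf8_hex(ch)}")
```

In UTF-8 the ASCII characters keep their single byte and the letters with diacritics take two bytes, whereas a character such as OHM SIGN (2126 in UCS-2) takes three.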
ANNEX C VERSION 1.6
1998-10-12
J. W. van Wingen
(This Annex is normative.)
THE GBA SET
C.1 Introduction
The characters that are permitted in the GBA system are specified in this Annex. A distinction is made between the "repertoire" of a character set and the actual coding of each member of the set. The repertoire specifies only the characters themselves, which we classify into LETTERS, DIGITS and SPECIAL CHARACTERS. In this Annex the repertoire is normative; all other data are merely informative. In the listing, what is part of the GBA set is indicated with "permitted", what is not with "not permitted".
Each character may be coded according to a particular method, applied uniformly to a given collection of data. The method depends on the application selected. When data are transferred to a different application, the coding may have to be converted, even if both applications use the same characters. Which characters are permitted is specified in the repertoire selected. The repertoire is given simply as a list, in which each character is included with its standardised unique name, which is always written in capitals.
The GBA set is a repertoire which is a subset of that in ISO/IEC 6937. In the past the International Telecommunication Union (ITU) also defined a subset, which has become known as CCITT T.61, or as the Teletex set. The GBA set has been derived from that. In the meantime, however, differences have crept in among the special characters, because ITU did not immediately follow the ISO developments, nor GBA those of ITU. Consequently, the GBA set can no longer be considered identical to the Teletex set.
The GBA repertoire may be coded in different ways.
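The list gives first the letters, then the digits, then the special characters. Each line (that is, each character) gives the following columns: TRFO, SID, NAME, and the code in TRFO, in ISO/IEC 6937, and in ISO/IEC 10646-1 (UTF-8 and UCS-2).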
Not all the information given will be required in daily practice. The data provided, however, may prevent unneeded searching in special cases.
C.2 Notes on the list (this is C.3 in the Dutch version)
In the list the following notes are referenced by "note", followed by the corresponding number.
3.                              GBA  ASCII
$$ dollar SC03 DOLLAR SIGN      A4   24
## num    SM01 NUMBER SIGN      A6   23
5.                                                  GBA/6937
$g gcedil LG41 LATIN SMALL LETTER G WITH CEDILLA    C267
was called in the past:
LG11 LATIN SMALL LETTER G WITH ACUTE
It corresponds with the following capital letter:
$G Gcedil LG42 LATIN CAPITAL LETTER G WITH CEDILLA CB47
At the revision of ISO 6937 (from 1983 to 1994) the name was changed, but the coding was left unmodified. Whilst all other letters with CEDILLA have a coding that starts with CB, the small "g" has C2, as if it still had ACUTE. (In Latvian the CEDILLA on the small "g" is printed either as an upturned comma above the "g", or as a diacritic with a strong likeness to an ACUTE accent.)
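The two-byte coding of ISO/IEC 6937 and the exception for the small "g" with CEDILLA can be sketched in Python as follows. The sketch is not part of NEN 1888. It covers only ASCII bytes and the diacritic bytes C1 to CF that occur in the 6937 column of the list, and it assumes that each diacritic byte may be mapped to a Unicode combining character and composed with the base letter by NFC normalisation. The function name decode_6937 is chosen for this illustration.

```python
import unicodedata

# Illustrative sketch only. Assumption: each ISO/IEC 6937 diacritic byte
# corresponds to the Unicode combining character given here.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030a",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030b",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030c",  # caron
}

def decode_6937(data: bytes) -> str:
    """Decode ASCII bytes and diacritic + letter pairs to Unicode."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in DIACRITICS and i + 1 < len(data):
            base = data[i + 1]
            if (byte, base) == (0xC2, 0x67):
                # Exception from the note: C2 67 is coded like "g with
                # acute" but means LATIN SMALL LETTER G WITH CEDILLA.
                out.append("\u0123")
            else:
                pair = chr(base) + DIACRITICS[byte]
                out.append(unicodedata.normalize("NFC", pair))
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            # One-byte supplementary characters are outside this sketch.
            raise ValueError(f"byte {byte:02X} is not handled in this sketch")
    return "".join(out)

print(f"{ord(decode_6937(bytes.fromhex('C267'))):04X}")  # small g with cedilla
print(f"{ord(decode_6937(bytes.fromhex('CB47'))):04X}")  # capital G with cedilla
```

Without the special case, C2 67 would be composed as a "g" with ACUTE, which is not the character the GBA list assigns to that code.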
GBA repertoire
---------------------------------------------------------------------------
LETTERS PERMITTED IN GBA code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
a LA01 LATIN SMALL LETTER A 61 61 61 0061
A LA02 LATIN CAPITAL LETTER A 41 41 41 0041
b LB01 LATIN SMALL LETTER B 62 62 62 0062
B LB02 LATIN CAPITAL LETTER B 42 42 42 0042
c LC01 LATIN SMALL LETTER C 63 63 63 0063
C LC02 LATIN CAPITAL LETTER C 43 43 43 0043
d LD01 LATIN SMALL LETTER D 64 64 64 0064
D LD02 LATIN CAPITAL LETTER D 44 44 44 0044
e LE01 LATIN SMALL LETTER E 65 65 65 0065
E LE02 LATIN CAPITAL LETTER E 45 45 45 0045
f LF01 LATIN SMALL LETTER F 66 66 66 0066
F LF02 LATIN CAPITAL LETTER F 46 46 46 0046
g LG01 LATIN SMALL LETTER G 67 67 67 0067
G LG02 LATIN CAPITAL LETTER G 47 47 47 0047
h LH01 LATIN SMALL LETTER H 68 68 68 0068
H LH02 LATIN CAPITAL LETTER H 48 48 48 0048
i LI01 LATIN SMALL LETTER I 69 69 69 0069
I LI02 LATIN CAPITAL LETTER I 49 49 49 0049
j LJ01 LATIN SMALL LETTER J 6A 6A 6A 006A
J LJ02 LATIN CAPITAL LETTER J 4A 4A 4A 004A
k LK01 LATIN SMALL LETTER K 6B 6B 6B 006B
K LK02 LATIN CAPITAL LETTER K 4B 4B 4B 004B
l LL01 LATIN SMALL LETTER L 6C 6C 6C 006C
L LL02 LATIN CAPITAL LETTER L 4C 4C 4C 004C
m LM01 LATIN SMALL LETTER M 6D 6D 6D 006D
M LM02 LATIN CAPITAL LETTER M 4D 4D 4D 004D
n LN01 LATIN SMALL LETTER N 6E 6E 6E 006E
N LN02 LATIN CAPITAL LETTER N 4E 4E 4E 004E
o LO01 LATIN SMALL LETTER O 6F 6F 6F 006F
O LO02 LATIN CAPITAL LETTER O 4F 4F 4F 004F
p LP01 LATIN SMALL LETTER P 70 70 70 0070
P LP02 LATIN CAPITAL LETTER P 50 50 50 0050
q LQ01 LATIN SMALL LETTER Q 71 71 71 0071
Q LQ02 LATIN CAPITAL LETTER Q 51 51 51 0051
r LR01 LATIN SMALL LETTER R 72 72 72 0072
R LR02 LATIN CAPITAL LETTER R 52 52 52 0052
s LS01 LATIN SMALL LETTER S 73 73 73 0073
S LS02 LATIN CAPITAL LETTER S 53 53 53 0053
t LT01 LATIN SMALL LETTER T 74 74 74 0074
T LT02 LATIN CAPITAL LETTER T 54 54 54 0054
u LU01 LATIN SMALL LETTER U 75 75 75 0075
U LU02 LATIN CAPITAL LETTER U 55 55 55 0055
v LV01 LATIN SMALL LETTER V 76 76 76 0076
V LV02 LATIN CAPITAL LETTER V 56 56 56 0056
w LW01 LATIN SMALL LETTER W 77 77 77 0077
W LW02 LATIN CAPITAL LETTER W 57 57 57 0057
x LX01 LATIN SMALL LETTER X 78 78 78 0078
X LX02 LATIN CAPITAL LETTER X 58 58 58 0058
y LY01 LATIN SMALL LETTER Y 79 79 79 0079
Y LY02 LATIN CAPITAL LETTER Y 59 59 59 0059
z LZ01 LATIN SMALL LETTER Z 7A 7A 7A 007A
Z LZ02 LATIN CAPITAL LETTER Z 5A 5A 5A 005A
---------------------------------------------------------------------------
/a LA11 LATIN SMALL LETTER A WITH ACUTE 2F61 C261 C3A1 00E1
/A LA12 LATIN CAPITAL LETTER A WITH ACUTE 2F41 C241 C381 00C1
/c LC11 LATIN SMALL LETTER C WITH ACUTE 2F63 C263 C487 0107
/C LC12 LATIN CAPITAL LETTER C WITH ACUTE 2F43 C243 C486 0106
/e LE11 LATIN SMALL LETTER E WITH ACUTE 2F65 C265 C3A9 00E9
/E LE12 LATIN CAPITAL LETTER E WITH ACUTE 2F45 C245 C389 00C9
/i LI11 LATIN SMALL LETTER I WITH ACUTE 2F69 C269 C3AD 00ED
/I LI12 LATIN CAPITAL LETTER I WITH ACUTE 2F49 C249 C38D 00CD
/l LL11 LATIN SMALL LETTER L WITH ACUTE 2F6C C26C C4BA 013A
/L LL12 LATIN CAPITAL LETTER L WITH ACUTE 2F4C C24C C4B9 0139
/n LN11 LATIN SMALL LETTER N WITH ACUTE 2F6E C26E C584 0144
/N LN12 LATIN CAPITAL LETTER N WITH ACUTE 2F4E C24E C583 0143
/o LO11 LATIN SMALL LETTER O WITH ACUTE 2F6F C26F C3B3 00F3
/O LO12 LATIN CAPITAL LETTER O WITH ACUTE 2F4F C24F C393 00D3
/r LR11 LATIN SMALL LETTER R WITH ACUTE 2F72 C272 C595 0155
/R LR12 LATIN CAPITAL LETTER R WITH ACUTE 2F52 C252 C594 0154
/s LS11 LATIN SMALL LETTER S WITH ACUTE 2F73 C273 C59B 015B
/S LS12 LATIN CAPITAL LETTER S WITH ACUTE 2F53 C253 C59A 015A
/u LU11 LATIN SMALL LETTER U WITH ACUTE 2F75 C275 C3BA 00FA
/U LU12 LATIN CAPITAL LETTER U WITH ACUTE 2F55 C255 C39A 00DA
/y LY11 LATIN SMALL LETTER Y WITH ACUTE 2F79 C279 C3BD 00FD
/Y LY12 LATIN CAPITAL LETTER Y WITH ACUTE 2F59 C259 C39D 00DD
/z LZ11 LATIN SMALL LETTER Z WITH ACUTE 2F7A C27A C5BA 017A
/Z LZ12 LATIN CAPITAL LETTER Z WITH ACUTE 2F5A C25A C5B9 0179
---------------------------------------------------------------------------
\a LA13 LATIN SMALL LETTER A WITH GRAVE 5C61 C161 C3A0 00E0
\A LA14 LATIN CAPITAL LETTER A WITH GRAVE 5C41 C141 C380 00C0
\e LE13 LATIN SMALL LETTER E WITH GRAVE 5C65 C165 C3A8 00E8
\E LE14 LATIN CAPITAL LETTER E WITH GRAVE 5C45 C145 C388 00C8
\i LI13 LATIN SMALL LETTER I WITH GRAVE 5C69 C169 C3AC 00EC
\I LI14 LATIN CAPITAL LETTER I WITH GRAVE 5C49 C149 C38C 00CC
\o LO13 LATIN SMALL LETTER O WITH GRAVE 5C6F C16F C3B2 00F2
\O LO14 LATIN CAPITAL LETTER O WITH GRAVE 5C4F C14F C392 00D2
\u LU13 LATIN SMALL LETTER U WITH GRAVE 5C75 C175 C3B9 00F9
\U LU14 LATIN CAPITAL LETTER U WITH GRAVE 5C55 C155 C399 00D9
---------------------------------------------------------------------------
>a LA15 LATIN SMALL LETTER A WITH CIRCUMFLEX 3E61 C361 C3A2 00E2
>A LA16 LATIN CAPITAL LETTER A WITH CIRCUMFLEX 3E41 C341 C382 00C2
>c LC15 LATIN SMALL LETTER C WITH CIRCUMFLEX 3E63 C363 C489 0109
>C LC16 LATIN CAPITAL LETTER C WITH CIRCUMFLEX 3E43 C343 C488 0108
>e LE15 LATIN SMALL LETTER E WITH CIRCUMFLEX 3E65 C365 C3AA 00EA
>E LE16 LATIN CAPITAL LETTER E WITH CIRCUMFLEX 3E45 C345 C38A 00CA
>g LG15 LATIN SMALL LETTER G WITH CIRCUMFLEX 3E67 C367 C49D 011D
>G LG16 LATIN CAPITAL LETTER G WITH CIRCUMFLEX 3E47 C347 C49C 011C
>h LH15 LATIN SMALL LETTER H WITH CIRCUMFLEX 3E68 C368 C4A5 0125
>H LH16 LATIN CAPITAL LETTER H WITH CIRCUMFLEX 3E48 C348 C4A4 0124
>i LI15 LATIN SMALL LETTER I WITH CIRCUMFLEX 3E69 C369 C3AE 00EE
>I LI16 LATIN CAPITAL LETTER I WITH CIRCUMFLEX 3E49 C349 C38E 00CE
>j LJ15 LATIN SMALL LETTER J WITH CIRCUMFLEX 3E6A C36A C4B5 0135
>J LJ16 LATIN CAPITAL LETTER J WITH CIRCUMFLEX 3E4A C34A C4B4 0134
>o LO15 LATIN SMALL LETTER O WITH CIRCUMFLEX 3E6F C36F C3B4 00F4
>O LO16 LATIN CAPITAL LETTER O WITH CIRCUMFLEX 3E4F C34F C394 00D4
>s LS15 LATIN SMALL LETTER S WITH CIRCUMFLEX 3E73 C373 C59D 015D
>S LS16 LATIN CAPITAL LETTER S WITH CIRCUMFLEX 3E53 C353 C59C 015C
>u LU15 LATIN SMALL LETTER U WITH CIRCUMFLEX 3E75 C375 C3BB 00FB
>U LU16 LATIN CAPITAL LETTER U WITH CIRCUMFLEX 3E55 C355 C39B 00DB
>w LW15 LATIN SMALL LETTER W WITH CIRCUMFLEX 3E77 C377 C5B5 0175
>W LW16 LATIN CAPITAL LETTER W WITH CIRCUMFLEX 3E57 C357 C5B4 0174
>y LY15 LATIN SMALL LETTER Y WITH CIRCUMFLEX 3E79 C379 C5B7 0177
>Y LY16 LATIN CAPITAL LETTER Y WITH CIRCUMFLEX 3E59 C359 C5B6 0176
---------------------------------------------------------------------------
%a LA17 LATIN SMALL LETTER A WITH DIAERESIS 2561 C861 C3A4 00E4
%A LA18 LATIN CAPITAL LETTER A WITH DIAERESIS 2541 C841 C384 00C4
%e LE17 LATIN SMALL LETTER E WITH DIAERESIS 2565 C865 C3AB 00EB
%E LE18 LATIN CAPITAL LETTER E WITH DIAERESIS 2545 C845 C38B 00CB
%i LI17 LATIN SMALL LETTER I WITH DIAERESIS 2569 C869 C3AF 00EF
%I LI18 LATIN CAPITAL LETTER I WITH DIAERESIS 2549 C849 C38F 00CF
%o LO17 LATIN SMALL LETTER O WITH DIAERESIS 256F C86F C3B6 00F6
%O LO18 LATIN CAPITAL LETTER O WITH DIAERESIS 254F C84F C396 00D6
%u LU17 LATIN SMALL LETTER U WITH DIAERESIS 2575 C875 C3BC 00FC
%U LU18 LATIN CAPITAL LETTER U WITH DIAERESIS 2555 C855 C39C 00DC
%y LY17 LATIN SMALL LETTER Y WITH DIAERESIS 2579 C879 C3BF 00FF
%Y LY18 LATIN CAPITAL LETTER Y WITH DIAERESIS 2559 C859 C5B8 0178
---------------------------------------------------------------------------
~a LA19 LATIN SMALL LETTER A WITH TILDE 7E61 C461 C3A3 00E3
~A LA20 LATIN CAPITAL LETTER A WITH TILDE 7E41 C441 C383 00C3
~n LN19 LATIN SMALL LETTER N WITH TILDE 7E6E C46E C3B1 00F1
~N LN20 LATIN CAPITAL LETTER N WITH TILDE 7E4E C44E C391 00D1
~o LO19 LATIN SMALL LETTER O WITH TILDE 7E6F C46F C3B5 00F5
~O LO20 LATIN CAPITAL LETTER O WITH TILDE 7E4F C44F C395 00D5
---------------------------------------------------------------------------
*c LC21 LATIN SMALL LETTER C WITH CARON 2A63 CF63 C48D 010D
*C LC22 LATIN CAPITAL LETTER C WITH CARON 2A43 CF43 C48C 010C
*d LD21 LATIN SMALL LETTER D WITH CARON 2A64 CF64 C48F 010F
*D LD22 LATIN CAPITAL LETTER D WITH CARON 2A44 CF44 C48E 010E
*e LE21 LATIN SMALL LETTER E WITH CARON 2A65 CF65 C49B 011B
*E LE22 LATIN CAPITAL LETTER E WITH CARON 2A45 CF45 C49A 011A
*l LL21 LATIN SMALL LETTER L WITH CARON 2A6C CF6C C4BE 013E
*L LL22 LATIN CAPITAL LETTER L WITH CARON 2A4C CF4C C4BD 013D
*n LN21 LATIN SMALL LETTER N WITH CARON 2A6E CF6E C588 0148
*N LN22 LATIN CAPITAL LETTER N WITH CARON 2A4E CF4E C587 0147
*r LR21 LATIN SMALL LETTER R WITH CARON 2A72 CF72 C599 0159
*R LR22 LATIN CAPITAL LETTER R WITH CARON 2A52 CF52 C598 0158
*s LS21 LATIN SMALL LETTER S WITH CARON 2A73 CF73 C5A1 0161
*S LS22 LATIN CAPITAL LETTER S WITH CARON 2A53 CF53 C5A0 0160
*t LT21 LATIN SMALL LETTER T WITH CARON 2A74 CF74 C5A5 0165
*T LT22 LATIN CAPITAL LETTER T WITH CARON 2A54 CF54 C5A4 0164
*z LZ21 LATIN SMALL LETTER Z WITH CARON 2A7A CF7A C5BE 017E
*Z LZ22 LATIN CAPITAL LETTER Z WITH CARON 2A5A CF5A C5BD 017D
---------------------------------------------------------------------------
#a LA23 LATIN SMALL LETTER A WITH BREVE 2361 C661 C483 0103
#A LA24 LATIN CAPITAL LETTER A WITH BREVE 2341 C641 C482 0102
#g LG23 LATIN SMALL LETTER G WITH BREVE 2367 C667 C49F 011F
#G LG24 LATIN CAPITAL LETTER G WITH BREVE 2347 C647 C49E 011E
#u LU23 LATIN SMALL LETTER U WITH BREVE 2375 C675 C5AD 016D
#U LU24 LATIN CAPITAL LETTER U WITH BREVE 2355 C655 C5AC 016C
---------------------------------------------------------------------------
+o LO25 LATIN SMALL LETTER O WITH DOUBLE ACUTE 2B6F CD6F C591 0151
+O LO26 LATIN CAPITAL LETTER O WITH DOUBLE ACUTE 2B4F CD4F C590 0150
+u LU25 LATIN SMALL LETTER U WITH DOUBLE ACUTE 2B75 CD75 C5B1 0171
+U LU26 LATIN CAPITAL LETTER U WITH DOUBLE ACUTE 2B55 CD55 C5B0 0170
---------------------------------------------------------------------------
@a LA27 LATIN SMALL LETTER A WITH RING ABOVE 4061 CA61 C3A5 00E5
@A LA28 LATIN CAPITAL LETTER A WITH RING ABOVE 4041 CA41 C385 00C5
@u LU27 LATIN SMALL LETTER U WITH RING ABOVE 4075 CA75 C5AF 016F
@U LU28 LATIN CAPITAL LETTER U WITH RING ABOVE 4055 CA55 C5AE 016E
---------------------------------------------------------------------------
@c LC29 LATIN SMALL LETTER C WITH DOT ABOVE 4063 C763 C48B 010B
@C LC30 LATIN CAPITAL LETTER C WITH DOT ABOVE 4043 C743 C48A 010A
@e LE29 LATIN SMALL LETTER E WITH DOT ABOVE 4065 C765 C497 0117
@E LE30 LATIN CAPITAL LETTER E WITH DOT ABOVE 4045 C745 C496 0116
@g LG29 LATIN SMALL LETTER G WITH DOT ABOVE 4067 C767 C4A1 0121
@G LG30 LATIN CAPITAL LETTER G WITH DOT ABOVE 4047 C747 C4A0 0120
@I LI30 LATIN CAPITAL LETTER I WITH DOT ABOVE 4049 C749 C4B0 0130
@i LI61 LATIN SMALL LETTER DOTLESS I 4069 F5 C4B1 0131
@z LZ29 LATIN SMALL LETTER Z WITH DOT ABOVE 407A C77A C5BC 017C
@Z LZ30 LATIN CAPITAL LETTER Z WITH DOT ABOVE 405A C75A C5BB 017B
---------------------------------------------------------------------------
=a LA31 LATIN SMALL LETTER A WITH MACRON 3D61 C561 C481 0101
=A LA32 LATIN CAPITAL LETTER A WITH MACRON 3D41 C541 C480 0100
=e LE31 LATIN SMALL LETTER E WITH MACRON 3D65 C565 C493 0113
=E LE32 LATIN CAPITAL LETTER E WITH MACRON 3D45 C545 C492 0112
=i LI31 LATIN SMALL LETTER I WITH MACRON 3D69 C569 C4AB 012B
=I LI32 LATIN CAPITAL LETTER I WITH MACRON 3D49 C549 C4AA 012A
=o LO31 LATIN SMALL LETTER O WITH MACRON 3D6F C56F C58D 014D
=O LO32 LATIN CAPITAL LETTER O WITH MACRON 3D4F C54F C58C 014C
=u LU31 LATIN SMALL LETTER U WITH MACRON 3D75 C575 C5AB 016B
=U LU32 LATIN CAPITAL LETTER U WITH MACRON 3D55 C555 C5AA 016A
---------------------------------------------------------------------------
=d LD61 LATIN SMALL LETTER D WITH STROKE 3DF2 F2 C491 0111
=D LD62 LATIN CAPITAL LETTER D WITH STROKE 3DE2 E2 C490 0110
=h LH61 LATIN SMALL LETTER H WITH STROKE 3DF4 F4 C4A7 0127
=H LH62 LATIN CAPITAL LETTER H WITH STROKE 3DE4 E4 C4A6 0126
=l LL61 LATIN SMALL LETTER L WITH STROKE 3DF8 F8 C582 0142
=L LL62 LATIN CAPITAL LETTER L WITH STROKE 3DE8 E8 C581 0141
$o LO61 LATIN SMALL LETTER O WITH STROKE 24F9 F9 C3B8 00F8
$O LO62 LATIN CAPITAL LETTER O WITH STROKE 24E9 E9 C398 00D8
=t LT61 LATIN SMALL LETTER T WITH STROKE 3DFD FD C5A7 0167
=T LT62 LATIN CAPITAL LETTER T WITH STROKE 3DED ED C5A6 0166
---------------------------------------------------------------------------
$c LC41 LATIN SMALL LETTER C WITH CEDILLA 2463 CB63 C3A7 00E7
$C LC42 LATIN CAPITAL LETTER C WITH CEDILLA 2443 CB43 C387 00C7
$g LG41 LATIN SMALL LETTER G WITH CEDILLA (note5) 2467 C267 C4A3 0123
$G LG42 LATIN CAPITAL LETTER G WITH CEDILLA 2447 CB47 C4A2 0122
$k LK41 LATIN SMALL LETTER K WITH CEDILLA 246B CB6B C4B7 0137
$K LK42 LATIN CAPITAL LETTER K WITH CEDILLA 244B CB4B C4B6 0136
$l LL41 LATIN SMALL LETTER L WITH CEDILLA 246C CB6C C4BC 013C
$L LL42 LATIN CAPITAL LETTER L WITH CEDILLA 244C CB4C C4BB 013B
$n LN41 LATIN SMALL LETTER N WITH CEDILLA 246E CB6E C586 0146
$N LN42 LATIN CAPITAL LETTER N WITH CEDILLA 244E CB4E C585 0145
$r LR41 LATIN SMALL LETTER R WITH CEDILLA 2472 CB72 C597 0157
$R LR42 LATIN CAPITAL LETTER R WITH CEDILLA 2452 CB52 C596 0156
$s LS41 LATIN SMALL LETTER S WITH CEDILLA 2473 CB73 C59F 015F
$S LS42 LATIN CAPITAL LETTER S WITH CEDILLA 2453 CB53 C59E 015E
$t LT41 LATIN SMALL LETTER T WITH CEDILLA 2474 CB74 C5A3 0163
$T LT42 LATIN CAPITAL LETTER T WITH CEDILLA 2454 CB54 C5A2 0162
---------------------------------------------------------------------------
$a LA43 LATIN SMALL LETTER A WITH OGONEK 2461 CE61 C485 0105
$A LA44 LATIN CAPITAL LETTER A WITH OGONEK 2441 CE41 C484 0104
$e LE43 LATIN SMALL LETTER E WITH OGONEK 2465 CE65 C499 0119
$E LE44 LATIN CAPITAL LETTER E WITH OGONEK 2445 CE45 C498 0118
$i LI43 LATIN SMALL LETTER I WITH OGONEK 2469 CE69 C4AF 012F
$I LI44 LATIN CAPITAL LETTER I WITH OGONEK 2449 CE49 C4AE 012E
$u LU43 LATIN SMALL LETTER U WITH OGONEK 2475 CE75 C5B3 0173
$U LU44 LATIN CAPITAL LETTER U WITH OGONEK 2455 CE55 C5B2 0172
---------------------------------------------------------------------------
&a LA51 LATIN SMALL LETTER AE 26F1 F1 C3A6 00E6
&A LA52 LATIN CAPITAL LETTER AE 26E1 E1 C386 00C6
&o LO51 LATIN SMALL LIGATURE O E 26FA FA C593 0153
&O LO52 LATIN CAPITAL LIGATURE O E 26EA EA C592 0152
&s LS61 LATIN SMALL LETTER SHARP S (German) 26FB FB C39F 00DF
&n LN61 LATIN SMALL LETTER ENG (Sami) 26FE FE C58B 014B
&N LN62 LATIN CAPITAL LETTER ENG (Sami) 26EE EE C58A 014A
&d LD63 LATIN SMALL LETTER ETH (Icelandic) 26F3 F3 C3B0 00F0
&D LD64 LATIN CAPITAL LETTER ETH (Icelandic) (note4) 26E3 .. C390 00D0
&t LT63 LATIN SMALL LETTER THORN (Icelandic) 26FC FC C3BE 00FE
&T LT64 LATIN CAPITAL LETTER THORN (Icelandic) 26EC EC C39E 00DE
---------------------------------------------------------------------------
LETTERS PERMITTED IN GBA, BUT OBSOLETE (note2) code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
~i LI19 #LATIN SMALL LETTER I WITH TILDE 7E69 C469 C4A9 0129
~I LI20 #LATIN CAPITAL LETTER I WITH TILDE 7E49 C449 C4A8 0128
~u LU19 #LATIN SMALL LETTER U WITH TILDE 7E75 C475 C5A9 0169
~U LU20 #LATIN CAPITAL LETTER U WITH TILDE 7E55 C455 C5A8 0168
&k LK61 #LATIN SMALL LETTER KRA (Greenlandic) 266B F0 C4B8 0138
&l LL63 #LATIN SMALL LETTER L WITH MIDDLE DOT 266C F7 C580 0140
&L LL64 #LATIN CAPITAL LETTER L WITH MIDDLE DOT 264C E7 C4BF 013F
=n LN63 #LATIN SMALL LETTER N PRECEDED BY APOSTROPHE 3D6E EF C589 0149
---------------------------------------------------------------------------
LETTERS NOT PERMITTED IN GBA, BUT IN 6937 (note1) code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
&i LI51 LATIN SMALL LIGATURE I J 26F6 F6 C4B3 0133
&I LI52 LATIN CAPITAL LIGATURE I J 26E6 E6 C4B2 0132
---------------------------------------------------------------------------
PERMITTED DIGITS code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
1 ND01 DIGIT ONE 31 31 31 0031
2 ND02 DIGIT TWO 32 32 32 0032
3 ND03 DIGIT THREE 33 33 33 0033
4 ND04 DIGIT FOUR 34 34 34 0034
5 ND05 DIGIT FIVE 35 35 35 0035
6 ND06 DIGIT SIX 36 36 36 0036
7 ND07 DIGIT SEVEN 37 37 37 0037
8 ND08 DIGIT EIGHT 38 38 38 0038
9 ND09 DIGIT NINE 39 39 39 0039
0 ND10 DIGIT ZERO 30 30 30 0030
---------------------------------------------------------------------------
---------------------------------------------------------------------------
SPECIAL CHARACTERS PERMITTED IN GBA code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@2 NS02 SUPERSCRIPT TWO 4032 B2 C2B2 00B2
@3 NS03 SUPERSCRIPT THREE 4033 B3 C2B3 00B3
---------------------------------------------------------------------------
_2 NF01 VULGAR FRACTION ONE HALF 5F32 BD C2BD 00BD
_4 NF04 VULGAR FRACTION ONE QUARTER 5F34 BC C2BC 00BC
_3 NF05 VULGAR FRACTION THREE QUARTERS 5F33 BE C2BE 00BE
---------------------------------------------------------------------------
++ SA01 PLUS SIGN 2B2B 2B 2B 002B
< SA03 LESS-THAN SIGN 3C 3C 3C 003C
== SA04 EQUALS SIGN 3D3D 3D 3D 003D
>> SA05 GREATER-THAN SIGN 3E3E 3E 3E 003E
_+ SA02 PLUS-MINUS SIGN 5FB1 B1 C2B1 00B1
_: SA06 DIVISION SIGN 5F3A B8 C3B7 00F7
_* SA07 MULTIPLICATION SIGN 5F2A B4 C397 00D7
---------------------------------------------------------------------------
_f SC01 CURRENCY SIGN 5F66 A8 C2A4 00A4
_L SC02 POUND SIGN 5F4C A3 C2A3 00A3
$$ SC03 DOLLAR SIGN (note3) 2424 24 24 0024
_c SC04 CENT SIGN 5F63 A2 C2A2 00A2
_Y SC05 YEN SIGN 5F59 A5 C2A5 00A5
---------------------------------------------------------------------------
## SM01 NUMBER SIGN (note3) 2323 23 23 0023
%% SM02 PERCENT SIGN 2525 25 25 0025
&& SM03 AMPERSAND 2626 26 26 0026
** SM04 ASTERISK 2A2A 2A 2A 002A
@@ SM05 COMMERCIAL AT 4040 40 40 0040
*( SM06 LEFT SQUARE BRACKET 2A28 5B 5B 005B
*) SM08 RIGHT SQUARE BRACKET 2A29 5D 5D 005D
| SM13 VERTICAL LINE 7C 7C 7C 007C
_m SM17 MICRO SIGN 5F6D B5 C2B5 00B5
_O SM18 OHM SIGN (note6) 5F4F E0 .... 2126
@0 SM19 DEGREE SIGN 4030 B0 C2B0 00B0
_o SM20 MASCULINE ORDINAL INDICATOR 5F6F EB C2BA 00BA
_a SM21 FEMININE ORDINAL INDICATOR 5F61 E3 C2AA 00AA
#S SM24 SECTION SIGN 2353 A7 C2A7 00A7
#P SM25 PILCROW SIGN 2350 B6 C2B6 00B6
#. SM26 MIDDLE DOT 233A B7 C2B7 00B7
---------------------------------------------------------------------------
SP SP01 SPACE 20 20 20 0020
! SP02 EXCLAMATION MARK 21 21 21 0021
*! SP03 INVERTED EXCLAMATION MARK 2321 A1 C2A1 00A1
" SP04 QUOTATION MARK 22 22 22 0022
' SP05 APOSTROPHE 27 27 27 0027
( SP06 LEFT PARENTHESIS 28 28 28 0028
) SP07 RIGHT PARENTHESIS 29 29 29 0029
, SP08 COMMA 2C 2C 2C 002C
__ SP09 LOW LINE 5F5F 5F 5F 005F
- SP10 HYPHEN-MINUS 2D 2D 2D 002D
. SP11 FULL STOP 2E 2E 2E 002E
// SP12 SOLIDUS 2F2F 2F 2F 002F
: SP13 COLON 3A 3A 3A 003A
; SP14 SEMICOLON 3B 3B 3B 003B
? SP15 QUESTION MARK 3F 3F 3F 003F
*? SP16 INVERTED QUESTION MARK 2A3F BF C2BF 00BF
*< SP17 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 2A3C AB C2AB 00AB
*> SP18 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 2A3E BB C2BB 00BB
---------------------------------------------------------------------------
---------------------------------------------------------------------------
SPECIAL CHARACTERS NOT PERMITTED IN GBA, BUT ASCII code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@\ SD13 GRAVE ACCENT 405C 60 60 0060
@> SD15 CIRCUMFLEX ACCENT 403E 5E 5E 005E
~~ SD19 TILDE 7E7E 7E 7E 007E
---------------------------------------------------------------------------
\\ SM07 REVERSE SOLIDUS 5C5C 5C 5C 005C
{ SM11 LEFT CURLY BRACKET 7B 7B 7B 007B
} SM14 RIGHT CURLY BRACKET 7D 7D 7D 007D
---------------------------------------------------------------------------
SPECIAL CHARACTERS NOT PERMITTED IN GBA, BUT 6937 code in 10646-1
TRFO SID NAME TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@1 NS01 SUPERSCRIPT ONE 4031 D1 C2B9 00B9
---------------------------------------------------------------------------
=1 NF18 VULGAR FRACTION ONE EIGHTH 3D31 DC .... 215B
=3 NF19 VULGAR FRACTION THREE EIGHTHS 3D33 DD .... 215C
=5 NF20 VULGAR FRACTION FIVE EIGHTHS 3D35 DE .... 215D
=7 NF21 VULGAR FRACTION SEVEN EIGHTHS 3D37 DF .... 215E
---------------------------------------------------------------------------
@/ SD11 ACUTE ACCENT 402F C220 C2B4 00B4
@% SD17 DIAERESIS 4025 C820 C2A8 00A8
@* SD21 CARON 402A CF20 CB87 02C7
@# SD23 BREVE 4023 C620 CB98 02D8
@" SD25 DOUBLE ACUTE ACCENT 402B CD20 CB9D 02DD
@0 SD27 RING ABOVE 4030 CA20 CB9A 02DA
@. SD29 DOT ABOVE 402E C720 CB99 02D9
@= SD31 MACRON 403D C520 C2AF 00AF
_) SD41 CEDILLA 5F29 CB20 C2B8 00B8
_( SD43 OGONEK 5F28 CE20 CB9B 02DB
---------------------------------------------------------------------------
_- SM12 HORIZONTAL BAR 5F2D D0 .... 2015
_< SM30 LEFTWARDS ARROW 5F3C AC .... 2190
_> SM31 RIGHTWARDS ARROW 5F3E AE .... 2192
_A SM32 UPWARDS ARROW 5F41 AD .... 2191
_V SM33 DOWNWARDS ARROW 5F56 AF .... 2193
#c SM52 COPYRIGHT SIGN 2363 D3 C2A9 00A9
#r SM53 REGISTERED SIGN 2372 D2 C2AE 00AE
#t SM54 TRADE MARK SIGN 2374 D4 .... 2122
*| SM65 BROKEN BAR 237C D7 C2A6 00A6
^ SM66 NOT SIGN D6 D6 C2AC 00AC
_J SM93 MUSIC NOTE (EIGHTH NOTE IN 10646) 5F4A D5 .... 266A
---------------------------------------------------------------------------
@( SP19 LEFT SINGLE QUOTATION MARK 4028 A9 .... 2018
@) SP20 RIGHT SINGLE QUOTATION MARK 4029 B9 .... 2019
@{ SP21 LEFT DOUBLE QUOTATION MARK 405B AA .... 201C
@} SP22 RIGHT DOUBLE QUOTATION MARK 405D BA .... 201D
---------------------------------------------------------------------------
SP31 NO-BREAK SPACE .. A0 C2A0 00A0
SP32 SOFT HYPHEN .. FF C2AD 00AD
---------------------------------------------------------------------------
ANNEX D (informative) VERSION 1.5
1998-10-12
REPRESENTATION OF THE CHARACTERS OF THE GBA SET WITH THE CHARACTERS OF THE ASCII SET WITH LOSS OF INFORMATION
D.1 Introduction
If an application has at its disposal only the 52 letters, the 10 digits and the 33 special characters of ASCII (95 in total), then submitted texts may be made manageable by replacing every character that is not available by one or more taken from ASCII. In this way different GBA characters are mapped onto the same ASCII character, thus causing loss of information. This method is internationally known as "fall back".
The rules for this transformation are described below.
D.2 Rules
1. From letters carrying a diacritic mark (those that are coded in ISO/IEC 6937 with two bytes) the diacritic mark is removed (this is the first byte of the code).
2. The following letters of the so-called supplementary set in ISO/IEC 6937 (that is, those coded with one byte) are modified as follows (the SGML public entities are written without the preceding ampersand and the closing semicolon):
TRFO SGML SID Name becomes from to
@i inodot LI61 LATIN SMALL LETTER DOTLESS I i F5 69
=d dstrok LD61 LATIN SMALL LETTER D WITH STROKE d F2 64
=D Dstrok LD62 LATIN CAPITAL LETTER D WITH STROKE D E2 44
=h hstrok LH61 LATIN SMALL LETTER H WITH STROKE h F4 68
=H Hstrok LH62 LATIN CAPITAL LETTER H WITH STROKE H E4 48
=l lstrok LL61 LATIN SMALL LETTER L WITH STROKE l F8 6C
=L Lstrok LL62 LATIN CAPITAL LETTER L WITH STROKE L E8 4C
$o ostrok LO61 LATIN SMALL LETTER O WITH STROKE o F9 6F
$O Ostrok LO62 LATIN CAPITAL LETTER O WITH STROKE O E9 4F
=t tstrok LT61 LATIN SMALL LETTER T WITH STROKE t FD 74
=T Tstrok LT62 LATIN CAPITAL LETTER T WITH STROKE T ED 54
&d eth LD63 LATIN SMALL LETTER ETH (Icelandic) d F3 64
&k kgreen LK61 LATIN SMALL LETTER KRA (Greenlandic) q F0 71
3. The following letters will be transformed into two characters:
&a aelig LA51 LATIN SMALL LETTER AE ae F1 6165
&A AElig LA52 LATIN CAPITAL LETTER AE AE E1 4145
&l lmidot LL63 LATIN SMALL LETTER L WITH MIDDLE DOT l. F7 6C2E
&L Lmidot LL64 LATIN CAPITAL LETTER L WITH MIDDLE DOT L. E7 4C2E
&o oelig LO51 LATIN SMALL LIGATURE O E oe FA 6F65
&O OElig LO52 LATIN CAPITAL LIGATURE O E OE EA 4F45
&s szlig LS61 LATIN SMALL LETTER SHARP S (German) ss FB 7373
&t thorn LT63 LATIN SMALL LETTER THORN (Icelandic) th FC 7468
&T THORN LT64 LATIN CAPITAL LETTER THORN (Icelandic) TH EC 5448
&n eng LN61 LATIN SMALL LETTER ENG (Sami) ng FE 6E67
&N ENG LN62 LATIN CAPITAL LETTER ENG (Sami) NG EE 4E47
'n napos LN63 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE 'n EF 276E
4. The following special characters will be transformed to 1 character:
@2 sup2 NS02 SUPERSCRIPT TWO 2 B2 32
@3 sup3 NS03 SUPERSCRIPT THREE 3 B3 33
_: divide SA06 DIVISION SIGN : B8 3A
_* times SA07 MULTIPLICATION SIGN x B4 78
_f curren SC01 CURRENCY SIGN * A8 2A
_L pound SC02 POUND SIGN L A3 4C
$$ dollar SC03 =DOLLAR SIGN ? 24 3F
_c cent SC04 CENT SIGN c A2 63
_Y yen SC05 YEN SIGN Y A5 59
## num SM01 =NUMBER SIGN ? 23 3F
@@ commat SM05 =COMMERCIAL AT ? 40 3F
*( lsqb SM06 =LEFT SQUARE BRACKET ( 5B 28
*) rsqb SM08 =RIGHT SQUARE BRACKET ) 5D 29
| verbar SM13 =VERTICAL LINE ? 7C 3F
_m micro SM17 MICRO SIGN ? B5 3F
@0 deg SM19 DEGREE SIGN o B0 6F
_o ordm SM20 MASCULINE ORDINAL INDICATOR o EB 6F
_a ordf SM21 FEMININE ORDINAL INDICATOR a E3 61
#S sect SM24 SECTION SIGN ? A7 3F
#P para SM25 PILCROW SIGN ? B6 3F
#. middot SM26 MIDDLE DOT . B7 2E
*! iexcl SP03 INVERTED EXCLAMATION MARK ! A1 21
__ lowbar SP09 =LOW LINE - 5F 2D
*? iquest SP16 INVERTED QUESTION MARK ? BF 3F
5. The following special characters will be transformed to 2 characters:
*< laquo SP17 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK << AB 3C3C
*> raquo SP18 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK >> BB 3E3E
6. The following special characters will be transformed to 3 characters:
_2 frac12 NF01 VULGAR FRACTION ONE HALF 1/2 BD 312F32
_4 frac14 NF04 VULGAR FRACTION ONE QUARTER 1/4 BC 312F34
_3 frac34 NF05 VULGAR FRACTION THREE QUARTERS 3/4 BE 332F34
_+ plusmn SA02 PLUS-MINUS SIGN +/- B1 2B2F2D
_O ohm SM18 OHM SIGN Ohm E0 4F686D
7. All other characters remain unchanged.
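The rules above can be sketched as a small byte-level filter. This is only an illustration under stated assumptions, not part of NEN 1888: the name transform_with_loss is hypothetical, the replacement table holds just a few rows from rules 2 to 6, and the range 0xC1 to 0xCF for the non-spacing diacritic bytes of ISO/IEC 6937 is assumed rather than taken from this annex.

```python
# A few "from" -> "to" rows copied from rules 2-6 (subset, for illustration).
REPLACEMENTS = {
    0xF5: b"i",    # LATIN SMALL LETTER DOTLESS I
    0xF9: b"o",    # LATIN SMALL LETTER O WITH STROKE
    0xF1: b"ae",   # LATIN SMALL LETTER AE
    0xFA: b"oe",   # LATIN SMALL LIGATURE O E
    0xFB: b"ss",   # LATIN SMALL LETTER SHARP S
    0xAB: b"<<",   # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    0xBD: b"1/2",  # VULGAR FRACTION ONE HALF
    0xB1: b"+/-",  # PLUS-MINUS SIGN
}

# Assumption: the first byte of a two-byte letter (the diacritic mark)
# lies in 0xC1..0xCF.
DIACRITIC_BYTES = range(0xC1, 0xD0)


def transform_with_loss(data: bytes) -> bytes:
    """Apply rule 1, then the table (rules 2-6); other bytes pass (rule 7)."""
    out = bytearray()
    for byte in data:
        if byte in DIACRITIC_BYTES:
            continue  # rule 1: drop the diacritic, keep the base letter
        out += REPLACEMENTS.get(byte, bytes([byte]))
    return bytes(out)
```

With these assumptions, the byte string b"c\xfaur" becomes b"coeur", and b"Compos\xc2ee" (an acute accent byte before the first "e") becomes b"Composee".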
D.3 Notes to the rules:
1. The lists of characters for transformation given above contain only the characters from the GBA set; no rules are presented here for others. The characters to which a mapping is specified are those included in the EDIFACT Level B repertoire. This repertoire does not contain the following ASCII characters: # $ @ \ [ ] ^ ` { | } ~ which would be transformed as well in so far as they occur in the GBA set. Should one remain in an ASCII environment, however, then it makes little sense to transform these characters (indicated with = before the name of the character under D.2 sub 4) to others.
2. The following characters in ISO/IEC 6937:1994 have a code different from that in CCITT T.61, which has since been withdrawn. The GBA set is still based on the old T.61 with respect to coding. There are no plans to change these codes in the near future.
GBA ASCII
$$ dollar SC03 DOLLAR SIGN A4 24
## num SM01 NUMBER SIGN A6 23
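The repertoire restriction in note 1 can be checked mechanically. The following is a minimal sketch, assuming that EDIFACT Level B equals the 95 ASCII characters (including SPACE) minus the 12 characters listed in note 1; the function name outside_level_b is hypothetical.

```python
import string

# The 12 ASCII characters that note 1 lists as absent from EDIFACT Level B.
NOT_IN_LEVEL_B = set("#$@\\[]^`{|}~")

# 52 letters + 10 digits + 32 special characters + SPACE = 95 characters.
ASCII_SET = set(string.ascii_letters + string.digits + string.punctuation + " ")

# Assumed Level B repertoire: 82 characters and SPACE.
LEVEL_B = ASCII_SET - NOT_IN_LEVEL_B


def outside_level_b(text: str) -> list:
    """Return the distinct characters of text that are not in Level B."""
    return sorted(set(text) - LEVEL_B)
```

The resulting set has 83 members, which agrees with the "82 characters and SPACE" given in 4.1.1.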
D.4 Example
As an example a French text is given here, together with its version after transformation with loss.
ORIGINAL
Composée en 1936, exécutée pour la première fois au cours du Festival de Venise de 1937, la musique de la Suite Provençale s'affirme, au cœur de l' œuvre immense de Darius Milhaud, comme l'une de ses réussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimée, avec ses hôtels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voûte des platanes, tandis qu'au-delà des champs de vignes et sombres haies de cyprès, la courbe nette de la Sainte Victoire s'érige dans le bleu d'un ciel nimbé de grise vapeur que Cézanne a si bien fixé avec un amour égal à son souci de la plus stricte vérité.
AFTER TRANSFORMATION
Composee en 1936, executee pour la premiere fois au cours du Festival de Venise de 1937, la musique de la Suite Provencale s'affirme, au coeur de l' oeuvre immense de Darius Milhaud, comme l'une de ses reussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimee, avec ses hotels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voute des platanes, tandis qu'au-dela des champs de vignes et sombres haies de cypres, la courbe nette de la Sainte Victoire s'erige dans le bleu d'un ciel nimbe de grise vapeur que Cezanne a si bien fixe avec un amour egal a son souci de la plus stricte verite.
ANNEX E (informative) VERSION 1.3
1998-10-12
REPRESENTATION OF THE CHARACTERS OF THE GBA SET WITH THE CHARACTERS OF THE ASCII SET WITHOUT LOSS OF INFORMATION
E.1 Introduction
If an application has at its disposal only the 52 letters, the 10 digits and the 33 special characters of ASCII (95 in total, counting SPACE among the special characters), then submitted texts can be input without loss of information only if some characters are transformed to more than one ASCII character.
It is important to follow a uniform transformation scheme that can be performed by a simple program, that allows the original to be reconstructed, and that keeps the text as readable as possible after the transformation.
Good experience has been gained with the following conventions. All GBA characters not in ASCII are turned into two ASCII characters, a special character followed by a letter, so that no information is lost. A text transformed in this way can be typed in directly on an ASCII keyboard, or converted by a program from a text provided by the GBA system that contains letters with diacritics. The text remains directly readable if one knows the conventions for the diacritics, in contrast to methods like uuencode, base64 or MIME, which require decoding first.
E.2 Notation
To denote accented letters and other letters not in the basic alphabet of 26 Latin letters, the following rules apply. Any such letter is written as one basic letter preceded by one special character, chosen as follows:
/ ACUTE
\ GRAVE
> CIRCUMFLEX
% DIAERESIS
~ TILDE
* CARON
# BREVE
+ DOUBLE ACUTE
@ RING ABOVE or DOT ABOVE and DOTLESS I
= MACRON or STROKE (but not on O)
$ CEDILLA or OGONEK and O WITH STROKE (Danish etc.)
& LIGATURE or special form (AE OE ETH THORN ENG SHARP S)
_ LOW LINE
In Annex C one may find for every non-ASCII character the transformed representation (TRFO).
If any of these special characters occurs in a text as itself, and so is not to be interpreted as part of the transformation, it must be replaced by TWO of that same character. Thus / is replaced by //, and % by %%.
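The notation can be illustrated with a short encoder. This is a minimal sketch under stated assumptions, not the program used with the GBA system: it works on Unicode strings rather than GBA codes, the name encode is hypothetical, and the table of letters without a decomposition holds only a few of the TRFO codes from Annex D. Decoding is left out, because several special characters stand for two different marks (for example @ for RING ABOVE and DOT ABOVE), so the reverse direction needs the full table of Annex C.

```python
import unicodedata

# Special character for each combining mark, following the list in E.2.
MARKS = {
    "\u0301": "/",   # ACUTE
    "\u0300": "\\",  # GRAVE
    "\u0302": ">",   # CIRCUMFLEX
    "\u0308": "%",   # DIAERESIS
    "\u0303": "~",   # TILDE
    "\u030c": "*",   # CARON
    "\u0306": "#",   # BREVE
    "\u030b": "+",   # DOUBLE ACUTE
    "\u030a": "@",   # RING ABOVE
    "\u0307": "@",   # DOT ABOVE
    "\u0304": "=",   # MACRON
    "\u0327": "$",   # CEDILLA
    "\u0328": "$",   # OGONEK
}

# Letters that do not decompose into base letter + mark (subset of TRFO codes).
SPECIAL = {
    "\u0153": "&o", "\u0152": "&O",  # ligature oe
    "\u00e6": "&a", "\u00c6": "&A",  # ae
    "\u00df": "&s",                  # sharp s
    "\u00f8": "$o", "\u00d8": "$O",  # o with stroke
    "\u0131": "@i",                  # dotless i
}

# Specials that must be doubled when they occur as themselves.
ESCAPES = set(MARKS.values()) | {"&", "_"}


def encode(text: str) -> str:
    """Turn each non-ASCII letter into special character + basic letter."""
    out = []
    for ch in text:
        if ch in ESCAPES:
            out.append(ch * 2)
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            parts = unicodedata.normalize("NFD", ch)
            if len(parts) == 2 and parts[1] in MARKS:
                out.append(MARKS[parts[1]] + parts[0])
            else:
                out.append(ch)
    return "".join(out)
```

Under these assumptions, encode applied to "Provençale" gives "Proven$cale", and applied to "50%" gives "50%%".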
E.3 Example
As an example a text in French is presented, with its transformed version.
ORIGINAL
Composée en 1936, exécutée pour la première fois au cours du Festival de Venise de 1937, la musique de la Suite Provençale s'affirme, au cœur de l' œuvre immense de Darius Milhaud, comme l'une de ses réussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimée, avec ses hôtels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voûte des platanes, tandis qu'au-delà des champs de vignes et sombres haies de cyprès, la courbe nette de la Sainte Victoire s'érige dans le bleu d'un ciel nimbé de grise vapeur que Cézanne a si bien fixé avec un amour égal à son souci de la plus stricte vérité.
AFTER TRANSFORMATION
Compos/ee en 1936, ex/ecut/ee pour la premi\ere fois au cours du Festival de Venise de 1937, la musique de la Suite Proven$cale s'affirme, au c&our de l' &ouvre immense de Darius Milhaud, comme l'une de ses r/eussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aim/ee, avec ses h>otels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche vo>ute des platanes, tandis qu'au-del\a des champs de vignes et sombres haies de cypr\es, la courbe nette de la Sainte Victoire s'/erige dans le bleu d'un ciel nimb/e de grise vapeur que C/ezanne a si bien fix/e avec un amour /egal \a son souci de la plus stricte v/erit/e.
ANNEX F (informative) VERSION 1.3
1998-10-12
REPRESENTATION OF CHARACTERS FROM NON-LATIN SCRIPTS WITH CHARACTERS FROM THE GBA SET (TRANSLITERATION AND TRANSCRIPTION)
F.1 Introduction
When handling documents or names in non-Latin scripts, either transliteration or transcription has to be applied, because as a general rule the official in charge can be neither expected nor required to deal with scripts other than Latin.
Transliteration is the transformation in which each character of the other script is converted to one or more characters of the Latin script, in a way that may be called mechanical, in principle without human intervention. Back transliteration is defined analogously, but sometimes a letter combination cannot be unambiguously reduced to a single letter: "ue", for instance, need not have resulted from a letter with umlaut, as the German word "aktuell" shows.
Transcription is the transformation in which the given text is converted to Latin script while keeping as close as possible to the original pronunciation. A given letter may thus result in different letters depending on context or on external knowledge. For example, Ersjev (transliterated) becomes Jersjov (transcribed): the first "e" is changed to "je", but the second "e" to "o".
In applying these methods in practice, one has to distinguish documents from names. Documents are handled in the civil service as originals (thus in the script of origin) or in translation, so transliteration and transcription do not matter here. With names the case is different, because they have to be entered into the administrative systems, which only handle Latin script. For writing names to be included in the Population Register the Agreement of Berne (Trb. 1974 nr. 31) has to be followed, unless deviation is authorised by Circulaire from the Minister of Justice. This implies that names in Latin script remain unmodified from the reading (including diacritics) in the original as presented. Names in non-Latin script shall be kept untranslated, and rendered by a letter-to-letter conversion as strict as possible (transliteration). ISO standards, if existing, shall be applied in this transformation.
When these rules were evaluated after several years, it appeared that strict application of transliteration causes serious problems. It has therefore been decided, and announced by circulaire by the Staatssecretaris of Justice, that deviation from the rules is permitted for a number of scripts.
F.2 Greek script
The Latin reading of the name in a Greek passport shall be adhered to. Because several Greek letters and combinations of letters, like "ei" and "oi", are pronounced as "i" and transcribed that way, back transcription is not possible without thorough knowledge of the Greek language. The original Greek spelling should therefore have been stored too, but in general there is no provision for this.
Furthermore, it is the custom in Greece to transcribe non-Greek words to Greek script according to certain rules. Should these foreign words or names be transliterated back to Latin script according to ISO 843, strange results emerge:
Don Giovanni Nton Tziobani
Dirk Bogarde Nterk Mpougkarnt
Van den Broek Ban nten Mprouk
Thus the Latin spelling, presented with expertise by the Greek authorities, shall be adhered to. This implies transcription. There is no way to restore "ntokimanter" to "documentaire" by applying a fixed rule.
F.3 Cyrillic script
In the former Yugoslavia, ISO 9, which has a one-to-one mapping, has never been applied for transliteration. Usually some single Cyrillic letters are transformed to two Latin ones, like lj and nj. This usage shall be followed.
With Russian there are problems, because transliteration does not reflect pronunciation. According to ISO 9, "Elcin" should be written, and not "Jeltsin" as one normally sees. Conversely, one finds "Potemkin" and not "Patjomkin", as the pronunciation would lead one to expect.
Persons of German origin whose name was written in Russia in Cyrillic characters are permitted, on their return to the West, to use the original spelling of their name again, if it can be proved how it was written in the past. Many Russian personal names are of foreign origin, and are often no longer recognizable as such (Kjoei-Cui, Katuar-Catoire, Metner-Medtner). A name like that of Grigori Shneerson can be reduced to Schneiersohn, but Schneyersohn is also possible.
F.4 Hebrew and Arabic script
Should ISO 259 (Hebrew) or ISO 233 (Arabic) be used, then letter/diacritic combinations may appear that are not included in the GBA set.
F.5 General
The conclusion is that if the country of origin provides a name written in Latin script, this name shall be adopted. Should uncertainty remain, then the judgement of an accredited translator shall be followed.