SC22/WG20 N734

Title:  Proposed sorting order [1,2] and transliteration [3] for Thaana
        script (used for the Dhivehi language in the Maldives Republic)
        in standards of ISO/IEC JTC1/SC22/WG20 and ISO/TC46/SC2
Date:   15 January 1999
Source: John Clews
Status: For information, with regard to future work on ISO/IEC 14651
        (when ordering of the full repertoire of ISO/IEC 10646-1:2000 is
        begun) and on ISO DIS 15919: Transliteration of Devanagari and
        related Indic scripts (if Thaana is added as an annex).

It is hoped that this information can be checked and confirmed, or if
necessary altered, following input in due course from experts in the
Maldives Republic during 2000, thanks to Mohammed Shareef (Male', and
University of Birmingham, UK).

This version supersedes any earlier versions of this chart by John Clews.

Note: The bracketed [insertions] in table (a) indicate the suggested sort
position of the dotted letters. These characters are repeated, with their
UCS values and in UCS order, in table (b).
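The proposed order can be expressed as a collation key. The Python sketch below assumes the reading of the chart given in note [1]: the basic letters 0780-0797 keep their UCS order, and each dotted letter is placed directly after the letter it modifies, in the order of the bracketed insertions in table (a). The chart does not say where vowel signs and sukun (07A6-07B0) fall, so this sketch simply ranks them after all letters in UCS order; that is an assumption. The names DOTTED_AFTER, ORDER and sort_key are hypothetical and are not taken from ISO/IEC 14651.

```python
# Minimal sketch of the proposed Thaana sort order (hypothetical names).
# Assumptions: basic letters 0780-0797 in UCS order; each dotted letter
# (0798-07A5) placed directly after the letter it modifies, following the
# bracketed insertions in table (a); vowel signs and sukun (07A6-07B0)
# ranked after all letters in UCS order (NOT specified by the chart).

# Dotted letters keyed by the basic letter they modify, in chart order.
DOTTED_AFTER = {
    0x0780: [0x0799, 0x079A],          # HAA: HHAA, KHAA
    0x0783: [0x079C],                  # RAA: ZAA
    0x0787: [0x07A2, 0x07A3],          # ALIFU: AINU, GHAINU
    0x0788: [0x07A5],                  # VAAVU: WAAVU
    0x078B: [0x079B],                  # DHAALU: THAALU
    0x078C: [0x0798, 0x07A0, 0x07A1],  # THAA: TTAA, TO, ZO
    0x078E: [0x07A4],                  # GAAFU: QAAFU
    0x0790: [0x079D, 0x079E, 0x079F],  # SEENU: SHEENU, SAADHU, DAADDU
}


def build_order():
    """Return the 49 Thaana characters 0780-07B0 in the proposed order."""
    order = []
    for cp in range(0x0780, 0x0798):            # basic letters, UCS order
        order.append(chr(cp))
        order.extend(chr(d) for d in DOTTED_AFTER.get(cp, []))
    # Assumed: vowel signs and sukun after all letters, in UCS order.
    order.extend(chr(cp) for cp in range(0x07A6, 0x07B1))
    return order


ORDER = build_order()
RANK = {ch: i for i, ch in enumerate(ORDER)}


def sort_key(word):
    """Single-level key: the rank of each Thaana character, in sequence.

    Characters outside 0780-07B0 (e.g. spaces) are skipped in this sketch.
    """
    return [RANK[ch] for ch in word if ch in RANK]


if __name__ == "__main__":
    # HHAA+a, SHAVIYANI+a, HAA+a
    words = ["\u0799\u07A6", "\u0781\u07A6", "\u0780\u07A6"]
    # Plain code-point order would put 0799 last; the proposed order
    # puts it directly after 0780 HAA.
    for w in sorted(words, key=sort_key):
        print(" ".join(f"{ord(c):04X}" for c in w))
```

Run as a script, this prints the words in the order 0780, 0799, 0781 (each followed by 07A6), whereas plain code-point order would give 0780, 0781, 0799. An ISO/IEC 14651 tailoring would normally assign multi-level weights; the sketch only models the single sequence proposed here.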
----------------------------------------------------------------
(a) Consonants
Ar   UCS   TL        UCS character name (abbreviated)
----------------------------------------------------------------
08   0780  h         Dhi_HAA
           [ .h      Dhi_HHAA    dotted: modifies 0780 HAA ]
           [ .kh(,h) Dhi_KHAA    dotted: modifies 0780 HAA ]
18a  0781  sh        Dhi_SHAVIYANI
34   0782  n         Dhi_NOONU
13   0783  r         Dhi_RAA
           [ .z      Dhi_ZAA     dotted: modifies 0783 RAA ]
02   0784  b         Dhi_BAA
32   0785  lh        Dhi_LHAVIYANI
28   0786  k         Dhi_KAAFU
01   0787  '         Dhi_ALIFU
           [ `       Dhi_AINU    dotted: modifies 0787 ALIFU ]
           [ .gh     Dhi_GHAINU  dotted: modifies 0787 ALIFU ]
36   0788  v         Dhi_VAAVU
           [ .w      Dhi_WAAVU   dotted: modifies 0788 VAAVU ]
33   0789  m         Dhi_MEEMU
26   078A  f         Dhi_FAAFU
11   078B  dh        Dhi_DHAALU
           [ .dh     Dhi_THAALU  dotted: modifies 078B DHAALU ]
05   078C  th        Dhi_THAA
           [ .t      Dhi_TTAA    dotted: modifies 078C THAA ]
           [ .th(,t) Dhi_TO      dotted: modifies 078C THAA ]
           [ .zh(,z) Dhi_ZO      dotted: modifies 078C THAA ]
31   078D  l         Dhi_LAAMU
29   078E  g         Dhi_GAAFU
           [ .q      Dhi_QAAFU   dotted: modifies 078E GAAFU ]
30   078F  gn        Dhi_GNAVIYANI
16   0790  s         Dhi_SEENU
           [ .sh     Dhi_SHEENU  dotted: modifies 0790 SEENU ]
           [ .s      Dhi_SAADHU  dotted: modifies 0790 SEENU ]
           [ .d      Dhi_DAADDU  dotted: modifies 0790 SEENU ]
12a  0791  d         Dhi_DAVIYANI
15   0792  z         Dhi_ZAVIYANI
03a  0793  t         Dhi_TAVIYANI
38   0794  y         Dhi_YAA
03   0795  p         Dhi_PAVIYANI
07   0796  j         Dhi_JAVIYANI
10   0797  ch        Dhi_CHAVIYANI
----------------------------------------------------------------
(b) Additional dotted letters
Ar   UCS   TL        UCS character name (abbreviated)
----------------------------------------------------------------
04   0798  .t        Dhi_TTAA    dotted: modifies 078C THAA
35   0799  .h        Dhi_HHAA    dotted: modifies 0780 HAA
09   079A  .kh(,h)   Dhi_KHAA    dotted: modifies 0780 HAA
12   079B  .dh       Dhi_THAALU  dotted: modifies 078B DHAALU
14   079C  .z        Dhi_ZAA     dotted: modifies 0783 RAA
18   079D  .sh       Dhi_SHEENU  dotted: modifies 0790 SEENU
19   079E  .s        Dhi_SAADHU  dotted: modifies 0790 SEENU
21   079F  .d        Dhi_DAADDU  dotted: modifies 0790 SEENU
21a  07A0  .th(,t)   Dhi_TO      dotted: modifies 078C THAA
21b  07A1  .zh(,z)   Dhi_ZO      dotted: modifies 078C THAA
24   07A2  `         Dhi_AINU    dotted: modifies 0787 ALIFU
25   07A3  .gh       Dhi_GHAINU  dotted: modifies 0787 ALIFU
27   07A4  .q        Dhi_QAAFU   dotted: modifies 078E GAAFU
37   07A5  .w        Dhi_WAAVU   dotted: modifies 0788 VAAVU
----------------------------------------------------------------
(c) Vowel signs (written above or below consonants)
Ar   UCS   TL        UCS character name (abbreviated)
----------------------------------------------------------------
     07A6  a         Dhi_ABAFILI
     07A7  aa        Dhi_AABAAFILI
     07A8  i         Dhi_IBIFILI
     07A9  ee        Dhi_EEBEEFILI
     07AA  u         Dhi_UBUFILI
     07AB  oo        Dhi_OOBOOFILI
     07AC  e         Dhi_EBEFILI
     07AD  ei *      Dhi_EYBEYFILI   * ey in BGN/official TL [4]
     07AE  o         Dhi_OBOFILI
     07AF  oa **     Dhi_OABOAFILI   ** not au as in most Indic scripts
     07B0  +         Dhi_SUKUN
----------------------------------------------------------------
----------------------------------------------------------------
Notes on table:

[1] The UCS order is the basic Thaana sorting order. The bracketed
    [insertions] in table (a) show what may well be the ideal sorting
    position for the dotted Thaana letters, which are otherwise encoded
    as a separate block at 0798-07A5.
    Note that 0795 Dhi_PAVIYANI (p), although it consists of the shape of
    078A Dhi_FAAFU (f) with a dot, is a well-established letter with a
    defined place in the Thaana sequence. It should not be regarded as
    part of the group of less-used dotted letters, which are used mainly
    when quoting Arabic words in text, and which are sometimes described
    as an additional Thaana repertoire outside the basic one.

[2] The "Ar" column allows comparison with the sorting order of the
    Arabic alphabet: sort the rows on the numbers in this column (e.g.
    with a simple SORT command) to obtain the Arabic order.

[3] The official Maldives government transliteration scheme uses only
    ASCII (ISO/IEC 646) characters.
    The transliteration (TL) column shown here is unambiguous, as no two
    characters have the same transliteration. For digraphs, however,
    more than one pass would be needed to make conversion between ASCII
    text and Thaana fully reversible, so that a digraph such as sh (0781)
    is not confused with the letters that make it up (s 0790 + h 0780).
    Information on the dotted letters (UCS 0798-07A5) has been hard to
    obtain, so both their suggested ordering and their transliterations
    are very much a working hypothesis. This is also why some dotted
    letters show an alternative transliteration in parentheses, e.g.
    .kh(,h): these are earlier suggestions. Ordering and transliteration
    conventions for the other letters are well established.

[4] Converting ei to ey is necessary to produce the official Maldives
    transliteration. Using ei in the TL column avoids ambiguity, because
    official ey could otherwise stand either for 07AD or for a legitimate
    e-vowel (07AC) followed by a y-consonant (0794). Keyodu (island in
    Vaavu (Felidu) Atoll (J)) is an assumed example where this ambiguity
    needs to be avoided. In passing: in Indic scripts, ai is the
    equivalent of ei or ey.

----------------------------------------------------------------
Example of a Dhivehi string for transformation/transliteration:

Dhivehi Raajje (name of the official Maldives language in Dhivehi) [5]

078B 07A8 0788 07AC 0780 07A8   0783 07A7 0787 07B0 0796 07AC
dh   i    v    e    h    i      r    aa   '    +    j    e
                                          [6]  [6, 7, 8]

[5] dhivehi raajje: literally "Divehi speakers". In Dhivehi, "Maldivian"
    and "Maldives Republic" are written as "Divehi Raajje" and "Divehi
    Raajjeyge Jumhooriyya" in the Maldivian Government's 1987 system,
    according to the UNGEGN "Country names" document issued at the
    Seventh United Nations Conference on the Standardization of
    Geographical Names (New York, 12-23 January 1998).

[6] Alifu with a vowel sign: only the vowel is romanized. This usually
    occurs after a space or at the beginning of a string.
[7] Sukun alone: the consonant has no vowel (the + can be removed if it
    is followed by a consonant).

[8] Alifu with sukun: doubles the following consonant (as above, where
    ' + j gives jj in raajje).

----------------------------------------------------------------
References

The following, which shows the glyphs, is strongly recommended for use
with this document:

ISO/IEC 10646-1:2000 - Information technology - Universal Multiple-Octet
Coded Character Set, Part 1: Architecture and Basic Multilingual Plane.
- Amendment 24: Thaana. (ISO/IEC JTC1/SC2/WG2 N2031 - Corrected Revised
Text of FPDAM 24 - Thaana; 1999-06-11)

This table will also soon be available in The Unicode Standard, Version
3.0, and in ISO/IEC 10646-1:2000, both due out around the end of February
2000 or shortly after.

Other useful references:

Geiger, Wilhelm - Maldivian linguistic studies (with H C Bell). New Delhi:
Asian editions series, 1996.

Maniku, Hassan Ahmed and Disanayaka, J.B. - Say it in Maldivian (Dhivehi).
Colombo: Lake House Investments, 1990.

----------------------------------------------------------------
Best regards

John Clews
--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: 0171 412 7826 (day/evening); 01423 888 432 (weekend)
Email: Ordering@sesame.demon.co.uk
Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
Technologies: European Localization Requirements;
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets