In recent days, two major mobile phone brands, Samsung and LG, have started adding Khmer language support to their Android phones.
Many people who have only used non-standard keyboards will notice some differences from what they are used to, and they will wonder how to type some of the vowels, such as េះ, ោះ, ុះ and ាំ.
Khmer language experts refer to these as combined vowels, and in 1996 a language committee suggested to Unicode experts that only 16 single vowels be added to the Khmer Unicode standard. These were: ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ ោ and ៅ.
Under this proposal, each combined vowel is made up of two Unicode characters and requires two keystrokes to type:
ាំ -> ា + ំ
េះ -> េ + ះ
ោះ -> ោ + ះ
ុះ -> ុ + ះ
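As a rough illustration of the list above, the following Python sketch prints the code points that make up each combined vowel. The helper name `describe` is made up for this example, and the strings are written with escape sequences so the stored sequence is explicit; the names come from Python's built-in `unicodedata` module.

```python
import unicodedata

# The four combined vowels from the list above, spelled out as the
# two code points that a Unicode keyboard is expected to produce.
combined_vowels = [
    "\u17b6\u17c6",  # AA + NIKAHIT
    "\u17c1\u17c7",  # E  + REAHMUK
    "\u17c4\u17c7",  # OO + REAHMUK
    "\u17bb\u17c7",  # U  + REAHMUK
]

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for vowel in combined_vowels:
    # Each combined vowel is stored as exactly two characters.
    print(len(vowel), describe(vowel))
```

Each entry reports a length of 2, which is why typing one of these vowels takes two keystrokes on a layout that follows the standard.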
This method of typing is similar to the new Khmer keyboard layout in Mac OS X and to the standard keyboard developed by NiDA.
Maurice Bauhahn and Michael Everson, in their response to the KPP group explaining why combined vowels were not added, wrote:
The decision to not encode the ligatures ុំ U+17BB (KHMER VOWEL SIGN U) + U+17C6 (KHMER SIGN NIKAHIT) and ាំ U+17B6 (KHMER VOWEL SIGN AA) + U+17C6 (KHMER SIGN NIKAHIT) was a bold move by the Khmer linguists committee. Indeed it appears to fly in the face of Khmer textbooks and the Chhuan Nath dictionary introduction. On closer inspection, however, it is obvious that it was the right decision.
-First of all, note that the name NIKAHIT distinguishes this character from pure vowels whose names are purely phonetic (their names mimic their sound).
-Second, note that this is not the only combination which creates a unique vowel. The signs ំ U+17C6 (KHMER SIGN NIKAHIT), ះ U+17C7 (KHMER SIGN REAHMUK) and ៈ U+17C8 (KHMER SIGN YUUKALEAPINTU) combine with various dependent vowels to create about 18 additional vowels. If one would accept the two suggested ligatures, it would also be necessary to add the 16 other vowels, thoroughly complicating the life of already overworked typists!
-Third, the ambiguity of whether to type a combined ligature of a dependent vowel and NIKAHIT (or REAHMUK or YUUKALEAPINTU) or to use separate parts would violate the principle of “ambiguity must be avoided”. Note that sometimes those signs are stand alone (used only with the inherent vowel).
-Fourth, Khmer sorting would not be simplified by encoding combined characters. Khmer requires a syllabic based sort which is much more complicated than default Latin-based algorithms allow. An additional conversion of dependent vowel plus sign to create a separately sorted vowel would place a minor load on such a sort...which would probably be key-based rather than live in any case.
-Fifth, it is good to learn from problems a similar encoding has caused. Compatibility decompositions defined in The Unicode Standard 3.1 make it very difficult to classify the constituent parts of such a ligature (note U+0E33 THAI CHARACTER SARA AM and U+0EB3 LAO VOWEL SIGN AM), because these both decompose into a combining mark followed by a base character, with the former combining with some preceding character.
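The fifth point can be checked with a short Python sketch. It uses the standard `unicodedata` module to show the compatibility decompositions of the Thai and Lao AM characters, and then confirms that the Khmer sequence has nothing to decompose. The variable names are only illustrative, and the results depend on the Unicode data shipped with your Python version.

```python
import unicodedata

THAI_SARA_AM = "\u0e33"
LAO_VOWEL_SIGN_AM = "\u0eb3"

for ch in (THAI_SARA_AM, LAO_VOWEL_SIGN_AM):
    # decomposition() returns the raw mapping, tagged <compat> for these two.
    print(unicodedata.name(ch), "->", unicodedata.decomposition(ch))
    # NFKD applies the mapping; the first part is a combining mark (Mn),
    # the second is a base letter (Lo).
    parts = unicodedata.normalize("NFKD", ch)
    print([(f"U+{ord(p):04X}", unicodedata.category(p)) for p in parts])

# Khmer AA + NIKAHIT is already two separate characters,
# so normalization leaves the sequence unchanged.
khmer_am = "\u17b6\u17c6"
khmer_unchanged = unicodedata.normalize("NFKD", khmer_am) == khmer_am
print("Khmer sequence unchanged by NFKD:", khmer_unchanged)
```

Because the Khmer vowel is never encoded as a single precomposed character, there is only one way to store it and no decomposition for software to track.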