Every Character has a Story - YEH-based letters (Part 1)
The concept of a character may have different meanings in different contexts: phonology, computer encoding, or writing. This series of articles focuses primarily on computer encoding with the Unicode standard for languages written in Arabic script, with the main emphasis on Urdu.
Historical Background
The letter yeh, like other Arabic letters, was originally written without dots, ی, in the pre-Islamic and early Islamic periods. In classical Arabic orthography, during the Umayyad and Abbasid standardization, the letter adopted two dots beneath it (ي) in its isolated and final positions to distinguish it from other letters, such as alef maksura (ى). Dots under the isolated and final forms of yeh are the norm in Modern Standard Arabic, although some consider them optional (e.g. in Egypt). Under the initial and medial forms, however, the dots are not optional in modern writing in any Arabic dialect.
There is also an alternate form ے seen in some styles of writing. It should not be confused with large (barree) yeh in Urdu: in Arabic it is a different rendering of the same yeh and, unlike in Urdu, not a separate letter. By convention, the yeh shape ى is also used, optionally with a superscript alef (encoded as U+0670 ◌ٰ), to write a final alef in some words. This usage, alef maksura, omits the dots and is regarded as a different letter from the normal yeh.
Unicode Encoding
Arabic
Arabic has just one yeh letter, and text should encode it consistently as U+064A ي. Although the representative glyph shown for this Unicode character has dots below, it may be rendered as a dotless ى shape in some styles and locales (this would be expected in a Nastaliq font, for example); it could even appear as a ے shape in appropriate calligraphic contexts. In other words, the rendered shape is the font designer's choice and should have no bearing on how the text is encoded.
• use U+064A ي for YEH
• use U+0649 ى for ALEF MAKSURA (only at the end of a word)
• do not use U+06CC ی FARSI YEH or U+06D2 ے YEH BARREE for yeh
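The recommendations above can be checked mechanically. What follows is only a minimal sketch in Python, assuming the text is already known to be Arabic-language; the names DISCOURAGED_IN_ARABIC and find_discouraged_yeh are hypothetical and chosen for illustration. It reports each position where FARSI YEH or YEH BARREE appears, along with the official Unicode character name.

```python
import unicodedata

# Code points the list above says not to use in Arabic-language text.
# (Hypothetical constant name, for illustration only.)
DISCOURAGED_IN_ARABIC = {"\u06CC", "\u06D2"}


def find_discouraged_yeh(text):
    """Return (index, 'U+XXXX', Unicode name) for each discouraged code point."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in DISCOURAGED_IN_ARABIC
    ]


# A four-letter word typed with FARSI YEH (U+06CC) in final position
# instead of ARABIC LETTER YEH (U+064A).
sample = "\u0639\u0631\u0628\u06CC"
print(find_discouraged_yeh(sample))
# [(3, 'U+06CC', 'ARABIC LETTER FARSI YEH')]
```

A check like this only flags candidates. Whether to replace them depends on the language of the text, since the same code points are the correct choice for Persian and Urdu.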
Persian
In Persian, yeh is consistently written without dots, ی, in isolated and final positions. The dotted form that is the typical rendering of U+064A ي would be considered incorrect. To allow the Persian form ی to be reliably rendered, and to co-occur with the typical Arabic ي in the same text, Unicode provides a separate code U+06CC ی FARSI YEH. Although the Persian yeh is clearly related to the Arabic letter, and could have been considered a glyph variant of it, the difference in typical appearance is not an optional feature; it is a requirement for proper rendering of Persian text.
Note that it is possible to produce the appropriate appearance for Persian by the use of Arabic yeh U+064A ي in initial and medial positions, and alef maksura U+0649 ى in final and isolated positions. This represents an inconsistent encoding of the Persian letter, and should not be done; however, users working with systems designed for Arabic may have used such “hacks” to achieve the desired rendering.
There have also been cases where Arabic fonts have been altered to render U+064A with a dotless glyph, and U+0649 with a dotted one (in order to support the Persian preference for yeh, while still providing the possibility of rendering the Arabic form where needed). This swapping of glyphs represents a deviation from the Unicode standard, and leads to data interchange problems; it is not a correct way to encode Persian text.
When Arabic words spelled with alef maksura are used in Persian, it could be considered most logical to encode these with U+0649 ى, for consistency with Arabic. However, as U+0649 ى is indistinguishable from the Persian yeh U+06CC ی in word-final position (the only position in which alef maksura should occur), it is unlikely that users will make such a distinction.
• use U+06CC ی for yeh
• be aware that some Persian text may be encoded with U+064A and/or U+0649 for yeh
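One way to deal with such legacy data is to rewrite the stand-in code points to U+06CC before further processing. The sketch below is a rough illustration in Python, assuming the input is purely Persian text; the names LEGACY_TO_FARSI_YEH and normalize_persian_yeh are hypothetical. Note that it also rewrites any genuine alef maksura or Arabic yeh, so it would not be appropriate for text that mixes Persian with Arabic quotations.

```python
# Legacy stand-ins for Persian yeh, mapped to the recommended U+06CC.
# (Hypothetical names; assumes the input contains Persian text only.)
LEGACY_TO_FARSI_YEH = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> FARSI YEH
})


def normalize_persian_yeh(text):
    """Replace U+064A and U+0649 with U+06CC throughout the text."""
    return text.translate(LEGACY_TO_FARSI_YEH)


# A word typed with the "hack" described above: U+064A in medial
# position and U+0649 in final position.
legacy = "\u0627\u064A\u0631\u0627\u0646\u0649"
fixed = normalize_persian_yeh(legacy)
print([f"U+{ord(ch):04X}" for ch in fixed])
# ['U+0627', 'U+06CC', 'U+0631', 'U+0627', 'U+0646', 'U+06CC']
```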
Urdu
In Urdu, a distinction is made between the form ی, representing an /i/ vowel, and ے, representing /e/. The two forms are known as choti yeh (small yeh) and barree yeh (large yeh) respectively, and are considered separate letters. Thus, unlike in Arabic, the form ے must be encoded separately from ی, not treated as an optional calligraphic glyph variant.
For small yeh, Urdu follows the Persian convention of writing without dots; the form ي would be considered incorrect. As with Persian, there is the possibility of encountering data where the Arabic yeh and/or alef maksura has been used to stand in for the correct form, possibly with modified fonts, so processes may need to be prepared for some inconsistencies in encoding.
• use U+06CC ی for small yeh /i/
• use U+06D2 ے for large yeh /e/
• be aware that U+064A ي and/or U+0649 ى may also be found
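When searching or comparing Urdu text of mixed provenance, it may be safer to leave the stored data untouched and compare on a folded key instead. The following Python sketch assumes that U+064A and U+0649 only ever stand in for small yeh in the data at hand; the function names urdu_match_key and same_urdu_word are hypothetical. Large yeh (U+06D2) is deliberately left alone, because it is a separate letter in Urdu.

```python
# Fold the legacy stand-ins for small yeh onto U+06CC for comparison only.
# U+06D2 (large yeh) is not folded: it is a distinct Urdu letter.
# (Hypothetical names; a sketch, not a complete Urdu normalizer.)
SMALL_YEH_FOLD = str.maketrans({"\u064A": "\u06CC", "\u0649": "\u06CC"})


def urdu_match_key(word):
    """Return a comparison key in which all small-yeh encodings are equal."""
    return word.translate(SMALL_YEH_FOLD)


def same_urdu_word(a, b):
    """Compare two words while ignoring small-yeh encoding differences."""
    return urdu_match_key(a) == urdu_match_key(b)


ki_correct = "\u06A9\u06CC"  # kaf + small yeh, encoded as recommended
ki_legacy = "\u06A9\u064A"   # same word typed with Arabic yeh
ke = "\u06A9\u06D2"          # kaf + large yeh, a different word

print(same_urdu_word(ki_correct, ki_legacy))  # True
print(same_urdu_word(ki_correct, ke))         # False
```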
Sindhi
Although Sindhi is a neighboring language to Urdu, it follows the Arabic convention of writing yeh with dots, ي, rather than in the Persian and Urdu way. A dotless form is only seen when writing Arabic-derived words with alef maksura, not for the normal yeh letter. It therefore uses the same character codes as Arabic for yeh.
• use U+064A ي for yeh
• use U+0649 ى for alef maksura in Arabic-derived words