Subject | RE: [firebird-support] Unicode size |
---|---|
Author | Chad Z. Hower |
Post date | 2004-12-15T23:26:30Z |
:: Short form: you cannot depend on the OS to know the
:: capitalization and collation rules for your language.
The OS has different rules for each regional setting. Especially in Unicode,
even if the characters LOOK the same and share the same glyph in a font, they
are tracked differently. A Russian R looks like an English P. In fonts, one
is often aliased to the other. But in Unicode they have different values, and
the OS knows that it's a Russian R and not an English P. When you install
support for that language, it knows how to uppercase it, and it won't
uppercase it to an English P but to a Russian R. In this case a capital
Russian R looks the same as a capital English P, but that is not so for other
characters in Russian. Some uppers match upper symbols in English while their
lowers do not, because Russian upper and lower case letter shapes differ much
less from each other than they do in other alphabetic languages.
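
To make the look-alike point concrete, here is a minimal sketch in Python. It is only an illustration, not something from the original discussion: it assumes Python's built-in Unicode tables (the `unicodedata` module and `str.upper`), which are independent of the OS regional settings described above, and the names `LATIN_P`, `CYRILLIC_ER`, `small_er` and `describe` are made up for this example.

```python
import unicodedata

# Two characters that are usually drawn with the same shape.
LATIN_P = "\u0050"       # English capital P
CYRILLIC_ER = "\u0420"   # Russian capital R (Cyrillic Er)


def describe(ch):
    """Return the code point and the Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(LATIN_P))
print(describe(CYRILLIC_ER))

# Same look, different values: the comparison is False.
print(LATIN_P == CYRILLIC_ER)

# Uppercasing the small Russian R stays inside the Cyrillic block;
# it never turns into the English P.
small_er = "\u0440"
print(describe(small_er.upper()))

# An upper that matches an English upper while the lower does not:
# capital Cyrillic Ve looks like B, its lowercase does not look like b.
print(describe("\u0412"), "->", describe("\u0412".lower()))
```

Comparing or uppercasing by appearance would get all of these wrong, which is why the code point, and not the glyph, has to drive the rules.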
Arabic has even stricter rules and has approximately 4 versions of each
letter, depending on what follows it (or precedes it, depending on how you
look at it, rtl or ltr), rather than an upper and lower case.
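
The four versions can be seen in Unicode's Arabic presentation forms. The following is a rough sketch under stated assumptions, not a description of how any OS or database handles Arabic: it picks one letter (BEH) as an example, uses Python's `unicodedata` module, and the names `BEH`, `PRESENTATION_FORMS` and `folded` are hypothetical. Normal Arabic text stores only the base letter and leaves the choice of form to the rendering engine; the separate presentation-form code points exist mainly for compatibility.

```python
import unicodedata

# The base letter as it is normally stored in text.
BEH = "\u0628"

# Compatibility code points for the four positional shapes of BEH:
# isolated, final, initial, medial.
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for form in PRESENTATION_FORMS:
    # NFKC normalization folds each shape back to the base letter.
    base = unicodedata.normalize("NFKC", form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> U+{ord(base):04X}")

# All four shapes collapse to the single base letter.
folded = {unicodedata.normalize("NFKC", form) for form in PRESENTATION_FORMS}
print(folded == {BEH})

# There is no upper or lower case to apply: both calls return BEH unchanged.
print(BEH.upper() == BEH, BEH.lower() == BEH)
```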