Subject | Re: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams |
---|---|
Author | Jim Starkey |
Post date | 2005-05-03T12:00:47Z |
Svend Meyland Nicolaisen wrote:
>>Do you have any statistical data to show that UTF-16 consumes
>>fewer bytes than UTF-8?
>>
>
>No, I have no statistics ready. I suppose that if you mainly use characters
>from the Latin 1 character set then UTF-8 will be better compressed than
>UTF-16. But texts containing Japanese or Thai characters seem to be better
>compressed with UTF-16.
>

You may have a point; there are many more Chinese than Europeans, so a
global optimization suggests that a bias towards larger character sets
is warranted. But the distribution of current users heavily favors the
Latin character distribution, and single bytes are a lot easier to
handle than shorts. In the face of hard questions, go for simplicity.
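
For anyone who wants rough numbers rather than guesses, the size trade-off above can be checked with a few lines of Python. This is only a sketch: the sample strings and the helper name encoded_sizes are my own illustrative choices, not measurements from any real database, and it counts raw encoded bytes without any further compression.

```python
# Compare raw encoded sizes of short sample strings in UTF-8 and UTF-16.
# The samples are illustrative assumptions, not representative corpora.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "latin": "hello world",
    "japanese": "こんにちは世界",
    "thai": "สวัสดี",
}

for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{name:9s} utf-8={utf8_bytes:3d}  utf-16={utf16_bytes:3d}")
```

On these samples the ASCII string takes half the space in UTF-8 (11 bytes against 22), while the Japanese and Thai strings take three bytes per character in UTF-8 against two in UTF-16. Real text mixes scripts, digits, and punctuation, so the actual ratio depends on the data.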