Subject Re: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams
Author Jim Starkey
Svend Meyland Nicolaisen wrote:

>
>
>>Do you have any statistical data to show that UTF-16 consumes
>>fewer bytes than UTF-8?
>>
>>
>>
>
>No, I have no statistics ready. I suppose that if you mainly use characters
>from the Latin 1 character set then UTF-8 will be more compact than
>UTF-16. But texts containing Japanese or Thai characters seem to be more
>compact in UTF-16.
>
>
>
You may have a point; there are many more Chinese than Europeans, so a
global optimization suggests that a bias towards larger character sets
is warranted. But the distribution of current users heavily favors the
Latin character distribution, and single bytes are a lot easier to
handle than shorts. In the face of hard questions, go for simplicity.
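
In the absence of real statistics, the size trade-off discussed above can at least be checked on small samples. The following is only a rough sketch: the sample strings and the helper name encoded_sizes are made up for illustration, the UTF-16 figure excludes the byte order mark, and three short strings say nothing about the data a real database would hold.

```python
# Compare encoded sizes of a few illustrative strings.
# The samples and the helper name are assumptions for this sketch,
# not measurements taken from any real data set.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "latin": "The quick brown fox",
    "japanese": "日本語のテキスト",
    "thai": "ภาษาไทย",
}

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

For these samples the ASCII string takes half as many bytes in UTF-8 as in UTF-16, while the Japanese and Thai strings take three bytes per character in UTF-8 against two in UTF-16. Which effect dominates depends entirely on the mix of text actually stored.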