Subject RE: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams
Author Svend Meyland Nicolaisen
> -----Original Message-----
> From: Firebird-Architect@yahoogroups.com
> [mailto:Firebird-Architect@yahoogroups.com] On Behalf Of Jim Starkey
> Sent: 3 May 2005 04:56
> To: Firebird-Architect@yahoogroups.com
> Subject: Re: [Firebird-Architect] UTF-8 over UTF-16 WAS:
> Applications of Encoded Data Streams
>
>
>
> Svend Meyland Nicolaisen wrote:
>
> >I have lately wondered why UTF-8 generally seems to be preferred over
> >UTF-16?
> >I can understand the use of UTF-8 in applications that need to maintain
> >backward compatibility with the US-ASCII character set and/or mainly
> >use characters from the US-ASCII character set.
> >Also, if you need to "allocate" space for an X-character-wide text field
> >in a database like Firebird, I would think that you need to allocate
> >space for the worst-case scenario, which is 4 times X bytes for both
> >UTF-8 and UTF-16. So the potential compression of UTF-8 doesn't help
> >much here.
> >
> >
> >
> Do you have any statistical data to show that UTF-16 consumes
> fewer bytes than UTF-8?
>

No, I have no statistics at hand. I suppose that if you mainly use characters
from the Latin-1 character set, then UTF-8 will be more compact than UTF-16.
But text containing Japanese or Thai characters seems to be more compact in
UTF-16, since those characters take three bytes each in UTF-8 but only two in
UTF-16.
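
In place of real statistics, here is a minimal sketch that counts the bytes
for a few short strings in both encodings. The choice of Python, the helper
name encoded_sizes and the sample strings are my own assumptions for
illustration; they are not measurements from an actual database.

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# "utf-16-le" is used so that no byte order mark is added to the count.

# Illustrative sample strings only (hypothetical data, not real statistics).
samples = {
    "English": "Hello, world",
    "Danish": "Bl\u00e5b\u00e6rgr\u00f8d",   # Latin-1 letters outside ASCII
    "Japanese": "こんにちは世界",
    "Thai": "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35",
    "Supplementary": "\U0001D11E",            # code point outside the BMP
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )


for name, text in samples.items():
    chars, utf8, utf16 = encoded_sizes(text)
    print(f"{name:13} chars={chars:2} utf-8={utf8:2} utf-16={utf16:2}")
```

The ASCII sample needs half the space in UTF-8, the Japanese and Thai
samples need three bytes per character in UTF-8 against two in UTF-16, and
the supplementary character needs four bytes in both, which is the worst
case per character mentioned above.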

/Svend