Subject | RE: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams |
---|---|
Author | Svend Meyland Nicolaisen |
Post date | 2005-05-03T09:18:27Z |
> -----Original Message-----
> From: Firebird-Architect@yahoogroups.com
> [mailto:Firebird-Architect@yahoogroups.com] On Behalf Of Jim Starkey
> Sent: 3 May 2005 04:56
> To: Firebird-Architect@yahoogroups.com
> Subject: Re: [Firebird-Architect] UTF-8 over UTF-16 WAS:
> Applications of Encoded Data Streams
>
> Svend Meyland Nicolaisen wrote:
>
> >Lately I have wondered why UTF-8 generally seems to be preferred over
> >UTF-16. I can understand the use of UTF-8 in applications that need to
> >maintain backward compatibility with the US-ASCII character set and/or
> >mainly use characters from the US-ASCII character set.
> >Also, if you need to "allocate" space for an X character wide text
> >field in a database like Firebird, I would think that you need to
> >allocate space for the worst case scenario, which is 4 times X bytes
> >for both UTF-8 and UTF-16. So the potential compression of UTF-8
> >doesn't help much here.
> >
> Do you have any statistical data to show that UTF-16 consumes
> fewer bytes than UTF-8?
>
No, I have no statistics ready. I suppose that if you mainly use characters
from the Latin-1 character set, then UTF-8 will be more compact than
UTF-16. But texts containing Japanese or Thai characters seem to be more
compact in UTF-16, since most of those characters take three bytes in UTF-8
but only two in UTF-16.
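
Not statistics, but a rough way to check the sizes for a given text is to encode it both ways and count the bytes. The following is only a minimal sketch in Python; the sample strings and the helper name `encoded_sizes` are made up for illustration and say nothing about how Firebird stores text.

```python
# Hypothetical sample strings, chosen only to illustrate the size difference.
samples = {
    "ASCII": "Firebird",
    "Latin-1": "smørrebrød",
    "Japanese": "日本語",
    "Thai": "ภาษาไทย",
    "Non-BMP": "\U0001F600",  # needs a surrogate pair in UTF-16
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte
    # byte-order mark, which would distort the comparison.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    chars, utf8, utf16 = encoded_sizes(text)
    print(f"{label:9} {chars:2} code points  UTF-8: {utf8:2} bytes  UTF-16: {utf16:2} bytes")
```

The last sample is the worst case mentioned in the quoted text: a character outside the Basic Multilingual Plane takes 4 bytes in either encoding.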
/Svend