
Subject	Re: [firebird-support] Re: Using unicode versus WIN1252 (Firebird 2)
Author	Fulvio Senore
Post date	2009-01-10T16:43:23Z

Douglas Tosi wrote:

> On Sat, Jan 10, 2009 at 1:42 PM, Milan Babuskov <milanb@...> wrote:
>
>> Douglas Tosi wrote:
>>
>>> On Fri, Jan 9, 2009 at 1:36 PM, Milan Babuskov <milanb@...>
>>> wrote:
>>>
>>>> josef_gschwendtner wrote:
>>>>
>>>>> German characters mostly have ASCII < 7F. With UTF8 these characters
>>>>> have the same storage size as in WIN1252, right?
>>>>>
>>>> Only when represented in UTF8 form. However, Firebird internally
>>>> represents them as 4 bytes per character.
>>>>
>>> That is odd.
>>> What is the point of having UTF8 as an option if internally it uses
>>> the same space as Unicode? Am I missing something?
>>>
>> Answer these and you'll have your answer:
>>
>> What's the point in having ASCII when it uses the same space as WIN1252?
>> What's the point in having WIN1252 when it uses the same space as
>> ISO-8859-1?
>> Character sets don't exist because they use different space.
>>
>
> Not the same thing.
> AFAIR the only difference between UTF8 and Unicode is the size.
> If Firebird uses the same space to store each of them, I don't see the
> point in having both.
>

UTF8 is a Unicode encoding that uses only one byte for ASCII characters
and two to four bytes for other characters, which are less common in
Western European alphabets. As a result, text in a European language
usually takes about the same number of bytes as in WIN1252 and similar
encodings.
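To make the size comparison concrete, here is a small Python sketch that counts characters and encoded bytes for a few sample strings. The sample strings and the `byte_sizes` helper are made up for illustration; Python's name for the WIN1252 code page is `cp1252`. It measures the encodings themselves, not what Firebird writes to disk.

```python
def byte_sizes(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, WIN1252 bytes) for the given text."""
    # Python's codec name for the WIN1252 code page is "cp1252".
    return len(text), len(text.encode("utf-8")), len(text.encode("cp1252"))

samples = [
    "Hello world",        # pure ASCII: one byte per character in both encodings
    "Grüße aus München",  # German: ü and ß take two bytes each in UTF-8
    "Ça coûte cher",      # French: Ç and û take two bytes each in UTF-8
]

for text in samples:
    chars, utf8_len, cp1252_len = byte_sizes(text)
    print(f"{text}: {chars} chars, UTF-8 {utf8_len} bytes, WIN1252 {cp1252_len} bytes")
```

For the German sample this prints 17 characters, 20 bytes in UTF-8 and 17 bytes in WIN1252. Only the three non-ASCII letters cost an extra byte each.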
That's why I was surprised that UTF8 text took much more space in the
database. It looks like characters are stored using a fixed number of
bytes. There are probably good reasons for doing things this way and,
being curious, I would like to know them. Maybe the goal is to save
decoding time when reading the text from disk, who knows.
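One plausible reading of the "save decoding time" guess is the general trade-off between variable-width and fixed-width encodings. The sketch below assumes a fixed 4 bytes per character (UTF-32, as in the "4 bytes per character" remark quoted above). It is NOT Firebird's actual on-disk format, and the helper functions are hypothetical. With a fixed width, character n starts at byte offset 4 * n. With UTF-8, you have to scan from the start to find it.

```python
# Hypothetical illustration: NOT Firebird's storage format, just the
# general fixed-width vs. variable-width trade-off.

def utf32_char_at(buf: bytes, index: int) -> str:
    # Fixed width: character `index` starts at byte 4 * index, no scanning needed.
    start = 4 * index
    return buf[start:start + 4].decode("utf-32-le")

def utf8_char_at(buf: bytes, index: int) -> str:
    # Variable width: walk from the start, counting lead bytes and
    # skipping continuation bytes (bit pattern 10xxxxxx).
    count = -1
    for pos, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:  # lead byte of a new character
            count += 1
            if count == index:
                end = pos + 1
                while end < len(buf) and (buf[end] & 0xC0) == 0x80:
                    end += 1
                return buf[pos:end].decode("utf-8")
    raise IndexError("character index out of range")

text = "Grüße aus München"
as_utf8 = text.encode("utf-8")
as_utf32 = text.encode("utf-32-le")  # "-le" variant writes no byte-order mark

print(len(as_utf8), len(as_utf32))  # 20 68
print(utf8_char_at(as_utf8, 3), utf32_char_at(as_utf32, 3))  # ß ß
```

The fixed-width buffer is more than three times larger (68 vs 20 bytes), but locating a character is simple arithmetic. Whether that is Firebird's actual reason is exactly the open question in this post.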

Fulvio Senore