firebird-architect - Re: [Firebird-Architect] Re: The Wolf on Firebird 3

Subject	Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author	Adriano dos Santos Fernandes
Post date	2005-11-17T01:07:52Z

Roman Rokytskyy wrote:

>>Honestly, in my own very personal opinion, assuming no kind of
>>compression (so considering a worst-case scenario only), the whole
>>thing comes down to this: do we accept the risk of multiplying the
>>storage requirements of strings inside a DB by 2, 3 or 4 times
>>(extreme cases)? I do. That may be just me. No matter. I'm just
>>exposing my views. What happens then is out of my control anyway
>>(and that's certainly good that way :) ).
>>
>>
>
>Sorry, but the issue is not how much space it will take on the disk,
>but how many pages will be fetched from the disk and how many packets
>will be sent over the wire. The space does not really matter, the
>performance does.
>
>

At least for Latin charsets, this is not a problem. I can't say for
Asian charsets.
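The Latin-charset point can be checked by counting UTF-8 bytes per character for a few sample strings. This is only an illustrative sketch. The sample strings, the 4096-byte page size and the helper names are assumptions, and the rows-per-page figure ignores page and record headers as well as compression.

```python
# Average UTF-8 bytes per character for a few scripts, plus a naive
# rows-per-page estimate. Samples and the 4096-byte page size are
# illustrative assumptions; headers and compression are ignored.
SAMPLES = {
    "Latin (Portuguese)": "A\u00e7\u00e3o e cora\u00e7\u00e3o",  # "Ação e coração"
    "Cyrillic (Russian)": "Привет мир",
    "CJK (Japanese)": "日本語のテキスト",
}


def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character of `text`."""
    return len(text.encode("utf-8")) / len(text)


def rows_per_page(row_bytes: int, page_size: int = 4096) -> int:
    """How many rows of `row_bytes` fit in one page (no overhead counted)."""
    return page_size // row_bytes


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label:20} {utf8_bytes_per_char(text):.2f} bytes/char")
    # A 100-character string stored at 1, 2 or 3 bytes per character:
    for bpc in (1, 2, 3):
        print(f"{bpc} byte(s)/char -> {rows_per_page(100 * bpc)} rows per page")
```

Mostly-ASCII Latin text stays close to 1 byte per character (about 1.29 here). Cyrillic needs about 2 and CJK 3. For a 100-character string, the naive estimate falls from 40 rows per page at 1 byte per character to 13 at 3.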

>
>
>>If the thing named UNICODE_FSS is correctly implemented (which it
>>may be, but let me doubt it based on issues I encountered trying to use it
>>- okay, last year and not on FB2), yes, it would be some indicator.
>>
>>
>
>Maybe Adriano can give us more information? Should that be UTF-8, for
>example? Or maybe he can make a new charset similar to WIN1251 that
>takes exactly 2 bytes per char (as in the UTF-8 case)?
>
>

UNICODE_FSS is slow and crap.
But UTF-8 has the same problems for FB storage, and it needs one byte
more per char (up to 4 bytes instead of 3).
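The sizes involved can be put in numbers. The sketch below hard-codes the maximum bytes per character (1 for WIN1251, 3 for UNICODE_FSS, 4 for UTF8) instead of reading them from the engine. It uses Python's cp1251 codec as a stand-in for WIN1251 to show that Cyrillic text takes 2 bytes per character in UTF-8. `reserved_bytes` and `encoded_length` are hypothetical helper names.

```python
# Worst-case storage for a CHAR(n) column under different charsets.
# The max-bytes-per-char table is hard-coded for illustration only.
MAX_BYTES_PER_CHAR = {
    "WIN1251": 1,      # single-byte Cyrillic code page
    "UNICODE_FSS": 3,  # limited to 3-byte sequences
    "UTF8": 4,         # full UTF-8 allows up to 4 bytes per code point
}


def reserved_bytes(char_length: int, charset: str) -> int:
    """Bytes reserved for a CHAR(char_length) column in `charset`."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def encoded_length(text: str, python_codec: str) -> int:
    """Actual encoded size of `text` with a Python codec."""
    return len(text.encode(python_codec))


if __name__ == "__main__":
    for cs in MAX_BYTES_PER_CHAR:
        print(f"CHAR(100) {cs:12} -> {reserved_bytes(100, cs)} bytes")
    word = "Привет"  # 6 Cyrillic letters
    print("cp1251:", encoded_length(word, "cp1251"), "bytes")  # 6
    print("utf-8: ", encoded_length(word, "utf-8"), "bytes")   # 12
```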

AFAIK, a VARCHAR value isn't truncated to its used length before
compression. Why?
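To illustrate the question, assume a VARCHAR(100) holding "abc" is laid out as a 2-byte length prefix followed by the full declared area, zero-filled, and is then run-length compressed. The encoder below is a simplified RLE written for this example. It is loosely modelled on the idea behind Firebird's record compression but does not reproduce its exact format. `varchar_image` is a hypothetical helper.

```python
# Simplified run-length encoder, loosely modelled on the idea behind
# Firebird's record compression (NOT its exact on-disk format):
#   control byte c > 0 -> the next c bytes are copied literally
#   control byte c < 0 -> the next byte is repeated -c times
import struct

MAX_RUN = 127  # keeps every control byte inside a signed byte


def rle_compress(data: bytes) -> bytes:
    out = bytearray()
    literal = bytearray()

    def flush_literal() -> None:
        # Emit pending literal bytes in chunks of at most MAX_RUN.
        for start in range(0, len(literal), MAX_RUN):
            chunk = literal[start:start + MAX_RUN]
            out.append(len(chunk))
            out.extend(chunk)
        literal.clear()

    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at position i.
        j = i
        while j < len(data) and data[j] == data[i] and j - i < MAX_RUN:
            j += 1
        if j - i >= 3:  # only runs of 3+ bytes are worth a repeat token
            flush_literal()
            out.extend(struct.pack("b", -(j - i)))
            out.append(data[i])
        else:
            literal.extend(data[i:j])
        i = j
    flush_literal()
    return bytes(out)


def rle_decompress(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        (c,) = struct.unpack("b", blob[i:i + 1])
        i += 1
        if c > 0:  # literal chunk
            out.extend(blob[i:i + c])
            i += c
        else:      # repeat token
            out.extend(blob[i:i + 1] * -c)
            i += 1
    return bytes(out)


def varchar_image(value: bytes, declared: int, truncate: bool) -> bytes:
    """Hypothetical record image: 2-byte length prefix + value, zero-padded
    to the declared length unless `truncate` is set."""
    body = value if truncate else value.ljust(declared, b"\x00")
    return struct.pack("<H", len(value)) + body


if __name__ == "__main__":
    for truncate in (False, True):
        image = varchar_image(b"abc", 100, truncate)
        packed = rle_compress(image)
        print(f"truncate={truncate}: {len(image)} bytes -> {len(packed)} compressed")
    # truncate=False: 102 bytes -> 8 compressed
    # truncate=True: 5 bytes -> 6 compressed
```

With RLE, most of the padding disappears on disk (8 vs 6 bytes here). The full 102-byte image still has to be built and scanned, though. The gap also grows with the declared size: a 4000-byte area holding the same value compresses to 70 bytes in this scheme.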

>
>
>>Not
>>an exact one of course, because such a utf8-ization of the internals
>>and storage would certainly receive a great deal of attention to
>>architecture and implementation details. (I fear that the
>>current UNICODE_FSS implementation uses 3 bytes for each char,
>>needed or not. Also, when defining columns, the length you have to
>>give is a kind of byte count, so you have to declare your size * 3,
>>if I remember correctly. That is obviously not how it should work. That's
>>why I fear the comparison would probably be unfair based on FB1 or
>>FB2. But again, that may be an indicator.)
>>
>>

No, it's the opposite: you declare the length in chars, and UNICODE_FSS
accepts up to the declared char length * 3.
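The difference between a length check done in bytes and one done in characters can be sketched as follows. Both functions are hypothetical illustrations. `fits_by_bytes` only checks that the value fits in the reserved `declared_chars * 3` byte area, so pure-ASCII input can be up to three times the declared character length. `fits_by_chars` enforces the declared character count. Python's utf-8 codec stands in for UNICODE_FSS here, which is an assumption that holds for characters in the Basic Multilingual Plane.

```python
# Two hypothetical length checks for a column declared as N characters
# in a charset with at most 3 bytes per character (like UNICODE_FSS).
MAX_BYTES_PER_CHAR = 3


def fits_by_bytes(value: str, declared_chars: int) -> bool:
    """Checks only the encoded size against the reserved byte area."""
    return len(value.encode("utf-8")) <= declared_chars * MAX_BYTES_PER_CHAR


def fits_by_chars(value: str, declared_chars: int) -> bool:
    """Checks the number of characters against the declared length."""
    return len(value) <= declared_chars


if __name__ == "__main__":
    ascii_30 = "x" * 30  # 30 chars, 30 bytes
    cyr_15 = "П" * 15    # 15 chars, 30 bytes
    # Column declared as 10 characters:
    print(fits_by_bytes(ascii_30, 10), fits_by_chars(ascii_30, 10))  # True False
    print(fits_by_bytes(cyr_15, 10), fits_by_chars(cyr_15, 10))      # True False
```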

>
>That's what I'm afraid of too. But if the whole engine is utf8-ized, then
>there is no going back - it will cause many changes to the engine
>internals that most likely will not be possible to roll back (even if
>we ignore all the effort that was put into it). So, for now we only have
>Jim's word that everything is going to be fine...
>
>

I'm afraid it will come at a high cost for many users (Windows users,
for example): conversions will be needed when retrieving, sending,
sorting and comparing strings.
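Here is a minimal sketch of where those conversions would sit, assuming a Windows client that talks WIN1252 (Python's cp1252 codec) to an engine that stores UTF-8. The `store`, `fetch` and `equal_ignoring_case` helpers are hypothetical. The last one shows that even a simple case-insensitive comparison cannot be done on the raw UTF-8 bytes with ASCII-only folding, so the text has to be decoded first.

```python
# Hypothetical client/storage boundary: the client speaks cp1252
# (a common Windows code page), the database stores UTF-8.
CLIENT_CODEC = "cp1252"
STORAGE_CODEC = "utf-8"


def store(client_bytes: bytes) -> bytes:
    """Client -> storage: decode the client encoding, re-encode as UTF-8."""
    return client_bytes.decode(CLIENT_CODEC).encode(STORAGE_CODEC)


def fetch(stored: bytes) -> bytes:
    """Storage -> client: the reverse conversion, on every read."""
    return stored.decode(STORAGE_CODEC).encode(CLIENT_CODEC)


def equal_ignoring_case(a: bytes, b: bytes) -> bool:
    """Case-insensitive comparison on decoded text; bytes.lower()
    would only fold ASCII letters."""
    return a.decode(STORAGE_CODEC).casefold() == b.decode(STORAGE_CODEC).casefold()


if __name__ == "__main__":
    sent = "Caf\u00e9".encode(CLIENT_CODEC)        # "Café": 4 bytes from the client
    stored = store(sent)                           # 5 bytes in UTF-8
    print(len(sent), len(stored))                  # 4 5
    print(fetch(stored) == sent)                   # True
    upper = store("CAF\u00c9".encode(CLIENT_CODEC))  # "CAFÉ"
    print(stored.lower() == upper.lower())         # False: É is not folded
    print(equal_ignoring_case(stored, upper))      # True
```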

Adriano