Subject | Re: [Firebird-Architect] Re: The Wolf on Firebird 3 |
---|---|
Author | Adriano dos Santos Fernandes |
Post date | 2005-11-17T01:07:52Z |
Roman Rokytskyy wrote:
>>Honestly, in my own very personal opinion, assuming no kind of
>>compression (so considering a worst-case scenario only), the whole
>>thing comes down to this: do we accept the risk of multiplying the
>>storage requirements of strings inside a DB by 2x, 3x, 4x
>>(extreme cases)? I do. That may be just me. No matter. I'm just
>>exposing my views. What will happen then is out of my control anyway
>>(and it's certainly good that way :) ).
>>
>
>Sorry, but the issue is not how much space it will take on the disk,
>but how many pages will be fetched from the disk and how many packets
>will be sent over the wire. The space does not really matter, the
>performance does.
>
At least for Latin charsets, this is not a problem. I could not say for
Asian charsets.
>
>>If the thing named UNICODE_FSS is correctly implemented (which maybe
>>it is, but let me doubt it based on issues encountered trying to use it
>>- okay, last year and not on fb2), yes it would be some indicator.
>>
>
>Maybe Adriano can give us more information? Should that be UTF-8 for
>example? Or maybe he can make a new charset similar to WIN1251 that
>takes exactly 2 bytes per char (as in UTF-8 case)?
>
UNICODE_FSS is slow and crap.
But UTF-8 has the same problems for FB storage, and it needs one byte
more per char.
AFAIK a VARCHAR variable isn't truncated to the used length before
compression. Why?
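To put rough numbers on the sizes discussed here, a minimal Python sketch follows. It is not Firebird code and says nothing about Firebird's on-disk format; the function names and sample strings are made up for illustration. The 3-bytes-per-char figure for UNICODE_FSS is taken from this thread, and the 4-byte worst case for UTF-8 comes from the encoding itself.

```python
# Illustrative only: compares encoded byte lengths of strings.
# This does not model Firebird's actual record storage or compression.

def encoded_sizes(text: str) -> dict:
    """Return the character count and byte length of `text` in two encodings."""
    return {
        "chars": len(text),
        "cp1251": len(text.encode("cp1251")),  # Python's codec name for WIN1251
        "utf8": len(text.encode("utf-8")),
    }

def reserved_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case byte buffer for a column whose length is declared in characters."""
    return declared_chars * max_bytes_per_char

if __name__ == "__main__":
    print(encoded_sizes("Привет"))  # Cyrillic: 6 chars, 6 bytes in cp1251, 12 in UTF-8
    print(encoded_sizes("hello"))   # ASCII: 5 bytes in both encodings
    print(reserved_bytes(10, 3))    # 3 bytes per declared char (UNICODE_FSS, per this thread)
    print(reserved_bytes(10, 4))    # 4 bytes per declared char (UTF-8 worst case)
```

For Cyrillic text the actual data doubles in UTF-8, while a buffer sized for the worst case grows by the maximum bytes per char regardless of the text stored in it.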
>
>>Not
>>an exact one of course, because such a utf8-ization of the internals
>>and storage would certainly receive a great deal of attention to
>>architecture and implementation details. (I fear that the
>>current UNICODE_FSS implementation uses 3 bytes for each char,
>>needed or not. Also when defining columns, the length you have to
>>give is a kind of byte count, so you have to declare your size * 3,
>>if I remember well. That is obviously not how it should work. That's
>>why I fear the comparison would probably be unfair based on FB1 or
>>FB2. But again that may be an indicator.)
>>
No, it's the opposite. UNICODE_FSS accepts declared char length * 3.
>That's what I'm afraid of too. But if the whole engine is utf8-ized, then
>there is no way back - it will cause many changes to the engine
>internals that most likely will not be possible to roll back (even if
>we ignore all the effort that was put into it). So, for now we have only
>Jim's word that everything is going to be fine...
>
I'm afraid that it will be a high cost for many users (Windows, for
example).
Conversions will be needed when retrieving, sending, sorting and
comparing strings.
Adriano