Subject | Re: [Firebird-Architect] UTF-8 vs UTF-16 |
---|---|
Author | Nickolay Samofatov |
Post date | 2003-08-15T17:08:13Z |
Hello, Peter!
> Given that:
> a) the implementation of UTF-8 (UNICODE_FSS) is not without problems,
> to say the least.
> b) one of the goals of UTF-8 is space saving for near-ASCII data, but
> in Firebird,
> only on-disk rows save,
^^^^ True.
> whereas in-memory,
^^^^ True, UNICODE_FSS consumes 3 bytes of memory per character.
> on-the-wire and index data pays 3 bytes for every character
^^^^ False in almost all cases. True only for CHAR(N) fields.
> I am tempted to propose adding UTF-16 (representing each [this is not
> quite true, ask if you dare] character with two bytes) support to
> Firebird - and actively promoting its use, obsoleting UNICODE_FSS.
UTF-16 is a nice idea, but fixing the handling of UNICODE_FSS is
also a good and mostly trivial task. It is valuable for all MBCS
charsets.
> PROs:
> P1) Strange behaviour of UNICODE_FSS due to variable byte length of
> characters can be avoided
These bugs can be easily fixed in the engine.
> P2) Uses network bandwidth and memory 33% more efficiently
No, exactly the opposite, if you request data as SQL_VARYING and your
data is mostly ASCII.
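To make the sizes concrete, here is a rough Python sketch of the byte counts being compared. The function name `wire_sizes` and the dictionary keys are made up for illustration, it assumes BMP-only text, and it ignores the length prefix that a varying value carries, so treat the numbers as a back-of-the-envelope comparison and not as a description of what the engine actually sends.

```python
def wire_sizes(text: str, declared_chars: int) -> dict:
    """Byte counts for one value under several assumed representations."""
    if len(text) > declared_chars:
        raise ValueError("value is longer than the declared length")
    return {
        # Fixed-width buffer sized for the worst case of 3 bytes per character.
        "char_n_fss": 3 * declared_chars,
        # Fixed-width buffer at 2 bytes per character (BMP only).
        "char_n_utf16": 2 * declared_chars,
        # Variable-length transfer: only the bytes actually used.
        "varying_utf8": len(text.encode("utf-8")),
        "varying_utf16": len(text.encode("utf-16-le")),
    }


sizes = wire_sizes("hello", 10)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

For a fixed-width CHAR(10) the 2-byte form needs 20 bytes against 30, which is the 33% saving from P2. For the same mostly-ASCII value requested as varying, the UTF-8 form needs 5 bytes against 10 for UTF-16.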
> CONs:
> C1) Uses disk bandwidth less efficiently for near-ASCII data.
True.
> C2) Trailing spaces compress worse, also bad for disk bandwidth
> C3) Looks strange in tools connecting with character set none,
> as there are 0 bytes embedded in the data
> (C2 can be avoided and C3 can be weakened, by not storing
> the 16bit UNICODE value directly but a transformed one)
I do not think that such transformations are a good idea. What
exactly are you going to transform?
I think effort should first be directed to fixing UNICODE_FSS
implementation bugs, namely:
1) incorrect padding of CHAR(N) values
2) lack of control for character string overfilling
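As an illustration of what fixing these two points could look like, here is a small Python sketch. It is only one reading of the bugs: it assumes the intended behaviour is to pad a CHAR(N) value to N characters (not N bytes) and to reject a value of more than N characters. The name `store_char_n` and the constant are hypothetical and are not taken from the engine source.

```python
MAX_BYTES_PER_CHAR = 3  # assumed worst case per character for UNICODE_FSS


def store_char_n(value: str, n: int) -> bytes:
    """Encode a value for a hypothetical CHAR(n) multi-byte column."""
    # Overfill control: count characters, not bytes.
    if len(value) > n:
        raise ValueError(f"value has {len(value)} characters, limit is {n}")
    # Keep to characters that fit in 3 bytes of UTF-8 (the BMP).
    if any(ord(ch) > 0xFFFF for ch in value):
        raise ValueError("character outside the BMP is not supported here")
    # Padding: add spaces until the value is n characters long.
    padded = value + " " * (n - len(value))
    encoded = padded.encode("utf-8")
    # The encoded form always fits the worst-case buffer.
    assert len(encoded) <= n * MAX_BYTES_PER_CHAR
    return encoded


stored = store_char_n("éé", 3)
print(len(stored), len(stored.decode("utf-8")))
```

The stored value here is 5 bytes long but exactly 3 characters long, so padding or length checks done in bytes would give a different result from the same checks done in characters.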
> Peter Jacobi
Nickolay Samofatov