Subject | Re: [Firebird-Architect] UTF-8 vs UTF-16 |
---|---|
Author | Dimitry Sibiryakov |
Post date | 2003-08-26T05:39:08Z |
On 25 Aug 2003 at 12:51, peter_jacobi.rm wrote:
>I tried to start a flamewar on this (actually a weaker
>version) in firebird-support but nobody bite ;-)

Firebird-support is rather the wrong place for flamewars, as is
firebird-devel. Here and IBDI are much more suitable. ;)
>a) Having a matching narrow (8 bit not multibyte) character set
>for your application is such an efficency benefit, that it
>should not lightly be given up. Unless benchmarking can show
>that switching to UNICODE has less than 30% (to guess a number)
>overhead, this wouldn't get many votes, I assume.

It depends on what efficiency benefit you have in mind.
Indeed, if the new ODS has only 16-bit characters, the size of
databases will almost double. But modern HDDs are cheap and big.
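To make the size difference concrete, here is a minimal Python sketch (illustrative only, not Firebird code; the sample strings are arbitrary) comparing the encoded size of the same text in UTF-8 and UTF-16:

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# Latin/ASCII text roughly doubles in UTF-16; Cyrillic text takes
# two bytes per character in both encodings.
samples = {
    "latin": "plain ASCII table data",
    "cyrillic": "\u043f\u0440\u0438\u0432\u0435\u0442",  # "привет"
}

for name, text in samples.items():
    utf8_bytes = len(text.encode("utf-8"))
    utf16_bytes = len(text.encode("utf-16-le"))  # LE, no BOM
    print(f"{name}: utf-8 = {utf8_bytes} bytes, utf-16 = {utf16_bytes} bytes")
```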
The same goes for the network protocol. Traffic may be doubled. But
overall performance can be improved by using UTF-8 as an intermediate
encoding and/or by decreasing the number of round-trip packets. A few
bigger packets are more efficient than a lot of small ones, I believe.
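A sketch of what "UTF-8 as an intermediate encoding" could mean at the protocol boundary (the length-prefixed framing and the helper names are assumptions for illustration, not the actual Firebird wire format):

```python
import socket

def send_text(sock: socket.socket, text: str) -> None:
    # The wire always carries UTF-8 in a hypothetical
    # length-prefixed frame, regardless of how the client
    # stores strings internally.
    payload = text.encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

def recv_text(sock: socket.socket) -> str:
    size = int.from_bytes(_recv_exact(sock, 4), "big")
    return _recv_exact(sock, size).decode("utf-8")

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes or fail; sock.recv() may return less.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf
```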
If you mean storage in your application, then Unicode characters
are converted into the current locale on the client side. Something
like the MS mess with W and A functions.
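Such a client-side conversion might look like this (a sketch; picking the locale encoding this way and replacing unmappable characters are both assumed policies):

```python
import locale

def to_client_charset(text: str) -> bytes:
    # Narrow the Unicode string to the client's current locale
    # encoding, like the "A" side of the Win32 W/A split.
    # Replacing unmappable characters is an assumed policy.
    encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")
```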
>b) Too large a character set is a data integrity burden. It's
>sad enough that FB doesn't have choices here, for example
>character sets explicitely excluding control characters. But
>always allowing any UNICODE character in any column just make
>me shudder.

If one wants to restrict the set of characters that can be stored
in a column, CHECK constraints are a good way to do so. Maybe new
functions to check subsets should be introduced: something like
'value is cyrillic or latin-1'. Or keep the old character set
syntax, but only for check purposes.
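As an illustration of such a subset check, a Python sketch of the predicate (not an existing Firebird function; treating Latin-1 as U+0000..U+00FF and Cyrillic as the basic U+0400..U+04FF block is an assumption):

```python
def is_cyrillic_or_latin1(value: str) -> bool:
    # True if every codepoint falls in the Latin-1 range or the
    # basic Cyrillic block; both ranges are illustrative choices.
    return all(
        cp <= 0xFF or 0x0400 <= cp <= 0x04FF
        for cp in map(ord, value)
    )

assert is_cyrillic_or_latin1("caf\u00e9")           # Latin-1
assert is_cyrillic_or_latin1("\u043c\u0438\u0440")  # Cyrillic
assert not is_cyrillic_or_latin1("\u4e16")          # CJK rejected
```

A predicate like this is what a column-level CHECK constraint could call to reject characters outside the declared subset.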
SY, Dimitry Sibiryakov.