Subject | Re: [Firebird-Architect] UTF-8 vs UTF-16 |
---|---|
Author | Dimitry Sibiryakov |
Post date | 2003-08-26T05:39:08Z |
On 25 Aug 2003 at 12:51, peter_jacobi.rm wrote:
>I tried to start a flamewar on this (actually a weaker
>version) in firebird-support but nobody bite ;-)

Firebird-support is rather the wrong place for flamewars, as is
firebird-devel. Here and IBDI are much more suitable. ;)
>a) Having a matching narrow (8 bit not multibyte) character set
>for your application is such an efficency benefit, that it
>should not lightly be given up. Unless benchmarking can show
>that switching to UNICODE has less than 30% (to guess a number)
>overhead, this wouldn't get many votes, I assume.

It depends on what efficiency benefit you have in mind.
Indeed, if the new ODS has only 16-bit characters, the size of
databases will almost double. But modern HDDs are cheap and big.
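To make the size difference concrete, here is a minimal Python sketch (illustrative only, not Firebird code; the sample strings are arbitrary) comparing the encoded size of the same text in UTF-8 and UTF-16:

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# Latin/ASCII text roughly doubles in UTF-16; Cyrillic text takes
# two bytes per character in both encodings.
samples = {
    "latin": "plain ASCII table data",
    "cyrillic": "\u043f\u0440\u0438\u0432\u0435\u0442",  # "привет"
}

for name, text in samples.items():
    utf8_bytes = len(text.encode("utf-8"))
    utf16_bytes = len(text.encode("utf-16-le"))  # LE, no BOM
    print(f"{name}: utf-8 = {utf8_bytes} bytes, utf-16 = {utf16_bytes} bytes")
```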
The same goes for the network protocol. Traffic may be doubled. But
overall performance can be improved by using UTF-8 as an intermediate
encoding and/or by decreasing the number of round-trip packets. A few
bigger packets are more efficient than a lot of small ones, I believe.
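A sketch of what "UTF-8 as an intermediate encoding" could mean at the protocol boundary (the length-prefixed framing and the helper names are assumptions for illustration, not the actual Firebird wire format):

```python
import socket

def send_text(sock: socket.socket, text: str) -> None:
    # The wire always carries UTF-8 in a hypothetical
    # length-prefixed frame, regardless of how the client
    # stores strings internally.
    payload = text.encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

def recv_text(sock: socket.socket) -> str:
    size = int.from_bytes(_recv_exact(sock, 4), "big")
    return _recv_exact(sock, size).decode("utf-8")

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes or fail; sock.recv() may return less.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf
```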
If you mean storage in your application, then Unicode characters
are converted into the current locale on the client side. Something
like the MS mess with W and A functions.
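Such a client-side conversion might look like this (a sketch; picking the locale encoding this way and replacing unmappable characters are both assumed policies):

```python
import locale

def to_client_charset(text: str) -> bytes:
    # Narrow the Unicode string to the client's current locale
    # encoding, like the "A" side of the Win32 W/A split.
    # Replacing unmappable characters is an assumed policy.
    encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")
```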
>b) Too large a character set is a data integrity burden. It's
>sad enough that FB doesn't have choices here, for example
>character sets explicitely excluding control characters. But
>always allowing any UNICODE character in any column just make
>me shudder.

If one wants to restrict the set of characters that can be stored
in a column, CHECK constraints are a good way to do so. Maybe new
functions to check subsets should be introduced: something like
'value is cyrillic or latin-1'. Or keep the old character set
syntax, but only for check purposes.
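As an illustration of such a subset check, a Python sketch of the predicate (not an existing Firebird function; treating Latin-1 as U+0000..U+00FF and Cyrillic as the basic U+0400..U+04FF block is an assumption):

```python
def is_cyrillic_or_latin1(value: str) -> bool:
    # True if every codepoint falls in the Latin-1 range or the
    # basic Cyrillic block; both ranges are illustrative choices.
    return all(
        cp <= 0xFF or 0x0400 <= cp <= 0x04FF
        for cp in map(ord, value)
    )

assert is_cyrillic_or_latin1("caf\u00e9")           # Latin-1
assert is_cyrillic_or_latin1("\u043c\u0438\u0440")  # Cyrillic
assert not is_cyrillic_or_latin1("\u4e16")          # CJK rejected
```

A predicate like this is what a column-level CHECK constraint could call to reject characters outside the declared subset.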
SY, Dimitry Sibiryakov.