firebird-architect - Re: UTF-8 vs UTF-16

Subject	Re: UTF-8 vs UTF-16
Author	mailmur
Post date	2003-08-26T12:44:17Z

>Regarding the French language, characters with accents should sort as
>the same characters without accents, just as in a French dictionary.
>Don't know for Spanish. But I assume this rule is mostly valid, with
>maybe some exceptions that could justify some collating rules.

In Finland, "A" and "O" are sorted as usual, but the same letters with
two dots on top, "Ä" and "Ö", are sorted at the end of the alphabet.
("Å", an A with a ring, is called the "Swedish O".)
...,X,Y,Z,Å,Ä,Ö

So in some languages, accented characters are probably not equal to
their unaccented counterparts.
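
To make the difference concrete, here is a minimal Python sketch. It is not Firebird code, and the alphabet table, the word list and the helper names (finnish_key, strip_accents) are made up for illustration. It compares a simplified Finnish ordering, with Å, Ä, Ö placed after Z, against an ordering that folds accents away as in the French rule quoted above. Real collations have more rules than this.

```python
import unicodedata

# Simplified Finnish alphabet: a-z as usual, then å, ä, ö at the end.
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}


def finnish_key(word):
    # Map each letter to its alphabet position; unknown characters sort last.
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


def strip_accents(word):
    # Decompose accented letters and drop the combining marks (Ä -> A).
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["Öljy", "Zeppelin", "Äiti", "Omena", "Åbo", "Auto"]

# Finnish-style order: accented letters are separate letters after Z.
finnish_sorted = sorted(words, key=finnish_key)

# Accent-folding order: accented letters sort like their base letters.
folded_sorted = sorted(words, key=lambda w: strip_accents(w).lower())

print(finnish_sorted)
print(folded_sorted)
```

The same six words come out in two different orders, so a single "accents equal base letters" rule cannot serve every language.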

>c) Current limitation of CHAR and VARCHAR columns is ~32,000 bytes.
>Saying that each character needs 2 bytes, you decrease this boundary
>to ~16,000.

Or fix the length restriction... Remember that the engine was written
when workstations tended to max out at 3 or 4 megs. Those days are
long gone. Go hog wild and change all the shorts to longs and live
free.

MSSQLServer has the following restrictions, and I've always managed to
live with them: ASCII varchar = 8000 chars and UNICODE nvarchar = 4000
chars. It's always "ASCII type maxlength / 2 = UNICODE type maxlength".
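
The arithmetic behind these limits can be sketched in a few lines of Python. The 32,000-byte figure is the approximate value from the quoted message, not an exact engine constant, and max_chars is a hypothetical helper. The sketch assumes a fixed 2 bytes per character; UTF-16 needs 4 bytes for characters outside the Basic Multilingual Plane.

```python
# Approximate limits taken from the discussion, not exact engine constants.
FB_MAX_BYTES = 32000      # ~CHAR/VARCHAR byte limit mentioned in the thread
MSSQL_MAX_BYTES = 8000    # varchar limit mentioned for MSSQLServer


def max_chars(max_bytes, bytes_per_char):
    # Number of characters that fit when every character has a fixed width.
    return max_bytes // bytes_per_char


fb_two_byte_chars = max_chars(FB_MAX_BYTES, 2)        # 16000
mssql_nvarchar_chars = max_chars(MSSQL_MAX_BYTES, 2)  # 4000

# Actual storage cost of one sample string in each encoding.
text = "Könkkä"
utf8_bytes = len(text.encode("utf-8"))       # ö and ä take 2 bytes each
utf16_bytes = len(text.encode("utf-16-le"))  # every character takes 2 bytes

print(fb_two_byte_chars, mssql_nvarchar_chars, utf8_bytes, utf16_bytes)
```

For this mostly-ASCII sample, UTF-8 needs 8 bytes where UTF-16 needs 12, so the halving only applies to a fixed two-byte encoding.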

And then simple queries like:
Select * From table Where mystrcol = "Könkkä"

should play well with the charset defined for the column/table/database,
without extra collation syntax in the query. Even the simplest queries
in FB are not compatible with other Unicode-aware databases, where none
of this is an issue. Queries like the following are just not nice to
write, and apps that generate queries at runtime definitely can't
generate them:
Select * From table Where mystrcol = "Könkkä" USE-XYZ-CHARSET-SYNTAX-HERE

MSSQLServer does it well with the char/nchar choice. It cannot do
user-specified charsets per column, but probably only a minority of db
users need such a feature.

From the year 2000 onward, all internationally aware db apps should use
the Unicode charset, and the db server should then handle it
transparently.
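
As a rough illustration of what "transparently" could look like from the application side, here is a small Python sketch using SQLite from the standard library. It is not Firebird and says nothing about how FB behaves; the table name people is made up. The point is only that the query text carries no charset or collation clause.

```python
import sqlite3

# In-memory database purely for illustration; SQLite stands in for
# "a Unicode-aware database" and is not Firebird.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (mystrcol TEXT)")
conn.executemany(
    "INSERT INTO people (mystrcol) VALUES (?)",
    [("Könkkä",), ("Konkka",)],
)

# The query has no charset/collation syntax; the driver handles encoding.
rows = conn.execute(
    "SELECT mystrcol FROM people WHERE mystrcol = ?",
    ("Könkkä",),
).fetchall()

print(rows)
conn.close()
```

A query generator only has to emit the plain comparison, which is what a runtime query builder can realistically produce.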