firebird-support - Re: Help On Unicode!

Subject	Re: Help On Unicode!
Author	peter_jacobi.rm
Post date	2004-05-14T09:54:17Z

Hi Pavel,

This will be a long posting...

--- "pavel_menshchikov" <mpn2001@y...> wrote:

> The differences in "constructions" are clear. I wonder why Alan
> doesn't recommend to use unicode to novice programmer?

To give full disclosure, and as you've already guessed: Unicode
support in Firebird is still problematic. But it's not as bad
as your tests may suggest. So it's primarily not a hard
("won't work") but a soft ("will give more trouble than it's
worth") decision.

Add these points:
- efficiency
- locale support (about 100 culture-specific collations
for narrow character sets, 0 for Unicode)
- tools trouble (will all your tools, components and intermediate
layers work with Unicode?)

> I've read (in IB or FB documentation as I remember) that IB/FB
> internally uses unicode to store/manipulate string values,

Only to the limited extent that Unicode is used as
an intermediate form in character set conversions, so that
the number of converters needed doesn't rise quadratically
with the number of character sets.
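
The pivot idea can be sketched in Python, whose codecs also go
through Unicode. This is only an analogy, not Firebird's converter
code; the helper names and the sample word are assumptions made
for illustration.

```python
# Sketch: converting between narrow charsets via a Unicode pivot.
# With n charsets, a pivot needs 2*n converters (to and from Unicode)
# instead of n*(n-1) direct pairwise converters.

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from charset src to charset dst via Unicode."""
    text = data.decode(src)   # src -> Unicode (the pivot form)
    return text.encode(dst)   # Unicode -> dst

def converters_needed(n: int) -> tuple:
    """Return (pivot_count, pairwise_count) for n charsets."""
    return 2 * n, n * (n - 1)

# Hypothetical sample: a Russian word stored as CP1251 bytes.
word_cp1251 = "привет".encode("cp1251")
word_koi8r = transcode(word_cp1251, "cp1251", "koi8_r")
print(word_koi8r.decode("koi8_r"))
print(converters_needed(10))  # (20, 90)
```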

> and
> converts strings for clients as specified (character set/collation).
> Is it right?

There is an automatic charset conversion between the database
character set and the client's connection character set, but it
has the crippling problem that conversions which change
the number of bytes almost always fail. So automatic conversion
between CP1251 and KOI8R is O.K., but automatic conversion
between CP1251 and UTF8 can't be used.
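
The change in byte count is easy to see outside Firebird. A minimal
Python sketch, with an assumed sample word; it only counts bytes
and says nothing about how Firebird sizes its buffers:

```python
# Sketch: CP1251 <-> KOI8-R keeps the byte count, CP1251 <-> UTF-8 does not.
SAMPLE = "привет"  # assumed sample word, 6 Cyrillic letters

def encoded_length(text: str, charset: str) -> int:
    """Number of bytes needed to store text in the given charset."""
    return len(text.encode(charset))

for charset in ("cp1251", "koi8_r", "utf-8"):
    print(charset, encoded_length(SAMPLE, charset))
# cp1251 6
# koi8_r 6
# utf-8 12
```

Both narrow character sets use one byte per Cyrillic letter, while
UTF-8 uses two, so the converted string no longer fits the byte
length of the original.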

> I tried the following with FB 1.5.0.4306 (release):

I've re-created your test so that it only uses ISQL.

As the mailing list cannot be trusted to deliver UTF8
content, I've put a uuencoded version of the script below.

It can be run with:
del c:\ac2.fdb && ..\bin\isql -e -i utf8test.sql >utf8test.out

The output can then be viewed with any UTF-8 aware editor.

If you want to try the quirky UTF8 of the W2K commandline,
you can switch the console code page with "chcp 65001" and run
without output redirection.

Anyway, the test shows correct behaviour of LIKE:
LIKE '<string_in_russian>%' and STARTING WITH
'<string_in_russian>' give identical results.
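
What "identical results" means can be sketched in Python. This is
a character-wise model of the two predicates, not Firebird's
implementation; the function names and the sample rows are
assumptions, and ESCAPE handling is left out.

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Character-wise SQL LIKE supporting only % and _ (no ESCAPE)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")      # any run of characters
        elif ch == "_":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

def starting_with(value: str, prefix: str) -> bool:
    """Model of STARTING WITH as a plain prefix test."""
    return value.startswith(prefix)

ROWS = ["abc", "123", "любовь", "лето", "привет"]  # assumed sample rows
like_hits = [r for r in ROWS if sql_like(r, "л%")]
prefix_hits = [r for r in ROWS if starting_with(r, "л")]
print(like_hits == prefix_hits)  # True
```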

Much to my surprise, I must confess, as my leaky
carbon-based long term storage seemed to remember
a posting of Nickolay's saying that he had optimized the LIKE
implementation and that, as collateral damage, it wouldn't work
for MBCS anymore. It seems that wasn't as bad as I thought
and/or only affects CVS HEAD.

The uuencoded script:

begin 0777 utf8test.sql
M[[N_<VAO=R!V97)S:6]N.PT*<VAO=R!V97)S:6]N.PT*4T54($Y!3453(%5.
M24-/1$5?1E-3.PT*4T54(%-13"!$24%,14-4(#,[#0I#4D5!5$4@1$%404)!
M4T4@)V,Z7&%C,BYF9&(G#0H@55-%4B G4UE31$)!)R!005-35T]21" G;6%S
M=&5R:V5Y)R!004=%7U-)6D4@.#$Y,@T*($1%1D%53%0@0TA!4D%#5$52(%-%
M5"!53DE#3T1%7T934SL-"D-214%412!$3TU!24X@0T@R-2!!4R!605)#2$%2
M*#(U*2!#2$%204-415(@4T54(%5.24-/1$5?1E-3($-/3$Q!5$4-"B!53DE#
M3T1%7T934SL-"D-214%412!404),12!415-4*$Y!344@0T@R-2D[#0H-"@T*
M8V]M;6ET.PT*#0II;G-E<G0@:6YT;R!T97-T('9A;'5E<R H)V%B8R<I.PT*
M:6YS97)T(&EN=&\@=&5S="!V86QU97,@*"<Q,C,G*3L-"FEN<V5R="!I;G1O
M('1E<W0@=F%L=65S("@GPX3#EL.<)RD[#0II;G-E<G0@:6YT;R!T97-T('9A
M;'5E<R H)^*"K"<I.PT*#0II;G-E<G0@:6YT;R!T97-T('9A;'5E<R H)]"_
MT8#0N-&!T++0L-"XT++0L-"UT8(G*3L-"FEN<V5R="!I;G1O('1E<W0@=F%L
M=65S("@GT8/0O="XT+K0L-"[T8S0O=&+T+DG*3L-"FEN<V5R="!I;G1O('1E
M<W0@=F%L=65S("@GT+O1CM"QT+[0O-&#)RD[#0II;G-E<G0@:6YT;R!T97-T
M('9A;'5E<R H)]&!T+C0O-"RT+[0N]&#)RD[#0II;G-E<G0@:6YT;R!T97-T
M('9A;'5E<R H)]"]T+70M]"PT++0N-&!T+C0O-"^)RD[( T*:6YS97)T(&EN
M=&\@=&5S="!V86QU97,@*"?0O]"[T+#1@M&$T+[1@-"\T8LG*3L-"@T*8V]M
M;6ET.PT*#0IS96QE8W0@*B!F<F]M('1E<W0[#0IS96QE8W0@*B!F<F]M('1E
M<W0@=VAE<F4@;F%M92!S=&%R=&EN9R!W:71H("?0NR<[#0IS96QE8W0@*B!F
M<F]M('1E<W0@=VAE<F4@;F%M92!L:6ME("?0NR4G.PT*<V5L96-T("H@9G)O
M;2!T97-T('=H97)E(&YA;64@;&EK92 G)="[)SL-"G-E;&5C=" J(&9R;VT@
K=&5S="!W:&5R92!N86UE(&QI:V4@)R70NR4G.PT*#0IC;VUM:70[#0H-"FT@
end
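
If no uudecode tool is at hand, the block can be decoded with a few
lines of Python. This is a minimal sketch using the standard
binascii module; the uudecode helper is a name made up here, and
lines whose whitespace was altered in transit may still decode
incorrectly.

```python
import binascii

def uudecode(text: str) -> bytes:
    """Decode the body of a uuencoded block (between 'begin' and 'end')."""
    out = bytearray()
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip everything up to and including the "begin" line.
            in_body = line.startswith("begin ")
            continue
        if line.strip() == "end":
            break
        if not line:
            continue
        raw = line.encode("ascii")
        try:
            out += binascii.a2b_uu(raw)
        except binascii.Error:
            # Tolerate trailing garbage: trim the line to the length
            # implied by its leading length character.
            nbytes = (((raw[0] - 32) & 63) * 4 + 5) // 3
            out += binascii.a2b_uu(raw[:nbytes])
    return bytes(out)
```

To use it, pass the text from the "begin" line through the "end"
line to uudecode() and write the returned bytes to utf8test.sql
in binary mode.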

> 1. Implementing test DB:
> SET NAMES UNICODE_FSS; /* as I understand, that is not for client
> connections but for DB objects naming */

That's the client's connection character set.

> [...] but I need "like" in my program.
>
> Am I doing something wrong? Or FB 1.5 still has no full support for
> unicode?

Yes to both questions, for whatever help that may be.

Best Regards,
Peter Jacobi