Subject | Re: Help On Unicode! |
---|---|
Author | peter_jacobi.rm |
Post date | 2004-05-14T09:54:17Z |
Hi Pavel,
This will be a long posting...
--- "pavel_menshchikov" <mpn2001@y...> wrote:
> The differences in "constructions" are clear. I wonder why Alan
> doesn't recommend to use unicode to novice programmer?
To give full disclosure, and as you've already guessed, Unicode
support in Firebird is still problematic. But it's not as
bad as your tests may suggest. So it's primarily not a hard
("won't work") but a soft ("will give more trouble than it's
worth") decision.
Add these points:
- efficiency
- locale support (about 100 culture specific collations
for narrow character sets, 0 for Unicode)
- tools trouble (will all your tools, components and intermediate
layers work with Unicode?)
> I've read (in IB or FB documentation as I remember) that IB/FB
> internally uses unicode to store/manipulate string values,
Only to the limited extent that Unicode is used as
an intermediate form in character set conversions, so that
the number of converters needed doesn't rise quadratically.
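To make the point about converter counts concrete, here is a minimal sketch in Python. It is an illustration only: the codec names are Python's, the four character sets are an arbitrary sample, and nothing here reflects Firebird's actual conversion code. With a pivot, each character set needs one converter to Unicode and one back, so the count grows as 2n rather than n(n-1).

```python
# Illustration only: Python codec names, not Firebird internals.
CHARSETS = ["cp1251", "koi8_r", "cp866", "iso8859_5"]  # arbitrary sample


def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert via the Unicode pivot: decode from source, encode to target."""
    return data.decode(source).encode(target)


def converters_needed(n: int) -> tuple[int, int]:
    """Return (direct pairwise converters, converters when using a pivot)."""
    return n * (n - 1), 2 * n


word = "привет"  # sample text, not from the test script
koi = convert(word.encode("cp1251"), "cp1251", "koi8_r")
print(koi == word.encode("koi8_r"))
print(converters_needed(len(CHARSETS)))
```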
> and
> converts strings for clients as specified (character set/collation).
> Is it right?
There is an automatic charset conversion between the database
character set and the client's connection character set, but it
has the crippling problem that conversions which change
the number of bytes almost always fail. So automatic conversion
between CP1251 and KOI8R is O.K., but automatic conversion
between CP1251 and UTF8 can't be used.
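The byte-count difference is easy to see outside the database. The sketch below, in Python, only measures encoded lengths for a sample Cyrillic word. The word and the helper name `byte_lengths` are made up for illustration, and the snippet says nothing about where exactly Firebird's conversion fails.

```python
def byte_lengths(text: str) -> dict[str, int]:
    """Encoded length of the same text in three character sets."""
    return {codec: len(text.encode(codec))
            for codec in ("cp1251", "koi8_r", "utf_8")}


sample = "привет"  # six Cyrillic letters, hypothetical sample
for codec, size in byte_lengths(sample).items():
    # CP1251 and KOI8-R use one byte per Cyrillic letter,
    # UTF-8 uses two, so the byte count changes in conversion.
    print(codec, size)
```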
> I tried the following with FB 1.5.0.4306 (release):
I've re-created your test so that it only uses ISQL.
As the mailing list cannot be trusted to deliver UTF8
content, I've put a uuencoded version of the script below.
It can be run with:
del c:\ac2.fdb && ..\bin\isql -e -i utf8test.sql >utf8test.out
And the output can be looked at with any UTF-8 aware editor.
If you want to try the quirky UTF8 of the W2K commandline,
you can switch the commandline with "chcp 65001" and run
without output redirection.
Anyway, the test shows correct behaviour of LIKE:
LIKE '<string_in_russian>%' and STARTING WITH
'<string_in_russian>' give identical results.
Much to my surprise, I must confess, as my leaky
carbon-based long term storage seemed to remember
a posting of Nickolay's saying that he had optimized the LIKE
implementation and that, as collateral damage, it wouldn't work
for MBCS anymore. Seems that wasn't as bad as thought
and/or only affects CVS HEAD.
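For readers who want to see what "identical results" means, here is a rough Python sketch of the equivalence the test checks. The `like` helper is a simplified stand-in for SQL LIKE (no ESCAPE clause, no collation), the sample rows are invented, and none of it mirrors Firebird's matcher. It also shows that for valid UTF-8 a prefix test on the raw bytes agrees with a prefix test on characters.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Simplified SQL LIKE: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def starting_with(value: str, prefix: str) -> bool:
    """Stand-in for STARTING WITH: plain prefix comparison."""
    return value.startswith(prefix)


names = ["abc", "123", "лимон", "плита", "стол"]  # invented sample rows
prefix = "л"

by_like = [n for n in names if like(n, prefix + "%")]
by_prefix = [n for n in names if starting_with(n, prefix)]
by_bytes = [n for n in names
            if n.encode("utf-8").startswith(prefix.encode("utf-8"))]
print(by_like == by_prefix == by_bytes)
```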
The uuencoded script:
begin 0777 utf8test.sql
M[[N_<VAO=R!V97)S:6]N.PT*<VAO=R!V97)S:6]N.PT*4T54($Y!3453(%5.
M24-/1$5?1E-3.PT*4T54(%-13"!$24%,14-4(#,[#0I#4D5!5$4@1$%404)!
M4T4@)V,Z7&%C,BYF9&(G#0H@55-%4B G4UE31$)!)R!005-35T]21" G;6%S
M=&5R:V5Y)R!004=%7U-)6D4@.#$Y,@T*($1%1D%53%0@0TA!4D%#5$52(%-%
M5"!53DE#3T1%7T934SL-"D-214%412!$3TU!24X@0T@R-2!!4R!605)#2$%2
M*#(U*2!#2$%204-415(@4T54(%5.24-/1$5?1E-3($-/3$Q!5$4-"B!53DE#
M3T1%7T934SL-"D-214%412!404),12!415-4*$Y!344@0T@R-2D[#0H-"@T*
M8V]M;6ET.PT*#0II;G-E<G0@:6YT;R!T97-T('9A;'5E<R H)V%B8R<I.PT*
M:6YS97)T(&EN=&\@=&5S="!V86QU97,@*"<Q,C,G*3L-"FEN<V5R="!I;G1O
M('1E<W0@=F%L=65S("@GPX3#EL.<)RD[#0II;G-E<G0@:6YT;R!T97-T('9A
M;'5E<R H)^*"K"<I.PT*#0II;G-E<G0@:6YT;R!T97-T('9A;'5E<R H)]"_
MT8#0N-&!T++0L-"XT++0L-"UT8(G*3L-"FEN<V5R="!I;G1O('1E<W0@=F%L
M=65S("@GT8/0O="XT+K0L-"[T8S0O=&+T+DG*3L-"FEN<V5R="!I;G1O('1E
M<W0@=F%L=65S("@GT+O1CM"QT+[0O-&#)RD[#0II;G-E<G0@:6YT;R!T97-T
M('9A;'5E<R H)]&!T+C0O-"RT+[0N]&#)RD[#0II;G-E<G0@:6YT;R!T97-T
M('9A;'5E<R H)]"]T+70M]"PT++0N-&!T+C0O-"^)RD[( T*:6YS97)T(&EN
M=&\@=&5S="!V86QU97,@*"?0O]"[T+#1@M&$T+[1@-"\T8LG*3L-"@T*8V]M
M;6ET.PT*#0IS96QE8W0@*B!F<F]M('1E<W0[#0IS96QE8W0@*B!F<F]M('1E
M<W0@=VAE<F4@;F%M92!S=&%R=&EN9R!W:71H("?0NR<[#0IS96QE8W0@*B!F
M<F]M('1E<W0@=VAE<F4@;F%M92!L:6ME("?0NR4G.PT*<V5L96-T("H@9G)O
M;2!T97-T('=H97)E(&YA;64@;&EK92 G)="[)SL-"G-E;&5C=" J(&9R;VT@
K=&5S="!W:&5R92!N86UE(&QI:V4@)R70NR4G.PT*#0IC;VUM:70[#0H-"FT@
end
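If no uudecode tool is at hand, the data lines can be decoded one by one. The following Python sketch decodes only the first data line of the script above with the standard `binascii.a2b_uu` function; a full decoder would loop over every line between `begin` and `end`, which is left out here. The variable names are arbitrary.

```python
import binascii

# First data line of the uuencoded script, copied verbatim.
first_line = "M[[N_<VAO=R!V97)S:6]N.PT*<VAO=R!V97)S:6]N.PT*4T54($Y!3453(%5."

decoded = binascii.a2b_uu(first_line)

# The leading 'M' announces 45 payload bytes on this line.
print(len(decoded))
# The script starts with the UTF-8 byte order mark EF BB BF.
print(decoded[:3] == b"\xef\xbb\xbf")
# The rest of the line is plain ASCII SQL with CRLF line ends.
print(repr(decoded[3:].decode("utf-8")))
```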
> 1. Implementing test DB:
> SET NAMES UNICODE_FSS; /* as I understand, that is not for client
> connections but for DB objects naming */
That's the client's connection character set.
> [...] but I need "like" in my program.
>
> Am I doing something wrong? Or FB 1.5 still has no full support for
> unicode?
Yes to both questions, for whatever help that may be.
Best Regards,
Peter Jacobi