Subject | Re: [firebird-support] Re: Firebird and Unicode queries |
---|---|
Author | Helen Borrie |
Post date | 2005-02-10T00:40:04Z |
At 08:38 PM 9/02/2005 +0000, you wrote:
>Is this really so??? I decided to use FB, based on the statement it
>supports Unicode.. And it does have a UNICODE_FSS character type..
>Do you mean it is not really supported?

Of course it is supported. However, there are no collations, other than
strictly binary, which means that there are no dictionary sort orders or
upper/lowercase mappings, other than for the characters in the range of the
first 128 characters (US ASCII equivalents).
Also, since UNICODE_FSS stores every character as exactly 3 bytes, the
maximum size of indexes (up to and including Fb 1.5.x) is one-third or less
than that for single-byte character sets. In practice, this means you
can't index string fields larger than ~84 characters - less if you use
multi-segment indexes.
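To make the arithmetic concrete - a sketch only, with invented table and column names, and assuming the roughly 252-byte index key limit of Fb 1.x (84 characters at 3 bytes each):

```sql
/* Hypothetical table; names are made up for illustration */
create table notes (
  short_note varchar(80)  character set UNICODE_FSS,  /* 80 x 3 = 240 bytes */
  long_note  varchar(100) character set UNICODE_FSS   /* 100 x 3 = 300 bytes */
);

create index idx_short on notes (short_note);  /* within the key-size limit */
create index idx_long  on notes (long_note);   /* expected to fail: key too large */
```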
>--- In firebird-support@yahoogroups.com, David Johnson wrote:
> >
> > To store utf-8 or utf-16, you should declare the column with no
> > character set.

This is not the right advice, albeit it might be a workaround that makes
sense to Java programmers who are used to working with / prefer to take
advantage of an application language that can manipulate unicode according
to some idiomatic Java conventions. It will lock you into storing data
that neither the database engine nor other application languages can make
any sense of. It will make most string expressions unavailable. It goes
right against the principle of storing data that is independent of the
application languages through which it might be accessed.
If you want to store unicode characters, declare the columns as
UNICODE_FSS. If you use character set NONE (or any other single-byte
character set) the database engine has no way at all to know that each
character is represented by a 3-byte word.
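A minimal sketch of the right declaration (table and column names invented for illustration):

```sql
/* Declare the column's character set explicitly, so the engine
   knows the stored data is unicode */
create table customer (
  full_name varchar(60) character set UNICODE_FSS
);
```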
Applications should take care of ensuring that the interface sees the right
character images - the database engine doesn't deliver character images -
and they should ensure that search criteria passed to the database are
recognised as UNICODE_FSS.
If you want *all* of your string fields to be stored as unicode, you should
make UNICODE_FSS the default character set of the database and always use
UNICODE_FSS as the lc_ctype of the client connection.
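In isql, for example, that pairing looks something like this (database file name invented):

```sql
/* Set the connection character set (lc_ctype) before connecting,
   and make UNICODE_FSS the database default */
set names UNICODE_FSS;
create database 'mydata.fdb'
  default character set UNICODE_FSS;
```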
If the database default charset is not UNICODE_FSS correctly paired with a
matching client lc_ctype, but columns are arbitrarily declared as
UNICODE_FSS, then the work the client must do to achieve proper character
recognition by the engine is more complicated, viz.
-- in non-parameterised statements, you'll have to prefix each string with
the introducer _UNICODE_FSS, e.g.
where UField = _UNICODE_FSS 'carrots'
-- in parameterised statements, you will have to use a cast expression in
the input string, e.g.
where UField = cast(? as varchar(n) character set UNICODE_FSS)
(where (n) is the declared size of UField in the table definition).
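Putting both forms together - a sketch assuming a hypothetical table MyTable whose column UField is declared as varchar(40) character set UNICODE_FSS:

```sql
/* Non-parameterised: prefix the literal with the introducer */
select * from MyTable
where UField = _UNICODE_FSS 'carrots';

/* Parameterised: cast the input parameter, matching the
   declared size (40) of UField */
select * from MyTable
where UField = cast(? as varchar(40) character set UNICODE_FSS);
```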
./hb