Subject | Re: [firebird-support] Re: Firebird and Unicode queries |
---|---|
Author | Lester Caine |
Post date | 2005-02-10T05:46:51Z |
David Johnson wrote:

>>>--- In firebird-support@yahoogroups.com, <SNIP>
>>>
>>>>To store utf-8 or utf-16, you should declare the column with no
>>>>character set.
>>
>>This is not the right advice, albeit it might be a workaround that makes
> UTF-8 and UTF-16 are standards that are independent of language -
> programming, database, or natural. If you want to store data from
> dissimilar languages in the same columns in the same database instance,
> it is necessary to have a character encoding that supports all of these
> at the same time.

David - Helen's comment is to the fact that using NONE to store UTF-8 is
the wrong answer.
> In UTF-8 and UTF-16, the byte count is variable from 1 (or 2) to at
> least 6 bytes

Which is the crux of the problem when trying to manage the data within a
database field. In the good old days you could look at the binary data
and character 'x' would be at position 'x' on ALL records. UNICODE_FSS
maintains that link at the expense of 24 bits per character rather than
8, but then string matching is consistent, and SUBSTRING is a simple
count of characters not needing 'context'.
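
To make the position point concrete, here is a small Python sketch. It is
not Firebird code, and the helper names char_at_fixed and char_at_utf8 are
made up for illustration. UTF-32 is used only as a stand-in for "some
fixed-width encoding"; the point is that a fixed width lets you jump
straight to character 'x', while UTF-8 has to be scanned from the start.

```python
# "Ångström", written with escapes so the code points are unambiguous.
text = "\u00c5ngstr\u00f6m"

utf8 = text.encode("utf-8")        # variable width: 1 or 2 bytes per character here
utf32 = text.encode("utf-32-le")   # fixed width: always 4 bytes, no BOM


def char_at_fixed(data: bytes, index: int, width: int = 4) -> str:
    """Fixed width: character `index` starts at byte offset index * width."""
    start = index * width
    return data[start:start + width].decode("utf-32-le")


def char_at_utf8(data: bytes, index: int) -> str:
    """Variable width: find character starts by skipping continuation bytes."""
    # A UTF-8 continuation byte always has the bit pattern 10xxxxxx.
    starts = [i for i, b in enumerate(data) if b & 0xC0 != 0x80]
    starts.append(len(data))
    return data[starts[index]:starts[index + 1]].decode("utf-8")


print(len(text), len(utf8), len(utf32))   # characters, UTF-8 bytes, UTF-32 bytes
print(char_at_fixed(utf32, 6), char_at_utf8(utf8, 6))
print(utf8[6:7])                          # byte offset 6 is NOT character 6
```

Both helpers return the same character, but only the fixed-width one gets
there without looking at the bytes that come before it.
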
> The A with a circle on top (Angstrom to english speakers) is just an "A"
> to english speakers, but it is a distinct letter between A and B in
> norwegian and a distinct letter following about two places after Z in
> swedish (or maybe it's the other way around). In those languages, it is

And you are still thinking on the small scale. I've been building up an
archive of world data, but I have hit this problem as well. Add in
multiple other languages that do not use the ASCII characters at all.
What SHOULD the default ordering be and how do you manage it in the
indexes - especially when you add 'full text search' ;)
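
As a rough illustration of the ordering question, the Python sketch below
sorts the same five words three ways: by raw code point, by a hand-written
Swedish alphabet (a to z, then å, ä, ö), and by an English-style rule that
strips the accents first. The key functions and the word list are my own
assumptions for the example, not anything Firebird provides, and a real
collation handles far more than this.

```python
import unicodedata

# Simplified Swedish alphabet: a-z followed by å, ä, ö (assumption for the sketch).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str) -> list:
    """Rank each letter by its place in the Swedish alphabet."""
    composed = unicodedata.normalize("NFC", word.lower())
    return [RANK.get(ch, len(RANK)) for ch in composed]


def accent_folded_key(word: str) -> str:
    """Strip combining marks, so 'å' and 'ä' both sort as plain 'a'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zon", "ål", "äng", "öl", "apa"]

by_code_point = sorted(words)                    # what a binary index gives you
by_swedish = sorted(words, key=swedish_key)
by_folded = sorted(words, key=accent_folded_key)

print(by_code_point)
print(by_swedish)
print(by_folded)
```

Three rules, three different orders for the same data, and an index can
only be built on one of them at a time.
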
>>If you want *all* of your string fields to be stored as unicode, you should
>>make UNICODE_FSS the default character set of the database and always use
>>UNICODE_FSS as the lc_ctype of the client connection.

This is still a half way house, but may be the best we can do in
reality. While you can store UTF-8/16 easily enough, fully updating all
character functions to handle it is (to my mind) a large amount of
effort. THEN one asks the question, does providing that functionality
impinge on performance when only a simple 7 bit binary set is required?

Those of us who are used to working in one dimension - English - still
have great difficulty contemplating the problems of multi dimensional
translations, and are perhaps a little jealous of those who can handle
it so easily. But hopefully these problems are being addressed by the
work being done on INTL?
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services