Subject:   Re: [firebird-support] Re: Firebird and Unicode queries
Author:    Helen Borrie
Post date: 2005-02-10T14:01:55Z
At 12:47 PM 10/02/2005 +0000, you wrote:

>Hi all,
>
>First thanks for all suggestions. I hope that this discussion is
>useful for others too...
>
>But I'm still confused. I've been playing around with all options,
>and never had success with Unicode, the way I see it..
>Here are some points I'm trying to solve:
>
>1. Setting the default charset to UNICODE_FSS looks to me like a big
>waste of resources (space, speed, indexing, etc). Usually a table
>needs only a few fields with MBCS characters.. I prefer to have only
>a few columns defined with this charset.

In that case, it gets down to extra effort for columns that don't
match the default charset. (I hope you understand that the default
charset only affects character types and text blobs...)
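
For illustration, a minimal sketch of that per-column approach - the
table and column names here are made up, not from the original thread:

   CREATE TABLE CUSTOMER (
     ID        INTEGER NOT NULL,
     /* stored in the database's default charset */
     CODE      VARCHAR(20),
     /* only this column pays the up-to-3-bytes-per-character cost */
     FULL_NAME VARCHAR(60) CHARACTER SET UNICODE_FSS
   );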

>2. Using CHARSET NONE is problematic, I think, as I must provide open
>(in this case ODBC) access to the data - only INSERTS and some
>SELECTS are handled by the client GUI; data-mining should be done
>with another application.

I don't quite understand what you are getting at. ODBC access would be
done using a Firebird driver, which interfaces the Firebird API with
the ODBC layer.
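
Purely as a sketch of such a connection string - the exact parameter
keywords vary between Firebird ODBC driver versions, so treat every
name below as an assumption and check your driver's documentation:

   DRIVER=Firebird/InterBase(r) driver;
   DBNAME=myserver:/data/mydb.fdb;UID=SYSDBA;PWD=masterkey;
   CHARSET=WIN1252

CHARSET here plays the role of lc_ctype, so (per the next point) it
should match the default charset of the database.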

>3. Setting the connection type to UNICODE_FSS also looks like a
>problem - how does this affect data passed into non-Unicode fields?
>Any overhead?

Forget "overhead". Please try to get your head around the purpose of
lc_ctype (see my reply to Scott Morgan). The lc_ctype has to match the
default charset of the database. Period.
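
In isql, for example, you declare the lc_ctype before connecting.
A sketch, assuming a database that was created with DEFAULT CHARACTER
SET WIN1252 (server, path and credentials are hypothetical):

   SET NAMES WIN1252;
   CONNECT 'myserver:/data/mydb.fdb'
     USER 'SYSDBA' PASSWORD 'masterkey';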

>4. The best solution would be if I can somehow pass and retrieve
>Unicode data on the fly, using only SQL syntax for this. Helen
>proposed the use of "where UField = _UNICODE_FSS 'carrots'" - I
>haven't yet tested this.

If you are going to store mixed character sets in the same database,
you'll need either that (introducer) syntax or CASTs for all columns
whose charsets aren't the default charset of the database. This isn't
applicable just to MBCS - it applies in all cases where you are mixing
charsets.
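
Both forms, sketched against the hypothetical CUSTOMER table above:

   /* introducer, for a literal search */
   SELECT * FROM CUSTOMER
   WHERE FULL_NAME = _UNICODE_FSS 'carrots';

   /* CAST, for a parameterised search */
   SELECT * FROM CUSTOMER
   WHERE FULL_NAME =
     CAST(? AS VARCHAR(60) CHARACTER SET UNICODE_FSS);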

>5. How is the translation of utf-16 (Windows default, for example) &
>utf-8 to Unicode_FSS to be handled? Does the engine take care of this?

Yes. See the reply to Scott Morgan.

>The utf-8/16 formats are not fixed size!! The size may vary, so there
>must be some piece of code somewhere to handle the translation to the
>fixed 3 bytes format.

There is. But it's a pretty distant likelihood that UNICODE_FSS will
have a mapping for any UTF character longer than 3 bytes. Unicode
wouldn't be the ideal choice for the charset of the system locale; it
would normally make better sense to choose an OEM character set. If
you have an application that needs to cater for every possible
character image on earth, you will definitely have a problem or two!

>6. I see very strange behaviour now, having the following setup:
>- A few columns defined as UNICODE_FSS
>- Client connects with UNICODE_FSS in the connection string, via ODBC
>- Data is inserted as-is from the GUI (supporting Unicode, under
>Windows)
>When I try to do a select with Unicode characters, I get ALL rows
>with unicode values... I saw a similar description of this behaviour
>in other forums/sites..
>What am I doing wrong?

First, with UNICODE_FSS in the connection string and the default
charset of the database something else, you have a mismatch.

Second, all search criteria will be passed in the default charset
unless you use an introducer (for literal searches) or a cast (for
parameterised searches). So, if your default charset is NONE, and the
characters passed from the UI are anything but those represented by
ASCII codes < 128, a garbage string will be passed. Throwing an
illicit UNICODE_FSS lc_ctype into that mix would simply make things
more colourful.
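
If the mismatch is the problem, the fix is to make both ends agree.
A hypothetical example (names and charset chosen only for
illustration):

   /* database created with an explicit default charset... */
   CREATE DATABASE 'myserver:/data/mydb.fdb'
     USER 'SYSDBA' PASSWORD 'masterkey'
     DEFAULT CHARACTER SET WIN1252;

   /* ...and every client connects with the matching lc_ctype */
   SET NAMES WIN1252;
   CONNECT 'myserver:/data/mydb.fdb'
     USER 'SYSDBA' PASSWORD 'masterkey';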

>7. Firebird seems to be somehow removing Unicode information from
>English or another local language.. Then, when returned back from the
>DB, it gets as ASCII - the localized language is one-byte encoded..

If you pass a string of single-byte characters into a charset NONE
column, they will be stored as ASCII. ASCII and ANSI characters up to
decimal 127, and any Unicode characters equivalent to them, are the
same. Windows (at least - I don't know for sure about Linux) will
knock them down to single-byte regardless. If any of those strings are
then to be stored in a column defined with a single-byte charset,
that's how Firebird stores them. If they are to be stored in an MBCS
column, and the bridging is correct, then Firebird stores them as
3-byte Unicode.
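
A hypothetical pair of columns makes the distinction concrete:

   CREATE TABLE NOTES (
     /* bytes are stored exactly as received - no transliteration */
     RAW_TEXT VARCHAR(100) CHARACTER SET NONE,
     /* input is transliterated on the way in, provided the
        connection charset is declared correctly */
     UNI_TEXT VARCHAR(100) CHARACTER SET UNICODE_FSS
   );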

From Firebird 1.5.1 forward, a change was made so that a string passed
across a charset NONE connection to a column in the default charset of
the database would get bridged automatically. See "Character set NONE
data now accepted 'as is'" on p.2 of the v.1.5.1 release notes (which
you can pick up on the Firebird 1.5 main downloads page at
www.ibphoenix.com). This can be of help in some mixed-charset
databases.
./hb