Subject | Re: Firebird and Unicode queries |
---|---|
Author | salisburyproject |
Post date | 2005-02-10T12:47:29Z |
Hi all,
First, thanks for all the suggestions. I hope this discussion is
useful for others too.
But I'm still confused. I've been playing around with all the options
and, as far as I can see, have never had success with Unicode.
Here are the points I am trying to solve:
1. Setting the default charset to UNICODE_FSS looks to me like a big
waste of resources (space, speed, indexing, etc.). Usually a table
needs only a few fields with MBCS characters, so I would prefer to
define only those few columns with this charset.
2. Using CHARSET NONE is problematic, I think, because I must provide
open access to the data (in this case via ODBC): only INSERTs and some
SELECTs are handled by the client GUI, while data mining will be done
with other applications.
3. Setting the connection charset to UNICODE_FSS also looks like a
problem: how does this affect data passed into non-Unicode fields? Is
there any overhead?
4. The best solution would be if I could somehow pass and retrieve
Unicode data on the fly, using only SQL syntax. Helen proposed
using "where UField = _UNICODE_FSS 'carrots'"; I haven't tested
this yet.
5. How is the translation from UTF-16 (the Windows default, for
example) and UTF-8 to UNICODE_FSS to be handled? Does the engine take
care of this? The UTF-8/16 formats are not fixed size! The size may
vary, so there must be some piece of code somewhere to handle the
translation to the fixed 3-byte format.
6. I see very strange behaviour now with the following setup:
- A few columns defined as UNICODE_FSS
- Client connects with UNICODE_FSS in the connection string, via ODBC
- Data is inserted as-is from the GUI (which supports Unicode, under
Windows)
When I try to do a SELECT with Unicode characters, I get ALL rows
with Unicode values. I have seen similar descriptions of this
behaviour on other forums/sites.
What am I doing wrong?
7. Firebird seems to be somehow removing the Unicode information from
English or other local-language text. When it is returned from the
DB, it comes back as ASCII, with the localized language one-byte
encoded.
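To illustrate the size concern in point 5, here is a minimal sketch. Python is used only for illustration and is not part of the setup described above, and the helper name encoded_sizes is hypothetical. It only shows how many bytes the same text takes in UTF-8 and UTF-16; it says nothing about how Firebird or the ODBC driver converts data.

```python
# Illustration only: compare the byte length of the same text in
# UTF-8 and UTF-16. "encoded_sizes" is a hypothetical helper name.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a BOM."""
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" adds.
    utf16_len = len(text.encode("utf-16-le"))
    return utf8_len, utf16_len

samples = [
    "a",           # ASCII letter
    "\u00e9",      # e with acute accent
    "\u0416",      # Cyrillic capital Zhe
    "\u20ac",      # euro sign
    "\U0001F600",  # a character outside the Basic Multilingual Plane
    "carrots",
]

for s in samples:
    u8, u16 = encoded_sizes(s)
    print(f"{s!r}: utf-8={u8} bytes, utf-16={u16} bytes")
```

The per-character size differs between the two encodings and also from one character to the next, which is why some conversion step is needed somewhere between the Windows client and the column charset.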
Sorry for all these issues. Maybe I have misunderstood some basic
principle.
Thanks again,
Kiril.