Subject | Re: Major problems when using multiple character sets |
---|---|
Author | jimbeesleyx |
Post date | 2003-12-02T18:19:24Z |
Thanks for the pompt response.
The down side is that I'm trying to use Firebird with a fairly DBMS
neutral (depending on whom you ask) application. I can expose
certain things (like DBMS specific data type names) to the
application, but they are not set up to reformulate every column they
reference based on the target column's type and size. In order for
me to do it dynamically, I'd need to prepare, describe, and parse all
of the SQL at prepare time, regenerate the SQL by changing the column
references based on the describe info, and then reprepare the
regenerated SQL. Not real efficient, plus it gets ugly for
convoluted queries.
I'm using the isc_dsql_* interface (instead of embedded sql), so it
would need to be something like:
INSERT INTO "TOWEROFBABEL" ("COL") VALUES (_ISO8859_1 ?)
But sadly, that returns an error:
rc = 335544569, sqlcode = -104 - Invalid token "?".
So that would seem to leave the problem with connecting as NONE, plus
it would require regenerating SQL to change the parameter references.
The "somewhat good" news is that if I connect to the database as
UNICODE_FSS and use the cast, i.e.
SELECT CAST("COL" AS VARCHAR(24) CHARACTER SET UNICODE_FSS)
FROM "TOWEROFBABEL"
I don't get the internal conversion error. Firebird then considers
all parameters to contain UNICODE_FSS strings, so that works OK as
well.
In addition, the internal workings of Firebird prevent this from
being a complete general purpose solution ... consider:
create table BIG6(
col1 char(3640) character set ISO8859_1,
col2 char(3640) character set ISO8859_1,
col3 char(3640) character set ISO8859_1,
col4 char(3640) character set ISO8859_1,
col5 char(3640) character set ISO8859_1,
col6 char(3640) character set ISO8859_1)
It is perfectly valid to create a row with all 6 columns filled with
NLS characters.
Now consider the humble SELECT * FROM BIG6. The best "safe" query I
could regenerate would be one that reserved 1 UNICODE_FSS character
for each character in the source column (the application's data
bindings in the sqlvar are not available during prepare) ... which
would require the following regenerated SQL:
SELECT
CAST(COL1 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL2 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL3 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL4 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL5 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL6 AS CHAR(3640) CHARACTER SET UNICODE_FSS)
FROM BIG6
Which returns an error, presumably because 3640 * 3 * 6 = 65520 which
is larger than the max row size allowed. I'll grant you that this is
a fairly extreme example - I suspect that I might be able to add a
restriction indicating that the maximum row size from a create table
operation is limited to (total number of characters) * 3 + size of
all other types < Firebird max row size. Most of our applications
would probably be fine with this rule, but it will probably prevent
this being used for a general solution to the problem within the
firebird source code.
This also ignores the fact that Firebird's max column size means that
you can't cats or bind a sufficiently large UNICODE_FSS buffer for
some columns. Consider:
CREATE TABLE BIG1(COL1 char(12000) CHARACTER SET ISO8859_1)
The regenerated SQL for SELECT * FROM BIG1 would be:
SELECT CAST(COL1 AS CHAR(12000) CHARACTER SET UNICODE_FSS)
FROM BIG1
Which returns an error, presumably because 12000 * 3 is greater than
the Firebird maximum column size.
You'll only see the transliteration problem if you fetch a string
whose connection character set representation (in bytes) exceeds the
size of the declared column size (in bytes). Since most folks don't
fill their columns completely with NLS characters and then fetch them
as UNICODE_FSS, most don't see this issue, but it is there - as the
example below shows. One of our application's requirements is to be
able to query columns with different character sets and always return
the results correctly using a single connection, so we are forced to
use either NONE or UNICODE_FSS to connect.
I can always transcode the data after fetching it, so I don't care
whether Firebird does it or not - I just need to get (and put via
input parameters) the valid data.
I posted the internal details of the problem to firebird-devel
earlier, but I didn't get any responses.
Thanks for any help,
-Jim Beesley
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
The down side is that I'm trying to use Firebird with a fairly DBMS
neutral (depending on whom you ask) application. I can expose
certain things (like DBMS specific data type names) to the
application, but they are not set up to reformulate every column they
reference based on the target column's type and size. In order for
me to do it dynamically, I'd need to prepare, describe, and parse all
of the SQL at prepare time, regenerate the SQL by changing the column
references based on the describe info, and then reprepare the
regenerated SQL. Not real efficient, plus it gets ugly for
convoluted queries.
I'm using the isc_dsql_* interface (instead of embedded sql), so it
would need to be something like:
INSERT INTO "TOWEROFBABEL" ("COL") VALUES (_ISO8859_1 ?)
But sadly, that returns an error:
rc = 335544569, sqlcode = -104 - Invalid token "?".
So that would seem to leave the problem with connecting as NONE, plus
it would require regenerating SQL to change the parameter references.
The "somewhat good" news is that if I connect to the database as
UNICODE_FSS and use the cast, i.e.
SELECT CAST("COL" AS VARCHAR(24) CHARACTER SET UNICODE_FSS)
FROM "TOWEROFBABEL"
I don't get the internal conversion error. Firebird then considers
all parameters to contain UNICODE_FSS strings, so that works OK as
well.
In addition, the internal workings of Firebird prevent this from
being a complete general purpose solution ... consider:
create table BIG6(
col1 char(3640) character set ISO8859_1,
col2 char(3640) character set ISO8859_1,
col3 char(3640) character set ISO8859_1,
col4 char(3640) character set ISO8859_1,
col5 char(3640) character set ISO8859_1,
col6 char(3640) character set ISO8859_1)
It is perfectly valid to create a row with all 6 columns filled with
NLS characters.
Now consider the humble SELECT * FROM BIG6. The best "safe" query I
could regenerate would be one that reserved 1 UNICODE_FSS character
for each character in the source column (the application's data
bindings in the sqlvar are not available during prepare) ... which
would require the following regenerated SQL:
SELECT
CAST(COL1 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL2 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL3 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL4 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL5 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL6 AS CHAR(3640) CHARACTER SET UNICODE_FSS)
FROM BIG6
Which returns an error, presumably because 3640 * 3 * 6 = 65520 which
is larger than the max row size allowed. I'll grant you that this is
a fairly extreme example - I suspect that I might be able to add a
restriction indicating that the maximum row size from a create table
operation is limited to (total number of characters) * 3 + size of
all other types < Firebird max row size. Most of our applications
would probably be fine with this rule, but it will probably prevent
this being used for a general solution to the problem within the
firebird source code.
This also ignores the fact that Firebird's max column size means that
you can't cats or bind a sufficiently large UNICODE_FSS buffer for
some columns. Consider:
CREATE TABLE BIG1(COL1 char(12000) CHARACTER SET ISO8859_1)
The regenerated SQL for SELECT * FROM BIG1 would be:
SELECT CAST(COL1 AS CHAR(12000) CHARACTER SET UNICODE_FSS)
FROM BIG1
Which returns an error, presumably because 12000 * 3 is greater than
the Firebird maximum column size.
You'll only see the transliteration problem if you fetch a string
whose connection character set representation (in bytes) exceeds the
size of the declared column size (in bytes). Since most folks don't
fill their columns completely with NLS characters and then fetch them
as UNICODE_FSS, most don't see this issue, but it is there - as the
example below shows. One of our application's requirements is to be
able to query columns with different character sets and always return
the results correctly using a single connection, so we are forced to
use either NONE or UNICODE_FSS to connect.
I can always transcode the data after fetching it, so I don't care
whether Firebird does it or not - I just need to get (and put via
input parameters) the valid data.
I posted the internal details of the problem to firebird-devel
earlier, but I didn't get any responses.
Thanks for any help,
-Jim Beesley
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
> Jim,through ... sorry
>
> At 06:29 PM 1/12/2003 -0500, you wrote:
> >More posting issues ... hopefully this one will make it
> >if it is a duplicate.appreciated.
> >
> >This is currently a major issue for us, and any help is greatly
> >the data
> >Anyway, is it possible to fetch (and send via input parameters)
> >from 2 character columns with different character sets withouthaving them
> >converted to the connection character set?some
> >
> >As an example, let's say a user has a table with 2 columns: one is
> >ISO8859_1 and the other is BIG_5 (or whatever ...)
> >No matter what character set I connect with (*), there will be
> >possible data that con not be transcoded/copied to the sqlvar-the data
>sqldata
> >buffer (i.e. a character in the column being fetched may not be
> >representable in the connection character set).
> >
> >Ideally, I'd set the sqlda->sqlvar->sqlsubtype to CS_BINARY, and
> >would be fetched "as-is" but it appears that the subtype fieldgets
> >ignored during fetch.is the
> >I'm using dsql calls if that makes any difference.
> >
> >Is there anything obvious that I'm missing?
>
> Does this work for you? (Must be done in the SQL statement itself)
>
> INSERT INTO TOWEROFBABEL (ACHAR1, ACHAR2, ACHAR3)
> VALUES(
> _BIG_5 :ACHAR1,
> _ISO8859_1 :ACHAR2,
> _OCTETS :ACHAR3)
>
> If that *doesn't* work, try the casting solution below (N of course
> real size of the varchars.)set to
>
> For selecting, AFAIK, the Prepare tells the client what character
> expect and you shouldn't get a transliteration error. I have asimilar
> type of database and I don't get any transliteration problems withit.
>CHARACTER
> As a last resort, you could do:
>
> SELECT CAST(ACHAR1 AS VARCHAR(N) CHARACTER SET BIG_5) AS _ACHAR1,
> CAST(ACHAR2 AS VARCHAR(N) CHARACTER SET ISO9959_1) AS _ACHAR2,
> CAST(ACHAR3 AS VARCHAR(N) CHARACTER SET OCTETS) AS _ACHAR3
> FROM TOWEROFBABEL
>
> (albeit I don't think it is necessary...)
>
> /heLen
>
>
>
>
> >(*) UNICODE_FSS won't work - try inserting 'xíx' into a CHAR(3)
> >SET ISO8859_1, and then try to fetch it after connecting withUNICODE_FSS
> >- you'll get a truncation error internally (no matter how largeyou make
> >your UNICODE_FSS sqlvar fetch buffer) because firebird onlyallocates
> >buffers of the source column size.since I
> >
> >(**) Connecting as NONE doesn't work either. Fetching works fine,
> >just get the bytes "as-is", but if I try to use input parameters -like
> >INSERT INTO TABLE VALUES (?), I'll get a transcoding error if Iuse any
> >NLS characters. For example, try setting the sqlvar->sqldata fora
> >parameter to 'xíx' for a CHAR(3) CHARACTER SET ISO8859_1column ... it
> >_ought_ to work, but the input parameter is apparently assumed tobe in
> >the NONE character set, rather than the target character set, orthe subtype.
> >http://docs.yahoo.com/info/terms/
> >Thanks in advance,
> >-Jim Beesley
> >
> >
> >
> >
> >To unsubscribe from this group, send an email to:
> >firebird-support-unsubscribe@yahoogroups.com
> >
> >
> >
> >Your use of Yahoo! Groups is subject to