Subject Re: Major problems when using multiple character sets
Author jimbeesleyx
Thanks for the pompt response.

The down side is that I'm trying to use Firebird with a fairly DBMS
neutral (depending on whom you ask) application. I can expose
certain things (like DBMS specific data type names) to the
application, but they are not set up to reformulate every column they
reference based on the target column's type and size. In order for
me to do it dynamically, I'd need to prepare, describe, and parse all
of the SQL at prepare time, regenerate the SQL by changing the column
references based on the describe info, and then reprepare the
regenerated SQL. Not real efficient, plus it gets ugly for
convoluted queries.

I'm using the isc_dsql_* interface (instead of embedded sql), so it
would need to be something like:

INSERT INTO "TOWEROFBABEL" ("COL") VALUES (_ISO8859_1 ?)

But sadly, that returns an error:
rc = 335544569, sqlcode = -104 - Invalid token "?".
So that would seem to leave the problem with connecting as NONE, plus
it would require regenerating SQL to change the parameter references.

The "somewhat good" news is that if I connect to the database as
UNICODE_FSS and use the cast, i.e.

SELECT CAST("COL" AS VARCHAR(24) CHARACTER SET UNICODE_FSS)
FROM "TOWEROFBABEL"

I don't get the internal conversion error. Firebird then considers
all parameters to contain UNICODE_FSS strings, so that works OK as
well.

In addition, the internal workings of Firebird prevent this from
being a complete general purpose solution ... consider:

create table BIG6(
col1 char(3640) character set ISO8859_1,
col2 char(3640) character set ISO8859_1,
col3 char(3640) character set ISO8859_1,
col4 char(3640) character set ISO8859_1,
col5 char(3640) character set ISO8859_1,
col6 char(3640) character set ISO8859_1)

It is perfectly valid to create a row with all 6 columns filled with
NLS characters.

Now consider the humble SELECT * FROM BIG6. The best "safe" query I
could regenerate would be one that reserved 1 UNICODE_FSS character
for each character in the source column (the application's data
bindings in the sqlvar are not available during prepare) ... which
would require the following regenerated SQL:

SELECT
CAST(COL1 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL2 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL3 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL4 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL5 AS CHAR(3640) CHARACTER SET UNICODE_FSS),
CAST(COL6 AS CHAR(3640) CHARACTER SET UNICODE_FSS)
FROM BIG6

Which returns an error, presumably because 3640 * 3 * 6 = 65520 which
is larger than the max row size allowed. I'll grant you that this is
a fairly extreme example - I suspect that I might be able to add a
restriction indicating that the maximum row size from a create table
operation is limited to (total number of characters) * 3 + size of
all other types < Firebird max row size. Most of our applications
would probably be fine with this rule, but it will probably prevent
this being used for a general solution to the problem within the
firebird source code.
This also ignores the fact that Firebird's max column size means that
you can't cats or bind a sufficiently large UNICODE_FSS buffer for
some columns. Consider:

CREATE TABLE BIG1(COL1 char(12000) CHARACTER SET ISO8859_1)

The regenerated SQL for SELECT * FROM BIG1 would be:

SELECT CAST(COL1 AS CHAR(12000) CHARACTER SET UNICODE_FSS)
FROM BIG1

Which returns an error, presumably because 12000 * 3 is greater than
the Firebird maximum column size.

You'll only see the transliteration problem if you fetch a string
whose connection character set representation (in bytes) exceeds the
size of the declared column size (in bytes). Since most folks don't
fill their columns completely with NLS characters and then fetch them
as UNICODE_FSS, most don't see this issue, but it is there - as the
example below shows. One of our application's requirements is to be
able to query columns with different character sets and always return
the results correctly using a single connection, so we are forced to
use either NONE or UNICODE_FSS to connect.

I can always transcode the data after fetching it, so I don't care
whether Firebird does it or not - I just need to get (and put via
input parameters) the valid data.

I posted the internal details of the problem to firebird-devel
earlier, but I didn't get any responses.

Thanks for any help,
-Jim Beesley


--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
> Jim,
>
> At 06:29 PM 1/12/2003 -0500, you wrote:
> >More posting issues ... hopefully this one will make it
through ... sorry
> >if it is a duplicate.
> >
> >This is currently a major issue for us, and any help is greatly
appreciated.
> >
> >Anyway, is it possible to fetch (and send via input parameters)
the data
> >from 2 character columns with different character sets without
having them
> >converted to the connection character set?
> >
> >As an example, let's say a user has a table with 2 columns: one is
> >ISO8859_1 and the other is BIG_5 (or whatever ...)
> >No matter what character set I connect with (*), there will be
some
> >possible data that con not be transcoded/copied to the sqlvar-
>sqldata
> >buffer (i.e. a character in the column being fetched may not be
> >representable in the connection character set).
> >
> >Ideally, I'd set the sqlda->sqlvar->sqlsubtype to CS_BINARY, and
the data
> >would be fetched "as-is" but it appears that the subtype field
gets
> >ignored during fetch.
> >I'm using dsql calls if that makes any difference.
> >
> >Is there anything obvious that I'm missing?
>
> Does this work for you? (Must be done in the SQL statement itself)
>
> INSERT INTO TOWEROFBABEL (ACHAR1, ACHAR2, ACHAR3)
> VALUES(
> _BIG_5 :ACHAR1,
> _ISO8859_1 :ACHAR2,
> _OCTETS :ACHAR3)
>
> If that *doesn't* work, try the casting solution below (N of course
is the
> real size of the varchars.)
>
> For selecting, AFAIK, the Prepare tells the client what character
set to
> expect and you shouldn't get a transliteration error. I have a
similar
> type of database and I don't get any transliteration problems with
it.
>
> As a last resort, you could do:
>
> SELECT CAST(ACHAR1 AS VARCHAR(N) CHARACTER SET BIG_5) AS _ACHAR1,
> CAST(ACHAR2 AS VARCHAR(N) CHARACTER SET ISO9959_1) AS _ACHAR2,
> CAST(ACHAR3 AS VARCHAR(N) CHARACTER SET OCTETS) AS _ACHAR3
> FROM TOWEROFBABEL
>
> (albeit I don't think it is necessary...)
>
> /heLen
>
>
>
>
> >(*) UNICODE_FSS won't work - try inserting 'xíx' into a CHAR(3)
CHARACTER
> >SET ISO8859_1, and then try to fetch it after connecting with
UNICODE_FSS
> >- you'll get a truncation error internally (no matter how large
you make
> >your UNICODE_FSS sqlvar fetch buffer) because firebird only
allocates
> >buffers of the source column size.
> >
> >(**) Connecting as NONE doesn't work either. Fetching works fine,
since I
> >just get the bytes "as-is", but if I try to use input parameters -
like
> >INSERT INTO TABLE VALUES (?), I'll get a transcoding error if I
use any
> >NLS characters. For example, try setting the sqlvar->sqldata for
a
> >parameter to 'xíx' for a CHAR(3) CHARACTER SET ISO8859_1
column ... it
> >_ought_ to work, but the input parameter is apparently assumed to
be in
> >the NONE character set, rather than the target character set, or
the subtype.
> >
> >Thanks in advance,
> >-Jim Beesley
> >
> >
> >
> >
> >To unsubscribe from this group, send an email to:
> >firebird-support-unsubscribe@yahoogroups.com
> >
> >
> >
> >Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/