firebird-java - Re: lc

Subject	Re: lc_ctype support
Author	rrokytskyy
Post date	2002-04-09T10:45:57Z

> Are you sure it is a feature? What about multilanguage applications?
> The problem is not that the DBMS tries to convert the incoming ASCII
> characters (8-bit characters) to its own internal representation,
> the problem is the JDBC driver makes an additional conversion that
> I can't switch off. I can not provide an already 8-bit-character
> string to the driver. I can only use getBytes and setBytes but that
> is ugly in the code.

setBytes(...) will not work either, simply because in FBStringField
everything goes through the setString(...) method.

I will not discuss whether this is a bug or a feature, simply because
I do not know anything about how the DBMS works internally, but this
is how the API works:

- at connection time you specify the character encoding for _all_
character data you will be sending to the database;

- when you set the value of a CHAR or VARCHAR column, you pass the
data in the encoding you specified at connection time;

- the DBMS converts the data you passed, taking into consideration the
character encoding of both the connection and the column, and if the
conversion is not possible, an exception is thrown.
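
To make the first point concrete, here is a minimal sketch of how the
connection encoding could be supplied through JDBC connection properties.
It assumes the driver reads the encoding from a property named "lc_ctype"
(the name is taken from the subject of this thread). The class and method
names are hypothetical, and the JDBC URL is a placeholder supplied by the
caller.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class ConnectionEncodingSketch {

    // Builds the connection properties. "lc_ctype" is ASSUMED to be the
    // property name the driver uses for the connection character set.
    static Properties connectionProperties(String user, String password, String lcCtype) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("lc_ctype", lcCtype); // e.g. "WIN1250" or "UNICODE_FSS"
        return props;
    }

    // Opens the connection; from here on, all CHAR/VARCHAR data sent over it
    // is expected to be in the encoding named in the properties.
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }
}
```

The encoding is chosen once, when the Properties object is built, and is
not something the application changes per statement.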

> r> Driver should not make too many conversions. It just implements
> r> the DBMS API. The only thing that is required from the driver is
> r> to provide data to the DBMS in the correct format and describe
> r> the format it uses.
>
> You're right, but the DBMS does not assume you're using Unicode
> characters on the client side. Currently the JDBC driver does it,
> but the DBMS API expects 8 bit characters on the client side, not
> Unicode characters. lc_ctype says only how to interpret those
> characters (for example for collation orders) and which characters
> are valid.

The DBMS expects data in the encoding you specified for the connection.
If that was UNICODE_FSS, then the data passed to the DBMS should be
UNICODE_FSS. If that was WIN1250, then the data should be WIN1250.

And that's exactly what the driver is supposed to do: assume that you
pass a _correct_ Unicode string, get the encoding of the connection,
and convert that Unicode string to the encoding of the connection.
And I hope that the driver works this way.
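
The conversion described above can be sketched in plain Java. This is only
an illustration, not the driver's actual code: the class name, the method
names and the mapping from connection encoding names to Java charsets are
assumptions made for the example (UNICODE_FSS is approximated by UTF-8).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {

    // Hypothetical mapping from connection encoding name to a Java charset.
    static Charset javaCharsetFor(String lcCtype) {
        switch (lcCtype) {
            case "WIN1250":     return Charset.forName("windows-1250");
            case "UNICODE_FSS": return StandardCharsets.UTF_8; // approximation
            default:
                throw new IllegalArgumentException("No mapping in this sketch for " + lcCtype);
        }
    }

    // Converts a Unicode string to bytes in the connection encoding and
    // fails loudly if a character cannot be represented in that encoding.
    static byte[] encodeForConnection(String text, String lcCtype) {
        CharsetEncoder encoder = javaCharsetFor(lcCtype).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Cannot represent text in " + lcCtype, e);
        }
    }
}
```

With this sketch, the same Java string yields different bytes depending
on the connection encoding: one byte per character for WIN1250, and
possibly several bytes per character for UNICODE_FSS. A character outside
the WIN1250 repertoire is rejected instead of being silently replaced.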

> As I mentioned, the problem remains that many systems use 8-bit
> encodings, even in Java. Theoretically you're right to expect
> correct Unicode input, but in practice this may not be a perfect
> solution.

Ok, let people vote on this thing.

Best regards,
Roman Rokytskyy