Subject Re: [Firebird-Java] Re: lc_ctype support
Author Marczisovszky Daniel
r> Hi,

r> --- In Firebird-Java@y..., Marczisovszky Daniel <marczi@d...> wrote:
>> Yes, they work, as there is a bug in the driver, so it actually
>> makes no translation:
>>
>> In FBField.java at line 396
>>
>> if (iscEncoding != null && !iscEncoding.equalsIgnoreCase("NONE"))
>>   FBConnectionHelper.getJavaEncoding(iscEncoding);
>>
>> should be replaced with this:
>>
>> if (iscEncoding != null && !iscEncoding.equalsIgnoreCase("NONE"))
>>   javaEncoding = FBConnectionHelper.getJavaEncoding(iscEncoding);
>>
>> otherwise javaEncoding will always be null, so no encoding will be
>> used.

r> Thanks! I have corrected this bug. I have also corrected the unit
r> test to use Unicode test strings.

>> Before you correct this, please create a table with WIN1250. And try
>> this:
>>
>> PreparedStatement pst = conn.prepareStatement(
>>     "insert into honap (hosszunev) values (?)");
>> pst.setString(1, "õrült");
>> pst.executeUpdate();
>>
>> Note that in the second line the first character is 0xF5, so it may
>> be replaced with \u00F5 in the source code.
>>
>> Please also run this after you have corrected the bug. You will see
>> that the first character is replaced by a question mark.

r> Please check the updated TestFBEncodings; I added your code as a
r> test case. In IB-Console and in InterClient it works fine and does
r> not replace the 0xF5 char with '?'. But you have to pass a correct
r> Unicode string to setString(String), not the string you obtain by
r> constructing it from a WIN1250 byte array with the default char
r> encoding.

Yes, they do not replace it with '?', because IB-Console does not use
Unicode at all. Moreover, I assume the InterClient driver does not
perform the conversions that the type 4 driver does (using encodings
for getBytes and toString).
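
To make the '?' concrete, here is a minimal sketch using only the
standard Java charset API. It is not the driver's code, the class name
is made up, and it assumes the JVM ships the windows-1250 charset. It
shows that U+00F5 has no WIN1250 mapping and is therefore replaced,
while U+0151 is the character WIN1250 actually stores at 0xF5:

```java
import java.nio.charset.Charset;

// Illustration only: not driver code; the class name is hypothetical.
class Win1250EncodeDemo {
    // Assumption: the running JVM provides the windows-1250 charset.
    static final Charset WIN1250 = Charset.forName("windows-1250");

    // Encode a Java (Unicode) string the way String.getBytes(charset) does.
    // Unmappable characters are replaced by the charset's replacement byte.
    static byte[] encode(String s) {
        return s.getBytes(WIN1250);
    }

    public static void main(String[] args) {
        // U+00F5 is what byte 0xF5 becomes when decoded as ISO-8859-1.
        // WIN1250 does not contain it, so the encoder substitutes '?' (0x3F).
        byte[] latin1Style = encode("\u00F5r\u00FClt");
        System.out.printf("U+00F5 -> 0x%02X%n", latin1Style[0] & 0xFF);

        // U+0151 is the character that WIN1250 defines at 0xF5.
        byte[] unicodeStyle = encode("\u0151r\u00FClt");
        System.out.printf("U+0151 -> 0x%02X%n", unicodeStyle[0] & 0xFF);
    }
}
```

So a string built from WIN1250 bytes with an ISO-8859-1 default
encoding loses its first character as soon as any WIN1250 conversion
is applied to it.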

>> Your german test will work fine. Why? Because actually there is no
>> translation. Every character in your german test has the same
>> Unicode and ASCII code.

r> Actually, all of them. I had to change the test string to the
r> correct Unicode representation. Then it works fine.

r> This is not a bug, but a feature. The DBMS does not provide the
r> column encoding within the XSQLVAR structure. Instead, it tries to
r> convert characters from the column charset into the connection
r> charset internally. If it is not able to do so, you get an exception
r> ("Cannot transliterate ..."). If you have a connection with WIN1250
r> you can store data into WIN1250, UNICODE_FSS and NONE character
r> columns, and read from WIN1250 and UNICODE_FSS. There might be some
r> other compatible encodings, but I'm not aware of them.

Are you sure it is a feature? What about multilanguage applications?
The problem is not that the DBMS tries to convert the incoming ASCII
characters (8-bit characters) to its own internal representation. The
problem is that the JDBC driver makes an additional conversion that I
can't switch off. I cannot pass an already 8-bit-character string to
the driver. I can only use getBytes and setBytes, but that is ugly in
the code.
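
For reference, the workaround I mean looks roughly like this. It is a
sketch only: the class and method names are made up, and it assumes the
driver accepts setBytes on a character column. The idea is to turn
each 8-bit char back into one byte with ISO-8859-1 and hand the raw
bytes over, so that no charset translation touches them:

```java
import java.nio.charset.StandardCharsets;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustration only: hypothetical helper, not part of the driver.
class RawStringBinder {
    // ISO-8859-1 maps U+0000..U+00FF to the identical byte values, so a
    // string holding one 8-bit character per char round-trips unchanged.
    // Any char above U+00FF would be replaced by '?'.
    static byte[] rawBytes(String eightBit) {
        return eightBit.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Bind the raw bytes instead of calling setString, bypassing the
    // driver's string conversion (assumes setBytes works on the column).
    static void setRawString(PreparedStatement pst, int index, String eightBit)
            throws SQLException {
        pst.setBytes(index, rawBytes(eightBit));
    }
}
```

It works, but every setString call in the application has to be
replaced by such a helper, and the same again for reading.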

The nice thing about open source development is that I will make this
modification even if you don't ;) The question is how many people have
the same problem as I do. If many do, this would be a useful feature (I
mean no Java conversion in setString and getString).

r> The driver should not make too many conversions. It just implements
r> the DBMS API. The only thing that is required from the driver is to
r> provide data to the DBMS in the correct format and describe the
r> format it uses.

You're right, but the DBMS does not assume you're using Unicode
characters on the client side. Currently the JDBC driver assumes that,
but the DBMS API expects 8-bit characters on the client side, not
Unicode characters. lc_ctype only says how to interpret those
characters (for example for collation orders) and which characters are
valid.

As I mentioned, the problem remains that many systems use 8-bit
encodings, even in Java. In theory you're right to expect the correct
Unicode format, but in practice this may not be a perfect solution.

r> Please check the unit test, I would like to have your opinion on it.

Ok, I'll test it soon (at least today)