Subject Re: lc_ctype support
Author rrokytskyy
Hi,

--- In Firebird-Java@y..., Marczisovszky Daniel <marczi@d...> wrote:
> Yes, they work, as there is a bug in the driver, so it actually
> makes no translation:
>
> In FBField.java at line 396
>
> if (iscEncoding != null && !iscEncoding.equalsIgnoreCase("NONE"))
> FBConnectionHelper.getJavaEncoding(iscEncoding);
>
> should be replaced with this:
>
> if (iscEncoding != null && !iscEncoding.equalsIgnoreCase("NONE"))
> javaEncoding = FBConnectionHelper.getJavaEncoding(iscEncoding);
>
> otherwise javaEncoding will be always null, so no encoding will be
> used.

Thanks! I have corrected this bug. Also, I have corrected the unit
test to use unicode test strings.
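
To show why the missing assignment mattered, here is a small self-contained sketch. The mapping table below is an illustrative assumption of mine, not the real FBConnectionHelper table; only the if-statement mirrors the quoted driver code:

```java
import java.util.HashMap;
import java.util.Map;

public class EncodingLookup {
    // Illustrative subset of the lc_ctype -> Java charset mapping;
    // the real, larger table lives in FBConnectionHelper.getJavaEncoding.
    private static final Map<String, String> CHARSET_MAP = new HashMap<>();
    static {
        CHARSET_MAP.put("WIN1250", "Cp1250");
        CHARSET_MAP.put("WIN1252", "Cp1252");
        CHARSET_MAP.put("UNICODE_FSS", "UTF-8");
    }

    static String getJavaEncoding(String iscEncoding) {
        return CHARSET_MAP.get(iscEncoding.toUpperCase());
    }

    // The corrected logic: the lookup result must be *assigned*;
    // without "javaEncoding =", javaEncoding stays null and the
    // default charset is silently used for all conversions.
    static String resolveJavaEncoding(String iscEncoding) {
        String javaEncoding = null;
        if (iscEncoding != null && !iscEncoding.equalsIgnoreCase("NONE"))
            javaEncoding = getJavaEncoding(iscEncoding);
        return javaEncoding;
    }

    public static void main(String[] args) {
        System.out.println(resolveJavaEncoding("WIN1250")); // Cp1250
        System.out.println(resolveJavaEncoding("NONE"));    // null
    }
}
```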

> Before you correct this, please create a table with WIN1250. And try
> this:
>
> PreparedStatement pst = conn.prepareStatement("insert into honap
> (hosszunev) values (?)");

> pst.setString(1, "õrült");
> pst.executeUpdate();
>
> Note that in the second line the character is 0xF5, so it may be
> replaced with \u00F5 in the source code.
>
> Please also run this after you corrected the bug. You will see that
> the first character is replaced by question mark.

Please check the updated TestFBEncodings; I added your code as a
test case. In IB-Console and in InterClient it works fine and does
not replace the 0xF5 char with '?'. But you have to pass a correct
Unicode string to setString(String), not the string you obtain by
constructing it from a WIN1250 byte array with the default char
encoding.
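
The difference is easy to see with plain JDK charset conversions. This is my own demo, not driver code; "windows-1250"/"windows-1252" are the JDK names for the WIN1250/WIN1252 character sets:

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    // Decode a single byte under a named charset.
    static char decode(int b, String charsetName) {
        return new String(new byte[] {(byte) b}, Charset.forName(charsetName)).charAt(0);
    }

    // Encode a single char under a named charset;
    // an unmappable char becomes '?' (0x3F) in these single-byte charsets.
    static int encode(char c, String charsetName) {
        return String.valueOf(c).getBytes(Charset.forName(charsetName))[0] & 0xFF;
    }

    public static void main(String[] args) {
        // The same byte 0xF5 means different characters in different charsets:
        System.out.println(decode(0xF5, "windows-1250")); // U+0151, o with double acute
        System.out.println(decode(0xF5, "windows-1252")); // U+00F5, o with tilde
        // U+00F5 does not exist in WIN1250, so it encodes to '?' (0x3F):
        System.out.println(encode('\u00F5', "windows-1250"));
        // U+0151 does, and round-trips back to byte 0xF5:
        System.out.println(encode('\u0151', "windows-1250"));
    }
}
```

So a String built from WIN1250 bytes with the wrong (default) charset carries the wrong Unicode code points, and re-encoding it to WIN1250 yields question marks.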

> Your German test will work fine. Why? Because actually there is no
> translation. Every character in your German test has the same
> Unicode and ASCII code.

Ok, you're right, there was a bug in the test case as well.

> But your Ukrainian test will not work at all. First you'll get
> arithmetic exception. If you write only to the Unicode field you
> will see many of your characters are lost.

Actually, all of them were lost. I had to change the test string to
the correct unicode representation. Then it works fine.
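
By "correct unicode representation" I mean writing the test string with \u escapes, independent of the source-file encoding. A hedged sketch of my own (the sample word is illustrative, not the actual test string):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UkrainianDemo {
    // Round-trip through UTF-8, as a UNICODE_FSS column would store it.
    static String roundTripUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Round-trip through WIN1250, which has no Cyrillic letters at all.
    static String throughCp1250(String s) {
        Charset cp1250 = Charset.forName("windows-1250");
        return new String(s.getBytes(cp1250), cp1250);
    }

    public static void main(String[] args) {
        // "Pryvit" (hello) written as unicode escapes:
        String s = "\u041F\u0440\u0438\u0432\u0456\u0442";
        System.out.println(roundTripUtf8(s).equals(s)); // true: nothing lost
        // Every Cyrillic char is unmappable in WIN1250 and becomes '?':
        System.out.println(throughCp1250(s));           // ??????
    }
}
```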

> There is also one more bug, that I realized when I saw your encoding
> test. What happens, when there are different fields with different
> encodings in the DBMS? Because the driver uses the encoding derived
> from lc_ctype of the *connection* not encoding from the field. This
> means if you use WIN1250 for a connection then you will be able to
> write only to a field that has the WIN1250 character set. Actually,
> if you try to write to a field that has, let's say, WIN1252, then a
> few of your national characters will be replaced by '?', because the
> string will be encoded to a byte array using Cp1250 and not Cp1252.

This is not a bug, but a feature. The DBMS does not provide the
column encoding within the XSQLVAR structure. Instead, it tries to
convert characters from the column charset into the connection
charset internally. If it is not able to do so, you get an exception
("Cannot transliterate ..."). If you have a connection with WIN1250
you can store data into WIN1250, UNICODE_FSS and NONE character
columns, and read from WIN1250 and UNICODE_FSS. There might be some
other compatible encodings, but I'm not aware of them.
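
The server-side transliteration check has a rough JDK analogue. This sketch is my own illustration, not driver code: CharsetEncoder.canEncode asks whether every character of a string is representable in a target charset, which is essentially what the server must decide before storing:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class TransliterationCheck {
    // True if every char of s has a representation in the target charset,
    // roughly the condition the server's transliteration step requires.
    static boolean representable(String s, String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder();
        return enc.canEncode(s);
    }

    public static void main(String[] args) {
        String hungarian = "\u0151r\u00FClt"; // the test word from above
        System.out.println(representable(hungarian, "windows-1250")); // true
        System.out.println(representable(hungarian, "windows-1252")); // false: no U+0151
        System.out.println(representable(hungarian, "UTF-8"));        // true, like UNICODE_FSS
    }
}
```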

> I know this is a very complex and complicated issue. But I think
> the examples that I showed in the last two hours make clear that
> automatic character encoding may only be an extra feature that
> the driver supports, but one may want to switch it off completely;
> otherwise you won't be able to handle strings that are already
> in their ASCII format and won't be able to handle tables that
> contain fields with different character sets.

The driver should not make too many conversions. It just implements
the DBMS API. The only thing required from the driver is to provide
data to the DBMS in the correct format and to describe the format it
uses.

Please check the unit test; I would like to have your opinion on it.

Best regards,
Roman Rokytskyy