firebird-java - Re: lc

Subject	Re: lc_ctype support
Author	rrokytskyy
Post date	2002-04-09T22:45:23Z

> r> Why do you need to pass 0xF5 to driver? You have to pass 0x151
> r> that is obtained by
>
> r> unicodeStr = new String(win1250Str.getBytes(), "Cp1250");
>
> Because this requires character conversions all over the source
> code, moreover I can not pass 0xF5 to a column with WIN1250
> encoding where the lc_ctype for the connection is let's say
> WIN1252. Different character sets in Firebird is quite useful for
> developing multilanguage applications.

Sorry, but I do not really see the reason for using a WIN1252
connection to store WIN1250-encoded data. Your example with
differently encoded columns might be such a case, but further down
you will find some examples of how to solve this with the current
code.

> r> Sorry, I do not understand you. Firebird does perform the
> r> conversion from the encoding of your connection to encoding of
> r> the column. In my test case for Ukrainian, I do pass string to
> r> DBMS in WIN1251 encoding and it is correctly converted by the
> r> DBMS to UNICODE_FSS.
>
> Yes, because when you write a Unicode string to a stream it is
> UTF-8 encoded, but that is done by Java, not the DBMS. My experience
> is that Firebird stores exactly the bytes you write to it.

Wrong. I pass the byte stream in the encoding specified for the
connection. If that is UNICODE_FSS, it is a UTF-8 stream, but if it
is WIN1251, I pass 8-bit characters in WIN1251 encoding. Firebird
converts from the connection encoding to the column encoding
internally, and we have no control over that.
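
To make the difference concrete, here is a small sketch (not driver
code; the class and method names are made up for illustration) of the
bytes produced for one Ukrainian letter in each case. It assumes the
JVM ships the windows-1251 charset, which full JDKs normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConnectionBytesDemo {
    // Render a byte array as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode a string in the given charset and return the hex dump.
    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String yi = "\u0457"; // CYRILLIC SMALL LETTER YI
        // One 8-bit character for a WIN1251-style connection.
        System.out.println(encodeHex(yi, Charset.forName("windows-1251")));
        // A multi-byte sequence for a UTF-8 stream.
        System.out.println(encodeHex(yi, StandardCharsets.UTF_8));
    }
}
```

The same Java string yields different byte streams depending only on
the encoding chosen for the connection.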

> This is true, but I'm speaking about those cases where I don't want
> such a server side translation. I pass the strings exactly in the
> format that is required for the given column, and with exception of
> UNICODE_FSS that is a set of 8-bit characters, not Unicode
> characters.

This is not possible if a character set is specified for the column.
This is a limitation of the API, not of the driver. Even if you do
not add isc_dpb_lc_ctype to the DPB, that simply means you use NONE.
If you then try to store data in a column with a specified character
set, you will get an exception, because Firebird has no hint about
how to translate the characters. You cannot change this behaviour of
Firebird.

> r> Not specifying lc_ctype in DPB is equal to NONE. If you set NONE
> r> as the charset, you will be able to read data in WIN1250
> r> columns, but not write.
>
> This is a quite serious issue for me. That is my problem.

I tried specifying lc_ctype WIN1252 and removing all conversions.
The problem is introduced automatically by the JVM in
str.getBytes(): the character you want stored as 0xF5 is replaced by
'?' simply because it does not exist in the JVM default encoding
(Cp1252). Only passing the encoding explicitly to getBytes(...)
preserves the 0xF5 byte.
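
A minimal sketch of that effect, using U+0151 (the character that is
0xF5 in Cp1250). The class and method names are only illustrative,
and it assumes the JVM provides the windows-1250 charset. The charset
is named explicitly in both calls so the result does not depend on
the machine's default encoding.

```java
import java.nio.charset.Charset;

public class GetBytesDemo {
    // Encode a string with an explicitly named charset.
    // Characters the charset cannot represent are replaced by the
    // charset's replacement byte, which is '?' for these code pages.
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String s = "\u0151"; // LATIN SMALL LETTER O WITH DOUBLE ACUTE

        byte[] asCp1252 = encode(s, "windows-1252"); // not representable
        byte[] asCp1250 = encode(s, "windows-1250"); // representable

        System.out.printf("windows-1252: 0x%02X%n", asCp1252[0] & 0xFF);
        System.out.printf("windows-1250: 0x%02X%n", asCp1250[0] & 0xFF);
    }
}
```

Calling getBytes() without an argument behaves like the first call
whenever the JVM default encoding is Cp1252.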

> Encoding is not unknown, but in many cases that is ISO-8859-1.

It is known, and it depends on your OS regional settings. If I set
the regional settings on my Win2k machine to "Ukraine", my default
encoding is Cp1251; if I set them to "Germany/Dictionary collation",
my default encoding is Cp1252. I have not tried Hungarian, but I
expect it to be Cp1250.
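
You can check what your own JVM uses with a few lines like the
following (an illustrative sketch; the class name is made up). Note
that the value is whatever the JVM reports on that machine, so the
output will differ between systems, and recent Java versions (18 and
later) default to UTF-8 regardless of regional settings.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {
    // Name of the charset used by String.getBytes() with no argument.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // The system property the default has traditionally been derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```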

> No it won't. I can not use proper UPPER and ORDER BY with
> UNICODE_FSS. Of course if it works, blame on me, this argue has no
> meaning, but otherwise...

It works. I have checked this in TestFBEncodings, where I use
lc_ctype=UNICODE_FSS and store data simultaneously into WIN1250,
WIN1251 and WIN1252 columns. The data are stored and retrieved
correctly and are compatible with InterClient.
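
For reference, a rough sketch of how the connection properties for
such a setup might be built. This is not the code from
TestFBEncodings; the class name, user, and password are placeholders,
and it assumes the driver accepts "lc_ctype" as a connection property
name. No connection is opened here, so it runs without a database.

```java
import java.util.Properties;

public class LcCtypeProps {
    // Build JDBC connection properties with the given connection charset.
    // The user name and password are placeholders, not real credentials.
    static Properties connectionProps(String lcCtype) {
        Properties props = new Properties();
        props.setProperty("user", "SYSDBA");
        props.setProperty("password", "masterkey");
        props.setProperty("lc_ctype", lcCtype);
        return props;
    }

    public static void main(String[] args) {
        Properties props = connectionProps("UNICODE_FSS");
        // These properties would be passed to
        // DriverManager.getConnection(url, props) with your own JDBC URL.
        System.out.println("lc_ctype = " + props.getProperty("lc_ctype"));
    }
}
```

With the connection in UNICODE_FSS, the strings are written as
ordinary Java strings and the server converts them to each column's
character set.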

Best regards,
Roman Rokytskyy