Subject: Re: lc_ctype support
Author rrokytskyy
> Yes, I understand this, but you should also understand that we're
> speaking about TWO encodings. When lc_ctype is WIN1250 I can send
> 0xF5 to the driver, when lc_ctype is different (e.g. NONE, or
> WIN1252) I can not. Problem is there is no way to pass 0xF5 *to the
> JDBC driver*, because it assumes my string is encoded (but this is
> a different encoding from lc_ctype) using Cp1250.

Why do you need to pass 0xF5 to the driver? You have to pass 0x151,
which is obtained by

unicodeStr = new String(win1250Str.getBytes(), "Cp1250");

where win1250Str contains 0xF5. The driver assumes that your string
is _unicode_. If during the conversion from Unicode to WIN1250 0xF5
is replaced by '?', well... that was a wrong string.
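As a sketch, the round-trip can be checked directly (assuming the JVM
supports the Cp1250 charset, which standard JVMs do):

```java
// Sketch: recovering the intended Unicode character from a WIN1250 byte.
public class Cp1250Demo {
    public static void main(String[] args) throws Exception {
        byte[] win1250Bytes = { (byte) 0xF5 };            // 'ő' in WIN1250
        String unicodeStr = new String(win1250Bytes, "Cp1250");
        // 0xF5 in Cp1250 maps to U+0151 (LATIN SMALL LETTER O WITH DOUBLE ACUTE)
        System.out.println(Integer.toHexString(unicodeStr.charAt(0))); // prints 151
    }
}
```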

> Java encoding convert Unicode characters to 8 bit characters, but
> lc_ctype (thus encoding specified for Firebird) *does not make* any
> conversion. That is an interpretation, not a conversion. When I set
> lc_ctype to WIN1250, I specify ONLY what characters are allowed in a
> 8-bit byte array, and how to order the strings with those
> characters.
>
> That is the difference.

Sorry, I do not understand you. Firebird does perform the conversion
from the encoding of your connection to the encoding of the column.
In my test case for Ukrainian, I pass the string to the DBMS in
WIN1251 encoding and it is correctly converted by the DBMS to
UNICODE_FSS.

> Ok, but how can I pass a correct unicode string, if the encoding of
> a column is different from the connection lc_ctype.

Encoding of what? A Unicode string has no encoding, or am I wrong? In
a correct unicode string, all national characters have the codes that
correspond to them in the unicode table. If you have 0xF5 in a
unicode string, that is not 0x151 (even if you mean it) and you can
hardly expect the Unicode <-> national charset conversion to work
correctly.

Connection encoding is the encoding in which you will be passing the
data. The DBMS takes all the responsibility for storing data in the
encoding specified for the column. The driver just needs to provide
data in the encoding specified for the connection.
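Conceptually (this is a sketch, not the driver's actual code), the
driver's side of that contract is just an encode step from the
Unicode string to the connection charset:

```java
// Sketch: the driver's job is to encode the Java (Unicode) string into the
// connection charset before sending it; the server converts further as needed.
public class WireEncodingDemo {
    public static void main(String[] args) throws Exception {
        String unicodeStr = "\u0151";                      // ő as a proper Unicode string
        byte[] wireBytes = unicodeStr.getBytes("Cp1250");  // connection charset WIN1250
        System.out.println(wireBytes.length);              // prints 1
        System.out.println(wireBytes[0] & 0xFF);           // prints 245 (0xF5)
    }
}
```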

> What I want to make clear is that lc_ctype is _not_ an encoding.
> That is only a method to make UPPER and ORDER BY work, but that is
> not an encoding.

Where did you read this?

API Guide says (page 47):

"isc_dpb_lc_ctype String specifying the character set to be
utilized".

Language Reference (page 277):

"A character set defines the symbols that can be entered as text in a
column, and it also defines the maximum number of bytes of storage
necessary to represent each symbol.... Each character set also has an
implicit collation order that specifies how its symbols are sorted
and ordered."

So, lc_ctype _is_ the character set of the client.

Language Reference (page 285):

"SET NAMES specifies the character set the server should use when
translating data from the database to the client application.
Similarly, when the client sends data to the database, the server
translates the data from the client's character set to the database's
default character set (or the character set for an individual column
if it differs from the database's default character set)."

SET NAMES sets the isc_dpb_lc_ctype for the connection. From this
citation you can see that the server does perform translation from
the charset of the column to the client's charset.

> Encoding converts ASCII characters to Unicode characters and vice
> versa. If you send 0xF5 to Firebird, it will store 0xF5 in the
> database.

Not always (try defining a column with the ASCII charset, codes
0..127, and writing 0xF5 to it). It will accept it only if 0xF5 is
allowed in the character set of the connection (NONE, UNICODE_FSS,
WIN1251, etc.). But it will try to convert it according to the
charset of the database or column and throw an exception if it fails.
Therefore you cannot store data through a connection with the NONE
charset into a WIN1250 column, simply because Firebird has no hint
how the data you supplied must be converted into the WIN1250 charset.
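A Java-side analogue (not the server's actual code path) shows the
same principle: an encoder for a narrow charset simply cannot
represent the character, rather than silently storing the byte:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Analogy only: like an ASCII column, the US-ASCII encoder rejects 'ő',
// while the windows-1250 encoder can represent it.
public class StrictEncodeDemo {
    public static void main(String[] args) {
        CharsetEncoder ascii = Charset.forName("US-ASCII").newEncoder();
        System.out.println(ascii.canEncode('\u0151'));   // prints false
        CharsetEncoder win1250 = Charset.forName("windows-1250").newEncoder();
        System.out.println(win1250.canEncode('\u0151')); // prints true
    }
}
```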

> Maybe, if you don't set lc_ctype it will say that character is not
> valid, but it won't convert it.

Not specifying lc_ctype in the DPB is equivalent to NONE. If you set
NONE as the charset, you will be able to read data in WIN1250
columns, but not write it.

> I think now it is clear that the main problem is I can't pass 0xF5
> to the JDBC driver, what you should see is that in many cases there
> is no way to create a "correct" unicode string, because original
> encoding is not known.

If you do not know the encoding, how can you use the data in a
database? Java does not have an "unknown" encoding either; it has a
"default" one. What prevents you from using Cp1250 as the default
encoding for your JVM?
Then you are sure that this is true:

new String(new byte[]{(byte)0xf5}).charAt(0) == 0x151
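That expression holds only when the JVM default charset really is
Cp1250; making the charset explicit removes the dependence on JVM
configuration entirely (a sketch):

```java
public class DefaultCharsetDemo {
    public static void main(String[] args) throws Exception {
        // Implicit: the result depends on the JVM default charset
        char implicit = new String(new byte[]{(byte) 0xF5}).charAt(0);
        // Explicit: always yields U+0151, regardless of JVM defaults
        char explicit = new String(new byte[]{(byte) 0xF5}, "Cp1250").charAt(0);
        System.out.println((int) explicit == 0x151); // prints true
        System.out.println((int) implicit == 0x151); // true only with a Cp1250 default
    }
}
```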

> Moreover there is absolutely no hope of writing data to a column
> with a different character set with the current system.

Should there be any? In this way you might corrupt data in the
database. Connect with a correct charset (UNICODE_FSS, for example),
provide correct data (UTF-8, for example) and Firebird will do the
rest.
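For example (a sketch), a correct Unicode string has a well-defined
UTF-8 form that a UNICODE_FSS connection can carry:

```java
public class Utf8Demo {
    public static void main(String[] args) throws Exception {
        String s = "\u0151";                  // ő as a correct Unicode string
        byte[] utf8 = s.getBytes("UTF8");     // bytes for a UNICODE_FSS connection
        // U+0151 encodes as the two-byte UTF-8 sequence C5 91
        System.out.printf("%02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF);
    }
}
```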

> +1 vote to add an option to disable character set conversion. :)

-1 vote not to add such option.

But sure, you can add this feature to your driver as a JVM option
(please do not add a constant to GDS.java, because this is an API).

Best regards,
Roman Rokytskyy