Subject: Re: lc_ctype support
Author rrokytskyy
> I also agree with Roman :))) Although I wanted to show that there
> are problems that simply can not be solved with current solution.

:) Ok, but, as usual, I am trying to find a way to solve that very
problem with the current solution. lc_ctype=UNICODE_FSS works, so I
need more real-world examples. :)

> r> If we agree, the client-side encoding is always UNICODE_FSS
> r> (which is not too bad I think), then we really do not need all
> r> this conversion.
> r> Firebird will do (or is supposed to do) this automatically. But
> r> we then have to require people to set at least the default
> r> database encoding. We just pass the result of str.getBytes
> r> ("UTF8") (not str.getBytes() because this will not be a unicode
> r> stream) to engine.
> r> And this is the responsibility of the engine to store data
> r> according to database/column definition.
>
> I'll be very happy if that works!!!!

Well, check my latest unit test version. :)
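To illustrate the quoted point about str.getBytes("UTF8") versus
str.getBytes(), here is a minimal, self-contained sketch (the 'ä'
literal is just an arbitrary example character, not from the driver):

```java
import java.io.UnsupportedEncodingException;

public class Utf8Bytes {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00e4"; // 'ä', LATIN SMALL LETTER A WITH DIAERESIS

        // Explicit UTF-8: always the two bytes 0xC3 0xA4,
        // regardless of the platform the JVM runs on.
        byte[] utf8 = s.getBytes("UTF8");
        System.out.println(utf8.length); // 2

        // No argument: uses the JVM default encoding, so the result
        // differs between machines (1 byte on Cp1252, 2 on UTF-8, ...).
        byte[] platform = s.getBytes();
        System.out.println(platform.length);
    }
}
```

Only the explicit form produces a byte stream the engine can reliably
interpret as UNICODE_FSS.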

> r> Therefore we need encodings.
>
> I've never said we don't need them. They are really important, and
> your example is a really good one. In many cases I want to switch
> them off, because *many* Java subsystems do not care about the
> default encoding; they use ISO-8859-1.

The default encoding is not always ISO-8859-1. In my case it is
Cp1252 when my regional settings are set to Germany and Cp1251 when
they are set to Ukraine.
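You can see which default your own JVM picked with a one-liner (the
output is machine-dependent, of course):

```java
public class DefaultEncoding {
    public static void main(String[] args) {
        // The JVM default encoding follows the OS regional settings,
        // which is why it is Cp1252 on one machine and Cp1251 on
        // another. Any String.getBytes() call without an explicit
        // charset silently uses this value.
        System.out.println(System.getProperty("file.encoding"));
    }
}
```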

> r> The main idea in the discussion is whether or not to provide an
> r> option to skip the conversion
> r> national_unicode -> correct_unicode -> connection_encoding -> byte[]
> r> and do national_unicode -> byte[] directly, if we assume that
> r> national_unicode and connection_encoding are the same.
>
> Actually now the driver does the following:
> when you read from the database:
> byte[] -> correct_unicode

Right. Note that byte[] contains data in the encoding specified by
lc_ctype.

> when you write to the database:
> correct_unicode -> national_unicode -> byte[]

Wrong. Only correct_unicode -> byte[], and byte[] contains data in
the encoding specified by lc_ctype. "national_unicode" involves the
default JVM encoding; it was used before, but not now that lc_ctype
support is present.
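Both directions can be sketched as a single explicit-charset
conversion. The lc_ctype-to-Java-charset mapping below is a
hypothetical stand-in for whatever table the driver actually keeps:

```java
import java.io.UnsupportedEncodingException;

public class LcCtypeConversion {
    // Hypothetical mapping from a Firebird lc_ctype name to a Java
    // charset name (illustration only, not the driver's real table).
    static String javaCharset(String lcCtype) {
        if ("WIN1251".equals(lcCtype)) return "Cp1251";
        if ("WIN1252".equals(lcCtype)) return "Cp1252";
        if ("UNICODE_FSS".equals(lcCtype)) return "UTF8";
        return "ISO-8859-1"; // fallback, an assumption
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String lcCtype = "WIN1251";

        // write: correct_unicode -> byte[] in the lc_ctype encoding
        String unicode = "\u0434\u0430"; // Cyrillic "da"
        byte[] wire = unicode.getBytes(javaCharset(lcCtype));

        // read: byte[] -> correct_unicode using the same charset
        String back = new String(wire, javaCharset(lcCtype));
        System.out.println(back.equals(unicode)); // true
    }
}
```

The key point is that the JVM default encoding never enters the
picture: the charset is always derived from lc_ctype.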

> but it requires you to pass correct_unicode. correct to national
> conversion is done by using lc_ctype, but probably that is not good
> when you need to write a column with different character set from
> lc_ctype.

It is definitely a bad idea to have lc_ctype=WIN1252 and write to a
WIN1251 column. You will get an exception from Firebird. But if you
specify lc_ctype=UNICODE_FSS, you are able to write to WIN1251 and
WIN1252 columns simultaneously.
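A connection set up this way might look as follows. The URL,
credentials, and the "lc_ctype" property key are assumptions based on
the Properties-in-connect() mechanism mentioned later in this mail,
not a verified API reference:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class UnicodeFssConnection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "sysdba");         // placeholder
        props.setProperty("password", "masterkey");  // placeholder
        // With UNICODE_FSS the client always talks a Unicode stream
        // to the engine, so WIN1251 and WIN1252 columns can both be
        // written through the same connection.
        props.setProperty("lc_ctype", "UNICODE_FSS");

        Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql:localhost/3050:/db/test.fdb", props);
        try {
            // ... work with mixed-charset columns ...
        } finally {
            con.close();
        }
    }
}
```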

> I can ask again what to do with columns with different character
> sets.

Use lc_ctype=UNICODE_FSS. :)

> Everything can be implemented ;) Moreover, only one boolean field
> should be added to FBManagedConnection, and when it is set, NONE
> should be returned by getIscEncoding:
>
> public String getIscEncoding() {
>     // *****
>     if (pleaseDontConvertMyAlreadyConvertedString)
>         return "NONE";
>     // *****
>
>     try {
>         String result = cri.getStringProperty(GDS.isc_dpb_lc_ctype);
>         if (result == null) result = "NONE";
>         return result;
>     } catch (NullPointerException ex) {
>         return "NONE";
>     }
> }

I had problems with such a solution. However, instead of adding this
to FBManagedConnection, I just commented out the code in FBField. The
0xF5 character was then converted to '?' by the _JVM_ in the
str.getBytes() call. I repeat again: my JVM default encoding is
Cp1252. Do you want such behaviour? I doubt it.
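The silent '?' replacement is easy to reproduce. The Cyrillic letter
below is just an arbitrary character with no Cp1252 mapping, chosen
for the demonstration:

```java
import java.io.UnsupportedEncodingException;

public class LossyGetBytes {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // A Cyrillic character has no mapping in Cp1252, so getBytes()
        // silently substitutes '?' (0x3F) instead of failing.
        String s = "\u0457"; // CYRILLIC SMALL LETTER YI
        byte[] asCp1252 = s.getBytes("Cp1252");
        System.out.println((char) asCp1252[0]); // '?'

        // The same character survives a UTF-8 conversion intact.
        byte[] asUtf8 = s.getBytes("UTF8");
        System.out.println(asUtf8.length); // 2
    }
}
```

No exception is thrown, so the data corruption goes unnoticed until
someone reads the column back.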

> That's all. The option can be passed with the Properties in the
> connect() in FBDriver.

This might work in your case, but it definitely does not work in
mine. And the problem here is the uncertainty introduced by the JVM
default encoding, which cannot be controlled.

Best regards,
Roman Rokytskyy