Subject Re: [Firebird-Java] Re: lc_ctype support
Author Marczisovszky Daniel
>> I can't say I understand this argument very well or at all, however
>> I'm finding Roman's side more convincing.

r> Thanks. :)

I also agree with Roman :))) Although I wanted to show that there are
problems that simply cannot be solved with the current solution.

r> If we agree, the client-side encoding is always UNICODE_FSS (which is
r> not too bad I think), then we really do not need all this conversion.
r> Firebird will do (or is supposed to do) this automatically. But we
r> then have to require people to set at least the default database
r> encoding. We just pass the result of str.getBytes("UTF8") (not
r> str.getBytes() because this will not be a unicode stream) to engine.
r> And this is the responsibility of the engine to store data according
r> to database/column definition.

I'll be very happy if that works!!!!
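As a sketch of what Roman describes above (always handing the engine a UTF-8
stream instead of platform-default bytes), assuming nothing about the
engine side:

```java
import java.io.UnsupportedEncodingException;

public class Utf8Param {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u0151"; // Hungarian o with double acute (U+0151)

        // Well-defined unicode stream, same on every machine: 0xC5 0x91.
        byte[] utf8 = s.getBytes("UTF8");

        // Depends on the default JVM encoding -- may even become '?'.
        byte[] platform = s.getBytes();

        System.out.println(utf8.length); // 2
        System.out.println(platform.length);
    }
}
```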


>> A new, really simple, example might help me.

r> Let's assume you have a database with default charset NONE and all
r> tables have charset NONE. You have connection encoding NONE.
r> Everything you store in the database is the same you provided.
r> str.getBytes() returns byte[] in default JVM encoding. Data you
r> stored are the same as data you retrieved.

r> But now another client with different default JVM encoding connects.
r> And instead of seeing your data, it will see something different
r> (unicode representation of the data according to the default JVM
r> encoding on that machine). So, this is not ok.

r> Therefore we need encodings.

I've never said we don't need them. They are really important, and your
example is a really good one. In many cases I want to switch them off,
because *many* Java subsystems do not care about the default
encoding; they use ISO-8859-1 instead.
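Roman's two-client scenario is easy to reproduce in plain Java; here
ISO-8859-2 and ISO-8859-1 stand in for the two machines' default JVM
encodings (the charsets are only illustrative):

```java
import java.nio.charset.Charset;

public class CharsetMismatch {
    public static void main(String[] args) {
        // Client A stores Hungarian o-double-acute (U+0151) using its
        // default encoding ISO-8859-2: one byte, 0xF5.
        byte[] stored = "\u0151".getBytes(Charset.forName("ISO-8859-2"));

        // Client B, whose default encoding is ISO-8859-1, reads the same
        // bytes back and sees o-tilde (U+00F5) instead.
        String seen = new String(stored, Charset.forName("ISO-8859-1"));

        System.out.println(seen.equals("\u00F5")); // true -- data looks different
    }
}
```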

r> The main idea in the discussion is whether to provide or not an
r> option not to convert using this way: national_unicode->
r> correct_unicode->connection_encoding->byte[], but national_unicode->
r> byte[] directly if we assume that national_unicode and
r> connection_encoding are the same.

Actually now the driver does the following:
when you read from the database:
byte[] -> correct_unicode

when you write to the database:
correct_unicode -> national_unicode -> byte[]

but it requires you to pass correct_unicode. The correct-to-national
conversion is done using lc_ctype, but that is probably not good
when you need to write to a column whose character set differs from
lc_ctype.

I can ask again what to do with columns with different character sets.
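A rough model of that write path, with encodeToNational standing in for
the driver's lc_ctype-based conversion (the helper name and the
lc_ctype-to-Java charset mapping are hypothetical):

```java
import java.io.UnsupportedEncodingException;

public class WritePath {
    // Hypothetical helper modelling the driver's write path:
    // correct_unicode -> national byte[] via the lc_ctype charset.
    static byte[] encodeToNational(String correctUnicode, String lcCtypeCharset)
            throws UnsupportedEncodingException {
        return correctUnicode.getBytes(lcCtypeCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assume lc_ctype ISO8859_2 maps to Java's "ISO-8859-2".
        byte[] wire = encodeToNational("\u0151", "ISO-8859-2");
        System.out.println(wire.length); // 1: the single-byte national form

        // A column declared with a *different* character set would need a
        // different target encoding -- exactly the open per-column question.
    }
}
```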

r> Here the "national_unicode" means that national symbols are using
r> positions from 0 to 255 (maybe) and not the unicode prescribed
r> positions even though the string is called unicode. You can construct such
r> strings simply by using new String(byte[] bytes) if the default JVM
r> encoding does not match the encoding of the bytes.
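Such a "national_unicode" string can be produced like this; the decoding
charset is made explicit here to simulate a JVM whose default encoding is
ISO-8859-1:

```java
import java.nio.charset.Charset;

public class NationalUnicode {
    public static void main(String[] args) {
        // 0xF5 is o-double-acute in the national charset ISO-8859-2.
        byte[] national = "\u0151".getBytes(Charset.forName("ISO-8859-2"));

        // new String(bytes) on an ISO-8859-1 JVM (simulated explicitly)
        // leaves the symbol on a 0..255 position instead of the
        // unicode-prescribed position U+0151.
        String nationalUnicode = new String(national, Charset.forName("ISO-8859-1"));

        System.out.println((int) nationalUnicode.charAt(0)); // 245 (0xF5), not 0x0151
    }
}
```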

r> My opinion is that this adds unnecessary complexity to the driver
r> even if this can be implemented (some kind of boolean switch).

Everything can be implemented ;) Moreover, only one boolean field
needs to be added to FBManagedConnection, and when it is set, NONE
should be returned by getIscEncoding:

public String getIscEncoding() {
    // *** added ***
    if (pleaseDontConvertMyAlreadyConvertedString)
        return "NONE";
    // *** end added ***

    try {
        String result = cri.getStringProperty(GDS.isc_dpb_lc_ctype);
        if (result == null) result = "NONE";
        return result;
    } catch (NullPointerException ex) {
        return "NONE";
    }
}

That's all. The option can be passed via the Properties argument of
connect() in FBDriver.
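For illustration, passing such an option through Properties might look
like this; the property name, credentials, and database path are made up,
since the actual key would be whatever FBDriver.connect() is taught to
recognize:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class ConnectExample {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "sysdba");
        props.setProperty("password", "masterkey");
        // Hypothetical switch corresponding to the boolean field above.
        props.setProperty("no_lc_ctype_conversion", "true");

        Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql:localhost/3050:/db/test.gdb", props);
        con.close();
    }
}
```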

best wishes,
Daniel