| Subject   | Re: [Firebird-Java] Re: lc_ctype support |
| --------- | ---------------------------------------- |
| Author    | Marczisovszky Daniel |
| Post date | 2002-04-09T20:15:43Z |
>> I can't say I understand this argument very well or at all, however
>> I'm finding Roman's side more convincing.

r> Thanks. :)
I also agree with Roman :))) Although I wanted to show that there are
problems that simply cannot be solved with the current solution.
r> If we agree that the client-side encoding is always UNICODE_FSS
r> (which is not too bad, I think), then we really do not need all this
r> conversion. Firebird will do (or is supposed to do) this
r> automatically. But we then have to require people to set at least
r> the default database encoding. We just pass the result of
r> str.getBytes("UTF8") (not str.getBytes(), because that will not be a
r> unicode stream) to the engine. And it is the responsibility of the
r> engine to store data according to the database/column definition.
I'll be very happy if that works!!!!
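Just to make that concrete, here is a tiny sketch (the class and
variable names are made up, only plain JDK calls) of the difference
between the two byte streams Roman mentions:

import java.io.UnsupportedEncodingException;

public class Utf8StreamSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String value = "\u0151\u0171";           // Hungarian o and u with double acute

        // what Roman suggests: always hand the engine a UTF-8 stream
        byte[] toEngine = value.getBytes("UTF8");

        // str.getBytes() depends on the default JVM encoding,
        // so it is NOT a unicode stream
        byte[] platformDependent = value.getBytes();

        System.out.println(toEngine.length + " vs " + platformDependent.length);
    }
}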
>> A new, really simple, example might help me.

r> Let's assume you have a database with default charset NONE and all
r> tables have charset NONE. You have connection encoding NONE.
r> Everything you store in the database is the same as what you
r> provided. str.getBytes() returns a byte[] in the default JVM
r> encoding. The data you stored is the same as the data you retrieve.
r> But now another client with a different default JVM encoding
r> connects. Instead of seeing your data, it will see something
r> different (a unicode representation of the data according to the
r> default JVM encoding on that machine). So this is not ok.
r> Therefore we need encodings.
I've never said we don't need them. They are really important, and your
example is a really good one. But in many cases I want to switch them
off, because *many* Java subsystems do not care about the default
encoding; they use ISO-8859-1.
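For illustration, the round trip in Roman's NONE-charset example can be
simulated with two explicit encodings standing in for the two machines'
default JVM encodings (Cp1250 and ISO-8859-1 are just an arbitrary pair
chosen for the sketch, and the class name is made up):

import java.io.UnsupportedEncodingException;

public class NoneCharsetSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // client 1 (pretend its default JVM encoding is Cp1250) stores:
        String original = "\u0151\u0171";
        byte[] storedInNoneColumn = original.getBytes("Cp1250");

        // client 2 (pretend its default JVM encoding is ISO-8859-1)
        // reads the very same bytes back:
        String whatClient2Sees = new String(storedInNoneColumn, "ISO-8859-1");

        System.out.println(original.equals(whatClient2Sees));   // false
    }
}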
r> The main idea in the discussion is whether or not to provide an
r> option not to convert this way: national_unicode ->
r> correct_unicode -> connection_encoding -> byte[], but to go
r> national_unicode -> byte[] directly if we assume that
r> national_unicode and connection_encoding are the same.
Actually, the driver currently does the following:

when you read from the database:
    byte[] -> correct_unicode

when you write to the database:
    correct_unicode -> national_unicode -> byte[]

but it requires you to pass correct_unicode. The correct-to-national
conversion is done using lc_ctype, but that is probably not good when
you need to write to a column whose character set differs from
lc_ctype.

I can ask again: what should be done with columns that have different
character sets?
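Very roughly, the two chains could be pictured like this, with an
explicit charset name standing in for lc_ctype (this is only an
illustration of the idea, not the actual driver code):

import java.io.UnsupportedEncodingException;

public class DriverChainsSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // read path:  byte[] -> correct_unicode (decoded with lc_ctype, say WIN1250)
        byte[] bytesFromEngine = { (byte) 0xF5 };
        String correctUnicode = new String(bytesFromEngine, "Cp1250");

        // write path: correct_unicode -> ... -> byte[] (encoded with lc_ctype again),
        // which goes wrong when the target column is not in the lc_ctype charset
        byte[] bytesToEngine = correctUnicode.getBytes("Cp1250");

        System.out.println(bytesToEngine[0] == bytesFromEngine[0]);   // true
    }
}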
r> Here "national_unicode" means that national symbols are using
r> positions from 0 to 255 (maybe) and not the positions prescribed by
r> unicode, even though the string is called unicode. You can construct
r> such strings simply by using new String(byte[] bytes) if the default
r> JVM encoding does not match the encoding of the bytes.
r> My opinion is that this adds unnecessary complexity to the driver
r> even if this can be implemented (some kind of boolean switch).
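(Just to illustrate the kind of string Roman means here, this sketch
simulates a mismatched default JVM encoding with an explicit pair of
charsets; Cp1250 and ISO-8859-1 are only an assumption:)

import java.io.UnsupportedEncodingException;

public class NationalUnicodeSketch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String real = "\u0151";               // "o with double acute", a real unicode position
        String national = new String(real.getBytes("Cp1250"), "ISO-8859-1");

        System.out.println((int) real.charAt(0));       // 337, the prescribed unicode position
        System.out.println((int) national.charAt(0));   // 245, squeezed into 0..255
    }
}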
Everything can be implemented ;) Moreover, only one boolean field would
have to be added to FBManagedConnection, and when it is set,
getIscEncoding should return NONE:
public String getIscEncoding() {
    // ***** the added lines: *****
    if (pleaseDontConvertMyAlreadyConvertedString)
        return "NONE";
    // *****
    try {
        String result = cri.getStringProperty(GDS.isc_dpb_lc_ctype);
        if (result == null) result = "NONE";
        return result;
    } catch(NullPointerException ex) {
        return "NONE";
    }
}
That's all. The option could be passed via the Properties given to
connect() in FBDriver.
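Something like this, where the property name
"no_client_side_conversion" is only a placeholder I made up, and the
URL, user and lc_ctype values are just examples:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnectSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.firebirdsql.jdbc.FBDriver");

        Properties props = new Properties();
        props.setProperty("user", "sysdba");
        props.setProperty("password", "masterkey");
        props.setProperty("lc_ctype", "WIN1250");
        // the proposed switch; the name is only a placeholder
        props.setProperty("no_client_side_conversion", "true");

        Connection con = DriverManager.getConnection(
            "jdbc:firebirdsql:localhost/3050:/db/employee.gdb", props);
        con.close();
    }
}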
best wishes,
Daniel