Subject: Re: Character Set question (practical & philosophical)
Author: John Craig
Hi Gents,

Sorry to make this so obscure. It is true I haven't done much thinking
about different charsets for different columns; you also have to
understand I'm a complete newbie when it comes to JayBird--I don't
know enough to know what to propose, perhaps! (Although I've dealt
with this same issue on another driver we work with.)

Roman's very close in his prior message:

1. The user should be able to ignore the DB's charset (or use a
generic setting) when connecting.

The essential first idea is that the user shouldn't be required to
specify a particular charset when connecting (it's a Java program
which works in UCS2, it shouldn't have to care, right?). Roman's got
the idea exactly when he says I don't want to connect with the DB's
charset; I just want to connect and not worry about whether this
client's DB is set up as DOS850, WIN1252, or UNICODE_FSS--the server
and driver can work that out among themselves. There could be some
kind of default or setting that allows the driver (or the server,
either one) to:
a) determine what the data's charset is and
b) take care of converting to and from Java's native UCS2.
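
To make (a) and (b) concrete, here is a minimal sketch in plain Java of the kind of conversion I mean. The class name CharsetBridge and the name table are my own illustration, not JayBird's actual mapping; in particular, the Java-side names I pair with DOS850 and WIN1252 are assumptions that would need checking against the driver and the JVM in use.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;

final class CharsetBridge {
    // Hypothetical mapping from Firebird charset names to Java charset names.
    // Illustrative only; a real driver would carry a complete, verified table.
    private static final Map<String, String> FB_TO_JAVA = Map.of(
            "DOS850", "IBM850",
            "WIN1252", "windows-1252",
            "UNICODE_FSS", "UTF-8",
            "ISO8859_1", "ISO-8859-1");

    // (a) Determine which Java charset corresponds to the data's charset.
    static Charset javaCharsetFor(String firebirdName) {
        String key = firebirdName.trim().toUpperCase(Locale.ROOT);
        String javaName = FB_TO_JAVA.get(key);
        if (javaName == null) {
            throw new IllegalArgumentException("No mapping for " + firebirdName);
        }
        return Charset.forName(javaName);
    }

    // (b) Convert the bytes stored in the DB to a Java String...
    static String decode(byte[] raw, String firebirdName) {
        return new String(raw, javaCharsetFor(firebirdName));
    }

    // ...and convert a Java String back to bytes in the DB's charset.
    static byte[] encode(String value, String firebirdName) {
        return value.getBytes(javaCharsetFor(firebirdName));
    }
}
```

The point of the sketch is only that the conversion happens once, in one place; whether that place is the driver or the server is a separate decision.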
Whether the driver or the server handles the conversion isn't really
the issue either. But it does seem to me that it should be one or the
other, not both, to avoid unnecessary overhead. (Converting to UTF8
and then to UCS2 seems a bit wasteful [although it's undoubtedly a
very fast conversion] and having both the server and the JDBC driver
convert certainly seems like a redundancy you'd want to eliminate.)

2. The JDBC driver's client class can determine the data's original
charset.

The essential second idea is that the user should be able to determine
the data's original charset (table or column, as applicable). If life
were perfect ;-), the driver would provide this information, but since
we're working within a standard here, it seems perfectly fine to read
the DB's native charset from the system table. Reading it by a direct
query to the DB in the case of a column seems a bit clumsy, but given
how unusual this need is, that's probably the best solution (since
otherwise you have to provide metadata calls that aren't JDBC standard
anyway).
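
For what it's worth, the direct query I have in mind would look roughly like the sketch below. The RDB$ table and column names are as I understand the Firebird system tables and should be verified against the server version in use; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ColumnCharsetLookup {
    // Column -> its domain in RDB$FIELDS -> the charset name for that domain.
    // System table names are an assumption to check against the server docs.
    static final String COLUMN_CHARSET_SQL =
            "SELECT cs.RDB$CHARACTER_SET_NAME "
          + "FROM RDB$RELATION_FIELDS rf "
          + "JOIN RDB$FIELDS f ON f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE "
          + "JOIN RDB$CHARACTER_SETS cs "
          + "ON cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID "
          + "WHERE rf.RDB$RELATION_NAME = ? AND rf.RDB$FIELD_NAME = ?";

    // Returns the charset name (e.g. "WIN1252"), or null if nothing matches.
    // Unquoted identifiers are stored in upper case, so pass "MYTABLE", "MYCOL".
    static String columnCharset(Connection con, String table, String column)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(COLUMN_CHARSET_SQL)) {
            ps.setString(1, table);
            ps.setString(2, column);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                String name = rs.getString(1);
                // System table names come back space-padded, so trim them.
                return name == null ? null : name.trim();
            }
        }
    }
}
```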

Basically, that's it. And it sounds like the work that's happening now
will make this fairly easy.

Now, there's one more thing I want to be clear on: as I envision this,
the conversions to and from Java String values (that is, between the
DB's native charset and Java's native UCS2) should be handled without
the client class having to include anything to make this happen. For
instance, if I do a getString() call on a ResultSet, it should give me
a Java String (in UCS2, just like any normal Java String) that's
equivalent to the bytes in the cell on the DB (speaking within the
constraints of Unicode's definition, naturally). When I do an update
to a cell, I should be able to specify the new value as a Java String
(I shouldn't have to convert to bytes or do anything like that to make
sure it gets back to the DB with the bytes that are equivalent to my
Java String encoded in the appropriate bytes for the charset of the
data on the DB).

In my odd-ball case (weird non-standard encodings), what I'd do is:
a) Take the UCS2 String from the getString() call,
b) Convert back to bytes to match the byte sequence on the DB (which
I can do because I queried the DB to discover the data's native
charset),
c) Manipulate as needed, and then
d) Convert back to UCS2 for the updateString() call.
That is, the driver doesn't need to be able to take a byte array and
write it to a varchar cell on the DB without any conversion.
(Attempting to put a byte[] in a varchar cell is invalid on the other
JDBC drivers I've used--and that's okay; I wouldn't propose that
JayBird/Firebird behave differently in that regard.)
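
Steps (b) through (d) can be sketched in plain Java as follows. This assumes the charset I pass in is the same one the driver used when it built the String, and that every byte I produce is mappable in that charset; otherwise the round trip is not lossless. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.function.UnaryOperator;

final class ByteLevelEdit {
    // fromDriver is the String returned by getString(), step (a).
    static String editAsBytes(String fromDriver, Charset dbCharset,
                              UnaryOperator<byte[]> edit) {
        // (b) Recover the byte sequence as it is stored on the DB.
        byte[] raw = fromDriver.getBytes(dbCharset);
        // (c) Manipulate the bytes as needed.
        byte[] changed = edit.apply(raw);
        // (d) Convert back to a Java String, ready for updateString().
        return new String(changed, dbCharset);
    }
}
```

With an updatable ResultSet rs, the result would then go straight into something like rs.updateString(col, ByteLevelEdit.editAsBytes(rs.getString(col), cs, edit)), so the driver never sees a byte[].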

I do want to be sure I'm understanding one thing about this INTL work:
I assume that the "new INTL" stuff is in process right now, right?

In any case, gentlemen, thanks for your time! I appreciate your
working to understand this rather complex and, admittedly, unusual
situation I'm faced with.

John