firebird-java - Re: [Firebird-Java] Re: Character Set question (practical & philosophical)

Subject	Re: [Firebird-Java] Re: Character Set question (practical & philosophical)
Author	Adriano dos Santos Fernandes
Post date	2005-01-25T01:51:58Z

John Craig wrote:

>
> Hi Gents,
>
> Sorry to make this so obscure. It is true I haven't done much thinking
> about different charsets for different columns; you also have to
> understand I'm a complete newbie when it comes to JayBird--I don't
> know enough to know what to propose, perhaps! (Although I've dealt
> with this same issue on another driver we work with.)
>
> Roman's very close in his prior message:
>
> 1. The user should be able to ignore the DB's charset (or use a
> generic setting) when connecting.
>
> The essential first idea is that the user shouldn't be required to
> specify a particular charset when connecting (it's a Java program
> which works in UCS2, it shouldn't have to care, right?). Roman's got
> the idea exactly when he says I don't want to connect with the DB's
> charset; I just want to connect and not worry about whether this
> client's DB is set up as DOS850, WIN1252, or UNICODE_FSS--the server
> and driver can work that out among themselves. There could be some
> kind of default or setting that allows the driver (or the server,
> either one) to:
> a) determine what the data's charset is and
> b) take care of converting to and from Java's native UCS2.

I missed this point. Java strings are UCS2, not UTF8.
So the new behaviour of connecting with NONE does exactly the trick.
Jaybird gets the string in its original charset (and knows what that charset is) and converts it to UCS2.
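
To make that conversion concrete, here is a minimal sketch. It is not Jaybird's actual code: the class name, the method names, and the mapping from Firebird charset names to Java charsets (WIN1252 to windows-1252, UNICODE_FSS to UTF-8) are assumptions for illustration. It only shows that once the original charset is known, the same character stored under two different charsets decodes to the same Java String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    // Hypothetical mapping from a Firebird charset name to a Java Charset.
    static Charset javaCharsetFor(String firebirdName) {
        switch (firebirdName) {
            case "WIN1252":
                return Charset.forName("windows-1252");
            case "UNICODE_FSS":
                return StandardCharsets.UTF_8; // assumed close enough for this sketch
            default:
                throw new IllegalArgumentException("unmapped charset: " + firebirdName);
        }
    }

    // Decode the raw column bytes using the charset the column was stored in.
    static String decode(byte[] raw, String firebirdName) {
        return new String(raw, javaCharsetFor(firebirdName));
    }

    public static void main(String[] args) {
        byte[] win = {(byte) 0xE9};              // e-acute in windows-1252
        byte[] utf = {(byte) 0xC3, (byte) 0xA9}; // e-acute in UTF-8
        // Different bytes on disk, same Java String after decoding.
        System.out.println(decode(win, "WIN1252").equals(decode(utf, "UNICODE_FSS"))); // prints true
    }
}
```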

Roman, once it is possible to connect with UTF16 or UCS2, is there any other reason to keep the charset package in JayBird?

> Whether the driver or the server handles the conversion isn't really
> the issue either. But, it does seem to me that it should be one or the
> other, not both; to avoid unnecessary overhead. (Converting to UTF8
> and then to UCS2 seems a bit wasteful [although it's undoubtedly a
> very fast conversion] and having both the server and the JDBC driver
> convert certainly seems like a redundancy you'd want to eliminate.)
>
> 2. The JDBC driver's client class can determine the data's original
> charset.
>
> The essential second idea is that the user should be able to determine
> the data's original charset (table or column, as applicable). If life
> were perfect ;-), the driver would provide this information, but since
> we're working within a standard here, it seems perfectly fine to read
> the DB's native charset from the system table. Reading it by a direct
> query to the DB in the case of a column seems a bit clumsy, but given
> how unusual this need is, that's probably the best solution (since
> otherwise you have to provide metadata calls that aren't JDBC standard
> anyway).
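
A sketch of the "direct query" John describes, assuming the usual Firebird system tables (RDB$DATABASE, RDB$RELATION_FIELDS, RDB$FIELDS, RDB$CHARACTER_SETS). The class and method names are hypothetical, and the caller is assumed to pass table and column names exactly as stored in the system tables (upper case for unquoted identifiers). This is ordinary JDBC, not a Jaybird-specific API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class CharsetLookup {
    // Default charset of the whole database.
    static final String DATABASE_CHARSET_SQL =
        "SELECT RDB$CHARACTER_SET_NAME FROM RDB$DATABASE";

    // Charset of one column: column -> its domain -> charset id -> charset name.
    static final String COLUMN_CHARSET_SQL =
        "SELECT cs.RDB$CHARACTER_SET_NAME "
      + "FROM RDB$RELATION_FIELDS rf "
      + "JOIN RDB$FIELDS f ON f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE "
      + "JOIN RDB$CHARACTER_SETS cs ON cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID "
      + "WHERE rf.RDB$RELATION_NAME = ? AND rf.RDB$FIELD_NAME = ?";

    // Returns the charset name for the column, or null if no row matches.
    static String columnCharset(Connection con, String table, String column)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(COLUMN_CHARSET_SQL)) {
            ps.setString(1, table);
            ps.setString(2, column);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                String name = rs.getString(1);
                // System-table names are fixed-width, so strip the padding.
                return name == null ? null : name.trim();
            }
        }
    }
}
```
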
>
> Basically, that's it. And it sounds like the work that's happening now
> will make this fairly easy.
>
> Now, there's one more thing I want to be clear on: as I envision this,
> the conversions to and from Java String values (that is, between the
> DB's native charset and Java's native UCS2) should be handled without
> the client class having to include anything to make this happen. For
> instance, if I do a getString() call on a ResultSet, it should give me
> a Java String (in UCS2, just like any normal Java String) that's
> equivalent to the bytes in the cell on the DB (speaking within the
> constraints of Unicode's definition, naturally). When I do an update
> to a cell, I should be able to specify the new value as a Java String
> (I shouldn't have to convert to bytes or do anything like that to make
> sure it gets back to the DB with the bytes that are equivalent to my
> Java String encoded in the appropriate bytes for the charset of the
> data on the DB).
>
> In my odd-ball case (weird non-standard encodings), what I'd do is:
> a) Take the UCS2 String from the getString() call,
> b) Convert back to bytes to match the byte sequence on
> the DB (which I can do because I queried the DB to discover the data's
> native charset),
> c) Manipulate as needed, and then
> d) Convert back to UCS2 for the updateString() call.
> That is, the driver doesn't need to be able to take a byte array and
> write it to a varchar cell on the DB without any conversion.
> (Attempting to put a byte[] in a varchar cell is invalid on the other
> JDBC drivers I've used--and that's okay; I wouldn't propose that
> JayBird/Firebird behave differently in that regard.)
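
Steps a) to d) above can be sketched in plain Java as follows. The class name, the `transform` helper, and the use of windows-1252 as the column's charset are assumptions for illustration; in real code the String would come from getString() and go back through updateString(). Note that the round trip only preserves the original bytes when the charset maps every byte involved to a character and back.

```java
import java.nio.charset.Charset;
import java.util.function.UnaryOperator;

final class RoundTripSketch {
    // fromDriver is the String obtained in step a), e.g. from getString().
    static String transform(String fromDriver, Charset dbCharset, UnaryOperator<byte[]> edit) {
        byte[] raw = fromDriver.getBytes(dbCharset); // b) back to the DB's byte sequence
        byte[] edited = edit.apply(raw);             // c) manipulate the bytes as needed
        return new String(edited, dbCharset);        // d) back to a String for updateString()
    }

    public static void main(String[] args) {
        Charset win1252 = Charset.forName("windows-1252");
        // Example manipulation: replace byte 0xE9 (e-acute) with a plain 'e'.
        String result = transform("caf\u00e9", win1252, bytes -> {
            byte[] copy = bytes.clone();
            for (int i = 0; i < copy.length; i++) {
                if (copy[i] == (byte) 0xE9) {
                    copy[i] = (byte) 'e';
                }
            }
            return copy;
        });
        System.out.println(result); // prints cafe
    }
}
```
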
>
> I do want to be sure I'm understanding one thing about this INTL work:
> I assume that the "new INTL" stuff is in process right now, right?
>

Yes, the first version is nearly feature complete.

Adriano