firebird-java - Re: lc

Subject	Re: lc_ctype support
Author	rrokytskyy
Post date	2002-04-09T13:09:35Z

> I can't say I understand this argument very well, or at all; however,
> I'm finding Roman's side more convincing.

Thanks. :)

> I was under the impression that all Java strings were Unicode, and
> there was no choice in this. Therefore I thought the only problem
> was how to send Unicode strings to Firebird, depending on the
> Firebird database and column character sets. I think problems of
> properly getting national characters into Firebird from sources such
> as web pages should not be the Firebird driver's problem. From this
> point of view I don't really understand why you should need to
> specify any client encoding, since I thought it was always Unicode.

If we agree that the client-side encoding is always UNICODE_FSS (which
I think is not too bad), then we really do not need all this
conversion. Firebird will do (or is supposed to do) it automatically.
But then we have to require people to set at least the default
database encoding. We just pass the result of str.getBytes("UTF8") to
the engine (not str.getBytes(), because that uses the default JVM
encoding and will not produce a Unicode byte stream). It is then the
engine's responsibility to store the data according to the
database/column definition.
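A minimal sketch of what the client side reduces to under that
assumption (UNICODE_FSS as the fixed connection encoding). The class
and method names are hypothetical, not the driver's API, and
StandardCharsets.UTF_8 is used as the exception-free equivalent of
getBytes("UTF8"). The point is that the bytes depend only on the
string, never on the JVM default encoding:

```java
import java.nio.charset.StandardCharsets;

class Utf8PassThrough {
    // Hypothetical helper: the bytes a driver would send if the connection
    // encoding were always UNICODE_FSS. Same result as s.getBytes("UTF8"),
    // but without the checked UnsupportedEncodingException.
    static byte[] toWireBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reverse direction: decode bytes received from the engine as UTF-8.
    static String fromWireBytes(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic "Privet"
        byte[] wire = toWireBytes(s);
        System.out.println(wire.length);                   // 12: two bytes per letter
        System.out.println(fromWireBytes(wire).equals(s)); // true
    }
}
```

On any JVM this prints 12 and true, whatever the platform default
encoding is.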

> A new, really simple, example might help me.

Let's assume you have a database with default charset NONE and all
tables have charset NONE. Your connection encoding is NONE. Everything
you store in the database is exactly what you provided: str.getBytes()
returns a byte[] in the default JVM encoding, and the data you
retrieve are the same as the data you stored.

But now another client with a different default JVM encoding
connects. Instead of seeing your data, it sees something different
(your bytes decoded into Unicode according to the default JVM encoding
on that machine). So this is not OK.
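A small sketch of the NONE scenario above. Since the default JVM
encoding cannot be reliably changed inside a running JVM, the example
passes each client's "default" explicitly; windows-1251 for the first
client and ISO-8859-1 for the second are assumed purely for
illustration, and the byte[] stands in for a NONE column that stores
whatever bytes it receives:

```java
import java.nio.charset.Charset;

class NoneCharsetDemo {
    // Stand-ins for the default JVM encodings of two different machines
    // (example values only).
    static final Charset CLIENT_A = Charset.forName("windows-1251");
    static final Charset CLIENT_B = Charset.forName("ISO-8859-1");

    // With charset NONE the column keeps exactly the bytes it receives,
    // so "storing" is just encoding with the client's default.
    static byte[] storeAsNone(String s, Charset clientDefault) {
        return s.getBytes(clientDefault);
    }

    // Reading back decodes the raw bytes with the reader's default.
    static String readAsNone(byte[] stored, Charset clientDefault) {
        return new String(stored, clientDefault);
    }

    public static void main(String[] args) {
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic "Privet"
        byte[] column = storeAsNone(original, CLIENT_A);

        // Same client, same encoding: the data round-trip.
        System.out.println(readAsNone(column, CLIENT_A).equals(original)); // true
        // Another client with a different default sees other characters.
        System.out.println(readAsNone(column, CLIENT_B).equals(original)); // false
    }
}
```

The second client gets "Ïðèâåò" instead of "Привет", because it
decodes the windows-1251 bytes as Latin-1.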

Therefore we need encodings.

The main question in the discussion is whether or not to provide an
option to skip the full conversion

    national_unicode -> correct_unicode -> connection_encoding -> byte[]

and instead convert

    national_unicode -> byte[]

directly, assuming that national_unicode and connection_encoding are
the same.
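A rough sketch of the two paths, with the "boolean switch" modelled as
a hypothetical passThrough parameter (not an existing driver option).
It assumes the national bytes are windows-1251 (Firebird's WIN1251)
and were decoded as ISO-8859-1 by the JVM that built the string:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ConversionPaths {
    // Assumed national encoding of the original bytes (example only).
    static final Charset NATIONAL = Charset.forName("windows-1251");
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;

    // Hypothetical switch: passThrough = true skips the conversion chain.
    static byte[] encode(String nationalUnicode, Charset connectionEncoding,
                         boolean passThrough) {
        if (passThrough) {
            // national_unicode -> byte[]: each char (0..255) becomes one byte,
            // and connectionEncoding is simply trusted to match.
            return nationalUnicode.getBytes(LATIN1);
        }
        // national_unicode -> correct_unicode: recover the raw bytes,
        // then decode them with the real national encoding.
        String correctUnicode = new String(nationalUnicode.getBytes(LATIN1), NATIONAL);
        // correct_unicode -> connection_encoding -> byte[]
        return correctUnicode.getBytes(connectionEncoding);
    }

    public static void main(String[] args) {
        byte[] raw = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(NATIONAL);
        String nationalUnicode = new String(raw, LATIN1); // chars all in 0..255

        byte[] direct = encode(nationalUnicode, NATIONAL, true);
        byte[] sameEnc = encode(nationalUnicode, NATIONAL, false);
        byte[] utf8Enc = encode(nationalUnicode, StandardCharsets.UTF_8, false);
        System.out.println(Arrays.equals(sameEnc, direct)); // true
        System.out.println(Arrays.equals(utf8Enc, direct)); // false
    }
}
```

Both paths give identical bytes only when the connection encoding
matches the national encoding; with UTF-8 as the connection encoding
the direct path would send the wrong bytes, which is exactly the
assumption such a switch would have to rely on.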

Here "national_unicode" means a string whose national characters
occupy positions 0 to 255 (usually) instead of the positions Unicode
prescribes for them, even though the string is nominally Unicode. You
can construct such strings simply with new String(byte[] bytes) when
the default JVM encoding does not match the encoding of the bytes.
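A quick way to see the difference, again assuming windows-1251 bytes
and a JVM whose default is ISO-8859-1 (passed explicitly here, since
new String(byte[]) would silently use whatever the real default is).
The helper allBelow256 is only an illustrative check:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NationalUnicode {
    // True if every char sits in 0..255, i.e. the Latin-1 range.
    static boolean allBelow256(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        byte[] bytes = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(cp1251);

        // What new String(bytes) would produce on a JVM defaulting to ISO-8859-1.
        String national = new String(bytes, StandardCharsets.ISO_8859_1);
        // Decoding with the bytes' real encoding gives proper Unicode.
        String correct = new String(bytes, cp1251);

        System.out.println(allBelow256(national));    // true
        System.out.println(allBelow256(correct));     // false
        System.out.println((int) national.charAt(0)); // 207
        System.out.println((int) correct.charAt(0));  // 1055
    }
}
```

The mis-decoded string holds 'П' at position 207 (0xCF, its
windows-1251 byte value) instead of U+041F (1055).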

In my opinion this adds unnecessary complexity to the driver, even
though it could be implemented (as some kind of boolean switch).

> In any case, when can I release a binary version of the driver?

I would say we can do it right now and enter the alpha or beta stage
(whichever you like). This will let us get more testing and feature
requests from the outside world.

Best regards,
Roman Rokytskyy