Subject Re: [Firebird-Java] Re: Approaches at JB-to-FB connections regarding charsets
Author Roman Rokytskyy
> Are there any statistics on how long establishing a connection takes,
> and hence what the penalty for a two-phase mode would be?

Establishing a connection is a sub-second operation. But can you
guarantee that the database charset does not change between the two
connections? Is it unlikely? Yes. Is it guaranteed? No. And in the
database world we have either TRUE or FALSE, not VERY LIKELY.
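
For reference, the "two-phase mode" being discussed can be sketched with plain JDBC. This is a minimal sketch, not Jaybird's actual behaviour: it assumes Jaybird's `encoding` connection property, assumes that attaching with `encoding=NONE` is enough to read `RDB$DATABASE`, and the class and method names are hypothetical. The comment in `connect` marks the window the objection above is about.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class TwoPhaseConnect {
    // Phase 1: attach with encoding=NONE (assumption: enough to read the
    // catalog) and read the database default character set.
    static String readDefaultCharset(String url, Properties base) throws SQLException {
        try (Connection c = DriverManager.getConnection(url, withEncoding(base, "NONE"));
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                     "SELECT RDB$CHARACTER_SET_NAME FROM RDB$DATABASE")) {
            String name = rs.next() ? rs.getString(1) : null;
            // CHAR value is blank-padded; NULL is treated as NONE (assumption)
            return name == null ? "NONE" : name.trim();
        }
    }

    // Copy of the caller's properties with the "encoding" property set;
    // the caller's Properties object is left untouched.
    static Properties withEncoding(Properties base, String firebirdCharset) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty("encoding", firebirdCharset);
        return p;
    }

    // Phase 2: reconnect using the charset detected in phase 1.
    static Connection connect(String url, Properties base) throws SQLException {
        String charset = readDefaultCharset(url, base);
        // Race window: nothing prevents the database default charset from
        // being changed between phase 1 and phase 2.
        return DriverManager.getConnection(url, withEncoding(base, charset));
    }
}
```

The second attach is cheap, as noted above; the problem is that the value read in phase 1 is only a snapshot.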

> a. the wire protocol is extended to query/specify the charset after the
> connection is made
> b. if the client connects without specifying a charset, the connection
> is "restricted"
> c. in restricted mode the client can
> c.1: specify the chosen charset, switching the connection to full mode
> c.2: ask the server which charset it considers broad enough (least
> common charset)
> c.3: read metadata, if the client wants to choose a charset without
> server hints (say, to reduce network load by choosing some SBCS): only
> RDB$... tables and views, only 7-bit ASCII strings and numbers, only
> reads, no side effects (no UDFs or SPs).
>
> This would need a coordinated update of both client and server.
> It would add a single extra round trip after connecting to negotiate a
> charset, which is comparable to your proposal of covertly converting
> to the DB default charset (which would still need to be queried).

It is easier to ensure that no new database gets NONE as its default
charset, and let those who use NONE keep doing their dirty work.
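
The restricted-mode proposal quoted above amounts to a small client-side state machine. A hypothetical sketch only: none of these types or calls exist in Firebird, fbclient or Jaybird, and the server's answer to c.2 is simply passed in to the constructor.

```java
import java.util.Locale;

// Hypothetical model of the proposed (not implemented) restricted mode.
enum SessionMode { RESTRICTED, FULL }

class ProposedSession {
    private SessionMode mode = SessionMode.RESTRICTED; // (b) attached without a charset
    private String charset;                            // unknown until (c.1)
    private final String serverHint;                   // the server's answer to (c.2)

    ProposedSession(String serverHint) {
        this.serverHint = serverHint;
    }

    // (c.2) ask the server which charset it considers broad enough
    String suggestCharset() {
        return serverHint;
    }

    // (c.1) specify the chosen charset, switching the connection to full mode
    void specifyCharset(String chosen) {
        charset = chosen;
        mode = SessionMode.FULL;
    }

    // (c.3) restricted mode: read-only access to RDB$ objects,
    // 7-bit ASCII text only, no UDF or stored procedure calls
    boolean mayExecute(String objectName, String textLiterals,
                       boolean readOnly, boolean callsRoutines) {
        if (mode == SessionMode.FULL) {
            return true;
        }
        boolean asciiOnly = textLiterals.chars().allMatch(ch -> ch < 0x80);
        return readOnly && !callsRoutines && asciiOnly
                && objectName.toUpperCase(Locale.ROOT).startsWith("RDB$");
    }

    SessionMode mode() { return mode; }

    String charset() { return charset; }
}
```

The extra round trip mentioned in the proposal corresponds to the `suggestCharset()` plus `specifyCharset()` pair.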

>> If we defaulted to connecting with UTF-8, either you would get all
>> kinds of transliteration errors or data would end up stored in two
>> different encodings: in other words, you introduce logical data
>> corruption.
>
> Yes, transliteration errors like
> http://www.trackstudio.ru/forum/download/file.php?id=212
> The user would see it, panic and call the HelpDesk instead of working
> on regardless, so there is no data corruption. Until the admins fix the
> connection mode (even with the ugly and dirty ?encoding=NONE), the
> database would not be usable by humans, so they would not enter new data.

Yes.
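
To make "stored in two different encodings" concrete: if a NONE column has been written by clients encoding text as Latin-1 and the client default is then switched to UTF-8, the same word ends up as two different byte sequences in the same column. A minimal illustration using only JDK charsets; ISO-8859-1 stands in for a WIN125x charset here, and the class name is ours.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MixedEncodings {
    // Bytes a NONE column would receive for the same text, depending on
    // which encoding the writing client happened to use.
    static byte[] stored(String text, Charset clientEncoding) {
        return text.getBytes(clientEncoding);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";                                   // "cafe" with e-acute
        byte[] oldRow = stored(text, StandardCharsets.ISO_8859_1);  // before the switch
        byte[] newRow = stored(text, StandardCharsets.UTF_8);       // after the switch
        System.out.println(Arrays.toString(oldRow));                // [99, 97, 102, -23]
        System.out.println(Arrays.toString(newRow));                // [99, 97, 102, -61, -87]
        // An old client reading the new row as Latin-1 gets "caf" + U+00C3 + U+00A9
        String seen = new String(newRow, StandardCharsets.ISO_8859_1);
        System.out.println(seen.equals("caf\u00c3\u00a9"));         // true
    }
}
```

Nothing in the column records which rows were written which way, which is why this counts as logical corruption rather than an error.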

>> The problem is that when NONE is used and the database charset differs
>> from the local system encoding, in most cases you won't be able to
>> detect the logical data corruption that will occur when Jaybird changes
>> its behavior, so Jaybird will have no way to fail early.
>
> To me it looks quite the opposite.
> http://www.trackstudio.ru/forum/download/file.php?id=212
> Users would detect it instantly and "cry early".

Aha, and what about the applications that compute in the background?
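
One reason a background job cannot "cry early": decoding with a single-byte charset such as ISO-8859-1 maps every possible byte to some character, so misinterpreted data never raises an error, whereas a strict UTF-8 decoder usually rejects non-ASCII Latin-1 bytes. A small JDK-only sketch; the helper name is ours.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class FailEarlyCheck {
    // Strict decode: true only if the whole byte array is valid in cs.
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Row = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1Row = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 bytes misread as Latin-1: accepted silently, nobody notices
        System.out.println(decodesCleanly(utf8Row, StandardCharsets.ISO_8859_1)); // true
        // Latin-1 bytes misread as UTF-8: the lone 0xE9 is malformed
        System.out.println(decodesCleanly(latin1Row, StandardCharsets.UTF_8));    // false
    }
}
```

Only the second direction gives a program anything to fail on; the first produces wrong text without any exception.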

>> P4: deny connections without an explicitly specified charset. If NONE
>> is specified explicitly, do NO conversion at all, as the FB docs say.
>
> No critical comments from you on this.

Except that there is no such thing as "NO conversion" in Java: it will
use the environment settings.
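
A sketch of P4 combined with the point above. `encoding` (Firebird charset name) and `charSet` (Java charset name) are Jaybird connection properties; the check itself is hypothetical, and the fallback to `Charset.defaultCharset()` illustrates what is described above rather than quoting Jaybird's source.

```java
import java.nio.charset.Charset;
import java.util.Properties;

class CharsetPolicy {
    // Hypothetical P4 check: refuse to connect unless a charset is explicit.
    static void requireExplicitCharset(Properties props) {
        if (props.getProperty("encoding") == null && props.getProperty("charSet") == null) {
            throw new IllegalArgumentException("No connection charset specified");
        }
    }

    // Even with encoding=NONE, turning bytes into a java.lang.String needs
    // a Java charset; without charSet the only candidate is the JVM default.
    static Charset javaCharsetFor(Properties props) {
        String javaName = props.getProperty("charSet");
        return javaName != null ? Charset.forName(javaName) : Charset.defaultCharset();
    }
}
```

So an explicit `encoding=NONE` passes the P4 check, yet the text a Java application sees still depends on the JVM's default charset.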

.....

> Apart from completely broken sorting and partially broken text search,
> I wonder what would happen if I then sent an explicit statement like
> "EXECUTE BLOCK" with Russian text constants inside...

That depends on your connection encoding.
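
The statement text itself, literals included, is encoded with the connection charset before it is sent, so a Cyrillic constant reaches the server as different bytes per connection encoding. A JDK-only sketch; the `EXECUTE BLOCK` text is a made-up example and `windows-1251` is the Java name corresponding to WIN1251.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StatementBytes {
    // Bytes of the SQL text as they would go over the wire.
    static byte[] onTheWire(String sql, Charset connectionCharset) {
        return sql.getBytes(connectionCharset);
    }

    public static void main(String[] args) {
        // The Russian word "Privet", written with Unicode escapes
        String literal = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String sql = "EXECUTE BLOCK RETURNS (s VARCHAR(20)) AS BEGIN s = '"
                + literal + "'; SUSPEND; END";
        Charset win1251 = Charset.forName("windows-1251");
        // Cyrillic: one byte per letter in WIN1251, two in UTF8
        System.out.println(onTheWire(literal, win1251).length);                // 6
        System.out.println(onTheWire(literal, StandardCharsets.UTF_8).length); // 12
        // The rest of the statement is ASCII, so the totals differ by 6
        System.out.println(onTheWire(sql, StandardCharsets.UTF_8).length
                - onTheWire(sql, win1251).length);                             // 6
    }
}
```

With NONE on the server side those bytes are stored as received, so whichever encoding the client used is what ends up in the column.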

> If I then created a new table (while updating the software version and
> adjusting the schema), would it inherit WIN1252 from the database, WIN1251
> from the de facto connection mode, or NONE from the de jure connection
> mode?

It will inherit the database default charset (WIN1252 here) when the
column is created, since you did not specify a charset for the column.
That has nothing to do with the client (fbclient.dll, Jaybird or .NET).
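
The rule above, plus a way to check after the fact what a column actually got, can be sketched as follows. The class and method names are hypothetical and domains with their own charset are ignored; the catalog query uses the standard system tables `RDB$RELATION_FIELDS`, `RDB$FIELDS` and `RDB$CHARACTER_SETS`, and names must be passed as stored (upper case for unquoted identifiers).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ColumnCharset {
    // An explicit column charset wins, otherwise the database default
    // applies; the client's connection charset is deliberately not an input.
    static String effective(String explicitColumnCharset, String databaseDefault) {
        return explicitColumnCharset != null ? explicitColumnCharset : databaseDefault;
    }

    // Read the charset a column actually received from the system tables.
    static String actual(Connection c, String table, String column) throws SQLException {
        String sql = "SELECT cs.RDB$CHARACTER_SET_NAME"
                + " FROM RDB$RELATION_FIELDS rf"
                + " JOIN RDB$FIELDS f ON f.RDB$FIELD_NAME = rf.RDB$FIELD_SOURCE"
                + " LEFT JOIN RDB$CHARACTER_SETS cs"
                + " ON cs.RDB$CHARACTER_SET_ID = f.RDB$CHARACTER_SET_ID"
                + " WHERE rf.RDB$RELATION_NAME = ? AND rf.RDB$FIELD_NAME = ?";
        try (PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, table);
            ps.setString(2, column);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no such column
                }
                String name = rs.getString(1);
                return name == null ? null : name.trim(); // CHAR values are blank-padded
            }
        }
    }
}
```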

> In such a way that users would see garbage, stop working and ask an
> admin to fix it.

Not an option, since things might get screwed up before somebody notices.


> Now, I think that clients are expected to behave similarly to each
> other, since they are all official.
> So let's imagine your proposal was ported to .NetProvider and
> fbclient/fbembed.
> Now imagine someone configured Jaybird to use fbclient/fbembed.
> He does not specify a connection charset, so fbclient/fbembed does
> covert transcoding while pretending to use NONE; then Jaybird
> follows suit and does yet another transcoding while pretending to use
> NONE.
> "Thames, ^W
> "Double conversion, sire!"

There is no double conversion: neither piece of code will apply any
conversion if NONE is used (explicitly or implicitly). The conversion to
String using the platform's default encoding happens far above the code
that performs the network calls. That is a requirement of Java; you have
to live with it. Below that level, only bytes are shifted over the wire.
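
The layering described above can be pictured with a toy model. The method names are hypothetical stand-ins, not Jaybird or fbclient classes: with NONE the lower layers only copy bytes, and the single decode happens at the top with whatever Java charset is in effect (the platform default when no Java charset is configured).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteLayers {
    // Stand-ins for fbclient/fbembed and the driver's wire code:
    // with NONE they pass byte arrays through untouched.
    static byte[] nativeClient(byte[] fromServer) { return fromServer.clone(); }

    static byte[] jdbcWireLayer(byte[] fromClient) { return fromClient.clone(); }

    // The one byte-to-String step, near the top of the JDBC API.
    static String getString(byte[] raw, Charset javaCharset) {
        return new String(raw, javaCharset);
    }

    public static void main(String[] args) {
        byte[] column = {(byte) 0xE9, 0x74, (byte) 0xE9}; // "ete" with accents, in Latin-1
        byte[] delivered = jdbcWireLayer(nativeClient(column));
        System.out.println(Arrays.equals(column, delivered)); // true
        System.out.println(getString(delivered, StandardCharsets.ISO_8859_1)
                .equals("\u00e9t\u00e9"));                     // true
    }
}
```

However many byte-level layers are stacked, there is still exactly one decode, so no double conversion.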

Roman