Subject Re: Trying to run TrackStudio :-)
Author the_a_rioch
> > It seems to work, even built-in initialization works.
> > Yet... the fact that they did not identified problem hints at broken expectations...
>
> You said they use Linux themselves. Under Linux the default system
> encoding is usually UTF-8.

In modern Linux, yes. Say, for the last 3-5 years.
In old-school Linux it was different.
And what about BSD servers, if they happened to run Jetty on them?

So if they had old machines or traditional installs, their clients or servers might still use the KOI8-R locale.
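To make the locale point concrete, here is a minimal sketch in plain Java (no Jaybird involved). The class name, the helper and the sample word are made up for illustration. It only shows that the same string becomes different bytes under KOI8-R and UTF-8, so anything that encodes with "the system default" depends on the machine it runs on.

```java
import java.nio.charset.Charset;

public class LocaleEncodingDemo {
    // Hypothetical sample: the Russian word "Privet", written with escapes
    // so the source file's own encoding does not matter.
    static final String SAMPLE = "\u041f\u0440\u0438\u0432\u0435\u0442";

    // Encode text with a charset given by name.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // On older JVMs this typically follows the OS locale;
        // JVMs from version 18 on default to UTF-8 unless told otherwise.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("KOI8-R length: " + encode(SAMPLE, "KOI8-R").length);
        System.out.println("UTF-8 length:  " + encode(SAMPLE, "UTF-8").length);
    }
}
```

KOI8-R uses one byte per Cyrillic letter and UTF-8 uses two, so the two byte arrays cannot be swapped for each other.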

> In that case the NONE charset applied by
> Jaybird will work transparently for them if the database characterset is
> UTF-8 as well. With NONE, Jaybird will use the system encoding to
> convert between Java strings to bytes.

Oooops, I suspected it.
WHY would it convert?
http://www.firebirdsql.org/refdocs/langrefupd21-notes-charset-none.html

Isn't NONE supposed to be raw binary, intentionally lacking context and charset, rather than something that has context and is convertible?

Isn't THIS unexpected behaviour?

> However when running on Windows, the default characterset for you is
> probably WINDOWS-1251. In that case Jaybird will use that characterset
> to convert between Java strings and bytes.

That looks very probable.
As my screenshots show, I saw an extra conversion in the web UI, the typical "UTF-8 bytes misinterpreted as win1251".
FlameRobin did the conversion of the script and the FDB kept UTF-8 data, yet Jaybird did not know it...
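For reference, that kind of garbage can be reproduced without any database. A minimal sketch in plain Java, with a made-up class name and sample word: take the UTF-8 bytes of a Cyrillic string and decode them as windows-1251. It assumes the JVM ships the windows-1251 charset, which standard JDK builds do as far as I know.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical sample: the Russian word "Privet", written with escapes.
    static final String SAMPLE = "\u041f\u0440\u0438\u0432\u0435\u0442";

    // Decode the UTF-8 bytes of the text as if they were windows-1251.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1251"));
    }

    public static void main(String[] args) {
        String garbled = misread(SAMPLE);
        // Every Cyrillic letter turns into two unrelated characters.
        System.out.println(SAMPLE.length() + " chars became " + garbled.length());
        System.out.println(garbled);
    }
}
```

The six letters come out as twelve characters, each pair starting with a capital Cyrillic letter, which is the pattern on my screenshots.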

But Jaybird could query the charset of that column, couldn't it?
As far as I remember, in TS they are VarChar(2048), not BLOBs.

Okay, I do not suggest that you guess at the per-column level (though why not?). I just stress that you still do guesswork and still do transcoding that, to the letter, you should not do.

I frankly feel that connection-time guesswork, documented and maybe checkable by the program (can a program ask Jaybird "what is the current connection charset?"), would be less "unexpected behaviour" than the current heuristics under the hood.
To me NONE looks like zero and non-specified looks like NULL.
If the connection charset is not specified, then you can use defaults or heuristics or whatever. If the connection is specified to be NONE, then you're banned from any transcoding. That is what I see as theoretically consistent and reasonable.

> The problem now is that those
> bytes may contain byte combinations which are not valid UTF-8 so upon
> storing in the database a transliteration error will occur.

Truly so.
The program (most probably) gave JB a UTF-8 stream, which JB converted to win1251 and passed through the NONE connection to the server, which expected it, to the letter, to be a no-charset binary copy of its chosen UTF-8 data. Which it frankly was not and could not conform to.
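The failure mode can be sketched in plain Java as well. This is only an illustration of the byte-level problem, not of Jaybird or Firebird internals. The class and helper names are made up, and the strict decoder just stands in for whatever validation the server does on a UTF-8 column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DoubleConversionDemo {
    // Hypothetical sample: the Russian word "Privet", written with escapes.
    static final String SAMPLE = "\u041f\u0440\u0438\u0432\u0435\u0442";

    // What a client with a win1251 system encoding would put on the wire.
    static byte[] encodeAsWin1251(String text) {
        return text.getBytes(Charset.forName("windows-1251"));
    }

    // Strict check: true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] wire = encodeAsWin1251(SAMPLE);
        System.out.println("win1251 bytes valid as UTF-8? " + isValidUtf8(wire));
        byte[] utf8 = SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes valid as UTF-8?   " + isValidUtf8(utf8));
    }
}
```

The win1251 bytes of Cyrillic text start with a byte that UTF-8 reads as a lead byte, followed by one that is not a continuation byte, so a strict UTF-8 check rejects them.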

The double conversion was obvious. Now you have described where and why it happened.