Subject | Re: [Firebird-Java] Re: Trying to run TrackStudio :-) |
---|---|
Author | Roman Rokytskyy |
Post date | 2012-06-29T07:15:48Z |
The issue with Java is that every byte sequence has to be converted to
Unicode characters before it can be used as a string. That is Java.
Period. Or you access the data as a byte array. Another period. :)
With your Delphi background, think of having only wide strings and
doing an OEM-to-ANSI conversion each time you get raw data from disk
or the network. To convert a byte stream into a string you have to
assume an encoding for the incoming data. The encoding parameter in
the connection string tells the server in which encoding we want to
receive data, not which default encoding should be used. The
translation from the database or column encoding is performed on the
server. When you specify the NONE encoding, you explicitly tell the
server to perform no translation and hope that the client will get it
right. It is a powerful mechanism for solving some problems in
heterogeneous environments, but when you use it wrong, you can make
the database unusable.
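
A minimal sketch of the "you have to assume an encoding" point, using only
the standard java.nio.charset API. The class name, the helper method and the
sample word are made up for illustration; this is not Jaybird code. The same
twelve bytes give the original text under one assumption and garbage under
another:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode the same raw bytes under a given charset assumption.
    public static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        // A Cyrillic sample word, written with Unicode escapes so that the
        // encoding of this source file does not matter.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Assumes the JVM ships the windows-1251 charset (full JDKs do).
        Charset cp1251 = Charset.forName("windows-1251");

        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, cp1251);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("raw bytes: " + raw.length);
        System.out.println("as UTF-8 equals original: " + right.equals(original));
        System.out.println("as windows-1251 equals original: " + wrong.equals(original));
        System.out.println("chars when read as windows-1251: " + wrong.length());
    }
}
```

The first printed line is the charset a JVM falls back to when none is given
explicitly, which is why the same program can behave differently on a UTF-8
Linux box and on a Windows machine with a Cyrillic locale.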
Roman
"the_a_rioch" <ariochthe@...> wrote on 29 June 2012, 08:43:07:
> > > It seems to work, even built-in initialization works.
> > > Yet... the fact that they did not identify the problem hints at
> > > broken expectations...
> >
> > You said they use Linux themselves. Under Linux the default system
> > encoding is usually UTF-8.
>
> In modern Linux. Say, for the last 3-5 years.
> In old-school Linux it was different.
> And what about BSD servers, if by chance they run Jetty on them ?
>
> So if they had old machines or traditional installs, their clients or
> servers might still use KOI8-R locale.
>
> > In that case the NONE charset applied by
> > Jaybird will work transparently for them if the database character set is
> > UTF-8 as well. With NONE, Jaybird will use the system encoding to
> > convert between Java strings and bytes.
>
> Oooops, i suspected it.
> WHY would it convert ?
> http://www.firebirdsql.org/refdocs/langrefupd21-notes-charset-none.html
>
> Isn't NONE supposed to be raw binary, intentionally lacking context and
> charset, rather than anything having context and convertible ???
>
> Isn't THIS an unexpected behaviour ?
>
> > However when running on Windows, the default character set for you is
> > probably WINDOWS-1251. In that case Jaybird will use that character set
> > to convert between Java strings and bytes.
>
> That looks very probable.
> As my screenshots show, i saw an extra conversion on the web page, the
> typical "UTF8 bytes misinterpreted as win1251".
> FlameRobin converted the script and the FDB kept UTF8 data, yet Jaybird
> did not know it...
>
> But Jaybird could query the charset of that column, couldn't it ?
> From what i remember, in TS they are VarChar(2048), not BLOBs.
>
> Okay, i do not suggest you to guess on per-column level (though why
> not?) i just stress that you still do guess-work and still do
> transcoding that - to the letter - you should not do.
>
> I frankly feel that connection-time guesswork, documented and maybe
> checkable by a program (can a program ask Jaybird "what is the current
> connection charset?" ?), would be less of an "unexpected behaviour" than
> the current heuristics under the hood.
> To me NONE looks like zero and non-specified looks like NULL.
> If the connection charset is not specified, then you can use defaults or
> heuristics or whatever. If the connection is specified to be NONE, then
> you're banned from any transcoding. That is how i see it as theoretically
> consistent and reasonable.
>
> > The problem now is that those
> > bytes may contain byte combinations which are not valid UTF-8 so upon
> > storing in the database a transliteration error will occur.
>
> Truly so.
> The program (most probably) gave JB a UTF-8 stream, which JB converted to
> win1251 and passed through the NONE connection to the server, which expected
> it, to the letter, to be a no-charset binary copy of its chosen
> UTF8 data. Which it frankly was not, and could not conform to.
>
> The double conversion was obvious. Now you have described where and why
> it happened.
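
To make the failure mode discussed above concrete, here is a hedged sketch of
why bytes produced with windows-1251 cannot be stored as UTF-8. It uses only
the standard CharsetDecoder API with error reporting switched on; the class
and method names are hypothetical, and it only models the byte-level check,
not what Jaybird or the Firebird server actually execute:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Returns true only if the bytes are well-formed UTF-8. A strict decoder
    // reports malformed input instead of silently substituting U+FFFD.
    public static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A Cyrillic sample word, written with Unicode escapes.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // What a client sends if it encodes with the column's real charset.
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        // What a client sends if it encodes with a Cyrillic Windows default.
        byte[] as1251 = text.getBytes(Charset.forName("windows-1251"));

        System.out.println("UTF-8 bytes are valid UTF-8: " + isValidUtf8(asUtf8));
        System.out.println("windows-1251 bytes are valid UTF-8: " + isValidUtf8(as1251));
    }
}
```

A check of this kind is roughly what a transliteration error amounts to: the
receiver is told nothing about the encoding, tries to treat the bytes as
UTF-8, and finds sequences that are not legal in that encoding.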