Subject | Re: [Firebird-Java] Re: Trying to run TrackStudio :-) |
---|---|
Author | Mark Rotteveel |
Post date | 2012-06-28T17:55:39Z |
On 28-6-2012 17:07, the_a_rioch wrote:
>>> If Java apps always run in some pre-defined flavour of unicode, if the database is created in some standard charset (here - UTF8), why cannot JayBird query database and choose best matching default.
>>
>> The characterset has to be specified on connect.
>
> True. That might ask for two-step connect.
> Connect with NONE, read default charset. Reconnect.
This is not going to happen: for local connections this might be fine,
but for remote connections it already has some performance implications
as it will take several roundtrips.
>> Also, the connection
>> characterset does not have to match the database characterset at all
>
> Sure. If user overrode connection charset - just obey.
> But if he omitted it, why not to choose best default ?
To be honest, I am not entirely happy with the way it works now. But the
thing is: using NONE as the connection characterset if none is specified
is probably the best default there is.
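Being explicit instead of relying on the NONE default could look like the sketch below. It assumes Jaybird's `encoding` connection property (which takes the Firebird character set name) and a placeholder URL, user and password; the class and method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class ExplicitCharsetExample {
    // Placeholder URL: host, port and database path are assumptions.
    static final String URL = "jdbc:firebirdsql://localhost:3050/employee";

    /** Builds connection properties that name the connection character set explicitly. */
    static Properties connectionProperties(String user, String password, String firebirdCharset) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // "encoding" carries the Firebird character set name, e.g. UTF8 or WIN1251.
        props.setProperty("encoding", firebirdCharset);
        return props;
    }

    /** Opens a connection; needs Jaybird on the classpath and a reachable server. */
    static Connection connect(Properties props) throws SQLException {
        return DriverManager.getConnection(URL, props);
    }

    public static void main(String[] args) {
        // Only builds the properties; no connection is attempted here.
        Properties props = connectionProperties("sysdba", "masterkey", "UTF8");
        System.out.println("encoding=" + props.getProperty("encoding"));
    }
}
```

With the character set named on connect, neither the driver nor the server has to guess.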
>> (firebird server will translate between db charset and connection
>> charset; if possible).
>
> But - if to believe Kuzmenko - not for BLOBs, only for VARCHARs.
> Or - another thread at 2.0 times on russian Python forum - BLOB conversion was planned for some future, but not for released to that date servers.
> Dunno if that still applies to FB 2.1.x or 2.5.x
Actually, blobs are always sent in their defined characterset (a thing
which, BTW, I believe Jaybird currently doesn't handle well when the blob
characterset deviates from the connection characterset). The driver
should take care of conversion here, not Firebird.
>> Guessing for the user could result in 1) decreased performance because
>> it would need to connect twice
>
> which is very fast on FB
>
>> 2) incorrect behavior. You as the
>> developer / db administrator simply have to be explicit when
>> defining the connection.
>
> That is a gotcha. You should have foreknowledge or you're screwed.
> Why not make sensible defaults ?
>
> Person tries FB, it does not work, person ditches FB and goes to suggest against it on every forum.
I admit the documentation should be more explicit about it.
> Frankly, i can hardly think of situation where charset NONE would give correct behaviour and charset UTF8 would not.
> I think that Java developers would take pervasive unicode for granted.
>
> I believe it takes rather intimate knowledge of FB and its legacy to know that it should be enforced into unicode-aware connection.
The problem is that defaulting to UTF8 will not always work fine,
especially not if the database connected to is not UTF8.
However, maybe a more intelligent algorithm for deciding on the
characterset is possible (e.g. when NONE is used, try to use the
characterset of the database; if that is NONE as well, then use system
encoding). I created http://tracker.firebirdsql.org/browse/JDBC-257 to
look into this.
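The fallback order proposed above can be sketched as a small pure function. This is only an illustration of the idea, not what Jaybird does or what JDBC-257 will deliver: the class, enum and method names are hypothetical, and treating a missing or empty value the same as NONE is an assumption.

```java
import java.nio.charset.Charset;

final class CharsetFallback {
    /** Where the chosen character set came from. */
    enum Source { REQUESTED, DATABASE, SYSTEM }

    static final class Choice {
        final Source source;
        final String name;
        Choice(Source source, String name) { this.source = source; this.name = name; }
        @Override public String toString() { return source + ":" + name; }
    }

    /** Assumption: null, blank and "NONE" all count as "not specified". */
    static boolean isUnset(String charset) {
        return charset == null || charset.trim().isEmpty()
                || charset.trim().equalsIgnoreCase("NONE");
    }

    /**
     * Order: explicit connection charset, then database default charset,
     * then the local system encoding (reported by its Java name).
     */
    static Choice resolve(String requested, String databaseDefault, Charset systemCharset) {
        if (!isUnset(requested)) {
            return new Choice(Source.REQUESTED, requested.trim());
        }
        if (!isUnset(databaseDefault)) {
            return new Choice(Source.DATABASE, databaseDefault.trim());
        }
        return new Choice(Source.SYSTEM, systemCharset.name());
    }

    public static void main(String[] args) {
        Charset system = Charset.defaultCharset();
        System.out.println(resolve("UTF8", "WIN1251", system)); // explicit setting is obeyed
        System.out.println(resolve(null, "WIN1251", system));   // falls back to the database
        System.out.println(resolve("NONE", "NONE", system));    // falls back to the system encoding
    }
}
```

Reading the database default still costs extra roundtrips, which is the performance concern raised earlier in this thread.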
<snip>
> BTW, what is that overhead if database and connection both are UTF-8 ?
>
> Are u talking that some characters have up to 4 bytes in UTF 8 so datastreams tend to inflate ? Or that there is check that each character is valid UTF8 character ? or whatever ?
Part of it is the current implementation in Jaybird, but the overhead
can be as much as 4x the declared length of a VARCHAR or CHAR, even when
sending only one character (see
http://tracker.firebirdsql.org/browse/JDBC-237 ).
> And what is character set used by Java applications themselves ? UCS-2 ? or UTF-8 ? or... ?
UCS-2 is the fixed-width predecessor of UTF-16. Internally Java uses
UTF-16, but in general UTF-8 or the local system encoding is used for
communication (it really depends a lot on the application).
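A small sketch of the sizes involved: how many UTF-16 code units a Java String uses internally versus how many bytes the same text takes as UTF-8, plus the worst-case byte size of a column declared with a given number of characters. The class and method names are hypothetical, and the factor of 4 assumes the Firebird UTF8 character set with at most 4 bytes per character.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSizes {
    /** Number of UTF-16 code units, which is what String.length() counts. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of Unicode code points (characters as the user sees them). */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes when the text is encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Worst-case byte size for a column declared with this many characters (assumes 4 bytes/char). */
    static int maxUtf8Bytes(int declaredChars) {
        return declaredChars * 4;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u042F",       // U+042F, Cyrillic capital YA
            "\u20AC",       // U+20AC, euro sign
            "\uD834\uDD1E"  // U+1D11E, outside the BMP, needs a surrogate pair
        };
        for (String s : samples) {
            System.out.println(codePoints(s) + " code point(s), "
                    + utf16Units(s) + " UTF-16 unit(s), "
                    + utf8Bytes(s) + " UTF-8 byte(s)");
        }
        System.out.println("VARCHAR(100) worst case: " + maxUtf8Bytes(100) + " bytes");
    }
}
```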
> I just try to imagine which misunderstanding could happen.
>
> TS -> Spring/Hibernate -> JDBC -> Jaybird -> Firebird
>
> At each boundary there are expectations what charset is used to encode binary streams, if expectations mismatch things are broken.
All those boundaries simply use Strings and don't concern themselves
with binary streams or anything, except between Jaybird and Firebird.
> But i think that all arrows, except for last, use JRE internal charset, so how could NONE damage anything ?
The problem is as I explained in my other e-mail: if the database is UTF-8,
the connection characterset is NONE and the local system encoding is
WINDOWS-1251, then Jaybird can send byte combinations which are not
valid UTF-8 and therefore cause transliteration errors.
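The byte-level problem can be reproduced without a database. The sketch below encodes a Cyrillic word with windows-1251 and then checks it with a strict UTF-8 decoder; it only illustrates why such bytes are rejected, not Jaybird's or Firebird's actual code path. The class and method names are hypothetical, and it assumes the JRE ships the windows-1251 charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class TransliterationDemo {
    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Russian word for "hello", written with escapes to avoid source-encoding issues.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // What a client using the system encoding WINDOWS-1251 would put on the wire.
        byte[] cp1251 = text.getBytes(Charset.forName("windows-1251"));
        // What a UTF8 database expects to receive.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("windows-1251 bytes valid as UTF-8: " + isValidUtf8(cp1251));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

The first byte of the windows-1251 form (0xCF) looks like the start of a two-byte UTF-8 sequence, but the byte that follows (0xF0) is not a continuation byte, so a strict reader has to reject it.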
Mark
--
Mark Rotteveel