Subject Re: Trying to run TrackStudio :-)
Author the_a_rioch
> > If Java apps always run in some pre-defined flavour of unicode, if the database is created in some standard charset (here - UTF8), why cannot JayBird query database and choose best matching default.
>
> The characterset has to be specified on connect.

True. That might ask for two-step connect.
Connect with NONE, read the database default charset, then reconnect with it.
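
A rough sketch of that two-step connect in plain JDBC, just to show how little it takes. Assumptions: the driver accepts an "encoding" connection property holding a Firebird charset name (the property mentioned later in this thread), and the database default charset can be read from RDB$DATABASE. The class and method names are made up for illustration, this is not how JayBird itself is organised.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class TwoStepConnect {
    // Hypothetical helper: connection properties with an explicit Firebird charset.
    static Properties props(String user, String password, String encoding) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        p.setProperty("encoding", encoding); // assumed driver property name
        return p;
    }

    // Step 1: probe with NONE and read the database default charset.
    // Step 2: reconnect using that charset.
    static Connection connectWithDatabaseDefault(String url, String user, String password)
            throws SQLException {
        String dbCharset;
        try (Connection probe = DriverManager.getConnection(url, props(user, password, "NONE"));
             Statement st = probe.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT RDB$CHARACTER_SET_NAME FROM RDB$DATABASE")) {
            // The column is a padded CHAR and may be NULL when no default was set.
            dbCharset = (rs.next() && rs.getString(1) != null)
                    ? rs.getString(1).trim()
                    : "NONE";
        }
        return DriverManager.getConnection(url, props(user, password, dbCharset));
    }
}
```

A user who set the charset explicitly would simply bypass connectWithDatabaseDefault, so the extra round trip is paid only when nothing was specified.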

> Also, the connection
> characterset does not have to match the database characterset at all

Sure. If the user overrode the connection charset - just obey.
But if he omitted it, why not choose the best default ?

> (firebird server will translate between db charset and connection
> charset; if possible).

But - if Kuzmenko is to be believed - not for BLOBs, only for VARCHARs.
Or - per another thread from the 2.0 days on a Russian Python forum - BLOB conversion was planned for some future version, but was not in the servers released by that date.
Dunno if that still applies to FB 2.1.x or 2.5.x

> Guessing for the user could result in 1) decreased performance because
> it would need to connect twice

which is very fast on FB

> 2) incorrect behavior. You as the
> developer / db administrator simply have to be explicit when
> defining the connection.

That is a gotcha. You should have foreknowledge or you're screwed.
Why not make sensible defaults ?

Person tries FB, it does not work, person ditches FB and goes to suggest against it on every forum.

Frankly, i can hardly think of a situation where charset NONE would give correct behaviour and charset UTF8 would not.
I think that Java developers would take pervasive unicode for granted.

I believe it takes rather intimate knowledge of FB and its legacy to know that it should be enforced into unicode-aware connection.


> > Yes, potentially different tables/columns MIGHT have different charsets, but that is marginally rare case and that is when one could override default by JDBC URL options.
> >
> > I would ask TS to implement that parameter, since they had some built-in language selection. But overall to me it looks a bit outdated approach.
>
> Looking at the documentation at
> http://www.trackstudio.com/connecting-firebird.html

Looking at that screenshot with the GDB extension, i'd think they know very little of FB/Win, and that is kinda the experience u might expect from an average Java developer with some annoying user asking him for "FB too".
He might give it a short try, but would bail out after the 1st non-obvious failure.

> the developers of TS
> don't need to do anything. You just need to include the charSet or
> encoding property in the JDBC url specified in the
> hibernate.connection.url property.
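
For the record, a small sketch of what that could look like when the Hibernate settings are built in code. Host, port and database name are placeholders, the helper names are hypothetical, and "encoding=UTF8" assumes the property takes a Firebird charset name as described above.

```java
import java.util.Properties;

class HibernateFirebirdUrl {
    // Appends the connection charset to a JDBC URL, using '?' for the
    // first property and '&' for any further one.
    static String withEncoding(String jdbcUrl, String firebirdCharset) {
        String sep = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + sep + "encoding=" + firebirdCharset;
    }

    // Hypothetical helper: the one Hibernate property that matters here.
    static Properties hibernateProps(String jdbcUrl) {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.url", withEncoding(jdbcUrl, "UTF8"));
        return p;
    }

    public static void main(String[] args) {
        // Placeholder host, port and database alias.
        Properties p = hibernateProps("jdbc:firebirdsql://localhost:3050/trackstudio");
        System.out.println(p.getProperty("hibernate.connection.url"));
    }
}
```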

Yes, i thought of that and would try it.
However, the very fact that they do not document it shows that they do not know it !
They tell you to specify UTF8 when creating the database, but then they say that FB/JB are not unicode capable and can only run KOI8-R.
Remember what i told u about KOI8-R ? Connect it to the fact that they use Linux as main devel/support boxes.

I scanned their howto's and except for MySQL they never specify connection charset, they take it for granted.

I think that on a Russian Linux that would result in everything being transcoded to KOI8-R (though NONE, if i get it right, should disable any transcoding)
http://www.firebirdsql.org/refdocs/langrefupd21-notes-charset-none.html

That makes sense to me.

You may call them illiterate. They might say you are putting rocks on their track. You would both be kind of right, but the result would be alienation between FB and webservices apps.

Frankly, their documentation shows UTF8 enforced for MySQL. To me that somehow means that MySQL also has a similar gotcha, but they succeeded in informing developers of it. FB/JB did not. A communication breakdown that might result in a poor opinion about FB/JB.

> All JDBC drivers work similar to this. You can question the decision to
> use characterset NONE as the default, but apart from UTF8 it is the only
> sensible default.

Why, database default or application default seem reasonable to me too. And UCS-2/UCS-4, though probably not supported, could be reasonable too.

> Besides, UTF-8 has the downside that it results in
> significant protocol overhead which you usually don't need if you only
> use your local language.

The problem might begin when you have more than single representation for the language...

And Java is frequently used to make WWW services, which tend to be used from all the countries with all the languages

BTW, what is that overhead if database and connection both are UTF-8 ?

Are u saying that some characters take up to 4 bytes in UTF-8, so datastreams tend to inflate ? Or that there is a check that each character is a valid UTF-8 character ? or whatever ?

And what is the character set used by Java applications themselves ? UCS-2 ? or UTF-8 ? or... ?
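
The size part at least is easy to check in plain Java. A minimal sketch, nothing Firebird-specific: Java strings are sequences of 16-bit UTF-16 code units, and getBytes shows what the same text costs in another encoding. The class name is made up, and KOI8-R is only printed if the JRE happens to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSize {
    // Number of bytes the text occupies once encoded in the given charset.
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String latin = "hello";
        // Six Cyrillic letters, written as escapes so the source stays ASCII.
        String cyrillic = "\u043f\u0440\u0438\u0432\u0435\u0442";
        for (String s : new String[] {latin, cyrillic}) {
            System.out.println(s.length() + " chars:"
                    + " UTF-8=" + encodedLength(s, StandardCharsets.UTF_8) + " bytes,"
                    + " UTF-16=" + encodedLength(s, StandardCharsets.UTF_16BE) + " bytes");
        }
        // A single-byte national charset, if this JRE supports it.
        if (Charset.isSupported("KOI8-R")) {
            System.out.println("KOI8-R="
                    + encodedLength(cyrillic, Charset.forName("KOI8-R")) + " bytes");
        }
    }
}
```

So ASCII text costs the same in UTF-8 as in any single-byte charset, while Cyrillic text doubles. Whatever extra buffer or validation cost the server adds on top of that is not something this sketch can show.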

I just try to imagine which misunderstanding could happen.

TS -> Spring/Hibernate -> JDBC -> Jaybird -> Firebird

At each boundary there are expectations about which charset is used to encode the binary streams. If the expectations mismatch, things are broken.
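
A toy illustration of one such boundary, in plain Java with no driver involved: the writer encodes with one charset, the reader decodes with another. The class and method names are made up. Which charset a real driver uses when told NONE is an assumption to verify, not something this sketch proves.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BoundaryMismatch {
    // Simulates a single boundary: bytes are produced with the writer's
    // charset and interpreted with the reader's charset.
    static String crossBoundary(String text, Charset writer, Charset reader) {
        byte[] wire = text.getBytes(writer);
        return new String(wire, reader);
    }

    public static void main(String[] args) {
        // Four Cyrillic letters, written as escapes so the source stays ASCII.
        String original = "\u0442\u0435\u0441\u0442";

        // Both sides agree: the text survives.
        String same = crossBoundary(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("matching charsets intact: " + same.equals(original));

        // The sides disagree: every byte is decoded as a separate character.
        String broken = crossBoundary(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("mismatched charsets intact: " + broken.equals(original));
        System.out.println("length before=" + original.length()
                + " after=" + broken.length());
    }
}
```

The arrows that stay inside the JVM pass String objects around, so there is nothing to mismatch there. Only the arrow where strings become bytes can go wrong like this.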

But i think that all arrows, except for last, use JRE internal charset, so how could NONE damage anything ?

I can only see that happening if TS employed explicit charset conversion, believing that JB/FB are charset-illiterate and can be instructed to do so...

> No it actually isn't that involved at all (AFAIK). You wouldn't need to
> subclass datatypes.

Not subclass, but override, map, whatever u call it.
It just would not be enough for XWiki to overcome a single datatype, i believe