Subject Re: Trying to run TrackStudio :-)
Author the_a_rioch
> > True. That might ask for two-step connect.
> > Conect with NONE, read default charset. Reconnect.
>
> This is not going to happen, for local connections this might be fine,
> but for remote connections it already has some performance implications
> as it will take several roundtrips.

Is it common setup that Java app connects directly to FB server over long multi-hop network, rather than intranet LAN, without some intermediate middleware ?

May this behaviour be optional maybe?
For i saw double-connection in similar pattern for implementing user/passwords layer independent from server's one. Program probes for given credentials, if they suit it it reads the ROLE and server's credentials and reconnects.

In some setups that could make connections slow, but IMHO they are not typical, though of course we typically saw FBs in different niches.

> >> Also, the connection
> >> characterset does not have to match the database characterset at all
> >
> > Sure. If user overrode connection charset - just obey.
> > But if he omitted it, why not to choose best default ?
>
> To be honest, I am not entirely happy with the way it works now. But the
> thing is: using NONE as the connection characteret if none is specified
> is probably the only best default there is.

I think it is not, but this sub-branch seems to split again. I wrote in previous mail why not.
Up to the point that i tend to think that better alternative could be refusing to connect until charset specified, even in form ?encoding=GO-GUESS-I-APPROVE-IT

> >> (firebird server will translate between db charset and connection
> >> charset; if possible).
> >
> > But - if to believe Kuzmenko - not for BLOBs, only for VARCHARs.
> > Or - another thread at 2.0 times on russian Python forum - BLOB conversion was planned for some future, but not for released to that date servers.
> > Dunno if that still applies to FB 2.1.x or 2.5.x
>
> Actually, blobs are always sent in their defined characterset

with that characterset bundled, or driver should separately query "what charset it was supposed to be for column/blob-id x?")

> (a thing
> BTW which I believe Jaybird currently doesn't handle well when the blob
> characterset deviates from the connection characterset). The driver
> should take of conversion here, not Firebird.

Scanning through Jaybird_2_1_JDBC_driver_manual...
Out of line, that table of mapping between Java and FB types, could you add there numeric codes like that "-1" ?
Why for Javist it would be lame question, "use the source, Luke" for outsiders that would require an investigation "where and what to search for".

Frankly, i can't imagine how would you made enhanced Hibernate Fb Dialect, if max varchar can be 32K, 16K or 8K depending on charset and collation...

Ok, back to manual. The typo:
"This mainly applies to the cases when Cyrillic characters are stored in
the database, since each character would occupy two bytes of the network packet,
which in turn might cause additional roudtrips to the server to fetch data."
rouNdtrip

This probably can be alleviated by specifying larger packet size wither for TCP stack or for FB itself ?

Is it the only source of speed degradation ? I consider UTF 8 sanity check and transcoding if needed
1) neglectible for CPU is much faster than HDD
2) not needed - hence just skipped - if both connection and database are UTF8


After reading "Appendix D. Character Encodings" i think that current heuristics about NONE has just no sense at all.
There is UTF-16 charset in Java, there is some unknown charset in DB, and you introduce yet another, 3rd charset !
Okay, imagine that heavy-loaded FB server is running on Linux. Client desktops on Windows. Under this logic, there would be (assuming Russian as the language):

1) database language - unknown yet before connection. Valid guesses would be NONE, UTF8, Win1251 (if database was designed on local Delphi/IBExpert/etc Windows box, then moved to server), KOI8-R (if designed on Linux from start).
2) default OS charset (file.encoding) on server would be KOI8-R, unknown before connection
3) default OS charset (file.encoding) on client would be 1251
4) Java application internal charset would be UTF-16 (for Russian it would be the same as UCS-2, used by Windows API, if i get it right)
5) Jaybird's heuristics would ( from my POV - ignoring and overriding connection's NONE charset) assume connection charset charset of 3) which has high chances to differ from server's one.


I believe you had a rational and good thought of choice for defaults but i just cannot see any rationale behind that.
Sorry for being harsh, that is just how i really feel - completely lost on that subject.

(***************)

Now, frankly, if you want to "work to the rules", then NONE is NONE. No conversion. Yes, FB does not know about UTF-16. but... we told you that NONE is binary dump, you get what you (knowingly or unknowingly" asked for. The earlier it crash - the earlier you should know about it and fix it, when your DB is 1MB size, not 1GB size.

If u want to be flexible - assume UTF8. Those who wanna better network throughput, let them read "optimization HowTo" and fix charset in some OS-native flavor.

If u assumed UTF8 and on connection found it is legacy FB 1.x - cry warning and re-connect. Would user be bothered by 2-connecting, let him either upgrade server at last, or learn to specify connection.

Or at least, if you assumed some kind of it like KOI8 or 1251, but after connection you found that the guesswas wrong and the DB is in another connection - let it be UTF8 or non-Russian one - cry warning and reconnect. Better be slow, than crashing in best case or damaging data in worst.

(***************)

No, frankly, let's imagine i am just an average SysOp on some co-location hoster.
I know MySQL to its internals and every gotcha - just for the fact that it been using for years in LAMP.
I don't know what firebird is, i only know that some crazy client wants it when there are a lot of real servers like MySQL and PostgreSQL. But okay, he wants his fun for his mony so lets it be.
Does my BSD ports / Linux repo have a firebird package... it does. Okay, be it. Turnkey no-brainer installation complete.

Now user asks me to install some JSP web-app. Year, nop problem, did it for years.
( if u don;t like sysop/user, then assume some corporation and sysadmin/developer pair, that still have rather different experience domains)

Create database ? how should i... google... Oh, there is ISQL sample, copypaste to script. (most documentation shows ISQL as least common documentation. You saw it youself in TrackStudio docs. But can you tell me with 100% assuredness and without querying docs or anything, which SQL dialect and charset would use ISQL if just started and ordered to CREATE DATABASE without anything specified anywhere ? I can not. I would hope for dialect 3 and guess that charset NONE, but it only would be guesses. But me and you, we would fear uncertainty and check the manuals. Average Joe would "just do it" without even realisyng he had the choice. That is why GUI tools better, they show those textboxes and Joe starts to think "what should i put there? those boxes probably mean something". ISQL is not that hinting. )
So here we are, we created database with hopefully 3rd dialect and probably NONE charset (if only ISQL did not engaged some heuristics like Jaybird do. Then he would have KOI8-R)

Now user demands backing up. No problem, i read man for gbak, i do a simple scripts+webform for user and also put the script to cron.

Now user fetches the backup, modifies it locally, and uploads another backup. Let's assume he changed DB default charset.
Why ? maybe he noted ONE and decided to fix it. Maybe he recreated it from script with some desktop IDE like IBExpert. Maybe he now has customers from all the globe and needed to start supporting unicode...
He even might not have any FB experience, just got the recipe from internet and did it letter by letter. He may even not understand all implications and changed. He wished it to be turnkey "do this and that and you have it"

Now he restores that database and his web-app, perfectly running before... Either crashes or starts corrupting data.

SysOp does have no experience in FB nor JB.
User has no experience in Java and Linux.

How would they find they need to tweak JDBC URL, if it worked ever before and they did not had experience to put there encoding ever before ?

Outcome? SysOp tells him "i told you to use PostgreSQL( or MySQL). Everyone uses them. You think you're most smart - then go and fix it yourself". User has to agree "yeah. People know what they do... Help me to switch the server to reliable one. Would alert anyone if i saw he thinks to ever use FB". Fin.
(***************)

Also that appendix D in manual seems full of ambiguous wording to my eye.
"When the NONE character set is used, the driver does not know how to interpret the received data. "
Driver is just a intermediate bridge. So was it about data, it receives from server ? Or was it data it receives from JDBC/application ? And why it "does not know" if it can query the server while reading schema and it can assume UTF-16 on JDBC endpoint ?

cutting it here, it is getting too long :-)