Subject Re: Fw: [Firebird-Architect] Re: Character Set Support
Author Peter Jacobi
Hi Jim,

(Is it O.K. to CC this to the list?)

Fighting the charset/collation issues in FB can get very frustrating. It
has very advanced concepts and designs, but in several places "the last
10%" needed to make it actually work is missing. And most developers and
users just don't care. (For recent examples, see the newly filed bug where
path uppercasing messes up DBCS filenames, or Jim Starkey writing a
completely new configuration module that is totally unaware of charset
issues [folks, link to some XML parser and libiconv, and charset support
comes for free].) Sorry for ranting.

> > b) Is the only real option as far as I can tell, but in order to get
> > parameters to work, you'd have to make a patch similar to the one I
> > suggest. The question is whether this is safe or appropriate? The
> > heavy hitters seem to be fighting other battles at the moment, but
> > maybe I'll get a chance to ask Jim Starkey how he intended it to work
> > ... The only other alternative might be to try to define a new
> > character set that ignored all collations, but that seems to me to be
> > what NONE should have been.

I'm 50/50 on this issue. Perhaps it is safer to have a new charset
TRANSPARENT (or RAW). NONE is just insane and I assume it is best left
alone.

> > c) The internal buffers get allocated using the byte size of the
> > original column, so it is actually a very bad idea to generally
> > connect as UNICODE_FSS, unless you can guarantee that your data
> > representation in UNICODE_FSS never exceeds the size of the original,
> > or you don't use parameters and can supply a LOT of casts.

This is the #$%@! of UNICODE_FSS. Wading through the sources without
finding a way to fix this was the last straw that made me reconsider
whether I would really be able to contribute to the project.
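
To make the size problem in (c) concrete, here is a rough Python sketch.
It is not FB code. It assumes a single-byte column charset (ISO 8859-1 as
a stand-in), approximates UNICODE_FSS with Python's UTF-8 codec, and takes
from your description that the buffer is sized from the byte size of the
original column. The function names are made up for illustration.

```python
def byte_lengths(text, column_encoding="iso-8859-1"):
    """Return (bytes in the column's own charset, bytes when encoded as UTF-8)."""
    return len(text.encode(column_encoding)), len(text.encode("utf-8"))


def overflows(text, column_chars, column_encoding="iso-8859-1"):
    """True if the UTF-8 form of text needs more bytes than the column holds.

    Assumption: the buffer is sized like the original column, and for a
    single-byte charset that is one byte per declared character.
    """
    buffer_bytes = column_chars
    _, utf8_bytes = byte_lengths(text, column_encoding)
    return utf8_bytes > buffer_bytes


if __name__ == "__main__":
    # Each value fits a CHAR(6) column in the single-byte charset.
    for sample in ("Jacobi", "Müller"):
        native, utf8 = byte_lengths(sample)
        print(sample, native, utf8, overflows(sample, 6))
```

Pure ASCII values pass unchanged, but any value with non-ASCII characters
that fills its column no longer fits once it is transliterated.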

> > Anyway, thanks for taking the time to answer - so far only you and
> > Helen have had any suggestions ... hopefully my suggested patch is
> > acceptable (I've been running with it for a couple of months and not
> > seen any problems).

Whatever the old hands will accept (NONE or TRANSPARENT) will make FB I18n
less painful.

BTW: If only UNICODE_FSS or UTF16BE supported "all" collations, would it
fit your application to store the data in one of those? Or do you require
the space savings of a "best match" character set?

Regards,
Peter Jacobi
