Subject: Re: [IBO] UTF-8 handling
Author: Daniel Albuschat
Hello group,

as I am quite aware of the character-set problems and of how UTF-8 and
Unicode work, I'd like to give my two cents on this topic.

First, I'd like to note that I hated how Firebird did not provide any
Unicode-compatible character set, only the broken UNICODE_FSS. I'm
glad that Firebird 2.0 has addressed that issue, but unfortunately I
couldn't find the time to switch our code to FB 2.0. :-(

Using any 8-bit character set (that is, an encoding that extends
ASCII) basically makes your application broken for every country
other than the one you specifically developed it for.
Unfortunately, with Delphi you usually do write such broken
applications, since the 'standard' controls don't support Unicode.
It's extra work to add Unicode-aware controls to your Delphi
application.

So let's assume the most common scenario: the user uses IBObjects
with Unicode-unaware controls and direct data binding. In this case,
transliteration from UTF-8 to the local character set is definitely
needed. (We're only talking about UTF-8 databases here.) Having the
database in UTF-8 makes the app somewhat portable if it applies the
local computer's character set as the client character set of the
FB connection. As Stefan already mentioned, the fbclient API will
transliterate the incoming UTF-8 to the local character set. Hence,
the data will already arrive in the correct character set for the
data-bound controls, provided that the developer has set the client
character set correctly.

The advanced scenario is that the developer uses Unicode-aware
controls like TMS or TNT. In this case, he'll pass UTF-8 as the
client character set to the fbclient API, and hence no conversion is
necessary.

This means that IBO does not play a role in this game, because the
conversion is already done by the fbclient API.
There's one pitfall, though: if the fbclient API finds characters
that it cannot transliterate, it signals an error and the current
operation is aborted. This is sometimes undesired behaviour.
Sometimes you just want to replace unknown characters with a
replacement character, such as the question mark. If you are not 100%
sure that every UTF-8 string stored on the server can be
transliterated to your local character set (and you can virtually
NEVER be sure of that), you risk making your application unusable.
That's why, when I'm using Unicode-unaware controls with a Unicode
database, I would prefer IBO to do a 'soft' conversion with lenient
error handling.
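To make the strict-versus-soft distinction concrete, here is a minimal sketch in Python. It is only an illustration: Python's built-in codecs stand in for the fbclient/iconv conversion, `cp1252` is assumed as the client's local ANSI code page, and the function name `to_local` is hypothetical. A strict conversion raises on the first unmappable character (the "abort" case); a soft one substitutes '?' and carries on.

```python
# Sketch: strict vs. "soft" transliteration of UTF-8 text into a local
# 8-bit code page. Python codecs stand in for the fbclient/iconv step;
# to_local is a hypothetical helper, not part of IBO or fbclient.

def to_local(utf8_bytes: bytes, local_charset: str, soft: bool) -> bytes:
    """Convert UTF-8 bytes to local_charset.

    soft=False mimics the strict behaviour: an unmappable character raises
    UnicodeEncodeError and the whole operation is aborted.
    soft=True replaces every unmappable character with '?'.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(local_charset, errors="replace" if soft else "strict")


if __name__ == "__main__":
    # Polish city name: 'o-acute' exists in cp1252, 'L-stroke' and 'z-acute' do not.
    value = "Łódź".encode("utf-8")
    try:
        to_local(value, "cp1252", soft=False)
    except UnicodeEncodeError:
        print("strict conversion aborted")
    print(to_local(value, "cp1252", soft=True))  # b'?\xf3d?'
```

With the soft variant the user sees "?ód?" in a Unicode-unaware control instead of an exception, which is exactly the trade-off described above.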

This, in turn, leads me to the conclusion that Martijn's solution is
indeed the best. I'd imagine an event like this:

  OnTranslateCharacter(const AFromCharset, AToCharset: string; var AString: string);

For incoming data, AFromCharset is the server's character set and
AToCharset is the client's. For outgoing data, AFromCharset is the
client's and AToCharset is the server's.
You could argue that two separate events would be more efficient,
because you wouldn't need a string comparison to determine which kind
of conversion to make. I'd be totally fine with that, too.

You can simply plug the iconv library into this event and 'softly'
transliterate the characters. When using iconv, this is just one
function call.
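As a rough model of how such a single, direction-agnostic hook could work, the Python sketch below plays the role of an `OnTranslateCharacter` handler. Several things here are assumptions, not IBO's API: the Python function names, the small table mapping Firebird character set names (`UTF8`, `WIN1250`, `WIN1252`, `ISO8859_1`) to Python codec names, and the use of Python's codecs in place of iconv. Delphi's `var AString` parameter is modelled as a return value.

```python
# Sketch of the proposed OnTranslateCharacter hook, modelled in Python.
# Handler name, charset table and fetch/store helpers are hypothetical;
# Python codecs with errors="replace" stand in for a 'soft' iconv call.

# Map a few Firebird character set names to Python codec names.
FB_TO_PY = {
    "UTF8": "utf-8",
    "WIN1250": "cp1250",
    "WIN1252": "cp1252",
    "ISO8859_1": "latin-1",
}


def on_translate_character(from_charset: str, to_charset: str, value: bytes) -> bytes:
    """Softly transliterate value from from_charset to to_charset.

    Bytes that are invalid in the source encoding become U+FFFD, and
    characters missing from the target encoding become '?', so the
    conversion never aborts the current operation.
    """
    text = value.decode(FB_TO_PY[from_charset], errors="replace")
    return text.encode(FB_TO_PY[to_charset], errors="replace")


SERVER_CHARSET = "UTF8"     # character set of the database
CLIENT_CHARSET = "WIN1252"  # assumed local ANSI code page of the client


def fetch(raw: bytes) -> bytes:
    # Incoming data: from the server's charset to the client's.
    return on_translate_character(SERVER_CHARSET, CLIENT_CHARSET, raw)


def store(local: bytes) -> bytes:
    # Outgoing data: from the client's charset to the server's.
    return on_translate_character(CLIENT_CHARSET, SERVER_CHARSET, local)
```

Because the direction is carried entirely by the charset pair, one handler serves both paths; two separate events would merely avoid passing (and comparing) the charset names.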

Regards,

Daniel Albuschat

--
eat(this); // delicious suicide