Subject: Re: [IBO] UTF-8 handling
Author: Stefan Heymann
Daniel,

> So let's assume the most standard scenario: The user uses IBObjects
> with unicode-unaware controls and direct data-binding. In this case,
> the transliteration from UTF-8 to the local character set is
> definitely needed. (We're only talking UTF-8 databases here).

Yes, but it is not needed *in IBO*. fbclient.dll already does that
transliteration.

> Having the database in UTF-8 makes the app somewhat portable if it
> applies the local computer's character set as the client-character
> set to the fb-connection. As Stefan already mentioned, the
> fbclient-API will transliterate the incoming UTF-8 to the local
> character set.

Yes.

> Hence, the UTF-8 will already be the correct character set for the
> data-bound controls, provided that the developer has set it
> correctly.

IMHO this should be "Provided that the developer has set the Client
Character Set property correctly, fbclient.dll will already deliver
the data in the correct character set".

> The advanced scenario is that the developer uses unicode-aware
> controls like TMS or TNT. If this is the case, he'll provide the
> UTF-8 charset to the fbclient-API and hence there's no conversion
> necessary.

Yes! No conversion necessary (on IBO side).

> This means that IBO does not play a role in this game, because the
> conversion is already done by the fbclient-API.

No. The fbclient doesn't have to convert anything when the database is
in UTF-8 and the client connection is in UTF-8. However, I agree that
IBO does not play a role either in this case.

> There's one pitfall, though: If the fbclient-API finds characters
> that it was unable to transcode, it'll signal an error and the
> current operation will be aborted.

But only if the Client Character Set is not UTF8.
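
To illustrate the difference, here is a minimal Python sketch. It is
not fbclient or IBO code: Python's codecs merely stand in for the
transliteration step, and the function name, the sample string and the
cp1252 client character set are assumptions made for the example.

```python
# Hypothetical sample: the bytes a UTF-8 database might hand to the client.
stored = "Grüße, Ωmega".encode("utf-8")


def strict_transliterate(data: bytes, client_charset: str) -> bytes:
    """Decode UTF-8 and re-encode strictly for the client character set.

    Raises UnicodeEncodeError if any character has no equivalent in
    client_charset. This models an "abort on failure" policy only.
    """
    return data.decode("utf-8").encode(client_charset, errors="strict")


# Client character set is UTF-8: nothing can fail, the bytes pass through.
unchanged = strict_transliterate(stored, "utf-8")

# Client character set is cp1252: the Greek capital omega has no mapping,
# so the strict conversion raises instead of returning data.
try:
    strict_transliterate(stored, "cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

With a UTF-8 client character set the data comes back byte for byte;
only a narrower client character set can hit the failure case.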

> This is sometimes undesired behaviour. Sometimes you just want to
> replace unknown characters with a replacement-character, like the
> question mark. If you are not 100% sure that all UTF-8 strings that
> are stored in the server can be transliterated to your local
> character set -- and you can virtually NEVER be sure of that --
> you risk making your application unusable.

> That's why I would prefer, in the case that I'm using
> unicode-unaware controls and a unicode database, for IBO to do a
> 'soft' conversion with loose error handling.

I can understand what you mean. But having everything go through these
events would be quite time-consuming. However, I don't care as long as
I can have unmodified UTF-8 ;-)
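
For reference, the 'soft' conversion Daniel asks for could look
roughly like the following Python sketch. This is only an illustration
of the policy, not how IBO or fbclient implement anything; the function
name and the cp1252 target are assumptions for the example.

```python
def soft_transliterate(data: bytes, client_charset: str) -> bytes:
    """Convert UTF-8 bytes to client_charset without ever raising.

    Characters that the target character set cannot represent are
    replaced with a question mark instead of aborting the operation.
    Malformed UTF-8 input is tolerated the same way.
    """
    text = data.decode("utf-8", errors="replace")
    return text.encode(client_charset, errors="replace")


# Hypothetical sample: the omega cannot be represented in cp1252.
stored = "Grüße, Ωmega".encode("utf-8")
converted = soft_transliterate(stored, "cp1252")
```

The price of this leniency is silent data loss: once a character has
been turned into "?", writing the field back overwrites the original
text in the database.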


Best Regards

Stefan