Subject Re: [IBO] UTF-8 handling
Author Stefan Heymann
> If I don't use transliteration from the UTF-8 that comes from the
> client then there won't be any transliteration when the data is
> presented to the user in the local character set.

Yes. A user who selects UTF8 as the client character set IMHO *expects*
to get UTF8 from AsString, in the same way that he expects to get
Windows-1252 strings when he selects WIN1252 as the client character set.

> So, by default I am invoking the Delphi routine to do that
> transliteration from UTF-8 to the local characterset automatically.

I want to write a multi-language application. An application that can
deal with strings from English, German, Czech, Russian, Hebrew, etc.
For that, the normal Delphi VCL controls are not suitable anyway; I have to
use Unicode-aware controls like TNTWare Controls. So I set the Client
Character Set to UTF8 to be able to get full Unicode. The "local
characterset" is meaningless in such an application.

When I get UTF-8 from AsString, I translate it to UTF-16 (WideString)
and pass it on to the TNT controls (and vice versa). I want to store
UTF-8 strings inside my application because my Object/Relational
mapper code uses Strings and not WideStrings.
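
The round trip described above can be sketched at the byte level. This is only an illustration in Python, not Delphi and not IBO or TNT code; the function names utf8_to_utf16 and utf16_to_utf8 are made up for the example, and UTF-16 little-endian without a BOM is assumed because that is what a WideString holds on Windows.

```python
# Sketch: keep UTF-8 inside the application, convert to UTF-16 only at the
# boundary to the Unicode-aware controls, and convert back on the way in.
# The function names are hypothetical helpers, not part of any library.

def utf8_to_utf16(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16 little-endian (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 little-endian bytes and re-encode them as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Mixed-script sample: German, Czech, Russian and Hebrew in one string.
sample = "Grüße, Čeština, Русский, עברית".encode("utf-8")

wide = utf8_to_utf16(sample)       # what would be handed to the controls
round_trip = utf16_to_utf8(wide)   # what comes back into the application
```

Both encodings cover all of Unicode, so the conversion is lossless in either direction; that is the property relied on here.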

> How is this a problem for you and please help me get a better grip
> on what exactly you propose to do with the raw UTF-8 character data.

UTF-8 is not raw. It is a well-defined Unicode Transformation Format.
I know what it is, I can deal with it and I want to deal with it.
Anything else would cost processor cycles to translate it into
something else, and could lose information along the way.
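
The possible loss of information can be made concrete with a small sketch. It is written in Python purely for illustration and assumes Windows-1252 (cp1252) as the "local character set"; the helper to_local is hypothetical and only stands in for whatever routine performs the automatic transliteration.

```python
# Sketch: transliterating UTF-8 into a single-byte local code page.
# Assumption: the local character set is Windows-1252 (Python codec "cp1252").

def to_local(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert UTF-8 bytes to the local code page, replacing what cannot be mapped."""
    return data.decode("utf-8").encode(codepage, errors="replace")


text = "Čeština Русский עברית"
utf8_bytes = text.encode("utf-8")

local = to_local(utf8_bytes)          # bytes in the local code page
recovered = local.decode("cp1252")    # what the application can get back

# German text survives, because ü and ß exist in Windows-1252.
german = to_local("Grüße".encode("utf-8")).decode("cp1252")
```

Characters that have no slot in the target code page are replaced by a substitute character, and nothing in the resulting bytes allows the original text to be restored.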

Please correct me if I am wrong: I specify a Client Character Set in
the CharSet property of my IB_Connection. This is the character set
that is then used to interface with the Client Library (fbclient.dll).
The Client Library will transliterate everything that comes from the
database to that Client Character Set and will transliterate
everything that comes in for storage from the Client Character Set to
the character set of the specific column.

+------------+           +--------------+       +-----+  +-------+
| FB Service |--Network--| fbclient.dll |--API--| IBO |--| MyApp |
+------------+           +--------------+       +-----+  +-------+

When I use FieldByName ('xy').AsString, IBO will usually deliver
whatever it gets from the Client Library (after trimming rules and
OnGetString have been applied). So when I specify WIN1252 as the Client
Character Set, AsString will deliver a Windows-1252 string,
because that's what it got from fbclient.dll, even when the field is
stored as UTF8 in the database.
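
The model described above can be written down as a toy simulation. It is a Python sketch of the understanding stated in this message, not of the real fbclient.dll or IBO API: the functions fetch and store and the CODECS table are hypothetical, and where the transliteration really takes place is exactly the point still to be confirmed.

```python
# Toy model: data is transliterated between the column character set and the
# client character set on its way to and from the application.

# Hypothetical mapping from Firebird character set names to Python codecs.
CODECS = {"UTF8": "utf-8", "WIN1252": "cp1252"}


def fetch(stored: bytes, column_charset: str, client_charset: str) -> bytes:
    """Bytes the application receives for a value stored in the column charset."""
    text = stored.decode(CODECS[column_charset])
    return text.encode(CODECS[client_charset])


def store(incoming: bytes, client_charset: str, column_charset: str) -> bytes:
    """Bytes written to the column for a value sent in the client charset."""
    text = incoming.decode(CODECS[client_charset])
    return text.encode(CODECS[column_charset])


stored = "Müller".encode("utf-8")               # column defined as UTF8

as_win1252 = fetch(stored, "UTF8", "WIN1252")   # client charset WIN1252
as_utf8 = fetch(stored, "UTF8", "UTF8")         # client charset UTF8
```

In this model the same stored value arrives as different bytes depending only on the client character set, which is why AsString would be expected to return UTF-8 untouched when UTF8 is selected.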

Is that all correct? Or do I have a misunderstanding here?

Regards

Stefan