Subject Re: [IBO] UTF-8 handling
Author Andreas Hesse
>> If I don't use transliteration from the UTF-8 that comes from the
>> client then there won't be any transliteration when the data is
>> presented to the user in the local character set.
>
> Yes. A user that uses UTF8 as the client character set IMHO *expects*
> to get UTF8 from AsString. The same way he expects to get Windows-1252
> strings when he selects WIN1252 as the Client character set.
>
>> So, by default I am invoking the Delphi routine to do that
>> transliteration from UTF-8 to the local characterset automatically.
>
> I want to write a multi-language application. An application that can
> deal with strings from English, German, Czech, Russian, Hebrew, etc.
> For that, the normal Delphi VCL controls don't work anyway, so I have to
> use Unicode-aware controls like TNTWare Controls. So I set the Client
> Character Set to UTF8 to be able to get full Unicode. The "local
> characterset" is meaningless in such an application.
>
> When I get UTF-8 from AsString, I translate it to UTF-16 (WideString)
> and pass it on to the TNT controls (and vice versa). I want to store
> UTF-8 strings inside my application because my Object/Relational
> mapper code uses Strings and not WideStrings.
>
>> How is this a problem for you and please help me get a better grip
>> on what exactly you propose to do with the raw UTF-8 character data.
>
> UTF-8 is not raw. It is a well-defined Unicode Transformation Format.
> I know what it is, I can deal with it and I want to deal with it.
> Anything else would cost processor cycles to translate it into
> something else, and might lose information along the way.
>
> Please correct me if I am wrong: I specify a Client Character Set in
> the CharSet property of my IB_Connection. This is the character set
> that is then used to interface with the Client Library (fbclient.dll).
> The Client Library will transliterate everything that comes from the
> database to that Client Character Set and will transliterate
> everything that comes in for storage from the Client Character Set to
> the character set of the specific column.
>
> +------------+           +--------------+       +-----+  +-------+
> | FB Service |--Network--| fbclient.dll |--API--| IBO |--| MyApp |
> +------------+           +--------------+       +-----+  +-------+
>
> When I use FieldByName ('xy').AsString, IBO will usually deliver
> whatever it gets from the Client Library (after trimming rules and
> OnGetString have been applied). So when I specify WIN1252 as the Client
> Character Set, then AsString will deliver a Windows 1252 string,
> because that's what it got from fbclient.dll. Even when the field is
> stored as UTF8 in the database.
>
> Is that all correct? Or do I have a misunderstanding here?
>
> Regards
>
> Stefan
>

Here are my thoughts on this:

What about TIB_Column.AsWideString?

And if we need a UTF-8 string, we could add an AsUtf8 method to TIB_Column?
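
To make the difference between the three kinds of accessor concrete, here is a rough sketch. It is written in Python rather than Delphi so that it runs on its own, and the function names (as_utf8, as_wide_string, as_local_string) are hypothetical stand-ins for the accessors discussed in this thread, not IBO's actual API. It assumes the client library hands over a valid UTF-8 byte buffer.

```python
# Hypothetical stand-ins for the accessors discussed in this thread.
# They illustrate the encoding behaviour only and are not IBO's API.

def as_utf8(raw: bytes) -> bytes:
    """Hand the client-library buffer through untouched (an AsUtf8-style accessor)."""
    raw.decode("utf-8")  # validation only; raises UnicodeDecodeError on bad input
    return raw


def as_wide_string(raw: bytes) -> bytes:
    """Transcode UTF-8 to UTF-16LE, the encoding behind a Delphi WideString."""
    return raw.decode("utf-8").encode("utf-16-le")


def as_local_string(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Transliterate to a single-byte code page; unmappable characters become '?'."""
    return raw.decode("utf-8").encode(codepage, errors="replace")


# German plus Russian text, as it would arrive with a UTF8 client character set.
sample = "Grüße, Привет".encode("utf-8")

wide = as_wide_string(sample)
round_trip = wide.decode("utf-16-le").encode("utf-8")
local = as_local_string(sample)

print(round_trip == sample)    # True: UTF-8 <-> UTF-16 loses nothing
print(local.decode("cp1252"))  # Grüße, ??????
```

The UTF-8 and UTF-16 forms convert into each other without loss, while the detour through a local code page replaces every character it cannot represent. That is the information loss Stefan is worried about.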

Andreas