ibobjects - Re: [IBO] IBO - IB_String vs AnsiString

Subject	Re: [IBO] IBO - IB_String vs AnsiString
Author	m. Th.
Post date	2009-04-09T06:05:26Z

Jason Wharton wrote:

>> So, I haven't looked at the new IBO yet, but is a "sql statement" now
>> fully unicode? Or AnsiString?
>>
>
> The API doesn't appear to allow unicode strings yet. If it does I am not
> aware of it.
> My guess is they will have new API calls that are unicode compliant rather
> than making the existing AnsiString based calls switch over.
>
> Jason

Nope. See below a message pasted from the firebird-devel mailing list.
The thread is called 'Unicode API planned?'. The one who gives the
answer (Adriano dos Santos Fernandes) is the 'chief in charge' on the
Firebird team WRT internationalization / collation issues:

Greg Strauss wrote:

> > Hi all,
> >
> > We are writing a Windows application that will work against the
> > Firebird API. Unfortunately, the current incarnation (i.e. 2.0.1) of
> > the API does not support Unicode (or wide character) interfaces in the
> > API; all strings passed in are via char* interfaces. This makes
> > working within our application a bit of a challenge, since all strings
> > must be converted down to UTF-8 (since we're using the UTF-8 character
> > set based on various recommendations on the Firebird site and in the
> > forums).
> >
> > Has any consideration been given to exposing an API which supports a
> > wide character interface for its clients? If the API supported wide
> > characters and the conversion to/from UTF-8 from/to wide characters
> > would be done behind the API... that would be a huge help. I don't
> > mind using a UTF-8 character set in the database, except that working
> > with UTF-8 in C++ is a pain; the later the conversion to/from is done
> > the better :) . Implementing a fully functional Unicode character set
> > in the database would also be nice, but the wide-char API
> > implementation would be needed first to avoid an inordinate number of
> > string conversions from wide-character applications.
>

This seems to be very difficult work for not much gain, sorry.
Most applications that would benefit from it are C++ Win32 applications
using the wide Windows API.

Adriano

----8<-------------8<----------8<-------------8<----------8<-------------8<------

Also, see below how Unicode is handled in Firebird. The answer is
given by Milan Babuskov, the head of the FlameRobin project, the
'official' cross-platform admin tool for Firebird. It's a message pasted
(again) from the firebird-devel mailing list:

zedalaye@... wrote:

> > - How does Firebird handle Unicode? I.e., will Firebird run some
> > conversions between data submitted by client applications and data
> > stored in the database file? Or will it store the data "straight to
> > the disk" and use charset/collation definitions only to sort the data
> > at query time?
>

Each table column has its charset. Firebird translates between that
charset and connection charset all the time.
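
To picture that translation, here is a minimal sketch in Python. It is not
Firebird code; it only assumes a column stored as ISO8859_1 and a client
connected with UTF8, and the helper name `transliterate` is hypothetical:

```python
def transliterate(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode bytes from one charset to another by going through Unicode."""
    return data.decode(from_charset).encode(to_charset)

# U+00E9 (e with acute accent) stored in an ISO8859_1 column is one byte...
stored = "\u00e9".encode("latin-1")
# ...but a client using a UTF8 connection charset receives two bytes.
on_the_wire = transliterate(stored, "latin-1", "utf-8")
print(stored, on_the_wire)
```

The same data therefore has a different byte length depending on the
connection charset, which is why the client has to know which one it asked for.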

> > - Will Firebird be able to deal with Delphi 2009 Unicode strings where
> > each character is 2 bytes long?
> >
> > - Do things depend on the connection charset?
>

Yes. My suggestion is you use UTF-8 as character set and then convert
everything to/from UTF-16 or whatever Delphi is using.
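
A rough sketch of that round trip, in Python for brevity. It assumes the
application holds its text as UTF-16LE (as Windows wide strings do) and that
the connection charset is UTF8; both function names are hypothetical:

```python
def to_connection_bytes(text_utf16: bytes) -> bytes:
    """UTF-16LE bytes held by the application -> UTF-8 bytes for the API."""
    return text_utf16.decode("utf-16-le").encode("utf-8")

def from_connection_bytes(data_utf8: bytes) -> bytes:
    """UTF-8 bytes returned by the API -> UTF-16LE bytes for the application."""
    return data_utf8.decode("utf-8").encode("utf-16-le")

# A 5-character word with two non-ASCII letters (u-umlaut and sharp s).
wide = "Gr\u00fc\u00dfe".encode("utf-16-le")
narrow = to_connection_bytes(wide)
print(len(wide), len(narrow))
```

Doing the conversion in one place, right at the boundary with the database
layer, keeps the rest of the application working with its native string type.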

> > - Why is UNICODE_FSS 3 bytes per character in length and UTF8 is 4?
> > (according to RDB$CHARACTER_SETS)
>

Because they use the 'max. possible size'. Some characters in UTF8 are 4
bytes.
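
A quick way to see the variable width of UTF-8 is to encode a few sample
characters (Python sketch; the sample characters and the helper
`max_buffer_bytes` are only illustrations):

```python
# One sample character for each possible UTF-8 width.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)  # [1, 2, 3, 4]

def max_buffer_bytes(n_chars: int, bytes_per_char: int) -> int:
    """Worst-case buffer size for a column declared with n_chars characters."""
    return n_chars * bytes_per_char

print(max_buffer_bytes(2, 4))  # 8
```

A charset capped at 3 bytes per character cannot encode anything above
U+FFFF, since those code points need 4 bytes in UTF-8.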

> > - Does the sqlvar.sqllen member returned by the describe functions
> > take the connection charset, field charset or other parameters into
> > account?
>

Not really. It gives you the buffer length. You also get the character
set ID, so you need to use the value from the system table and divide
sqllen by the number of bytes per character to get the actual number of
characters in a string. How this works:

1. your application reads sqllen and allocates a buffer of that size

2. you read in the data (so, a string 'AB' in UTF8 will be returned as
'AB      ', padded with spaces to the 8-byte buffer)

3. you read the bytes-per-character info from a system table and divide
sqllen by it (8/4 = 2), and then truncate the string to that many
characters.

In FlameRobin we cache the bytes-per-character info just after the user
connects to the database, so we don't have to read it each time.
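
The three steps plus the cache can be sketched like this (Python, not
FlameRobin's actual code). The function name, the cache layout and the
charset ids used as keys are assumptions made for the example; in a real
client the cache would be filled once per connection from
RDB$CHARACTER_SETS:

```python
# Illustrative cache: character set id -> max bytes per character.
# Assumed to be loaded once after connecting, e.g. from
#   SELECT RDB$CHARACTER_SET_ID, RDB$BYTES_PER_CHARACTER
#   FROM RDB$CHARACTER_SETS
# The ids below (4 for UTF8, 3 for UNICODE_FSS) are assumptions.
BYTES_PER_CHAR = {4: 4, 3: 3}

def trim_char_field(buffer: bytes, sqllen: int, charset_id: int) -> str:
    """Cut a space-padded UTF-8 CHAR buffer down to its declared length."""
    max_chars = sqllen // BYTES_PER_CHAR[charset_id]  # e.g. 8 // 4 = 2
    text = buffer[:sqllen].decode("utf-8")            # still padded
    return text[:max_chars]                           # truncate by characters

# 'AB' arrives padded with spaces to the 8-byte buffer.
ascii_value = trim_char_field(b"AB" + b" " * 6, 8, 4)

# Two 2-byte characters fill 4 bytes, so only 4 bytes of padding remain.
accented_raw = "\u00e9\u00e9".encode("utf-8") + b" " * 4
accented_value = trim_char_field(accented_raw, 8, 4)
print(repr(ascii_value), repr(accented_value))
```

Note that the truncation counts characters, not bytes: the second value
keeps both accented letters even though they occupy four bytes of the buffer.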

HTH

-- Milan Babuskov http://www.flamerobin.org http://www.guacosoft.com

------8<-------------8<-------------8<-------------8<-------------8<-------

...Also, if you have any questions WRT Unicode issues in Delphi perhaps
it's better to post them here and/or on the EMBT newsgroups
(forums.embarcadero.com - there is also an NNTP interface). Also,
you can read Marco Cantu's book 'Delphi 2009 Handbook' (highly
recommended reading) available free of charge as a PDF download for
registered users of D2009 (very good move, btw). For Firebird Unicode
issues you can also post on the Firebird mailing lists, of course.

HTH,

m. Th.