Subject Re: [firebird-support] Re: Firebird and Unicode queries
Author Helen Borrie
At 12:49 PM 10/02/2005 +0000, you wrote:

>Helen Borrie wrote:
>
> >At 02:19 AM 10/02/2005 +0000, you wrote:
> >
> >
> >
> >>But what encoding form does UNICODE_FSS use, or rather what encoding
> >>should I pass to DSQL API functions, UTF-8? That can take up to 6 bytes!
> >>
> >>
> >
> >Don't pass unicode at all. Pass characters - whatever the user's keyboard
> >mapping throws at the client. The client's lc_ctype setting will take care
> >of the UNICODE_FSS conversion if it and the database are set up for it. If
> >not, i.e. the database column is UNICODE_FSS but the client and database
> >are not, you face extra work (introducers or casting) to pass the locale
> >strings in such a way that the engine is able to do that bridge work.
> >
> >
>
>So if I specify lc_ctype as "UTF-8" I'm safe? (assuming I want to pass
>UTF-8 form data of course)

No!! The lc_ctype should always match the default character set of the
database. Its purpose is to interpret the incoming characters for the
database. The only valid entries for lc_ctype are character set names (or
aliases) that the database knows about. In turn, the database will only
know about character sets that were known to the server when the database
was created.

So, if you choose to set the default character set to be NONE, you must use
introducer syntax or CAST with any input you need to store in a UNICODE_FSS
column, in order to reproduce the bridging work that the client's lc_ctype
does in concert with a database whose default charset matches that
lc_ctype. This also applies to any search conditions on those
columns. (I'm getting frustrated here, trying to find a way to SAY
this! If you STILL think it means that lc_ctype can be any old thing you
choose, then I'm not getting through...)
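
For illustration only, here is a rough sketch in Python of what the two
forms look like when the statement text is put together on the client side.
The table MYTABLE, the column FSS_COL and the length of 40 are made-up
names, not anything from a real database, and the helper functions are
hypothetical as well. It only builds strings; it does not connect to anything.

```python
# Sketch only: builds statement text, does not talk to any database.
# MYTABLE, FSS_COL and the length 40 are hypothetical, for illustration.

def sql_quote(text: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def with_introducer(text: str, charset: str = "UNICODE_FSS") -> str:
    """Introducer form: underscore + character set name before the literal."""
    return "_" + charset + " " + sql_quote(text)

def with_cast(text: str, length: int = 40, charset: str = "UNICODE_FSS") -> str:
    """CAST form: name the target character set explicitly."""
    return "CAST({} AS VARCHAR({}) CHARACTER SET {})".format(
        sql_quote(text), length, charset
    )

# Storing into the UNICODE_FSS column
insert_sql = (
    "INSERT INTO MYTABLE (FSS_COL) VALUES (" + with_introducer("Borrie") + ")"
)

# Searching on the same column needs the same treatment
search_sql = "SELECT * FROM MYTABLE WHERE FSS_COL = " + with_cast("O'Brien")

print(insert_sql)
print(search_sql)
```

Either form should do for both storing and searching; the point is only
that the statement itself has to name the character set when the default
charset and lc_ctype are not doing that job.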

You don't have to tell the client what it is getting from the UI. It knows
that. You only have to tell it what to store or search for, and only if it
is storing or searching columns that are not in the default character set.

As for 6-byte characters, you *will* have problems with them if UNICODE_FSS
can't map them to a 3-byte sequence it recognises. Which particular 6-byte
codes did you have in mind?
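
To put some numbers on the byte counts, here is a small Python sketch that
prints how many bytes a few sample characters take in UTF-8. The sample
characters are arbitrary picks for illustration. Anything up to U+FFFF fits
in 3 bytes or fewer, and UTF-8 as currently defined tops out at 4 bytes;
the 5- and 6-byte forms of the original design were never assigned to any
character.

```python
# Sample characters are arbitrary picks for illustration.
samples = [
    ("LATIN CAPITAL LETTER A", "\u0041"),
    ("LATIN SMALL LETTER E WITH ACUTE", "\u00e9"),
    ("EURO SIGN", "\u20ac"),
    ("MUSICAL SYMBOL G CLEF", "\U0001d11e"),  # outside the U+0000..U+FFFF range
]

def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for name, ch in samples:
    print("U+{:04X} {:<34} {} byte(s)".format(ord(ch), name, utf8_length(ch)))
```

The first three samples come out at 1, 2 and 3 bytes. The G clef needs 4,
which is the kind of character that will not fit a 3-byte limit.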

./hb