Subject | Re: [Firebird-Architect] The Wolf on Firebird 3 |
---|---|
Author | Olivier Mascia |
Post date | 2005-11-15T17:55:36Z |
On 04-Nov-05 at 07:18, Geoff Worboys wrote:

> The (possible) benefit is that interactions with the API may
> be able to use UTF-16 data directly in some cases (const ptrs
> to output functions), and with simple memcpy in other cases.
> No translation required.
>
> OTOH UTF-8 is likely to require translation to UTF-16 in most
> common client use. ie. To read database info into Qt it will
> need to be translated. To pass to the Windows API it will
> need to be translated etc etc etc.
>
> That is the benefit that I see, but there are costs (especially
> with incompatible client/server endianness). I do not have the
> experience to guess which way the scales may balance.

I might be horribly off with this answer, regarding what Jim said,
thinks or will add, but to me there is a clear distinction between
what happens 'inside' and what happens on the 'interfaces'.

Storing utf-8 and using utf-8 (or utf-32) everywhere inside the
'borders' to simplify and streamline a lot of things is one thing,
which I support 100% from my 'not-touching-the-core-but-reading-it'
developer point of view.

On the 'interfaces' (which I define as the few links between
something using the 'system' and the 'system' itself), multiple
charsets must still be supported. I have a vision of them as a
per-connection characteristic, maybe with default mappings
specified at the DDL level. Declaring a column CHARACTER SET
'this or that' would only express that I intend, by default, to
talk to the database system in that charset. From the application,
through whatever API, I give strings in that charset and expect to
receive strings in that charset. They would be translated to utf-8
(or whatever the internal encoding is) at API I/O time, and
certainly before transport to the server in remote configurations.
They become something more generic as soon as they enter the
'system', and the system returns them to me in my preferred
encoding on output. Ideally, some mechanism (I suppose such
mechanisms exist at the SQL standard level) would allow me to
sometimes retrieve columns in another charset than the one
declared in the DDL. Declaring my columns UTF8 would simply mean
that no translation is ever applied; in that one case it would be
WYSIWYG (what you _store_ is what you get).
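
To make the idea concrete, here is a tiny sketch of what I have in
mind, nothing more. None of these names come from the code base; I
invented Connection, toInternal and toClient, and I picked LATIN1
only because its translation fits in a few lines:

```cpp
// A minimal sketch only -- invented names, not Firebird code.
// The point: UTF-8 inside, the connection declares the charset it
// talks, and translation happens once, at the API boundary.
#include <stdexcept>
#include <string>

enum class Charset { UTF8, LATIN1 };       // two charsets keep it short

// ISO-8859-1 -> internal UTF-8 (each byte becomes 1 or 2 UTF-8 bytes)
static std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// internal UTF-8 -> ISO-8859-1 (fails when the text is not representable)
static std::string utf8ToLatin1(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out += static_cast<char>(c);
            ++i;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < in.size()) {
            unsigned cp = ((c & 0x1Fu) << 6)
                        | (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            if (cp > 0xFF)
                throw std::runtime_error("not representable in LATIN1");
            out += static_cast<char>(cp);
            i += 2;
        } else {
            throw std::runtime_error("not representable in LATIN1");
        }
    }
    return out;
}

struct Connection {
    Charset clientCharset = Charset::UTF8;  // the per-connection characteristic

    // Application input is translated once on its way in.
    std::string toInternal(const std::string& s) const {
        return clientCharset == Charset::UTF8 ? s : latin1ToUtf8(s);
    }

    // Engine output is translated once on its way out.  A UTF8 column
    // read over a UTF8 connection is the WYSIWYG case: no translation.
    std::string toClient(const std::string& internalUtf8) const {
        return clientCharset == Charset::UTF8 ? internalUtf8
                                              : utf8ToLatin1(internalUtf8);
    }
};
```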

Yes, if you declare wanting to read and write only utf-16, it
would imply some translation at the API level to/from utf-8/utf-16.
Done well, I have no fears at all about this translation. Worse
things happen millions of times a day in the average Windows
program, more often than necessary written against the ANSI
entry points instead of the Unicode entry points.
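
And just to show what that utf-8/utf-16 translation amounts to,
here is an illustrative sketch (again nothing from the code base,
and strict validity checks such as overlong sequences or encoded
surrogates are left out):

```cpp
// Sketch only: the plain utf-8 -> utf-16 translation an API layer
// would run once per value on a connection declaring utf-16.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<char16_t> utf8ToUtf16(const std::string& in)
{
    std::vector<char16_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = in[i];
        std::uint32_t cp;
        std::size_t len;
        if      (b < 0x80)           { cp = b;        len = 1; }  // ASCII
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::runtime_error("bad utf-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated utf-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = in[i + k];
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("bad utf-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += len;

        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));      // one utf-16 unit
        } else {                                           // surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}
```

The reverse direction is just as mechanical, and the cost is
linear in the length of the value, which is why the translation
itself does not worry me.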

Doesn't this make sense?

--
Olivier