firebird-architect - Re: [Firebird-Architect] Writing UTF16 to the database

Subject	Re: [Firebird-Architect] Writing UTF16 to the database
Author	Olivier Mascia
Post date	2005-03-01T06:46:09Z

On 01-Mar-05, at 03:56, Adriano dos Santos Fernandes wrote:

> 1) Slow (because conversions will always be made).

Do you happen to use Windows NT, Windows 2000 or Windows XP?
The whole API is based on Unicode (a version of it limited to 16 bits).
All the API entry points of these OSes that take non-Unicode strings are
wrappers that convert the parameters, call the Unicode version, and
convert the results back before returning to the caller. The vast
majority of end-user applications don't talk to the Win32 API in Unicode
natively, but in some ANSI code page. Nobody ever complained it was
*that* slow.
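
To make the wrapper pattern concrete, here is a minimal sketch in Python. The names to_upper_w and to_upper_a are hypothetical and only mimic the W/A naming convention; this is not the Win32 API itself, and the cp1252 default is an assumption for the example.

```python
# Hypothetical illustration of the "ANSI wrapper around a Unicode core" pattern.
# Neither function is a real Win32 entry point.

def to_upper_w(text: str) -> str:
    """'Wide' entry point: works natively on Unicode text."""
    return text.upper()


def to_upper_a(data: bytes, codepage: str = "cp1252") -> bytes:
    """'ANSI' entry point: convert in, call the Unicode version, convert back."""
    wide_in = data.decode(codepage)      # code page -> Unicode
    wide_out = to_upper_w(wide_in)       # the real work happens in Unicode
    return wide_out.encode(codepage)     # Unicode -> code page


if __name__ == "__main__":
    # b"caf\xe9" is "café" in code page 1252
    print(to_upper_a(b"caf\xe9"))
```

The cost of the narrow call is one decode and one encode per string, on top of the work the Unicode version does anyway.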

> 2) Break all UDFs that use strings.
> 3) Break all external tables.

Not necessarily. Conversions might be applied, for backward
compatibility, unless otherwise specified.
True, the conversions might need to be available engine-side. They
could just as well be implemented in a logical sub-system shared between
the engine and other parts (the 'client').
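
As a rough sketch of what "conversions applied unless otherwise specified" could look like, assume a hypothetical Column class that always stores UTF-16 internally and converts at the boundary. The class, its methods and the ISO-8859-1 legacy default are all assumptions for illustration, not anything from the Firebird code base.

```python
class Column:
    """Hypothetical column that always stores text as UTF-16LE internally."""

    # Assumed default for callers (UDFs, external tables) that never
    # declared a charset; a real system would take this from metadata.
    LEGACY_CHARSET = "iso-8859-1"

    def __init__(self) -> None:
        self._rows: list[bytes] = []

    def write(self, data: bytes, client_charset: str = LEGACY_CHARSET) -> None:
        # Convert on the way in: client charset -> internal UTF-16.
        self._rows.append(data.decode(client_charset).encode("utf-16-le"))

    def read(self, index: int, client_charset: str = LEGACY_CHARSET) -> bytes:
        # Convert on the way out: internal UTF-16 -> whatever the caller asked for.
        return self._rows[index].decode("utf-16-le").encode(client_charset)


if __name__ == "__main__":
    col = Column()
    col.write(b"na\xefve")            # legacy caller, Latin-1 bytes for "naïve"
    print(col.read(0))                # the legacy caller gets the same bytes back
    print(col.read(0, "utf-8"))       # a newer caller asks for UTF-8 instead
```

A legacy caller sees no change in the bytes it exchanges, while a caller that names a charset gets the same stored value in that charset.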

> 4) Don't simplify the engine because conversions between charsets is
> done like conversions between numbers and strings.

Huh?

> 5) Waste of disk space and memory.

Marginal, compared to the benefit of having a truly global database
system.
Many applications are still happy today handling one or two national
character sets, but a growing number of them need to go global. Going
global has a number of challenges for an application: UI changes,
locale/facet/culture changes and many other details. One way to
simplify the process is to at least work internally with a unified
charset; otherwise it can become a nightmare to code. When using a
unified charset, the least one expects is to be able to store it in
the underlying database system. The proposal would translate on the
way in and out, for compatibility with processes asking for a specific
charset. But its value comes more from the fact that new, truly global
charsets can also be used. A field, or a whole database, could be
declared UTF-8 by an application happy to work exclusively with such
a global solution. The current UNICODE_FSS is a mess for such
wanting-to-be-global applications, due to its implementation quirks,
while the support for multiple national charsets inside the engine
complicates things. A simplification that brings a better global
solution while keeping compatibility with all the existing
implementations is appealing; you will probably agree on this.
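
On the disk-space point (5), a quick way to see the size of the overhead is to count bytes for the same text in a national charset, UTF-8 and UTF-16. This is only a sketch: encoded_sizes is a hypothetical helper, and the sample strings are arbitrary, not measurements of any real database.

```python
def encoded_sizes(text: str, charsets: list[str]) -> dict[str, int]:
    """Return the number of bytes needed to store text in each charset."""
    return {name: len(text.encode(name)) for name in charsets}


if __name__ == "__main__":
    samples = [
        ("hello", ["iso-8859-1", "utf-8", "utf-16-le"]),
        ("Ελληνικά", ["iso-8859-7", "utf-8", "utf-16-le"]),
        ("日本語", ["utf-8", "utf-16-le"]),
    ]
    for text, charsets in samples:
        print(text, encoded_sizes(text, charsets))
```

For plain ASCII text, UTF-16 doubles the size and UTF-8 costs nothing; for Greek, both Unicode encodings double the single-byte national charset; for Japanese, UTF-16 is the more compact of the two.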

Anyway, thanks for challenging the idea. It's critically important to
challenge such ideas.

--
Olivier Mascia