Subject: Re: [Firebird-Architect] UTF-8 Everywhere
Author: Ann Harrison
Jason Wharton <supportlist@...> wrote:


> I'm tending to think that adopting a UTF8-only approach is a step backward
> ...
> As for everyone else still dealing with Windows
> wide strings, codepages, etc., this simply imposes a potentially major
> rewrite of their applications to conform to this new requirement. Legacy
> support is always a factor to consider.

I hope that the proposal was for UTF8 internally (storage, sorting, manipulation),
with transformation to and from the declared character set on output and input.
There should be no changes to applications. Having a single internal character
representation simplifies comparisons and, more importantly, greatly reduces
the number of collations: one per desired character sequence rather than one
for each character set that can express the language.
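A minimal sketch of the boundary transformation described above, assuming each client connection declares a single character set. The functions `to_internal` and `to_client` are hypothetical names for illustration, not Firebird APIs, and Python's codec names `cp1252` and `latin-1` stand in for WIN1252 and ISO8859_1:

```python
# Hypothetical sketch: store everything as UTF-8, transliterate at the edges.

def to_internal(data: bytes, declared_charset: str) -> bytes:
    """Input: convert client bytes from the declared charset to UTF-8."""
    return data.decode(declared_charset).encode("utf-8")

def to_client(stored: bytes, declared_charset: str) -> bytes:
    """Output: convert stored UTF-8 back to the client's declared charset."""
    return stored.decode("utf-8").encode(declared_charset)

# Two clients send the same name, each in its own declared character set.
win_client = "Müller".encode("cp1252")
latin_client = "Müller".encode("latin-1")

stored_a = to_internal(win_client, "cp1252")
stored_b = to_internal(latin_client, "latin-1")

# Internally the values are identical, so one comparison/collation serves both.
print(stored_a == stored_b)  # True

# Each client receives exactly the bytes it would expect in its own charset.
print(to_client(stored_a, "cp1252") == win_client)  # True
```

The application sees only its declared character set. A character with no equivalent in that set would make `to_client` raise `UnicodeEncodeError` here; a real server would need an explicit error or substitution policy.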

> If we want to talk about a step forward in flexibility, I suggest you
> consider adding a universal string where you can have each record
> indicate what charset is being stored. This would allow any of the
> registered charsets to be stored on a per-record basis.


Arrg!  What possible difference does it make to the user how a character
is stored as long as it arrives at the application in the desired format and
order?  If your goal is to have a world-wide phone book, UTF8 is the only
way to go.  Checking each record (why not each field?) to see how to
interpret its strings will just slow every character operation and introduce
bugs.

Moreover, Firebird currently relies on preallocated record buffers: records
are expanded from the compressed storage format into these buffers for
comparisons and other manipulation. If the record format is unknown at
request compilation time, every buffer would have to be allocated at the
maximum possible size.
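A rough illustration of the buffer-sizing point, assuming buffers are sized as declared length times maximum bytes per character. The per-charset widths in the table are illustrative assumptions, and the helper functions are hypothetical:

```python
# Illustrative maximum bytes per character (assumed values for this sketch).
MAX_BYTES_PER_CHAR = {
    "ISO8859_1": 1,
    "WIN1252": 1,
    "SJIS_0208": 2,
    "UTF8": 4,
}

def buffer_size_known(char_length: int, charset: str) -> int:
    """Charset fixed at request compilation: size for that charset only."""
    return char_length * MAX_BYTES_PER_CHAR[charset]

def buffer_size_unknown(char_length: int) -> int:
    """Charset chosen per record: must assume the widest registered charset."""
    return char_length * max(MAX_BYTES_PER_CHAR.values())

# A 100-character column declared ISO8859_1 vs. one whose charset varies per record.
print(buffer_size_known(100, "ISO8859_1"))  # 100
print(buffer_size_unknown(100))             # 400
```

When the format is known at compile time, each buffer is sized for its declared charset. When the charset can vary from record to record, every buffer pays for the worst case.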

Best regards,


Ann