Subject: RE: [Firebird-Architect] UTF-8 Everywhere
Author: IBO Support List
Ann,
 
You brought out a key distinction that I missed.  My apologies for the confusion.
 
Going with UTF8 uniformly for internal storage and buffering is certainly something I wouldn't discourage, assuming the external interfaces will all still be supported, via transliteration etc., for legacy API compatibility.
 
Having a strictly UTF8 core could radically simplify many things. I see it as simply a matter of where you want to push all of the complexity. I'd like to hear more about what all the trade-offs would be.
 
Jason Wharton


From: Firebird-Architect@yahoogroups.com [mailto:Firebird-Architect@yahoogroups.com] On Behalf Of Ann Harrison
Sent: Friday, January 17, 2014 1:58 PM
To: Firebird-Architect@yahoogroups.com
Subject: Re: [Firebird-Architect] UTF-8 Everywhere

Jason Wharton <supportlist@...> wrote:


I'm tending to think that adopting a UTF8 only approach is a step backward
...
As for everyone else still dealing with windows
wide strings, codepages, etc., this simply imposes a potentially major
rewrite of their applications to conform to this new requirement. Legacy
support is always a factor to consider.

I hope that the proposal was for UTF8 internally - storage, sorting, manipulation -
with transformation to and from the declared character set on output and input.
There should be no changes to applications. Having a single internal character
representation simplifies comparisons and, more important, greatly reduces
the number of collations - one per desired character sequence rather than one
for each character set that expresses the language.
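
To illustrate the boundary-transliteration idea, here is a minimal Python sketch. It is not Firebird code: the function names to_internal and to_client are hypothetical, and Python's built-in codecs stand in for the engine's character set definitions. It shows two clients with different declared character sets sending the same text as different bytes, the text being stored identically as UTF-8, and each client getting back its own encoding.

```python
# Illustrative sketch only; names are hypothetical, not Firebird's API.

def to_internal(client_bytes: bytes, declared_charset: str) -> bytes:
    """Transliterate input from the client's declared charset to UTF-8."""
    return client_bytes.decode(declared_charset).encode("utf-8")


def to_client(stored: bytes, declared_charset: str) -> bytes:
    """Transliterate stored UTF-8 back to the client's declared charset."""
    return stored.decode("utf-8").encode(declared_charset)


text = "10 €"

# The euro sign has a different byte value in these two charsets.
cp1252_input = text.encode("cp1252")
latin9_input = text.encode("iso8859-15")

# Internally both inputs become the same UTF-8 byte sequence, so a
# single comparison (and a single collation) covers both clients.
stored_a = to_internal(cp1252_input, "cp1252")
stored_b = to_internal(latin9_input, "iso8859-15")
same_internally = stored_a == stored_b

# On output, each client receives the bytes it originally sent.
back_to_cp1252 = to_client(stored_a, "cp1252")
back_to_latin9 = to_client(stored_b, "iso8859-15")
```

The applications never see the internal form, which is why no application changes would be needed under this assumption.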

If we want to talk about a step forward in flexibility, I suggest you
consider adding in a universal string where you can have each record
indicate what charset is being stored. This would allow any of the
registered charsets to be stored on a per-record basis. 


Arrg!  What possible difference does it make to the user how a character
is stored as long as it arrives at the application in the desired format and
order?  If your goal is to have a world-wide phone book, UTF8 is the only
way to go.  Checking each record (why not each field?) to see how to
interpret its strings will just slow every character operation and introduce
bugs.

Moreover, at the moment, Firebird relies on preallocated record buffers
for transfers from the compressed storage format for comparisons and
other manipulation. If the record format is unknown at request compilation
time, all buffers would need to be allocated at the maximum possible
size.
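
The buffer-size argument can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Firebird actually computes record formats: the table MAX_BYTES_PER_CHAR and the function field_buffer_size are hypothetical, and only three character sets are listed. When the character set is known at compilation time, the buffer for a CHAR(n) field can be sized exactly; when it can vary per record, the buffer has to assume the widest registered character set.

```python
# Illustrative sketch only; the table and function are hypothetical.

# Assumed maximum bytes per character for a few character sets.
MAX_BYTES_PER_CHAR = {"ASCII": 1, "WIN1252": 1, "UTF8": 4}


def field_buffer_size(char_length, charset=None):
    """Bytes to preallocate for a field of char_length characters.

    If charset is None, the charset is not known until each record is
    read, so the worst case across all registered charsets is used.
    """
    if charset is not None:
        return char_length * MAX_BYTES_PER_CHAR[charset]
    return char_length * max(MAX_BYTES_PER_CHAR.values())


known_size = field_buffer_size(100, "WIN1252")  # charset fixed at compile time
unknown_size = field_buffer_size(100)           # charset varies per record
```

With these assumed widths, a 100-character field needs 100 bytes when declared as WIN1252 but 400 bytes when the character set is only known per record.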

Best regards,


Ann