Subject | RE: [Firebird-Architect] UTF-8 Everywhere |
---|---|
Author | IBO Support List |
Post date | 2014-01-17T21:08:14Z |
Ann,
You brought out a key distinction that I missed.
My apologies for the confusion.
Going with UTF8 uniformly in terms of how things are stored and buffered
internally is certainly something I wouldn't discourage, assuming that the
external interfaces will all still be supported via transliteration, etc.
for legacy API support.
Having a strictly UTF8 core could radically simplify many things. I see it
as simply a matter of where you want to push all of the complexity. I'd like
to hear more about what all the trade-offs would be.
Jason Wharton
From: Firebird-Architect@yahoogroups.com [mailto:Firebird-Architect@yahoogroups.com] On Behalf Of Ann Harrison
Sent: Friday, January 17, 2014 1:58 PM
To: Firebird-Architect@yahoogroups.com
Subject: Re: [Firebird-Architect] UTF-8 Everywhere
Jason Wharton <supportlist@...> wrote:
I'm tending to think that adopting a UTF8 only approach is a step backward
...As for everyone else still dealing with Windows
wide strings, codepages, etc., this simply imposes a potentially major
rewrite of their applications to conform to this new requirement. Legacy
support is always a factor to consider.
I hope that the proposal was for UTF8 internally (storage, sorting, manipulation) with transformation to and from the declared character set on output and input. There should be no changes to applications.
Having a single internal character representation simplifies comparisons and, more important, greatly reduces the number of collations: one per desired character sequence rather than one for each character set that expresses the language.
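To make the boundary idea above concrete, here is a minimal sketch in Python of transliterating to UTF-8 on input and back to the declared character set on output. The function names (`to_internal`, `to_external`, `same_text`) are hypothetical and are not Firebird APIs; the charset names are Python codec names, not Firebird charset identifiers. The byte-wise comparison also assumes both inputs use the same Unicode normalization form.

```python
# Illustrative sketch only: these helpers are hypothetical, not Firebird code.

def to_internal(raw: bytes, declared_charset: str) -> bytes:
    """Transliterate client bytes from the declared charset to UTF-8 for storage."""
    return raw.decode(declared_charset).encode("utf-8")


def to_external(stored: bytes, declared_charset: str) -> bytes:
    """Transliterate stored UTF-8 bytes back to the client's declared charset."""
    return stored.decode("utf-8").encode(declared_charset)


def same_text(a: bytes, b: bytes) -> bool:
    """With a single internal representation, equality is a plain byte comparison.

    Assumption: both values are in the same Unicode normalization form.
    """
    return a == b


# Two clients send the same name in different code pages.
from_cp1252 = to_internal("Müller".encode("cp1252"), "cp1252")
from_cp437 = to_internal("Müller".encode("cp437"), "cp437")
print(same_text(from_cp1252, from_cp437))
```

Each client still sends and receives bytes in its own declared character set; only the stored form is shared, which is why one comparison routine can serve every client.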
If we want to talk about a step forward in flexibility, I suggest you
consider adding in a universal string where you can have each record
indicate what charset is being stored. This would allow any of the
registered charsets to be stored on a per-record basis.
Arrg! What possible difference does it make to the user how a character is stored as long as it arrives at the application in the desired format and order? If your goal is to have a world-wide phone book, UTF8 is the only way to go. Checking each record (why not each field?) to see how to interpret its strings will just slow every character operation and introduce bugs.
Moreover, at the moment, Firebird relies on preallocated record buffers for transfers from the compressed storage format for comparisons and other manipulation. If the record format is unknown at request compilation time, all buffers would need to be allocated at the maximum possible size.
Best regards,
Ann
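The buffer-sizing point can be illustrated with a rough sketch. It assumes a hypothetical table of maximum bytes per character and a hypothetical `buffer_bytes` helper; neither reflects Firebird's actual record layout. It only shows why a charset that is unknown at request compilation time forces worst-case allocation.

```python
from typing import Optional

# Assumed maximum encoded width per character, keyed by Python codec name.
# The table and the helper below are illustrative, not Firebird internals.
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin-1": 1, "cp1252": 1, "utf-8": 4}


def buffer_bytes(char_len: int, charset: Optional[str]) -> int:
    """Bytes to preallocate for a string field of char_len characters.

    If the charset is known when the request is compiled, its own maximum
    width is enough. If it is only known per record (charset is None), the
    buffer must cover the widest registered charset.
    """
    if charset is None:
        width = max(MAX_BYTES_PER_CHAR.values())
    else:
        width = MAX_BYTES_PER_CHAR[charset]
    return char_len * width


print(buffer_bytes(40, "latin-1"))  # charset declared on the column
print(buffer_bytes(40, None))       # charset decided per record
```

Under these assumed widths, a 40-character field whose charset is declared needs 40 bytes, while the same field with a per-record charset has to reserve 160.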