Subject: Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author: Jim Starkey
Post date: 2005-11-06T16:43:17Z
Dmitry Yemanov wrote:
Databases are about managing data. Things were originally simple with
EBCDIC and ASCII. Then there was a wide (and understandable)
proliferation of national character sets. When memory was tight,
universal use of a single national character set made all the sense in
the world. Now memory is cheap and plentiful and we live a more
international world than we did before. We've also learned, painfully,
that multiple national character sets are a bad way to manage data. It
seemed attractive -- just tag each string with its character set and make the
necessary allowances at compare, convert, and key generation time. It
turned out to be a great deal more difficult than that (there are 167
hits on the string dsc_sub_type in Vulcan). We need to learn what the
Java guys and Microsoft learned long ago, which is that multiple
national character sets are too complex to handle, and Unicode is the
alternative.
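The point is easy to illustrate. Here is a small sketch (plain Python, not Firebird or Vulcan code) of why per-charset tagging multiplies special cases: the same text stored under two national character sets yields different bytes, so a raw compare fails, while decoding both to Unicode at the boundary gives one uniform representation for comparison, conversion, and key generation.

```python
# Hypothetical illustration: the same Cyrillic word under two national
# character sets. The encodings are real (WIN1251 and KOI8-R), the scenario
# is an assumption for demonstration.

cyrillic_win1251 = "Привет".encode("cp1251")   # Windows-1251 bytes
cyrillic_koi8 = "Привет".encode("koi8-r")      # KOI8-R bytes

# Same text, different bytes: a per-charset engine must special-case
# every such pair at compare/convert/key-generation time.
assert cyrillic_win1251 != cyrillic_koi8

# Decode each once at the boundary; after that, every comparison is
# charset-free because both sides are Unicode.
assert cyrillic_win1251.decode("cp1251") == cyrillic_koi8.decode("koi8-r")
```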
Of course there will be problems and minor incompatibilities along the way,
but nothing that we can't solve.
On a related subject, I continue to be amazed that SQL databases continue
to deal with fixed/limited length strings long after the language guys
decided "string" was a more useful construct than "string of up to but
not exceeding 14 bytes". Isn't it interesting that the last vestige of
punched cards survives in "state of the art" SQL databases? The only thing
weirder is that database guys aren't leading the charge to get rid of
these archaic beasts.
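A minimal sketch of the annoyance (not any particular database's implementation): SQL CHAR(n) semantics blank-pad a value to n characters on storage, so data round-trips subtly changed, naive equality against the original string fails, and every caller ends up stripping the padding back off. The `store_char` helper below is a made-up name for illustration.

```python
def store_char(value: str, n: int) -> str:
    """Simulate writing into a CHAR(n) column: truncate, then blank-pad to n."""
    return value[:n].ljust(n)

stored = store_char("Starkey", 14)
assert len(stored) == 14               # always exactly 14 characters
assert stored != "Starkey"             # the padding is now part of the value
assert stored.rstrip() == "Starkey"    # callers must strip it back off
assert store_char("a very long string indeed", 14) == "a very long st"  # silent truncation
```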
>"Jim Starkey" <jas@...> wrote:
>
>
>>An exception would be character set "none", which would indicate that
>>the field was raw octets and subject to no character set handling at
>>all. For all practical purposes, a field of character set none is a
>>different datatype with no mapping / assignment / comparison to
>>character set data.
>>
>>
>
>What about WIN1251 databases (in order to use non-binary collations and the
>UPPER function) and NONE attachment charset (to disable any character
>conversions)? I'm afraid this is even more often used than NONE databases.
>