Subject: Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author: Jim Starkey
Adriano dos Santos Fernandes wrote:

>Jim Starkey wrote:
>
>
>>The overhead is pervasive and exists in hundreds of places -- almost
>>every place user data is referenced. I don't know any objective way to
>>measure it.
>>
>Do you consider a lookup in an array -- checking whether a string
>needs to be processed, and processing it only when necessary -- a huge
>overhead? I don't.
>
>Most collations do simple operations now. With UTF8 strings the overhead
>will be transferred to the collation functions.
>
>The argument that the codebase will be simpler is true of the current
>codebase, but with a maintainable codebase the complexity is not high.
>
>
Yes, it's a large overhead in cycles, code, and complexity. And the
work is done at runtime. With a switch to a single universal character
set, the decision whether or not a collation must be applied can be
determined at compile time, with different verbs for collation-specific
and raw comparison. Assignments, which outnumber comparisons, would
require no checking at all.
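The compile-time-versus-run-time point above can be sketched roughly as follows. This is a minimal, hypothetical C++ illustration -- the names compileComparison, rawCompare, and collatedCompare are invented for the example, not Firebird's actual verbs. The idea is that the statement compiler selects one of two comparison verbs once, so the per-record execution path carries no collation branch at all:

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch -- not Firebird code; names are illustrative only.

using CompareVerb = int (*)(const std::string&, const std::string&);

// Raw byte-wise comparison: no collation check anywhere in the hot path.
static int rawCompare(const std::string& a, const std::string& b)
{
    return a.compare(b);
}

// Collation-aware comparison, stubbed here as case-insensitive ASCII.
static int collatedCompare(const std::string& a, const std::string& b)
{
    std::string la(a), lb(b);
    for (char& c : la) c = (char) std::tolower((unsigned char) c);
    for (char& c : lb) c = (char) std::tolower((unsigned char) c);
    return la.compare(lb);
}

// The "compile" step: decide once, when the statement is compiled,
// which verb will execute -- instead of branching on every field of
// every record at run time.
static CompareVerb compileComparison(bool hasCollation)
{
    return hasCollation ? collatedCompare : rawCompare;
}
```

The executed verb then does exactly one job; the branch has been hoisted out of the per-record loop entirely.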

The move and convert modules are now at (or maybe past) the limit of
maintainability. But it goes far beyond those two modules. I would guess
there are more than a dozen places where strings are copied that require
separate checking for character set translation. Do all of them work now?
Frankly, I doubt it. Is each a liability? Yes. Is there any benefit
to having dozens of separate branches of code detecting and handling
character set translations? No. Would eliminating all of them simplify
and speed up the code? In my opinion, without the slightest doubt.
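The cost of those scattered translation branches on copies can be sketched the same way. Another hypothetical C++ fragment -- copyMixed, copyUniversal, and the stub translate are invented names, not engine code. With mixed character sets, every copy site pays a branch and a possible translation call; with one universal character set, an assignment is a plain copy with no check:

```cpp
#include <string>

// Hypothetical sketch -- not Firebird code; names are illustrative only.

static int translationsPerformed = 0;   // counts how often the check fires

// Stub translation: identity here, standing in for a real conversion.
static std::string translate(const std::string& s)
{
    ++translationsPerformed;
    return s;
}

// Mixed character sets: every copy site carries a branch and, when the
// sets differ, a translation call.
static std::string copyMixed(const std::string& src,
                             int srcCharset, int dstCharset)
{
    return (srcCharset != dstCharset) ? translate(src) : src;
}

// One universal character set engine-wide: assignment is just a copy,
// with no checking at all.
static std::string copyUniversal(const std::string& src)
{
    return src;
}
```

Multiply the branch in copyMixed by a dozen-plus separate copy sites, each hand-maintained, and the liability argument above follows.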

Character set and locale handling should be done at the API layer, not
in the engine. Collation decisions inside the engine should be made at
compilation time (once), not at run time (once per field per record
referenced). No, that's not quite fair. Probably four to six times per
field per record referenced.