firebird-architect - Re: [Firebird-Architect] Re: The Wolf on Firebird 3

Subject	Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author	Jim Starkey
Post date	2005-11-02T22:54:29Z

Roman Rokytskyy wrote:

>Not that I am against schemas, but could you explain why different
>web applications would use the same database, assuming that they do
>not know about each other? Why can't they use two different databases?

Multiple databases are a pain in the butt to administer, back up, update,
etc. In Firebird, each database has a separate page cache, so multiple
databases eat large quantities of virtual space. If various schemas
operate in the same database, the page cache behaves like a reasonable LRU.
If each "schema" is in a different database, each one has its own
cache, requiring the administrator to choose among a bunch of small
caches (where a very active database will run inefficiently), hand tuning
each database's cache size, or accepting page faulting.
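
To illustrate the cache argument, here is a toy model, not Firebird's actual buffer manager. It assumes pure LRU replacement, a fixed budget of 100 page buffers, and a made-up access trace in which one busy schema cycles through 80 pages and a quiet one through 10. The same budget is either shared by both schemas (one database) or split evenly between two separate caches (one database per schema).

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy LRU page cache keyed by (database_or_schema, page_number)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # a real cache would read the page from disk here
            self.pages[key] = None
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page


def workload(rounds=10, hot_pages=80, cold_pages=10):
    """Hypothetical access trace: one busy schema and one quiet schema."""
    for _ in range(rounds):
        for page in range(hot_pages):
            yield ("hot", page)
        for page in range(cold_pages):
            yield ("cold", page)


def simulate_shared(budget):
    """All schemas in one database: a single cache holds every page."""
    cache = LRUPageCache(budget)
    for key in workload():
        cache.fetch(key)
    return cache.hits, cache.misses


def simulate_split(budget):
    """One database per schema: the same budget divided into two caches."""
    caches = {"hot": LRUPageCache(budget // 2), "cold": LRUPageCache(budget // 2)}
    for key in workload():
        caches[key[0]].fetch(key)
    return (sum(c.hits for c in caches.values()),
            sum(c.misses for c in caches.values()))


print(simulate_shared(100))  # (810, 90)
print(simulate_split(100))   # (90, 810)
```

With the shared cache both working sets (90 pages) fit, so only the first pass misses. With split caches the busy schema's 80-page cycle never fits in its 50 buffers, so LRU misses on every access, while 40 of the quiet schema's buffers sit idle. Real workloads are less extreme, but they push in the same direction.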

That aside, as we go to layered subsystems (security plugins are an
example), it would be much better to qualify each subsystem's objects with
a schema name (or SYSTEM) than with a simple prefix.
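
A minimal sketch of what a two-level name space buys, using a hypothetical catalog keyed by (schema, name). The search path used for unqualified names is an assumption borrowed from how SQL schemas are commonly resolved, not a description of any Firebird design. Several subsystems can each own a USERS table without inventing prefixes such as FORUM_USERS.

```python
class Catalog:
    """Hypothetical two-level catalog: every object is keyed by (schema, name)."""

    def __init__(self):
        self.objects = {}

    def create(self, schema, name, kind):
        key = (schema.upper(), name.upper())
        if key in self.objects:
            raise ValueError(f"{key[0]}.{key[1]} already exists")
        self.objects[key] = kind

    def resolve(self, name, search_path):
        """Resolve SCHEMA.NAME directly, or an unqualified NAME via the search path.

        Simplified: quoted identifiers containing '.' are not handled.
        The search path itself is an assumption made for this sketch.
        """
        if "." in name:
            schema, obj = name.upper().split(".", 1)
            key = (schema, obj)
            return key if key in self.objects else None
        for schema in search_path:
            key = (schema.upper(), name.upper())
            if key in self.objects:
                return key
        return None


cat = Catalog()
cat.create("WEBSHOP", "USERS", "table")   # hypothetical application schemas
cat.create("FORUM", "USERS", "table")
cat.create("SECURITY", "USERS", "table")  # e.g. a security plugin's own schema
cat.create("SYSTEM", "RELATIONS", "table")

print(cat.resolve("USERS", ["FORUM", "SYSTEM"]))      # ('FORUM', 'USERS')
print(cat.resolve("security.users", []))              # ('SECURITY', 'USERS')
print(cat.resolve("RELATIONS", ["FORUM", "SYSTEM"]))  # ('SYSTEM', 'RELATIONS')
```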

When I designed InterBase, I didn't think a multilevel name space was
important. Experience convinced me otherwise. Netfrastructure has a
two-level name space.

>>A habit we need to break is dependence on fixed-length strings
>>internally and across our interfaces.
>
>As I remember, there is no hard limit on the identifier length at the
>public interface level. It might appear in the implementation, but JayBird
>does not have byte[31] or char[31] in its code, and each string passed
>over the wire is also preceded by a two-byte length.

Of course not. Fixed-length strings aren't a Java concept (hurray for
Java!). The engine and tools, however, have 31-character identifiers
hard-coded in a thousand places. These need to be phased out in favor
of a string class.
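
A small illustration of why a hard-coded 31-byte identifier slot causes trouble. It assumes identifiers are stored as UTF-8 in a NUL-padded, char[32]-style buffer, which is a simplification for illustration; the engine's real structures vary. Two distinct long names collapse to the same stored value, and a multi-byte character can be cut in half. A string type that carries its own length has neither problem.

```python
FIXED_LEN = 31  # the hard-coded identifier limit discussed above


def to_fixed_field(name):
    """Mimic a C char[32] slot (assumed layout): at most 31 bytes of UTF-8, NUL-padded."""
    raw = name.encode("utf-8")[:FIXED_LEN]  # silently truncates long names
    return raw.ljust(FIXED_LEN + 1, b"\0")


def fixed_names_equal(a, b):
    return to_fixed_field(a) == to_fixed_field(b)


def survives_round_trip(name):
    """True if the name comes back unchanged after a trip through the fixed slot."""
    try:
        return to_fixed_field(name).rstrip(b"\0").decode("utf-8") == name
    except UnicodeDecodeError:  # truncation split a multi-byte character
        return False


a = "CUSTOMER_ORDER_LINE_ITEM_DISCOUNT_PERCENT"
b = "CUSTOMER_ORDER_LINE_ITEM_DISCOUNT_AMOUNT"
print(fixed_names_equal(a, b))               # True: same 31-byte prefix
print(a == b)                                # False: full-length strings differ
print(survives_round_trip("RDB$RELATIONS"))  # True
print(survives_round_trip("Ä" * 16))         # False: 32 bytes, cut mid-character
```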

The existing remote protocol is based on fixed-length strings. We need
to get away from that as well (the new record encoding is the answer).
But the larger question is the API and the internals.
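
For contrast with fixed-length fields, here is a sketch of length-prefixed string framing along the lines of the two-byte length mentioned above. The 16-bit big-endian prefix and the 32-byte fixed width are assumptions for illustration; this is not the actual Firebird wire format or the new record encoding.

```python
import struct


def encode_strings(values):
    """Assumed framing: each string as a 16-bit big-endian byte length + UTF-8 bytes."""
    out = bytearray()
    for s in values:
        data = s.encode("utf-8")
        if len(data) > 0xFFFF:
            raise ValueError("string too long for a 16-bit length prefix")
        out += struct.pack(">H", len(data))
        out += data
    return bytes(out)


def decode_strings(buf):
    """Inverse of encode_strings; rejects a truncated buffer."""
    values, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if pos + n > len(buf):
            raise ValueError("truncated string")
        values.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    return values


def encode_fixed(values, width=32):
    """Fixed-width framing for comparison: every string padded to `width` bytes."""
    return b"".join(s.encode("utf-8")[:width].ljust(width, b"\0") for s in values)


names = ["ID", "NAME", "RDB$RELATION_NAME"]
print(len(encode_strings(names)))                      # 29
print(len(encode_fixed(names)))                        # 96
print(decode_strings(encode_strings(names)) == names)  # True
```

The length-prefixed form pays two bytes per string and nothing else, and it has no built-in ceiling short of 65535 bytes.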

>
>>First, the constant
>>setting, checking, comparing, and looking up of character set has a
>>huge, huge overhead in performance.
>
>Which should not be needed when the application uses the same charset
>as is defined in the database. Right?

Wrong. Every string operation now requires that character sets be
checked, looked up, and handled. The fact that in most cases
translation isn't required doesn't eliminate the checking. I've got a
standard lecture on the subject of the bitblit chip in the Sun 350 if
you want to go down this rathole. Suffice it to say that the single
easiest way to make a database system really slow is to encumber the
high-volume primitive operations with a lot of overhead.
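
A sketch of the structural difference, assuming a hypothetical string representation that carries a character set id. The tagged comparison pays for two lookups and a compatibility check on every call, even when both sides already share a charset; with a single internal UTF-8 encoding, comparison is a plain byte comparison (UTF-8 byte order matches code point order, though not any language-specific collation). Python can't show the real cost, so the sketch just counts lookups.

```python
from dataclasses import dataclass

# Hypothetical charset id table; the ids and entries are made up for this sketch.
CHARSETS = {1: "latin-1", 2: "cp1251", 3: "utf-8"}


@dataclass
class TaggedString:
    data: bytes
    charset_id: int


class Stats:
    lookups = 0


def charset_for(charset_id):
    Stats.lookups += 1  # stands in for the lookup paid on every operation
    return CHARSETS[charset_id]


def compare_tagged(a, b):
    """Three-way compare that checks charsets on every call, even when they match."""
    ca, cb = charset_for(a.charset_id), charset_for(b.charset_id)
    if ca == cb:
        return (a.data > b.data) - (a.data < b.data)
    sa, sb = a.data.decode(ca), b.data.decode(cb)  # translation path
    return (sa > sb) - (sa < sb)


def compare_utf8(a, b):
    """Single internal encoding: a plain byte comparison, no lookups."""
    return (a > b) - (a < b)


x, y = TaggedString(b"abc", 3), TaggedString(b"abd", 3)
for _ in range(1000):
    compare_tagged(x, y)
print(Stats.lookups)                             # 2000, though nothing was translated
print(compare_utf8("é".encode(), "z".encode()))  # 1: U+00E9 sorts after U+007A
```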

>>It's time to accept that we're all part
>>of the same world.
>
>Just that one part of the world accidentally got 1 byte per char, while
>the other needs 2 bytes per char :)

Life is unfair. Get used to it. Would you be happier if we made every
character use 2 bytes like Java? If we did a statistical analysis of
the (non-Chinese) world's character flows and designed an ad hoc
compression scheme, it wouldn't be much different from UTF-8.
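
The "ad hoc compression scheme" point can be made concrete: UTF-8 spends 1 byte on ASCII, 2 on accented Latin and Cyrillic, 3 on the rest of the Basic Multilingual Plane (including Chinese), and 4 beyond it. The sample words below are arbitrary choices for illustration; UTF-16, Java's internal encoding, is shown for comparison.

```python
def utf8_len(text):
    """Bytes UTF-8 needs for text, computed from code points."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1  # ASCII
        elif cp < 0x800:
            total += 2  # Latin-1 supplement, Greek, Cyrillic, ...
        elif cp < 0x10000:
            total += 3  # rest of the BMP, including CJK
        else:
            total += 4  # supplementary planes
    return total


def encoded_sizes(text):
    return {
        "chars": len(text),
        "utf-8": utf8_len(text),
        "utf-16": len(text.encode("utf-16-le")),  # 2 bytes per BMP char, like Java's char
    }


SAMPLES = {"English": "database", "German": "Größe", "Russian": "база", "Chinese": "数据库"}
for language, word in SAMPLES.items():
    print(language, encoded_sizes(word))
# English {'chars': 8, 'utf-8': 8, 'utf-16': 16}
# German {'chars': 5, 'utf-8': 7, 'utf-16': 10}
# Russian {'chars': 4, 'utf-8': 8, 'utf-16': 8}
# Chinese {'chars': 3, 'utf-8': 9, 'utf-16': 6}
```

Only the Chinese sample comes out larger in UTF-8 than in a 2-byte encoding, which is the "(non-Chinese)" caveat above.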

We've been over this a number of times. The argument that although
everyone benefits, some people benefit more isn't a very good one.

>>And if it halves the size of the code and doubles
>>performance, we get a kicker to boot.
>
>I would like to see numbers before we discuss it further.

Let's see -- you want someone else to implement the whole thing before
you're willing to discuss whether or not it's a good idea? Roman, that
isn't going to happen.

>How much would the final solution differ from simply setting the
>default charset to UNICODE_FSS? If not too much, I would suggest that
>you provide a version that uses a UTF-8 ODS (with all the optimizations
>you wanted to include); I will take the AS3AP suite, generate English,
>German and Russian texts, and we compare results against the current ODS
>with default WIN1252/WIN1251 charsets in network and embedded modes.
>Then we present the numbers here and continue the discussion. However,
>if the difference in code bases is big, then it is your word against
>the word of other people...

The overhead is pervasive and exists in hundreds of places -- almost
every place user data is referenced. I don't know any objective way to
measure it.

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376