Subject: Re: [Firebird-Architect] RFC: RDB$COUNT
Author: m. Th
Ann W. Harrison wrote:
>> The main problem AFAIK is the optimizer which must know the cardinality
>> of each relation/table which must be joined and, also, the indexes must
>> have their selectivity updated (this is the ideal case).
>>
>
> Actually, the optimizer gets a good estimate of the size of a table
> by reading the pointer pages to get the total number of data pages.
> The result is not precise - a data page may be mostly empty or full
> of back versions of records, and records are compressed for storage.
> However the inaccuracies tend to cancel or reflect the actual cost
> that the optimizer needs, since it's trying to minimize the number
> of page reads - a page read costs the same whether there's one
> record on the page or one hundred.
>
Then, if the cost of processing records in memory is insignificant
compared to the cost of reading pages - even with several hundred
records per page (or thousands, if you upgrade to a 32k page size,
ASAP I hope) - it isn't worth implementing a maintained Count value.
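
(To put rough magnitudes on that - my own back-of-the-envelope, not
measured: a page read from disk costs on the order of milliseconds,
while decompressing and counting one record in memory is on the order
of microseconds, so even at a few hundred records per page the
in-memory work stays in the noise next to the I/O.)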

>> Perhaps it is better to add to RDB$RELATIONS a new column RDB$COUNT (type
>> BIGINT) which holds the cardinality of a table (i.e. the number of the
>> *committed* records).
>>
>
> Now there's a really bad idea, but not a new one. When I started
> in the business, lots of databases kept counts. A bit of probing
> uncovered the fact that the count fields were serious hot-spots
> since every committing transaction had to update the same field.
>
>
>
I did some tests and there isn't a (visible) performance hit. I
simulated updating RDB$COUNT with a script like:
...
<my work here. Inserts, updates, deletes aso.>
...
Commit; /* we finish here */
Update RDB$RELATIONS_2
   set RDB$COUNT = RDB$COUNT + <no. inserts/transaction>
                             - <no. deletes/transaction>;
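
For example (hypothetical numbers, just to make the script concrete),
a transaction which inserted 1000 rows into BINDDOCS and deleted 10 of
them would end with:

Update RDB$RELATIONS_2
   set RDB$COUNT = RDB$COUNT + 1000 - 10
 where RDB$RELATION_NAME = 'BINDDOCS';

(RDB$RELATIONS_2 is only my user-level test copy keyed on
RDB$RELATION_NAME, not the real system table.)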

FYI, 1000 inserts into a table with a PK and an 'autoincrement' field
commit in 500 milliseconds. The variant which also updates the record
count has the same timing.
(The 'autoincrement' is done in the classical way:
CREATE TRIGGER AI_BINDDOCS_DOCSID FOR BINDDOCS
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  IF (NEW.DOCSID IS NULL) THEN
    NEW.DOCSID = GEN_ID(BINDDOCS_DOCSID_GEN, 1);
END
)
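
(For completeness: the generator behind the trigger is created the
standard way, and the 1000-insert test was a loop along these lines.
This is only a sketch - the real column list doesn't matter here, so
DOCSVALUE below is a made-up payload column:

CREATE GENERATOR BINDDOCS_DOCSID_GEN;

EXECUTE BLOCK AS
DECLARE VARIABLE I INTEGER = 0;
BEGIN
  WHILE (I < 1000) DO
  BEGIN
    /* DOCSID is left NULL so the trigger assigns it */
    INSERT INTO BINDDOCS (DOCSVALUE) VALUES ('test row');
    I = I + 1;
  END
END
)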

As you can see, there is another engine (the generators) doing 1000
updates per transaction, plus 1000 index updates (and you know how
costly those are), and the whole job is done in 0.5 seconds (on FB 2
RC5, on an Athlon 64 at 3 GHz). IMHO, on the other side, one update
per transaction isn't so costly... But *if this thing doesn't help the
optimizing engine significantly, I think it isn't worth doing*.

>> This will be updated at the end of each transaction
>> with something like New.RDB$COUNT = Old.RDB$COUNT +
>> _ROWSAFFECTED(from inserts) - _ROWSAFFECTED(from deletes).
>>
>
> That's interesting. Does it update in the system context or the
> transaction context?
>
>
Transaction context. It reflects how the current transaction changes
the table's record count.
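
(Thinking it through with Ann's hot-spot remark in mind - this is my
own reasoning, not something I tested: if two concurrent snapshot
transactions each insert 100 rows into the same table, both read the
same Old.RDB$COUNT and both try to write +100 back to the same record,
so the second committer gets an update conflict and the counter row
serializes every writer on that table. An engine-level version would
have to merge the per-transaction deltas at commit time instead of
doing a plain update.)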

>> Result:
>> 1. The optimizer will know at any moment the cardinality of relations
>> from its transaction context, thus choosing (in a faster way) a better plan.
>>
>
> The optimizer already has a good estimate of cardinality which is
> automatically maintained.
>
>
...From the number of allocated pages per table?
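(If so, the estimate would be roughly: cardinality ~ data pages *
(usable page space / average compressed record length). With my
assumed numbers - 4k pages and ~100-byte compressed records - 250 data
pages would suggest about 10,000 rows; off when pages are half empty
or full of back versions, as you said, but in the right ballpark for
costing.)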
>> 2. SET STATISTICS <index> will become obsolete, further reducing
>> administration costs.
>>
>
> Statistics refers to the selectivity of indexes, not the cardinality
> of tables. Nothing you've mentioned touches on that issue.
>
>
Isn't selectivity (number of entries in an index) / (table record count)?
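(As far as I know, what SET STATISTICS actually stores is per-index
selectivity = 1 / (number of distinct key values), e.g. 0.01 for an
index with 100 distinct values, independent of the table's row count -
which, if I have that right, is why a maintained RDB$COUNT wouldn't
replace it.)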
>> 3. If you really want to optimize select count(*) from table1...
>>
>>
> This isn't a problem worth solving. Counting whole tables
> is a trick used by lazy people creating benchmarks. It really
> doesn't have a lot of real world applicability.
>
>
>
Agreed. This was just an aside. As I stated before, my main aim was to
help the optimizing engine.

> Regards,
>
>
> Ann
hth,

m. th.