Subject: RFC: RDB$COUNT
Author: m. Th.
Hi,

We know how costly Count(*) is, and this isn't (mainly) a problem when
someone does a Select count(*) from table1;
The main problem, AFAIK, is that the optimizer must know the cardinality
of each relation/table being joined and, also, that the indexes must
have their selectivity kept up to date (in the ideal case).

Perhaps it is better to add to RDB$RELATIONS a new column, RDB$COUNT
(type BIGINT), which holds the cardinality of a table (i.e. the number
of *committed* records). This would be updated at the end of each
transaction with something like New.RDB$COUNT = Old.RDB$COUNT +
_ROWSAFFECTED (from inserts) - _ROWSAFFECTED (from deletes). The rows
affected must be stored in an array, let's say aRowsAffected, with 2
columns (insert and delete) and [Select count(*) from rdb$relations]
rows. Such an array would be allocated for each active transaction. Or
another solution (perhaps better): a single (global) array (let's call
it RDB$DELTAS) with 4 columns: TID, RDB$Relation_ID, InsertCount,
DeleteCount.
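
To make the idea concrete, here is a minimal C++ sketch of such a
per-transaction delta structure. All names (CountDelta, DeltaTable,
noteInsert, noteDelete) are just illustrative assumptions of mine, not
actual engine internals:

  #include <cstdint>
  #include <map>
  #include <utility>

  // One RDB$DELTAS "row": how many records one transaction has
  // inserted into / deleted from one relation.
  struct CountDelta {
      int64_t inserts = 0;
      int64_t deletes = 0;
  };

  // Keyed by (TID, RDB$Relation_ID).
  using DeltaTable = std::map<std::pair<int64_t, int64_t>, CountDelta>;

  // Called once per successfully inserted / deleted record.
  void noteInsert(DeltaTable& deltas, int64_t tid, int64_t relId) {
      ++deltas[{tid, relId}].inserts;
  }

  void noteDelete(DeltaTable& deltas, int64_t tid, int64_t relId) {
      ++deltas[{tid, relId}].deletes;
  }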

So the real number of rows for table T1, as seen from transaction TID1,
would be, expressed in an SQL-like way:

  RDB$RELATIONS.RDB$COUNT
  + (Select InsertCount from RDB$DELTAS where TID = TID1 and
     RDB$Relation_ID = <relation id for table T1>)
  - (Select DeleteCount from RDB$DELTAS where TID = TID1 and
     RDB$Relation_ID = <relation id for table T1>);
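
For example (purely illustrative numbers): if RDB$COUNT for T1 is 1000
and transaction TID1 has so far inserted 5 rows and deleted 2, the
cardinality visible to TID1 is 1000 + 5 - 2 = 1003.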

On commit, RDB$COUNT will be updated and the transaction's
InsertCount/DeleteCount values will be reset. On rollback, only the
delta values will be reset; RDB$COUNT stays unchanged.
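
Continuing the hypothetical DeltaTable sketch above, the per-transaction
count and the commit/rollback handling could look roughly like this
(again only a sketch of how it might be wired in, assuming RDB$COUNT is
reachable here as a map from relation id to committed count):

  // Cardinality of relId as seen by transaction tid: committed count
  // plus this transaction's own inserts minus its own deletes.
  int64_t visibleCount(const DeltaTable& deltas, int64_t committedCount,
                       int64_t tid, int64_t relId) {
      auto it = deltas.find({tid, relId});
      if (it == deltas.end())
          return committedCount;
      return committedCount + it->second.inserts - it->second.deletes;
  }

  // On commit: fold this transaction's deltas into RDB$COUNT, then
  // drop them.
  void commitCounts(DeltaTable& deltas,
                    std::map<int64_t, int64_t>& rdbCount, int64_t tid) {
      for (auto it = deltas.begin(); it != deltas.end(); ) {
          if (it->first.first == tid) {
              rdbCount[it->first.second] +=
                  it->second.inserts - it->second.deletes;
              it = deltas.erase(it);
          } else {
              ++it;
          }
      }
  }

  // On rollback: just drop the deltas; RDB$COUNT is left untouched.
  void rollbackCounts(DeltaTable& deltas, int64_t tid) {
      for (auto it = deltas.begin(); it != deltas.end(); ) {
          if (it->first.first == tid)
              it = deltas.erase(it);
          else
              ++it;
      }
  }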

Results:
1. The optimizer will know, at any moment, the cardinality of the
relations in its transaction context, and can thus choose a better plan,
faster.
2. SET STATISTICS <index> will become obsolete, further reducing the
administration costs.
3. If you really want to optimize select count(*) from table1...

Any comments?

hth,

m. th.