Subject: Re: [Firebird-Architect] RFC: RDB$COUNT
Author: = m. Th =
Ann W. Harrison wrote:
>> The main problem, AFAIK, is the optimizer, which must know the
>> cardinality of each relation/table to be joined; also, the indexes
>> must have their selectivity kept up to date (this is the ideal case).
> Actually, the optimizer gets a good estimate of the size of a table
> by reading the pointer pages to get the total number of data pages.
> The result is not precise - a data page may be mostly empty or full
> of back versions of records and records are compressed for storage.
> However, the inaccuracies tend to cancel out, or to reflect the
> actual cost the optimizer needs, since it's trying to minimize the
> number of page reads - a page read costs the same whether there's one
> record on the page or one hundred.
Then, if the cost of processing records already in memory is
insignificant compared to the cost of page reads, even with several
hundred records per page (or thousands, once you upgrade to a 32k page
size, ASAP I hope), it isn't worth implementing a maintained count
value.
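(A rough illustration, with assumed numbers: at a 16 KB page size, with
records compressing to ~100 bytes, a data page holds on the order of
160 records; a 1,000,000-row table then spans roughly 6,250 data pages.
The page-based estimate - data pages x records-per-page - lands close
to the true cardinality, and reading those 6,250 pages costs the same
whether each page holds one live record or 160.)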

>> Perhaps it is better to add to RDB$RELATIONS a new column RDB$COUNT
>> (type BIGINT) which holds the cardinality of a table (i.e. the number
>> of *committed* records).
> Now there's a really bad idea, but not a new one. When I started
> in the business, lots of databases kept counts. A bit of probing
> uncovered the fact that the count fields were serious hot-spots
> since every committing transaction had to update the same field.
I did some tests, and there isn't a (visible) performance hit. I
simulated updating RDB$COUNT with a script like:
<my work here: inserts, updates, deletes and so on>
Commit; /* we finish here */
Update RDB$RELATIONS_2
   set RDB$COUNT = RDB$COUNT + <no. inserts/transaction>
                             - <no. deletes/transaction>;
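For concreteness, one round of the simulation would look roughly like
this (the table name and per-transaction counts are made up, and
RDB$RELATIONS_2 is an ordinary user-table copy standing in for
RDB$RELATIONS):

Insert into T1 (ID, NAME) values (1, 'a');
/* ... 999 more inserts, plus updates and deletes ... */
Commit;

/* say, 1000 inserts and 0 deletes in this transaction: */
Update RDB$RELATIONS_2
   set RDB$COUNT = RDB$COUNT + 1000 - 0
 where RDB$RELATION_NAME = 'T1';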

FYI, 1000 inserts into a table with a PK and an 'autoincrement' field
commit in 500 milliseconds. The variant with 'updating the RecNo' has
the same timing.
(The 'autoincrement' is done in the classical way - i.e. something like
the standard generator + before-insert trigger sketched below; the
generator, trigger and column names are illustrative.)
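Create generator GEN_T1_ID;

Set term ^ ;
Create trigger T1_BI for T1
active before insert position 0
as
begin
  /* assign the next generator value when no id was supplied */
  if (new.ID is null) then
    new.ID = gen_id(GEN_T1_ID, 1);
end ^
Set term ; ^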

As you see, you have another engine (the generators) doing 1000
updates per transaction, plus 1000 index updates (and you know how
costly those are), and the whole job is done in 0.5 seconds (on FB 2
RC5, on an Athlon 64 at 3 GHz). IMHO, on the other hand, an update that
happens only once per transaction isn't that costly... But *if this
doesn't significantly help the optimizer, I don't think it's worth
doing*.
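(Put differently: 500 ms / 1000 inserts = 0.5 ms per insert, and each
of those inserts already carries a generator update and an index
update. One extra single-row update per *transaction* should cost about
as much as a single one of those inserts - noise next to the rest of
the work.)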

>> This would be updated at the end of each transaction with something
>> like New.RDB$COUNT = Old.RDB$COUNT + _ROWSAFFECTED(from inserts)
>> - _ROWSAFFECTED(from deletes).
> That's interesting. Does it update in the system context or the
> transaction context?
Transaction context. It reflects how the current transaction changes
the table's record count.
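In other words - a sketch, reusing the made-up names from above - the
bookkeeping runs inside the user's own transaction:

Update RDB$RELATIONS_2
   set RDB$COUNT = RDB$COUNT + <no. inserts> - <no. deletes>
 where RDB$RELATION_NAME = 'T1';
Commit;

(Two overlapping transactions that both modify T1 would both update
this one row - which is exactly the hot spot Ann describes.)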

>> Result:
>> 1. The optimizer will know in any moment the cardinality of relations
>> from its transaction context, thus choosing (in a faster way) a better plan.
> The optimizer already has a good estimate of cardinality which is
> automatically maintained.
...From the number of allocated pages per table?
>> 2. SET STATISTICS <index> will be obsolete, reducing once more the
>> administration costs.
> Statistics refers to the selectivity of indexes, not the cardinality
> of tables. Nothing you've mentioned touches on that issue.
Isn't selectivity (number of entries in an index) / (table record
count)?
>> 3. If you really want to optimize select count(*) from table1...
> This isn't a problem worth solving. Counting whole tables
> is a trick used by lazy people creating benchmarks. It really
> doesn't have a lot of real world applicability.
Agreed. That was just an aside. As I stated before, the main goal was
to try to help the optimizer.

> Regards,
> Ann

m. th.