Subject:   Re: [Firebird-Architect] RFC: RDB$COUNT
Author:    = m. Th =
Post date: 2006-10-29T18:30:25Z
Ann W. Harrison wrote:
>> The main problem AFAIK is the optimizer, which must know the cardinality
>> of each relation/table to be joined, and also the indexes must
>> have their selectivity updated (this is the ideal case).
>>
>
> Actually, the optimizer gets a good estimate of the size of a table
> by reading the pointer pages to get the total number of data pages.
> The result is not precise - a data page may be mostly empty or full
> of back versions of records, and records are compressed for storage.
> However, the inaccuracies tend to cancel out or reflect the actual cost
> that the optimizer needs, since it's trying to minimize the number
> of page reads - a page read costs the same whether there's one
> record on the page or one hundred.
>
Then, if the cost of processing in-memory records is insignificant
compared to the cost of page reading, even with several hundred records
per page (or thousands, if you upgrade to a 32k page size, ASAP I hope),
it isn't worth implementing a maintained count value.
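Roughly speaking, the estimate Ann describes amounts to something like
the following (the fill factor and record size here are illustrative
assumptions of mine, not engine constants):

  estimated_cardinality ~= data_pages * (page_size * fill_factor)
                           / avg_compressed_record_size

e.g. 500 data pages of 8 KB, about 70% full, with ~80-byte compressed
records, gives an estimate of roughly 36,000 rows.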
>> Perhaps it's better to add in RDB$RELATIONS a new column RDB$COUNT (type
>> BIGINT) which holds the cardinality of a table (i.e. the number of
>> *committed* records).
>>
>>
>
> Now there's a really bad idea, but not a new one. When I started
> in the business, lots of databases kept counts. A bit of probing
> uncovered the fact that the count fields were serious hot-spots
> since every committing transaction had to update the same field.
>
>
>
I did some tests and there isn't a (visible) performance hit. I simulated
updating RDB$COUNT with a script like:

...
<my work here: inserts, updates, deletes, etc.>
...
Commit; /* we finish here */
Update RDB$RELATIONS_2 set RDB$COUNT = RDB$COUNT + <no. inserts/transaction>
                                     - <no. deletes/transaction>;
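RDB$RELATIONS_2 here is just my test table standing in for the proposed
column; a minimal sketch of how such a table might be declared (the exact
names and types are illustrative only):

CREATE TABLE RDB$RELATIONS_2 (
  RDB$RELATION_NAME CHAR(31), /* which table this counter belongs to */
  RDB$COUNT BIGINT            /* maintained count of committed records */
);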
FYI, 1000 inserts into a table with a PK and an 'autoincrement' field
commit in 500 milliseconds. The variant which also updates the RecNo has
the same timing.
(The 'autoincrement' is done in the classical way:

CREATE TRIGGER AI_BINDDOCS_DOCSID FOR BINDDOCS
ACTIVE BEFORE INSERT POSITION 0
AS
BEGIN
  IF (NEW.DOCSID IS NULL) THEN
    NEW.DOCSID = GEN_ID(BINDDOCS_DOCSID_GEN, 1);
END
)
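For completeness, the generator behind the trigger is declared in the
usual way; the INSERT below is only a hypothetical usage sketch (DOCNAME
is an assumed column of my test table):

CREATE GENERATOR BINDDOCS_DOCSID_GEN;

/* with the trigger in place, DOCSID can simply be omitted */
INSERT INTO BINDDOCS (DOCNAME) VALUES ('some document');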
As you can see, you have another engine (the generators) doing 1000
updates per transaction, plus 1000 index updates (and you know how costly
those are), and the whole job is done in 0.5 secs (on FB 2 RC5 on an
Athlon 64 at 3GHz). IMHO, on the other hand, an update which happens only
once per transaction isn't so costly... But *if this thing doesn't help
the optimizing engine significantly, I think it isn't worth doing*.
>> This will update at the end of each transaction
>> with something like New.RDB$COUNT=Old.RDB$Count+_ROWSAFFECTED(from
>> inserts)-_ROWSAFFECTED(from deletes).
>>
>
> That's interesting. Does it update in the system context or the
> transaction context?
>
>
Transaction context. It reflects how the current transaction changes the
table's RecCount.
>> Result:...
>> 1. The optimizer will know in any moment the cardinality of relations
>> from its transaction context, thus choosing (in a faster way) a better plan.
>>
>
> The optimizer already has a good estimate of cardinality which is
> automatically maintained.
>
>
From the number of allocated pages / table?
>> 2. SET STATISTICS <index> will be obsolete, reducing once more the
>> administration costs.
>>
>
> Statistics refers to the selectivity of indexes, not the cardinality
> of tables. Nothing you've mentioned touches on that issue.
>
>
Isn't selectivity (number of entries in an index) / (table reccount)?
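(For reference, recomputing an index's selectivity by hand and inspecting
the stored value look like this; MY_INDEX and the table name are
placeholders:

SET STATISTICS INDEX MY_INDEX;

SELECT RDB$INDEX_NAME, RDB$STATISTICS
FROM RDB$INDICES
WHERE RDB$RELATION_NAME = 'BINDDOCS';
)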
>> 3. If you really want to optimize select count(*) from table1...
>>
>>
> This isn't a problem worth solving. Counting whole tables
> is a trick used by lazy people creating benchmarks. It really
> doesn't have a lot of real world applicability.
>
>
>
Agreed. This was just an aside. As I stated before, the main target was
my attempt to help the optimizing engine.
> Regards,
>
>
> Ann
hth,

m. th.