Subject | RE: [firebird-support] Re: What programming languages and toolkits do you use to access Firebird? |
---|---|
Author | RB Smissaert |
Post date | 2008-10-01T10:11:17Z |
I think I got a simple solution now to get the estimated table row count and
sofar it seems to be working well:
SELECT DISTINCT
1 / I.RDB$STATISTICS
FROM
RDB$INDEX_SEGMENTS RIS
INNER JOIN RDB$INDICES I ON
(RIS.RDB$INDEX_NAME = I.RDB$INDEX_NAME)
WHERE
I.RDB$RELATION_NAME = 'ADDRESS ' AND
I.RDB$UNIQUE_FLAG = 1
It will fail on tables with no unique index, but that will be small tables
in any case.
RBS
_____
From: firebird-support@yahoogroups.com
[mailto:firebird-support@yahoogroups.com] On Behalf Of Adam
Sent: 01 October 2008 01:37
To: firebird-support@yahoogroups.com
Subject: [firebird-support] Re: What programming languages and toolkits do
you use to access Firebird?
--- In firebird-support@ <mailto:firebird-support%40yahoogroups.com>
yahoogroups.com, bartsmissaert@... wrote:
If by "some overhead" you mean forcing me to wait for inserting a
record into a particular table because you have already inserted a
record and now the world is waiting for you to decide whether you want
to commit or rollback, then sure.
There is a conceptual problem with the word "count". It means
different things in different contexts.
If I am keeping track of how many stock items are being sold, then I
want an exact number. If I am telling the user how many results a
particular Google search returned, an estimation is going to be good
enough.
In terms of estimations, there are numerous options that have been
suggested. You can infer the count from the system tables storing
index statistics for tables with primary keys. You can look at gstat
and do a bit of math to get a reasonable guesstimation. Another simple
one is to create a generator and an after insert and after delete
trigger (after every other trigger has fired). The insert trigger
increments the generator by 1, the delete trigger decrements by 1. You
can simply query the generator value at your convenience.
The count will include inserted records that have not committed, and
will not include deleted records that have not committed, but it will
get out of whack if you roll back after a significant number of
changes, so you need some process to resync it with the actual count.
In terms of an accurate count, the only (fast) way I have seen is to
create a table to hold the count value. The insert trigger puts a +1.
The delete trigger puts a -1 in this table. You can then create a view
that sums the value of this counting table. All the inserts and
deletes in this table remain under your transaction, so the sum is
always right.
You then need to create a way of summarising the table; that is a
stored procedure that does something like
* Select sum into x
* empty count table
* insert x into count table
An external process needs to call that stored procedure on a regular
basis (eg daily, hourly or every minute, it depends on the traffic).
The view you created always returns the correct value, but the closer
it runs to your stored procedure, the quicker it will run.
But not all queries are 'select * from table'. You have joins, unions,
exists, not exists, where conditions, having clauses and all sorts of
other strange combinations of the above. There is no simple way of
guaranteeing an estimate with any chance of accuracy is even possible.
Adam
[Non-text portions of this message have been removed]
sofar it seems to be working well:
SELECT DISTINCT
1 / I.RDB$STATISTICS
FROM
RDB$INDEX_SEGMENTS RIS
INNER JOIN RDB$INDICES I ON
(RIS.RDB$INDEX_NAME = I.RDB$INDEX_NAME)
WHERE
I.RDB$RELATION_NAME = 'ADDRESS ' AND
I.RDB$UNIQUE_FLAG = 1
It will fail on tables with no unique index, but that will be small tables
in any case.
RBS
_____
From: firebird-support@yahoogroups.com
[mailto:firebird-support@yahoogroups.com] On Behalf Of Adam
Sent: 01 October 2008 01:37
To: firebird-support@yahoogroups.com
Subject: [firebird-support] Re: What programming languages and toolkits do
you use to access Firebird?
--- In firebird-support@ <mailto:firebird-support%40yahoogroups.com>
yahoogroups.com, bartsmissaert@... wrote:
>imagine
> > There is no way to know about
> > row number efficiently.
>
> Just wonder, why can't databases keep track internally of the count of
> table rows, so getting the number of rows would be instant.
> Of course there would be some overhead in doing that, but I can't
> it is much.RBS,
>
> RBS
If by "some overhead" you mean forcing me to wait for inserting a
record into a particular table because you have already inserted a
record and now the world is waiting for you to decide whether you want
to commit or rollback, then sure.
There is a conceptual problem with the word "count". It means
different things in different contexts.
If I am keeping track of how many stock items are being sold, then I
want an exact number. If I am telling the user how many results a
particular Google search returned, an estimation is going to be good
enough.
In terms of estimations, there are numerous options that have been
suggested. You can infer the count from the system tables storing
index statistics for tables with primary keys. You can look at gstat
and do a bit of math to get a reasonable guesstimation. Another simple
one is to create a generator and an after insert and after delete
trigger (after every other trigger has fired). The insert trigger
increments the generator by 1, the delete trigger decrements by 1. You
can simply query the generator value at your convenience.
The count will include inserted records that have not committed, and
will not include deleted records that have not committed, but it will
get out of whack if you roll back after a significant number of
changes, so you need some process to resync it with the actual count.
In terms of an accurate count, the only (fast) way I have seen is to
create a table to hold the count value. The insert trigger puts a +1.
The delete trigger puts a -1 in this table. You can then create a view
that sums the value of this counting table. All the inserts and
deletes in this table remain under your transaction, so the sum is
always right.
You then need to create a way of summarising the table; that is a
stored procedure that does something like
* Select sum into x
* empty count table
* insert x into count table
An external process needs to call that stored procedure on a regular
basis (eg daily, hourly or every minute, it depends on the traffic).
The view you created always returns the correct value, but the closer
it runs to your stored procedure, the quicker it will run.
But not all queries are 'select * from table'. You have joins, unions,
exists, not exists, where conditions, having clauses and all sorts of
other strange combinations of the above. There is no simple way of
guaranteeing an estimate with any chance of accuracy is even possible.
Adam
[Non-text portions of this message have been removed]