Subject | Inexact database operations |
---|---|
Author | Alexander Klenin |
Post date | 2005-06-28T08:27:11Z |
As promised in a message to "Bi-directional indexes" thread, I want to
raise the following issue:
It was just recently said in this group that search engines are a
terrible model for the databases, but the problem is -- the use cases
for both types of systems are often identical.
Users DO want to issue non-exact queries that matches perhaps a
million records in a database, browse a few (or not so few) pages of
the resultset, then re-formulate a query to narrow search, etc.
The solution I use to the first problem that occurs here -- effective
browsing of large resultset -- is persented in my previous mail, and I
am quite happy with it.
The second problem is much more difficult -- I want to estimate the
number of records in the resultset *without* fetching them all.
I understand well that "select count(*)" query is very cheap in some
databases (like MySQL) and very expensive in others (like Firebird).
But here is my heretic proposal: implement "APPROXIMATE_COUNT(*)" SQL
function, that would return the number of, say, index entries without
trying to materialize every record involved.
Note that the usage outlined above is not too concerned with the exact
accuracy of the counters -- it just needs an estimate. And, with
web-interface there can not be any consistency guarantee anyway --
because the transaction can not span more than one HTTP request.
In similiar vein, "APPROXIMATE_MIN" and "APPROXIMATE_MAX" functions
might also be of some use.
--
Alexander S. Klenin
Insight Experts Ltd.
raise the following issue:
It was just recently said in this group that search engines are a
terrible model for the databases, but the problem is -- the use cases
for both types of systems are often identical.
Users DO want to issue non-exact queries that matches perhaps a
million records in a database, browse a few (or not so few) pages of
the resultset, then re-formulate a query to narrow search, etc.
The solution I use to the first problem that occurs here -- effective
browsing of large resultset -- is persented in my previous mail, and I
am quite happy with it.
The second problem is much more difficult -- I want to estimate the
number of records in the resultset *without* fetching them all.
I understand well that "select count(*)" query is very cheap in some
databases (like MySQL) and very expensive in others (like Firebird).
But here is my heretic proposal: implement "APPROXIMATE_COUNT(*)" SQL
function, that would return the number of, say, index entries without
trying to materialize every record involved.
Note that the usage outlined above is not too concerned with the exact
accuracy of the counters -- it just needs an estimate. And, with
web-interface there can not be any consistency guarantee anyway --
because the transaction can not span more than one HTTP request.
In similiar vein, "APPROXIMATE_MIN" and "APPROXIMATE_MAX" functions
might also be of some use.
--
Alexander S. Klenin
Insight Experts Ltd.