Subject Re: [firebird-support] Limiting fetched rows doesn't offer much performance benefit
Author Helen Borrie
At 04:11 AM 8/03/2006, you wrote:
>We are trying to query a very large table (millions of records) and
>only retrieve a few rows using "FIRST n". From our tests, "SELECT
>FIRST 50 * FROM TABLE WHERE FLAG='Y'" doesn't run significantly faster
>than "SELECT * FROM TABLE WHERE FLAG='Y'". The FLAG field is combined
>in a compound index with a unique record number to avoid low selectivity.
>
>A review of previous postings turns up contradictory advice. Several
>newsgroup postings indicate "FIRST n" doesn't make sense without an
>ordering clause. One newsgroup posting,
>news://news.atkin.com:119/43F0AC95.6090402@..., stated "If
>you don't include an order by, then the first x records are returned
>and the query stops" which is the behavior we would like but have not
>observed.
>
>We are not using an ordering clause since we do not care what order
>the records are in. We are trying to locate records that have been
>queued up for processing. Does anyone know of a way to retrieve a few
>records without the query going through the entire table?

First, I don't think it's going through the entire table. I think
the slowness has one or two possible causes, viz.

1) it's using your index and, because of the physical size of the
index, it's taking a horrendously long time to build the
bitmap. (Check the plan to verify or discount this theory!)
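
For instance, in isql (the index name below is just an illustration):

SET PLAN ON;
SELECT FIRST 50 * FROM aTable WHERE FLAG = 'Y';
-- PLAN (ATABLE INDEX (IX_FLAG_RECNO)) means the index is in play
-- PLAN (ATABLE NATURAL) means a full scan, discounting this theory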

If so, then add a condition to prevent it from using the index, e.g.

SELECT FIRST 50 * FROM TABLE WHERE FLAG || '' = 'Y'

(An expression on the column can't be matched to the index. Note
that simply tacking on "and 1=1" would NOT stop the optimizer from
using the FLAG index.)

...or drop the index if you have nothing that needs it.

--and/or

2) You are using a ReadCommitted transaction with WAIT lock
resolution. No doubt, a "queue" table is taking inserts fairly
frequently. With these settings, your query will literally keep
waiting until there is a long enough gap in the inserts for it to
actually get a stable "first 50" set, which could take ages. Each
time the feeding application commits something, the query has to be
re-run to ensure that it still has the specified "first 50".

So try your query with Concurrency/WAIT and see if it makes a difference.
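
In DSQL terms that's (a sketch; "Concurrency" is the SNAPSHOT
isolation level):

SET TRANSACTION WAIT ISOLATION LEVEL SNAPSHOT;
SELECT FIRST 50 * FROM aTable WHERE FLAG = 'Y';
COMMIT;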


>We are using FB 1.5.3.4842 on Windows.
>
>Thanks in advance for any assistance.

If this is actually part of a stored procedure then don't use SELECT
FIRST at all. Instead, do this:

create procedure blah (batchsize integer....)
as
declare variable counter smallint = 0;
declare variable vRecordID BigInt = 0;
declare variable vFlag char;
declare variable [others needed for batch processing]
...
begin
  for select RecordID, Flag, [others needed for batch processing]
  from aTable
  into :vRecordID, :vFlag, [others needed for batch processing]
  as cursor aCursor
  do
  begin
    -- BREAK (available from Fb 1.5) stops the scan once the batch is full
    if (counter >= batchsize) then break;
    if (vFlag = 'Y') then
    begin
      counter = counter + 1;
      [ do your row-level processing ] ;
      update aTable
      set flag = null where current of aCursor; -- mark it processed
    end
  end
end
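
The whole batch then gets processed in one call, e.g. (any arguments
beyond batchsize are whatever your own processing needs):

execute procedure blah (50, ....);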

If you're actually doing the processing on the client side (as one
suspects, seeing SELECT *!), still use an SP, making it selectable,
to pull just the PKs of the queue records over to the client, viz.

create procedure GetBatchKeys (batchsize integer)
returns (RecordID BigInt)
as
declare variable counter smallint = 0;
declare variable vFlag char;
begin
  for select RecordID, Flag
  from aTable
  into :RecordID, :vFlag
  as cursor aCursor
  do
  begin
    if (counter >= batchsize) then break; -- stop once the batch is full
    if (vFlag = 'Y') then
    begin
      counter = counter + 1;
      update aTable
      set flag = 'P' where current of aCursor; -- locks the record
      suspend; -- returns this RecordID to the caller
    end
  end
end

On the client side, in the same *concurrency/wait* transaction,
SELECT RecordID from GetBatchKeys(50) into a local structure and
close the set. DON'T COMMIT.
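
In other words, something like (a sketch only):

SET TRANSACTION WAIT ISOLATION LEVEL SNAPSHOT; -- Concurrency/WAIT
SELECT RecordID FROM GetBatchKeys(50);
-- fetch every row into your structure and close the cursor;
-- leave the transaction open so the 'P' records stay locked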

Then use the keys from the structure to formulate an IN clause:

select [column_list] from aTable
where RecordID in ([your list])

NB, whatever you use to get your batch set, make sure you have some
way to lock the records you're working on, and don't let the
transaction hang around too long.
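
For example, when a batch is finished, in the same transaction (the
key values here are illustrative):

update aTable set Flag = null
where RecordID in (101, 102, 103); -- the keys you just processed
commit; -- releases the locks taken when the flags went to 'P'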

./heLen