Subject Re: Why "SELECT FIRST 1 ..." and "EXISTS ( SELECT * FROM ... )" are so slow in a
Author Adam
--- In firebird-support@yahoogroups.com, "Antti Nivala"
<antti.nivala@...> wrote:
>
> Dmitry Yemanov wrote:
> > Records are fetched from a table based on the bitmap filled by the
> > index scan. I.e. no record can be fetched until the index scan
> > completes for all existing matches. For a non-selective index,
> > the scan may take quite a long time.
>
> Thank you. This fully explains the problem I am experiencing. Is there a
> workaround, i.e. is there a way to make a pure index-based existence
> check based on the "PV = 2" condition when there are lots of (e.g.
> 1.000.000) records for which this is true? The table has 26.000.000
> records so a natural scan would be much worse in many cases.

Have you actually MEASURED this, or is it just theorised?

If you tried a natural read, then on average you would expect to find
a match every 26 records. The exact number of records per data page
depends on the size of each record. Each data page is 4096 bytes
unless you set it to some other size when restoring the database.

>
> In practice, I am looking for a way to delete a record from table
> PV_TYPE if that record is not referenced by any record in the DV_PV
> table. For example, when deleting a record from PV_TYPE, I currently do
> DELETE FROM PV_TYPE
> WHERE
> PV = <x> AND
> NOT EXISTS ( SELECT 1 FROM DV_PV WHERE PV = <x> )
>
> This takes 0 ms if there are not many matching records in DV_PV, but 400
> ms if there are 1.000.000 matching records in DV_PV. My database has
> just been backed up and restored. The point is, as Dmitry said, that the
> EXISTS check reads lots of data from the index in that case. Is there a
> way to avoid that, i.e. utilize the index for a pure existence check?

The index includes deleted records and uncommitted values that should
not be visible to your transaction. Existence tests must not only find
a match in the index, but must also confirm with the data on the
respective data page whether that record exists.

From Dmitry's post, the bitmap is built before the data pages are even
considered.

I
> would want FB to stop reading the index as soon as it sees that at least
> one record matches.
>
> The workaround of having a manual count field is one possibility, but I
> am still looking for a more efficient and elegant solution. Is there
> one?

Am I right to assume that there are only approx 26 distinct values in
the 26 million records?

The following query would then return quite a small number of values.
I am not sure how the optimiser would handle it in a subquery,
possibly not too well, but you could create a table PV_INUSE

INSERT INTO PV_INUSE VALUES
SELECT DISTINCT PV+0
FROM DV_PV

The +0 prevents it from using the index which would slow it down.

Then you can do your existence check against PV_INUSE

Adam