Subject | Query help |
---|---|
Author | Kjell Rilbe |
Post date | 2012-07-22T19:59:52Z |
Trying to figure out a way to do a search/query with decent performance.
Table "Master" contains roughly 100 million records and table "Detail"
contains roughly 200 million records.
Both have a bigint "Id" primary key and "Detail"."Master" links to
"Master"."Id" in a regular master-detail relationship. "Detail"."Master"
is indexed, but there's no FK constraint (never mind why).
"Detail"."Removed" TIMESTAMP is nullable and NOT indexed.
"Detail"."RemovedBy" bigint is nullable (FK to a third table) and IS
indexed.
If a detail has been removed, "Removed" is set to the time of removal
and "RemovedBy" is set to the id of a record containing data about the
removal, e.g. who did it and so forth.
Business logic says that if a master has any details at all, at least
one of them should be current, i.e. NOT be removed. So, at least one
detail should have "Removed" and "RemovedBy" set to null.
I've found a breech of this rule in my data and I want to find if there
are more.
The obvious select is this:
select *
from "Master" M
where exists ( /* Are there any details at all? */
select 1
from "Detail" D
where D."Master" = M."Id"
)
and not exists ( /* No detail that's current? */
select 1
from "Detail" D
where D."Master" = M."Id"
and D."Removed" is null
)
Unfortunately, this will cause a natural scan of the 100 million record
"Master" table, and roughly 200 million subselects...
Can anyone figure out a way to avoid a full table scan?
Thanks,
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Table "Master" contains roughly 100 million records and table "Detail"
contains roughly 200 million records.
Both have a bigint "Id" primary key and "Detail"."Master" links to
"Master"."Id" in a regular master-detail relationship. "Detail"."Master"
is indexed, but there's no FK constraint (never mind why).
"Detail"."Removed" TIMESTAMP is nullable and NOT indexed.
"Detail"."RemovedBy" bigint is nullable (FK to a third table) and IS
indexed.
If a detail has been removed, "Removed" is set to the time of removal
and "RemovedBy" is set to the id of a record containing data about the
removal, e.g. who did it and so forth.
Business logic says that if a master has any details at all, at least
one of them should be current, i.e. NOT be removed. So, at least one
detail should have "Removed" and "RemovedBy" set to null.
I've found a breech of this rule in my data and I want to find if there
are more.
The obvious select is this:
select *
from "Master" M
where exists ( /* Are there any details at all? */
select 1
from "Detail" D
where D."Master" = M."Id"
)
and not exists ( /* No detail that's current? */
select 1
from "Detail" D
where D."Master" = M."Id"
and D."Removed" is null
)
Unfortunately, this will cause a natural scan of the 100 million record
"Master" table, and roughly 200 million subselects...
Can anyone figure out a way to avoid a full table scan?
Thanks,
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64