Subject How to optimize this query using Firebird?
Author partsi
A support question to DB experts.

----
DB LAYOUT

1. Table DV_PV, its primary keys and indexes

CREATE TABLE DV_PV (
DV_OBJECTTYPE INTEGER NOT NULL, // PK
DV_DOCUMENTPART INTEGER NOT NULL, // PK
DV_VERSIONPART INTEGER NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL, // PK
SORTINDEX SMALLINT NOT NULL, // PK
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL,
LATEST SMALLINT NOT NULL

);

CREATE ASC INDEX IX_DV_PV_PV ON DV_PV ( PROPERTYVALUE, LATEST )

2. Table DV_PV, its primary keys and indexes

CREATE TABLE PV_LOOKUP (
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL,
DATA INTEGER,
TARGETVER INTEGER
);

CREATE UNIQUE ASC INDEX IX_PV_LOOKUP_PD_DATA_TARGETVER ON PV_LOOKUP (
PROPERTYDEF, DATA, TARGETVER )

3. Table DV_PV, its primary keys and indexes

CREATE TABLE PV_TIMESTAMP (
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL,
DATA TIMESTAMP
);

CREATE UNIQUE ASC INDEX IX_PV_TIMESTAMP_PD_DATA ON PV_TIMESTAMP (
PROPERTYDEF, DATA )
----

This database layout is for storing properties of object versions.
Timestamp properties are stored in the PV_TIMESTAMP table. Lookup
properties are stored in the PV_LOOKUP table. The DV_PV table
contains links to these data tables, and based on it, one can query
the properties of a given object version. Note that PV_TIMESTAMP and
PV_LOOKUP are normalized. Any given value for the same property is
stored only once in these tables, and multiple object versions in
DV_PV can refer to this value.

On the given tables, we can run queries such as the following in
order to search for the object versions that match given property
conditions.

SELECT C2.DV_OBJECTTYPE, C2.DV_DOCUMENTPART, C2.DV_VERSIONPART
FROM DV_PV C2
INNER JOIN PV_TIMESTAMP C3 ON C3.PROPERTYVALUE =
C2.PROPERTYVALUE
INNER JOIN DV_PV C4 ON
C4.DV_OBJECTTYPE = C2.DV_OBJECTTYPE
AND C4.DV_DOCUMENTPART = C2.DV_DOCUMENTPART
AND C4.DV_VERSIONPART = C2.DV_VERSIONPART
INNER JOIN PV_LOOKUP C5 ON C5.PROPERTYVALUE = C4.PROPERTYVALUE
WHERE
C2.PROPERTYDEF = 21
AND C3.PROPERTYDEF = 21
AND C3.DATA >= '24.12.2007'
AND C4.PROPERTYDEF = 23
AND C5.PROPERTYDEF = 23
AND C5.DATA = 101

I would like to make my queries and DB layout optimum for these kinds
of searches. The basic procedure for boolean AND searching goes as
follows:

1. One data condition, say DC1, goes over others. The query starts
from the data table, say DT1, that corresponds this condition (e.g.
PV_TIMESTAMP or PV_LOOKUP).

2. JOIN to DV_PV finds the matching object versions.

3. For DC2: JOIN to DV_PV.

4. For DC2: JOIN to DT2 for checking DC2.

5. For DC3: JOIN to DV_PV.

6. For DC3: JOIN to DT3 for checking DC3.

7. For DC4: JOIN to DV_PV.

8. For DC4: JOIN to DT4 for checking DC4.

...and so on. In this procedure, it is highly critical for overall
performance that the data condition that is preferred over the others
results in the FEWEST amount of rows for the joins. If the optimizer
chooses a condition the effective selectivity of which is worst, the
query eventually slows down because the intermediate joins will
suffer from this poor selectivity and lots of unnecessary object
versions are involved.

How could I assist the optimizer to choose the most effective data
condition to be evaluated first? How do I make my query OR DB layout
optimum for these searches? Should I move from my normalized DB
structure to a more denormalized form in which the data values are
directly stored for object versions in the same table?

Any comments appreciated!

Timo Partanen