Subject RE: [firebird-support] How to optimize this query using Firebird?
Author Svein Erling Tysvær
Hi Timo!

Getting the ideal plan requires intimate knowledge of a lot more than your design, we need to know about which Firebird version you use and a lot about your actual data. Telling us the Firebird version and some basic statistics may be advisable, there are a few persons here that can benefit from that. I can only say that Firebird 2.0 knows more about each part of a combined index than Firebird 1.5, and that helps it make more 'educated' choices.

Though back to your original query. As far as I can see, the realistic choice of the optimizer is whether it should start the plan with C3 or C5. You say that C5 is unique (i.e. a good choice if considering only this table), but how selective is C3.PROPERTYDEF = 21 AND C3.DATA >= '24.12.2007'? Probably not better than unique, hence I'd say the optimizers choice is good. Which choice would be the ideal, I would guess boils down to (in addition to the selectivity of C3) which of the following is the most selective: C2.PROPERTYVALUE or C4.PROPERTYVALUE? The optimizer has no way of knowing this, it's only you who may know...

Though there is one thing I would recommend for this particular query. Either add a separate index for PROPERDYDEF or change to CREATE ASC INDEX IX_DV_PV_PV ON DV_PV ( PROPERTYVALUE, PROPERTYDEF, LATEST ) (I generally tend to use separate indexes for each field only adding the PK if otherwise non-selective, I think there is a slight speed improvement having combined indexes, but I find that it makes it much more difficult to understand plans quickly and may be troublesome for other queries). Even though PROPERTYDEF is part of your primary key, since it isn't the first field in your primary key and the preceding fields aren't available neither in the WHERE clause nor in joining C4 to C5, the PK cannot be used for C4.

With experience and intimate knowledge about your data, you could understand which table condition would be ideal to start with and prevent the optimizer from making stupid choices by adding +0 to some of your conditions. In your query, that would mean something like C4.DV_OBJECTTYPE = C2.DV_OBJECTTYPE+0 if you were certain (or wanted to test) that starting with C3 would be a better choice.

HTH,
Set

-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of partsi
Sent: 18. november 2007 17:00 and 19. november 2007 08:21
To: firebird-support@yahoogroups.com
Subject: [firebird-support] How to optimize this query using Firebird?

A support question to DB experts.

----
DB LAYOUT

1. Table DV_PV, its primary keys and indexes

CREATE TABLE DV_PV (
DV_OBJECTTYPE INTEGER NOT NULL, // PK
DV_DOCUMENTPART INTEGER NOT NULL, // PK
DV_VERSIONPART INTEGER NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL, // PK
SORTINDEX SMALLINT NOT NULL, // PK
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL,
LATEST SMALLINT NOT NULL

);

CREATE ASC INDEX IX_DV_PV_PV ON DV_PV ( PROPERTYVALUE, LATEST )

2. Table DV_PV, its primary keys and indexes

CREATE TABLE PV_LOOKUP (
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL,
DATA INTEGER,
TARGETVER INTEGER
);

CREATE UNIQUE ASC INDEX IX_PV_LOOKUP_PD_DATA_TARGETVER ON PV_LOOKUP (
PROPERTYDEF, DATA, TARGETVER )

3. Table DV_PV, its primary keys and indexes

CREATE TABLE PV_TIMESTAMP (
PROPERTYVALUE NUMERIC( 18, 0 ) NOT NULL, // PK
PROPERTYDEF INTEGER NOT NULL,
DATA TIMESTAMP
);

CREATE UNIQUE ASC INDEX IX_PV_TIMESTAMP_PD_DATA ON PV_TIMESTAMP (
PROPERTYDEF, DATA )
----

This database layout is for storing properties of object versions.
Timestamp properties are stored in the PV_TIMESTAMP table. Lookup
properties are stored in the PV_LOOKUP table. The DV_PV table
contains links to these data tables, and based on it, one can query
the properties of a given object version. Note that PV_TIMESTAMP and
PV_LOOKUP are normalized. Any given value for the same property is
stored only once in these tables, and multiple object versions in
DV_PV can refer to this value.

On the given tables, we can run queries such as the following in
order to search for the object versions that match given property
conditions.

SELECT C2.DV_OBJECTTYPE, C2.DV_DOCUMENTPART, C2.DV_VERSIONPART
FROM DV_PV C2
INNER JOIN PV_TIMESTAMP C3 ON C3.PROPERTYVALUE =
C2.PROPERTYVALUE
INNER JOIN DV_PV C4 ON
C4.DV_OBJECTTYPE = C2.DV_OBJECTTYPE
AND C4.DV_DOCUMENTPART = C2.DV_DOCUMENTPART
AND C4.DV_VERSIONPART = C2.DV_VERSIONPART
INNER JOIN PV_LOOKUP C5 ON C5.PROPERTYVALUE = C4.PROPERTYVALUE
WHERE
C2.PROPERTYDEF = 21
AND C3.PROPERTYDEF = 21
AND C3.DATA >= '24.12.2007'
AND C4.PROPERTYDEF = 23
AND C5.PROPERTYDEF = 23
AND C5.DATA = 101

The plan for this particular case (two data conditions) is:

PLAN JOIN (C5 INDEX (IX_PV_LOOKUP_PD_DATA_TARGETVER), C4 INDEX
(IX_DV_PV_PV), C2 INDEX (PK_DV_PV), C3 INDEX (PK_PV_TIMESTAMP))

The query processing starts from C5. This can be a good or bad
choice. Because IX_PV_LOOKUP_PD_DATA_TARGETVER is unique, no more
than one row is found in C5. But the join to C4 (DV_PV) can produce
hundreds of thousands of rows indicating that very many object
versions meet this condition. This fact can kill the overall query
performance because these rows are used in joining to C2 (DV_PV) and
finally to C3 for checking the second (timestamp) condition. If the
initial guess is bad, the database is subject to a big amount of
index seeks even though there are a few results in the result set.

How could I assist the optimizer to choose the most effective data
condition to be evaluated first? How do I make my query OR DB layout
optimum for these searches? Should I move from my normalized DB
structure to a more denormalized form in which the data values are
directly stored for object versions in the same table?

Any comments appreciated!

Timo Partanen