Subject Re: Query Optimisation question
Author Svein Erling Tysvær
Hi Tom! See comments within/below.

--- In firebird-support@yahoogroups.com, Tom Conlon wrote:
>
> Hi All,
>
> CandidateAttribute is a 3.5million row table (see DDL below).
>
> Although it is accessed regularly for an individual candidate, the
> most important queries against this table are requesting matching
> candidates with *a number of mandatory* attributes and/or a number
> of 'optional' attributes. Ratings (1-5 denoting advanced to basic)
> against each individual attribute may also be specified.
>
> Additional criteria may well be specified at the Candidate table
> level (location, type, pay, etc, etc), so the results have to be
> joined and ultimately a list of matches returned. The Candidate
> table contains approx 130,000 rows.
>
> e.g In plain english:
> "I want the following candidates:
> - They *must have* excellent C++ AND Java.
> - They *must have* above average Delphi.
> - They can have *any one of* MSWORD, EXCEL or POWERPOINT
> to average or higher level."
> - Be looking for work and looking for a permanent position.
>
> Mandatory AttributeName AttributeID Rating
> --------- ------------- ----------------
> YES C++ 101 1 (Excellent)
> YES *JAVA 107 1 (Excellent)
> YES *DELPHI 225 2 (Above Average)
> NO* MSWORD 333 3 (Average)
> NO* EXCEL 334 3 (Average)
> NO* POWERPOINT 418 3 (Average)
>
> * not mandatory individually, but as a group of three 1 must exist
> and therefore mandatory at this level.
>
> Any response(s) to the following two questions would be appreciated:
>
> Question 1.
> -----------
> The query was deemed too complex for a single SQL statement. As FB
> likes JOINS :) the current method is to load a qrycriteria table
> with the attribute criteria and then use an sp to join it to
> CandidateAttribute, inserting matches into a temporary user table
> qrytmp<UserId> (it has 1 column - ID) that performs the final join
> to Candidate. If no attributes are requested then query goes
> straight for home bypassing qrytmp.
>
> The sp uses a combination of GROUP BY for initial mandatory match
> and then within the FOR SELECT does an IF EXISTS test for 'optional'
> vals.
>
> If any anyone has a better alternative method then I'm all ears!
>
> (I was hoping that a combination of INNER JOINS (for mandatory) and
> LEFT OUTER JOINS (for optional) would have been fast and efficient
> but it didn't fly)

This doesn't sound to complex for a single SQL statement, but things
not relevant to your current problem may of course influence that
decision. In general, I find that INNER JOINs are great, but that
OUTER JOINs occasionally may cause problems in Firebird. In your case,
I would have tried the following SQL statement:

SELECT <whatever> FROM CandidateAttribute CA1
WHERE CA1.AttributeID = 101 AND CA1.Rating <= 1
AND EXISTS(SELECT *
FROM CandidateAttribute CA2
JOIN CandidateAttribute CA3 ON CA2.CandidateId = CA3.CandidateId
JOIN CandidateAttribute CA4 ON CA2.CandidateId = CA4.CandidateId
WHERE CA1.CandidateId = CA2.CandidateId
AND CA2.AttributeID = 107 AND CA2.Rating <= 1
AND CA3.AttributeID = 225 AND CA3.Rating <= 2
AND (CA4.AttributeID = 333 AND CA4.Rating <= 3 OR
CA4.AttributeID = 334 AND CA4.Rating <= 3 OR
CA4.AttributeID = 418 AND CA4.Rating <= 3))

> Question 2.
> -----------
> What would be the best PK, FK, Index strategy in this situation?
>
> Currently CandidateAttribute has:
> PK (AttributeID, CandidateID)
> FK1 (AttributeID)
> FK2 (CandidateID)
>
> There was an index on Rating but it was dropped as a lot of rows
> were the default.

Good choice to drop the rating index, especially if it only has three
values.

I would add another field and have this as a surrogate primary key.
I'm not 100% sure why, but I dislike indexes or keys that starts with
the first field of the primary key. Though it may just be me.

By the way, you explained your problem very well, that makes it a lot
easier to help. Thanks!
Set