Subject (Partially) Comparing records
Author John vd Waeter
Hi All,

I'm writing an SP that compares two records from the same table. I want
to know how many records have identical the same content (except from
the PK, of course). Later I want to know which records are *allmost*
identical.

I have this SP that takes a recordID (PK) as input and does an inner
join with itself, using an ON clause in which the columns are compared.
The record to compare with is the recordid in the WHERE clause.

CREATE PROCEDURE HOW_MANY_IDENTAL_BOOKS (
ABOOKID INTEGER)
RETURNS (
THE_RESULT INTEGER)
AS
begin
SELECT COUNT(*) AS TOTALS FROM BOOKS A
INNER JOIN BOOKS B
ON (UPPER(A.AUTHOR) = UPPER(B.AUTHOR))
AND
(UPPER(A.TITLE)=UPPER(B.TITLE))
AND
(A.PRICE = B.PRICE) /* price is is INTEGERfield */
AND
(A.YEAR = B.YEAR) /* YEAR is INTEGER field */
/*
and a lot more AND...)
*/
WHERE
A.BOOKID = :ABOOKID
INTO :THE_RESULT;

SUSPEND;

end

The above is the simplified version. The real SP does more calculation
on datefields and other real-numbers using UDF's. The suspend is because
the result is read from a Query like

SELECT THE_RESULT FROM HOW_MANY_IDENTAL_BOOKS (150)

And the field The_Result should hold an integer.

This works. But I have the feeling it is kinda slow. Using a table of
about 2000 books, it takes allmost 2 seconds. In real life, it may have
to run on a table of 100's thousands of records and a more dedicated
version has to return a list of comparable BOOK_ID's or even a
percentage that says how many fields had the same or almost the same
content.

So I wonder if this is the right approach. I tried to put indexes on
some fields like AUTHOR and TITLE, but I think the use of UPPER in the
ON-clause prevents using these indexes.

Is the ON_Clause always completely evaluated? Or does it stop after a
FALSE in the first condition?

Would/Should there be a performance-gain if I move some of the ON
conditions in the WHERE clause?


Thank you for any suggestions!

John