Subject Re: [firebird-support] (Partially) Comparing records
Author Adomas Urbanavicius
Considering, that with lots of records your search might become
exhaustive according to perfomance, and it might be used frequently, you
may think about
adding indexed group field to your table, or in another table (if there
might be more than one partial grouping criteria:with 1 to many table).
Then, on insert/update trigger, just
get my_group_id.

This would be something, what in oracle is called "materialized view".
Then any your partial searches would return results almost in 0
miliseconds :).
select count (*)/ * from my_books where group_id = (select group_id from
my_books where pk=:my_pk)

Advantage:search is extremely fast, you can implement more than one
partial search (with 1 to many table).Posibility to manually recalculate
all partial references.
Disadvantage:insert/update might become slower. data
overhead,maintainance of one more table, and one more trigger :)



Adomas


John vd Waeter wrote:

>Hi All,
>
>I'm writing an SP that compares two records from the same table. I want
>to know how many records have identical the same content (except from
>the PK, of course). Later I want to know which records are *allmost*
>identical.
>
>I have this SP that takes a recordID (PK) as input and does an inner
>join with itself, using an ON clause in which the columns are compared.
>The record to compare with is the recordid in the WHERE clause.
>
>CREATE PROCEDURE HOW_MANY_IDENTAL_BOOKS (
> ABOOKID INTEGER)
>RETURNS (
> THE_RESULT INTEGER)
>AS
>begin
> SELECT COUNT(*) AS TOTALS FROM BOOKS A
> INNER JOIN BOOKS B
> ON (UPPER(A.AUTHOR) = UPPER(B.AUTHOR))
> AND
> (UPPER(A.TITLE)=UPPER(B.TITLE))
> AND
> (A.PRICE = B.PRICE) /* price is is INTEGERfield */
> AND
> (A.YEAR = B.YEAR) /* YEAR is INTEGER field */
> /*
> and a lot more AND...)
> */
> WHERE
> A.BOOKID = :ABOOKID
> INTO :THE_RESULT;
>
> SUSPEND;
>
>end
>
>The above is the simplified version. The real SP does more calculation
>on datefields and other real-numbers using UDF's. The suspend is because
>the result is read from a Query like
>
>SELECT THE_RESULT FROM HOW_MANY_IDENTAL_BOOKS (150)
>
>And the field The_Result should hold an integer.
>
>This works. But I have the feeling it is kinda slow. Using a table of
>about 2000 books, it takes allmost 2 seconds. In real life, it may have
>to run on a table of 100's thousands of records and a more dedicated
>version has to return a list of comparable BOOK_ID's or even a
>percentage that says how many fields had the same or almost the same
>content.
>
>So I wonder if this is the right approach. I tried to put indexes on
>some fields like AUTHOR and TITLE, but I think the use of UPPER in the
>ON-clause prevents using these indexes.
>
>Is the ON_Clause always completely evaluated? Or does it stop after a
>FALSE in the first condition?
>
>Would/Should there be a performance-gain if I move some of the ON
>conditions in the WHERE clause?
>
>
>Thank you for any suggestions!
>
>John
>
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>Visit http://firebird.sourceforge.net and click the Resources item
>on the main (top) menu. Try Knowledgebase and FAQ links !
>
>Also search the knowledgebases at http://www.ibphoenix.com
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>Yahoo! Groups Links
>
>
>
>
>
>
>
>
>
>


--

Pagarbiai / Best Regards,
Adomas Urbanavicius
+37068543336