Subject Re: [firebird-support] Re: this works, but is it the best way?
Author Helen Borrie
At 01:53 PM 1/03/2005 +0000, you wrote:

> > >Right now the table is real small, but it will probably get huge.
> > >I will limit the scope of the query to a date range, and I do have an
> > >index on SD_GTLINENO, SD_INVOICE, and SD_GTID and also on the invoice
> > >date field. (SD_GTLINENO IS a unique identifier.)
> > >
> > >Is there a better way to make sure you don't have one invoice pointing
> > >to separate customer ID?
> >
>
>SELECT s.SD_GTLINENO, s.SD_INVOICE, s.SD_GTID, s1.SD_GTID FROM
>SALESDATA s, SALESDATA s1
>WHERE s.SD_GTID != s1.SD_GTID
>AND s1.SD_INVOICE=s.SD_INVOICE
>
>I want to find the most efficient way to find duplicate linkages. In
>the stored procedure tha populates the table, it already looks for
>obvious duplicates and marks them as such, but since this is a data
>table populated by remote heterogenous clients, I don't necessarily
>want to disallow any duplications, just evaluate them.
>
>And since the data table quite possibly will grow to a huge size, I
>want to use the most efficient code to find them. There will probably
>be other duplicate scenarios I will want to uncover, but if I can get
>this one right, I can copy it over.

OK, first, get rid of that compound index because it doesn't do anything
useful. Then index both SD_INVOICE and SD_GTID (2 indexes, not one).

Now, currently your query is going to return duplicated output rows if
there are three or more SD_GTID rows associated with the same
SD_INVOICE. If this is what you want, your query is OK - provided you fix
up the indexes.

If you want to return just one row for each row where there are divers
SD_GTIDs for the same SD_INVOICE, you could do a correlated grouped
subquery, like this:

SELECT
s.SD_GTLINENO,
s.SD_INVOICE,
s.SD_GTID
FROM SALESDATA s
WHERE EXISTS (
SELECT s1.INVOICE FROM SALESDATA s1
WHERE s1.SD_INVOICE=s.SD_INVOICE
GROUP BY 1
HAVING COUNT(s1.SD_GTID) > 1)

./heLen