Subject | Re: this works, but is it the best way? |
---|---|
Author | Dixon Epperson |
Post date | 2005-03-01T13:53:03Z |
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
the stored procedure tha populates the table, it already looks for
obvious duplicates and marks them as such, but since this is a data
table populated by remote heterogenous clients, I don't necessarily
want to disallow any duplications, just evaluate them.
And since the data table quite possibly will grow to a huge size, I
want to use the most efficient code to find them. There will probably
be other duplicate scenarios I will want to uncover, but if I can get
this one right, I can copy it over.
Thanx
E. D. Epperson
wrote:
> At 06:04 PM 28/02/2005 +0000, you wrote:involved
>
>
> >We have a table of invoices, what I am looking for is where an
> >invoice is assigned to two different customer numbers. The separate
> >customer numbers might point to the same customer, but by having a
> >double entry here, they could get a double rebate.
> >
> >Right now the table is real small, but it will probably get huge.
> >I will limit the scope of the query to a date range, and I do have an
> >index on SD_GTLINENO, SD_INVOICE, and SD_GTID and also on the invoice
> >date field. (SD_GTLINENO IS a unique identifier.)
> >
> >Is there a better way to make sure you don't have one invoice pointing
> >to separate customer ID?
>
> SD_GTLINENO is presumably the primary key. It doesn't need to be
> with the other fields. Your index doesn't do anything useful.I want to find the most efficient way to find duplicate linkages. In
>
>
> >SELECT s.SD_GTLINENO, s.SD_INVOICE, s.SD_GTID, s1.SD_GTID FROM
> >SALESDATA s, SALESDATA s1
> >WHERE s.SD_GTID != s1.SD_GTID
> >AND s1.SD_INVOICE=s.SD_INVOICE
>
> I'm not clear what you are aiming for with this question --
> a) do you want to simply *find* the duplicate linkages (but don't care
> about fixing the integrity problem)
> or
> b) do you want to know how to avoid getting the duplicate linkages from
> happening?
>
> ./heLen
the stored procedure tha populates the table, it already looks for
obvious duplicates and marks them as such, but since this is a data
table populated by remote heterogenous clients, I don't necessarily
want to disallow any duplications, just evaluate them.
And since the data table quite possibly will grow to a huge size, I
want to use the most efficient code to find them. There will probably
be other duplicate scenarios I will want to uncover, but if I can get
this one right, I can copy it over.
Thanx
E. D. Epperson