Subject Re: Getting duplicate records in a table
Author Adam
--- In firebird-support@yahoogroups.com, "Martijn Tonies"
<m.tonies@...> wrote:
>
> Adam,
>
> > I have the following scenario that although I can solve it two ways so
> > far, I am not happy with the plan it generates.
> >
> > TableA
> >
> > ID FieldA FieldB
> > -- ------ ------
> > 1 1 2
> > 2 1 2
> > 3 3 4
> > 4 1 2
> > 5 3 5
> > 6 3 5
> > 7 1 7
> >
> > Basically, I need to get all records with duplicated FieldA, FieldB
> > combinations.
> >
> > The query will end up in a PSQL for select block.
> >
> > I will then remove these records, and insert a new record for this
> > combination. (for example, it would remove records 1,2 and 4 and
> > insert record ID 8 with 1,2).
> >
> > (This is a notification table)
> >
> > I can do the following:
> >
> > select FieldA, FieldB
> > from TableA
> > group by 1,2
> > having count(*)>1
> >
> > And get the right records, but I don't really need to visit and count
> > each of the records (I just need to know if at least one other
> > matching record exists).
> >
> > I have tried several variations of joining back to itself and using
> > exists clauses but they still need to do a natural read of TableA, and
> > if it is going to do that, then the count(*) is just as efficient.
>
> Is this an operation that needs to be done regularly, or a one-time
> operation to clean up the table?

Hello Martijn

It is a regular operation, potentially every couple of minutes. A
service is run that makes some changes to the table (a mini
replication if you like). Triggers then store a notification record in
the table. If the application does not check its notifications for a
while, there may be many notifications for the same record (which has
changed many times in the interim).

Of course a distinct is used in the query to retrieve the records, but
I don't like the idea of letting it go, particularly if the
application is shut down for an extended period of time (someone goes
on holidays for example).

Adam