firebird-support - Re: Getting duplicate records in a table

Subject	Re: Getting duplicate records in a table
Author	Svein Erling Tysvær
Post date	2006-04-26T10:36:27Z

--- In firebird-support@yahoogroups.com, "Adam" wrote:

> Hello Group,
>
> I have the following scenario that although I can solve it two ways
> so far, I am not happy with the plan it generates.
>
> TableA
>
> ID FieldA FieldB
> -- ------ ------
> 1 1 2
> 2 1 2
> 3 3 4
> 4 1 2
> 5 3 5
> 6 3 5
> 7 1 7
>
> Basically, I need to get all records with duplicated FieldA, FieldB
> combinations.
>
> The query will end up in a PSQL for select block.
>
> I will then remove these records, and insert a new record for this
> combination. (for example, it would remove records 1,2 and 4 and
> insert record ID 8 with 1,2).
>
> (This is a notification table)
>
> I can do the following:
>
> select FieldA, FieldB
> from TableA
> group by 1,2
> having count(*)>1
>
> And get the right records, but I don't really need to visit and
> count each of the records (I just need to know if at least one other
> matching record exists).
>
> I have tried several variations of joining back to itself and using
> exists clauses but they still need to do a natural read of TableA,
> and if it is going to do that, then the count(*) is just as
> efficient.
>
> Any ideas are appreciated.
>
> Adam

You're right Adam, since you do not have any WHERE clause, you have to
go natural. Normally, I'd do something like

select distinct T.FieldA, T.FieldB
from TableA T
where exists(select * form TableA T2
where T2.FieldA = T.FieldA
and T2.FieldB = T.FieldB
and T2.ID > T.ID)

or

select distinct T.FieldA, T.FieldB
from TableA T
join TableA T2 on T2.FieldA = T.FieldA
and T2.FieldB = T.FieldB
and T2.ID > T.ID

or

select T.FieldA, T.FieldB
from TableA T
where exists(select * form TableA T2
where T2.FieldA = T.FieldA
and T2.FieldB = T.FieldB
and T2.ID > T.ID)
and not exists(select * form TableA T3
where T3.FieldA = T.FieldA
and T3.FieldB = T.FieldB
and T3.ID < T.ID)

Amongst these, the last alternative will start returning rows quicker
than the other alternatives (something that may matter if you e.g.
present the result through IBOs TIB_Grid, which may retrieve the
remaining rows at a later point), but for returning all rows I suspect
it (for no reason) to be marginally slower (and it doesn't sort the
rows returned in any particular order). But regardless of how you do
it, you have to go natural on one occurence of TableA.

Set