Subject | Re: Getting duplicate records in a table |
---|---|
Author | Adam |
Post date | 2006-04-26T23:39:25Z |
--- In firebird-support@yahoogroups.com, Svein Erling Tysvær
<svein.erling.tysvaer@...> wrote:
These are quite clever in their ability to return the first row
quickly. As it will occur within a PSQL for select loop in an
executable procedure that does not return anything, I really don't
care how long it will take to respond with the first record, which
means the 'having count(*) > 1 is going to perform the quickest,
because all other options appear to need to visit every record at
least once.
I suppose, in context (information I hadn't thought through in my post
yesterday that in hindsight would have been useful) there will be few
times when more than 1000 records will be in the notification table,
and seldom more than 10000 records, so perhaps I have an irrational
phobia of count(*) ;)
Thanks for your help
Adam
<svein.erling.tysvaer@...> wrote:
>Hi Set,
> --- In firebird-support@yahoogroups.com, "Adam" wrote:
> > Hello Group,
> >
> > I have the following scenario that although I can solve it two ways
> > so far, I am not happy with the plan it generates.
> >
> > TableA
> >
> > ID FieldA FieldB
> > -- ------ ------
> > 1 1 2
> > 2 1 2
> > 3 3 4
> > 4 1 2
> > 5 3 5
> > 6 3 5
> > 7 1 7
> >
> > Basically, I need to get all records with duplicated FieldA, FieldB
> > combinations.
> >
> > The query will end up in a PSQL for select block.
> >
> > I will then remove these records, and insert a new record for this
> > combination. (for example, it would remove records 1,2 and 4 and
> > insert record ID 8 with 1,2).
> >
> > (This is a notification table)
> >
> > I can do the following:
> >
> > select FieldA, FieldB
> > from TableA
> > group by 1,2
> > having count(*)>1
> >
> > And get the right records, but I don't really need to visit and
> > count each of the records (I just need to know if at least one other
> > matching record exists).
> >
> > I have tried several variations of joining back to itself and using
> > exists clauses but they still need to do a natural read of TableA,
> > and if it is going to do that, then the count(*) is just as
> > efficient.
> >
> > Any ideas are appreciated.
> >
> > Adam
>
> You're right Adam, since you do not have any WHERE clause, you have to
> go natural. Normally, I'd do something like
>
> select distinct T.FieldA, T.FieldB
> from TableA T
> where exists(select * form TableA T2
> where T2.FieldA = T.FieldA
> and T2.FieldB = T.FieldB
> and T2.ID > T.ID)
>
> or
>
> select distinct T.FieldA, T.FieldB
> from TableA T
> join TableA T2 on T2.FieldA = T.FieldA
> and T2.FieldB = T.FieldB
> and T2.ID > T.ID
>
> or
>
> select T.FieldA, T.FieldB
> from TableA T
> where exists(select * form TableA T2
> where T2.FieldA = T.FieldA
> and T2.FieldB = T.FieldB
> and T2.ID > T.ID)
> and not exists(select * form TableA T3
> where T3.FieldA = T.FieldA
> and T3.FieldB = T.FieldB
> and T3.ID < T.ID)
>
> Amongst these, the last alternative will start returning rows quicker
> than the other alternatives (something that may matter if you e.g.
> present the result through IBOs TIB_Grid, which may retrieve the
> remaining rows at a later point), but for returning all rows I suspect
> it (for no reason) to be marginally slower (and it doesn't sort the
> rows returned in any particular order). But regardless of how you do
> it, you have to go natural on one occurence of TableA.
>
> Set
>
These are quite clever in their ability to return the first row
quickly. As it will occur within a PSQL for select loop in an
executable procedure that does not return anything, I really don't
care how long it will take to respond with the first record, which
means the 'having count(*) > 1 is going to perform the quickest,
because all other options appear to need to visit every record at
least once.
I suppose, in context (information I hadn't thought through in my post
yesterday that in hindsight would have been useful) there will be few
times when more than 1000 records will be in the notification table,
and seldom more than 10000 records, so perhaps I have an irrational
phobia of count(*) ;)
Thanks for your help
Adam