Subject Re: [firebird-support] Getting duplicate records in a table
Author Radu Sky
Ann W. Harrison wrote:
>>> Radu Sky wrote:
>>>
>>>> If you need to remove duplicate rows within unnamed cursor you can try
>>>> using a new UNIQUE field.
> > Ann W. Harrison wrote:
> >>
> >>Or use the always present, always unique rdb$db_key.
> >>
> Radu Sky wrote:
>> My intention was to use the unique field to generate the -803 exception,
>> to exclude duplicates.
>> How can this be achieved using rdb$db_key ?
>
> Ah, sorry, I didn't read your reply closely enough ... just caught
> the "add a unique field" and thought that adding a field is more
> expensive than using one that exists. Now that I've read your
> post, I've got some quibbles, in terms of performance - always an
> interesting issue in queries like these.
>
>
> > If you need to remove duplicate rows within unnamed cursor you can try
> > using a new UNIQUE field.
>
> Adding a new unique field requires that you first create the field,
> then populate it with unique values, then create the constraint, which
> reads and sorts the whole table, then creates a new index on it.
> >
> > For example having a FIELD12 defined as unique, you can do smth like
> >
> > FOR SELECT ID FROM yourtable INTO :varID
> > AS CURSOR C
> > DO
> > BEGIN
> > UPDATE FIELD12=field1||field2;
> > WHEN SQLCODE -803 DO DELETE FROM yourtable WHERE CURRENT OF C;
> > END
> >
> > It will only remove duplicate values, it does not create a new record
> > from duplicate but you can do UPDATE ID=<some_generator_value>.
> > It will still parse the whole table but you might embed it into your
> > PSQL code.
>
> Adam was concerned abou the cost of a natural scan of the table.
> Your suggestion updates the whole table twice, plus reading it
> and sorting it and creating an index.

I know that creating a new field, new index etc are costly operations.
I was thinking in terms of a rather regular duplicate cleanup where the
new unique field/index are persistent, not created on-the-fly.
Since he wanted to do a cursor parsing, I was thinking that both
operations might merge and become less expensive as one.
Anyway, it was just another pov, thinking that sometimes it is easier to
catch the exception than check for conditions.

Regards

Radu