Subject Re: [firebird-support] Re: duplicate string fields?
Author Ann W. Harrison
ifitsx wrote:
>

select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid


> The restriction imposed by a.noteid < b.noteid missed those records
> where subject is duplicated and a.noteid > b.noteid.

I put that condition in so each pair of duplicates would be reported
only once. It shouldn't miss any duplicates. Consider this data:

noteid folder string

11 1 abc
12 1 abc

That data and my original query produce this result:

noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc

which indicates to me that there are two rows with the
string value 'abc'. One has a primary key value of 11,
the other 12.

Without the a.noteid < b.noteid, the query returns four rows when
there is only one duplicate. The result would look like this:

noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 1 abc
11 12 1 1 abc
12 11 1 1 abc
12 12 1 1 abc


You could add an a.noteid <> b.noteid and not report that each row
matches itself, like this:

noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc
12 11 1 1 abc

But it's still true that noteid 11 and noteid 12 is a single
combiation of duplicates and the second row doesn't add any
information.

> Changing the script to where a.noteid < b.noteid or a.noteid >
> b.noteid effectively returns only where a.noteid > b.noteid and misses
> the actual record(s) where where a.noteid < b.noteid.

Hunh? That doesn't seem likely. The or condition should produce the
same result as a.noteid <> b.noteid
>
> I would appreciate the opportunity to totally complete this SQL/IB_SQL
> lesson - if I haven't exhausted everyone's patience yet - and find out
> if there is a script that will return both _pairs_ of records, instead
> of returning either one of the pair of records where a.noteid <
> b.noteid and implies there is another with a matching subject, or one
> of the pair of records where a.noteid > b.noteid and implies there is
> another with a matching subject.
>
> Is there a script to return both pairs, i.e., all four records?

I still think my query returns all the pairs - and some unnecessary rows
when the same subject appears in more than two dbnotes records. Perhaps
it's the combination of folder and noteid that is the unique identifier.
If that's the case then this query will work better

select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid or a.folderid < b.folderid

The data could look like this:

noteid folder string

11 1 abc
11 2 abc
12 1 abc

The result would be:

noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 2 abc
11 12 1 1 abc


Some would argue that all the conditions should be in the join on
clause since they all resolve to values from stream a and stream
b. That's probably true... so this would be better:

select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject and
(a.noteid < b.noteid or a.folderid < b.folderid)




Regards,


Ann