Subject | Re: [firebird-support] Re: duplicate string fields? |
---|---|
Author | Ann W. Harrison |
Post date | 2008-02-08T17:42:19Z |
ifitsx wrote:
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid
only once. It shouldn't miss any duplicates. Consider this data:
noteid folder string
11 1 abc
12 1 abc
That data and my original query produce this result:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc
which indicates to me that there are two rows with the
string value 'abc'. One has a primary key value of 11,
the other 12.
Without the a.noteid < b.noteid, the query returns four rows when
there is only one duplicate. The result would look like this:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 1 abc
11 12 1 1 abc
12 11 1 1 abc
12 12 1 1 abc
You could add an a.noteid <> b.noteid and not report that each row
matches itself, like this:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc
12 11 1 1 abc
But it's still true that noteid 11 and noteid 12 is a single
combiation of duplicates and the second row doesn't add any
information.
same result as a.noteid <> b.noteid
when the same subject appears in more than two dbnotes records. Perhaps
it's the combination of folder and noteid that is the unique identifier.
If that's the case then this query will work better
select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid or a.folderid < b.folderid
The data could look like this:
noteid folder string
11 1 abc
11 2 abc
12 1 abc
The result would be:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 2 abc
11 12 1 1 abc
Some would argue that all the conditions should be in the join on
clause since they all resolve to values from stream a and stream
b. That's probably true... so this would be better:
select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject and
(a.noteid < b.noteid or a.folderid < b.folderid)
Regards,
Ann
>select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid
> The restriction imposed by a.noteid < b.noteid missed those recordsI put that condition in so each pair of duplicates would be reported
> where subject is duplicated and a.noteid > b.noteid.
only once. It shouldn't miss any duplicates. Consider this data:
noteid folder string
11 1 abc
12 1 abc
That data and my original query produce this result:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc
which indicates to me that there are two rows with the
string value 'abc'. One has a primary key value of 11,
the other 12.
Without the a.noteid < b.noteid, the query returns four rows when
there is only one duplicate. The result would look like this:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 1 abc
11 12 1 1 abc
12 11 1 1 abc
12 12 1 1 abc
You could add an a.noteid <> b.noteid and not report that each row
matches itself, like this:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 12 1 1 abc
12 11 1 1 abc
But it's still true that noteid 11 and noteid 12 is a single
combiation of duplicates and the second row doesn't add any
information.
> Changing the script to where a.noteid < b.noteid or a.noteid >Hunh? That doesn't seem likely. The or condition should produce the
> b.noteid effectively returns only where a.noteid > b.noteid and misses
> the actual record(s) where where a.noteid < b.noteid.
same result as a.noteid <> b.noteid
>I still think my query returns all the pairs - and some unnecessary rows
> I would appreciate the opportunity to totally complete this SQL/IB_SQL
> lesson - if I haven't exhausted everyone's patience yet - and find out
> if there is a script that will return both _pairs_ of records, instead
> of returning either one of the pair of records where a.noteid <
> b.noteid and implies there is another with a matching subject, or one
> of the pair of records where a.noteid > b.noteid and implies there is
> another with a matching subject.
>
> Is there a script to return both pairs, i.e., all four records?
when the same subject appears in more than two dbnotes records. Perhaps
it's the combination of folder and noteid that is the unique identifier.
If that's the case then this query will work better
select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject)
where a.noteid < b.noteid or a.folderid < b.folderid
The data could look like this:
noteid folder string
11 1 abc
11 2 abc
12 1 abc
The result would be:
noteid1 noteid2 folderid1 folderid2 string
------- ------- --------- --------- ------
11 11 1 2 abc
11 12 1 1 abc
Some would argue that all the conditions should be in the join on
clause since they all resolve to values from stream a and stream
b. That's probably true... so this would be better:
select a.noteid as noteid1, a.folderid as folderid1,
b.noteid as noteid2, b.folderid folderid2, a.subject
from dbnotes a
join dbnotes b on (a.subject = b.subject and
(a.noteid < b.noteid or a.folderid < b.folderid)
Regards,
Ann