Subject | [firebird-support] Re: duplicate string fields? |
---|---|
Author | Svein Erling Tysvær |
Post date | 2008-02-08T08:36:23Z |
Hi again!
You haven't by any means exhausted any of my patience at least, although I do admit that I did sleep between these two replies I've had to this thread.
The main point with 'where a.noteid < b.noteid' is simply to avoid self joining to not only the same table, but to the very same record. Hence, you could equally well have written:
select a.noteid, a.folderid, a.subject
from dbnotes a
join dbnotes b on a.subject = b.subject
where NOT (a.noteid = b.noteid)
(assuming NOTEID is the only field in your primary key)
If there is any chance of triplicates, not only duplicates, then the above statement would return too many rows and you would be better off doing:
select a.noteid, a.folderid, a.subject
from dbnotes a
where exists(select *
from dbnotes b where a.subject = b.subject
and NOT (a.noteid = b.noteid))
If noteid isn't the only field in dbnotes primary key, you will have to add something more to your WHERE clause. I do not understand the part where you wrote that changing to 'where a.noteid < b.noteid or a.noteid > b.noteid' didn't produce the desired result, hence the suspicion that I haven't quite understood what your primary key is. Maybe you have to change to something like
where a.subject = b.subject
and NOT (a.noteid = b.noteid
and a.folderid = b.folderid)
if your primary key is a composite key.
Good luck with your research. Here at the Cancer Registry of Norway where I'm working, it is generally some of us at the IT Department that take care of writing sql, researchers are generally focusing more on the result sets.
HTH,
Set
You haven't by any means exhausted any of my patience at least, although I do admit that I did sleep between these two replies I've had to this thread.
The main point with 'where a.noteid < b.noteid' is simply to avoid self joining to not only the same table, but to the very same record. Hence, you could equally well have written:
select a.noteid, a.folderid, a.subject
from dbnotes a
join dbnotes b on a.subject = b.subject
where NOT (a.noteid = b.noteid)
(assuming NOTEID is the only field in your primary key)
If there is any chance of triplicates, not only duplicates, then the above statement would return too many rows and you would be better off doing:
select a.noteid, a.folderid, a.subject
from dbnotes a
where exists(select *
from dbnotes b where a.subject = b.subject
and NOT (a.noteid = b.noteid))
If noteid isn't the only field in dbnotes primary key, you will have to add something more to your WHERE clause. I do not understand the part where you wrote that changing to 'where a.noteid < b.noteid or a.noteid > b.noteid' didn't produce the desired result, hence the suspicion that I haven't quite understood what your primary key is. Maybe you have to change to something like
where a.subject = b.subject
and NOT (a.noteid = b.noteid
and a.folderid = b.folderid)
if your primary key is a composite key.
Good luck with your research. Here at the Cancer Registry of Norway where I'm working, it is generally some of us at the IT Department that take care of writing sql, researchers are generally focusing more on the result sets.
HTH,
Set
>Still the following script only returned one of the duplicate records:
>
>select a.noteid, a.folderid, b.noteid, b.folderid, a.subject
>from dbnotes a
>join dbnotes b on (a.subject = b.subject)
>where a.noteid < b.noteid
>
>However, that was enough to point me toward one pair that I needed to
>evaluate, and begin my housecleaning.
>
>The restriction imposed by a.noteid < b.noteid missed those records
>where subject is duplicated and a.noteid > b.noteid.
>
>Changing the script to where a.noteid < b.noteid or a.noteid >
>b.noteid effectively returns only where a.noteid > b.noteid and misses
>the actual record(s) where where a.noteid < b.noteid.
>
>I would appreciate the opportunity to totally complete this SQL/IB_SQL
>lesson - if I haven't exhausted everyone's patience yet - and find out
>if there is a script that will return both _pairs_ of records, instead
>of returning either one of the pair of records where a.noteid <
>b.noteid and implies there is another with a matching subject, or one
>of the pair of records where a.noteid > b.noteid and implies there is
>another with a matching subject.
>
>Is there a script to return both pairs, i.e., all four records?