firebird-support - Re: Re[4]: [firebird-support] Find duplicate multi-row entries

Subject	Re: Re[4]: [firebird-support] Find duplicate multi-row entries
Author	Svein Erling Tysvær
Post date	2017-09-28T07:52:07Z

OK, that's completely different, but I've done something similar once before. It is not possible to directly compare sets for equality, but it is possible to check that set A doesn't contain anything that isn't in set B and that set B doesn't contain anything that isn't in set A. I assume NODE is the one common denominator:

with tmp (NODE) as

(select distinct NODE from routes),

new_routes(NODE) as

(select t.NODE

from tmp t

where not exists(select *

from routes r

left join route_history rh on r.NODE = rh.NODE

and r.ROUTE_INDEX = rh.ROUTE_INDEX

and r.LINK_NODE = rh.LINK_NODE

and r.QUALITY = rh.QUALITY

where t.NODE = r.NODE

and rh.NODE is null)

and not exists(select *

from route_history rh

left join routes r on r.NODE = rh.NODE

and r.ROUTE_INDEX = rh.ROUTE_INDEX

and r.LINK_NODE = rh.LINK_NODE

and r.QUALITY = rh.QUALITY

where t.NODE = rh.NODE

and r.NODE is null))

select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY

from new_routes nr

join routes r on nr.NODE = r.NODE

Does this get you the result you want?

Set

2017-09-28 9:19 GMT+02:00 'Daniel Miller' dmiller@... [firebird-support] <firebird-support@yahoogroups.com>:

Thanks - but that still doesn't work. I get a huge return set with a ton of duplicates. I think something's being lost in translation. A simple match of one row isn't sufficient - I need to match the "set".

With the following six records:
5557 1 1116 0
5557 2 1115 0
5557 3 5104 2
5557 4 5073 2
5557 5 5222 2
5557 6 5110 2

That is the list of routes a given node has at a specific point in time. So it's perfectly possible at another time to have fewer, or more, or different routes. And I need to consider ALL the routes in play for a given timestamp to be a single "set". So if today I have the above list of routes, and yesterday route index 4 was looking at a different node - that would constitute a different and unique set even though the other 5 records match.

5557 1 1116 0
5557 2 1115 0
5557 3 5104 2
5557 4 9999 2
5557 5 5222 2
5557 6 5110 2

So it's quite possible I will have lots of duplicates for an individual row in ROUTES when compared in ROUTE_HISTORY - what I'm trying to filter is the pattern of rows. Only if ALL the rows of a current entries in ROUTES exist in ROUTE_HISTORY should it be considered a duplicate condition.

--

Daniel

On 9/27/2017 11:53:48 PM, "Svein Erling Tysvær setysvar@... [firebird-support]" <firebird-support@yahoogroups. com> wrote:

Sure it is possible to write such a query:

select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from routes r
left join route_history rh1 on r.node = rh1.node
and r.route_index = rh1.route_index
and r.link_node = rh1.link_node
and r.quality is not distinct from rh1.quality
left join route_history rh2 on rh1.node = rh2.node
and rh1.route_index = rh2.route_index
and rh1.route_updated < rh2.route_updated
where rh1.node is null
and rh2.node is null

I use two LEFT JOINS because I assume you want things to be returned if things are changed back, e.g. if you have

5557 1 1116 0
5557 1 1116 0

then you only want one row in route_history, but if you have

5557 1 1116 0
5557 1 1116 1
5557 1 1116 0

then you don't want the last record to be considered a duplicate, but know when it changed back. If ROUTE_INDEX is just a running number, you consider

5557 1 1116 0
5557 2 1116 0

to be duplicates, and there are no gaps between the ROUTE_INDEX for each NODE, then the query can be simplified:

select r.NODE, r.ROUTE_INDEX, r.LINK_NODE, r.QUALITY
from routes r
left join route_history rh1 on r.node = rh1.node
and r.route_index+1 = rh1.route_index
and r.link_node = rh1.link_node
and r.quality is not distinct from rh1.quality
where rh1.node is null

Note that I use IS NOT DISTINCT so that NULLs are considered equal. If you prefer, the left joins can be replaced by (nested) NOT EXISTS.

HTH,
Set

2017-09-28 7:22 GMT+02:00 'Daniel Miller' dmiller@... [firebird-support] <firebird-support@yahoogroups. com>:

On 9/27/2017 9:20:54 PM, "liviuslivius liviuslivius@... [firebird-support]" <firebird-support@yahoogroups. com> wrote:

group by + having
is your friend.
E.g.
Select field1, field2, count(*)
From tablex
Group by field1, field2
Having count(*)>1

But in your scenario i do not know if this is the solution. Question is if record was changed and in next update "restored" and once again changed is this duplicate or no?
If yes above query is ok if no then procedure is your friend.

And for the future change your audit trigger and check before if there was any change

Thank you, but I this isn't quite what I need. I need to match against all the records of a set (up to 8 rows per set) - not just individual rows.

--

Daniel