Subject | Re: [firebird-support] sql question |
---|---|

Author | Svein Erling Tysvaer |

Post date | 2006-09-05T11:47:52Z |

Hi again Maurizio!

Yes, my solution should work regardless of how many rows are identical,

all rows just have to exist for both IDs.

Let's take some examples:

ID F1 F2 F3

1 A B C

2 A B C

3 A B C

3 D E F

4 A B C

4 D E F

5 A B C

5 D E F

5 G H I

6 A B C

6 D E F

My suggested query would (hopefully, still not tested) return

1 2

3 4

3 6

4 6

since these are the identical tuples. All other comparisons would fail,

since there is minimum one row that is missing. E.g. 4 has a row that is

not in 1 (failing the M5/M6 subselect) and 5 has a row that is not in 6

(failing the M3/M4 subselect).

It doesn't matter how many rows match or not match (it could be hundreds

of rows, but must be minimum 1), there just has to be a match somewhere.

Though suppose you had the following situation:

3 A B C

3 A B C

3 D E F

4 A B C

4 D E F

that would match even though the number of rows are different. Hence, my

solution cannot be used if two identical rows are supposed to be

considered different (probably a rare case that has to be solved with a

"subselect count" or redesign of your database, just thought I should

mention it).

HTH,

Set

Maurizio P. wrote:

Yes, my solution should work regardless of how many rows are identical,

all rows just have to exist for both IDs.

Let's take some examples:

ID F1 F2 F3

1 A B C

2 A B C

3 A B C

3 D E F

4 A B C

4 D E F

5 A B C

5 D E F

5 G H I

6 A B C

6 D E F

My suggested query would (hopefully, still not tested) return

1 2

3 4

3 6

4 6

since these are the identical tuples. All other comparisons would fail,

since there is minimum one row that is missing. E.g. 4 has a row that is

not in 1 (failing the M5/M6 subselect) and 5 has a row that is not in 6

(failing the M3/M4 subselect).

It doesn't matter how many rows match or not match (it could be hundreds

of rows, but must be minimum 1), there just has to be a match somewhere.

Though suppose you had the following situation:

3 A B C

3 A B C

3 D E F

4 A B C

4 D E F

that would match even though the number of rows are different. Hence, my

solution cannot be used if two identical rows are supposed to be

considered different (probably a rare case that has to be solved with a

"subselect count" or redesign of your database, just thought I should

mention it).

HTH,

Set

Maurizio P. wrote:

> thanks for your answer ,

> i could not make a test now ,

>

> ...just a dubt i have :

> is it valid for a different rows number too ?

> in my case the number of the row is variable .

>

> Maurizio

>

>

> ----- Original Message -----

> From: Svein Erling Tysvaer

> To: Firebird-support

> Sent: Tuesday, September 05, 2006 9:03 AM

> Subject: Re: [firebird-support] sql question

>

>

> Hi again, now I'm awake and my suggested solution needs some minor

> corrections:

>

> select distinct M1.ID, M2.ID

> /*Distinct to avoid duplicates*/

> from MaurizioTable M1

> join MaurizioTable M2 on M2.F1 = M1.F1

> and M2.F2 = M1.F2

> and M2.F3 = M1.F3

> and M2.ID+0 > M1.ID

> /*Clause added to avoid lots of cases where M1.ID = M2.ID*/

> /*I also expect one row (10, 16) to be enough*/

> /*and that you do not need (16, 10) as well*/

> /*+0 added due to expected speed improvement*/

> where not exists(select * from MaurizioTable M3

> where M3.ID = M1.ID

> where not exists(select * from MaurizioTable M4

> where M4.ID = M2.ID

> and M4.F1 = M3.F1

> and M4.F2 = M3.F2

> and M4.F3 = M3.F3))

> /*There should be no M1.ID that does not have a duplicate in M2*/

> and not exists(select * from MaurizioTable M5

> where M5.ID = M2.ID

> where not exists(select * from MaurizioTable M6

> where M6.ID = M1.ID

> and M6.F1 = M5.F1

> and M6.F2 = M5.F2

> and M6.F3 = M5.F3))

> /*There should be no M2.ID that does not have a duplicate in M1*/

>

> Though I still haven't tried parsing this and suspect that the size of

> the table as well as index selectivity may or may not make this a viable

> solution (it will be slow on a table with millions of rows)...

>

> Set