firebird-architect - RE: [Firebird-Architect] Ann, Arno

Subject	RE: [Firebird-Architect] Ann, Arno :: Explicit join semantics
Author	Claudio Valderrama C.
Post date	2005-03-08T12:03:46Z

Helen Borrie wrote:

> An interesting thing came up in the support list today, regarding
> explicit join semantics. Since both Ann and Arno seemed to be "coming
> from the same
> place" with respect to this issue, I'd be interested in their
> expanding on
> this question. If it makes sense to me, I'd like to write a FAQ
> sheet or similar with regard to it, since it news to me.
>
> The inquirer wanted to know how to write this query:
>
> select * from TABLE1 a
> left join (
> select * from TABLE2 a2 where a2."Val" = :Val
> and a2."Pos" = :Pos ) b
> on a."ID" = b."ID_ref"
>
> He seemed to be attempting to make derived table style of query.
> I replied
> as follows:
>
> Well, the derived table seems redundant for this example, which
> appears to be a straightforward left join.
>
> select a.*, a2.*
> from TABLE1 a
> left join TABLE2 a2
> on a."ID" = a2."ID_ref"
> where a2."Val" = :Val
> and a2."Pos" = :Pos

[snip]

Let me show some examples. Please excuse the formatting if I forgot to fix
some line as I'm copying from the isql console. BTW, Fabricio, I see your
output correctly formatted in Outlook 2k.

SQL> select count(*) from rdb$relations r join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name and
r.rdb$relation_name = 'RDB$GENERATORS';

COUNT
============

4

SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name
and r.rdb$relation_name = 'RDB$GENERATORS';

COUNT
============

39

SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name
where r.rdb$relation_name = 'RDB$GENERATORS';

COUNT
============

4

I've tried to explain to SQL newbies this way:

- In an inner join, putting everything in the ON clause or anything except
the join condition in the WHERE clause becomes a matter of style. Being a
purist myself, I prefer for readability to write the condition that makes
the match in the join after the JOIN clause and the sieve that applies to
the whole result set in the WHERE clause. Something like
A join B on A.f = B.f
where A.f > x;
See the first case. You can put
WHERE r.rdb$relation_name = 'RDB$GENERATORS'
instead of
and r.rdb$relation_name = 'RDB$GENERATORS'
and it won't make a difference in practice. The table rdb$generators has
four fields in FB2.

- In a left or right [outer] join, the ON clause can filter out only records
from the optional table (I mean non-mandatory table). The ON clause doesn't
have "power" to leave out records from the mandatory table. However,
whatever you put here, if it doesn't happly to the current record (not only
to the optional table) will cause the optional table's fields to be returned
as null. Take the second query as an example. I think I need to post the
result to explain better:

RDB$RELATION_NAME RDB$RELATION_NAME
=============================== =================

RDB$PAGES <null>
RDB$DATABASE <null>
RDB$FIELDS <null>
RDB$INDEX_SEGMENTS <null>
RDB$INDICES <null>
RDB$RELATION_FIELDS <null>
RDB$RELATIONS <null>
RDB$VIEW_RELATIONS <null>
RDB$FORMATS <null>
RDB$SECURITY_CLASSES <null>
RDB$FILES <null>
RDB$TYPES <null>
RDB$TRIGGERS <null>
RDB$DEPENDENCIES <null>
RDB$FUNCTIONS <null>
RDB$FUNCTION_ARGUMENTS <null>
RDB$FILTERS <null>
RDB$TRIGGER_MESSAGES <null>
RDB$USER_PRIVILEGES <null>
RDB$TRANSACTIONS <null>

RDB$RELATION_NAME RDB$RELATION_NAME
=============================== =================
RDB$GENERATORS RDB$GENERATORS
RDB$GENERATORS RDB$GENERATORS
RDB$GENERATORS RDB$GENERATORS
RDB$GENERATORS RDB$GENERATORS
RDB$FIELD_DIMENSIONS <null>
RDB$RELATION_CONSTRAINTS <null>
RDB$REF_CONSTRAINTS <null>
RDB$CHECK_CONSTRAINTS <null>
RDB$LOG_FILES <null>
RDB$PROCEDURES <null>
RDB$PROCEDURE_PARAMETERS <null>
RDB$CHARACTER_SETS <null>
RDB$COLLATIONS <null>
RDB$EXCEPTIONS <null>
RDB$ROLES <null>
RDB$BACKUP_HISTORY <null>
P <null>
A <null>
M <null>

For example, you find <null> in front of RDB$PAGES not because this table
doesn't have fields, but because
on r.rdb$relation_name = rf.rdb$relation_name
and r.rdb$relation_name = 'RDB$GENERATORS';
matches the first part (there will be fields for that table) but not the
second condition. Notice it references the mandatory table, but since it
can't drop records from the mandatory table, it's interpreted as a failure
of the left join and the same as if the first condition has failed: a single
record with null fields if returned for the optional table. This is clearly
shown with the repetition of RDB$GENERATORS, that's the only record that
matches the complete ON clause. I could also have returned the
rf.rdb$field_name probably, but you know you will get the names of the four
fields in the table rdb$generators.

- In a left or right [outer] join, it's then becomes not only readability
but a need to get the correct result to separate the ON clause from the
WHERE clause. Then for consistency, I take the same care with inner joins,
separating logically the ON and the WHERE clause. The third example shows
what was probably intended from the left join, that again gave us four
records. First, a lot of records where produced, where a single value in the
mandatory table was repeated as many times as matching fields exist in the
optional table. In this case, we'll observe rdb$pages paired with its
fields, then rdb$database paired with its fields, etc. The intermediate
result will have the same count as simply

SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name;

COUNT
============

248

that's indeed the same than simply counting how many entries we have in
rdb$relation_fields, since we know (and hope) there aren't orphan fields
(fields that don't belong to any relation).

SQL> select count(*) from rdb$relation_fields;

COUNT
============

248

Then the WHERE sieve is applied and leaves four records. This explains the
third result.

- Expect more weirdness if you go for a full [outer] join. Example:

SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name;

COUNT
============

248

SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name and r.rdb$relation_name =
'RDB$GENERATORS';

COUNT
============

283

Notice the second case. We tried (but couldn't) to reduce one side by
applying a filter in the ON clause, but got MORE records! Please don't pull
your hair. The full join means the filter in the ON clause can't leave
records out of the result in any side! Just to calm you down:

SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name where r.rdb$relation_name =
'RDB$GENERATORS';

COUNT
============

4

SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name where rf.rdb$relation_name =
'RDB$GENERATORS';

COUNT
============

4

Notice the WHERE clause was first applied to the left table and then to the
right table. No changes in the result. Both tables are mandatory, so the
results are mostly symmetric. (Our data has a very well defined parent-child
relationship between the two tables.)

I don't want to offend, but maybe some was caught asleep and really don't
know why we went from 248 to 283. The difference is 35. Hint:

SQL> select count(*) from rdb$relations;

COUNT
============

36

Now why is the difference 35 and not 36 if the number of tables is 36?
Because all non-matching field pairs generate a new pair with null in the
side that can't match the extra condition in the ON clause, that was to find
the 'RDB$GENERATORS' name. (But they will exclude all full matches of common
fields not passing the ON clause! Only in that case, the ON clause behaves
like facing an inner join.) There's only one case (for rdb$generators) that
doesn't generate such pair, because it matches. Convince yourself:

SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name and r.rdb$relation_name =
'RDB$GENERATORS' where rf.rdb$relation_name is null;

and

SQL> select count(*)from rdb$relations r full join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name
and rf.rdb$relation_name = 'RDB$GENERATORS' where rf.rdb$relation_name is
null;

give the same result:

COUNT
============

35

So, for each case that doesn't match "left table having 'RDB$GENERATORS' in
its field", we some records removed and some records with null in one side
added! We have 36 tables but 35 mismatches due to the ON clause. Since I
know not everyone will be convinced, here you have a boring output:

SQL> select r.rdb$relation_name, rf.rdb$relation_name, rf.rdb$field_name
from rdb$relations r full join rdb$relation_fields rf on r.rdb$relation_name
= rf.rdb$relation_name and r.rdb$relation_name = 'RDB$GENERATORS' where
rf.rdb$relation_name is null;

I'm not showing the rdb$field_name, but you can trust me that it's
completely null:

RDB$RELATION_NAME RDB$RELATION_NAME
=============================== =================

RDB$PAGES <null>
RDB$DATABASE <null>
RDB$FIELDS <null>
RDB$INDEX_SEGMENTS <null>
RDB$INDICES <null>
RDB$RELATION_FIELDS <null>
RDB$RELATIONS <null>
RDB$VIEW_RELATIONS <null>
RDB$FORMATS <null>
RDB$SECURITY_CLASSES <null>
RDB$FILES <null>
RDB$TYPES <null>
RDB$TRIGGERS <null>
RDB$DEPENDENCIES <null>
RDB$FUNCTIONS <null>
RDB$FUNCTION_ARGUMENTS <null>
RDB$FILTERS <null>
RDB$TRIGGER_MESSAGES <null>
RDB$USER_PRIVILEGES <null>
RDB$TRANSACTIONS <null>

RDB$RELATION_NAME RDB$RELATION_NAME
=============================== =================
RDB$FIELD_DIMENSIONS <null>
RDB$RELATION_CONSTRAINTS <null>
RDB$REF_CONSTRAINTS <null>
RDB$CHECK_CONSTRAINTS <null>
RDB$LOG_FILES <null>
RDB$PROCEDURES <null>
RDB$PROCEDURE_PARAMETERS <null>
RDB$CHARACTER_SETS <null>
RDB$COLLATIONS <null>
RDB$EXCEPTIONS <null>
RDB$ROLES <null>
RDB$BACKUP_HISTORY <null>
P <null>
A <null>
M <null>

These are the phantom 35 records that appeared. As expected, we are missing
an entry for RDB$GENERATORS.

In our example, we don't have dangling values or the db would be corrupt:

SQL> select count(*)from rdb$relations r full join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name
and rf.rdb$relation_name = 'RDB$GENERATORS' where rf.rdb$relation_name is
null and r.rdb$relation_name is null;

COUNT
============

0

Our example is biased. Let's consider using an example where both tables
have records that can't be matched in the other as well as some common
values:

F:\fb2dev\fbbuild\firebird2\temp\debug\firebird\bin>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> create database 'fuss';
SQL> create table a(a int);
SQL> insert into a values(1);
SQL> insert into a values(2);
SQL> create table b(b int);
SQL> insert into b values(2);
SQL> insert into b values(3);
SQL> select a.a, b.b from a full join b on a.a = b.b;

A B
============ ============

2 2
<null> 3
1 <null>

Normal full join. All matches plus all mismatches.

SQL> select a.a, b.b from a full join b on a.a = b.b and a.a = 1;

A B
============ ============

<null> 2
<null> 3
1 <null>
2 <null>

(1) The ON clase excludes the full match (like it had the power it has in an
inner join) but two phantoms appeared.

SQL> select a.a, b.b from a full join b on a.a = b.b and a.a = 1 and b.b =
3;

A B
============ ============

<null> 2
<null> 3
1 <null>
2 <null>

Nothing changes.

SQL> select a.a, b.b from a full join b on a.a = b.b and b.b = 3;

A B
============ ============

<null> 2
<null> 3
1 <null>
2 <null>

Same example as (1), but reversed.

SQL> select a.a, b.b from a full join b on a.a = b.b and b.b = 2;

A B
============ ============

2 2
<null> 3
1 <null>

Finally, our matching field appears and makes no different WRT our first
full join.

SQL> drop database;
SQL> ^Z

No, I'm no SQL guru, so I'm not trying to be pedant. I'm trying to show the
FB behavior is consistent.

C.