firebird-architect - RFC -- CONNECT BY

Subject	RFC -- CONNECT BY
Author	paulruizendaal
Post date	2004-05-08T00:52:43Z

I'm happy to report that the work to implement CONNECT BY is
completed and that the Compiere web-interface (which uses CONNECT BY)
now also works with Firebird. The translation layer converts CONNECT
BY to standard WITH RECURSIVE syntax. Firebird has been enhanced to
support WITH and WITH RECURSIVE.

DSQL Changes
============

The syntax has been extended with the following clauses:

select :
WITH RECURSIVE recursive_with final_select |
WITH with_list |
final_select ;

with_list :
with_item final_select |
with_item ',' with_list ;

with_item : symbol_table_alias_name derived_column_list AS '('
final_select ')';

recursive_with : symbol_table_alias_name derived_column_list AS
recursive_union;

recursive_union :
'(' initial_select UNION recursive_select ')' |
'(' initial_select UNION ALL recursive_select ')' ;
;

initial_select : ordered_select_expr;

recursive_select : ordered_select_expr;

final_select : union_expr order_clause for_update_clause lock_clause;

The WITH clauses are processed in pass1.cpp as 'out-of-line' derived
tables. For the recursive case, the recursion is broken by first
generating a non-recursive union and using this as the context for
the second, real compilation. The usual union tree is generated, with
a flag to signal recursion. This flag is used in gen.cpp to generate
a blr_recurse instruction instead of a blr_union instruction. The
body of both instructions is identical, although blr_recurse has a
fixed sub-rse count of 2.

Engine BLR changes
==================

The BLR parser deals with blr_recurse as if it were a regular union,
even generating a nod_union node for it. This nod_union node has a
recursion flag set on it. Further processing through PAR/CMP/OPT is
identical to the processing for regular unions. At the very end, the
flag is used to generate an rse struct of type rse_recurse instead of
type rse_union.

Engine RSE changes
==================

The following code has been added to the RSE module:

RSBRecurse::open(TDBB tdbb, RSB rsb, IRSB_RECURSE irsb)
{
RSB *ptr, *end;
JRD_REQ request = tdbb->tdbb_request;
RPB *rpb = &request->req_rpb[rsb->rsb_stream];

// Set up our instance data
irsb->irsb_mode = root;
irsb->irsb_level = 1;
irsb->irsb_stack = NULL;
irsb->irsb_data = NULL;
VIO_record(tdbb, rpb, rsb->rsb_format, tdbb->tdbb_default);

// Initialize the record number for both <root> and <child>
stream
ptr = &rsb->rsb_arg[rsb->rsb_count];
for(end = ptr + (USHORT) *ptr; ++ptr <= end;)
request->req_rpb[(USHORT) *ptr].rpb_number = -1;

RSE_open(tdbb, rsb->rsb_arg[0]);
}

BOOLEAN RSBRecurse::get(TDBB tdbb, RSB rsb, IRSB_RECURSE irsb)
{
RSB *rsb_ptr;
JRD_NOD map, *ptr, *end;

SET_TDBB(tdbb);
JRD_REQ request = tdbb->tdbb_request;
ULONG n = request->req_impure_size;

switch( irsb->irsb_mode ) {

case root:
rsb_ptr = &rsb->rsb_arg[0];
break;

case recurse:
// Stop infinite recursion of bad queries at 50 deep
if( irsb->irsb_level>50 )
ERR_post(gds_req_max_clones_exceeded, 0);

// Save where we are
JRD_REQ tmp = (JRD_REQ) new char[n];
memcpy(tmp, request, n);
irsb->irsb_stack = tmp;

REC record = request->req_rpb[rsb-

>rsb_stream].rpb_record;

irsb->irsb_data = new char[record->rec_length];
memcpy(irsb->irsb_data, record->rec_data, record-

>rec_length);

// (Re-)Open a new child stream & reset record number
rsb_ptr = &rsb->rsb_arg[2];
if( irsb->irsb_level>1 )
RSE_close(tdbb, *rsb_ptr);
request->req_rpb[(USHORT)rsb->rsb_arg[6]].rpb_number
= -1;
RSE_open(tdbb, *rsb_ptr);

irsb->irsb_level++;
break;

}

// Get the data -- if there is none go back one level and when
// there isn't a previous level, we're done
while( !get_record(tdbb, *rsb_ptr, NULL, RSE_get_forward) ) {

if( irsb->irsb_level==1 ) {
return FALSE;
}
else {
RSE_close(tdbb, *rsb_ptr);
delete[] irsb->irsb_data;

JRD_REQ tmp = irsb->irsb_stack;
memcpy(request, tmp, n);
delete[] tmp;
}

if( irsb->irsb_level>1 ) {
rsb_ptr = &rsb->rsb_arg[2];

// Reset our record data so that recursive
WHERE clauses work
REC record = request->req_rpb[rsb-

>rsb_stream].rpb_record;

memcpy(record->rec_data, irsb->irsb_data,
record->rec_length);
}
else {
rsb_ptr = &rsb->rsb_arg[0];
}

}
irsb->irsb_mode = recurse;

// We've got a record, map it into the target record
map = (JRD_NOD) rsb_ptr[1];
for (ptr = map->nod_arg, end = ptr + map->nod_count; ptr <
end; ptr++)
EXE_assignment(tdbb, *ptr);

return TRUE;
}

void RSBRecurse::close(TDBB tdbb, RSB rsb, IRSB_RECURSE irsb)
{
SET_TDBB(tdbb);
JRD_REQ request = tdbb->tdbb_request;
ULONG n = request->req_impure_size;

while( irsb->irsb_level>1 ) {

RSE_close(tdbb, rsb->rsb_arg[2]);

JRD_REQ tmp = irsb->irsb_stack;
memcpy(request, tmp, n);
delete[] tmp;
}
RSE_close(tdbb, rsb->rsb_arg[0]);
}

Bugs & Issues
=============

The current code copies the entire requestblock on each recursion.
This is wasteful. For a regular query the request block is 1..5K and
the overhead is bad, but acceptable for my purposes. For use in a big
PSQL procedure (request block size up to 500K!) it is a bit over the
top. <soapbox> Yet another reason to separate procedural processing
from relational processing. </soapbox>

I figure this can be solved by not saving the entire request, but
just that part that deals with the recursive select expression and
its tributary streams. It would involve looping through the tributary
streams and pushing the corresponding IRSB blocks onto a stack -- as
we are not interested in record contents, just saving this should be
enough. This more clever scheme can be part of a general clean-up &
encapsulation of the RSE code.

Thanks
======

A big thank you goes to Jim, Nicolay and Arno: their advice sure
speeded things up a lot.

Paul