Subject | Design of RSE module |
---|---|
Author | paulruizendaal |
Post date | 2004-04-15T11:27:04Z |
Having had a chance to think through RSE's (unfortunately without
much access to source) whilst on the beach, I would like to put up
another description of RSE's to get feedback on. Again, I would
welcome feedback from engine developers, like the one from Dmitry
that pointed out the misconceptions in the earlier attempt. Here it
goes:
"The execution tree of a compiled request consists of two types of
nodes: JRD_NOD's and RSB's. The former hold procedural and (scalar)
expression code, the latter hold record stream code. The execution
tree is shared between request instances ("clones"), with instance
data being held in the "impure" area. [The impure area is a data
frame that is allocated as part of a request struct]
Each (potentially shared) RSB node has a corresponding (non-shared)
instance data:
- First, there are the RPB blocks, which are an array at the end of a
request struct. An RPB struct seems to describe where
data can be found for the current record and what format it is in.
Note that the actual data is always located in the impure block.
- Second, each RSB has a specific non-shared extension struct in the
impure area, the so called IRSB structs. IRSB's are located via a
pointer offset in the shared RSB block.
The tree of JRD_NOD's and RSB's is executed by the code in the EXE
and EVL modules (for JRD_NOD's) and by the RSE module (for RSB's),
with (recursive) calls occuring between each module:
- EXE calls into EVL to evaluate expressions
- EXE calls into RSE to evaluate DML statements
- RSE calls into EVL to evaluate conditions
- RSE calls into EXE to evaluate selectable procedures
- EVL calls into RSE to evaluate singleton selects
Note that all three modules share a single instance of the request
struct and its impure area, which by itself is not recursive.
Recursive procedure calls are implemented by "cloning" the request on
the fly, i.e. creating a copy of the request struct and impure area.
RSB's are a little like objects with three methods: open(), fetch()
and close(). The EXE and EVL modules access the RSE module through
this abstract interface.
RSB nodes are a tree structure in themselves. There are three kinds
of RSB nodes (the naming is mine and cannot be found in source):
RSBSource, RSBMorph and RSBJoin.
RSBSource types are the leafs of the RSB tree and record streams
originate from these leafs. There are many types of RSBSource node,
as the types are a (pruned) cartesian product of table types
(internal, external, procedural) and access types (sequential,
indexed, navigational, ...).
RSBMorph types process a single record stream into another single
record stream. They are "branches" in the RSB tree. Examples would be
RSBBoolean (selects records based on a boolean expression), RSBSort
(sorts records in an expression order) and RSBAggregate (implements
groub by semantics). Other examples would be RSBFirst and RSBNext.
The third type, RSBJoin, combines two or more streams into a single
stream. The RSBJoin types include merge, inner, outer and semi-outer
join logic.
If written out as a hierarchy, a possible ordering could be:
RSB
RSBSource
RSBInternal
rsb_sequential, // natural scan access
rsb_indexed, // access via an index
rsb_navigate, // navigational walk on an index
RSBExternal
rsb_ext_sequential, // external sequential access
rsb_ext_indexed, // external indexed access
rsb_ext_dbkey, // external DB_KEY access
RSBProcedural
rsb_procedure // stored procedure
RSBGateway
...
RSBMorph
rsb_boolean, // predicate (logical condition)
rsb_sort, // sort
rsb_aggregate, // aggregation
rsb_first, // retrieve first n records
rsb_skip, // skip n records
RSBJoin
rsb_cross, // inner join as a nested loop
rsb_left_cross, // left outer join as a nested loop
rsb_merge, // join via a sort merge
rsb_union, // union
----
So far my current understanding of RSB's I would welcome all input
on misunderstandings. Also, I would be happy if people could post
elaborations on points that are too obscure in the current text.
Paul
much access to source) whilst on the beach, I would like to put up
another description of RSE's to get feedback on. Again, I would
welcome feedback from engine developers, like the one from Dmitry
that pointed out the misconceptions in the earlier attempt. Here it
goes:
"The execution tree of a compiled request consists of two types of
nodes: JRD_NOD's and RSB's. The former hold procedural and (scalar)
expression code, the latter hold record stream code. The execution
tree is shared between request instances ("clones"), with instance
data being held in the "impure" area. [The impure area is a data
frame that is allocated as part of a request struct]
Each (potentially shared) RSB node has a corresponding (non-shared)
instance data:
- First, there are the RPB blocks, which are an array at the end of a
request struct. An RPB struct seems to describe where
data can be found for the current record and what format it is in.
Note that the actual data is always located in the impure block.
- Second, each RSB has a specific non-shared extension struct in the
impure area, the so called IRSB structs. IRSB's are located via a
pointer offset in the shared RSB block.
The tree of JRD_NOD's and RSB's is executed by the code in the EXE
and EVL modules (for JRD_NOD's) and by the RSE module (for RSB's),
with (recursive) calls occuring between each module:
- EXE calls into EVL to evaluate expressions
- EXE calls into RSE to evaluate DML statements
- RSE calls into EVL to evaluate conditions
- RSE calls into EXE to evaluate selectable procedures
- EVL calls into RSE to evaluate singleton selects
Note that all three modules share a single instance of the request
struct and its impure area, which by itself is not recursive.
Recursive procedure calls are implemented by "cloning" the request on
the fly, i.e. creating a copy of the request struct and impure area.
RSB's are a little like objects with three methods: open(), fetch()
and close(). The EXE and EVL modules access the RSE module through
this abstract interface.
RSB nodes are a tree structure in themselves. There are three kinds
of RSB nodes (the naming is mine and cannot be found in source):
RSBSource, RSBMorph and RSBJoin.
RSBSource types are the leafs of the RSB tree and record streams
originate from these leafs. There are many types of RSBSource node,
as the types are a (pruned) cartesian product of table types
(internal, external, procedural) and access types (sequential,
indexed, navigational, ...).
RSBMorph types process a single record stream into another single
record stream. They are "branches" in the RSB tree. Examples would be
RSBBoolean (selects records based on a boolean expression), RSBSort
(sorts records in an expression order) and RSBAggregate (implements
groub by semantics). Other examples would be RSBFirst and RSBNext.
The third type, RSBJoin, combines two or more streams into a single
stream. The RSBJoin types include merge, inner, outer and semi-outer
join logic.
If written out as a hierarchy, a possible ordering could be:
RSB
RSBSource
RSBInternal
rsb_sequential, // natural scan access
rsb_indexed, // access via an index
rsb_navigate, // navigational walk on an index
RSBExternal
rsb_ext_sequential, // external sequential access
rsb_ext_indexed, // external indexed access
rsb_ext_dbkey, // external DB_KEY access
RSBProcedural
rsb_procedure // stored procedure
RSBGateway
...
RSBMorph
rsb_boolean, // predicate (logical condition)
rsb_sort, // sort
rsb_aggregate, // aggregation
rsb_first, // retrieve first n records
rsb_skip, // skip n records
RSBJoin
rsb_cross, // inner join as a nested loop
rsb_left_cross, // left outer join as a nested loop
rsb_merge, // join via a sort merge
rsb_union, // union
----
So far my current understanding of RSB's I would welcome all input
on misunderstandings. Also, I would be happy if people could post
elaborations on points that are too obscure in the current text.
Paul