Subject Re: [Firebird-Architect] Re: embedding Lua as procedural language
Author Jim Starkey
Everything you said about looper is correct, though I will bicker about
the trade-off between threaded execution and recursion. I also take
exception to the statement that looper operates on a parse tree. There
is a semantic analysis phase between the BLR parser and the execution
tree that performs view substitution, name resolution, computed express
substitution, etc.

That said, the factor that drove the design of looper was the
requirement that the execution cycle had to be able to stall on a
BLR_RECEIVE and return to the caller. Capturing and restoring state
from recursive execution would have been a major pain in the butt.

That said, BLR is obsolete. It has a number of outstanding
characteristics -- the ability to encode a full transaction, foundation
for trivial network protocol, cheap to parse, analyze, and execution,
dense, language independent -- but virtually none of these matter
anymore. All of my post-Interbase engines have used recursive execution
on trees of polymorphic objects.

Some language lend themselves byte code implementations, Java, for
example. But modern JVM don't interpreter Java bytes code. Instead,
byte codes are used for input to a Just In Time code generator.

Byte code generation is quite simple and straightforward. There is a
trick to generating branches for complex booleans, but it's well even very old books on compiler construction.

The question of whether or not byte codes are a good solution depends on
the semantic size of the primitive operations. For Java, the primitives
are binary operation evaluation on known types, scalar comparisons,
branches, assignments, etc. Executing these with recursive execution is
a waste of cycles, but then so is byte code interpretation.

Database engines in general and SQL engines in specific have very high
level, high over head primitive operations -- fetch a record, filter a
record, sort a record stream, perform a backoutable update,etc. The
cost of most of these operation dwarfs the execution code of byte codes,
threaded execution (aka loop), or recursive execution. Furthermore, the
AMD64 calling sequence uses registers rather than the stack for passing
arguments, significantly reducing the overhead of a call/return
sequence. (I hope we all agree that the days of 32 bit database servers
has come and past.)

At least initially, Nimbus is using the SQL engine from
Falcon/Netfrastructure, which is based on recursive execution. I'm
quite happy with it as a nice balance between good software engineering
(mean, specifically, encapsulation of each node type) and execution
efficiency. Yes, with a significant amount of additional effort it
could be made more efficient, but the net gain from a full system
perspective would be negligible.

Dmitry Yemanov wrote:
> paulruizendaal wrote:
>> If the observations on that page are correct, than FB would benefit
>> from refactoring 'looper' (essentially a parse tree walker coroutine)
>> into a byte-code engine.
> I wouldn't be so optimistic. Let's analyze what runtime performance
> costs do they eliminate:
> "First, a syntax tree describes a program�s grammatical structure, not
> the operations needed to execute it. Therefore, during execution, the
> interpreter would repeatedly visit nodes that did no useful work. For
> example, for the block �{ x++; }�, the interpreter would first visit the
> block node �{�}�, which did nothing, and then visit its first child, the
> increment node �x++�, which incremented x."
> Our execution tree doesn't have any useless grammatical nodes, they're
> eliminated during the SQL->BLR compilation. The only node that doesn't
> perform any useful work itself is nod_list which acts as a container for
> other nodes.
> "Second, even nodes that did useful work were expensive to visit. Each
> visit required a virtual function call and return, which meant a couple
> of indirect memory reads to retrieve the function being called, and two
> indirect branches�one for the call, and one for the return. On modern
> hardware, �indirect� is a synonym for �slow�, since indirection tends to
> defeat caching and branch prediction."
> Our looper doesn't call any virtual methods, the entire node handling is
> embedded. Hard to maintain but fast to execute.
> "Third, to propagate execution state between nodes, the interpreter had
> to pass around a bunch of data. For example, when processing a subtree
> involving a local variable, the interpreter would copy the variable�s
> value between all the nodes in the subtree. So, starting at the �x� part
> of the expression �f((x) + 1)�, a variable node �x� would return x to a
> parentheses node �(x)�, which would return x to a plus node �(x) + 1�.
> Then, the plus node would return (x) + 1 to an argument list node �((x)
> + 1)�, which would copy that value into an argument list object, which,
> in turn, it would pass to the function node for f."
> Our state (aka impure area) is pre-allocated during the BLR compilation
> phase and it's not transported between nodes during execution.
> The funny thing is that some our ideas (replace the looper switch with
> virtual functions, get rid of BLR and execute the SQL grammar tree
> directly, etc) are going exactly wrong direction from the SquirrelFish
> point of view. Maybe we'll have to utilize their experience when such
> changes will be about to happen :-)

Jim Starkey
President, NimbusDB, Inc.
978 526-1376

[Non-text portions of this message have been removed]