firebird-architect - external procedure engines

Subject	external procedure engines
Author	paulruizendaal
Post date	2003-11-17T18:50:51Z

Hi all:

As several of the project admins know, we've been busy developing an
"Oracle"-mode for Firebird. They initial goal is to be able to run a
single application, "Compiere". Progress has been good and large
parts of the Compiere application already work with Firebird. :^)

A significant part of the effort is to make the Oracle procedural
language PL/SQL available to Firebird. As part of this effort we want

to make it possible for the engine to call out to external procedure
engines, a bit like UDF's in the current Firebird.

Please find a summary of an e-mail conversation between Dmitry and
myself below. I would look forward to input from the community.

Paul

==========================================

Hi Dimitry,

As you may have heard from Roman, an effort is underway to build an
'oracle mode' for Firebird. The initial goal is to be able to run the

Compiere application.

Quite a bit of functionality has already been done:
- the compiere tables can created and loaded
- views and sequences can be created
- the compiere SQL usage can be handled by Firebird (thanks to Arno),
with only a little syntax morphing
- we've done a client side PL/SQL engine, that can handle all
compiere needs (some 20K lines of PL/SQL)
- we're working on a UDF lib with oracle built-in functions

We had to do our own PL engine, because BLR lacks in functionality
(PL/SQL has lexical nesting like pascal; cursor handling need not be
nested; GOTO is more flexible than BLR allows, etc). Our PL/SQL
engine translates the source code to a pascal p-code-like bytecode,
which is executed by a small interpreter.

Client-side execution is nice, but too limited to support Compiere's
needs:
- some SQL calls user functions, written in PL/SQL
- trigger bodies are written in PL/SQL
For these to be implemented, our little (300 line) interpreter needs
to run on the server side, with 'looper' calling into this
interpreter as needed. Note that our interpreter may need to call
back into BLR/looper if it needs to execute non-procedural (ie. SQL)
statements. This month I would like to start on getting this
integration done with FB2.0

I am thinking that perhaps the most practical would be have an API
for 'external' interpreters to hook into. My thoughts are for this
API to be internal to the engine, an interface if you like. I am not
thinking of an external API where external libraries can (un)register

themselves at will.

Through this API it could become possible to run a variety of
procedural code inside firebird. Somebody seems to have already
thought this through for Java / the JVM. See:

http://www.ibphoenix.com/a635.htm

I look forward to hearing your thoughts and your recommendations for
discussing this with the FB core developers.

============================================

Hi Paul,

[...snip...]

> We had to do our own PL engine, because BLR lacks in functionality
> (PL/SQL has lexical nesting like pascal; cursor handling need not
> be nested; GOTO is more flexible than BLR allows, etc). Our PL/SQL
> engine translates the source code to a pascal p-code-like bytecode,
> which is executed by a small interpreter.

Understood, but did you consider implementing missing BLR codes in
our engine instead? Our team could handle this work, if required. At
least we could discuss it.

> Client-side execution is nice, but too limited to support
> Compiere's needs:
> - some SQL calls user functions, written in PL/SQL

SQL functions are going to be supported in v2.0.

> - trigger bodies are written in PL/SQL

I did have a look at your complete SQL script and think your triggers

can be handled by FB. Am I wrong?

> For these to be implemented, our little (300 line) interpreter
> needs to run on the server side, with 'looper' calling into this
> interpreter as needed. Note that our interpreter may need to call
> back into BLR/looper if it needs to execute non-procedural (ie.
> SQL)> statements. This month I would like to start on getting this
> integration done with FB2.0

Yes, it's a subject of serious discussion, but I'm sure it can be
done.

[...snip...]

> Through this API it could become possible to run a variety of
> procedural code inside firebird. Somebody seems to have already
> thought this through for Java / the JVM.

Once the hooking is done, it would open doors for both
embedded/plugged JVM and BLR execution from UDFs. Sounds interesing
and useful.

[...snip...]

Dmitry

==============================================

Hi Dmitry,

> > - we're working on a UDF lib with oracle built-in functions
>
> Great. But which Oracle built-in functions do you need? We're going

> to include new functions (SQL99-compliant ones, e.g. TRIM, PAD,
> CHAR_LENGTH, POSITION etc) into v2.0, so maybe we could also agree
> on some you need most as well?

Oracle has some 70 built-in functions. I don't have a precise list of

Compiere's needs, but so far we have identified only a handful being
actually used. My main worry now is to get some date functions coded
up: string-date and date-string conversion, date rounding. These
functions are quite sophisticated in Oracle. For example, date
rounding can be rounded to year, quarter, month, week, day, etc.

> > We had to do our own PL engine, because BLR lacks in
> > functionality (PL/SQL has lexical nesting like pascal; cursor
> > handling need not be nested; GOTO is more flexible than BLR
> > allows, etc). Our PL/SQL engine translates the source code to a
> > pascal p-code-like bytecode, which is executed by a small
> > interpreter.
>
> Understood, but did you consider implementing missing BLR codes in
> our engine instead? Our team could handle this work, if required.
> At least we could discuss it.

Initially, this was my plan: I wanted to use the old BLR call
interface. The client would generate BLR and send that to the engine,

a bit like a gpre'ed client program would do. Perhaps I should revert

to this plan, as you suggest.

Studying the problem and coding things up, I came to the conclusion
that extending BLR was too cumbersome: the BLR design is well suited
to representing queries, but perhaps less to representing procedural
code. So that's when I switched to doing a separate bytecode engine
for procedural elements (I call this "BTC", for ByTeCode).

I have been thinking about merging the two:
(i) There would be a new BLR opcode, say "blr_btcblock". This opcode
would signal the start of a block of BTC code. The BLR compiler would

turn this into a single execution node that contained the entire BTC
block. The looper would see the block and shift to executing the BTC
(the bytecodes are low level operations that are executed directly,
i.e. not expanded to an execution net, like BLR is).
(ii) The BTC contains sections of BLR (just like a gpre'ed C program
would). The BTC engine would take the BLR section, send it through
the BLR compiler to turn it into execution nodes and recursively call

the looper. Obviously, this needs some kind of caching.

The above arrangement would allow for real fast execution of
procedural code and would allow to add things like objects, without
making BLR bewildering complex. The complexity is in the PL/SQL to
BTC compiler instead, which is a relatively stand-alone bit of code.

I suppose I could prototype some of the above by doing a special UDF
lib. It would have one function that took one string parameter and 9?

descriptor parameters. The string would be used to locate the PL/SQL
procedure (compiled to BTC), a stack frame would be set up with the
other parameters, and the BTC would be executed. The old OSRI
interface would be used to send back BLR sections to the engine. The
three problems with this approach are:
- the limit in the number of parameters that a UDF can take
- the fixed return type of a UDF, even if it is passed by descriptor
- i've heard that the engine is locked during UDF execution... true?

Paul

===============================================

Paul,

[...snip...]

> Studying the problem and coding things up, I came to the conclusion
> that extending BLR was too cumbersome: the BLR design is well
> suited to representing queries, but perhaps less to representing
> procedural code.

Some BLR verbs (e.g. blr_if, blr_for etc) seem to be procedural ones
at the first glance, but in fact they were part of the engine since
the very beginning and they're used in usual queries. For example, a
WHERE condition of UPDATE/DELETE statements is evaluated as a FOR-
loop with a positioned (i.e. cursor-based) update/delete inside. So
your above words are a bit incorrect. But of course adding one BLR
verb (like your blr_btcblock) is much easier than supporting all your

needs via extra BLR verbs.

> I have been thinking about merging the two:
> (i) There would be a new BLR opcode, say "blr_btcblock". This
> opcode would signal the start of a block of BTC code. The BLR
> compiler would turn this into a single execution node that
> contained the entire BTC block. The looper would see the block and
> shift to executing the BTC (the bytecodes are low level operations
> that are executed directly, i.e. not expanded to an execution net,
> like BLR is).

Nice solution. We may provide an interface to call external code from

the looper via the new BLR verb and it can be used for different
needs - just develop new engine plugin (byte-code interpreter) and
use it when you need. Your case is good example.

> (ii) The BTC contains sections of BLR (just like a gpre'ed C
> program would). The BTC engine would take the BLR section, send it
> through the BLR compiler to turn it into execution nodes and
> recursively call the looper. Obviously, this needs some kind of >

caching.

This idea needs more thinking. AFAIU, it means that your BTC code can

be always translated to BLR. It's not the case currently. Did I
misunderstand you?

> I suppose I could prototype some of the above by doing a special
> UDF lib. It would have one function that took one string parameter
> and 9? descriptor parameters. The string would be used to locate
> the PL/SQL procedure (compiled to BTC), a stack frame would be set
> up with the other parameters, and the BTC would be executed. The
> old OSRI interface would be used to send back BLR sections to the
> engine. The three problems with this approach are:

One more issue. Currently UDF has no access to their execution
context. It's possible to send BLR via OSDI functions, but you have
neither transaction nor attachment handles in UDF.

> - i've heard that the engine is locked during UDF execution...
> true?

False. The engine unlocks itself before calling UDF and locks upon
return.The UDF code is executed outside the engine.

[...snip...]

Dmitry

========================================

Dmitry,

[...snip...]

Well, thinking about it a bit more, and taking into account your
remarks about issues in the "prototype" approach, how about the
following:

(i) add a new BLR opcode 'blr_external'
syntax: blr_external <string libname> <int size> <char[] data>
This opcode creates an execution node with the lib name and a
pointer to a malloc'ed block with the data of size <size>.

(ii) add code to looper
This code tries to open the library <libname> (using the UDF
algorithms for each OS variety) and calls a fixed function in
this library, eg. "external_handler":
void external_handler( ptr to datablock, ptr to paramdesc array,

int param count, ptr to transaction handle, ptr to attachment
handle);
array[0] could be used for a function return value, if any.

Perhaps we need to pass pointers to 'old' and 'new' if this is
called from within a trigger handler. I am not quite sure how
this works in the engine.

(iii) write an external handler
In my case, this would be a lib that would allocate an
interpreter stack, push the parameters on this stack and starts
interpreting the code in the <data> block. When any BLR code is
encontered, the interpreter would call back in using the OSRI
interface and the handles passed from the engine. When finished,

it loads the result (if any) in array[0] and returns.

In the case of java, the data block could contain just the name of a
jar and an object.method name. The external handler would start a JVM

thread, load the jar and call the method.

Good idea ??

> > (ii) The BTC contains sections of BLR (just like a gpre'ed C
> > program would). The BTC engine would take the BLR section, send
> > it through the BLR compiler to turn it into execution nodes and
> > recursively call the looper. Obviously, this needs some kind of
> > caching.
>
> This idea needs more thinking. AFAIU, it means that your BTC code
> can be always translated to BLR. It's not the case currently. Did I
> misunderstand you?
>

I think you did. Let me show you an example to explain it. Take the
below PL/SQL code block:

declare
type pr is record(i number, t char(30));
p pr;
dummy char(1);
cursor c1(k in number) is select * from paul where paul.a!=k;
cursor c2 is select * from dual;
begin
open c1(7);
open c2;
fetch c1 into p;
fetch c2 into dummy;
dbms_output.put_line(p.i);
dbms_output.put_line(p.t);
dbms_output.put_line(dummy);
fetch c1 into p;
dbms_output.put_line(p.i);
dbms_output.put_line(p.t);
close c1;
close c2;
end;

This is then compiled to the following BTC sequence (the 0,5 etc.
parameters are pascal style stack relative addresses: 0,5 means stack

frame 0, location 5):

1 btc_jmp 118
4 btc_enter 1, 0
8 btc_open 0, 5, "SELECT A, B FROM PAUL WHERE (A != ?)"
64 btc_bindin 0, 1, -3, 2
70 btc_exec "C1"
74 btc_leave
75 btc_return
76 btc_enter 1, 0
80 btc_open 0, 6, "SELECT DUMMY FROM DUAL"
112 btc_exec "C2"
116 btc_leave
117 btc_return
118 btc_enter 0, 0
122 btc_null
123 btc_null
124 btc_null
125 btc_null
126 btc_null
127 btc_pushi1 7
129 btc_ci2f
130 btc_call 4
133 btc_pop
134 btc_call 76
137 btc_setfetch 0, 5
141 btc_bindout 0, 0, 2, 2
147 btc_bindout 1, 0, 3, 3
153 btc_fetchrow
154 btc_setfetch 0, 6
158 btc_bindout 0, 0, 4, 3
164 btc_fetchrow
165 btc_fetch 0, 2
168 btc_printf
169 btc_fetch 0, 3
172 btc_prints
173 btc_fetch 0, 4
176 btc_prints
177 btc_setfetch 0, 5
181 btc_bindout 0, 0, 2, 2
187 btc_bindout 1, 0, 3, 3
193 btc_fetchrow
194 btc_fetch 0, 2
197 btc_printf
198 btc_fetch 0, 3
201 btc_prints
202 btc_close 0, 5
206 btc_close 0, 6
210 btc_leave
211 btc_exit

At code address 8 and 80, the interpreter calls back into the engine
to actually open a cursor. The SQL string gets translated to BLR and
executed. At address 153 and 164, message exchange takes place to
retrieve a row of the result set. The code just before those
btc_fetchrow commands selects an open cursor and binds stack
locations to message slots.

I realise that without BTC documentation this is all a bit cryptic,
but perhaps it helps explain the difference between BTC and BLR in
design, even though there isn't much procedural code in this example.

BTC is really very simple 1970's pascal p-code like stuff, designed
for arithmetic, looping & stack/heap access.

Hope this helps

Paul

==============================================

> I'm about to start on getting "external" procedures/triggers
> implemented in FB. Did your discussions bring up any items that you
> would want me to keep in mind for this project ?

First of all, I'd follow the Roman's advice and start the discussion
in fb-architect. Two questions to be started there: (a) calling
external code from the looper to process non-native byte code (our
suggestion - use extra BLR verb for this purpose) and (b) callbacks
from the external libraries (JVM, UDFs, your BTC plugin, etc) to the
engine. After the initial discussion we may continue in fb-devel.

==============================================

Well, the floor is open, FB-Architects !