Subject Re: [Firebird-Architect] Re: Architectural Cleanliness: CVT_move
Author Jim Starkey
paulruizendaal wrote:

>This message thread could be more prophetic than Helen imagined: not
>only does the UDF interface expose an internal engine structure, it
>exposes one that shouldn't be there. In my opinion, runtime
>descriptors will go the way of the Dodo, perhaps not in FB2.0, but
>certainly before FB3.0
>
>
I designed the descriptor mechanism to support general builtin functions
as DML extensions, not for external functions. I don't know if any use
was ever made of the facility (though a little research should reveal
all). Since those functions were compiled as part of the engine there
was no problem. Exposing them for external functions is more
problematic, but only slightly so. As several folks have pointed out,
external descriptors could easily be made independent of the internal
descriptor mechanism. I question, however, whether exposing any
descriptor facility to external functions makes sense -- without an
architecturally controlled runtime environment, descriptors are close to
useless. All this begs the larger question of a serious embedded
language, e.g. Java.

For the purposes of FB2.0/Vulcan, I think it's a non-issue.

>I think the engine execution runtime shows its age. It was designed as
>a clever way to do co-routines in the original architecture (classic
>+ gpre'd clients).
>
>There are three things that could be better, I think:
>1. Change the design paradigm from 'stalling looper' to 'multitasked
>VM'. This will make the code flow easier to understand & maintain. It
>is mostly a semantic change: if you think about it, impure areas in
>requests are like stack frames and the type field of JRD_NODs are
>like opcodes, etc.
>
You are talking about two things that should be considered separately,
looper and the impure area.

Looper was absolutely designed as a coroutine mechanism that enabled a
request to stall, return control to the client, then continue. Most
systems written today would use threads. But there is a consideration
that I would like you to address. Each thread requires a stack, and in
all threading systems I'm aware of, those stacks live in a single
address space. The size of a thread stack is generally controllable but
necessarily large, usually defaulting to a megabyte or so. It doesn't
take a large number of threads to consume a large part of available
address space, particularly on those processors and operating systems
that segment the address space into P0 and P1 spaces. If Firebird
applications were constrained to web or transaction processing, this
would be no problem, as a thread would be allocated to a process for a
short period then released for reuse. But Firebird is mainly used in
client server applications where long standing connections and requests
are the norm. Allocating a megabyte per connection to preserve request
state may not be feasible for a large number of connections. Sure,
careful accounting could reduce the normal case at the cost of a great
deal of tricky code. So before you throw out an aging and torturous
mechanism, do carefully consider the ramifications of the alternative.
Note: disclosure dictates that I confess that Netfrastructure is thread
based, but it's designed specifically for web applications.
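
To make the stall-and-continue idea concrete, here is a minimal sketch in Python. It is not the engine's looper; the node kinds, the `Request` fields and the `looper` signature are all invented for illustration. The point it tries to show is that the resume point and working state sit in a small request object, so a stalled request does not hold a thread stack.

```python
from dataclasses import dataclass, field

# Hypothetical node kinds; the real engine's node types are not shown here.
COMPUTE = "compute"   # work the engine can do without the client
SEND = "send"         # hand a value to the client, so the request must stall


@dataclass
class Request:
    """All resumable state lives here, not on a thread stack."""
    nodes: list                      # the compiled program: (kind, value) pairs
    next_node: int = 0               # where to continue after a stall
    total: int = 0                   # example of per-request working state
    outbox: list = field(default_factory=list)


def looper(request):
    """Run nodes until one needs the client, then return (stall).

    Returns True if the request stalled and can be resumed,
    False if it ran to completion.
    """
    while request.next_node < len(request.nodes):
        kind, value = request.nodes[request.next_node]
        request.next_node += 1       # remember the resume point first
        if kind == COMPUTE:
            request.total += value
        elif kind == SEND:
            request.outbox.append(request.total)
            return True              # stall: control goes back to the client
    return False


request = Request(nodes=[(COMPUTE, 2), (SEND, None), (COMPUTE, 3), (SEND, None)])
stalled_first = looper(request)      # runs up to the first SEND
stalled_second = looper(request)     # resumes where it left off
finished = looper(request)           # nothing left to do
```

A thread-based design would keep the same state implicitly on the stack of a blocked thread, which is where the per-connection stack cost discussed above comes from.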

The impure area is a completely different story. The impure area exists
so many requests can share a compiled request. No matter how fast you
make the compiler, not having to compile will always be faster. The
impure area is a hidden jewel waiting to be exploited by compiled
statement caching. Nickolay and I are of different minds as to whether
statement caching should be client or server side, but then he hasn't
been exposed to the power of filtersets.
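
A rough sketch of the sharing idea, in Python and under stated assumptions: the class names, the `prepare` function and the cache keyed on statement text are hypothetical, and "compilation" is reduced to splitting the text into tokens. What it is meant to show is one read-only compiled statement serving several requests, each with its own impure area.

```python
class CompiledStatement:
    """Read-only result of compiling a statement once.

    Each node that needs scratch space is given an offset into an
    impure area; the offsets are fixed at compile time.
    """
    def __init__(self, text):
        self.text = text
        # Stand-in for real compilation: one node per whitespace-separated token.
        self.nodes = [(offset, token) for offset, token in enumerate(text.split())]
        self.impure_size = len(self.nodes)


class Request:
    """One execution of a compiled statement, with private mutable state."""
    def __init__(self, statement):
        self.statement = statement                   # shared, never modified
        self.impure = [0] * statement.impure_size    # private to this request

    def execute(self):
        # Each node touches only its own slot in this request's impure area.
        for offset, _token in self.statement.nodes:
            self.impure[offset] += 1
        return self.impure


_cache = {}          # hypothetical statement cache, keyed by statement text
compile_count = 0


def prepare(text):
    """Return a new request, compiling only on a cache miss."""
    global compile_count
    statement = _cache.get(text)
    if statement is None:
        compile_count += 1
        statement = _cache[text] = CompiledStatement(text)
    return Request(statement)


a = prepare("select x from t")
b = prepare("select x from t")       # cache hit: no second compile
a.execute()                          # changes a's impure area only
```

Whether such a cache belongs on the client or the server side is the open question mentioned above; the sketch takes no position on it.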

>2. Clearly separate the 'relational' VM and the 'procedural' VM. The
>latter can & should be pluggable. Perhaps the relational VM should be
>pluggable too, but it is hard to see how more than one per ODS would make
>sense. Perhaps it should not be pluggable at all.
>
Multiple engines, yes. Pluggable engine, not a chance. Too many
complex services that can't and shouldn't be architecturally controlled.

>3. Move type casting decisions from the runtime to the compiler. SQL
>is a strongly typed language and I believe so is GDML/blr. Where type
>casts are necessary is known at compile time. Figuring this out
>repeatedly at runtime is inefficient, as is the overhead of
>maintaining type information (i.e. descriptors). Hence, descriptors
>will be superfluous. The runtime will work with 'naked' values (i.e.
>value union + null flag)
>
>
Pish-tosh. Since records can be of different format, you don't know the
datatype until runtime. Some decisions about datatype, particularly
datatypes of operations, can be determined at compile time, but these
are more the exception than the rule.
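
The record-format point can be illustrated with a small Python sketch. The format table, the one-byte format prefix and the `store`/`fetch` helpers are assumptions made up for the example, not the on-disk layout. It models a table whose `qty` column was widened after some rows had already been written.

```python
import struct

# Hypothetical format table for one relation: format number -> list of
# (field name, struct code). Format 2 models the table after a column
# was widened from a 16-bit to a 64-bit integer.
FORMATS = {
    1: [("id", "i"), ("qty", "h")],
    2: [("id", "i"), ("qty", "q")],
}


def store(format_number, values):
    """Pack a record, prefixed by the format number it was written under."""
    codes = "".join(code for _name, code in FORMATS[format_number])
    return struct.pack("<B" + codes, format_number, *values)


def fetch(record):
    """Unpack a record; field types are only known once the format is read."""
    format_number = record[0]
    fields = FORMATS[format_number]
    codes = "".join(code for _name, code in fields)
    values = struct.unpack("<" + codes, record[1:])
    return {name: (code, value) for (name, code), value in zip(fields, values)}


old_row = store(1, (1, 300))              # written before the column change
new_row = store(2, (2, 5_000_000_000))    # written after it
```

A statement compiled once against this table can meet both rows in the same scan, so the type of `qty` has to be read from the record's format at fetch time.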

>How the current design is broken shows, for instance, in UDF's that
>return their result by descriptor: although you can return any type
>you like, if it isn't the compile-time type, you'll create an error
>condition further down the processing road.
>
>
The design was known to be broken at the time I designed it. Putting
raw user code in the engine is insane; the only thing that was worse was
not putting user code in the engine.

I continue to advocate an embedded JVM for triggers, stored procedures,
and UDFs. A sandbox is the only solution.

>I guess UDF's by descriptor were added as a quicky, braindead way of
>signaling NULL values in parameters or in the return value. This
>should have been done by defining a new value struct (value + null
>flag) that became part of a published, maintainable, future proof API
>(as Jim has already suggested). Would it be a good idea to move FB2.0
>to such a UDF system? Or would it cause too much breakage?
>
>
Now that I'm being accused of brain-death, I have a little more sympathy
for the advocates of list civility.

Gosh, a new value struct sounds remarkably like a descriptor. Could you
kindly explain the difference to those of us who are brain dead?

>BTW, can anybody tell me why there are 3 ways to signal NULL in the
>current VM ??
>1. return a null DSC pointer
>2. set DSC_null flag in the descriptor
>3. set req_null flag in the request
>
>
Things that evaluate values use descriptors. Things that compute
booleans use the flag to indicate null. I never liked it much and used
explicit return values to indicate boolean-null.
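
As a loose illustration of the two conventions, here is a Python sketch. Every name in it is hypothetical; it does not reproduce DSC, req_null or any engine routine. Value evaluation carries nullness in the descriptor, while boolean evaluation is shown both in the flag style and in the explicit-return style.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical value descriptor: a type tag, a value and a null flag."""
    dtype: str
    value: object = None
    is_null: bool = False


class Request:
    """Holds the side-channel null flag used by boolean evaluation."""
    def __init__(self):
        self.null_flag = False


def eval_value(row, name):
    """Value evaluation: nullness travels with the descriptor."""
    value = row.get(name)
    return Descriptor("int", value, is_null=value is None)


def eval_less_flag(request, left, right):
    """Boolean, flag style: returns False and sets the request flag on null."""
    request.null_flag = left.is_null or right.is_null
    if request.null_flag:
        return False
    return left.value < right.value


def eval_less_explicit(left, right):
    """Boolean, explicit style: True, False, or None for boolean-null."""
    if left.is_null or right.is_null:
        return None
    return left.value < right.value


row = {"a": 1, "b": None, "c": 2}
request = Request()
known = eval_less_flag(request, eval_value(row, "a"), eval_value(row, "c"))
known_flag = request.null_flag
unknown = eval_less_flag(request, eval_value(row, "a"), eval_value(row, "b"))
explicit = eval_less_explicit(eval_value(row, "a"), eval_value(row, "b"))
```

In the flag style the caller must remember to check `request.null_flag` before trusting a False result; the explicit style puts all three outcomes in the return value.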

>Slightly off-topic, I do see the need to provide UDF users with a
>tool to cast types, especially from date to string and vice versa.
>Would it be an idea to kill the current CVT_move entry point and to
>bundle a separate dynamic lib with a type conversion utility like
>CVT_move. I was thinking of doing something like that anyway.
>
>
Hard problem, no existence proof of a solution. Good luck.

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376