firebird-architect - Re: [Firebird-Architect] External Engines (and Plugins)

Subject	Re: [Firebird-Architect] External Engines (and Plugins)
Author	Vlad Khorsun
Post date	2008-06-23T09:34:45Z

Adriano dos Santos Fernandes :

> Vlad wrote:
>>> Syntax:
>>>
>>> { CREATE [ OR ALTER ] | RECREATE | ALTER } PROCEDURE <name>
>>> [ ( <parameter list> ) ]
>>> [ RETURNS ( <parameter list> ) ]
>>> ENTRY_POINT '<entry point>'
>>> LANGUAGE <language>
>>>
>> What is exact meaning of LANGUAGE clause ? From the explanations below i see
>> it as plugin identifier, not related directly to any language.
>>
> External plugins implements languages.

I don't agree here. A language is "implemented" by a compiler :) I'd say that
what we have as ExternalEngine is a set of rules for writing a loadable library
which this EE plugin is able to load, obtain entry points from, and execute
functions of, passing parameters and returning results. The plugin may also
provide some kind of run-time support for the user library. But I see no
relation to a language, sorry.

> Languages are made available to
> databases through the configuration file. This is almost how INTL works,
> and allows simultaneous usage of many plugins implementing many
> languages. The admin chooses in the config file what's used from what
> plugin. Also this makes clears features available to databases.
>
> AFAIK, this is also how original implementation works, but instead of
> config the languages/libraries are chosen from a system table.

Yes, this is how it works, and there we have the same semantic problem with
the word "language".

> Writing external routines directly through the C++ interface is not easy
> as write an UDF. And this is more difficult for others (non C++)
> languages. So the things are break into two layers: plugins and user
> libraries.

Yes, two layers. And I think this is a good design for generic usage. But
nothing prevents users from writing a single External Engine plugin with the
application code built in. That can be useful for a complex application layer
executed on the database server side.

> For example, there is the C++ (engine) plugin. It implements the CPP
> language. Users will not reimplement it, but write libraries that
> register routines in the C++ plugin.

It has nothing in common with the C++ language itself, I guess. I can write a
user library for this plugin in Delphi and it will work.

> Firebird talks with the plugin, and the plugin talks with user
> libraries. This is also how Java routines will work, the plugin will
> load user classes.

BTW, do you have a Java plugin implemented? If yes, does it require Jaybird (or
another Java connectivity driver) as the original plugin did?

>> Also, you removed EXTERNAL keyword from original implementation. What is the
>> reasons for it ?
>>
> I changed EXTERNAL NAME to ENTRY_POINT. It makes things more likely UDFs
> are. Also EXTERNAL NAME seems not good inside an EXTERNAL FUNCTION
> declaration.

I think the ability to distinguish external and native procedures at the syntax
level is not a bad thing. But I have no strong preference here and can live
with or without the EXTERNAL keyword in the procedure declaration.

>>> Here are links to latest version of files that worth discuss. Please
>>> verify them:
>>>
>>> FirebirdApi.h is the generic new C++ API. These classes was designed to
>>> future replace the ISC API in mind. Only the necessary classes and
>>> methods for external engines was created -
>>> http://firebird.cvs.sourceforge.net/firebird/firebird2/src/include/FirebirdApi.h?view=markup&pathrev=B2_5_ExtEngines
>>>
>> Why FB_CALL is defined as __stdcall for WIN32 only ?
> Native calling convention from Windows is STDCALL. Also, the default
> calling convention for C++ functions in MSVC is THISCALL, that is
> STDCALL. Certainly, we may define FB_CALL as CDECL, but since COM used
> STDCALL, I think we should use it.
>
> And finally, the ISC API uses it too.

I made all EE API interface methods stdcall to make it possible to write
plugins in Delphi. IIRC, I described this decision here. I.e. it was done
neither because the ISC API uses it nor because of Windows\COM\etc.

I asked you not why "stdcall", but why "stdcall" is for Windows only.

>> Why do you use MSVC
>> extended syntax ? IIRC plain "stdcall" is supported by every compiler while
>> "__stdcall" is an MS extension.
>>
> This is how we define it in public ibase.h.

It seems we need to think again about it ;)

>> Why values within Values class enumerated starting from 1 but not zero ?
>>
> Do you talk about "index" parameter? This is how almost all SQL
> libraries works. I prefer indexes starting from 0, but I give up on this.

Who pushed you so hard that you gave up? :)

>>> The plugin library may save the plugin object and call they methods
>>> later. The object and all pointers returned by it are valid until the
>>> plugin is unloaded (done through OS unload of the dynamic library) when
>>> Firebird is shutting down.
>>>
>> I think reference counting is much better for such usage. Also it allows to
>> reload plugin library without stop of Firebird.
>>
> If we implement plugin reload, the plugin will be need to be notified by
> some way and can/will reload its state. I don't see any need for
> reference counting here.

Reference counting allows the library to be released as soon as no more
references to it remain in the metadata cache. Therefore it allows reloading
the library without restarting Firebird and without extra notifications.
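
The reference-counting idea above can be illustrated with a minimal sketch. All names here (RefCountedModule, the unload callback) are hypothetical and not taken from FirebirdApi.h; a real engine would wrap the OS library handle instead of a callback, and would need an atomic counter for multi-threaded use.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a loaded plugin library. The callback plays the
// role of the OS-level unload so that the release can be observed.
class RefCountedModule {
public:
    explicit RefCountedModule(std::function<void()> onUnload)
        : refs_(1), onUnload_(std::move(onUnload)) {}

    // Called for every new holder, e.g. a metadata cache entry.
    void addRef() { ++refs_; }

    // Drops one reference and returns the remaining count.
    // The module unloads itself when the last reference goes away.
    int release() {
        assert(refs_ > 0);
        const int remaining = --refs_;
        if (remaining == 0) {
            onUnload_();
            delete this;
        }
        return remaining;
    }

private:
    ~RefCountedModule() = default;  // destroyed only through release()

    int refs_;  // not thread-safe; a real implementation would use an atomic
    std::function<void()> onUnload_;
};
```

With such a scheme, a new version of the library could be loaded under a fresh object while old routines still hold the previous one, which is the property argued for above.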

>>> Inside the plugin entry point (firebirdPlugin), the plugin may register
>>> extra functionality that may be obtained by Firebird when required.
>>> Currently only External Engines may be registered through
>>> Plugin::setExternalEngineFactory.
>>>
>> How it can be extended in future for new kinds of plugins ? Adding new
>> Plugin::setXXXFactory ?
> Yes. All API set are versioned so they can be extended and used
> appropriated. But not all plugins will register through factories. Some
> may need to call simple Plugin::setXXX methods.

I thought Firebird obtains an ExternalEngine (or Trace, or INTL, etc.) instance
only via the corresponding factory. Am I wrong?

>> It seems for me it is better to introduce enumeration of
>> known plugin types and create only universal method to register all kinds of
>> factories, such as
>>
>> Plugin::registerFactory(PluginKind, PluginFactory).
>>
> This will cause a plugin to extend a whole set of methods to just do one
> thing.

I don't understand, please explain.
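
For reference, the universal registration method proposed above might look roughly like the following sketch. Only the names PluginKind, PluginFactory and registerFactory come from the proposal; the enumeration values, the registry class and the dummy factory are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>

// Assumed set of plugin kinds; the real list would be defined by the engine.
enum class PluginKind { ExternalEngine, Trace, Intl };

// Assumed common base for all factories, whatever they create.
struct PluginFactory {
    virtual ~PluginFactory() = default;
    virtual const char* name() const = 0;
};

// Hypothetical registry behind Plugin::registerFactory.
class PluginRegistry {
public:
    // One entry point for every kind, instead of one setXXXFactory per kind.
    // Returns false if a factory of this kind is already registered.
    bool registerFactory(PluginKind kind, PluginFactory* factory) {
        assert(factory != nullptr);
        return factories_.emplace(kind, factory).second;
    }

    // Returns nullptr when nothing was registered for this kind.
    PluginFactory* find(PluginKind kind) const {
        const auto it = factories_.find(kind);
        return it == factories_.end() ? nullptr : it->second;
    }

private:
    std::map<PluginKind, PluginFactory*> factories_;
};

// Illustrative factory a plugin library might register.
struct DummyEngineFactory : PluginFactory {
    const char* name() const override { return "dummy-engine"; }
};
```

Adding a new kind of plugin then means adding an enumeration value rather than a new method on Plugin.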

>> Also it seems better to use word "Plugin" instead of "ExternalEngine" for plugin
>> API :)
>>
> I do not like "external engine" or ESP words from original
> implementations. :-) But Plugin is generic. We may have the TracePlugin,
> for example. I think "Language Plugin" is a more appropriate term.

It seems you mixed the generic PluginAPI (used to enumerate, load\unload and
configure loadable plugin libraries) with the concrete APIs implemented by
plugins (ExternalEngine API, Trace API, etc). The PluginAPI must be isolated
from all other APIs, mustn't it?

>>> External Engines API:
>>>
>>> Entry points are opaque strings to Firebird. They are recognized by
>>> specific external engines. A external engine is the implementation of a
>>> language. Languages are declared in config files (possibly in the same
>>> file as a plugin, like in the config example present here).
>>>
>> I still see no correspondence between natural meaning of word 'language' and
>> how it is used. I can write plugin which will work with any dll written on any
>> language and how i must register 'language' of my plugin ?
>>
> Language is registered in the config file so Firebird knows what plugin
> it should ask for it. Plugins returns ExternalEngines based on language
> parameter of ExternalEngineFactory::createEngine.

See above. I'm not convinced that the term "language" reflects the nature of
the implemented facilities.

>>> The C++ (CPP) engine:
>>>
>>> Entry points of the C++ engine are defined as following:
>>> '<module name>!<routine name>!<misc info>'
>>>
>>> The <module name> is used to locate the library, <routine name> is used
>>> to locate the routine registered by the given module, and <misc info> is
>>> a user-defined string passed to the routine and can be read there. "!<misc
>>> info>" may be omitted.
>>>
>> Why it is better than passing this <misc info> as parameter ?
>>
> Because some things is better encoded in the metadata, and for triggers
> there is no parameter. For example, see my REPLICATE trigger example.
> The database designer chooses the datasource, and the trigger will read
> properties of that datasource from a table.

So, you propose to declare as many REPLICATE triggers as there are destination
datasources and to mark each of them with the corresponding datasource name in
<misc info>? I don't think that's OK. Instead I would put all destination names
into a separate table and read this table in one REPLICATE trigger. It saves at
least the trigger call overhead.

I'm not against <misc info>, I just want to understand what it is needed for ;)
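
To make the entry point format '<module name>!<routine name>!<misc info>' concrete, here is a minimal parsing sketch. The EntryPoint struct and parseEntryPoint function are hypothetical, and the sketch assumes that everything after the second '!' belongs to <misc info>; the actual CPP engine may split the string differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the three parts of a CPP engine entry point.
struct EntryPoint {
    std::string module;
    std::string routine;
    std::string miscInfo;  // stays empty when "!<misc info>" is omitted
};

// Splits "<module name>!<routine name>[!<misc info>]".
// Returns false (leaving 'out' untouched) if module or routine is missing.
bool parseEntryPoint(const std::string& text, EntryPoint& out) {
    const std::string::size_type first = text.find('!');
    if (first == std::string::npos || first == 0)
        return false;  // no separator at all, or empty module name

    const std::string::size_type second = text.find('!', first + 1);
    EntryPoint ep;
    ep.module = text.substr(0, first);
    if (second == std::string::npos) {
        ep.routine = text.substr(first + 1);
    } else {
        ep.routine = text.substr(first + 1, second - first - 1);
        ep.miscInfo = text.substr(second + 1);
    }
    if (ep.routine.empty())
        return false;

    assert(!ep.module.empty());  // guaranteed by the 'first == 0' check above
    out = ep;
    return true;
}
```

Under this assumption, 'mylib!replicate!ds1' would yield module "mylib", routine "replicate" and misc info "ds1", while 'mylib!replicate' leaves the misc info empty.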

>> PS It seems as a good idea to mention original creator of code in headers. I
>> mean at least Eugeney Poutilin.
>>
> Sorry. Will do it, certainly.

Thank you.

>> PPS Do you plan to introduce interface for user defined aggregates ?
> IMO we need first a generic way for user defined aggregates, i.e., it
> should be possible to write aggregate functions in PSQL.

Aggregates may be called millions of times, so PSQL's call\execution overhead
is critical. Therefore I think external aggregates should not be a second
priority.

> A way to do it may be through normal functions (but marked as aggregate)
> that SUSPEND asks for new rows, instead of produce rows. Something like:
>
> create aggregate function mult (
> n integer
> ) returns integer
> as
> declare ret integer = 1;
> begin
> while (n is not null)
> do
> begin
> ret = ret * n;
> suspend;
> end
>
> return ret;
> end

I like it, looks very attractive. Just a few comments:

a) It must be able to process NULL values, i.e. it must not use a NULL value as
a sign of EOF. Instead we may introduce a built-in system function
AGGREGATE_STATE (like CURRENT_xxx) which returns at least 2 logical values:
"accumulating values" and "produce result".

b) I guess we need to obtain the input value first and only then accumulate it
into the result. We may supply the first input value with the aggregate function
call, but this must be carefully thought through and documented.

c) I'm not sure that using the SUSPEND keyword is correct here. We could
introduce something more adequate, say WAITROW or GETINPUT, etc.

Taking the comments above into account, your example would look as follows:

create aggregate function mult (
    n integer
) returns integer
as
    declare ret integer = 1;
begin
    while (1=1)
    do
    begin
        WAITROW; -- or SUSPEND, i don't know what is better

        if (AGGREGATE_STATE <> 1) -- accumulating values
        then leave;

        if (n is not null)
        then ret = ret * n;
    end

    return ret;
end
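
The control flow of this proposal can be modelled outside PSQL. The sketch below is only an illustration in C++ of the accumulate-then-produce-result cycle; NullableInt, AggregateState, MultAggregate and runMult are hypothetical names, and nothing here is an existing Firebird interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical nullable integer standing in for an SQL INTEGER input.
struct NullableInt {
    bool isNull;
    int value;
};

// Hypothetical states modelled on the proposed AGGREGATE_STATE function.
enum class AggregateState { Accumulating, ProduceResult };

class MultAggregate {
public:
    // Corresponds to one pass through the loop body after WAITROW.
    void accumulate(const NullableInt& n) {
        assert(state_ == AggregateState::Accumulating);
        if (!n.isNull)
            ret_ *= n.value;  // NULL inputs are skipped, not treated as EOF
    }

    // Corresponds to leaving the loop and executing "return ret".
    int produceResult() {
        state_ = AggregateState::ProduceResult;
        return ret_;
    }

private:
    AggregateState state_ = AggregateState::Accumulating;
    int ret_ = 1;
};

// Drives the aggregate the way an engine might: one call per row, then finish.
int runMult(const std::vector<NullableInt>& rows) {
    MultAggregate agg;
    for (const NullableInt& row : rows)
        agg.accumulate(row);
    return agg.produceResult();
}
```

In this model an input set of 2, NULL, 3, 4 multiplies to 24, and an empty input set returns the initial value 1, since the end of input is signalled by the state rather than by a NULL value.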

Regards,
Vlad