Subject Re: [Firebird-Architect] RFC: Cross database queries
Author Roman Rokytskyy
My comments:

> 1. Registration of an external source of the data
>
> CREATE | ALTER EXTERNAL DATA SOURCE
> <ds_name>
> PROVIDER <provider_name>
> CONNECT <ds_connection_string>
> [AS {USER <external_user> [PASSWORD <external_password>] | CURRENT_USER}]
> [OPTIONS < ds_options >]

I have asked whether we're talking about the Firebird DBs only or
external data sources in general.

If later - I'm not ok with [AS {USER ... | CURRENT_USER}] mechanism
since it is very limited. Here we have to consider all plans that Alex
has for PAM-based authentication. In general it has to pass some list of
credentials (user/password pair, certificate, etc.).

If former - we don't need PROVIDER clause then - it should be part of
Firebird. BTW, we don't have clause to register provider :)

> 2. Login mappings management

User name/password is not enough, we have to support certificates at
least. I think we should split security part into another thread.

> 3. Direct (pass-through) queries to external data source (not heterogeneous)

Not sure I like it... see below.

>
>
> 4. Heterogeneous queries to external data source
>
> Table qualifier: <ds_table_name>@<ds_name>

I'd suggest to extend the tables like this:

CREATE TABLE table [EXTERNAL {[FILE] ’filespec’ | DATA SOURCE <ds_name>}]

CREATE VIEW name [(view_col [, view_col …])]
{AS <select> [WITH CHECK OPTION] | DATA SOURCE <ds_name>}

That means that external tables has to be defined in the database
explicitly. As to the explicit queries - if they are used often - people
can define them as views (finally - as selectable procedures, add syntax
extension to CREATE PROCEDURE statement).

I do not consider the cross-database queries as something that should be
very flexible - at the end it is about integrating two databases, not
about acting as router to another DB.

> Here i see 2 possible strategies of work :
>
> ?) Distinguish each external table, to request its format, cardinality, indexes and their
> selectivity and to remember this information similarly to ours jrd_rel. It will do DSQL in pass1.
> GEN will build usual rse with blr_remote_relation instead of blr_relation. Further work with
> these remote_rel as with usual tables - build rsb for each of them, etc. CMP will build
> parametrized queries for external data sources from these rsb's
>
> Pluses: relative simplicity, possibility to not touch optimizer.

I'd go for this approach in the first place, probably I would provide an
option to CREATE TABLE clause that would refresh
statistics/cardinality/indexes from remote data source periodically.

> b) DSQL will distinguish external tables from the same data source and do decomposition of
> heterogeneous query on some "monogenic" queries. For each such external query will build rse same
> as for EXECUTE STATEMENT (with in\out parameters)

That is more advanced thing... First we have to collect experiences with
the more easy stuff, I think.

> It is necessary to think up a way to set an external table name with its full qualification by
> external engine rules. it is possible to quote full name but it is easy may not fit in ours 32
> symbols and it is not clear how to quote external quoted identifiers;)
>
> Also such names will not fit in SQLDA. The truth, it not problems for work with external FB
> data source.

If we extend the CREATE TABLE declaration as shown above, this problem
disappears - it will use local identifiers.

> 5. Transactions

No comments at the moment,.. but somehow the 2PC thing looks too simple
for me at the moment...

> 5b. Savepoints
>
> If provider does not support savepoints we may :
>
> a) prohibit everything, except for read-only queries (how to identify in foreign (unknown)
> SQL syntax?)
>
> b) don't not execute savepoints on an external data source at all (what to if local transaction
> rolled back to intermediate savepoint ?)

I'd vote for b) - at the end it is not our problem.

> If provider supports savepoints, then if under our internal savepoint external query was executed
> then before execution of this query engine will start savepoint on external data source and local
> savepoint is flagged. The further operations with local savepoint will be executed with external
> savepoint too.

Ok.


Roman