Subject | External Data Sources |
---|---|
Author | Jim Starkey |
Post date | 2004-04-29T18:21:56Z |
The existing external data mechanism is more in the nature of crude hack
that a strong extensible feature. It made sense on VMS where field
structured RMS sequential and indexed files were common but less useful
in the rest of universe where binary files are few and far between. In
any case, the existing external data mechanism assumes a very dumb data
source -- a flat file of more or less fixed length records -- that
doesn't generalize well to relational data sources.
I think the ability to reference external objects in a DML context would
be a very good thing to have, though not necessary a simple thing to do
well. ODBC is probably a good model to use when thinking about the
problem, but a solution that generalizes other other APIs with different
plugins would be even better.
To address external relational data sources, the following problems need
to be addressed:
1. Definition of the data source, including plugin name, source name,
account/password information, etc. This needs to be rather open
ended to support direct access to other database systems.
2. Definition of external tables. The existing mechanism requires a
Firebird table definition that maps the physical record. This is
inappropriate for relation sources, which probably want to be
defined in terms of a SQL "select" statement. The external
definitions, however, pretty much need to be replicated in the
Firebird system tables if any tools are going to work.
3. A mechanism to push selection criteria into the external system
where appropriate to support external optimization. This is
critical and far from trivial, particularly since the plugin must
be isolated from internal engine data structures subject to change
without notice.
4. Transaction control. A external data source mechanism that can't
bridge transactions probably isn't worth having.
5. Multiple stream support. Perhaps Sybase has gotten smarter, but
it (another other brain-dead rdbms) used to be incapable of
support more than a single active data stream, imposing major
constraints on any system layered on it.
I believe that a good external data source mechanism would be a very
good addition, but the interface design isn't going to be easy.
--
Jim Starkey
Netfrastructure, Inc.
978 526-1376
that a strong extensible feature. It made sense on VMS where field
structured RMS sequential and indexed files were common but less useful
in the rest of universe where binary files are few and far between. In
any case, the existing external data mechanism assumes a very dumb data
source -- a flat file of more or less fixed length records -- that
doesn't generalize well to relational data sources.
I think the ability to reference external objects in a DML context would
be a very good thing to have, though not necessary a simple thing to do
well. ODBC is probably a good model to use when thinking about the
problem, but a solution that generalizes other other APIs with different
plugins would be even better.
To address external relational data sources, the following problems need
to be addressed:
1. Definition of the data source, including plugin name, source name,
account/password information, etc. This needs to be rather open
ended to support direct access to other database systems.
2. Definition of external tables. The existing mechanism requires a
Firebird table definition that maps the physical record. This is
inappropriate for relation sources, which probably want to be
defined in terms of a SQL "select" statement. The external
definitions, however, pretty much need to be replicated in the
Firebird system tables if any tools are going to work.
3. A mechanism to push selection criteria into the external system
where appropriate to support external optimization. This is
critical and far from trivial, particularly since the plugin must
be isolated from internal engine data structures subject to change
without notice.
4. Transaction control. A external data source mechanism that can't
bridge transactions probably isn't worth having.
5. Multiple stream support. Perhaps Sybase has gotten smarter, but
it (another other brain-dead rdbms) used to be incapable of
support more than a single active data stream, imposing major
constraints on any system layered on it.
I believe that a good external data source mechanism would be a very
good addition, but the interface design isn't going to be easy.
--
Jim Starkey
Netfrastructure, Inc.
978 526-1376