firebird-architect - Re: [Firebird-Architect] RFC: Proposal for the implementation

Subject	Re: [Firebird-Architect] RFC: Proposal for the implementation
Author	Vlad Horsun
Post date	2004-11-29T12:12:46Z

> We have two issue to solve:
>
> 1) Data storage
>
> Temporary data by its definition doesn't require any recovery policies. If
> it disappears because of a hardware/software failure, it means no actual
> data loss. Temporary data is also expected to have shorter life-time than
> persistent data and to provide faster access. All this means that it should
> be preserved in memory as much/long as possible and flushed on disk only if
> there's not enough buffers to keep all temp data.

I think this is not correct. Temporary data shouldn't be preserved
in RAM - it's must be cached like all other data. We can cache it with
our cache manager (CCH) or delegate it to file system, but don't retain
temp data in RAM.

As example, MSSQL before version 7.0 has option 'tempdb in RAM' but
deprecate it since v7, IIRC.

> Hence I consider CCH a
> wrong module to handle temporary I/O. The required allocation/caching
> algorithm is provided by the SortMem class, which can be extended to produce
> a generic multi-purpose temp space manager. Perhaps, CCH could be also
> enhanced to provide different allocation/caching patterns and in this case
> it could be used for our purposes, although this doesn't look like an easy
> work. But, regardless of the CCH usage, I consider the whole idea of storing
> temp data inside the database via the existing PIO wrong, as it's just
> provides the required semantics without any performance and/or cleanup
> benefits. If the proper solution requires a separate page numbers space, so
> be it.

Separate page space seems most attractively from performance
POV for me.

> 2) Data visibility
>
> I see two ways to allow per-session data visibility:
>
> 1) One TempSpace instance per attachment. It means that different
> attachments work with different temporary files.

In fact we can imagine 4 tempspace scope :
a) one per temporary table instance
b) one per attachment
c) one per database
d) one per engine

Option d) seems to be not practically useful, at least with
current engine implementation

Option c) seems to be not very friendly to classic server, but
allows to avoid frequent file creation\deletion

Option a) is most granulated but add most overhead from file
system

So, i prefer option b) for CS and c) for SS, or one per engine
process.

> 2) Nickolay's ATTACHMENT_ID idea to add a hidden column to both data and
> indices and teach the optimizer to filter the rows.

I think this is not so good. It's seems to be easy to implement,
but i showed even more easiest way. But this has one big disadvantage -
performance. Clean up at startup is fast but we don't want to reload engine
just to empty temp tables ;) Regular clean up (sweep, garbage collection)
is slow for temp tables and must be avoided, imho.

And i can't see how Nickolay idea satisfy tables with ON COMMIT
DELETE option.

At last, different page space allows read-only database to work
with temp data and still remains read-only - this can be important
for some applications

Regards,
Vlad