Subject: Re: [Firebird-Architect] RFC: Proposal for the implementation
Author: Dmitry Yemanov
"Vlad Horsun" <hvlad@...> wrote:
>
> > 1) Data storage
> >
> > Temporary data by its definition doesn't require any recovery policies. If
> > it disappears because of a hardware/software failure, it means no actual
> > data loss. Temporary data is also expected to have a shorter life-time than
> > persistent data and to provide faster access. All this means that it should
> > be preserved in memory as much/long as possible and flushed to disk only if
> > there are not enough buffers to keep all temp data.
>
> I think this is not correct. Temporary data shouldn't be preserved
> in RAM - it must be cached like all other data.

This is somewhat similar to what I was saying ;-) Both CCH and
SortMem/TempFile provide caching, just with different strategies.

> We can cache it with
> our cache manager (CCH) or delegate it to the file system, but we
> shouldn't retain temp data in RAM.

I still think that CCH is too heavyweight to provide good performance for
temp tables, but perhaps that's not so important and/or can be improved in
the future. The cache control code must be changed accordingly though, e.g.
there's no need to flush temp pages to disk on commit/rollback.
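
Just to make that point concrete, below is a rough C++ sketch (names and
structures are invented by me, this is not the actual CCH code) of what a
commit-time flush could look like once it learns to skip temp pages:

#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for engine structures.
enum class PageSpace : std::uint8_t { Persistent, Temporary };

struct BufferDesc
{
    PageSpace space;   // which page space the cached page belongs to
    bool dirty;        // page was modified and not yet written
};

// Write the buffer to disk (stubbed out here).
void write_page(BufferDesc& bdb)
{
    bdb.dirty = false;
}

// Flush dirty pages at commit/rollback, but leave temporary pages
// in the cache: losing them on a crash loses nothing of value.
void flush_on_commit(std::vector<BufferDesc>& cache)
{
    for (BufferDesc& bdb : cache)
    {
        if (bdb.dirty && bdb.space != PageSpace::Temporary)
            write_page(bdb);
    }
}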

> As an example, MSSQL before version 7.0 had a 'tempdb in RAM' option,
> but it was deprecated since v7, IIRC.

CCH also tends to preserve as many pages in RAM (okay, not in RAM, but in
virtual memory) as possible :-) It just flushes them too often from the temp
tables' POV. And I'm not sure that temp page I/O should follow the careful
writes strategy. I have no other problems with CCH. But I tend to think that
in this case CCH should have two independent page pools (one for generic
pages and one for temp ones), because I'd expect these two buffers to be
configured independently.
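
Something like this is what I have in mind (only a sketch, the names are
made up; DefaultDbCachePages is mentioned just as an analogy):

#include <cstddef>

// Hypothetical cache configuration: the two pools are sized
// independently, so tuning temp-table buffering doesn't steal
// pages from (or compete with) the generic database cache.
struct CacheConfig
{
    std::size_t genericPoolPages; // analogous to DefaultDbCachePages today
    std::size_t tempPoolPages;    // separate knob for temp pages
};

struct BufferPool
{
    std::size_t pages;
    explicit BufferPool(std::size_t n) : pages(n) {}
};

struct Cache
{
    BufferPool generic;
    BufferPool temp;

    explicit Cache(const CacheConfig& cfg)
        : generic(cfg.genericPoolPages), temp(cfg.tempPoolPages)
    {}
};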

> > But, regardless of the CCH usage, I consider the whole idea of storing
> > temp data inside the database via the existing PIO wrong, as it just
> > provides the required semantics without any performance and/or cleanup
> > benefits. If the proper solution requires a separate page number space,
> > so be it.
>
> A separate page space seems most attractive to me from a performance
> POV.

Agreed.
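
For illustration, a separate page number space could boil down to qualifying
every page number with a page space id, roughly as below (a sketch with
names invented by me, nothing more):

#include <cstdint>

// Hypothetical "wide" page number: ordinary page numbers, but living in
// independent spaces, so temp pages never collide with (or get allocated
// inside) the main database file.
struct PageNumber
{
    std::uint32_t spaceID;  // 0 = main database, > 0 = temp spaces
    std::uint32_t pageNum;  // page offset within that space

    bool isTemporary() const { return spaceID != 0; }
};

// Route I/O to the right file purely by the page space id.
const char* fileFor(const PageNumber& page)
{
    return page.isTemporary() ? "fb_temp_space.tmp" : "database.fdb";
}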

> > 2) Data visibility
> >
> > I see two ways to allow per-session data visibility:
> >
> > 1) One TempSpace instance per attachment. It means that different
> > attachments work with different temporary files.
>
> In fact, we can imagine 4 tempspace scopes:
> a) one per temporary table instance
> b) one per attachment
> c) one per database
> d) one per engine
>
> Option d) doesn't seem practically useful, at least with the
> current engine implementation.
>
> Option c) doesn't seem very friendly to the Classic Server, but it
> allows avoiding frequent file creation/deletion.
>
> Option a) is the most granular, but adds the most file system
> overhead.
>
> So, I prefer option b) for CS and c) for SS, or one per engine
> process.

I'd stick to (b) for both architectures, just to unify the code. Note that
(b) doesn't require anything else to support attachment-level visibility per
se. No hidden columns, no transaction SBMs...
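
I.e. something along these lines would already give us attachment-level
visibility (again, just a sketch with invented names, not the real
Attachment class):

#include <cstdio>
#include <memory>
#include <string>

// Hypothetical per-attachment temporary space: created lazily on the
// first temp-table write, dropped automatically when the attachment
// goes away. No other attachment ever sees this file, which gives us
// attachment-level visibility "for free".
class TempSpace
{
public:
    explicit TempSpace(const std::string& fileName) : name(fileName) {}
    ~TempSpace() { std::remove(name.c_str()); } // cleanup on detach

private:
    std::string name;
};

class Attachment
{
public:
    explicit Attachment(int id) : attachmentId(id) {}

    // Lazily create the temp space on first use.
    TempSpace& getTempSpace()
    {
        if (!tempSpace)
            tempSpace = std::make_unique<TempSpace>(
                "fb_temp_" + std::to_string(attachmentId) + ".tmp");
        return *tempSpace;
    }

private:
    int attachmentId;
    std::unique_ptr<TempSpace> tempSpace; // destroyed with the attachment
};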

> > 2) Nickolay's ATTACHMENT_ID idea to add a hidden column to both data and
> > indices and teach the optimizer to filter the rows.
>
> I think this is not so good. It seems to be easy to implement,
> but I showed an even easier way. And it has one big disadvantage -
> performance. Clean-up at startup is fast, but we don't want to reload the
> engine just to empty temp tables ;) Regular clean-up (sweep, garbage
> collection) is slow for temp tables and must be avoided, imho.

Your suggestion requires collecting txn ids on a per-attachment basis. How
big would such an SBM be in the case of long 24x7 attachments and the
PRESERVE ROWS option?
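
To illustrate the concern (a purely hypothetical sketch, not anybody's
actual proposal): every writing transaction has to be remembered per
attachment, and with PRESERVE ROWS nothing obvious ever prunes that set
while the attachment lives:

#include <cstdint>
#include <set>

using TraNumber = std::uint64_t;

// Hypothetical per-attachment bookkeeping for the hidden-column scheme:
// every transaction that touches a temp table must be recorded so that
// its rows stay visible (PRESERVE ROWS) and can be filtered/cleaned later.
struct AttachmentTempState
{
    std::set<TraNumber> writers; // grows for the attachment's lifetime

    void noteWriter(TraNumber tra) { writers.insert(tra); }
};

// A 24x7 attachment committing one writing transaction per second would
// accumulate ~86,400 entries per day unless something prunes the set.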

> And I can't see how Nickolay's idea satisfies tables with the ON COMMIT
> DELETE option.
>
> Finally, a different page space allows a read-only database to work
> with temp data and still remain read-only - this can be important
> for some applications.

Yep, this ability is important.
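
And just to show why I like the separate page space for ON COMMIT DELETE
ROWS as well (another rough sketch with invented names): at commit we simply
release the temp pages of the table instance, and the main - possibly
read-only - database file is never touched:

#include <map>
#include <vector>

// Hypothetical commit hook for temp tables in a separate page space:
// ON COMMIT DELETE ROWS simply releases the pages of the table instance
// back to the temp space; the main database file is never written, so it
// can stay read-only.
enum class OnCommit { DeleteRows, PreserveRows };

struct TempTableInstance
{
    OnCommit action;
    std::vector<unsigned> pages; // page numbers inside the temp space
};

void onCommit(std::map<int, TempTableInstance>& tempTables)
{
    for (auto& entry : tempTables)
    {
        TempTableInstance& table = entry.second;
        if (table.action == OnCommit::DeleteRows)
            table.pages.clear(); // release pages, keep the (empty) table
    }
}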


Dmitry