Subject Re: [Firebird-Architect] RFC: Proposal for the implementation
Author Vlad Horsun
> > Separate page space seems most attractively from performance POV for
> me.
>
> For me either, but to making it useful is not that simple.

I don't name it simple

> For example on Windows creating temporary files is horrifyingly slow and
> also tends to fragment the filesystem, you can see this with on-disk
> sorting module.

For this reason i considered different tempspace scopes

> One big temporary space storage has better performance properties, but
> it needs management and is not much better than normal database in that
> regard.

I'm afraid we don't have nothing better for that. About management - it's
seems to be much simple than DPM+PAG.

> Also, collapsing per-attachment temporary space is going to introduce
> difficult issues which are going to translate to bad performance.
> Imagine you create a temporary blob, which is the normal way you go to
> assign values to blob fields.
>
> When you create it you don't know which table it is going to be attached
> to. You may pass blob to view or stored procedure and don't know the
> destination table at all. What tablespace should engine use to store
> this blob?

Can't say right now. I must to teach himself how blobs are handled

> > > 2) Nickolay's ATTACHMENT_ID idea to add a hidden column to
> > both data
> > > and indices and teach the optimizer to filter the rows.
> >
> > I think this is not so good. It's seems to be easy to
> > implement, but i showed even more easiest way.
>
> Your approach is not easier, at least indices code would have to be
> revamped and also would require special threatment of temporary tables
> in GC and sweep handlers.

In my approach only VIO_chase_record_version needs to be
modified. We simple must understand - what is a "garbage".
Canonical definition is : garbage is that record versions which not seen by
any transaction and is deleted or is not a primary version (Jim will correct me
if i'm wrong ;). We determine first part of that definition by comparing
transaction
number of record version with oldest snapshot.

In case of temporary tables we must add to this definition :
a) ON COMMIT DELETE tables - all records created by transactions
which are not live now (this records can't have secondary versions)
b) ON COMMIT PRESERVE tables - primary record versions created
by transactions which attachments are not live now

With this definition we don't need to physically DELETE unneeded
records at transaction\attachment end. GC will remove such records
whenever it will see it and at engine startup we can drop all temporary
tables data

> > But this has one big disadvantage - performance. Clean up at startup
> is
> > fast but we don't want to reload engine just to empty temp
> > tables ;) Regular clean up (sweep, garbage collection) is
> > slow for temp tables and must be avoided, imho.
>
> We do GC for normal tables anyways, temp tables are not going to change
> the picture significantly.

And this is bad. Since we know that data is temporary and know
exactly when it's became unneeded, we must use this to improve
performance

> They are not going to be slower than normal tables and we'll have
> motivation to optimize normal access paths, not one special case of
> temporary tables.

This task is needed and not related to temp tables only, right ? ;)

> BTW, note that due to multiple transactions allowed for one connection
> update conflicts are possible for temporary tables and the most of the
> versioning, conflict resolution and GC logic of CCH, DPM and VIO modules
> still applies.

This is why i even didn't examine another ways to manage records.
For example like in external tables and so on

> > And i can't see how Nickolay idea satisfy tables with ON
> > COMMIT DELETE option.
>
> Do what's asked. Delete rows from temp table in DFW handler at commit
> time.

Why ? Why we need to do unneeded work with a lot of IO ?

> We are going to implement deferred constraints

This is first time i hear that

> so this is not going to change the big performance picture either.

Sorry, i don't see relation of deferred constraints with temporary tables
and it's performance

> > At last, different page space allows read-only database
> > to work with temp data and still remains read-only - this can
> > be important for some applications
>
> For those special applications NBACKUP allows to have read-only main
> database file and redirect writes to separate read-write storage. :-)

Hmm... you joke, right ? There are many applications like catalog of spare
parts or libraries or phone book etc which distributed on the CD\DVD with big
database on it. This applications often used embedded engine. Ability to perform
complex queries with temporary data is very desirable for its developers

> The benefit of my approach is that it would allow to share 99% of
> implementation with normal tables motivating us to optimize caching
> logic, data placement logic and access paths for entire engine, not only
> for temporary tables.

If you approach allows share 99% of code than mine allows 99.5% ;)
But i still prefer more complex but more faster page spaces

Regards,
Vlad