Subject: Re: [IB-Architect] trouble with sweep
Author: Pavel Cisar
Ivan,

On 20 Oct 2002 at 22:00, Ivan Prenosil wrote:

> I think that traversing _long_ version chains causes problems
> only in the relatively rare situation where rows are updated repeatedly
> and there is a long-running transaction that needs to go to the tail
> of the chain to get versions old enough.
> (But everybody avoids long running transactions, don't they ? :-)

Unfortunately, very long chains are created not only by bad design
(especially when an update happens as part of an insert), but also when
dead versions are not purged by the GC thread quickly enough. This is not
related to long-running transactions, as I mean a long chain that
consists mostly of dead (no longer needed) versions.
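To make the distinction concrete, here is a minimal Python sketch of one record's back-version chain under heavily simplified assumptions: every version was written by a committed transaction, the chain is ordered newest first, and "visible to a snapshot" is reduced to comparing transaction ids (the real engine consults transaction states, which this ignores). The names `Version`, `visible_version` and `dead_versions` are hypothetical. It shows that a long-running reader must walk almost to the tail, while with no old transactions around nearly the whole chain is dead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Version:
    creator_txn: int  # id of the committed transaction that wrote this version

def visible_version(chain: List[Version], snapshot_txn: int) -> Tuple[Optional[Version], int]:
    """Walk a newest-first chain; return the first version written by a
    transaction older than snapshot_txn and the number of versions examined.
    (Simplified visibility rule, not the engine's real one.)"""
    steps = 0
    for v in chain:
        steps += 1
        if v.creator_txn < snapshot_txn:
            return v, steps
    return None, steps

def dead_versions(chain: List[Version], oldest_active: int) -> List[Version]:
    """Everything behind the newest version that every active transaction
    can already see is needed by nobody, i.e. dead."""
    for i, v in enumerate(chain):
        if v.creator_txn < oldest_active:
            return chain[i + 1:]
    return []

# A record updated by transactions 1..100, newest version first.
chain = [Version(t) for t in range(100, 0, -1)]
print(visible_version(chain, 101)[1])  # new reader: 1 step
print(visible_version(chain, 5)[1])    # long-running reader: 97 steps
print(len(dead_versions(chain, 101)))  # no old transactions: 99 dead versions
```

With 99 of 100 versions dead, the chain is long only because the GC thread has not purged it yet.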

> A more common situation (imho) is when you e.g. delete one million records.
> When a new transaction tries to read that table, it does not need
> to traverse to old versions, because it is interested only in
> the newest versions; unfortunately, these versions are just stubs
> saying "this record no longer exists", and there are a million
> such stubs.

True. A mass delete is the worst-case scenario, not only because readers
must skip over the deleted rows, but also because, although the
deleted-row stubs are small, each deleted row is kept in full as a back
version (no gain from storing a difference is possible), and when you
delete many rows on a single page, many of those back versions will very
likely spill over to other pages. The longer the row, the worse the
outcome.
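A minimal Python sketch of both costs, assuming a toy model: the "difference" kept for an updated row is naively approximated as the count of changed bytes in equal-length rows (the engine uses its own difference encoding, not this), a delete stub is represented as `None`, and page placement and spill-over are not modelled at all. `back_version_size` and `scan_newest` are hypothetical names, and the row sizes are illustrative only.

```python
from typing import List, Optional, Tuple

def back_version_size(old_row: bytes, new_row: Optional[bytes]) -> int:
    """Bytes kept for a back version in this simplified model.

    On update, only a difference against the new version is kept, naively
    approximated as the number of changed bytes (rows of equal length).
    On delete (new_row is None) there is nothing to diff against, so the
    whole old row is kept.
    """
    if new_row is None:
        return len(old_row)
    return sum(1 for a, b in zip(old_row, new_row) if a != b)

def scan_newest(newest: List[Optional[bytes]]) -> Tuple[List[bytes], int]:
    """A fresh reader wants only the newest versions, but still has to visit
    every record, including the 'record deleted' stubs (None here)."""
    rows: List[bytes] = []
    visited = 0
    for v in newest:
        visited += 1
        if v is not None:
            rows.append(v)
    return rows, visited

row = b"x" * 200
print(back_version_size(row, row[:-4] + b"yyyy"))  # update: 4 bytes
print(back_version_size(row, None))                # delete: 200 bytes
print(scan_newest([None] * 1_000_000)[1])          # 1000000 stubs visited, 0 rows
```

The longer `row` is, the larger the gap between the update and delete cases, which matches the point above.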

> Or the situation when you update a lot of records and the updated column
> is indexed. Because indexes are not version-aware, the sparse bitmap
> created from such an index will contain more records than the transaction
> can see.

Yep, another problem, but we have to live with it.
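A minimal Python sketch of why the bitmap over-includes, assuming the index is modelled as a plain list of `(key, rec_id)` pairs that keeps the key of every version, and that the value visible to the transaction is looked up from a dict. `index_bitmap` and `fetch` are hypothetical names, not engine routines.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

def index_bitmap(entries: Iterable[Tuple[int, int]], lo: int, hi: int) -> Set[int]:
    """Collect record ids whose index key falls in [lo, hi]. The index holds
    keys from all versions, so it cannot tell old values from current ones."""
    return {rec_id for key, rec_id in entries if lo <= key <= hi}

def fetch(bitmap: Set[int], current_value: Callable[[int], Optional[int]],
          lo: int, hi: int) -> Tuple[List[int], int]:
    """Visit every candidate, re-check the condition on the version the
    transaction can see, and count how many records had to be fetched."""
    matches: List[int] = []
    fetched = 0
    for rec_id in sorted(bitmap):
        fetched += 1
        value = current_value(rec_id)
        if value is not None and lo <= value <= hi:
            matches.append(rec_id)
    return matches, fetched

# Record 1 was updated from key 10 to 50; both keys remain in the index.
entries = [(10, 1), (50, 1), (20, 2)]
visible = {1: 50, 2: 20}
bitmap = index_bitmap(entries, 0, 30)
print(fetch(bitmap, visible.get, 0, 30))  # ([2], 2): record 1 fetched for nothing
```

Each stale index entry costs a record fetch that the visibility re-check then throws away.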

> What do you mean by "dead version" ?
> If you mean version old enough that it can be garbage collected,
> then there already is such a flag: the transaction id.
> If you mean version already reported to GC thread,
> then you can't mark it on the page, because the list held by GC
> is volatile.

Dead version = a version that is no longer needed. The problem is that
the back-version chain is always evaluated along its full length to
detect these versions, even though they were already reported to the GC
thread, so evaluating them again is unnecessary (and costly when they
are spread over several other pages, fragmented, etc.). The row access
code has no intelligence here, because the old in-place GC was changed
only slightly (instead of collecting garbage directly, it reports it to
a separate GC thread). So until a version is actually purged by GC,
every row access will again detect and report what was already detected
and reported, because it can't determine beforehand whether a version
has already been marked for GC.
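A minimal Python sketch of that wasted work, under the same simplified assumptions as before (a chain of creator transaction ids, newest first, and visibility reduced to comparing ids). `Record`, `GcThread`, `access` and `purge` are hypothetical names, and the GC thread's queue is modelled as a `Counter` purely so the duplicate reports can be counted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    rec_id: int
    chain: List[int]  # creator transaction ids, newest version first

@dataclass
class GcThread:
    pending: Counter = field(default_factory=Counter)  # rec_id -> times reported

    def notify(self, rec_id: int) -> None:
        self.pending[rec_id] += 1

    def purge(self, records: List[Record], oldest_active: int) -> None:
        """Drop the dead tail of every reported record, then forget the reports."""
        for rec in records:
            if self.pending.pop(rec.rec_id, 0):
                for i, creator in enumerate(rec.chain):
                    if creator < oldest_active:
                        del rec.chain[i + 1:]
                        break

def access(rec: Record, oldest_active: int, gc: GcThread) -> int:
    """Row access: walk the whole chain to find dead versions and report them.
    Nothing on the record says they were reported already, so every access
    repeats the full walk. Returns the number of versions examined."""
    examined = 0
    visible_to_all = False
    dead_found = False
    for creator in rec.chain:
        examined += 1
        if visible_to_all:
            dead_found = True  # behind a version every transaction can see
        elif creator < oldest_active:
            visible_to_all = True
    if dead_found:
        gc.notify(rec.rec_id)
    return examined

rec = Record(7, list(range(50, 0, -1)))  # 50 versions, newest first
gc = GcThread()
print([access(rec, 100, gc) for _ in range(3)])  # [50, 50, 50]
print(gc.pending[7])                             # 3 reports of the same garbage
gc.purge([rec], 100)
print(access(rec, 100, gc), len(rec.chain))      # 1 1
```

Between two purges, each access repeats the full walk and files another report; only the purge itself shortens the chain.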

Best regards
Pavel Cisar
http://www.ibphoenix.com
For all your up-to-date Firebird and
InterBase information