Subject Re: [IB-Architect] trouble with sweep
Author Ivan Prenosil
> From: Pavel Cisar
> Very true, and a reason for GC thread to exist. Actually, this beast has
> two heads. First is slow GC thread that falls behind his duty, and second
> is unnecessary full chain traversal in subsequent reads.

I think that traversing _long_ version chains causes problems
only in the relatively rare situation where rows are updated repeatedly
and there is a long-running transaction that needs to go to the tail
of the chain to get versions old enough.
(But everybody avoids long running transactions, don't they ? :-)
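
To illustrate, here is a minimal Python sketch of a toy model (not actual
engine code). Each record is a chain of versions ordered newest first, and
a reader sees the first version written by a transaction older than its
own snapshot. The names `visible_version` and `snapshot_id`, and the
simplified visibility rule, are assumptions made for this example only.

```python
def visible_version(chain, snapshot_id):
    """Return (value, versions_read) for a reader with the given snapshot.

    chain is a list of (txn_id, value) pairs, newest version first.
    Simplification (assumption): a version is visible when its txn_id is
    lower than snapshot_id, i.e. all writers are treated as committed.
    """
    for steps, (txn_id, value) in enumerate(chain, start=1):
        if txn_id < snapshot_id:
            return value, steps
    return None, len(chain)


# One row updated 1000 times by transactions 1..1000 (newest first).
chain = [(txn, f"v{txn}") for txn in range(1000, 0, -1)]

new_reader = visible_version(chain, snapshot_id=1001)
old_reader = visible_version(chain, snapshot_id=2)
print(new_reader)  # ('v1000', 1)
print(old_reader)  # ('v1', 1000)
```

In this model only the old reader pays for the long chain; a new
transaction stops at the first version it reads.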

More common (imho) is the situation where you e.g. delete one million records.
When a new transaction tries to read that table, it does not need
to traverse to old versions, because it is interested only in
the newest versions; unfortunately, these versions are just stubs
saying "this record no longer exists", and there are a million
of such stubs.
Or the situation where you update a lot of records and the updated column
is indexed. Because indexes are not version-aware, the sparse bitmap
created from such an index will contain more records than the transaction
can see.
E.g. there is a boolean indexed column (it is just an example!)
with one row containing "1" and a million rows containing "0".
A query for "1"s (using the index) will be fast.
Now if an update changes "1"s to "0"s and "0"s to "1"s,
then, until the old versions are garbage collected,
the index will contain "0" with a million (plus one) pointers to all
records, and "1" with a million (plus one) pointers to all records.
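
The boolean example above can be sketched in Python as a toy model (an
assumption for illustration, not the real index structure): the index maps
a key value to a set of record numbers and keeps one entry per record
version, so entries for old versions stay until they are garbage
collected. `N` stands in for the million rows.

```python
from collections import defaultdict

N = 1000  # stands in for the million rows of the example

# Newest value of the indexed column, per record number.
table = {0: 1}
table.update({rec: 0 for rec in range(1, N + 1)})

# Index: key value -> set of record numbers (the "bitmap").
# It holds one entry per record *version*, not per visible record.
index = defaultdict(set)
for rec, value in table.items():
    index[value].add(rec)

candidates_before = len(index[1])

# Flip every value. Old versions are not garbage collected yet,
# so their index entries stay and new entries are added.
for rec in table:
    table[rec] = 1 - table[rec]
    index[table[rec]].add(rec)

# A query for 0 now has to visit every candidate record
# just to find the single row that really matches.
candidates_after = len(index[0])
matches_after = sum(1 for rec in index[0] if table[rec] == 0)
print(candidates_before, candidates_after, matches_after)  # 1 1001 1
```

Before the update the lookup touches one record; afterwards it touches
N + 1 records to return one row.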

> If we can find a way
> how to stop the chain traversal so already detected dead versions are not
> read again, then most from performance hit of subsequent reads will go
> away. A flag in row header indicating a dead version working as backstop
> is the simplest and most obvious solution, but that would impose a page
> write (more likely a dirty page in cache that could be GC'ed before
> actual write happen, but anyway, it's not very nice idea).

What do you mean by "dead version"?
If you mean a version old enough that it can be garbage collected,
then there already is such a flag - it is the transaction id.
If you mean a version already reported to the GC thread,
then you can't mark it on the page, because the list held by GC
is volatile.
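
As a rough illustration of why the transaction id can already serve as
that flag, here is a Python sketch under simplifying assumptions: every
writer is treated as committed, and `oldest_active` is a hypothetical name
for the id of the oldest transaction still running. The first version
older than `oldest_active` is visible to everybody, so everything behind
it in the chain can be collected.

```python
def split_chain(chain, oldest_active):
    """Split a newest-first list of writer txn ids into (kept, garbage)."""
    for pos, txn_id in enumerate(chain):
        if txn_id < oldest_active:
            # This version is visible to every active transaction,
            # so nothing behind it can be needed any more.
            return chain[:pos + 1], chain[pos + 1:]
    return chain, []


kept, garbage = split_chain([90, 70, 40, 20, 10], oldest_active=50)
print(kept, garbage)  # [90, 70, 40] [20, 10]
```

No extra mark on the page is needed for this test; comparing ids that
are already stored with each version is enough.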