Subject Re: [firebird-support]
Author Ann W. Harrison
Kevin Hamilton wrote:
>
> 1) In a gstat -h, what is the difference between "Oldest Active" and
> "Oldest snapshot"?

The oldest active is the oldest transaction that was active when any
currently active transaction started. That's important because versions
created by that transaction and newer transactions are visible to some
active transaction and can not be garbage collected.


The oldest snapshot is a a problem. We' spent about two weeks tracing
it through the code on the devel list, and it appears that when the
oldest active is updated, the previous value for the oldest active
becomes the oldest snapshot. I don't know why it's kept or why it's
used instead of the oldest-active to limit garbage collection. Normally
it lags the oldest active by two or three transactions and is not a
problem. The stats you provided below, however, are really troubling...

> For instance, why might I see the following?
> Oldest transaction 2896976
> Oldest active 2896977
> Oldest snapshot 2853644
> Next transaction 2897181

I don't understand how the oldest snapshot could be more than forty
thousand older than the oldest active. Sometimes that sort of thing is
a bug in the reporting program... (she says, clutching at straws).

> 2) Oldest transaction means "oldest interesting transaction"

Yes.

> and is used for auto-sweeping, right?

No. The oldest interesting transaction is the oldest transaction that
did not commit. Every running transaction must look for data created by
uncommitted transactions - if those transactions rolled back, the
current transaction will remove their changes. If the transaction that
created the data is active, the current transaction will read a back
version.


> a) I don't need to be concerned if the Oldest Transaction is very
> old, do I? Like the following?
> Oldest transaction 38879
> Oldest active 293593
> Oldest snapshot 293585
> Next transaction 293606

That's not a serious problem - on old machines with tiny memory, keeping
an array of two bits per transaction for three hundred thousand
transactions was a problem. Sweeping from time to time will move the
oldest interesting transaction along. As long as all the changes made
by a transaction are removed, it's state can be reset to committed.
That's what sweep does.

> 3) In a gstat -r -a, what is the difference between "total versions" and
> "max versions"?

Easy. The total versions is the total number of back versions of
records in that table. The max versions is the largest number of back
versions of any single record. When you seen one record with 30 back
versions, you know you've got a hot-spot or seriously delayed garbage
collection, or probably both.

> a) Does the average version length tell me anything useful?

It tells you how much space is being used by back versions - and is an
indication of the type of changes made. Firebird stores back versions
as deltas when possible. A delta isn't a full record, but instructions
that tell the system how to transform the current record into its
previous state. For example, start at byte 23 and replace the next 12
bytes with zeros. There are three cases where deltas aren't used.

1) when the size of the delta is longer than the size of the full record
2) when the delta would be longer than 255 bytes (that's a mistake and
ought to be changed at some point - the size was picked in 1985 and
things have changed.
3) when a transaction modifies a single record twice. That's a long story.

> b) I was under the impression that having an old "Oldest Snapshot"
> would mean I would also have a high max versions, but the database cited
> in (1) doesn't seem to.

It depends on the activity level. If your database is all insert/read
and never has a rollback, you'll never have any back versions.

> Here's the only interesting lines from that
> database:
> TABLE_Q
> Average version length: 14.99, total versions: 4327, max
> versions: 51

A max version of 51 is breathtaking. Fifty-one transactions have
modified a single record, and there's still some transaction around that
needs the oldest version. Is it a gateway record?

> TABLE_X
> Average version length: 42.16, total versions: 650, max versions:
> 1

no problem.
>
> 4) What is the difference between total dup, and max dup for indexes?

Same as above, total is the aggregate for the index, max is the max for
any single key value.


> I had a high max dup on the tables cited in (3b), so I did a compound
> index with the primary key. But I still have a high "total dup"?

Gstat will report duplicates on unique indexes - that's one of the
oddities of multi-generational indexes. No one transaction can see more
than one instance of each value, but different transactions can see
different instances. I wouldn't expect a lot of duplicates, but quite
possibly a record is being deleted, then another with the same key value
is created over and over ....

> INDEX_Q
> Average data length: 0.00, total dup: 2379, max dup: 28
> INDEX_X
> Average data length: 0.00, total dup: 29593, max dup: 32

Average data length shows that prefix compression is working well, and
that there's a high percentage of duplicate values. Neither of the max
dup values are in the critical range.

Hope this helps,


Ann