Subject: Re: [firebird-support] High CPU consumption
Author: Ann W. Harrison
Post date: 2005-10-03T18:47:33Z
kapsee wrote:
> My application is using the C interface to execute queries against
> firebird most of which are updates. I find that firebird consumes a
> lot of CPU while this is happening.
The problem is likely to be garbage collection. Garbage collection -
the removal of unnecessary versions of records - is an on-going process
in all Firebird databases. The expensive part of garbage collection is
not removing the data itself, but removing index entries. In
particular, removing index entries when the index contains thousands of
instances of that particular key value is very expensive.
The process that you can turn on and off is called "sweeping". A sweep
is a separate thread that reads every record in the database and
garbage collects unnecessary versions. Sweep also resets the oldest
interesting transaction marker by changing the state of transactions
from rolled back to committed once all their changes have been removed
from the database. Sweeping is expensive and should not be done during
high use periods. But it is not the only time garbage collection is done.
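If you want to take sweeping into your own hands, a rough sketch looks like
this - a small Python wrapper around the standard gfix utility; the database
path and credentials are placeholders, not anything from this thread:
    # Sketch: turn off the automatic sweep and run it manually in a quiet
    # window. Path, user and password below are made-up placeholders.
    import subprocess

    DATABASE = "/data/mydb.fdb"   # hypothetical database file

    # Sweep interval 0 disables the automatic sweep.
    subprocess.run(["gfix", "-user", "SYSDBA", "-password", "masterkey",
                    "-housekeeping", "0", DATABASE], check=True)

    # Run the sweep now: read every record, garbage collect old versions,
    # and advance the oldest interesting transaction marker.
    subprocess.run(["gfix", "-user", "SYSDBA", "-password", "masterkey",
                    "-sweep", DATABASE], check=True)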
In Classic, garbage collection is performed by the transaction that
finds the unneeded data. Any time any transaction finds an unnecessary
record version, it removes it, freeing the space it occupied for storing
new records. It also removes index entries for those record
versions.
SuperServer 1.0x and 1.5x have a separate garbage collect thread. When
a transaction encounters an unnecessary record version, it posts that
record to the garbage collect thread. When the garbage collect thread
runs, it attempts to remove the version and its index entries. Where
there are large numbers of duplicate values for the index entry being
removed, the cost of garbage collection can exceed the quantum of time
allocated to the garbage collect thread, so it must abandon its work.
That leads to lots of CPU use with not much progress to show for it.
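As a toy model of that behaviour (purely illustrative, not Firebird code):
if walking the duplicate chain for one key costs more than the quantum the
thread is given, every pass burns its quantum and removes nothing.
    # Toy model of the 1.0x/1.5x garbage collect thread. Numbers are made up;
    # the point is that cost > quantum means CPU is spent with no progress.
    QUANTUM = 10_000          # work units the thread may spend per pass
    CHAIN_LENGTH = 32_000     # duplicates sharing the key value being removed

    def gc_pass(chain_length, quantum):
        """Return True if the entry could be removed within the quantum."""
        return chain_length <= quantum

    wasted = 0
    for attempt in range(1, 6):
        if gc_pass(CHAIN_LENGTH, QUANTUM):
            print(f"pass {attempt}: removed")
            break
        wasted += QUANTUM
        print(f"pass {attempt}: abandoned, {wasted} work units spent so far")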
In SuperServer version 2, some garbage collection is done immediately by
the transaction that encounters the unneeded record version - saving the
extra I/O required for the garbage collect thread to read the page.
More complex operations - where removing the unnecessary version would
involve additional I/O - are deferred to the garbage collect thread.
More significantly, I think, Version 2 changes the index structure and
makes garbage collecting index duplicates much less expensive.
Let me see if I can describe the problem simply. When you store a
record, it typically goes at the end of the table. Not always, because
space released by deleted records and garbage collected old versions
will be used first, but typically new records show up at the end of the
table. In Version 1.0x and 1.5x, duplicate index entries, on the other
hand, are stored at the beginning of the list of duplicates.
OK, so you've got two lists - record data is stored with the most recent
record last, duplicate index entries are stored with the most recent
record first.
You decide to update or delete a group of records. No problem, that's
fast and doesn't affect the indexes. When another transaction starts to
read the table - whether in natural order or by index - it will read the
table in storage order - effectively oldest first. So the oldest
records are removed first. But their index entries are at the end of
the list of duplicates. Finding that entry requires reading the whole
list of duplicates. Removing the oldest entry in a list of 32,000
duplicates requires checking 32,000 index entries. Removing the second
oldest requires checking 31,999 index entries, and so on...
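Put another way, the work is quadratic in the number of duplicates. A quick
back-of-the-envelope check in Python, using the 32,000 figure from above:
    # First removal scans 32,000 entries, the second 31,999, and so on.
    n = 32_000
    total_checks = sum(range(1, n + 1))     # 32,000 + 31,999 + ... + 1
    assert total_checks == n * (n + 1) // 2
    print(total_checks)                     # 512,016,000 entry comparisons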
Version 2 indexes order duplicates by the record number of the record they
reference. The garbage collect thread can look up the entry to be
removed by a combination of the value - not very selective - and the
record number - very selective. For the purpose of garbage collection,
all indexes are unique. Instead of examining 32,000 entries, the
garbage collector goes directly to the one it needs to remove.
And that's a huge saving.
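A sketch of the difference (a toy model, not the actual on-disk page format):
with duplicates kept in record-number order, the pair (key value, record
number) behaves like a unique key and can be found with a binary search
instead of a chain walk.
    # Toy comparison of the two layouts.
    from bisect import bisect_left

    n = 32_000
    value = "STATUS_OK"                     # an unselective key, made up here

    # 1.0x/1.5x style: the duplicate chain holds the newest record first,
    # so the oldest record's entry is at the very end.
    chain_1x = [(value, recno) for recno in range(n, 0, -1)]

    def find_1x(recno):
        for checks, entry in enumerate(chain_1x, start=1):
            if entry[1] == recno:
                return checks               # entries examined
        return None

    # Version 2 style: duplicates ordered by record number.
    bucket_v2 = sorted(chain_1x)

    def find_v2(recno):
        # about log2(32,000) ~ 15 comparisons instead of up to 32,000
        i = bisect_left(bucket_v2, (value, recno))
        return i < len(bucket_v2) and bucket_v2[i] == (value, recno)

    print(find_1x(1))   # 32000 - the oldest record's entry is last in the chain
    print(find_v2(1))   # True  - located directly in the sorted bucket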
Regards,
Ann