Subject Re: High CPU consumption
Author kapsee
Thanks for the detailed response. When will version 2 be available?
I am on 1.5 currently.

--- In firebird-support@yahoogroups.com, "Ann W. Harrison"
<aharrison@i...> wrote:
> kapsee wrote:
> > My application is using the C interface to execute queries against
> > Firebird, most of which are updates. I find that Firebird consumes
> > a lot of CPU while this is happening.
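For context, here's a stripped-down sketch of what my application does -
database path, credentials, and the UPDATE statement are placeholders,
and error handling is cut down to a single check:

#include <ibase.h>

int main(void)
{
    ISC_STATUS status[20];
    isc_db_handle db = 0;
    isc_tr_handle trans = 0;
    static char dpb[] = { isc_dpb_version1,
                          isc_dpb_user_name, 6, 'S','Y','S','D','B','A',
                          isc_dpb_password, 9,
                          'm','a','s','t','e','r','k','e','y' };

    if (isc_attach_database(status, 0, "employee.fdb", &db,
                            sizeof(dpb), dpb)
        || isc_start_transaction(status, &trans, 1, &db, 0, NULL)
        /* each update creates a back version that must be garbage
           collected later, together with its index entries */
        || isc_dsql_execute_immediate(status, &db, &trans, 0,
               "UPDATE orders SET status = 'DONE' WHERE status = 'NEW'",
               3, NULL)
        || isc_commit_transaction(status, &trans))
    {
        isc_print_status(status);
        return 1;
    }
    isc_detach_database(status, &db);
    return 0;
}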
>
> The problem is likely to be garbage collection. Garbage collection -
> the removal of unnecessary versions of records - is an ongoing
> process in all Firebird databases. The expensive part of garbage
> collection is not removing the data itself, but removing index
> entries. In particular, removing index entries when the index
> contains thousands of instances of that particular key value is
> very expensive.
>
> The process that you can turn on and off is called "sweeping". A
> sweep is a separate thread that reads every record in the database
> and garbage collects unnecessary versions. Sweep also resets the
> oldest interesting transaction marker by changing the state of
> transactions from rolled back to committed once all their changes
> have been removed from the database. Sweeping is expensive and
> should not be done during high-use periods. But it is not the only
> time garbage collection is done.
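A side note for anyone searching the archives later: the sweep
interval is normally set with gfix (gfix -housekeeping 0 disables
automatic sweeping, and gfix -sweep runs one by hand off-hours). If I
have the DPB constants right, the interval can also be set
programmatically at attach time - a sketch, with placeholder
credentials and database path:

#include <ibase.h>
#include <string.h>

int main(void)
{
    ISC_STATUS status[20];
    isc_db_handle db = 0;
    char dpb[64], *p = dpb;
    ISC_LONG interval = 0;            /* 0 = no automatic sweep */

    *p++ = isc_dpb_version1;
    *p++ = isc_dpb_user_name;         /* placeholder credentials */
    *p++ = 6;  memcpy(p, "SYSDBA", 6);    p += 6;
    *p++ = isc_dpb_password;
    *p++ = 9;  memcpy(p, "masterkey", 9); p += 9;

    *p++ = isc_dpb_sweep_interval;    /* 4-byte little-endian value */
    *p++ = 4;
    *p++ = (char)( interval        & 0xff);
    *p++ = (char)((interval >> 8)  & 0xff);
    *p++ = (char)((interval >> 16) & 0xff);
    *p++ = (char)((interval >> 24) & 0xff);

    if (isc_attach_database(status, 0, "employee.fdb", &db,
                            (short)(p - dpb), dpb))
        return 1;                     /* inspect status in real code */

    isc_detach_database(status, &db);
    return 0;
}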
>
> In Classic, garbage collection is performed by the transaction that
> finds the unneeded data. Any time any transaction finds an
> unnecessary record version, it removes it, releasing the space it
> used for reuse in storing new records. It also removes index entries
> for those record versions.
>
> SuperServer 1.0x and 1.5x have a separate garbage collect thread.
> When a transaction encounters an unnecessary record version, it
> posts that record to the garbage collect thread. When the garbage
> collect thread runs, it attempts to remove the version and its
> index entries. Where there are large numbers of duplicate values
> for the index entry being removed, the cost of garbage collection
> can exceed the quantum of time allocated to the garbage collect
> thread, so it must abandon its work. That leads to lots of CPU use
> with not much progress to show for it.
>
> In SuperServer version 2, some garbage collection is done
> immediately by the transaction that encounters the unneeded record
> version - saving the extra I/O required for the garbage collect
> thread to read the page. More complex operations - where removing
> the unnecessary version would involve additional I/O - are deferred
> to the garbage collect thread.
>
> More significantly, I think, Version 2 changes the index structure
> and makes garbage collecting index duplicates much less expensive.
>
> Let me see if I can describe the problem simply. When you store a
> record, it typically goes at the end of the table. Not always,
> because space released by deleted records and garbage collected old
> versions will be used first, but typically new records show up at
> the end of the table. In Version 1.0x and 1.5x, duplicate index
> entries, on the other hand, are stored at the beginning of the list
> of duplicates.
>
> OK, so you've got two lists - record data is stored with the most
> recent record last, while duplicate index entries are stored with
> the most recent record first.
>
> You decide to update or delete a group of records. No problem,
> that's fast and doesn't affect the indexes. When another
> transaction starts to read the table - whether in natural order or
> by index - it will read the table in storage order - effectively
> oldest first. So the oldest records are removed first. But their
> index entries are at the end of the list of duplicates. Finding
> that entry requires reading the whole list of duplicates. Removing
> the oldest entry in a list of 32,000 duplicates requires checking
> 32,000 index entries. Removing the second oldest requires checking
> 31,999 index entries, and so on...
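To put numbers on the cost Ann describes, here is a toy model (my
code, not Firebird's) of the 1.5.x behavior: duplicates are chained
newest first, records are garbage collected oldest first, so every
removal scans to the tail of whatever is left:

#include <stdio.h>

int main(void)
{
    long n = 32000;                /* duplicate entries for one key */
    long long checked = 0;

    /* removing the oldest entry means walking the whole remaining
       chain; repeat until the chain is empty */
    for (long remaining = n; remaining > 0; remaining--)
        checked += remaining;

    printf("index entries checked: %lld\n", checked);
    /* prints 512016000, i.e. n*(n+1)/2 - quadratic in chain length */
    return 0;
}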
>
> Version 2 indexes order duplicates by the record number of the
> record to be deleted. The garbage collect thread can look up the
> entry to be removed by a combination of the value - not very
> selective - and the record number - very selective. For the purpose
> of garbage collection, all indexes are unique. Instead of examining
> 32,000 entries, the garbage collector goes directly to the one it
> needs to remove.
>
> And that's a huge saving.
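And the same toy model under the Version 2 ordering: with duplicates
sorted by record number, the (value, record number) pair is unique,
so each removal behaves like a binary search rather than a chain walk
(again my sketch, not engine code; compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
    long n = 32000;                /* duplicate entries for one key */
    long long checked = 0;

    /* each removal costs roughly log2(remaining) comparisons */
    for (long remaining = n; remaining > 0; remaining--)
        checked += (long long)ceil(log2((double)remaining)) + 1;

    printf("index entries checked: %lld\n", checked);
    /* on the order of half a million, versus ~512 million above */
    return 0;
}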
>
> Regards,
>
>
> Ann