Subject Re: High CPU consumption
Author kapsee
Thanks for the detailed response. When will version 2 be available?
I am on 1.5 currently.

--- In firebird-support@yahoogroups.com, "Ann W. Harrison"
<aharrison@i...> wrote:
> kapsee wrote:
> > My application is using the C interface to execute queries against
> > Firebird, most of which are updates. I find that Firebird consumes
> > a lot of CPU while this is happening.
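For context, here's a stripped-down sketch of what my application does -
database path, credentials, and the UPDATE statement are placeholders,
and error handling is cut down to a single check:

#include <ibase.h>

int main(void)
{
    ISC_STATUS status[20];
    isc_db_handle db = 0;
    isc_tr_handle trans = 0;
    static char dpb[] = { isc_dpb_version1,
                          isc_dpb_user_name, 6, 'S','Y','S','D','B','A',
                          isc_dpb_password, 9,
                          'm','a','s','t','e','r','k','e','y' };

    if (isc_attach_database(status, 0, "employee.fdb", &db,
                            sizeof(dpb), dpb)
        || isc_start_transaction(status, &trans, 1, &db, 0, NULL)
        /* each update creates a back version that must be garbage
           collected later, together with its index entries */
        || isc_dsql_execute_immediate(status, &db, &trans, 0,
               "UPDATE orders SET status = 'DONE' WHERE status = 'NEW'",
               3, NULL)
        || isc_commit_transaction(status, &trans))
    {
        isc_print_status(status);
        return 1;
    }
    isc_detach_database(status, &db);
    return 0;
}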
>
> The problem is likely to be garbage collection. Garbage collection -
> the removal of unnecessary versions of records - is an ongoing
> process in all Firebird databases. The expensive part of garbage
> collection is not removing the data itself, but removing index
> entries. In particular, removing index entries when the index
> contains thousands of instances of that particular key value is
> very expensive.
>
> The process that you can turn on and off is called "sweeping". A
> sweep is a separate thread that reads every record in the database
> and garbage collects unnecessary versions. Sweep also resets the
> oldest interesting transaction marker by changing the state of
> transactions from rolled back to committed once all their changes
> have been removed from the database. Sweeping is expensive and
> should not be done during high-use periods. But it is not the only
> time garbage collection is done.
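A side note for anyone searching the archives later: the sweep
interval is normally set with gfix (gfix -housekeeping 0 disables
automatic sweeping, and gfix -sweep runs one by hand off-hours). If I
have the DPB constants right, the interval can also be set
programmatically at attach time - a sketch, with placeholder
credentials and database path:

#include <ibase.h>
#include <string.h>

int main(void)
{
    ISC_STATUS status[20];
    isc_db_handle db = 0;
    char dpb[64], *p = dpb;
    ISC_LONG interval = 0;            /* 0 = no automatic sweep */

    *p++ = isc_dpb_version1;
    *p++ = isc_dpb_user_name;         /* placeholder credentials */
    *p++ = 6;  memcpy(p, "SYSDBA", 6);    p += 6;
    *p++ = isc_dpb_password;
    *p++ = 9;  memcpy(p, "masterkey", 9); p += 9;

    *p++ = isc_dpb_sweep_interval;    /* 4-byte little-endian value */
    *p++ = 4;
    *p++ = (char)( interval        & 0xff);
    *p++ = (char)((interval >> 8)  & 0xff);
    *p++ = (char)((interval >> 16) & 0xff);
    *p++ = (char)((interval >> 24) & 0xff);

    if (isc_attach_database(status, 0, "employee.fdb", &db,
                            (short)(p - dpb), dpb))
        return 1;                     /* inspect status in real code */

    isc_detach_database(status, &db);
    return 0;
}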
>
> In Classic, garbage collection is performed by the transaction that
> finds the unneeded data. Any time any transaction finds an
> unnecessary record version, it removes it, releasing the space it
> used for reuse in storing new records. It also removes index entries
> for those record versions.
>
> SuperServer 1.0x and 1.5x have a separate garbage collect thread.
> When a transaction encounters an unnecessary record version, it
> posts that record to the garbage collect thread. When the garbage
> collect thread runs, it attempts to remove the version and its
> index entries. Where there are large numbers of duplicate values
> for the index entry being removed, the cost of garbage collection
> can exceed the quantum of time allocated to the garbage collect
> thread, so it must abandon its work. That leads to lots of CPU use
> with not much progress to show for it.
>
> In SuperServer version 2, some garbage collection is done
> immediately by the transaction that encounters the unneeded record
> version - saving the extra I/O required for the garbage collect
> thread to read the page. More complex operations - where removing
> the unnecessary version would involve additional I/O - are deferred
> to the garbage collect thread.
>
> More significantly, I think, Version 2 changes the index structure
> and makes garbage collecting index duplicates much less expensive.
>
> Let me see if I can describe the problem simply. When you store a
> record, it typically goes at the end of the table. Not always,
> because space released by deleted records and garbage collected old
> versions will be used first, but typically new records show up at
> the end of the table. In Version 1.0x and 1.5x, duplicate index
> entries, on the other hand, are stored at the beginning of the list
> of duplicates.
>
> OK, so you've got two lists - record data is stored with the most
> recent record last, while duplicate index entries are stored with
> the most recent record first.
>
> You decide to update or delete a group of records. No problem,
> that's fast and doesn't affect the indexes. When another
> transaction starts to read the table - whether in natural order or
> by index - it will read the table in storage order - effectively
> oldest first. So the oldest records are removed first. But their
> index entries are at the end of the list of duplicates. Finding
> that entry requires reading the whole list of duplicates. Removing
> the oldest entry in a list of 32,000 duplicates requires checking
> 32,000 index entries. Removing the second oldest requires checking
> 31,999 index entries, and so on...
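To put numbers on the cost Ann describes, here is a toy model (my
code, not Firebird's) of the 1.5.x behavior: duplicates are chained
newest first, records are garbage collected oldest first, so every
removal scans to the tail of whatever is left:

#include <stdio.h>

int main(void)
{
    long n = 32000;                /* duplicate entries for one key */
    long long checked = 0;

    /* removing the oldest entry means walking the whole remaining
       chain; repeat until the chain is empty */
    for (long remaining = n; remaining > 0; remaining--)
        checked += remaining;

    printf("index entries checked: %lld\n", checked);
    /* prints 512016000, i.e. n*(n+1)/2 - quadratic in chain length */
    return 0;
}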
>
> Version 2 indexes order duplicates by the record number of the
> record to be deleted. The garbage collect thread can look up the
> entry to be removed by a combination of the value - not very
> selective - and the record number - very selective. For the purpose
> of garbage collection, all indexes are unique. Instead of examining
> 32,000 entries, the garbage collector goes directly to the one it
> needs to remove.
>
> And that's a huge saving.
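And the same toy model under the Version 2 ordering: with duplicates
sorted by record number, the (value, record number) pair is unique,
so each removal behaves like a binary search rather than a chain walk
(again my sketch, not engine code; compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
    long n = 32000;                /* duplicate entries for one key */
    long long checked = 0;

    /* each removal costs roughly log2(remaining) comparisons */
    for (long remaining = n; remaining > 0; remaining--)
        checked += (long long)ceil(log2((double)remaining)) + 1;

    printf("index entries checked: %lld\n", checked);
    /* on the order of half a million, versus ~512 million above */
    return 0;
}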
>
> Regards,
>
>
> Ann