Subject Re: [firebird-support] Oldest Snapshot and Performance Issues
Author Helen Borrie
At 06:51 AM 25/10/2005 +0000, you wrote:
>We are using Classic server 1.52 on a Linux 8 Gb dual CPU opteron 242
>with Windows clients. There are about 110 users with 450 connections
>at peak times. Here is the output of a gstat -h taken during the day:
>Database header page information:
> Flags 0
> Checksum 12345
> Generation 572988
> Page size 4096
> ODS version 10.1
> Oldest transaction 22951
> Oldest active 319143
> Oldest snapshot 317998
> Next transaction 572981
> Bumped transaction 1
> Sequence number 0
> Next attachment ID 0
> Implementation ID 19
> Shadow count 0
> Page buffers 1000
> Next header page 0
> Database dialect 3
> Creation date Oct 23, 2005 22:24:16
> Attributes
> Variable header data:
> Sweep interval: 0
> *END*
>My question is why is the "Oldest snapshot" considerably less than the
>"Oldest active"? My understanding is that the "Oldest snapshot" should
>be close to the "Oldest Active".

Not necessarily. The Oldest Snapshot is the transaction number of the Oldest
Active as it was the last time garbage collection was done. You have your
sweep interval set to 0, so the only way your oldest snapshot is going to
advance is either to run a manual sweep or to free up some interesting
transactions so that cooperative GC has something to work on.
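To make the relationship concrete, here is a rough sketch (not an official tool) that parses the header counters quoted above and computes the two gaps worth watching; the parsing function and its name are my own invention, but the figures are from your gstat -h output:

```python
import re

# Pull the transaction counters out of a gstat -h header dump.
def parse_gstat_header(text):
    fields = {}
    for label in ("Oldest transaction", "Oldest active",
                  "Oldest snapshot", "Next transaction"):
        m = re.search(label + r"\s+(\d+)", text)
        if m:
            fields[label] = int(m.group(1))
    return fields

header = """
Oldest transaction 22951
Oldest active 319143
Oldest snapshot 317998
Next transaction 572981
"""

f = parse_gstat_header(header)

# Record versions newer than the oldest snapshot cannot be cleaned up,
# so a wide gap to Next transaction indicates a growing GC backlog.
gc_backlog = f["Next transaction"] - f["Oldest snapshot"]

# A wide gap between Oldest transaction and Oldest active usually means
# a transaction was left unfinished (uncommitted) a long time ago.
stuck_gap = f["Oldest active"] - f["Oldest transaction"]

print("GC backlog:", gc_backlog)            # 254983 with your figures
print("Stuck-transaction gap:", stuck_gap)  # 296192 with your figures
```

Watching those two numbers over time (rather than any single snapshot) will tell you whether a sweep or a fix to transaction handling is actually moving things forward.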

>Also, assuming we can improve the transaction movement, what else can
>we do to improve performance? We expect 20-30% growth in the number of
>users per year. Based on the recent article on the Lock Manager in
>"The Interbase and Firebird Developer Magazine" we have increased the
>lock manager settings to the maximum with considerable improvement
>(hash lengths have gone from min 700, avg 800, and max 900, to min 20,
>avg 40, and max 60, and the mutex wait is down to 2.5%). Is there
>anything else?

Possibly find some way to bring the number of users and the number of
connections closer together? Unlike Superserver (which shares page cache
dynamically) Classic allocates a static lump of page cache for each
connection. You have an unusually high cache allocation set (4 KB * 1000
pages, or roughly 4 MB per connection). Currently your peak-time load has
about 1.8 GB of RAM tied up in page caches. Was this 1000-page decision
based on any load-test metrics, or was it set on the assumption that "more
is better"?
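The arithmetic behind that 1.8 GB figure is just page size times buffers times connections; a back-of-envelope sketch using the numbers from your post (decimal MB/GB, matching the figures above):

```python
# Per-connection page cache cost under Classic, where every connection
# gets its own private cache rather than sharing one.
page_size = 4096      # bytes, "Page size" from gstat -h
page_buffers = 1000   # "Page buffers" from gstat -h
connections = 450     # peak connection count from the post

per_connection = page_size * page_buffers   # ~4 MB each
total = per_connection * connections        # ~1.8 GB at peak

print(f"per connection: {per_connection / 1e6:.1f} MB")
print(f"total at peak : {total / 1e9:.2f} GB")
```

Plugging in a smaller buffer count (say 256 pages, about 1 MB per connection) shows how quickly the aggregate drops, which is why Classic setups are usually tuned with far fewer page buffers than Superserver.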