Subject Re: Oldest Snapshot and Performance Issues
Author Greg Kay
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
>
> At 06:51 AM 25/10/2005 +0000, you wrote:
> >Hi,
> >
> >We are using Classic server 1.5.2 on Linux, on an 8 Gb dual-CPU
> >Opteron 242 with Windows clients. There are about 110 users with 450
> >connections at peak times. Here is the output of gstat -h taken
> >during the day:
> >
> >Database header page information:
> > Flags 0
> > Checksum 12345
> > Generation 572988
> > Page size 4096
> > ODS version 10.1
> > Oldest transaction 22951
> > Oldest active 319143
> > Oldest snapshot 317998
> > Next transaction 572981
> > Bumped transaction 1
> > Sequence number 0
> > Next attachment ID 0
> > Implementation ID 19
> > Shadow count 0
> > Page buffers 1000
> > Next header page 0
> > Database dialect 3
> > Creation date Oct 23, 2005 22:24:16
> > Attributes
> >
> > Variable header data:
> > Sweep interval: 0
> > *END*
> >
> >My question is why is the "Oldest snapshot" considerably less than the
> >"Oldest active"? My understanding is that the "Oldest snapshot" should
> >be close to the "Oldest Active".
>
> Not necessarily. The Oldest Snapshot is the transaction number of the
> Oldest Active as it was the last time garbage collection was done. You
> have your sweep interval set to 0, so the only way your oldest snapshot
> is going to advance is either to run a manual sweep or to free up some
> interesting transactions so that cooperative GC has something to work
> on.
>
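
That explains it. For keeping an eye on those counters we will probably
script something along these lines - a rough Python sketch only, assuming
gstat is on the PATH and with a placeholder database path:

import os

def header_counters(database):
    # Pull the transaction counters out of "gstat -h" output.
    counters = {}
    for line in os.popen('gstat -h ' + database):
        line = line.strip()
        for name in ('Oldest transaction', 'Oldest active',
                     'Oldest snapshot', 'Next transaction'):
            if line.startswith(name):
                counters[name] = int(line.split()[-1])
    return counters

if __name__ == '__main__':
    c = header_counters('/data/ourdb.fdb')   # placeholder path
    print('Next transaction:    %d' % c['Next transaction'])
    print('Oldest active lag:   %d' % (c['Next transaction'] - c['Oldest active']))
    print('Oldest snapshot lag: %d' % (c['Next transaction'] - c['Oldest snapshot']))
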
> >Also, assuming we can improve the transaction movement, what else can
> >we do to improve performance? We expect 20-30% growth in the number of
> >users per year. Based on the recent article on the Lock Manager in
> >"The Interbase and Firebird Developer Magazine" we have increased the
> >lock manager settings to the maximum with considerable improvement
> >(hash lengths have gone from min 700, avg 800, and max 900, to min 20,
> >avg 40, and max 60, and the mutex wait is down to 2.5%). Is there
> >anything else?
>
> Possibly find some way to bring the number of users and the number of
> connections closer together? Unlike Superserver (which shares page
> cache dynamically), Classic allocates a static lump of page cache for
> each connection. You have an unusually high cache allocation set
> (4K * 1000 pages, or 4 Mb per connection). Currently your peak-time
> load has 1.8 Gb of RAM tied up in page caches. Was this 1000-page
> decision based on any load/test metrics, or was it set on the
> assumption that "more is better"?

The multiple connections come from legacy applications that we are
slowly phasing out. Why will bringing down the connection count improve
performance?

At peak periods we are only using 5 Gb of the 8 Gb currently available
on the server, so we know we are not swapping to disk. It is
interesting that you say 1000 is a high buffer setting; we had been
thinking it was low, since the total cache is only about 20% of
available memory and about 90% of our queries are indexed retrievals.
We have another 4 Gb of memory that we are thinking of installing and
then increasing the setting to 2000 or 3000 pages. If it would make
more sense to increase the sort space or one of the other settings, we
are certainly open to suggestions. Assuming that we are not swapping to
disk, in what case (other than the 10,000-buffer limit) would having a
larger number of pages slow us down?
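
For our own planning, a rough back-of-envelope on the cache question (a
Python sketch; the page size and 450 connections are the figures from
this thread, and 600 connections is just a guess at a year or two of
growth):

# Classic allocates its page cache per connection, so the total RAM tied
# up in page caches is roughly page_size * buffers * connections.
PAGE_SIZE = 4096   # bytes, from the gstat -h output above

def cache_gb(buffers, connections):
    return PAGE_SIZE * buffers * connections / 1e9

for buffers in (1000, 2000, 3000):
    for connections in (450, 600):
        print('%4d buffers x %3d connections ~ %.1f Gb of page cache'
              % (buffers, connections, cache_gb(buffers, connections)))
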

We have tried to work out how to do an accurate load test, but our
system is complex (300 to 400 forms and a few hundred reports) and 110
people can do a lot of different things. We thought about capturing a
day's worth of queries via IBMonitor and writing a tool to process the
results, but we were having reliability problems pushing that much data
through IBMonitor. Our current load is about 700,000 transactions over
the 12-hour business day.
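
For what it's worth, the tool we have in mind would mostly just replay
the captured statements against a test copy. A rough Python sketch of
the idea, assuming one statement per line in a capture file and the
kinterbasdb driver, with placeholder DSN and credentials:

# Each worker thread opens its own connection (as a Classic client would)
# and replays its share of the captured statements.
import threading
import kinterbasdb

DSN = 'testserver:/data/testcopy.fdb'   # placeholder
USER = 'SYSDBA'
PASSWORD = 'masterkey'

def replay(statements):
    con = kinterbasdb.connect(dsn=DSN, user=USER, password=PASSWORD)
    cur = con.cursor()
    for sql in statements:
        try:
            cur.execute(sql)
            if sql.lstrip().upper().startswith('SELECT'):
                cur.fetchall()          # pull the rows like a real client
            con.commit()
        except Exception:
            con.rollback()              # keep going; a real tool would log this
    con.close()

if __name__ == '__main__':
    statements = [line.strip() for line in open('queries.log') if line.strip()]
    workers = 10                        # simulated concurrent clients
    slices = [statements[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=replay, args=(s,)) for s in slices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
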

>
> ./heLen
>

Greg