firebird-support - Re: [firebird-support] Firebird slowdowns while resources are available

Subject	Re: [firebird-support] Firebird slowdowns while resources are available
Author	Zsazsi
Post date	2014-02-11T09:51:28Z

Hi Alexey,

I've sent a mail directly to You about this particular case asking for personal assistance.

Any suggestions or hints are welcome if someone else is willing to participate.

Best Regards

Zsolt

2014-02-11 9:58 GMT+01:00 Alexey Kovyazin <ak@...>:

Hi Zoltan,

this problem is often topic here.
Solving this problem is a long boring process, where community will try to explain you basics about transactions and transaction markers meaning, ask you to send gstat statistics, then explain about record versioning and garbage and long-running transactions, about advanced monitoring and so on.
In the end you will land with big3 - long-running transactions, bad plans and wrong configuration, each is another big topic .

In short, your database size looks pretty serious to afford professional services
http://www.firebirdsql.org/en/professional-support/

Regards,
Alexey Kovyazin
IBSurgeon
PS If somebody here wants to go through steps described above, please don't hesitate to do this.

Hi,

I have a strange slowdown problem with a Firebird database. During the
daily use of the database the clients experience significant slowdowns
while the system still have lots of resources available. Some
information about the environment:

- 64 bit Firebird 2.5.2 server running in SuperServer mode
- the database is running on a 64 bit Windows 2008 R2 server OS
- the server OS is running in a VMware 4.1 VM with 4 CPU cores and 16 GB of RAM
- the database size is around 37 GB and the number of concurrent
connections to the database is around 150.

While observing the slowdowns:

- the CPU usage on the machine is between 40-60% without higher spikes
and the load is nicely distributed among all 4 cores
- the memory usage of the server is around 4-6 GB and the rest of the
memory is used as OS cache
- the disc queue lenghts almost never go over 0.3 with around 2-5 ms latency
- there is almost no network activity on the server.

Still, the slowdown seems to be linked to the general load on the
server. During the night when no users are connected to the
database/no background jobs are running a test query used for
reference executes in 4-5 seconds while during the day when all the
users are connected to the database executing the same reference query
requires 60+ seconds to finish. It should be added though that the
slowdown is general in nature, there are no specific queries that are
slower while the server is under load, everything gets slower within
the specific Firebird database. The server has other databases with a
very low number of transactions executed daily and these other
databases show no sign of slowdowns. I even created a copy of the live
database experiencing slowdowns and executed the same query against
both the originaly and the duplicate database - the original did
execute the query slow and hte duplicate fast. The only difference
between the original and the duplicate I know is the number of
connected users/concurrent transactions.

As I found no evident reasons of all these in the available OS
resources so I tried to fetch statistics from Firebird.

The observations:

- at peak times the databse has 30-40 transactions running in parallel
according to mon$statements (where mon$state == 1 which according to
the archives means the transactions are running or are waiting for a
lock)
- fb_lock_print displays the following about the database:

LOCK_HEADER BLOCK
Version: 145, Active owner: 0, Length: 2097152, Used: 1335440
Flags: 0x0001
Enqs: 9993237, Converts: 93191, Rejects: 1417230, Blocks: 2
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 19972846, Acquire blocks: 0, Spin count: 0
Mutex wait: 0.0%
Hash slots: 1009, Hash lengths (min/avg/max): 0/ 2/ 7
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (38): forward: 20824, backward: 872088
Free owners (126): forward: 973360, backward: 728016
Free locks (370): forward: 852200, backward: 195936
Free requests (12425): forward: 614608, backward: 1230536
Lock Ordering: Enabled

Here I noted that the "rejects" field accounts for ~14% of "enqs"
field but unfortunately I don't know the exact meaning of these
values. I guess about 14% of the lock requests are rejected for some
reason but I might be completely wrong.

So the questions:

- How should the output of fb_lock_print interpreted in this case? Are
these numbers "wrong" in some sense? Can they be improved by some
parameter tuning?
- What additional steps should be taken to pinpoint what causes the slowdowns?

Thanks,
Zoltan