Subject | Re: [firebird-support] [FB 2.1] Firebird engine seems to slow down on high load without utilizing hardware |
---|---|
Author | Thomas Steinmaurer |
Post date | 2016-04-12T05:23:53Z |
Patrick,
setting of 20011. Mutex wait looks high to me.
Some ideas:
* Increase the hash slots value to 30011
* Get a picture on how many garbage (record versions) you create. AFAIR
it is the -r switch of gstat which gives you that information. Sudden
spikes in the statement response time could be related to co-operative
garbage collection in Classic/SuperClassic, where basically the
statement synchronously removes garbage of out-dated record versions
* Consider upgrading to 2.5. 2.1.7 is end-of-life and 2.5 improved in
the area of lock contention in Classic/SuperClassic substantially.
--
With regards,
Thomas Steinmaurer
http://www.upscene.com/
Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
> recently we had some strange performance issues with our Firebird DB server.fb_lock_print is reporting a hash slots value of 15077 but you show a
>
> On high load, our server started to slow down. Select and update SQL
> query times did go up by more than 500% on average,
>
> but reaching unreasonable high execution times at worst case. (several
> minutes instead of < 1sec)
>
>
> OIT/OAT/Next Transaction statistics was within 1000 the hole time
>
> We were not able to messure any hardware limiting factor. Indeed, this
> system was running with only 8 cores at about 70% CPU usage on max. load.
>
> We decided that this may be our problem since we experienced a similar
> problem at about 80% CPU load in the past.
>
> So we upgraded the hardware. As expected, the CPU-load dropped to ~35%
> usage on max. load scenario.
>
> But this did not solve the problem.
>
> Same story for the harddisk system. The usage is not even near it's max
> capacity.
>
>
> We also can't see any impact on the harddisk.
>
> We'r kind of stuck with our ideas, because we have no idea what could be
> a potential bottleneck to the system.
>
> Since the hardware doesn't show a limit, there have to be anything else
> - most likely firebird engine related that's limiting our system.
>
> We would be very grateful if anyone can give us hints where we can
> search further.
>
> Or someone has similar experiences to share with us.
>
>
>
> Operating System: Windows Server 2003
>
> Firebird: 2.1.5 Classic
>
> Dedicated database server (VMWare)
>
>
> CPU: 16 cores, each 2.4 GHz
>
> RAM: 32 GB
>
> About 14GB are used from OS and firebird processes under max load.
>
> HDD: SAN Storage System
>
>
> Average system IOPS under load read: 100
>
> Average system IOPS under load write: 550
>
> Backup Restore IOPS read: 1700
>
> Backup Restore IOPS write: 250
>
> SAN IPOS Limit (max): 3000
>
>
> Firebird Config Settings, based on defaults
>
> DefaultDbCachePages = 1024
>
> LockMemSize = 134247728
>
> LockHashSlots = 20011
>
> Database
>
> size: about 45 GB
>
> 450 to 550 concurrent connections
>
> Daily average of 65 transactions / second (peak should be higher)
>
>
> FB_LOCK_PRINT (without any params) while system was slowing down (~4
> days uptime).
>
> I have to note, Firebird was not able to print the complete output
> (stats was not cropped by me)
>
>
> LOCK_HEADER BLOCK
>
> Version: 16, Active owner: 0, Length: 134247728, Used: 82169316
>
> Semmask: 0x0, Flags: 0x0001
>
> Enqs: 4211018659, Converts: 10050437, Rejects: 9115488, Blocks: 105409192
>
> Deadlock scans: 1049, Deadlocks: 0, Scan interval: 10
>
> Acquires: 4723416170, Acquire blocks: 640857597, Spin count: 0
>
> Mutex wait: 13.6%
>
> Hash slots: 15077, Hash lengths (min/avg/max): 3/ 12/ 25
>
> Remove node: 0, Insert queue: 36, Insert prior: 74815332
>
> Owners (456):forward: 131316, backward: 14899392
>
> Free owners (9):forward: 39711576, backward: 49867232
>
> Free locks (42409):forward: 65924212, backward: 23319052
setting of 20011. Mutex wait looks high to me.
Some ideas:
* Increase the hash slots value to 30011
* Get a picture on how many garbage (record versions) you create. AFAIR
it is the -r switch of gstat which gives you that information. Sudden
spikes in the statement response time could be related to co-operative
garbage collection in Classic/SuperClassic, where basically the
statement synchronously removes garbage of out-dated record versions
* Consider upgrading to 2.5. 2.1.7 is end-of-life and 2.5 improved in
the area of lock contention in Classic/SuperClassic substantially.
--
With regards,
Thomas Steinmaurer
http://www.upscene.com/
Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.