firebird-support - Re: [firebird-support] Problem with FB database that freezes

Subject	Re: [firebird-support] Problem with FB database that freezes
Author	Mark Rotteveel
Post date	2015-07-24T06:40:08Z

On 24-7-2015 00:23, conversar@... [firebird-support] wrote:

> Thanks for your insightful response. FWIW, I would like to mention that,
> in the same server, we have another database (same size ~7 GB) no one
> connects to, it's a restore of the production database from January this
> year. This database works perfectly even when the production database is
> down. We try only a few test connections though.
>
> Below is some of the requested information, at a time when the
> production database performance is normal.
>
> Firebird.conf:
> ---------------------
>
> DefaultDbCachePages = 1024
> #FileSystemCacheThreshold = 65536 (commented out)
> #FileSystemCacheSize = 0 (commented out)
>
>
> Server environment:
> --------------------------
>
> CPU utilization: 11%
> Memory utilization: 11 GB (out of 32)
>
> Note.- Even when the DB performance is down, these values are in the same
> range or even lower. No swapping.
>
> gstat output (normal performance):
> ---------------------------------------------------------
>
> Database header page information:
> Flags 0
> Checksum 12345
> Generation 19572161
> Page size 16384
> ODS version 11.2
> Oldest transaction 18709808
> Oldest active 18953295
> Oldest snapshot 18851591
> Next transaction 19520857

The large gap between the oldest transaction counters and Next
transaction (more than 560,000 from Oldest active alone) indicates that
you have long-running transactions, which can lead to performance
problems due to garbage accumulation.
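
To put numbers on that gap, the counters from the quoted gstat header can simply be subtracted. This is only a rough sketch: the variable names are mine, and the values are copied from the output above.

```python
# Counters copied from the quoted gstat header output.
oldest_transaction = 18709808
oldest_active = 18953295
oldest_snapshot = 18851591
next_transaction = 19520857

# Distance of each counter from the next transaction number.
gaps = {
    "oldest transaction": next_transaction - oldest_transaction,
    "oldest active": next_transaction - oldest_active,
    "oldest snapshot": next_transaction - oldest_snapshot,
}

for name, gap in gaps.items():
    print(f"next transaction - {name}: {gap}")
```

Running the same subtraction on a gstat -h taken during a freeze would show whether the gap is growing.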

> Bumped transaction 1
> Sequence number 0
> Next attachment ID 50438
> Implementation ID 26
> Shadow count 0
> Page buffers 3000

This might be a bit high for Classic, where each connection has its own
page cache: at a page size of 16384, each connection can take about
47 MB in cached pages. However, with 32 GB available, that might not be
that relevant.
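
The 47 MB figure is just page size times page buffers. A small sketch of the arithmetic follows; the connection count used for the total is a made-up number for illustration, not something taken from this server.

```python
page_size = 16384     # "Page size" from the gstat header
page_buffers = 3000   # "Page buffers" from the gstat header

# Worst-case page cache for a single connection.
cache_bytes = page_size * page_buffers
cache_mib = cache_bytes / 2**20
print(f"cache per connection: {cache_mib:.1f} MiB")

# Hypothetical number of simultaneous connections (assumption).
connections = 100
total_gib = cache_bytes * connections / 2**30
print(f"{connections} connections: up to {total_gib:.1f} GiB")
```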

> Next header page 0
> Database dialect 1
> Creation date Jul 7, 2015 7:00:57
> Attributes no reserve

As already noted by Thomas: don't use "no reserve" (from the gstat
manual: "All pages will be filled to 100% and will be most useful on
read-only databases. No space is reserved in each page for updates
and/or deletions.")

> Variable header data:
> Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
> Sweep interval: 0
> *END*
>
> Note.- We sweep the database manually each night.
>
> fb_lock_print output (normal performance):
> ----------------------------------------------------------------
>
> LOCK_HEADER BLOCK
> Version: 145, Active owner: 0, Length: 28311552, Used: 27588104
> Flags: 0x0001
> Enqs: 69364533, Converts: 192066, Rejects: 36029, Blocks: 282250
> Deadlock scans: 7, Deadlocks: 0, Scan interval: 10
> Acquires: 77720068, Acquire blocks: 2159883, Spin count: 0
> Mutex wait: 2.8%
> Hash slots: 1009, Hash lengths (min/avg/max): 51/ 66/ 81
> Remove node: 0, Insert queue: 0, Insert prior: 0
> Owners (145): forward: 441288, backward: 98120
> Free owners (11): forward: 24695928, backward: 23070064
> Free locks (2963): forward: 22024, backward: 27499760
> Free requests (42905): forward: 22145288, backward: 25253392
> Lock Ordering: Enabled

You need to increase the value of LockHashSlots in firebird.conf, as the
hash lengths are rather long (min/avg/max of 51/66/81).
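
As a back-of-the-envelope estimate of what a larger value would do: the average chain length scales inversely with the number of slots. The candidate slot counts below are my own illustrative picks, not recommendations; check the comments in your firebird.conf for the allowed range. I chose primes because, as far as I recall, those comments suggest prime values.

```python
hash_slots = 1009   # "Hash slots" from fb_lock_print
avg_chain = 66      # avg from "Hash lengths (min/avg/max)"

# Rough number of locks currently spread over the hash table.
approx_locks = hash_slots * avg_chain


def is_prime(n):
    """Trial-division primality check, sufficient for small values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Hypothetical candidate values for LockHashSlots (assumption).
candidate_slots = [2003, 4001, 8191]

# Estimated average chain length for each candidate.
estimates = {slots: approx_locks / slots for slots in candidate_slots}

for slots, estimate in estimates.items():
    print(f"LockHashSlots = {slots}: avg chain ~ {estimate:.1f}")
```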

> Firebird.log (IBMCASA is the server's host name)
> ------------------------------------------------------------------
>
> The log is literally FULL of 10053 and 10054 error entries like the
> following:
>
> IBMCASA Thu Jul 23 10:27:27 2015
> Unable to complete network request to host "IBMCASA".
> Error writing data to the connection.
>
>
> IBMCASA Thu Jul 23 10:27:29 2015
> Unable to complete network request to host "IBMCASA".
> Error reading data from the connection.
>
>
> IBMCASA Thu Jul 23 10:27:30 2015
> INET/inet_error: read errno = 10054
>
>
> According to the log, these errors seem to be happening every second or
> every few seconds/minutes, since March 8 2014 and until today even as
> I'm writing this. Each day, these errors stop at 11:49 PM when the last
> users stop working on the client apps, then they start again every
> morning at 6:00 AM when the first client apps connect to the database.

Error 10054 is "connection reset by peer": the connection was terminated
without properly signalling a connection close to the server. This might
indicate a problem in the application: not properly closing connections,
or applications being closed/killed before the connection could be closed
properly. Combined with error 10053 (connection aborted), it might mean
that you are also using events and that the server tries to notify a
client of an event when the client is no longer there.
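
To see which of the two codes dominates, the errno lines in firebird.log can be tallied. The following is a minimal sketch that assumes the entries look like the ones quoted above; the function name and the sample text are mine, and in practice you would read the real log file instead.

```python
import re
from collections import Counter

# Winsock error codes mentioned in this thread.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED (software caused connection abort)",
    10054: "WSAECONNRESET (connection reset by peer)",
}

# Matches lines such as "INET/inet_error: read errno = 10054".
ERRNO_RE = re.compile(r"errno = (\d+)")


def count_errnos(log_text):
    """Count how often each errno value appears in the log text."""
    return Counter(int(m.group(1)) for m in ERRNO_RE.finditer(log_text))


# Sample built from the entries quoted in this thread.
sample_log = """\
IBMCASA Thu Jul 23 10:27:29 2015
Unable to complete network request to host "IBMCASA".
Error reading data from the connection.

IBMCASA Thu Jul 23 10:27:30 2015
INET/inet_error: read errno = 10054
"""

counts = count_errnos(sample_log)
for code, n in counts.items():
    print(code, WINSOCK_ERRORS.get(code, "unknown"), n)
```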

It would still be interesting to see the values when there is a
performance problem.

Mark
--
Mark Rotteveel