Subject | RE: [firebird-support] Periodic database slowdown - troubleshooting steps?
---|---
Author | Bob Murdoch
Post date | 2012-09-18T02:17:17Z
Thomas -
-----Original Message-----
From: firebird-support@yahoogroups.com
[mailto:firebird-support@yahoogroups.com] On Behalf Of Thomas
Steinmaurer
>> 1): The most obvious thing according to the header page is a very large
>> gap between the oldest active transaction and the next transaction. This
>> means, you have a long-running/stuck transaction. If you are lucky, you
>> can go into the MON$TRANSACTIONS table and check out if you find the
>> MON$TRANSACTION_ID for 41467442. "Lucky", because I saw occasions where
>> the OAT according to the header page isn't available in the monitoring
>> tables. Perhaps some client (ETL tool?) doesn't behave well from a
>> client transaction management POV.

No such luck - 42450558 is the earliest of the 29 records listed.
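For anyone searching the archives later, this is roughly how I ran that check; the database path and credentials below are placeholders, not our real ones:

```shell
# Sketch: list the oldest transactions known to the monitoring tables,
# to compare against the OAT reported by "gstat -h".
# Path and credentials are placeholders.
isql -user SYSDBA -password masterkey localhost:/data/erp.fdb <<'EOF'
SELECT FIRST 5 t.MON$TRANSACTION_ID, t.MON$ATTACHMENT_ID,
       a.MON$REMOTE_PROCESS, t.MON$TIMESTAMP
FROM MON$TRANSACTIONS t
JOIN MON$ATTACHMENTS a ON a.MON$ATTACHMENT_ID = t.MON$ATTACHMENT_ID
ORDER BY t.MON$TRANSACTION_ID;
EOF
```

If the header-page OAT shows up here, MON$REMOTE_PROCESS usually points straight at the misbehaving client.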
>> 2): Although you say you aren't in an OLTP pattern here, I guess due to
>> ETL, it isn't a read-only database, right? If so, running the database
>> in "no reserve" mode isn't a good idea, because, basically, you are
>> telling Firebird to not reserve space for back record versions on the
>> same data page as the primary record version. This results in more reads
>> from disk, especially in a reporting scenario where you have
>> long-running read-write transactions/queries, where concurrent
>> read/write requests generate a longer back record chain until it can be
>> removed via co-operative GC (the only GC mode in CS).

I have definitely never used the "no reserve" option. I wonder if it
was a default on an earlier version of the server that just carried
over. I'll use gfix to use reserve to at least deal with those tables
that are emptied and overwritten regularly.
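For the archives, the switch itself is a one-liner and can be done online; path and credentials here are placeholders:

```shell
# Sketch: flip the database back to "reserve" mode (the default),
# then confirm on the header page that the "no reserve" attribute is gone.
# Path and credentials are placeholders.
gfix -use reserve -user SYSDBA -password masterkey localhost:/data/erp.fdb
gstat -h /data/erp.fdb | grep -i attributes
```

As noted below, this only affects pages allocated from now on; already-allocated pages keep their old layout until a backup/restore.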
>> While gfix can be used to remove the "no reserve" thing, this doesn't
>> change the layout of already allocated data pages. If you have a
>> maintenance window, I would go with a backup/restore cycle to re-build
>> the database with "reserve" (the default, btw, thus you don't have to
>> provide anything special for that) from scratch. Might be a challenge
>> for a 90GB database and a small maintenance window.

That has been a problem for a very long time. Right now, a full
backup/restore cycle is taking more than 24 hours, and at best we only
have a 12 hour window on a Sunday. Hence the May 2009 creation date of
the current DB.
>> A few tricks to shorten the offline window:
>> * Run both, backup and restore through the services API. When using
>> gbak, this can be done via the -service switch. This results in not
>> going through the TCP stack, which can improve performance a lot.

That's a good trick, but since we are backing up to a separate server,
the gbak -b can't use the service switch. Since we are restoring
locally on the second server I could use that switch, but instead we
are using the embedded gbak. Using embedded is definitely faster than
regular gbak -c, but I'm curious as to whether -service is faster. I
would assume that they are probably about the same.
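In case anyone wants to compare the two approaches themselves, a service-side restore on the second server would look roughly like this (host, paths and credentials are placeholders):

```shell
# Sketch: restore through the Services API on the second server, so the
# backup file is read and pages are written server-side, without the
# data flowing through the remote protocol.
# Host, paths and credentials are placeholders.
gbak -c -v -user SYSDBA -password masterkey \
     -service second-server:service_mgr \
     /backups/erp.fbk /data/erp_new.fdb
```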
>> * Backup the database with the -g option, because this suppresses
>> garbage collection in the source database.

This is standard practice when planning on replacing the database.
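For completeness, the backup side of our nightly job looks roughly like this (paths and credentials are placeholders):

```shell
# Sketch: backup without triggering garbage collection in the source
# database (-g), with verbose progress (-v).
# Paths and credentials are placeholders.
gbak -b -g -v -user SYSDBA -password masterkey \
     localhost:/data/erp.fdb /backups/erp.fbk
```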
>> * If enough RAM is available, restore the database with a MUCH higher
>> page buffers value than 2048, because this can speed up index creation
>> during a restore a lot. E.g. 100000, with a page size of 8K, this means
>> ~800MB RAM for the page cache for this single restore connection only.
>> Use it with caution and don't forget to set it to the original value
>> after the restore!!!

Good suggestion, I'm going to try that tonight.
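The RAM math there is easy to sanity-check before committing a box to it (assuming the 8K page size mentioned):

```shell
# 100000 page buffers x 8K pages = the page-cache RAM that single
# restore connection will pin, in MiB (integer arithmetic).
echo "$(( 100000 * 8192 / 1024 / 1024 )) MiB"
# -> 781 MiB, i.e. the ~800MB figure above
```

If your gbak supports the -bu restore switch (an assumption worth checking against your version's docs), the buffer count can be set directly during the restore, e.g. `gbak -c -bu 100000 ...`.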
>> * If you have a spare SSD, even if it is only a cheap consumer SSD,
>> make use of it for both, backup and restore.

Unfortunately it's a corporate datacenter with fixed configurations,
so no goodies like SSDs.
>> 3:) As you are talking about reporting, make use of read-only
>> transactions. Even better would be a combination of read-only
>> transaction in read committed isolation mode, but read committed might
>> be problematic in a reporting scenario, when you need a stable snapshot
>> of the underlying data for the period of report generation.

Very good points!
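For reference, the reporting connections can request this explicitly; a sketch in isql (path and credentials are placeholders):

```shell
# Sketch: a read-only snapshot transaction for report queries - the
# reader gets a stable view for the whole report, and being read-only
# it is gentler on garbage collection than a read-write OAT.
# Path and credentials are placeholders.
isql -user SYSDBA -password masterkey localhost:/data/erp.fdb <<'EOF'
SET TRANSACTION READ ONLY ISOLATION LEVEL SNAPSHOT;
SELECT COUNT(*) FROM RDB$RELATIONS;
COMMIT;
EOF
```

Per the advice above, swapping SNAPSHOT for READ COMMITTED would be even lighter, at the cost of snapshot stability during report generation.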
>> 4:) Keep an eye on the fb_lock_print output to possibly increase the
>> default hash slot value.

>> 5:) Try to run gfix -sweep at a time, when there is zero or close to
>> zero load.

Yes, we run it at night just before the backup kicks off.
Unfortunately, there is overlap because the sweep usually takes about
2.5 hours.
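Given the ~2.5 hour sweep duration, I'll probably just move the jobs further apart; a crontab sketch (times, paths and credentials are placeholders):

```shell
# Sketch crontab: start the sweep 3h before the backup so they no
# longer overlap. Times, paths and credentials are placeholders.
30 20 * * * gfix -sweep -user SYSDBA -password masterkey localhost:/data/erp.fdb
30 23 * * * gbak -b -g -user SYSDBA -password masterkey localhost:/data/erp.fdb /backups/erp.fbk
```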
Thanks,
Bob M..