Subject Re: [firebird-support] Re: One table become very slow
Author Helen Borrie
At 08:33 13/09/2008, you wrote:
>What does it mean here

>total versions = the total combined number of current and garbage versions of records in the table

>and

>max versions = the number of record versions on the "worst" record.

But they are all pretty awful, averaging around 3000 versions per record.

>What's wrong about that?

Your table is choked with garbage. Every access to the table, especially the "worst" records, is ploughing through pages and pages of garbage to find the current records.

>How could it be fixed/avoided?

Don't keep stopping gbak when it's in the process of garbage-collecting your problem table!!! It hasn't "hung"; it is just very, very busy. If you terminate gbak, all that GC work will be rolled back.

Go into firebird.conf and set GCPolicy to "background". This will get you back to the GC conditions that you had under v.1.5. It won't stop your garbage buildup problem but it will stop your users getting hit with cooperative GC.
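For reference, the change itself is a single line in firebird.conf (the comments here are mine; the file is only read at server startup, so restart the server afterwards, and note that Classic Server does cooperative GC regardless of this setting):

  # Garbage collection policy: cooperative | background | combined
  # "background" hands all GC to the server's garbage-collector thread,
  # i.e. the v.1.5 SuperServer behaviour, so user statements stop paying for it
  GCPolicy = background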

Find and fix the application code that is allowing this garbage to build up. If, as you said, this only began recently, then you can probably find the source of the problem fairly easily... or maybe you have a user that is constantly "monitoring" this table by watching it via a utility program in a read/write transaction... or maybe the problem goes back the whole 7 years and has only begun to show itself under the 'combined' GC policy.
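If you suspect a long-running read/write transaction, the monitoring tables that arrived with ODS 11.1 (Firebird 2.1, which your header shows) can point the finger at the culprit. A rough sketch - the column names are from memory, so check the v.2.1 release notes if it complains - listing still-active transactions with the attachment that owns each one, oldest first:

  select t.mon$transaction_id,
         t.mon$timestamp as started,
         a.mon$user,
         a.mon$remote_address,
         a.mon$remote_process
  from mon$transactions t
  join mon$attachments a on a.mon$attachment_id = t.mon$attachment_id
  where t.mon$state = 1   /* 1 = active */
  order by t.mon$transaction_id

Anything near the top of that list that has been sitting there for hours is a prime suspect.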

Don't use file copy to back up the database and don't let any OS-based file-copy utilities include it in their purview. The same goes for any antivirus (AV) program: don't let it touch your database file.

For now, there is a short-term fix: use Firebird's own tools to keep the garbage buildup down. Either get everyone off once a week and run gbak or a sweep, or run a weekly backup-and-restore cycle, using the -g flag on the backup to suppress garbage collection. This addresses symptoms, not cause, so you've still got to find the rogue application code that is causing this.
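Something along these lines for the weekly housekeeping - the paths, user and password are placeholders, adjust to suit:

  # manual sweep while everyone is off:
  gfix -sweep /data/yourdb.fdb -user SYSDBA -password masterkey

  # or: backup without garbage collection, then restore to a fresh, garbage-free file
  gbak -b -g /data/yourdb.fdb /backups/yourdb.fbk -user SYSDBA -password masterkey
  gbak -c /backups/yourdb.fbk /data/yourdb.new.fdb -user SYSDBA -password masterkey

(Verify the restored database before you swap it in for the original, of course.)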

>Header:
>
>Database header page information:
>Flags 0
>Checksum 12345
>Generation 44503
>Page size 4096
>ODS version 11.1
>Oldest transaction 44493
>Oldest active 44494
>Oldest snapshot 44494
>Next transaction 44495
>Bumped transaction 1
>Sequence number 0
>Next attachment ID 12040
>Implementation ID 16
>Shadow count 0
>Page buffers 0
>Next header page 0
>Database dialect 3
>Creation date Aug 25, 2008 21:57:39
>Attributes force write
>
>Variable header data:
>Sweep interval: 20000
>*END*

This is the same header output you showed us two days ago. It looks like one taken immediately after a gbak or sweep or, indeed, just after some user had copped a successful cooperative GC, and it is not consistent with your detail stats for that CLIENTS table. Run gstat -h when the problem is showing itself - that will tell you what you need to know.

And make a point of running the detailed stats on this table over several days, between housekeeping runs, to figure out how fast this garbage is accumulating.
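In concrete terms, when the slowdown is actually happening, run (path is a placeholder; give -t the table name as it is stored, normally upper case):

  gstat -h /data/yourdb.fdb
  gstat -r -t CLIENTS /data/yourdb.fdb

The first shows the transaction gap in the header; the second gives the per-record version counts for CLIENTS that we have been looking at.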

./heLen