Subject Re: [firebird-support] Re: One table become very slow
Author Helen Borrie
At 17:45 13/09/2008, you wrote:
>Thanks for your post.
>--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@...> wrote:
>> >total versions = the total combined number of current and garbage
>versions of records in the table
>> >max versions = the number of record versions on the "worst" record.
>>
>> But they are all pretty awful - averaging around 3000 versions per
>record
>
>I quite don`t understand what is "worst" record and what could be
>typical scenarious for those 3000 versions per record?

Firebird has a "multi-generational architecture" which is a lot of polysyllabic words describing a system where it is possible for multiple "generations" of the same record to exist simultaneously. Only one record is the "current record". However, older versions ('generations") of the same record are retained in the database as long as there are transactions alive that are "interested" in that record.

Once all transactions that have an interest in the record have been committed, these obsolete record versions are flagged for garbage collection. Next time a GC operation occurs, they will be cleared and the space that they occupied will be released for re-use.

In a database where transactions are managed carefully and committed in a timely manner, it is not uncommon for perhaps 5-20 versions to exist for a few minutes, or a little longer if the client is a web application that does not monitor its clients (usually from 2-3 hours using Windows defaults for TCP/IP client timeout). This management depends entirely on how well the application code controls the life of transactions. In the best-written applications, there will be no visible garbage.

The "typical" scenarios where your kind of problem is manifested would be apps written that do not commit transactions; or that use Commit with Retain (a.k.a. Commit Retaining); or that do (or neglect) operations in such a way that failed commits are not detected and attended to.

>It will be difficult to catch the problem because I don`t fully
>understand how garbage is built/cleaned up.

I hope you do understand now.

>Recently I added field
>that is constantly updated - timestamped. Could it be the problem?

It could indeed. In Firebird, you do not update fields, you update records. If you have a web app that is constantly writing new versions of the same records without committing the changes then you will get this problem.

>It`s kinda silly... It is fairly basic Apache/PHP application.

The database doesn't care whether it is a "fairly basic Apache/PHP application" or a complex system of layers and layers. What matters to the Firebird engine is the requests that are passed to it through the API functions. Your Firebird php driver is an interface between some php layer and the Firebird API (fbclient.dll on Windows).


>Maybe should I change something with page size or buffers?

Why? It's not page size or cache that are causing your problems and there is no magic bullet that will fix your transaction management problems.

>Recenlty I
>also did full metadata dump and recreated database "from scratch" - to FB version 2.1.1 and pumped data back.

That's a good way to start the day with a clean database. However, it won't stop the garbage from building up again. Note that you will have the same effect from backing up and restoring the database, with a lot less labour. But, love or hate the idea, you have to fix the application.

>>
>> For now, you can have a short-term solution by using Firebird's own
>tools to try to reduce the garbage buildup. Either get everyone off
>once a week and run gbak or a sweep; or run a weekly
>backup-and-restore cycle the -g flag to suppress garbage collection.
>This addresses symptoms, not cause, so you've got to find the rogue
>application code that is causing this.
>
>Yes, but I`m affraid thats not the option, beaouse as you mentioned -
>this addressess symptoms.

Sure, it addresses symptoms. This seems to be your immediate need, until you find and fix the code that is causing the symptoms. Like taking a cough syrup won't cure pneumonia but the patient is relieved of the incessant coughing for a period.

>> This is the same header output you showed us two days ago.
>
>Yes I`ts the same, but I grabbed it after all clients were shut down,
>so I suppose nothing big has changed.

The statistics change *constantly*. The various structures underlying those statistics are updated each time the engine receives a signal that a garbage collection has been done.

>It was taken two days ago, not
>after gbak or sweep. However I had run gbak and stopped after couldn`t
>wait for it to finish. Should I gstat again?

Until you fix your problems it will be helpful to run a gstat several times a day during production activity - and find a good way to retain and make sense of the numbers over the course of a day, a week, whatever.

./heLen