| Subject | Re: [firebird-support] Connection errors |
| --- | --- |
| Author | Helen Borrie |
| Post date | 2004-10-25T04:37:16Z |
At 10:07 AM 25/10/2004 +0800, you wrote:
>Hi,(lots of 'em)
>
>I'm having connection problems on my firebird 1.5.1.
>Is there any way I can remedy this problem?
>These are the logs (take note that errors happen almost every second):
>
>iedb1 Mon Oct 25 09:58:09 2004
> INET/inet_error: receive in try_connect errno = 104
>And below is the database details:
>SQL> show database;
>Database: /var/firebird/db/erp_sd.fdb
> Owner: SYSDBA
>PAGE_SIZE 8192
>Number of DB pages allocated = 579302
>Sweep interval = 0
>Forced Writes are ON
>Transaction - oldest = 119009
>Transaction - oldest active = 33478512
>Transaction - oldest snapshot = 33476670
>Transaction - Next = 33648887
>Default Character set: NONE
>SQL>

As Alan suggested, this is a very DIRTY database. It has uncollected
garbage going back to the year "dot" (see a difference of more than 33
MILLION between the oldest [interesting] and the oldest active
transactions). Somewhere in there you have an enormous gap of who knows
what rolled-back, deleted and limbo garbage. Then, between the oldest
snapshot (the transaction following the "high-water mark" when garbage
collection was last done) and the oldest active (i.e. uncommitted)
transaction there is a gap of 1900 - that's approximately how many recent
transactions are in an uncommitted state.
Then, between the oldest active transaction and the next one to be started
there is a gap of about 170,000. That means you have that many transactions
piling up data in the transaction inventory manager in memory (the cause of
the server crashing) and a great deal of garbage accumulated on disk.
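Spelling the arithmetic out, straight from the figures in your stats:

   33,478,512 (oldest active) - 119,009 (oldest)             = 33,359,503
   33,478,512 (oldest active) - 33,476,670 (oldest snapshot) = 1,842
   33,648,887 (next)          - 33,478,512 (oldest active)   = 170,375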
To fix the database now, I suggest one of two things. The first is an
immediate sweep. That is going to take quite a while, possibly days if that
first gap represents a huge amount of garbage. Then run the stats on the
database again to see what improvement you have, and set the sweep interval
to 20,000 (use gfix -h 20000).
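If you take that route, the commands would look something like this, run on
the server (masterkey is only a placeholder - use your real SYSDBA password):

   gfix -sweep -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fdb
   gstat -h /var/firebird/db/erp_sd.fdb
   gfix -h 20000 -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fdb

The gstat -h output shows the same four transaction counters you posted from
isql, so you can see whether the gaps have closed up.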
Alternatively, do a backup with the -g switch (to prevent garbage
collection during the backup) and restore with -c[reate_database] to a new
database file. It's
going to take a long time, too, but it might actually be faster than a
sweep in your case, and you will have a squeaky-clean new database afterwards.
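Roughly, again with masterkey standing in for your real SYSDBA password and
the file names only as examples:

   gbak -b -g -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fdb /var/firebird/db/erp_sd.fbk
   gbak -c -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fbk /var/firebird/db/erp_sd_new.fdb

Check the restored database before you point the application at it, and swap
the files over while nobody is connected.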
As Alan suggested, look hard at your application code. There are a LOT of
transactions not getting committed in there. It's the worst set of
transaction statistics I've ever seen.
Until you get your head around your application problems, for heaven's sake
set the sweep interval to around 20,000, schedule a weekly (or more
frequent) backup WITHOUT the -g switch AND plan to sweep that database once
a day.
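As a sketch, on a Unix-style cron that could be something like this (times,
paths and the masterkey placeholder are only examples):

   # weekly backup WITHOUT -g, so it garbage-collects as it goes
   0 2 * * 0   gbak -b -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fdb /backup/erp_sd.fbk
   # daily sweep
   0 3 * * *   gfix -sweep -user SYSDBA -password masterkey /var/firebird/db/erp_sd.fdb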
./heLen