Subject Re: Delayed updates to databases (Urgent !!!!!!!!!!!!!!!!!!!!!!!!!!!!)
Author Ann W. Harrison
At 10:06 AM 6/5/2001 -0500, jolague@... wrote:

>About hardware, I don't think the origin of the problem resides here, our NT
>server is a Compaq Proliant with 1 GB RAM (it seems to be working fine),
>two P3 700MHz processors and 100 GB HD capacity.

Are you using the affinity program? InterBase is not at all good at
using multiple processors. That's not your current problem, but it is
something to check.

>Originally, we had InterBase running on another server, without problems,
>and we moved it to its current location on April 4th. Last night, trying to
>isolate the problem, we moved the database to the original server (which
>has no software or hardware changes) and the problem persists.

Did you copy the database, or back it up and restore it? It may well
be that some sort of error has introduced corruption - a copy will move
the problem with the database. Backup/restore (if successful) builds
a new database.

My advice is to back up and restore all the databases.
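
For anyone who wants to script that, here is a minimal Python sketch that
only builds the gbak command lines for a backup and for a restore into a
new file. The function names, paths, user name and password are
placeholders of mine, not anything from your setup. The -b, -c, -g, -user
and -password switches are gbak's own; -g is its "no garbage collect"
switch, which is only worth adding if a plain backup trips over damaged
record versions.

```python
def backup_command(database, backup_file, user, password, no_garbage_collect=False):
    """Return the gbak argument list that backs `database` up to `backup_file`."""
    cmd = ["gbak", "-b", "-user", user, "-password", password]
    if no_garbage_collect:
        # -g tells gbak not to garbage collect while it reads the database.
        cmd.append("-g")
    return cmd + [database, backup_file]


def restore_command(backup_file, new_database, user, password):
    """Return the gbak argument list that creates `new_database` from a backup."""
    # -c creates a new database file, so the original is left untouched.
    return ["gbak", "-c", "-user", user, "-password", password,
            backup_file, new_database]


# Placeholder values for illustration only.
print(" ".join(backup_command("old.gdb", "old.gbk", "SYSDBA", "secret")))
print(" ".join(restore_command("old.gbk", "new.gdb", "SYSDBA", "secret")))
```

Restoring to a new file name rather than over the original means the old
database is still there if the restore fails.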

The rest of this message is commentary on the InterBase log file -
fascinating to me, but fumets to most.

Regards,

Ann

The first part of your log shows lots of INET errors, predominantly
10054, with some 10053's and 10058 as seasoning. Those errors suggest
that users are leaving applications without closing the database,
which is bad practice, though it has nothing to do with the problem at
hand.

WSAECONNABORTED
(10053)
Software caused connection abort.
An established connection was aborted by the software in your host machine,
possibly due to a data transmission timeout or protocol error.

WSAECONNRESET
(10054)
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, or the remote host used a "hard close" (see setsockopt
for more information on the SO_LINGER option on the remote socket.) This
error may also result if a connection was broken due to "keep-alive"
activity detecting a failure while one or more operations are in progress.
Operations that were in progress fail with WSAENETRESET. Subsequent
operations fail with WSAECONNRESET.

WSAESHUTDOWN
(10058)
Cannot send after socket shutdown.
A request to send or receive data was disallowed because the socket had
already been shut down in that direction with a previous shutdown call. By
calling shutdown a partial close of a socket is requested, which is a
signal that sending or receiving or both has been discontinued.
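
If you want to see how those three codes are distributed in a log, a
rough Python sketch along these lines would do it. It assumes the INET
lines carry the code as "errno = NNNNN"; the sample lines are
hypothetical, not copied from your log, and the function names are mine.

```python
import re
from collections import Counter

# The three Winsock codes described above.
WINSOCK_NAMES = {
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10058: "WSAESHUTDOWN",
}

# Assumed shape of the error number in an INET log line.
ERRNO_PATTERN = re.compile(r"errno\s*=\s*(\d+)")


def tally_inet_errors(log_text):
    """Count error numbers on lines that mention INET."""
    counts = Counter()
    for line in log_text.splitlines():
        if "INET" not in line:
            continue
        match = ERRNO_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def describe(counts):
    """Return (name, code, count) tuples, most frequent first."""
    return [(WINSOCK_NAMES.get(code, "unknown"), code, n)
            for code, n in counts.most_common()]


# Hypothetical sample lines, for illustration only.
sample = """\
INET/inet_error: read errno = 10054
INET/inet_error: read errno = 10054
INET/inet_error: send errno = 10053
"""
for name, code, n in describe(tally_inet_errors(sample)):
    print(name, code, n)
```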

>You can see the interbase.log attached to this message, as you will see,
>starting May 29th there are some consistency errors related to other
>databases (DBCONFIG.GDB and DBCONF.GDB), but these databases are not
>related to the crashed database.

Whether or not the databases are related, a server crash can leave
trash in any database that the server has open. Collateral damage,
I guess it's called.


ENVATAP1 (Server) Thu May 31 16:51:23 2001
Fatal lock manager error: release when not active, errno: 0

This is a serious error and suggests a server bug - a bad server bug.
To translate into English, some thread said to the server, "Here, I
don't care about this page any more." The server said back, "That's
good, you never had it." Either the thread that wrote to the page
failed to get a latch on it - and possibly undid some other change
or wrote on the wrong page - or the server lost track of a latch and
the thread's changes have disappeared into the ether.

This error was followed by a bunch of clients complaining that
the server had exited ungracefully, and the guardian restarting
the server.


ENVATAP1 (Server) Thu May 31 17:02:21 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIPXX.GDB
Record 121 has bad transaction 499423 in table MAQUINAESTADO (133)
...

ENVATAP1 (Server) Thu May 31 17:02:23 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIPXX.GDB
Record 2299649 has bad transaction 499535 in table PRODUCTO (180)

These errors (and the others here) come from a database validation.
My guess would be that the page that disappeared was the header page.
The errors indicate that the next transaction on the header page had
not been updated correctly - it's stuck somewhere around 499400 while
the right answer is something like 499550.
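
The way I get to those numbers can be sketched in a few lines of Python.
This assumes, as my guess above does, that validation flags a record when
its transaction id is not below the next transaction recorded on the
header page. The function name is mine; the two sample lines are the ones
quoted above.

```python
import re

# Matches validation lines such as:
#   Record 121 has bad transaction 499423 in table MAQUINAESTADO (133)
BAD_TXN = re.compile(
    r"Record (\d+) has bad transaction (\d+) in table (\w+) \((\d+)\)"
)


def bad_transaction_range(log_text):
    """Return (lowest, highest) flagged transaction id, or None if none found."""
    ids = [int(m.group(2)) for m in BAD_TXN.finditer(log_text)]
    if not ids:
        return None
    return min(ids), max(ids)


sample = """\
Record 121 has bad transaction 499423 in table MAQUINAESTADO (133)
Record 2299649 has bad transaction 499535 in table PRODUCTO (180)
"""
print(bad_transaction_range(sample))  # (499423, 499535)
```

Under that assumption the lowest flagged id is a ceiling on where the
header's counter got stuck, and the highest is a floor on where it
should have been.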

ENVATAP1 (Server) Thu May 31 17:02:24 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIPXX.GDB
Index 7 is corrupt (missing entries) in table PRODUCTO (180)

This is also an error from a validation. Missing index entries sometimes
happen when a database crashes. They're not important because the missing
entries point to data created by transactions that died in the crash.


The error below is on a different database, and not from a validation. It
occurred because an index check failed to verify the existence of a primary
key to match a foreign key. I haven't a clue why.

ENVATAP1 (Server) Thu May 31 18:37:28 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIPXX01.GDB
internal gds software consistency check (referenced index
description not found (173))

The next error (in this message - there were a bunch of boring ones
in the actual log) is in a third database. What it means is that the
primary record version (newest) points back to an older version, but
the older version isn't there. That's why gbak has a "no garbage collect"
switch - to get around this type of corruption. It's common when forced
writes are off, and used to happen occasionally due to race conditions
in VIO, but they've all been beaten into submission and this database has
forced writes enabled.

ENVATAP1 (Server) Fri Jun 01 12:04:09 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIP.GDB
internal gds software consistency check (cannot find record back
version (291))

And here we are again with the lock manager error, followed by errors
similar to the ones already explicated.

ENVATAP1 (Server) Fri Jun 01 12:56:03 2001
Fatal lock manager error: release when not active, errno: 0

A tip page (transaction inventory page) holds the state of a range
of transactions. Apparently this database lost one - probably the last
one.

ENVATAP1 (Server) Fri Jun 01 16:13:27 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIP.GDB
internal gds software consistency check (cannot find tip page (165))
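
To give a feel for what "a range of transactions" means, here is a small
Python sketch of the arithmetic. Each transaction's state takes two bits
on a tip page, so a page holds four transactions per byte. The 20-byte
page header and the 4096-byte page size are assumptions for illustration;
I don't know the page size of this database, and the function names are
mine.

```python
TIP_HEADER_BYTES = 20      # assumption: page header plus next-page pointer
TRANSACTIONS_PER_BYTE = 4  # two bits of state per transaction


def transactions_per_tip(page_size):
    """How many transaction states fit on one tip page."""
    return (page_size - TIP_HEADER_BYTES) * TRANSACTIONS_PER_BYTE


def locate_transaction(txn_id, page_size):
    """Return (tip page sequence, byte within the state array, bit shift)."""
    tip_sequence, slot = divmod(txn_id, transactions_per_tip(page_size))
    byte_offset, pair = divmod(slot, TRANSACTIONS_PER_BYTE)
    return tip_sequence, byte_offset, pair * 2


# Assumed page size of 4096 bytes, for illustration only.
print(transactions_per_tip(4096))
print(locate_transaction(499550, 4096))
```

With numbers like these, a database that has run a few hundred thousand
transactions has a few dozen tip pages, and losing the last one loses
the state of the most recent transactions.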

The last interesting error is this - a data page had become full,
and the pointer page that locates it needed to be marked so other
transactions wouldn't look at that page if they need to store a
record version. Unfortunately, the pointer page had vanished.
That can't happen, because the only way to find a record is through
the pointer page. Pointer pages can go away, but only if they
become empty. This one isn't empty because there's still a full
page on it. Most mysterious.

ENVATAP1 (Server) Mon Jun 04 07:17:56 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIP.GDB
internal gds software consistency check (pointer page vanished
from mark_full (256))
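
All the entries quoted in this message have the same shape: a line with
the host, the process and a timestamp, an optional "Database:" line, and
then the message. If you want to sort a long log by database or by time,
a Python sketch like this one would split it up. It assumes every entry
starts with a header line in exactly that form; the function and field
names are mine.

```python
import re
from datetime import datetime

# Header line, e.g. "ENVATAP1 (Server) Thu May 31 16:51:23 2001"
HEADER = re.compile(
    r"^(\S+) \((\w+)\)\s+"
    r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s*$"
)


def parse_log(text):
    """Split log text into entries with host, time, database and message."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "host": match.group(1),
                "process": match.group(2),
                "when": datetime.strptime(match.group(3),
                                          "%a %b %d %H:%M:%S %Y"),
                "database": None,
                "message": "",
            }
            entries.append(current)
        elif current is not None and line:
            if line.startswith("Database:"):
                current["database"] = line[len("Database:"):].strip()
            else:
                # Rejoin messages that were wrapped over several lines.
                current["message"] = (current["message"] + " " + line).strip()
    return entries


sample = r"""
ENVATAP1 (Server) Fri Jun 01 12:56:03 2001
Fatal lock manager error: release when not active, errno: 0

ENVATAP1 (Server) Mon Jun 04 07:17:56 2001
Database: F:\SIPV3R2\BASE DE DATOS\DBSIP.GDB
internal gds software consistency check (pointer page vanished
from mark_full (256))
"""
for entry in parse_log(sample):
    print(entry["when"], entry["database"], entry["message"])
```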

Regards,

Ann
www.ibphoenix.com
We have answers.