Subject Re: [firebird-support] Re: Fixing Database
Author Helen Borrie
At 11:24 AM 13/03/2005 +0000, salisburyproject wrote:

>sorry if I wrongly presented the exact case.
>I am still worried about the situation when if such corruption occurs
>I cannot give solution to the customer, without contacting third
>party. I will need to clarify this to them - the choice of FB and
>Windows is their's.
>
>So, I will present the case from begining.
>
>Windows 2000, SP4.

What version AND model (Classic/Superserver/Embedded) of Firebird are you
using?

>Plenty of disk space, memory, etc.. Not a resource problem.

If your ORDER BY and GROUP BY queries are storing huge temp files that are
not getting deleted, then your resources might not be as plentiful as you
think.
-- Firebird 1.5 tries to maintain sort files in memory and will only resort
to writing them to disk if memory gets too low.
-- Firebird 1.0 always writes sort files to disk

Now --- if the corrupting operation is writing sort files, one could guess
that your insert operation is of the form

insert into blah (<list of columns>)
select list of columns from sourcetable
order by ...

which would cause sort files to be written to disk in Fb 1, and would cause
them to be paged to disk in Fb 1.5 if memory were inadequate. But then -
if the operation appears to finish, you close the statement, then close the
connection, and are left with sort files on disk, then what REALLY happened
is that the server crashed for some external cause that it couldn't detect
for itself. Your description of path management (further down) does
nothing to eliminate the possibility that your path configuration has
enabled this to happen. See later comments.

>The Database was created with FB's isql.
>
> >From gstat -h:
>Attributes force write
>
>It has 4 tables and has no Stored Procedures.
>Two generators are used for insertion of unique row ID in two tables.
>5 indices - 2 on varchar fields, 3 on int fields.

Describe the sizes and segments of the indices.


>The application uses FB's ODBC driver to connect (via ADO).

Version?

>The DB path is kept in registry and is relative to the app location
>(e.g. no connection with different paths can be made).

Firebird isn't a file-served database: you do not connect to databases by
way of relative paths.
-- What does your application do with the path it extracts from the
Registry, to make it a legal path for the server?
-- What connection protocol is specified in your DSN?
-- What *exactly* does your connection string look like when the connection
is made?

>DB is local (not remote, shared, mapped, etc. drive)

But it's being accessed via a relative path, right? What is the network
protocol (if you didn't answer this earlier)?


>The application migrates external data (from DB2 and text files) to
>FB. Each record needs processing and updating by a human. A temp
>table is used to store modified data,

Describe what you mean by "temp table".
-- How is the temp table created?
-- When is it created?

What is the human doing? i.e. is this an interactive application? or is
the human driving a client app that consumes a lot of resource, e.g. loops
through a dataset of 50,000 rows? repaints a row counter at each iteration?

>which after being validated by
>the application logic is inserted into FB. Simple inserts statements.

Give an exact sample of the SQL that performs the inserts. Also describe
how you are handling transactions.

>When the corruption occured, the DB was updated with ~50K rows,
>growing up to 234 Mb on disk.

What batch sizes are the commits of these ~50K rows broken up into? (By
common practice, it should be around 8K rows per transaction; less if the
rows are large; possibly more if the rows are small).

>The application was closed properly (I can tell this from internal
>logs), including proper connection closing (sure about this!).

By "internal logs", do you mean firebird.log? (interbase.log if you are
using Fb 1.0). Or an application log that merely records what the
application attempted to do?


>I then tried to connect again from a small viewer application and got
>the error. Tried with isql - same.

Using exactly what connection string? Report the exact string that you
used in the viewer and with isql. I mean - EVERYTHING - including the host
name.

>gfix runs with -v -f -i, but
>doesn't fix the Database.
>
>Running IBFirstAid, I found that a page is beyond the end of file,
>i.e. some data was not written to the disk.
>
>Looked in Windows log for some problems - nothing.
>
>Now, something I mentioned in the first post:
>There were 4 huge files (~500 Mb) in the Winnt/Temp dirctory, called
>fb_sort_xxx

Check the times on these files. As mentioned before, if the connection is
supposedly closed, then these files should be gone. If they persist, then
EITHER the server crashed (look at the firebird log) OR the connection is
still live and the commit is incomplete.


>All my classes have error handling, and if an error is returned by
>the DB, it is written to a log. But there was no error at all while
>inserting data.
>
>What I did until now was to use the data in temp table and repeat the
>process on a new DB and all went well. It is almost the same size
>(few bytes difference) from the corrupted one.
>
>I don't know what other details I can provide here...
>I am sure that in the process of inserting data there was no error
>and no other connection to the DB.

However, if the connection was live and a transaction was uncommitted when
you used your viewer, isql or gfix to connect to the database using a
compromised path, then the compromised connection would see a corrupt
database.

There are still some essential facts missing here. Please review this
posting carefully and provide the requested details.


>The Windows host has no backup software at all, there were no other
>(except the usual) processes running. I checked for viruses with
>Norton and AVG - nothing.

Set virises aside for now. Let's address the connection issues as
precisely as possible.

./hb