Subject | IBSERVER.EXE at 100% CPU usage for a long time - a bug |
---|---|
Author | Christian Gütter |
Post date | 2001-06-12T02:41:26Z |
Hi,
this is an unresolved issue which was discussed on
the Mers InterBase list last week.
I am reposting it here because I hope we can find a
solution here - unfortunately, the discussion on the
Mers list got stuck.
In short: some users (including me) have confirmed
that the ibserver process under Win32 sometimes uses 100%
of the CPU time (possibly after doing some backups) and does not
stop doing so until it is restarted.
I hope some of you have the time to read this and have a
clue how to solve it.
Below you will find the discussion from Mers, with the most
important messages marked by exclamation marks.
I hope someone can help us.
TIA,
Christian
Christian
wrote: -----------------------------------------------------------------
Hi,
we are using InterBase/x86/Windows NT version "WI-V6.0.0.627" on an
NT server (PII/400, 256 MB RAM) with about 5-15 clients connected.
It had been running fine since December, but two weeks ago the whole server
appeared to respond very slowly. Looking closer at it, I saw that
"ibserver.exe" had been using 100% of the CPU for about 30 minutes.
I kicked out the users, shut down the database, restarted it, and everything
was fine.
Yesterday I had to reboot the server, and today was the second time IB
behaved as described above.
The DB has a size of 10 MB, Forced Writes are on, there are no consistency
errors, and there are no errors in the server log (except some
"INET/inet_error: read errno = 10054", which I suppose is quite normal).
Of course, this is very annoying, because a service behaving like this on a
server causes a lot of havoc.
So I wonder if someone could give me a hint as to what could be wrong?
TIA,
Christian
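[Annotation: for reference, the usual way to run the consistency check
mentioned above is a gfix validation. A minimal sketch, assuming a local
database at C:\data\mydb.gdb and default SYSDBA credentials - adjust both:
    rem Validate the database, including record fragments (-full).
    rem Problems found are reported and written to the server log.
    gfix -v -full -user SYSDBA -password masterkey C:\data\mydb.gdb
]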
Thomas
wrote:
--------------------------------------------------------------------
Possibly you don't have enough disk space on the partition where the database
file resides or on the partition where InterBase creates temp files.
Christian
wrote:
-------------------------------------------------------------------
Thomas,
thanks for your answer.
There is about 400 MB of free disk space.
That should be enough for a 10 MB database, shouldn't it?
But anyway, I will free up about 2 GB of disk space to be on the safe side.
Any other suggestions?
Ded
wrote:
---------------------------------------------------------------------
Hi, Christian.
1. When did you last restore the database from a backup?
2. As far as I know, IB can be overloaded by:
2.1. A careless SELECT (missing join condition and so on).
2.2. Garbage collection after a mass delete.
2.3. An automatic sweep starting at an inconvenient time.
I usually turn auto-sweep off and run the sweep manually when no users
are working.
3. Working on a slightly corrupted database.
The Classic IB server handles overload much better, but only the SuperServer
architecture is available for Windows.
Best regards.
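[Annotation: what Ded describes in point 2.3 - turning auto-sweep off and
sweeping manually when nobody is connected - can be done with gfix. A sketch
with placeholder path and credentials:
    rem Set the sweep interval to 0, i.e. disable the automatic sweep.
    gfix -h 0 -user SYSDBA -password masterkey C:\data\mydb.gdb
    rem Later, e.g. from a scheduled job at night, run the sweep manually:
    gfix -sweep -user SYSDBA -password masterkey C:\data\mydb.gdb
]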
Thomas
wrote:
-------------------------------------------------------------------
But on a database with only 10MB?
Thomas
Ded
wrote:
---------------------------------------------------------------------
Really. :) I have forgotten when I last saw one that small.
Best regards.
Tobias
wrote:
---------------------------------------------------------------------
It can still have thousands of rows, with which IB has obviously got
entangled somehow.
Tobias
wrote:
---------------------------------------------------------------------
Hi!
have you tried a backup and restore?
Christian
wrote:
-------------------------------------------------------------------
I back up the database regularly,
but I have never restored it.
Maybe it's time for a restore ...
Christian
Christian answered to
Ded: --------------------------------------------------------
Ded,
> 1. When did you last restore the database from a backup?
I have never done it since December 2000. Perhaps I should.
> 2. As far as I know, IB can be overloaded by:
> 2.1. A careless SELECT (missing join condition and so on).
This did not happen, I'm sure. I have only got very simple queries.
> 2.2. Garbage collection after a mass delete.
There were no mass deletes.
> 2.3. An automatic sweep starting at an inconvenient time.
The sweep interval is 20000. But I don't think the sweep problem occurs with
such small DBs (10 MB). In addition to that, there are no tables larger than
7,000 records.
> 3. Working on a slightly corrupted database.
The database check tells me everything is OK. Backup works, too.
I will check if a DB restore reveals some error ...
> The Classic IB server handles overload much better, but only the SuperServer
> architecture is available for Windows.
Hmmm ...
Thank you for the response. I will try a restore.
Perhaps an InterBase guru (like Ann Harrison) is reading this and will
bring forward a new, unconsidered point (as she often does) ...
Louis
wrote:
---------------------------------------------------------------------
I frequently see 100% utilization when the query plan that IB comes up with
isn't quite the best plan. Sometimes the IB-generated plan is wrong due to
index statistics being incorrect. Index statistics can be corrected by
restoring a backup or by using
ALTER INDEX idxName INACTIVE;
ALTER INDEX idxName ACTIVE;
Usually this kind of thing is present from the beginning, but maybe your
data has just grown to the point where a bad plan is causing it to thrash.
Good luck.
Louis Kleiman
SSTMS, Inc.
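[Annotation: Louis' statements would be run in isql or any other SQL tool.
A sketch of doing it from a batch file; IDX_CUSTNAME is a made-up index name,
and path/credentials are placeholders. As far as I know, SET STATISTICS is
the lighter alternative that recomputes the selectivity without deactivating
the index:
    echo ALTER INDEX IDX_CUSTNAME INACTIVE; > fixstats.sql
    echo ALTER INDEX IDX_CUSTNAME ACTIVE; >> fixstats.sql
    echo SET STATISTICS INDEX IDX_CUSTNAME; >> fixstats.sql
    echo COMMIT; >> fixstats.sql
    rem Run the generated script against the database.
    isql -user SYSDBA -password masterkey -i fixstats.sql C:\data\mydb.gdb
]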
Ded
wrote:
---------------------------------------------------------------------
Hi Louis. The smallest database I ever dealt with was 170 MB, and it was
empty, so I can't speak with certainty here, BUT ON 10 MB??? It should fit
in the cache entirely, and any bad query shouldn't overload the server for a
significant time, am I wrong?
Best regards.
Thomas
wrote:
---------------------------------------------------------------------
Christian,
I've read about InterBase installations running for years without doing a
backup/restore. Which data access components do you use?
BDE/FIB+/IBX/IBO ...?
Have a look at the Oldest Active (OAT), Oldest Interesting (OIT) and Next
Transaction. You can get this information with IBConsole. Are there big
gaps between OAT/Next or OIT/Next?
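[Annotation: the same counters can also be read from the command line with
gstat instead of IBConsole. A sketch with a placeholder path:
    rem Print the database header page; the output includes "Oldest
    rem transaction", "Oldest active", "Oldest snapshot" and "Next transaction".
    gstat -h -user SYSDBA -password masterkey C:\data\mydb.gdb
]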
Ded
replied:
-------------------------------------------------------------------
Thomas, he has auto-sweep set to 20000, and I doubt any of his users leave
their computers turned on for weeks. Somewhat mysterious ...
[Annotation by me: that's right. None of my users keeps the computer
turned on for more than two days ...]
Christian
replied:
-------------------------------------------------------------------
Hi Thomas,
yesterday I restored the DB and turned off autosweep.
Up till now, no 100% CPU usage ...
> I've read about InterBase installations running for years without doing a
> backup/restore. Which data access components do you use?
> BDE/FIB+/IBX/IBO ...?
I use IBO, and for six months it rocked.
> Have a look at the Oldest Active (OAT), Oldest Interesting (OIT) and Next
> Transaction. You can get this information with IBConsole. Are there big
> gaps between OAT/Next or OIT/Next?
There are generally no big gaps (thanks to IBO).
E.g. today:
Oldest transaction 5892
Oldest active 5893
Oldest snapshot 5892
Next transaction 5939
Generally, the gap is always smaller than 1000.
Christian
JAC2
wrote:
---------------------------------------------------------------------
New databases that have never been restored with data are likely to have
very poor stats on indices, leading to poor query plans. 100% CPU normally
comes about through large deletes (but you said this is not possible) or
queries that blow the flaky query optimiser into touch.
Questions:
How many users are there?
Does EVERYONE access the DB through applications that you control the SQL
for (or is there, e.g., some "power user" with a copy of Crystal Reports
killing the server with home-grown queries)?
What difference has the restore made?
Extract some of the running queries that include multi-table joins,
execute them in WISQL / Marathon / IBExpert and see what the plans look
like.
We have a large (6 GB) DB with lots of users; I track the time it takes to
locate records for some of the main tables, log it and display it in a
graph. We have over 50,000,000 lines of audit data. We have found that
backing up and restoring the DB once in a while really improves performance
(due to old stuck transactions being left in the database). Restarts do
help, but if the DB has grown from just metadata, it needs a backup and
restore.
Auditing the SQL people execute is a must to solve these problems; I'm sure
it will make its way into FB one day.
I have seen small DBs hang for 15 minutes with >5 table joins that look
like they should work fine.
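[Annotation: regarding JAC2's advice to look at the plans - plain isql can
show them too if none of the GUI tools is at hand. A sketch; the
CUSTOMERS/ORDERS join is only a made-up example of a multi-table query:
    rem SET PLAN toggles the display of the optimizer's plan in isql.
    echo SET PLAN; > checkplan.sql
    echo SELECT c.NAME, o.ORDER_DATE >> checkplan.sql
    echo FROM CUSTOMERS c JOIN ORDERS o ON o.CUST_ID = c.CUST_ID; >> checkplan.sql
    isql -user SYSDBA -password masterkey -i checkplan.sql C:\data\mydb.gdb
]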
Fabrice wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hi,
We are currently developing an application which also provides a user
interface for backup/restore.
(Firebird 0.9.4xxx / Win2000)
The test DB is smaller (about 1.5 MB) and rebuilt from scratch 10 times a
day (test phase).
However, when launching the backup process (a .bat file doing a gbak -b),
IBSERVER.EXE very often (say, once every 10 tries) uses 100% CPU.
I then wait for many minutes, nothing happens, and the only thing I can do
is stop the IB service.
I have no idea what is happening.
For now it is not my main problem, but it will surely become one in a few
weeks ...
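[Annotation: Fabrice's batch file presumably boils down to a single gbak -b
call. A sketch with verbose output written to a log file, so that the next
hang at least shows how far the backup got; paths and credentials are
placeholders:
    rem -b = backup, -v = verbose, -y = write gbak's messages to a log file.
    gbak -b -v -user SYSDBA -password masterkey C:\data\mydb.gdb C:\backup\mydb.gbk -y C:\backup\gbak.log
]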
Christian
replied:
-------------------------------------------------------------------
Hi, Fabrice,
do you mean IBSERVER.EXE is at 100% while backing up and the
backup never finishes?
Or has the backup finished and IBSERVER.EXE still uses 100% CPU?
(This is at least how it seems to happen on my server ...)
Christian
Fabrice
replied:
-------------------------------------------------------------------
Christian,
Yes, backup has finished and IBSERVER.EXE still uses 100% CPU.
And then there is nothing to do but stop the IB service and restart it.
Fabrice
Ded
replied:
-------------------------------------------------------------------
Hi, Fabrice. All this sounds like a SuperServer bug. I left SS precisely
because of problems of this kind, but that was on Linux and half a year ago.
Poor Windows users, you can't go to Classic ... :)
Best regards.
Martin wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hi,
I recognize that same behaviour from when I was testing our backup/restore
routine some time ago. When running backup and restore several times, it
behaved like Fabrice says: "(say, once every 10 tries) uses 100% CPU". That
was with IB 5.6 and small test databases (only a couple of megs).
Can someone confirm whether this is a known bug, and if so, what can be done
to prevent it from happening?
I'm one of the 'Poor Windows users' ... ;-(
Thanks in advance,
Martin
Christian wrote:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Now, with a fresh, restored 9 GB DB without errors and autosweep turned off,
the error occurred again after the server was rebooted.
Could it have to do with the order in which the NT services are started?
E.g. IBSERVER.EXE starts too early, before another service it (slightly)
depends on has started? And so the server gets confused and CPU usage
is at 100%?
This time it occurred with only two users online, and these users had only
executed one query, which returns records from a table that consists of
about 25 records.
So it cannot be a bad plan or something like that ...
Any opinions?
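[Annotation: if the service start order really is the culprit, a dependency
can be declared so that the InterBase service only starts once TCP/IP is up.
A sketch; "InterBaseServer" is only my guess at the service key name - check
the actual name under HKLM\SYSTEM\CurrentControlSet\Services first. sc.exe
is built into Windows 2000 and ships with the Resource Kit on NT 4:
    rem Make the InterBase service depend on the TCP/IP driver service.
    rem Note: the space after "depend=" is required by sc.exe.
    sc config InterBaseServer depend= Tcpip
]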
Tobias
wrote:
---------------------------------------------------------------------
Hi!
Maybe it would make a difference if IB is run not as a service, but as an
application? I guess it's worth a try ...
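[Annotation: a sketch of Tobias' suggestion, assuming the service display
name is "InterBase Server" (check the Services applet) and that the command
is run from the InterBase bin directory:
    rem Stop the service (stop the InterBase Guardian too, if it is installed,
    rem since it would otherwise restart the server), then run the server as
    rem an ordinary application.
    net stop "InterBase Server"
    ibserver -a
]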
[to be continued... hopefully ...]