Subject | RE: [IB-Architect] GBAK processing |
---|---|
Author | Leyne, Sean |
Post date | 2000-04-26T18:35:36Z |
Markus,
In general, I agree with you regarding the most painful part being the
restore.
In the vast majority of cases (98% or more), however, it is the backup
process which is run and, therefore, most likely to affect user access
and system performance. In some cases, the length of time for a backup
impacts the DBA's ability to run the backup (in cases where multiple
backups are scheduled/run each and every day).
My initial concern is with improving the performance of the most common
operation. By definition, a restore is an exceptional situation.
I do, however, very much appreciate the fact that restoring a multi-GB
DB does present significant issues.
I don't understand your reference to a "virtual backup"; please expand.
How would it work for multi-file DBs?
Sean
-----Original Message-----
From: Markus Kemper [mailto:mkemper@...]
Sent: Wednesday, April 26, 2000 2:05 PM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] GBAK processing
Sean,
Not looking at the code, I believe that it is single
threaded. From what I've seen with customers, the
most painful part of GBAK is time to recovery from
a failure/corruption. Time to recovery being either:
  Restore from a Backup
    (gbak -c ...)
  Backup (corrupted db) and Restore
    (gbak -b ...)
    (gbak -c ...)
Any way you slice it, this process gets more painful
as the database grows; thus, I suspect that a different
strategy will need to be considered for really large
databases.
We've bounced around a couple ideas as to how we might
speed up gbak as it is today.
1) Add functionality to gbak to create a virtual
'backup' file. You can do this with UNIX/Linux
today outside of the gbak program. You might be
able to do it on Windows too if you have something
like MKSToolkit, but I have not tried it.
This syntax is from memory, not from testing today, so
please forgive me if it is incorrect, but the idea
is to write the backup file to a pipe and read
from it in the background, thus performing the
two-step operation in one.
  mknod gpipe p
  gbak -c gpipe new_database.gdb &
  gbak -b old_database.gdb gpipe
2) Creating indexes is a large part of the expense
in restoring a database. Currently a full table
scan is required for each index even if they are
on the same table. Meaning that
  create table foo (
    f1 integer,
    f2 integer );
  create index idx_1 on foo ( f1 );
  create index idx_2 on foo ( f2 );
would cause two full table scans to build the
indexes during a restore. The idea was to modify
gbak to build multiple indexes at a time, thus
only having to do one full scan per table when
building database indexes.
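
As an illustration of the one-scan idea, here is a minimal sketch in Python. It is not gbak or engine code: the `Table` class, the scan counter, and both function names are hypothetical, and sorting an in-memory list of (key, row id) pairs only stands in for a real index build. It shows that collecting the keys for every index during one pass gives the same result as one pass per index.

```python
class Table:
    """Hypothetical in-memory table that counts full scans."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        # Each call represents one full table scan.
        self.scans += 1
        yield from enumerate(self.rows)


def build_per_index(table, columns):
    """Current behaviour as described: one full scan per index."""
    return {
        col: sorted((row[col], rid) for rid, row in table.scan())
        for col in columns
    }


def build_one_scan(table, columns):
    """Proposed behaviour: gather keys for all indexes in a single scan."""
    entries = {col: [] for col in columns}
    for rid, row in table.scan():
        for col in columns:
            entries[col].append((row[col], rid))
    return {col: sorted(keys) for col, keys in entries.items()}


rows = [{"f1": 3, "f2": 10}, {"f1": 1, "f2": 30}, {"f1": 2, "f2": 20}]

per_index_table = Table(rows)
per_index = build_per_index(per_index_table, ["f1", "f2"])

one_scan_table = Table(rows)
one_scan = build_one_scan(one_scan_table, ["f1", "f2"])

print(per_index_table.scans, one_scan_table.scans)
```

The trade-off in this sketch is memory: the single-scan version holds the keys for all indexes of a table at once before sorting them.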
Markus
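
The named-pipe idea in item 1 can be tried without gbak. The sketch below is a POSIX-only Python illustration under stated assumptions: `fake_backup` and `fake_restore` are hypothetical stand-ins for `gbak -b` and `gbak -c`, and a thread plays the role of the background process. It only shows that a writer and a reader can stream through a FIFO so that no intermediate backup file is stored on disk.

```python
import os
import tempfile
import threading


def fake_backup(rows, path):
    """Hypothetical stand-in for `gbak -b`: stream records to `path`."""
    with open(path, "w") as out:  # blocks until a reader opens the FIFO
        for row in rows:
            out.write(row + "\n")


def fake_restore(path, restored):
    """Hypothetical stand-in for `gbak -c`: consume records as they arrive."""
    with open(path) as src:  # blocks until a writer opens the FIFO
        for line in src:
            restored.append(line.rstrip("\n"))


def copy_through_fifo(rows):
    restored = []
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "gpipe")
        os.mkfifo(fifo)  # creates a named pipe, like `mknod gpipe p`
        reader = threading.Thread(target=fake_restore, args=(fifo, restored))
        reader.start()  # plays the role of the backgrounded restore
        fake_backup(rows, fifo)  # foreground backup writes into the pipe
        reader.join()
    return restored


result = copy_through_fifo(["row 1", "row 2", "row 3"])
print(result)
```

Because a FIFO is a single sequential stream, this sketch says nothing about how a backup split across several files would be handled.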
> Is GBAK (the backup process) a single threaded function, performing 1
> task at a time, or is it multi-threaded?
>
> Since I think the answer is that it is single threaded, I'll ask a
> couple of other questions.
>
> Would a multi-threaded backup process perform faster than the current
> model?
>
> What issues would prevent (or make very difficult) GBAK being made
> multi-threaded?
>
> I've been thinking about a different approach to GBAK where it creates
> threads (up to a certain limit) for each DB object to be backed up,
> these threads reading the DB information and building "backup
> information" to be written out by a single "writer thread". I
> understand that the system structure would need to be processed by a
> single thread and written to disk before any application data.
>
> Sean
>
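
The design proposed in the quoted message (a limited pool of reader threads feeding a single writer thread, with the system structure written first) can be sketched as follows. This is a structural illustration in Python, not gbak code: `threaded_backup`, `read_table`, `write_all`, the `max_readers` limit, and the in-memory `sink` list standing in for the backup file are all assumptions. It does not show whether such a design would actually be faster.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def read_table(name, rows, out_q):
    """Reader thread: turn one DB object's rows into backup records."""
    for row in rows:
        out_q.put((name, row))  # blocks if the writer falls behind


def write_all(out_q, sink):
    """Single writer thread: the only code that appends to the output."""
    while True:
        item = out_q.get()
        if item is None:  # sentinel: all readers have finished
            break
        sink.append(item)


def threaded_backup(metadata, tables, max_readers=4):
    # System structure is handled by one thread and written before any data.
    sink = [("metadata", metadata)]
    out_q = queue.Queue(maxsize=64)
    writer = threading.Thread(target=write_all, args=(out_q, sink))
    writer.start()
    # At most `max_readers` tables are read concurrently.
    with ThreadPoolExecutor(max_workers=max_readers) as pool:
        for name, rows in tables.items():
            pool.submit(read_table, name, rows, out_q)
    # Leaving the `with` block waits for every reader to finish.
    out_q.put(None)
    writer.join()
    return sink


backup = threaded_backup("schema", {"foo": [1, 2, 3], "bar": [10, 20]})
print(backup[0], len(backup))
```

Records from different tables may interleave in the output, but each table's rows stay in order because one thread reads each table and the queue is first-in, first-out.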