Subject: Re: [firebird-support] Re: Truncating transaction logs in Firebird in v2.01 - v2.03
Author: Niben M Singh
Post date: 2008-04-14T13:15:04Z
I appreciate all the detailed explanation from the experts. Thank you all! I was always under the impression that SWEEPING and Garbage Collecting do the same job in Firebird as truncating logs does in SQL Server.
However, my problem is reproducible, and I have tried this on different workstations on the Windows platform. The GBAK and GFIX statements below are exactly what I used, and I used the same statements for both backups and restores.
I will list step by step what I have done, and maybe you can all try it if you get a chance (the commands are consolidated into a sketch after the list). I am pumping data out of SQL Server into Firebird, and as I said earlier, this database has about 400 tables and is about 500MB in size. The Firebird database is not online, as I am creating a brand new one, and I am the only user connected to it while I am doing this operation.
1. Create a brand new Firebird database with a 4KB page size.
2. Create all the necessary tables and indices from a SQL file.
3. Pump data from SQL Server to Firebird using ODBC. SQL Server has the same schema.
4. This new Firebird database is ~500MB and does not have the .FDB extension.
5. Sweep this database as -- gfix -user SYSDBA -password masterkey MyDB.XXX -sweep (I will take the sweep out of my process, as it is not needed)
6. Back up this database as -- gbak -v -t -user SYSDBA -password masterkey MyDB.XXX MyDB.XXX.FBK
7. Restore this into a new Firebird database as -- gbak -v -c -p 4096 -user SYSDBA -password masterkey MyDB.XXX.FBK MyDB_1.XXX
8. Re-sweep as -- gfix -user SYSDBA -password masterkey MyDB_1.XXX -sweep
9. Back up again as -- gbak -v -t -user SYSDBA -password masterkey MyDB_1.XXX MyDB_1.XXX.FBK
10. Restore again into a new database as -- gbak -v -c -p 4096 -user SYSDBA -password masterkey MyDB_1.XXX.FBK MyDB_2.XXX
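For anyone who wants to try reproducing this, here are steps 5-10 consolidated into one Windows batch sketch. It is just a sketch: it assumes gbak and gfix are on the PATH, that no other users are attached, and the file names and masterkey password are the placeholders from the steps above.

    REM Consolidated sketch of steps 5-10 above (sweep, backup, restore, repeat).
    REM Assumes gbak/gfix on PATH; file names and credentials are placeholders.
    gfix -user SYSDBA -password masterkey MyDB.XXX -sweep
    gbak -v -t -user SYSDBA -password masterkey MyDB.XXX MyDB.XXX.FBK
    gbak -v -c -p 4096 -user SYSDBA -password masterkey MyDB.XXX.FBK MyDB_1.XXX
    gfix -user SYSDBA -password masterkey MyDB_1.XXX -sweep
    gbak -v -t -user SYSDBA -password masterkey MyDB_1.XXX MyDB_1.XXX.FBK
    gbak -v -c -p 4096 -user SYSDBA -password masterkey MyDB_1.XXX.FBK MyDB_2.XXX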
The size of MyDB.XXX was ~500MB.
The size of MyDB_1.XXX was ~470MB.
The size of MyDB_2.XXX was ~350MB.
I verified the schema, and it was the same in all the databases. I will verify whether the restores failed to carry over all the data (a count-comparison sketch follows below), but I have been able to reproduce this on different PCs, and the sizes were exactly the same for each database across the different PCs.
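One simple way to check whether a restore actually completed with all the data is to compare row counts table by table between the original and the restored file. A sketch only, using the isql tool that ships with Firebird; MY_TABLE is a hypothetical table name, to be repeated for each table:

    REM Hedged sketch: compare one table's row count in original and restored DBs.
    REM MY_TABLE is a placeholder; repeat per table, or script it over the schema.
    echo select count(*) from MY_TABLE; | isql -user SYSDBA -password masterkey MyDB.XXX
    echo select count(*) from MY_TABLE; | isql -user SYSDBA -password masterkey MyDB_2.XXX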
Please let me know if I am missing anything.
Regards,
Niben
Helen Borrie <helebor@...> wrote:

At 03:21 PM 14/04/2008, you wrote:
>Thanks for the responses!

Well, sweeping and GC don't have anything to do with "transaction logs". They are both mechanisms for removing unwanted record versions and freeing up space on data pages so it can be re-used.
>
>I guess in Firebird term it is called SWEEP or GARBAGE COLLECTION.
>GBAK does garbage collection by default, unless you specify "-g" option.
>So it is confusing to me when I backed up the database after manual SWEEPING and with Garbage Collection -- Restored the database from backup into new file -- SWEEPED new database again -- again backed it up with garbage collection -- and eventually re-restored again into new database file.

It is illusory and probably indicates that one of your restores didn't finish.
>
>After doing that the final size of the new file is dramatically reduced. The data and schema has not changed, however....
1. If you perform a gbak backup WITH garbage collection (the default) it does what sweep does, anyway. So sweeping as a precursor to routine backup is pointless.
2. If you perform a backup with the -g switch, the gbak process won't do any GC. Use the -g switch when you do a backup in exclusive mode with the intention of restoring it immediately without putting the database back online between the operations (see the sketch after this list).
3. No form of garbage collection - either by sweeping, by backing up or by the routine GC that goes on all the time - will free up disk space and reduce the size of the database file.
4. Backup never saves old record versions. It reads the data from the latest committed version of each table as it stood at the start of the snapshot transaction in which the process runs.
5. Restoring a database will produce the smallest possible database file. This is because a restore completely recreates the database by creating a new file, writing the header, pumping the metadata and creating the objects and (almost) finally, pumping in the data. The last things to happen are the activation of the indexes and the recreation of the SQL privileges. The restore requests pages only as it needs them. No garbage is created by a restore.
6. Apart from the small variations in file size you might observe if your disk is very fragmented, you won't see any difference between the file sizes of two databases restored from the same backup. So if you have observed the file size "dramatically reduced" between two such operations, suspect that your restore in the "smaller" case did not finish; or else you ran that restore with the -i switch to disable the activation of indexes.
7. In essence, you should just stop worrying about database file growth if you are sure you are taking proper care of transactions in your application code AND you have allowed room for growth. Firebird isn't storing data in physical tabular structures at all. Database file growth reflects the number of *pages* that have been acquired since the last time you looked while the database was offline. Even though your most-used tables might be small enough to fit 20 records on a page, the engine will acquire a whole new page if it can't find enough recyclable space to accommodate a new record version.
If your page size is unnecessarily large, that could represent quite a lot of growth in a short period, especially if you have long-running read-write transactions that are inhibiting the routine GC.
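To make point 2 concrete, here is a minimal Windows batch sketch of that exclusive backup-and-immediate-restore cycle. This is an illustration only, not Helen's own script; the file names are placeholders, and the errorlevel check is one way to catch the unfinished-restore scenario from point 6, since gbak returns a nonzero exit code on failure:

    REM Sketch of the exclusive backup/restore cycle from point 2.
    REM -g skips gbak's garbage collection, since the restore rebuilds everything anyway.
    gbak -v -g -user SYSDBA -password masterkey MyDB.XXX MyDB.FBK
    gbak -v -c -p 4096 -user SYSDBA -password masterkey MyDB.FBK MyDB_new.XXX
    REM A nonzero exit code means the restore did not finish (point 6).
    if errorlevel 1 echo Restore failed - do not switch to MyDB_new.XXX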
./heLen