Subject: Re: [ib-support] Re: BLOBs and garbage collection
Author: Ivan Prenosil
When working with such high volumes of data, you must be much more careful
to commit all transactions as soon as possible.
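
As a quick illustration, a minimal isql version of the cleanup step could look
like this (table and column names are from your schema; the cutoff timestamp
is just a placeholder):

    SET TRANSACTION READ COMMITTED RECORD_VERSION;
    DELETE FROM VIDEO WHERE TS < '2003-05-22 08:23:00';
    COMMIT;

The important detail is that every transaction ends with a full COMMIT, not
COMMIT RETAINING: COMMIT RETAINING keeps the transaction context alive, so the
Oldest Active transaction cannot advance and garbage collection stays blocked.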


> Oldest transaction 40298
> Oldest active 40299
> Oldest snapshot 40247
> Next transaction 41004

Here you can see that the difference between Oldest Snapshot and Next Transaction
is 757, which is generally a good value, but is probably unacceptable for your
type of application. It means that the back versions produced by the last 757
transactions cannot be garbage collected.
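
You can watch these counters move while the test is running; the header summary
printed by

    gstat -h video.gdb

shows all four values (the file name is taken from your example). If Oldest
Active or Oldest Snapshot lags further and further behind Next Transaction,
some transaction is being held open and garbage keeps accumulating.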



> VIDEO (131)
> Primary pointer page: 144, Index root page: 145
> Average record length: 13.69, total records: 4029
> Average version length: 51.99, total versions: 3018, max versions: 1

This means there are 4029 records, 3018 of which have one back version.
If you are not updating records in this table (only inserting and deleting),
then it means there are 1011 single-version records, and 3018 pairs
of a new record version (carrying only the information that the record is deleted)
and an old record version (with the original data and blob).
I.e., at roughly 1 MB of blob per record, more than 4 GB of data.
That is less than 20 GB, but we do not know the whole history
(and I presume you know that once disk space is allocated to the database
file, it is not automatically released - only after a backup/restore).
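
If you need the disk space back immediately, the usual way is a backup/restore
cycle (gbak switches as in IB 6 / Firebird 1.0; the file names follow your
example):

    gbak -b -v video.gdb video.gbk
    gbak -c -v video.gbk video_new.gdb

i.e. back up the database, create a fresh file from the backup, and swap the
files while no one is attached.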


> By the way, SELECT COUNT(*) FROM VIDEO returns 1004, and not 4029.

This can mean either that 3025 records are already deleted
(probable, because it is close to the 3018 from the statistics),
or that you are not committing somewhere
(either you are not committing all transactions that make the changes,
or you are not committing the transaction that counts the rows, or both).
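
An easy way to tell these cases apart is to run the count from a separate,
freshly started isql session (file name from your example):

    isql video.gdb
    SQL> SELECT COUNT(*) FROM VIDEO;
    SQL> COMMIT;

A new connection starts a new transaction, so it can only see changes that were
really committed; if this count still does not match what your application
reports, then the writers are holding work uncommitted.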


Ivan Prenosil
[ I am looking for a job: InterBase - Firebird - Delphi - C - fulltext db - and more ... ]
Ivan.Prenosil@...
http://www.volny.cz/iprenosil/interbase

----- Original Message -----
From: "Fabrice Aeschbacher" <fabrice.aeschbacher@...>
To: <ib-support@yahoogroups.com>
Sent: 22 May 2003 09:23
Subject: [ib-support] Re: BLOBs and garbage collection


> Hi,
>
> I ran the test again last night, this time with smaller video
> slices: I insert a new video record for each 1 MB read from the stream.
> Once again, this completely filled up my 20 GB disk during the night;
> the "real" total blob size is 1 GB.
>
> Below is the interesting part of the gstat output (when the disk is full):
>
> Oldest transaction 40298
> Oldest active 40299
> Oldest snapshot 40247
> Next transaction 41004
>
> <...>
>
> VIDEO (131)
> Primary pointer page: 144, Index root page: 145
> Average record length: 13.69, total records: 4029
> Average version length: 51.99, total versions: 3018, max versions: 1
> Data pages: 1330, data page slots: 1341, average fill: 24%
> Fill distribution:
> 0 - 19% = 1111
> 20 - 39% = 83
> 40 - 59% = 1
> 60 - 79% = 0
> 80 - 99% = 135
>
> By the way, SELECT COUNT(*) FROM VIDEO returns 1004, and not 4029.
>
> I really cannot understand what I'm doing wrong. I cannot imagine
> that what I'm doing is not possible with FB. The problem is that I
> cannot spend a month on this.
>
> Any help will be appreciated,
> Best regards
> Fabrice Aeschbacher
>
>
>
> --- In ib-support@yahoogroups.com, "Fabrice Aeschbacher"
> <fabrice.aeschbacher@s...> wrote:
> > Hi,
> >
> > Firebird version: LI-V6.2.908 Firebird 1.0 (classic-64)
> > OS: Linux/debian kernel 2.4.20
> > Filesystem: XFS
> >
> > I am still struggling with the continuous insertion of large BLOBs
> > into the DB.
> > My DB has one main table:
> >
> > CREATE TABLE VIDEO (
> > VIDEO_ID INTEGER NOT NULL
> > , SEQUENCE_ID INTEGER
> > , TS TIMESTAMP
> > , VIDEO_DATA BLOB( 65535, 0 )
> > , CONSTRAINT PK_VIDEO PRIMARY KEY ( VIDEO_ID )
> > )
> >
> > A first process does the following continuously:
> > - read data from a video device (e.g., /dev/video<x>) and insert a
> > new record into VIDEO as soon as 10 MB have been read from the device
> > - commit the transaction after each insert
> >
> >
> > A second process executes the following every minute (setitimer):
> > - DELETE FROM VIDEO WHERE TS < :some_time
> > , where :some_time is calculated to be (say) current_timestamp - 60
> > minutes.
> > - Execute a SELECT COUNT(*) FROM VIDEO (Ivan explained that
> > counting rows could help by touching deleted records)
> >
> > Each process uses one DB connection.
> > Isolation level: read committed, rec_version, for all transactions.
> > All transactions are short (commit after each SQL command).
> >
> > Now the problem:
> > ----------------
> >
> > The size of the DB should be approximately the size of a 60-minute
> > video (about 2 GB), perhaps a little more. But it seems that no
> > garbage collection is done at all, since the DB fills up all of my
> > 20 GB disk.
> > After some hours of running the test, there is only 1 record left in
> > the DB (it should be about 10 MB), and yet the disk is full (20 GB).
> >
> > Does anyone have any idea what could be wrong?
> > How can I be sure that every deleted record will effectively be
> > garbage collected and its space freed for other inserts?
> >
> > Any hint would really help,
> > Fabrice Aeschbacher
> >
> > Here is the gstat output from when the disk was full (please note that
> > there is only 1 record left in the VIDEO table):
> >
> > ========= G S T A T ============================================
> > Database "video.gdb"
> >
> > Database header page information:
> > Flags 0
> > Checksum 12345
> > Generation 7229
> > Page size 8192
> > ODS version 10.0
> > Oldest transaction 7184
> > Oldest active 7225
> > Oldest snapshot 7225
> > Next transaction 7226
> > Bumped transaction 1
> > Sequence number 0
> > Next attachment ID 0
> > Implementation ID 19
> > Shadow count 0
> > Page buffers 0
> > Next header page 0
> > Database dialect 3
> > Creation date May 20, 2003 17:49:33
> > Attributes
> >
> > Variable header data:
> > Sweep interval: 1000
> > *END*
> >
> >
> > Database file sequence:
> > File video.gdb is the only file
> >
> > Database log page information:
> > Creation date
> > Log flags: 2
> > No write ahead log
> >
> > Next log page: 0
> >
> > Variable log data:
> > Control Point 1:
> > File name:
> > Partition offset: 0 Seqno: 0 Offset: 0
> > Control Point 2:
> > File name:
> > Partition offset: 0 Seqno: 0 Offset: 0
> > Current File:
> > File name:
> > Partition offset: 0 Seqno: 0 Offset: 0
> > *END*
> >
> > Analyzing database pages ...
> >
> >
> >
> > SEQUENCE (129)
> > Primary pointer page: 137, Index root page: 138
> > Average record length: 35.00, total records: 6
> > Average version length: 0.00, total versions: 0, max versions: 0
> > Data pages: 1, data page slots: 1, average fill: 4%
> > Fill distribution:
> > 0 - 19% = 1
> > 20 - 39% = 0
> > 40 - 59% = 0
> > 60 - 79% = 0
> > 80 - 99% = 0
> >
> > Index RDB$PRIMARY2 (0)
> > Depth: 1, leaf buckets: 1, nodes: 6
> > Average data length: 1.00, total dup: 0, max dup: 0
> > Fill distribution:
> > 0 - 19% = 1
> > 20 - 39% = 0
> > 40 - 59% = 0
> > 60 - 79% = 0
> > 80 - 99% = 0
> >
> >
> > VIDEO (131)
> > Primary pointer page: 142, Index root page: 143
> > Average record length: 0.00, total records: 1 <======= !!!!!!!
> > Average version length: 52.00, total versions: 1, max versions: 1
> > Data pages: 1, data page slots: 1634, average fill: 64%
> > Fill distribution:
> > 0 - 19% = 0
> > 20 - 39% = 0
> > 40 - 59% = 0
> > 60 - 79% = 1
> > 80 - 99% = 0
> >
> > Index RDB$FOREIGN4 (1)
> > Depth: 2, leaf buckets: 1, nodes: 1
> > Average data length: 2.00, total dup: 0, max dup: 0
> > Fill distribution:
> > 0 - 19% = 1
> > 20 - 39% = 0
> > 40 - 59% = 0
> > 60 - 79% = 0
> > 80 - 99% = 0
> >
> > Index RDB$PRIMARY3 (0)
> > Depth: 2, leaf buckets: 1, nodes: 1
> > Average data length: 3.00, total dup: 0, max dup: 0
> > Fill distribution:
> > 0 - 19% = 1
> > 20 - 39% = 0
> > 40 - 59% = 0
> > 60 - 79% = 0
> > 80 - 99% = 0
> >
> > ========= G S T A T ============================================