Subject | Blobs |
---|---|
Author | Jim Starkey |
Post date | 2003-06-25T18:33:04Z |
If I may elaborate a bit on Mr. Rail's explanation of blob storage,
blobs, unlike records, are not multi-generational, are never updated
in place, and are not updated when a parent record is updated.
Blobs are created only when needed (either by a blob create call or
by an internal assignment to a blob field from a different record
instance). Blobs are garbage collected the same way index entries
are: the engine makes a list of record versions going away and a
list of versions staying. It loops through the fields of the
record versions going away looking for blob ids not represented in
the record versions staying. If a blob id is not found among the
versions staying, the blob itself is garbage collected.
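
The garbage collection pass described above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `BlobId` type, the dict-based record versions, and the `collect_blobs` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobId:
    """Hypothetical stand-in for a blob id stored in a record field."""
    value: int


def blob_ids(version):
    """Return the set of blob ids referenced by one record version.

    A record version is modeled here as a dict of field name -> value
    (an assumption made for this sketch only).
    """
    return {f for f in version.values() if isinstance(f, BlobId)}


def collect_blobs(going, staying):
    """Return the blob ids that can be garbage collected.

    going:   record versions about to be removed
    staying: record versions that remain
    """
    # Every blob id still referenced by a surviving version must be kept.
    kept = set()
    for version in staying:
        kept |= blob_ids(version)

    # A blob referenced only by departing versions is garbage.
    dead = set()
    for version in going:
        for blob in blob_ids(version):
            if blob not in kept:
                dead.add(blob)
    return dead


going = [{"id": 1, "doc": BlobId(10)}, {"id": 1, "doc": BlobId(11)}]
staying = [{"id": 1, "doc": BlobId(11)}]
print(collect_blobs(going, staying))
```

In this example blob 11 survives because a staying version still refers to it, while blob 10 is referenced only by a version going away.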
Whether it makes more sense to store blobs in the same data space
as records is an interesting question that is hard to answer in the
abstract. Intermingling records and blobs reduces the cost to fetch
a blob at the expense of increasing the cost of a linear scan. In
Interbase/Firebird I think the trade-off is highly application
specific. If you assume that most applications (or at least
the ones we most care about) do virtually all record access by
index, co-mingling is a significant gain. If an application is
sloppy or has a large number of low-selectivity fields, then
separate data spaces probably win, though the cost of keeping the
two spaces approximately co-linear (or the failure to successfully
do so) may cancel the gain. But I will confess that I was more
concerned about efficiency with the very small cache sizes on the computers
that we booted with buggy whips, where Interbase grew up.
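
To make the linear-scan side of the trade-off concrete, here is a toy page-count model. It is a rough sketch under loud assumptions: fixed-size records and blobs, no page overhead or fill factor, and every blob small enough to share a page with records. The function name and all the numbers are made up for illustration and do not describe Interbase/Firebird internals.

```python
import math


def scan_pages(n_records, record_bytes, blob_bytes, page_bytes, co_mingled):
    """Estimate pages read by a linear scan of all records.

    When blobs share the data space with records, the scan has to read
    past the blob bytes as well; with separate spaces it reads record
    bytes only. Purely illustrative arithmetic.
    """
    per_row = record_bytes + (blob_bytes if co_mingled else 0)
    return math.ceil(n_records * per_row / page_bytes)


# Hypothetical workload: 1000 records of 100 bytes, each with a 900-byte blob.
mixed = scan_pages(1000, 100, 900, 4096, co_mingled=True)
separate = scan_pages(1000, 100, 900, 4096, co_mingled=False)
print(mixed, separate)
```

Under these made-up numbers the co-mingled scan touches roughly ten times as many pages, which is the cost an application that reads almost entirely by index never pays.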
Netfrastructure uses separate data spaces for records and blobs, but
for different reasons. Unlike Interbase/Firebird, Netfrastructure
caches whole records in memory, but like Interbase/Firebird, fetches
blobs from the page cache. There are a lot of advantages to this scheme
(efficiency on small-memory systems isn't one of them, but who cares?),
but it does eliminate any benefit of co-mingling records and blobs.