Subject | Re: [Firebird-Architect] blobs causes table fragmentation. |
---|---|
Author | Jim Starkey |
Post date | 2004-10-04T23:39:13Z |
Roman Rokytskyy wrote:
>> Based on this observation, I would argue that the current approach
>> of fitting data into the row data page would be preferred, since the
>> blob would be resolved without needing an additional disk read.
>
> Not every access to the record means access to the blob. If, in order
> to process a request, the engine needs to evaluate a condition on the
> row data (for example, when no index can be used for it), then the
> bigger the record is, the more pages you have to read.
>
> If you have a lot of relatively small BLOBs, each SELECT that requires
> access to the record would need to do twice as much disk I/O compared
> to the situation where the BLOB is stored in a separate table with an
> integer primary key.
The argument that you want to use is that if tables had two independent
data sections, one for records and one for blobs, then after a
straightforward table load the two sections would be co-linear.
Alternating record and blob access would have the cache hit behavior of
the current system, but accessing the data section alone would take half
the page reads of commingled records and blobs.
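To put rough numbers on the "half the page reads" point, here is a
back-of-the-envelope sketch; the row count, record size, blob size, and
page size below are made-up illustration values, not actual ODS figures.

```cpp
// Rough illustration of why a record-only scan over a separated layout
// reads about half the pages of a commingled layout when records and
// small blobs are of comparable size. All sizes are assumed values.
#include <cstdio>

int main() {
    const long rows      = 100000; // rows in the table (assumed)
    const long recBytes  = 200;    // average record size (assumed)
    const long blobBytes = 200;    // average small blob size (assumed)
    const long pageBytes = 4096;   // database page size (assumed)

    // Commingled: a scan of the record data drags the blobs along too.
    const long commingled = rows * (recBytes + blobBytes) / pageBytes;

    // Separated: a record-only scan touches only the record section.
    const long separated  = rows * recBytes / pageBytes;

    printf("commingled scan:  ~%ld pages\n", commingled);
    printf("record-only scan: ~%ld pages\n", separated);
    return 0;
}
```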
Netfrastructure, in fact, does this -- separate data segments for
records and blobs. I don't think the current scheme has all the bad
attributes suggested, but I do think the other approach is better (or I
would have been a fool to switch).
Netfrastructure also allows strings and blobs to be used interchangeably
with the caveat that you can't index a blob. Except for code that is
blob-repository aware (mostly just replication code), blobs are accessed
by getString, getBytes, setString, and setBytes.
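As a rough sketch of what that access pattern looks like to application
code, the fragment below models the accessor surface on JDBC naming; the
class shapes and signatures are illustrative assumptions, not the actual
Netfrastructure or Vulcan API.

```cpp
// Hypothetical sketch of the accessor surface, modeled on JDBC naming;
// not the actual Netfrastructure API. The point is that a blob column
// is read and written through the same calls as a string column.
#include <string>
#include <vector>

class ResultSet {
public:
    virtual ~ResultSet() = default;
    virtual bool next() = 0;
    virtual std::string getString(int column) = 0;               // VARCHAR or BLOB
    virtual std::vector<unsigned char> getBytes(int column) = 0; // binary BLOB
};

class PreparedStatement {
public:
    virtual ~PreparedStatement() = default;
    virtual void setString(int param, const std::string& value) = 0; // may target a BLOB column
    virtual void setBytes(int param, const std::vector<unsigned char>& value) = 0;
    virtual int executeUpdate() = 0;
};

// Application code never sees a blob handle; column 1 here is assumed
// to be a blob column, but the calls are the same for a plain string.
inline void copyBody(ResultSet& source, PreparedStatement& target) {
    if (source.next()) {
        target.setString(1, source.getString(1));
        target.executeUpdate();
    }
}
```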
Interbase was designed when even modest blobs (by today's standards)
exceeded physical memory size, demanding special processing. But I can
buy more memory at BestBuy for $250 than DEC had in commercial software
engineering in Merrimack, New Hampshire. Times have changed; software
should evolve.
I suggest that someone think long and hard about the best way to get
from the current implementation to something that can take advantage of
modern machines. However, this should be done in conjunction with the
design of a new API. I've been pushing a JDBC C++ binding and have
implemented it inside of Vulcan. If we can get some buy-in on this or a
reasonable alternative, we can start thinking about APIs, wire
protocols, and ODS all at the same time, which is really the only way to
design a database.
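For concreteness, this is the kind of call flow a JDBC-style C++ binding
implies for client code; the Connection/PreparedStatement/ResultSet names
follow JDBC convention, and the exact interface in Vulcan may differ, so
treat this as a sketch of the pattern only.

```cpp
// Illustrative client-side flow against a JDBC-style C++ binding.
// The interfaces are redeclared here only to keep the sketch
// self-contained; the real Vulcan binding may differ.
#include <memory>
#include <string>

class ResultSet {
public:
    virtual ~ResultSet() = default;
    virtual bool next() = 0;
    virtual std::string getString(int column) = 0;
};

class PreparedStatement {
public:
    virtual ~PreparedStatement() = default;
    virtual void setInt(int param, int value) = 0;
    virtual std::unique_ptr<ResultSet> executeQuery() = 0;
};

class Connection {
public:
    virtual ~Connection() = default;
    virtual std::unique_ptr<PreparedStatement> prepareStatement(const std::string& sql) = 0;
};

// The pattern an application would follow: prepare, bind, execute, fetch.
std::string lookupTitle(Connection& connection, int documentId) {
    auto statement = connection.prepareStatement(
        "select title from documents where id = ?");
    statement->setInt(1, documentId);
    auto results = statement->executeQuery();
    return results->next() ? results->getString(1) : std::string();
}
```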