Subject: RE: [Firebird-Architect] blobs causes table fragmentation.
Author: Leyne, Sean
Post date: 2004-10-04T23:42:07Z
Roman,
> Not each access to the record means access to the blob. If, in order
> to process a request, the engine needs to evaluate a condition on the
> row data (for example, when no index can be used for it), the bigger
> the record is, the more pages you have to read.

But that (bigger record = more pages read) is true regardless of where
the data is stored -- this is why the engine supports large page sizes.
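As a back-of-the-envelope illustration (plain Python, made-up row and page sizes, and a crude packing model that ignores page headers, fill factor and fragmentation):

```python
import math

def pages_scanned(rows: int, record_bytes: int, page_bytes: int) -> int:
    """Pages touched by a full scan under the crude packing model:
    records stored contiguously, nothing else on the page."""
    rows_per_page = max(1, page_bytes // record_bytes)
    return math.ceil(rows / rows_per_page)

# 100,000 rows of 1,000-byte records:
print(pages_scanned(100_000, 1_000, 4_096))   # 25000 page reads at 4 KB
print(pages_scanned(100_000, 1_000, 16_384))  # 6250 page reads at 16 KB
```

Bigger pages don't shrink the bytes on disk, but they cut the number of page reads (and cache slots) a given result set costs.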
> If you have a lot of relatively small BLOBs, each SELECT that requires
> access to the record would need to do twice as much disk I/O as
> compared to the situation where the BLOB is stored in a separate table
> with an integer primary key.
>
> Current behavior is even more flawed when we consider the API. When we
> do want to improve the performance for small BLOBs that are already
> loaded into memory, why do we require an additional network call to
> access the BLOB's content? Why don't we send its content right away to
> the client? Currently we fetch records that contain BLOB ids, and then
> for each ID we have to load the BLOB segments (in this case that is
> usually one segment)...
>
> Also we believe that access to the BLOB content would happen faster
> because the page is in cache. In Classic, with the default cache size
> of 75 pages, that means a cache hit only when the SELECT fetches fewer
> than ~600 of the rows described above. For bigger result sets (I
> assume we have a cache with an LRU policy for replacing pages in it)
> the cache does not play any role, because when we finish fetching
> data, the first page is thrown away when record 601 is fetched. Then
> we start accessing BLOBs from the first row: we load page 1 by
> throwing away page 2 (at that point it's the oldest one), then we
> continue to page 2 by throwing away page 3, and so on. I suspect that
> the situation with SuperServer is more or less similar under heavy
> load. Anyway, the effect of cache hits can be completely neglected
> because of the network call latency of 5-10 ms per call; you can't
> fix this (and I suspect that the additional disk I/O gives comparable
> delays). So we lose on each SELECT that involves access to the record,
> regardless of whether the BLOB is needed or not -- but what do we
> actually earn on small BLOBs?
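Roman's cache arithmetic is easy to reproduce with a toy simulation (a sketch in plain Python, not the engine's actual page cache; the 75-page capacity and ~8 rows per page are the figures assumed above):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU page cache -- a sketch, not Firebird's real cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def access(self, page: int) -> None:
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)        # mark most recently used
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = None

# 75-page cache; a result set spanning 80 data pages (~640 rows at
# 8 rows per page, just over the ~600-row break-even quoted above).
cache = LRUCache(75)
for p in range(1, 81):   # pass 1: fetch all the rows
    cache.access(p)
for p in range(1, 81):   # pass 2: come back for each row's BLOB
    cache.access(p)
print(cache.hits, cache.misses)   # -> 0 160: the cache never helps
```

The moment the result set is even a few pages larger than the cache, the second pass evicts every page just before it would have been reused.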
I agree that the current situation is less than ideal.

But I need to point out: the problem you describe is with the API/wire
protocol, not with the manner in which the data is stored in the
database!
If a new wire protocol allowed Blobs to be returned with the row data
(perhaps up to a defined size), then there would be no problem with the
current storage approach.
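As a sketch of what I mean (hypothetical names and threshold, not the actual Firebird wire protocol): the server would inline a blob's content with the row when it fits under some limit, and fall back to today's id-plus-follow-up-call only for the big ones.

```python
from dataclasses import dataclass
from typing import Union

INLINE_LIMIT = 4096   # hypothetical threshold, in bytes

@dataclass
class InlineBlob:     # small blob shipped together with the row data
    data: bytes

@dataclass
class BlobRef:        # large blob: client makes a follow-up call, as today
    blob_id: int

def encode_blob_column(blob_id: int, data: bytes) -> Union[InlineBlob, BlobRef]:
    """Inline the content when it fits under the threshold, otherwise
    send only the id and let the client fetch the segments later."""
    if len(data) <= INLINE_LIMIT:
        return InlineBlob(data)
    return BlobRef(blob_id)

small = encode_blob_column(7001, b"x" * 100)      # -> InlineBlob
large = encode_blob_column(7002, b"x" * 100_000)  # -> BlobRef
```

Whether the limit is fixed, negotiated per attachment, or declared per column is a separate design question; the on-disk storage format would not need to change either way.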
So let's keep our eye on the ball, and understand where the problems
truly exist.
> Probably I would introduce a new data type BINARY(n), of maximum size
> 32000 bytes, whose content would be stored on the database page (that
> would be the VARCHAR type without all the character information, like
> charset, collation, etc.) and whose content is sent over the wire
> together with the content of the rest of the fields. People that need
> objects without an upper bound should use BLOBs, which in this case
> would always be stored separately.
I don't have a problem with a new datatype, but I'm not so sure this
will solve anything -- in fact, depending on the page size and the
BINARY size, I could see a case where disk I/O would *significantly*
increase (degrading performance).
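To put rough numbers on that (same crude packing model and made-up sizes as the earlier sketch, assuming an 8-byte blob id in the row): a scan that never touches the binary column still pays for it when it lives in-row.

```python
import math

def scan_pages(rows: int, record_bytes: int, page_bytes: int) -> int:
    # Same crude packing model as before: records stored contiguously,
    # no page headers, no fill factor, no fragmentation.
    return math.ceil(rows / max(1, page_bytes // record_bytes))

rows, page = 100_000, 4_096
base = 200                                    # record without the column

# BLOB kept off-page: the row carries only its ~8-byte blob id.
print(scan_pages(rows, base + 8, page))       # 5264 pages

# BINARY(2000) stored in-row: every full scan drags the payload along,
# even when the query never reads that column.
print(scan_pages(rows, base + 2_000, page))   # 100000 pages
```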
The big negative for a new type is the inevitable user confusion -- what
is the difference between BLOB and BINARY???
Sean