Subject: Re: [firebird-support] BLOB performance
Author: Ann W. Harrison
Alexey Kovjazin wrote:

> Maybe I am wrong, but I think that BLOB reading and writing become
> expensive when BLOBs are small and placed on the same data page with other
> columns, am I right?

There's a difference between something that affects performance and
something that's "expensive". The writer was comparing the cost of
storing blobs in a record with storing them in another table or storing
them in separate files. Clearly (at least to me) storing blob data in
external files is a performance disaster.

The second alternative that the customer suggested is storing blobs in a
separate table. Reading a blob by blob id requires checking a pointer
page and reading a data page. In most cases, both those pages are
already in cache from looking up the record. Even if the data page has
to be read, lookup by blob id is faster than finding a record in a
different table by key value - even primary key - reading that record,
and tracking down the blob in that table.

Neither of the alternatives that the writer suggested is better than
using a blob in a record. You've suggested a third alternative, storing
the blob as a varchar. You want to look at two effects: 1) the cost of
getting the blob data when it exists, and 2) the effect of having the
blobs on page when retrieving record (not blob) data.

Certainly, storing the blob as a varchar eliminates one round trip from
the client when the data exists. However, storing blobs as varchars
does not significantly increase the record density. In fact, it may
decrease it.
>
> So, my assumption why small BLOBs are slowing down:
>
> 1) To read BLOBs the engine needs to read the page twice (the second time it
> will be in cache, of course) because different APIs are used to get non-BLOB
> and BLOB columns

As above, reading a page in cache is remarkably quick.

>
> 2) Small BLOBs are not small from the point of view of the data page - when
> BLOBs occupy a big % of the data page, fewer records can fit on the page (the
> situation looks worse when we have large fragmented records and small BLOBs)

The data has to go somewhere. If you store it as a varchar, it takes
the same amount of space as if you store it as a blob, ignoring overhead
and compression. If the blob occupies a large part of a page, putting
the same data in the record will make the record occupy almost exactly
the same space. Blob space packs more densely, so using blobs can
actually increase the number of records on a page.

Overhead and compression cut in different ways.

Varchars have two bytes of overhead - the two length bytes. Blobs take
up eight bytes in the record, whether they exist or not, and another 60
bytes or so for the record header, blob header, and page index, if they
do exist.
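
Those overhead figures can be put in a small sketch. The numbers here
are the rough estimates from the text above (the 60 bytes for record
header, blob header, and page index is an approximation, not an exact
on-disk figure), and compression is ignored; it is dealt with below.

```python
# Per-value storage cost in bytes, before compression, using the
# rough overhead figures quoted in the text above.

def varchar_cost(n):
    """Varchar holding n data bytes: two length bytes plus the data."""
    return 2 + n

def blob_cost(n=None):
    """Blob holding n data bytes; n=None means the blob doesn't exist."""
    if n is None:
        return 8               # only the 8-byte blob id slot in the record
    return 8 + 60 + n          # id slot + headers and page index + data

print(varchar_cost(100))   # 102
print(blob_cost(100))      # 168
print(blob_cost())         # 8
```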

Both blobs and varchars are compressed as part of the record, so the
actual compression depends on what's on either side of them. An empty
varchar consists of two bytes of 0x0, indicating the length, plus n
bytes of spaces. An empty blob consists of eight bytes of 0x0.

Assume the preceding field in the record ended with spaces - reasonably
likely - and the succeeding field is a varchar. The length portion of
the varchar compresses to two bytes: one byte containing the value
-2, indicating that the next byte repeats twice, then a byte of
0x0. The data portion compresses to 2 bytes for every 256. So an empty
varchar(512) compresses to 6 bytes.

The eight bytes of 0x0 that represent an empty blob compress to a length
byte of -8 and a data byte of 0x0 - 2 bytes.

Unless the varchar is preceded by a byte of 0 or followed by a space,
any compressed empty varchar is larger than an empty blob. If the blob
is followed or preceded by a byte of 0, it compresses to nothing.
Varchar compression gets worse as the varchar gets longer - the data
portion of a 32767 byte varchar requires 256 bytes after compression.
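
The arithmetic above can be checked with a toy run-length coder. This
is only a model inferred from the figures in the text - each run of
identical bytes is assumed to cost 2 bytes (a run-length byte plus the
repeated byte) and to cover up to 256 bytes; the real Firebird on-disk
RLE differs in detail.

```python
# Toy RLE cost model matching the arithmetic in the text: 2 bytes per
# run, runs of up to 256 identical bytes. Assumed, not the real format.

MAX_RUN = 256

def compressed_size(data):
    """Cost in bytes of run-length compressing `data` in isolation."""
    size = 0
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < MAX_RUN:
            run += 1
        size += 2          # run-length byte + the repeated byte
        i += run
    return size

# Empty varchar(512): two 0x0 length bytes, then 512 bytes of spaces.
print(compressed_size(bytes(2) + b" " * 512))   # 6
# Empty blob: eight bytes of 0x0 in the record.
print(compressed_size(bytes(8)))                # 2
# Data portion of an empty varchar(32767).
print(compressed_size(b" " * 32767))            # 256
```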

So in the case of a value that's rarely present, the blob wins big.

What about medium small blobs that do exist?

Primary records are never stored as fragments, unless the compressed
record is larger than the page size. Assume you've got a record that's
about 1/3 of the page size plus a string that's about 1/4 of a page. If
the string is a varchar in a record, you'll get 1 record per page. If
the string is a blob, you'll get two records plus a blob on some pages,
and one record plus 2 blobs on others - on average 1.5 records per page.
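
The packing in that example works out as follows, with the page size
normalized to 1 and fragmentation ignored (an assumption the example
itself makes):

```python
# Packing arithmetic for the 1/3-page record + 1/4-page string example.

from fractions import Fraction

PAGE = Fraction(1)
RECORD = Fraction(1, 3)   # the record without the string
STRING = Fraction(1, 4)   # the string, stored as varchar or as blob

# Varchar: the string travels inside the record, so a record costs
# 7/12 of a page and only one fits per page.
records_per_page_varchar = PAGE // (RECORD + STRING)

# Blob: records and blobs are placed independently. Pages alternate
# between 2 records + 1 blob (11/12 full) and 1 record + 2 blobs
# (10/12 full); both layouts fit, giving 3 records over 2 pages.
assert 2 * RECORD + 1 * STRING <= PAGE
assert 1 * RECORD + 2 * STRING <= PAGE
records_per_page_blob = Fraction(3, 2)

print(records_per_page_varchar)   # 1
print(records_per_page_blob)      # 3/2
```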

Certainly you get more records per page if each record has less data,
but if you've got to store the data, you're better off storing it as
blobs on page, given the current data architecture.

It might make sense to have separate blob pages and data pages for each
table, but that's a major ODS change.


>
> As a summary, I did not recommend using BLOBs for small data when they may
> fit on a data page - VARCHAR in this case looks better.
>
> Please resolve my doubts - was I wrong?

I think so.


Best regards,


Ann