Subject | Re: [Firebird-Architect] Blobs
---|---
Author | Jim Starkey
Post date | 2003-06-26T21:34:10Z
Alexandre Kozlov wrote:
>Do not understand clearly, but what can be faster than reading the whole
>table (about 20 MB) into memory and scanning it there - this is exactly
>the case after my simulated separation of space for BLOBs (and even more:
>each table occupies contiguous space on disk after restore - this is the
>main underlying feature I used in my simulation). Still, I believe there
>exists other tuning to recover the lost performance in my case, but not
>so effective or so easy to understand.

Reading a few index pages, a pointer page, and a single database page is a
great deal faster than reading 20 MB. A 20 MB read will completely sweep
the cache, invalidating every page; the next 20 MB scan will have to start
over with all cache misses (assuming 1000 buffers and a 2 KB page size).
Unless the application is a single (patient) user, an exhaustive 20 MB scan
is not a good thing, and certainly not something to drive internal database
design.
Given a typical application -- say a content store for a web server -- a
request will involve maybe a half dozen indexed lookups with one or two
blob fetches ranging from a few words to a few paragraphs each. Ignoring
the cache (reasonable for a very large database), storing the blobs on the
same page as the record reduces the number of page reads by a third,
assuming the leaf page of the index is always a cache miss. That's a
pretty good gain. Trading away a 33% performance kick for a well
structured application in exchange for increased performance for a badly
structured application is not a tradeoff I would take.
>Of course, I understand FB does not try to put in a lot of features for
>tuning, supposing that in most cases the engine will work well. And I
>like this approach too. But sometimes you need additional parameters for
>tuning a large database (even ones related more to OS features - like
>different disk spaces) which for small databases are nonsense.

The only tuning parameters other than indexes in Netfrastructure are upper
and lower limits that control record scavenging (garbage collection). I
wish I didn't need them, but the right setting depends on a rough guess of
the size of frequently used data, the amount of RAM, and the effective
working set sizes of other commonly scheduled processes.
Let me bore the list with an ancient story about "optimizations" and how
they can bite you on the leg. A couple of decades back, Sun was designing
their first low cost, high volume workstation, the 68K-based Sun 350. To
maximize performance, they had Carver Mead's company design them a
graphics accelerator, essentially a bit-blit chip. When they finally got
production parts, they plugged the chips into the waiting sockets and
expected graphics performance to shoot up. It didn't. It went slower.
Much analysis later, they discovered that although the chip made large
transfers much faster, most transfers were moving a character cell, a
couple of dozen pixels, and the cost of managing the chip (locking down
pages, setting up the control structures) took more time than doing the
transfer in software. The software guys said "no problem" and put in a
test to see if the transfer was small and should be done in software, or
big and should be done in hardware. And guess what? The machine went
slower still, since the most common operation, a character cell transfer,
now required a complicated test in addition to everything else. Sun
scrapped the chip and shipped the machines with the socket empty.

Making things more complex doesn't necessarily make them faster.