Subject: Re: [Firebird-Architect] Re: Record Encoding
Author: Geoff Worboys
Hi Jim,

Jim Starkey wrote:
> Geoff Worboys wrote:
>>If a new version of FB *requires* multi-Gb of RAM just to be
>>able to run then we will alienate many users. ("requires" is
>>different from being able to "take advantage of" - which FB
>>must be able to do.)
> Nobody has suggested anything like this. We're really talking
> about the difference between reading one blob page and
> decompressing it into two or three pages rather than reading
> two or three pages of uncompressed blobs. I've been trying to
> make the rather simple point that I picked run length encoding
> because it didn't tax the relatively slow CPUs of the mid 1980s,
> that CPUs are now two or three orders of magnitude faster, and
> maybe, just maybe, it makes sense to reexamine the issue. The
> guys who want to run 8MB 386s have morphed the argument into a
> requirement for multi-gigabyte memory. Nobody said it, nobody
> suggested it, nobody even hinted at it.
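
For readers following along, the run-length encoding mentioned
above can be sketched in a few lines. This is only an
illustration of the general technique (runs of repeated bytes
stored as count/value pairs), not Firebird's actual record
encoding:

```python
def rle_encode(data: bytes) -> bytes:
    """Encode runs of repeated bytes as (count, value) pairs.
    Counts are capped at 255 so each pair fits in two bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(data), 2):
        count, value = data[j], data[j + 1]
        out += bytes([value]) * count
    return bytes(out)
```

The appeal in the mid-1980s is visible even in this toy version:
encoding and decoding are a single pass with no tables or bit
manipulation, so they cost almost nothing in CPU.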

No hint? What was this:

Thu, 12 May 2005 17:10:19 -0400 Jim Starkey wrote:
> OK. Let's go there. Rdb had only segmented blobs. At
> Interbase, I added stream blobs with seek. In Netfrastructure,
> I dropped both segmented blobs and seek.

> My initial thinking on blobs was centered around the idea that
> blobs were going to be much larger than available memory. That,
> like the PDP-11, has passed. It is now feasible, even
> preferable, to fetch the whole thing into memory and process it
> there.

> I'm not going to argue that there aren't applications where UDFs
> or blob filters or embedded Java or stored procedures aren't
> going to want to manipulate blobs. I am going to argue that that
> capability shouldn't dictate the on-disk storage format.

> We can continue to support blob seek with the simple expedient
> of fetching and decompressing the blob into memory and doing
> random access. The other 99.99% of the cases can get full
> advantage from blob compression.

How many requests for reasonably sized blobs will it take to
blow out memory use significantly?

Back to today's reply:
> run faster, not slower. It's faster to decompress a page than to
> read another one. Sooner or later you have to understand that
> cpus are fast and disks are slow, and the way to go faster is
> to read fewer pages.

Compression = performance ... sometimes, perhaps even most of
the time. Block compression may be acceptable. Forcing the
entire blob into memory is not.
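
That concern can be made concrete. With a streaming
decompressor, a reader can consume a compressed blob in bounded
pieces rather than materialising the whole thing. A minimal
sketch, using zlib purely as a stand-in codec (nothing here is
Firebird API):

```python
import zlib

def stream_decompress(compressed: bytes, max_out: int = 64 * 1024):
    """Yield decompressed data in pieces of at most max_out bytes,
    so memory use stays bounded regardless of the blob's size."""
    d = zlib.decompressobj()
    buf = compressed
    while True:
        # max_out caps the output size; input that could not be
        # processed yet is held in d.unconsumed_tail.
        piece = d.decompress(buf, max_out)
        if piece:
            yield piece
        buf = d.unconsumed_tail
        if not buf:
            break
    tail = d.flush()
    if tail:
        yield tail
```

The point of the sketch is only that "compression" and "load the
entire blob into memory" are separable design decisions.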

OTOH you have also made comments about FB having become CPU
bound even on new, fast hardware (which I have noticed, BTW).
After the trouble you are going to in order to reduce that
problem, isn't an always-on compression scheme going to have
the potential to lead us back that way?

It seems to me that:
- yes compression may be a good feature to have
(perhaps even as a default)
- compression should be switchable per column so that
implementations can tune per requirement

> What's the worst machine you can imagine running Firebird?

I am really not certain. AFAICT hardware is more expensive
here in Australia than it is in the USA, but even so it is much
cheaper than some other places. This is the sort of thing
that needs to be researched. We really should have targets.

> And what has anyone suggested that would increase the footprint?

See above.

> What compromises do you have in mind?

Configuration - if you do it, allow it to be turned off.

Memory - don't force blobs to be fully loaded for stream
processing (or limit/buffer it in some way to bound the
memory used).

> Suppose we compressed in 64K hunks. Is that a problem?

I doubt it, but as always there are tradeoffs and the cost
is real. Significant? I don't know.
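
For the sake of discussion, 64K hunks might look something like
this sketch (zlib as a stand-in codec; `compress_chunked` and
`read_at` are hypothetical names, not anything Firebird defines).
Each hunk compresses independently, so random access only ever
decompresses the hunks covering the requested range:

```python
import zlib

CHUNK = 64 * 1024  # the 64K hunk size under discussion

def compress_chunked(blob: bytes) -> list:
    """Compress a blob as a list of independently compressed
    64K hunks; no hunk depends on any other."""
    return [zlib.compress(blob[i:i + CHUNK])
            for i in range(0, len(blob), CHUNK)]

def read_at(hunks: list, offset: int, length: int) -> bytes:
    """Random access: decompress only the hunks that cover
    [offset, offset + length), never the whole blob."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    data = b"".join(zlib.decompress(h) for h in hunks[first:last + 1])
    start = offset - first * CHUNK
    return data[start:start + length]
```

The visible tradeoff is the one mentioned above: a seek touches
at most a hunk or two of decompressed data, at the cost of
slightly worse compression than compressing the blob as a whole.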

> It's a disk optimization that should be transparent to the user.
> Without awareness of blob compression, it is impossible to
> apply blob operators, which is a terrible sacrifice to make.

Awareness of blob compression and forced/always-on compression
are different things. Block compression and full-blob
compression are different things.

Some of your previous comments had led me to believe you were
heading down the track of forcing FB into areas that may have
prevented it from running efficiently on low-end hardware.

Built-in compression will add to the overheads. Will it be
significant on current low-end hardware? Without knowing
exactly what the targets are, we can't know. To investigate
the targets we need a new thread - and possibly an expansion
of the discussion into support or FFMembers for feedback.

As for IB's status as a renegade: that is a good feature in a
new, ground-breaking system. It is not necessarily a desirable
feature in a system with a large existing set of users. They
expect certain things that we must maintain. FB's status as a
system that will work even on relatively low-end hardware is
one of the things that is important to many FB users.

Keep that status - with or without compression - and you will
keep the existing users happy. Be a renegade and ignore what
the users consider as acceptable target platforms and FB will
be looking for a new market.

I am not saying that your ideas for compression are bad. We
just need to be wary of pushing the platform requirements too
far. Put compression in place in a form that will not blow
out memory and will not prevent efficient blob stream access,
and we can see how well it performs in real implementations.
If it turns out that compression is so good that any costs
are insignificant then future versions can begin to deprecate
the support for uncompressed data.

(And who knows, in the meantime solid state drives may take
off with new technology, becoming cheap and even faster than
they are now. Then suddenly the assumption that disk reads
are slow will be made obsolete and we will be looking for a
way to stop our databases being CPU bound again. ;-)

Geoff Worboys
Telesis Computing