firebird-architect - Re: [Firebird-Architect] Re: Record Encoding

Subject	Re: [Firebird-Architect] Re: Record Encoding
Author	Jim Starkey
Post date	2005-05-14T18:08:23Z

adem wrote:

>Disk and network throughput (I/O) is not a problem peculiar
>to Firebird or Vulcan --it is a yin-yang thing for the whole
>universe. Attempting to solve it in software on general-purpose
>CPUs will only mean added load and latency to the system as
>a whole.
>
>

This is an interesting point probably worth discussing at length.

The idea of hardware assist for database systems is as old as database
systems themselves. I've personally been involved with two, the
Datacomputer project on the original ARPAnet and DEC's database machine
project. Other examples were the Britton-Lee machine, where Bob Epstein
cut his commercial teeth before Sybase, a whole slew of various
"accelerators", and an uncountable number of university research projects.

The Datacomputer project was too busy with software issues to even spend
the money in the hardware budget for anything other than a) memory, b)
adaptors for high end disks, and c) more memory. The machine was a Tenex
box based on the wire-wrapped KA-10, so dropping in specialized stuff was
feasible. We just never found anything that made sense.

At DEC we played with a variety of things ranging from a Quick-sort
hardware board to custom microcode for a VAX 751. The sort board never
made it out of the pure research stage because it was apparent from the
beginning that it couldn't be context switched, and being able to
perform only one sort at a time was a clear bottleneck. There were two
problems with the microcode project. First, when we found a hot spot from
instrumentation, I always found a software solution before the microcode
was ready. The second problem was that a fried writable control store
module for a 751 really stunk up the lab.

The Britton-Lee machine was one of two high profile commercial database
machines (the other was Teradata, about which I know very little). The
Britton-Lee machine was, in fact, a more or less generic microprocessor
with a database-aware OS, with an ECL accelerator board to follow. The
performance of the original machine was very ho-hum. The ECL board was
very late (all ECL boards are very late except for the ones that die in
development), and when released, did almost nothing for performance, at
which point everyone got bored and the company died.

Just about every serious analysis of database performance and
specialized hardware concludes that the most bang for the buck comes
from a) more memory and b) a faster processor. Even worse, every study
done by DEC storage engineering showed that additional memory is better
used in the CPU than the disk controller. This really hurt, because the
disk guys charged a lot more for memory.

>It is, and will be, tackled by the hardware people; it is
>their problem.
>
>

Not at all. They don't know *anything* about software or operating
systems or how to attack complex problems.

>Just to sample what is available right now, here are a few
>examples:
>
>If it is the disk space and I/O you are worried about, you can buy
>a hardware compression card for around $900 --it will get a
>lot cheaper in time and when others enter the market.
>
>

This has the same problem that has Roman upset -- the inability to do
direct access into a compressed file. I tried to convince him that
decompressing the whole blob was good enough, but that approach clearly
doesn't work for a multi-gigabyte database file.

I'll buy encryption at the disk level, but not compression.
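
The direct-access problem can be sketched in a few lines. This is only an illustration using Python's standard zlib module on made-up data; the helper name read_at is hypothetical and has nothing to do with Firebird or any compression card. A plain compressed stream has no index, so reading at an offset means inflating everything in front of it.

```python
import zlib

def read_at(compressed: bytes, offset: int, length: int, chunk: int = 256):
    """Read `length` bytes at `offset` of the uncompressed data.

    Returns (data, bytes_inflated). A plain zlib stream cannot be entered
    in the middle, so all bytes before `offset` are inflated and discarded.
    """
    d = zlib.decompressobj()
    out = bytearray()
    produced = 0                              # uncompressed bytes seen so far
    for i in range(0, len(compressed), chunk):
        piece = d.decompress(compressed[i:i + chunk])
        start = max(offset - produced, 0)     # part of this piece to skip
        produced += len(piece)
        if start < len(piece):
            out += piece[start:]
        if len(out) >= length:
            break
    return bytes(out[:length]), produced

# Made-up 1 MiB "file"; a real database file would be gigabytes.
data = bytes(range(256)) * 4096
compressed = zlib.compress(data)
got, inflated = read_at(compressed, 1_000_000, 16)
```

Here `inflated` is at least `offset + length`: the cost of a 16-byte read grows with its position in the file, which is tolerable for a blob and hopeless for a page fetch out of a large database.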

>
>If it is the network throughput you're concerned about, 10Gb/s is
>around the corner. And, with 10Gb/s you have a problem that
>has to be solved by specialized silicon: some sort of offloading
>engine, or else the CPU will be monopolized by TCP alone.
>
>

You are confusing bandwidth and throughput. They aren't the same.
Bandwidth reduces the per-bit transmission cost but does nothing
about the per-message cost to dribble up and down the operating system's
protocol stacks.
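
A back-of-the-envelope model of the distinction, with invented numbers: the 20 microsecond per-message cost and the 200-byte message size are assumptions for illustration, not measurements of any real stack.

```python
def transfer_time(messages: int, bytes_per_message: int,
                  bandwidth_bps: float, per_message_overhead_s: float) -> float:
    """Total seconds = wire time for the bits + fixed cost per message."""
    wire = messages * bytes_per_message * 8 / bandwidth_bps
    stack = messages * per_message_overhead_s
    return wire + stack

# Hypothetical workload: 100,000 small messages of 200 bytes each,
# with an assumed 20 microseconds of protocol-stack cost per message.
for gbps in (1, 10):
    t = transfer_time(100_000, 200, gbps * 1e9, 20e-6)
    print(f"{gbps:>2} Gb/s: {t:.3f} s")
```

Under these assumptions the total goes from about 2.16 s to about 2.02 s: ten times the bandwidth buys roughly 7%, because the per-message term does not move.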

Adaptive switches are a good use of custom hardware. Unlike most
specialized hardware, they make it easier for many clients to share
common resources. Ethernet boards have gotten smarter, and could be
smarter still, but protocol and connection processing need to be done
in the OS for a variety of reasons.

>There are various TCP offload engine (TOE) NICs that
>do just that in HW.
>
>Intel, for one, will integrate these (called I/OAT) to server
>MBs next year; I am sure AMD and others will not want to be
>left behind.
>
>

A lot of people have made smart ethernet controllers. But if the
operating system needs to support flexible IP packet handling (like
Linux, for example), having the semantics on the board doesn't help.

>Now, with regards to Firebird/Vulcan and databases in general:
>
>Other than an article I seem to recall reading in the Byte
>Magazine (from about the last quarter of the previous century)
>--may it rest in peace-- mentioning a hardware database engine
>being developed at University of Cambridge (UK), I am not
>aware of a database-accelerator-on-silicon thing. Unless it
>got classified and buried into Echelon, it was vaporware.
>
>

DEC used to fund all sorts of things like this. The university guys
take a tiny piece of a problem and cast it in hardware. The problem is
that the OS overhead to manage shared access to the widget costs more
than the widget saves.

A good example of this was the Sun 3/50 -- Sun's first "low" cost
workstation. They hired Carver Mead (one very smart cookie) to build
them a custom video accelerator chip. When they finally got working parts
they discovered that the video performance dropped. The accelerator was
much faster for large blts, but with the OS overhead, slower for very
small operations like rendering a single character. And that,
unfortunately, was the most common operation. So the OS guys put in a
fix so large operations were done in hardware and the little ones in
software. This reduced the performance even more, since now the once
fast small operations were slowed down by the test. Sun shipped the
machines with the socket empty.

This is an unavoidable problem. Suppose you have a compression or
encryption or full text search board that is scathingly hot. To get at
it from user mode, you need operating system support to control and
schedule access to the hardware. So to get at it, you need to change
protection mode, probe a bunch of addresses, and try to acquire the
hardware. If the hardware is busy doing someone else's work, the task
must stall, be marked unrunnable, and queued for the hardware resource.
So what you often get is an idle processor waiting for the
"accelerator". The alternative is letting a super-scalar processor with
a huge cache loose on the problem. Guess who usually wins?
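
The arithmetic behind that question, and behind the Sun story, can be sketched as a toy cost model. Every figure below is made up (abstract "cycles", not measurements of any real board), and the function names are hypothetical.

```python
# All costs are invented "cycles" chosen only to illustrate the shape
# of the trade-off.
SW_PER_UNIT = 10   # software cost per unit of work
HW_PER_UNIT = 1    # accelerator cost per unit of work
OVERHEAD = 500     # mode switch, address probes, acquiring the device
TEST_COST = 5      # size check paid on every call in the hybrid scheme

def software_cost(n: int) -> int:
    return n * SW_PER_UNIT

def hardware_cost(n: int) -> int:
    return OVERHEAD + n * HW_PER_UNIT

# Operation size above which the accelerator path is cheaper.
break_even = OVERHEAD / (SW_PER_UNIT - HW_PER_UNIT)
THRESHOLD = 56     # first whole size past the break-even point

def hybrid_cost(n: int) -> int:
    # Big operations go to hardware, small ones stay in software,
    # but every call pays for the test that decides which.
    if n >= THRESHOLD:
        return TEST_COST + hardware_cost(n)
    return TEST_COST + software_cost(n)

# Hypothetical workload: 1000 single-character operations, one large blt.
workload = [1] * 1000 + [100]
sw_total = sum(software_cost(n) for n in workload)
hw_total = sum(hardware_cost(n) for n in workload)
hybrid_total = sum(hybrid_cost(n) for n in workload)
print(sw_total, hw_total, hybrid_total)
```

With this mix the software-only path costs 11000, the hardware-only path 501600, and the hybrid 15605. A workload dominated by large operations would favor the hybrid instead, so the outcome depends entirely on which operation is the common one.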

>I have never heard of anything similar ever since. Which means,
>to me, at least, there is no silicon solution for databases.
>
>

Many have been tried, none cut the mustard.

>Which brings me back to the point where I was wondering why you
>focus on problems that are about to be solved for all, instead
>of adding new features, data types, ACL type security etc.
>that Firebird/Vulcan needs.
>
>

I try to solve problems in order of importance. The number one problem
was that the Firebird central server didn't take advantage of SMP machines.
That's fixed. The next big problem is that while Interbase 3 was disk
bound on a 68020, Firebird is compute bound on a 2.8 GHz superscalar
processor with half a meg of on-chip cache.

There are only three or four ways to do something faster. One is to use
a faster processor. Another is to do it on multiple processors in
parallel. The third way is to do it smarter. And the best is to use
memory so you don't do the same damn thing over and over. We're nearing
the end of faster processors, so the future is with parallel processing
(SMP). But the biggest potential is to be smarter. This means looking
simultaneously at the big picture and the little details to see where
the wheels are spinning, then fix them.
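
As a minimal illustration of "use memory so you don't do the same thing over and over", here is a memoized function in Python. The function parse_format and its descriptor syntax are invented for the example and are not Firebird or Vulcan code; functools.lru_cache is the standard-library cache decorator.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body actually runs

@lru_cache(maxsize=1024)
def parse_format(descriptor: str) -> tuple:
    """Hypothetical expensive step: turn a record-format string into
    a tuple of (name, type) field specs."""
    global calls
    calls += 1
    return tuple(tuple(field.split(":")) for field in descriptor.split(","))

# The same descriptor is requested for every record, but parsed only once.
for _ in range(10_000):
    fields = parse_format("id:int,name:varchar,balance:numeric")
```

After the loop `calls` is 1: the other 9,999 requests are answered from memory. The price is the cache itself, and the usual care needed if the underlying answer can change.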

ACL type security? Sir, Interbase was born with ACLs. Borland hid them
because they hadn't been blessed by the SQL Committee.

>Just my 0.02 of some currency.
>
>
>

My lecture is gratis. Keep your 0.02.