firebird-architect - Re: [Firebird-Architect] Re: Some benchmarks about 'Order by'

Subject	Re: [Firebird-Architect] Re: Some benchmarks about 'Order by' - temporary indexes?...
Author	Jim Starkey
Post date	2006-09-23T12:09:07Z

Adam wrote:

> This change will happen when:
>
> 1) The storage size is 'high enough'
> 2) The price is 'cheap enough'
> 3) People become confident that write cycle issues are resolved.
>
> In 5 years time, you won't get a hard disk from Dell, HP or Apple
> unless you choose a special box on the order form. Should we perhaps
> be considering how to make the most out of different storage
> technologies, rather than limiting ourselves to the restrictions of a
> device.
>
>

If the only overhead were the seek and rotational latency of the hard
disk, I wouldn't quibble at all. But a Firebird record fetch requires
the following:

* A fetch of a pointer page, which might, though probably not,
require a page read. It does require an interlocked trip through the
page cache manager, tweaking data structures, maybe incurring aged
page writes, etc.
* A handoff from the pointer page to the data page. This is a
fetch of the data page (see above) and a release of the pointer
page (another trip through the page cache manager).
* Fetch of the record header
* A decision on whether to fetch another version, which would
require another pair of release/fetch trips through the page cache
manager.
* Decompression of the record
* Reassembly of the record if it's fragmented (more trips through
the page cache manager)
* Release of the data page (yet another trip through the page cache
manager)

Each disk read incurs at least the following:

* An OS call to initiate the read. The OS passes the operation to
the driver to initiate or queue it
* A thread stall and switch
* If no runnable thread, a process context switch
* When the read completes, maybe a process context switch back
* A thread switch back

In the sort alternative, it's a memory reference.
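
The record fetch steps listed above can be tallied as a rough count of trips through the page cache manager. This is only a sketch of the arithmetic, not Firebird code: the function name and parameters are made up for illustration, and the cost of two trips per extra fragment is an assumption (the list only says "more trips").

```python
def page_cache_trips(back_versions=0, extra_fragments=0, trips_per_fragment=2):
    """Count page cache manager trips for one record fetch (illustrative model)."""
    trips = 1                    # fetch the pointer page
    trips += 2                   # handoff: fetch data page, release pointer page
    trips += 2 * back_versions   # each older version: one release/fetch pair
    # Assumption: each extra fragment costs one release/fetch pair as well.
    trips += trips_per_fragment * extra_fragments
    trips += 1                   # final release of the data page
    return trips


if __name__ == "__main__":
    print(page_cache_trips())                                    # simplest case
    print(page_cache_trips(back_versions=1, extra_fragments=1))  # one back version, one fragment
```

Under these assumptions the simplest fetch makes four interlocked trips before any disk read is counted, and every back version or fragment adds more.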

When I originally wrote Interbase, I sweated every cycle in the page cache
manager. I sweated it all over again on Vulcan. As the system has
grown in complexity, so has the complexity of the page cache manager,
with a resulting drop in performance. The page cache manager is the
core and the throttle of Firebird/Interbase.

Falcon was designed to circumvent the problem. Rather than using memory
to hold more pages (an inefficient cache), it uses memory to hold whole
records. This makes a record reference

* A single shared read/write lock executed in user mode
* An indexed walk down a fixed width, variable depth tree
* Increment of an interlocked record use count
* Release of the read/write lock in user mode

The basic record reference cycle is probably a hundredth (or less) of
the Firebird equivalent without a disk read or write. A cache miss,
however, isn't much different than Firebird.
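
The four-step record reference above might look roughly like the following. It is a minimal sketch and not Falcon's actual code: `RecordTree`, `Record`, and `FANOUT` are hypothetical names, the node width of 4 is arbitrary, and a plain `threading.Lock` stands in for the shared read/write lock and the interlocked increment, since the Python standard library has neither.

```python
import threading

FANOUT = 4  # hypothetical fixed node width


class Record:
    def __init__(self, data):
        self.data = data
        self.use_count = 0


class RecordTree:
    """Records indexed by number in a fixed-width, variable-depth tree."""

    def __init__(self):
        self.lock = threading.Lock()  # stand-in for a shared read/write lock
        self.depth = 1
        self.root = [None] * FANOUT

    def store(self, number, record):
        with self.lock:
            # Grow the tree upward until the record number fits.
            while number >= FANOUT ** self.depth:
                new_root = [None] * FANOUT
                new_root[0] = self.root
                self.root = new_root
                self.depth += 1
            node = self.root
            for level in range(self.depth - 1, 0, -1):
                slot = (number // FANOUT ** level) % FANOUT
                if node[slot] is None:
                    node[slot] = [None] * FANOUT
                node = node[slot]
            node[number % FANOUT] = record

    def fetch(self, number):
        with self.lock:                              # take the lock
            if number >= FANOUT ** self.depth:
                return None                          # cache miss
            node = self.root
            for level in range(self.depth - 1, 0, -1):   # indexed walk down
                node = node[(number // FANOUT ** level) % FANOUT]
                if node is None:
                    return None                      # cache miss
            record = node[number % FANOUT]
            if record is not None:
                record.use_count += 1                # bump the use count
            return record                            # lock released on exit


if __name__ == "__main__":
    tree = RecordTree()
    tree.store(37, Record("some row"))
    print(tree.fetch(37).data, tree.depth)
```

A hit touches no pages at all. A miss (the `None` return) would fall back to the page-level path, which is why a cache miss isn't much different than Firebird.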

Memory used to be expensive and limited in size and address space. Now
it's cheap, really fast, and huge.

About a million years ago I attended a talk on memory hierarchies. The
guy argued that there was an inherent pyramid of memory references with
different tradeoffs of size and speed: Register, cache, main memory,
disk. What has happened in those million years is that the shape of the
pyramid has changed. It's still a pyramid, but it's shorter and wider.
The speed difference between cache and main memory is narrowing while
the relative sizes are changing dramatically. Main memory is radically
faster. Disks are almost unchanged (OK, solid state disks change this
-- somewhat). Intelligent design dictates that when the environment
changes, so must the design. In other words, the intelligent use of
more really fast memory is not just "more page buffers."

Getting back to your argument, the real impact of solid state disks is
on the serial logs, not database pages. The best (known) way of pumping
transactions is to write all updates to a single serialized log
periodically flushed serially with non-buffered writes, letting a
background thread translate the updates to the on-disk structure. With
batch commits, there are fewer disk writes than transactions. And if
each disk write is a non-buffered write to a sequential file on a solid
state disk, well...
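
To make the batch commit point concrete, here is a minimal sketch of a serial log that queues commits and flushes them together, so the number of disk writes stays below the number of transactions. `SerialLog` and `demo` are hypothetical names, the length-prefixed record format is an assumption, and `os.fsync` stands in for a true non-buffered write. The background thread that translates the log into the on-disk structure is left out.

```python
import os
import tempfile


class SerialLog:
    """Append-only log that flushes queued commits as one batch."""

    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []   # commits waiting for the next flush
        self.flushes = 0    # number of batches forced to disk

    def commit(self, payload):
        # Queue the transaction's updates; nothing reaches disk yet.
        self.pending.append(payload)

    def flush(self):
        if not self.pending:
            return 0
        # One sequential write for the whole batch, then force it to disk.
        batch = b"".join(len(p).to_bytes(4, "little") + p for p in self.pending)
        self.file.write(batch)
        self.file.flush()
        os.fsync(self.file.fileno())
        count = len(self.pending)
        self.pending.clear()
        self.flushes += 1
        return count

    def close(self):
        self.flush()
        self.file.close()


def demo():
    """Commit 8 transactions, flushing every 4; return (transactions, flushes)."""
    with tempfile.TemporaryDirectory() as directory:
        log = SerialLog(os.path.join(directory, "serial.log"))
        for i in range(8):
            log.commit(f"txn {i}".encode())
            if i % 4 == 3:
                log.flush()
        log.close()
        return 8, log.flushes


if __name__ == "__main__":
    print(demo())
```

In a real system the flush would be triggered by a timer or by waiting committers rather than by a fixed count of four.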

We're trying to get a Falcon alpha released open source in the next month or
two. Some of it will look familiar, particularly to developers working
on Vulcan. The rest of it, I hope, will be a look at alternatives made
possible by quantitative changes in computer architecture.

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376