Subject Re: Some benchmarks about 'Order by' - temporary indexes?...
Author m_theologos
--- In, Jim Starkey <jas@...>

> If the only overhead were the seek and rotational latency of the
> disk, I wouldn't quibble at all. But a Firebird record fetch involves
> the following:
> * A fetch of a pointer page, which might, though probably not,
> require a page read. It does require an interlocked trip through
> the page cache manager, tweaking data structures, maybe incurring
> page writes, etc.
> * A handoff from the pointer page to the data page. This is a
> fetch of the data page (see above) and a release of the pointer
> page (another trip through the page cache manager).
> * Fetch of the record header
> * A decision on whether to fetch another version, which would
> require another pair of release/fetch trips through the page cache
> manager.
> * Decompression of the record
> * Reassembly of the record if it's fragmented (more trips through
> the page cache manager)
> * Release of the data page (yet another trip through the page cache
> manager)
> Each disk read incurs at least the following:
> * An OS call to initiate the read. The OS passes the operation to
> the driver to initiate or queue it
> * A thread stall and switch
> * If no runnable thread, a process context switch
> * When the read completes, maybe a process context switch back
> * A thread switch back
> In the sort alternative, it's a memory reference.
> When I originally wrote Interbase, I sweated every cycle in the page
> manager. I sweated it all over again on Vulcan. As the system has
> grown in complexity, so has the complexity of the page cache manager,
> with a resulting drop in performance. The page cache manager is the
> core and the throttle of Firebird/Interbase.
> Falcon was designed to circumvent the problem. Rather than using memory
> to hold more pages (an inefficient cache), it uses memory to hold
> records. This makes a record reference:
> * A single shared read/write lock executed in user mode
> * An indexed walk down a fixed width, variable depth tree
> * Increment of an interlocked record use count
> * Release of the read/write lock in user mode
> The basic record reference cycle is probably a hundredth (or less) of
> the Firebird equivalent without a disk read or write. A cache miss,
> however, isn't much different than Firebird.
> Memory used to be expensive and limited in size and address space. Now
> it's cheap, really fast, and huge.
> About a million years ago I attended a talk on memory hierarchies. The
> guy argued that there was an inherent pyramid of memory references with
> different tradeoffs of size and speed: register, cache, main memory,
> disk. What has happened in those million years is that the shape of the
> pyramid has changed. It's still a pyramid, but it's shorter and wider.
> The speed difference between cache and main memory is narrowing and
> the relative sizes are changing dramatically. Main memory is
> faster. Disks are almost unchanged (OK, solid state disks change
> -- somewhat). Intelligent design dictates that when the
> changes, so must the design. In other words, the intelligent use of
> more really fast memory is not just "more page buffers."
> Getting back to your argument, the real impact of solid state disks is
> on the serial logs, not database pages. The best (known) way of
> committing transactions is to write all updates to a single serialized
> log, periodically flushed serially with non-buffered writes, letting a
> background thread translate the updates to the on-disk structure. With
> batch commits, there are fewer disk writes than transactions. And if
> each disk write is a non-buffered write to a sequential file on a solid
> state disk, well...
> We're trying to get a Falcon alpha released open source in the next
> month or two. Some of it will look familiar, particularly to developers
> who worked on Vulcan. The rest of it, I hope, will be a look at
> alternatives made possible by quantitative changes in computer
> architecture.
> --
> Jim Starkey
> Netfrastructure, Inc.
> 978 526-1376
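
To make the serial log idea above concrete, here is a rough sketch in
Python. It is not Falcon code. The names SerialLog, apply_log and demo,
the "txn:key=value" line format, and the dict standing in for the
on-disk structure are all my own assumptions. The background thread and
true non-buffered I/O are left out; flush() is called by hand and uses
fsync instead. The only point is that one sequential write can cover
many commits.

```python
import os
import tempfile


class SerialLog:
    """Append-only log: commits are queued, then written in one batch."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._pending = []      # encoded log lines not yet on disk
        self.transactions = 0   # commits accepted
        self.disk_writes = 0    # batches actually flushed

    def commit(self, txn_id, key, value):
        # Assumed format: "txn:key=value"; keys must not contain "=".
        # A real engine would make the committer wait for the flush.
        self._pending.append(f"{txn_id}:{key}={value}\n".encode())
        self.transactions += 1

    def flush(self):
        if not self._pending:
            return
        view = memoryview(b"".join(self._pending))
        while view:             # os.write may write less than asked
            view = view[os.write(self._fd, view):]
        os.fsync(self._fd)      # one sync covers the whole batch
        self.disk_writes += 1
        self._pending.clear()

    def close(self):
        self.flush()
        os.close(self._fd)


def apply_log(path, table):
    """Replay the log into `table`, a stand-in for the on-disk structure."""
    with open(path, "rb") as f:
        for line in f:
            _txn, _, payload = line.decode().rstrip("\n").partition(":")
            key, _, value = payload.partition("=")
            table[key] = value


def demo():
    path = os.path.join(tempfile.mkdtemp(), "serial.log")
    log = SerialLog(path)
    for i in range(10):
        log.commit(i, f"k{i % 3}", str(i))
        if i % 5 == 4:          # flush after every fifth commit
            log.flush()
    log.close()
    table = {}
    apply_log(path, table)
    return log.transactions, log.disk_writes, table
```

In demo(), ten commits end up as two flushed batches, which is the
"fewer disk writes than transactions" effect in miniature.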

I think it's much more efficient to keep INFORMATION in the cache (i.e.
entire records) rather than DATA that must be converted first, which, as
Jim said, has many side effects on speed and system complexity.
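
To picture what a record reference looks like when whole records sit in
memory, here is a small Python sketch of the four steps Jim lists: take
a lock, walk a fixed width, variable depth tree by record number, bump
a use count, release the lock. This is only my guess at the shape, not
Falcon's implementation. RecordTree, Record and FANOUT are made-up
names, the fan-out of 4 is chosen just to keep the tree small, and a
plain threading.Lock stands in for the shared read/write lock and the
interlocked increment, since Python's standard library has neither.

```python
import threading

FANOUT = 4  # assumed node width; every node has exactly this many slots


class Record:
    def __init__(self, data):
        self.data = data
        self.use_count = 0


class RecordTree:
    """Tree indexed by record number; depth grows as numbers grow."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for a shared r/w lock
        self._root = [None] * FANOUT
        self._depth = 1                # capacity is FANOUT ** depth

    def store(self, number, record):
        with self._lock:
            # Add levels on top until the record number fits.
            while number >= FANOUT ** self._depth:
                new_root = [None] * FANOUT
                new_root[0] = self._root
                self._root = new_root
                self._depth += 1
            node = self._root
            for level in range(self._depth - 1, 0, -1):
                slot = (number // FANOUT ** level) % FANOUT
                if node[slot] is None:
                    node[slot] = [None] * FANOUT
                node = node[slot]
            node[number % FANOUT] = record

    def fetch(self, number):
        with self._lock:                       # 1. take the lock
            if number >= FANOUT ** self._depth:
                return None
            node = self._root
            for level in range(self._depth - 1, 0, -1):   # 2. walk down
                node = node[(number // FANOUT ** level) % FANOUT]
                if node is None:
                    return None
            record = node[number % FANOUT]
            if record is not None:
                record.use_count += 1          # 3. bump the use count
            return record                      # 4. lock released on exit

    def release(self, record):
        with self._lock:
            record.use_count -= 1
```

No page fetch, no decompression and no reassembly appear anywhere on
the fetch path, which is the whole point of caching records instead of
pages.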

Just my 2c,

m. Th.