Subject Re: [Firebird-Architect] Relational Databases and Solid State Memory: An Opportunity Squandered?
Author Jim Starkey
About 35 years ago I attended a talk on memory hierarchies. The
speaker, name long forgotten, argued that computer memory is structured
in a pyramid of registers, cache, core (this was ancient, remember),
virtual, and disk. The top was extremely fast, expensive, and tiny, but
as you went down the hierarchy, the memory got slower, cheaper, and
bigger. He predicted that the speeds and the costs would change, but
the hierarchy would be with us forever.

Pretty good prediction.

There are two problems with SSDs. One is that they fit poorly into a
memory hierarchy. They are faster than disks, but much more expensive.
And, more importantly, they are unsuited for long-term storage. So they
aren't a replacement for disks, at least not yet. And as disks continue
to boggle the mind with size and price, it doesn't look likely that SSDs
will ever force disks out of the market for general applications other
than mobile computing.

So if we forget about SSDs at the bottom of the hierarchy, where do they
fit in? At first glance, SSDs might make a good, large-ish, disk
cache. Maybe. The problem is caches are, by definition, highly
volatile. Unfortunately, by the laws of physics, flash cells have a
finite limit on the number of times they can change state without
wearing out. SSD controllers do all sorts of clever things to
compensate, like maintaining a large pool of reserve cells and moving
volatile data around. That's probably fine for general-purpose disk
loads, but it's highly problematic for caches.
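
To put numbers on it, here's a back-of-the-envelope sketch in Java.
Every figure in it is an assumption for illustration (capacity,
program/erase cycles, write amplification, cache write rate), not a
spec for any real drive, but it shows how fast a busy cache chews
through a flash device's endurance:

// Back-of-the-envelope flash endurance estimate. All numbers are
// assumptions for illustration, not specs for any particular drive.
public class FlashEndurance {
    public static void main(String[] args) {
        double capacityGB = 200;        // assumed drive capacity
        double peCycles = 10_000;       // assumed program/erase cycles per cell
        double writeAmplification = 2;  // assumed controller write amplification
        double cacheWriteMBps = 200;    // assumed sustained write rate of a busy cache

        // Total host bytes the flash can absorb before wearing out:
        double lifetimeTB = capacityGB * peCycles / writeAmplification / 1024;

        // How long that lasts under the cache workload:
        double secondsPerTB = 1024 * 1024 / cacheWriteMBps;
        double lifetimeDays = lifetimeTB * secondsPerTB / 86_400;

        System.out.printf("Endurance: %.0f TB written, ~%.0f days at %.0f MB/s%n",
                lifetimeTB, lifetimeDays, cacheWriteMBps);
    }
}

With those made-up numbers, the device is worn out in roughly two
months. Change the assumptions as you like; the shape of the problem
doesn't change.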

The second big problem is speed. Flash is vastly faster than disks, but
SSDs have to live in disk subsystems designed around disk transfer
speeds. The potential transfer rate just isn't achievable in an I/O
infrastructure built for disks.
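
Some rough arithmetic on the interface gap, using nothing but round,
published link rates (a SATA 6 Gb/s disk link versus an assumed PCIe
2.0 x8 slot for a flash card), not measurements:

// Rough interface-bandwidth comparison. Link rates are published
// figures, rounded; real sustained throughput is lower for both.
public class BusBandwidth {
    public static void main(String[] args) {
        double sataMBps = 600;       // SATA 6 Gb/s, ~600 MB/s after 8b/10b encoding
        double pcieLaneMBps = 500;   // PCIe 2.0, ~500 MB/s per lane, each direction
        int lanes = 8;               // assumed slot width for a flash card

        double pcieMBps = pcieLaneMBps * lanes;
        System.out.printf("Disk-era SATA link: %.0f MB/s%n", sataMBps);
        System.out.printf("PCIe 2.0 x%d slot:  %.0f MB/s (%.1fx the SATA link)%n",
                lanes, pcieMBps, pcieMBps / sataMBps);
    }
}

And that's before controller and protocol overhead eat their share.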

Let's get back to fundamentals. There are two things we know in our
hearts and heads. One is that disks suck. Disks are big and cheap and
really, really slow, and they will never get faster. The other is that
flash is a really good solution desperately looking for a problem.

The solution, I believe, is a storage matrix made out of flash on a
PCI-X card front-ending a distributed file system like the Hadoop File
System (HDFS). HDFS works more or less like this: an HDFS consists of a
large number of data nodes, each with sucky disks, and a name node that
keeps track of where everything is. An HDFS client creates a file
locally, then asks the name node where to put it. The name node directs
the client to send the file to a data node for storage. The data node
then forwards the file to other data nodes for redundancy, updating the
name node with the additional sites for the data (lots of interesting
policy comes into play here, since HDFS knows about racks and data
centers). When a file is needed, a trip to the name node gets the
address of the optimal data node, from which the client gets the data.
If you want the real scoop, read here:
http://hadoop.apache.org/common/docs/current/hdfs_design.html
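
For the curious, the client side of that dance is just the standard
Hadoop FileSystem API. A minimal round trip looks more or less like
this (the namenode URI, path, and sizes are placeholders):

// Minimal sketch of the HDFS write/read path described above, using
// the standard Hadoop FileSystem client API.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client talks to the name node; data flows to and from data nodes.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/tmp/example.dat");

        // create() asks the name node for block placements, then streams the
        // data to a data node, which pipelines copies to its replicas.
        // Arguments: overwrite, buffer size, replication factor, block size.
        FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, 64L * 1024 * 1024);
        out.writeUTF("hello, hdfs");
        out.close();

        // open() asks the name node for block locations and reads from the
        // nearest (rack-aware) data node.
        FSDataInputStream in = fs.open(file);
        System.out.println(in.readUTF());
        in.close();

        fs.close();
    }
}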

In this architecture, flash on a super-high-bandwidth PCI-X flash card
makes file creation very fast. The file remains in the PCI-X flash as a
cache and can be deleted once it has propagated to a number of data
nodes. The flash is never used to persist data, so if it wears out, it
gets replaced, ho hum. What we get is super-fast file writes; redundant,
multi-datacenter storage; retrieval optimized to minimize latency; and
the appearance of infinite bandwidth.
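
For concreteness, here's the shape of that write path in Java.
FlashCard, NameNodeClient, and DataNodeClient are interfaces I made up
to show the ordering of the steps; nothing here is a real Hadoop or
vendor API:

// Hypothetical sketch of the flash front-end write path. The interfaces
// are invented to model the components, not real library classes.
import java.util.List;

public class FlashFrontedWriter {
    interface FlashCard {                  // the PCI-attached flash
        void write(String name, byte[] data);
        void delete(String name);          // flash never persists; worn cards just get replaced
    }
    interface NameNodeClient {
        List<String> placementsFor(String name);  // which data nodes should hold copies
    }
    interface DataNodeClient {
        void store(String node, String name, byte[] data);
    }

    private final FlashCard flash;
    private final NameNodeClient nameNode;
    private final DataNodeClient dataNodes;

    FlashFrontedWriter(FlashCard f, NameNodeClient n, DataNodeClient d) {
        flash = f; nameNode = n; dataNodes = d;
    }

    // The caller sees a "super fast file write": it returns as soon as the
    // bytes hit flash. Propagation to disk-backed data nodes happens behind it.
    void write(String name, byte[] data) {
        flash.write(name, data);                    // fast path: done, from the caller's view
        new Thread(() -> {
            for (String node : nameNode.placementsFor(name)) {
                dataNodes.store(node, name, data);  // replicate to the sucky disks
            }
            flash.delete(name);                     // safe to evict once replicated
        }).start();
    }
}

The point is that the caller is done the moment the bytes hit flash;
the disks are fed entirely off the critical path.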

The fly in the ointment is that database systems built around block
level direct access to volatile disk files are basically screwed. HDFS
is optimized for write once, read many, so a volatile page cache is out
of the question.

So I'd say a pox on SSDs -- too little, too late. The real value of
flash is to get the economy and size of disks without having disks
anywhere on
the critical path. Disks are great for archival storage and rotten for
anything else.

For you Firebirdians, NuoDB (formerly NimbusDB) is built around diskless
transaction nodes and quite separate archive managers that serialize
snapshots of distributed objects called atoms (target size is about
64 KB). All active data is in memory and replicated among as many
transaction nodes as find the atom worth keeping in memory. All
transaction nodes serve as atom caches. If a transaction node needs an
atom that is not in memory anywhere, it gets the atom from the most
responsive archive manager.
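
The lookup order is the interesting part. Here's a hypothetical sketch
of it; none of these types are real NuoDB classes, they just model the
three tiers (local memory, peer transaction nodes, archive managers):

// Hypothetical sketch of the atom lookup order described above.
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class AtomCache {
    static final int TARGET_ATOM_SIZE = 64 * 1024;  // ~64 KB target, per the post

    interface Peer {                                // another transaction node
        Optional<byte[]> fetchAtom(long atomId);
    }
    interface ArchiveManager {
        byte[] readAtom(long atomId);
        long latencyEstimateMicros();               // used to pick the most responsive one
    }

    private final Map<Long, byte[]> inMemory = new ConcurrentHashMap<>();
    private final List<Peer> peers;
    private final List<ArchiveManager> archives;

    AtomCache(List<Peer> peers, List<ArchiveManager> archives) {
        this.peers = peers;
        this.archives = archives;
    }

    byte[] getAtom(long atomId) {
        // 1. All active data is in memory: check the local cache first.
        byte[] atom = inMemory.get(atomId);
        if (atom != null) return atom;

        // 2. Every transaction node is an atom cache: ask the peers.
        for (Peer p : peers) {
            Optional<byte[]> remote = p.fetchAtom(atomId);
            if (remote.isPresent()) {
                atom = remote.get();
                inMemory.put(atomId, atom);
                return atom;
            }
        }

        // 3. Not in memory anywhere: go to the most responsive archive manager.
        ArchiveManager best = archives.stream()
                .min(Comparator.comparingLong(ArchiveManager::latencyEstimateMicros))
                .orElseThrow(() -> new IllegalStateException("no archive managers"));
        atom = best.readAtom(atomId);
        inMemory.put(atomId, atom);
        return atom;
    }
}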

HDFS isn't perfect, but it isn't unique, either. There are a variety of
highly distributed, replicated, high performance storage systems out
there, and most likely more will be showing up. It's the way to go.
Definitely.


On 1/21/2012 7:53 AM, mariuz wrote:
>
> Something to read for the weekend
> http://www.simple-talk.com/sql/database-administration/relational-databases-and-solid-state-memory-an-opportunity-squandered/
>
> And by the way Fusion-io Breaks One Billion IOPS Barrier
>
> http://www.fusionio.com/press-releases/fusion-io-breaks-one-billion-iops-barrier/
>
>


