Subject: Re: Well, here we go again
Author: paulruizendaal
Hi Jim,

It would be interesting to widen this thread to a discussion of
highly scalable web apps and the databases underpinning them. I agree
with some of your observations, whilst others seem out of place. As
you opened this thread, I assume you are interested in having such a
chat.

Perhaps first a few remarks for the benefit of the rest of the
community who may want to participate. There is little magic in
scaling web apps out, although a lot of engineering skill is
required. For instance, the weblog "High Scalability" describes the
architectures of many high-volume web apps and links to blogs and
presentations by their key engineers (see
http://highscalability.com/).

In very broad strokes, being highly scalable boils down to being as
stateless/RESTful as possible and spreading individual bits of
workload across a broad group of machines. Caching CPU-intensive
workload steps is also often part of the design. As I said, little
magic, but lots of skill involved.
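
To make the caching point concrete, here is a rough Python sketch
(function and field names are invented for illustration). The dict
stands in for a shared cache tier such as memcached, and the handler
itself holds no per-user state, so any box in the pool can serve any
request:

    # Stand-in for a shared cache tier (memcached or similar).
    cache = {}

    def render_product_page(product_id, catalogue):
        # Stateless handler: everything it needs comes in as arguments.
        key = "page:%s" % product_id
        page = cache.get(key)
        if page is None:
            # The CPU-intensive step (template rendering, price rules, etc.)
            page = "<h1>%s</h1><p>%s</p>" % (
                catalogue[product_id]["name"],
                catalogue[product_id]["blurb"],
            )
            cache[key] = page
        return page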

Let's turn to the user requirements next. "There is a strong trend in
cloud computing away from relational databases, incidentally.
Google's BigTable, Amazon's SimpleDB, and Microsoft's unannounced
cloud database service all abandon the relational model for different
single table, complex record models." True, but I disagree with the
reason you give for this, i.e. that relational stuff won't scale. In
my view most current web apps (a web shop, a Facebook, a Twitter, a
Google, etc.) simply don't need a relational database; the needs of
the app designer are often better served by the "single table,
complex record" model.
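
To illustrate what I mean by that model, here is a hedged sketch of a
"complex record" as an app designer might store it in a BigTable-like
store (key and field names invented for illustration). One get by key
returns everything the page needs, no joins required:

    # One denormalised record per order instead of normalised
    # orders/items/customers tables joined at query time.
    order_record = {
        "key": "order:8831",
        "customer": {"name": "J. Smith", "email": "jsmith@example.com"},
        "items": [
            {"sku": "A-100", "qty": 2, "price": 9.95},
            {"sku": "B-220", "qty": 1, "price": 24.50},
        ],
        "status": "shipped",
    }
    # store.get("order:8831") would return the whole record, and the
    # record can live on whichever node its key hashes to.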

"Existing relational products are designed around single node
technology that has neither the reliability nor scalability these
guys require." For the purposes of this discussion, that statement is
too absolute. Of the sites that do use or need a relational backend,
most engineers report that the database was initially a bottleneck,
but that it could be solved via set of common techniques (caching,
sharding, master-slave replication, etc.)
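
As a rough sketch of the usual medicine (host names and shard layout
are made up for illustration): shard by user id, send writes to the
master of that shard and reads to one of its slaves:

    import random

    SHARDS = [
        {"master": "db-m0", "slaves": ["db-s0a", "db-s0b"]},
        {"master": "db-m1", "slaves": ["db-s1a", "db-s1b"]},
    ]

    def shard_for(user_id):
        # Stable mapping from (integer) user id to shard.
        return SHARDS[user_id % len(SHARDS)]

    def host_for(user_id, write=False):
        shard = shard_for(user_id)
        if write:
            return shard["master"]             # all writes go to the shard master
        return random.choice(shard["slaves"])  # spread reads over the slaves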

Take the example of salesforce.com, which really needs a true
relational backend and has solved the scalability issue using
off-the-shelf technology. Salesforce.com uses an Oracle RAC cluster
built around regular "lintel" hardware. There is no magic in RAC:
essentially it is Firebird Classic with a distributed lock manager
and cache fusion (i.e. when a cached page is written to, the caches
on other machines are not invalidated but have the updated page
pushed to them over a high-speed network link).
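
A toy model of that cache fusion idea (classes and names invented for
illustration): on a write, the owning node doesn't tell its peers
"forget page X", it ships the fresh page image to them over the
interconnect:

    class Node:
        def __init__(self):
            self.cache = {}   # page id -> page image
            self.peers = []   # other nodes on the high-speed link

        def write_page(self, page_id, image):
            self.cache[page_id] = image
            for peer in self.peers:
                if page_id in peer.cache:
                    # Push the new image instead of invalidating.
                    peer.cache[page_id] = image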

So, the challenge isn't so much in figuring out how to make it work;
the challenge is in making scaling out so simple that even a school
leaver can do it right, thus pushing the TCO of a scalable web app
down to that of a simple LAMP system (i.e. about $15/yr per user
seat). If NimbusDB scales out to hundreds of standard lintel boxes
whilst being as simple as Firebird to install and operate, you've
cracked the problem.

"There will aways be a point of diminishing returns where the delta
increase in replication cost cancels the benefit of an additional
node. But if this point happens at a hundred nodes, we will still see
a 100X gain in performance over a single node system." You probably
don't mean this literally, but if you do, I would challenge the
arithmetic.
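
To show why, take one simple (invented) model: each node spends a
fraction c of its capacity per peer on replication, so useful
throughput is T(N) = N * (1 - c*(N-1)) and the marginal gain of an
extra node hits zero at roughly N = 1/(2c):

    c = 0.005  # chosen so diminishing returns set in around 100 nodes

    def throughput(n):
        return n * (1 - c * (n - 1))

    print(throughput(1))    # 1.0   -> single-node baseline
    print(throughput(100))  # 50.5  -> roughly 50X, not 100X

Under such a model, hitting the point of diminishing returns at a
hundred nodes gives you closer to 50X than 100X.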

Finally, let's turn to database design choices. What I understand
about the NimbusDB concept comes from these quotes: "No, it isn't
remotely similar to either. For example, nodes that do SQL don't do
disk I/O, nodes that do disk I/O don't do SQL, and all files are
written once and never updated." and "Pages: Just say no."

The first statement is interesting, but seems beside the point of
scalability. All relational databases have been internally divided
along those lines since their inception; e.g. in Firebird the code
above 'looper' is virtually independent from the code below it. Of
all the databases I know, SQLite seems to be architected the most
cleanly in this area, with a well-defined virtual machine separating
the two halves. Still, it is a cheap way of getting from X to ~2X
performance.
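
SQLite makes that split easy to see from Python: the SQL half compiles
a statement into opcodes for its virtual machine (the VDBE), and only
the cursor opcodes reach down into the pager/B-tree half. EXPLAIN
prints the compiled program:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    for row in con.execute("EXPLAIN SELECT a FROM t WHERE b = 'x'"):
        # Each row is (addr, opcode, p1, p2, p3, p4, p5, comment).
        print(row)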

The bit about files being written once and never updated is more
interesting. This feature is found in systems like BigTable as well,
and the choice is driven by caching. The big challenge with caching
is not the caching itself (just throw memcached at the problem) but
invalidating the cache(s) when the underlying data changes.
Write-once is a clean solution to that invalidation problem.
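
A minimal sketch of why write-once sidesteps invalidation (the store
layout is invented for illustration): if a record version is
immutable, a cache entry keyed by (record id, version) can never go
stale; an update just creates a new version under a new key:

    store = []    # append-only log of (record_id, version, value)
    latest = {}   # record_id -> current version number

    def put(record_id, value):
        version = latest.get(record_id, 0) + 1
        store.append((record_id, version, value))  # never overwrite
        latest[record_id] = version
        return version

    def cache_key(record_id):
        # Old cache entries are simply never asked for again.
        return (record_id, latest[record_id])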

"Write once & only" perhaps seems weird at first glance, but it is
not so hard and working GPL'ed code exists:
http://www.primebase.org/download/ See previous post for a link to
the architecture whitepaper of PBXT.

The "pages: just say no" seems to fit in with the above. A write once
& only system doesn't need pages for storage management. Also, in a
distributed version of this design it would probably be best
to "ship" update logs of individual records, not of pages, to the
target machine. The target machine would be found through some sort
of stable hash, similar to memcached.
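
Here is a rough sketch of that "stable hash" idea (server names and
the number of points per server are made up): hash each server onto a
ring many times, hash the record key onto the same ring, and ship the
update log to the first server clockwise from the key. Adding a box
only moves the keys that fall between it and its neighbour:

    import bisect, hashlib

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, servers, points=100):
            self.ring = sorted((h("%s#%d" % (s, i)), s)
                               for s in servers for i in range(points))
            self.hashes = [k for k, _ in self.ring]

        def node_for(self, record_key):
            i = bisect.bisect(self.hashes, h(record_key)) % len(self.ring)
            return self.ring[i][1]

    ring = Ring(["box1", "box2", "box3"])
    print(ring.node_for("order:8831"))  # the box that gets this update log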

In such a design it would indeed be possible to scale a system by
simply adding boxes, with very little scaling engineering skill
required of the DBA/app designer.

Your other design ideas for NimbusDB seem to be unrelated to the
scalability issue and a continuation of ideas floating around in
other systems:
- semantic data model: seems an extension of concepts in Postgres
- unbounded types: a slight regression from SQLite's manifest typing
- utf8/collations: a slight regression from SQLite's design
- multiple requests per round trip: wasn't that in the HSQLDB HTTP
protocol?

All in all, you seem to have a great idea for an easy to use, highly
scalable database. As I understand it, it is more a continuation of
ideas & concepts that have been brewing for years and not a total
break with the past. However, perhaps I'm just not getting it.

Looking forward to your clarification & rebuttal :^) Other community
members: feel free to join in with views & observations.

Paul