Subject | Re: Cloud databases
---|---
Author | paulruizendaal
Post date | 2008-07-25T10:13:21Z
Thanks Roman, that was very helpful. I like the architectural
elegance of combining a lean GCS with a lean database into a scalable
system. I guess I am johnny-come-lately with that view.
All,
Below are some thoughts on performance.
It was suggested to me that Spread could do total ordering for 10-20
nodes in well under a millisecond on common PC hardware.
Say it is 0.2 ms. That means it can handle up to 5,000 write
transactions per second, if the member nodes can keep up. If it is 1
ms, it still gets to a thousand. If a single-node system can do 100,
the scale-out factor would be 10 to 50. If we have one GCS server per
5 nodes, we would scale out to 25 to 100 nodes before the GCS maxes
out. As the Kemme2000 paper showed good scaling out to 15 nodes, I
tend to believe the 0.2 ms more than the 1 ms.
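As a quick sanity check of that arithmetic, here is a small Python
sketch; the two latencies and the 100 writes/second single-node rate
are just the figures assumed above, not measurements:

```python
# Back-of-the-envelope arithmetic for GCS-bound write throughput.
# The latencies and the single-node rate are assumed figures from the
# discussion above, not benchmark results.

def max_writes_per_second(ordering_latency_ms):
    """Upper bound on totally ordered write transactions per second."""
    return 1000.0 / ordering_latency_ms

single_node_tps = 100  # assumed single-node write rate

for latency_ms in (0.2, 1.0):
    gcs_tps = max_writes_per_second(latency_ms)   # 5,000 or 1,000
    scale_out = gcs_tps / single_node_tps         # 50 or 10
    print(f"{latency_ms} ms ordering -> {gcs_tps:.0f} writes/s, "
          f"scale-out factor {scale_out:.0f}")
```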
Note that the above is for write transactions; read transactions
would scale out linearly, much like in a MySQL master-slave setup.
Keeping up at the disk end is harder. If we take a write-everywhere
approach on a cloud with 1,000 write transactions per second, and an
average diff size of 1 KB, we need to send 1 MB per second to the
disk. Easy for a continuous write, but quite hard if the writes are
random. If a single diff takes 2 seeks to write and the average seek
time is 5 ms, we get stuck at 100 write transactions per second.
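The same back-of-the-envelope sums for the disk side, again using
the assumed figures from the paragraph above (2 seeks per diff, 5 ms
average seek, 1 KB average diff, 1,000 writes/second cloud-wide):

```python
# Seek-limited write throughput versus the sequential bandwidth needed.
# All numbers are the assumed figures from the paragraph above.

seeks_per_diff = 2
seek_time_s = 0.005                     # 5 ms average seek
diff_size_bytes = 1024                  # ~1 KB average diff
cloud_write_tps = 1000                  # assumed cloud-wide write rate

time_per_diff_s = seeks_per_diff * seek_time_s        # 10 ms per diff
random_write_tps = 1 / time_per_diff_s                # ~100 diffs/s per disk
bandwidth_bytes_s = cloud_write_tps * diff_size_bytes # ~1 MB/s if sequential

print(f"seek-bound random writes: ~{random_write_tps:.0f} diffs/s per disk")
print(f"sequential bandwidth needed at {cloud_write_tps} tps: "
      f"~{bandwidth_bytes_s / 1e6:.1f} MB/s")
```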
For a single cloud of Salesforce (it runs about 10 instances), the
update rate is more likely to be close to 1,000 updates/second than
to 100. A solution might be to save up all changes for, say, a
10-second interval and write them out as continuously as possible.
Another solution is to partition the data, but that complicates the
architecture and lengthens many code paths.
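A minimal sketch of the first idea, batching diffs in memory and
flushing them as one sequential append; the 10-second interval and
the append-only log file are illustrative assumptions, and a crash
would lose up to one interval of changes unless the other replicas
can re-send them:

```python
import time

class BatchedDiffWriter:
    """Accumulate diffs in memory, then flush them as one sequential append."""

    def __init__(self, path, flush_interval_s=10.0):
        self.path = path
        self.flush_interval_s = flush_interval_s
        self.pending = []                       # diffs buffered in memory
        self.last_flush = time.monotonic()

    def write(self, diff: bytes):
        self.pending.append(diff)
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One large sequential append instead of many small random writes.
        with open(self.path, "ab") as f:
            f.write(b"".join(self.pending))
            f.flush()
        self.pending.clear()
        self.last_flush = time.monotonic()
```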
Flash-based SSDs are not a solution, as their random write speeds
tend to be as low as those of conventional disks, if not worse.
Paul