Subject: RE: [Firebird-general] Re: Support for Tablespaces/Data-placement in Firebird?
Author: Steve Summers

From: Firebird-general@yahoogroups.com [mailto:Firebird-general@yahoogroups.com] On Behalf Of Alexandre Benson Smith
Sent: Friday, September 11, 2009 03:07 PM
To: Firebird-general@yahoogroups.com
Subject: Re: [Firebird-general] Re: Support for Tablespaces/Data-placement in Firebird?

Paul,

I never did such an experiment, but I will try to explain what I think
about it. Of course, the text below is a very simplistic way to present
the idea; forgive me, FB developers, for simplifying your hard work so much :)
<snip>

I've spent a lot of time studying these issues to try to maximize the
performance of large queries for our customers. Here's what I've found from
reading lots of articles (although that was a couple of years ago, so my
facts may have drifted away from reality just a bit :).



The issue really isn't throughput (although that matters in other ways) -
it's seek time.



Consider this scenario: two users issue queries at the exact same moment -
queries that take several seconds to pull all the necessary data.

Because both queries are running simultaneously, in separate instances (I'm
assuming Classic architecture), the operating system is going to make them
take turns. So instead of each query obtaining the index page pointer list
and then accessing the needed pages sequentially, from the same area of the
disk, the two queries will alternate between accessing data from what
could be widely separated parts of the disk, spending a lot more time
moving the read head than if they were not alternating.
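
To put rough numbers on that, here's a quick back-of-envelope sketch in
Python. The per-read costs (2ms for a long head movement, 0.1ms for a
nearby page) and the page count are illustrative assumptions, not
measurements:

    # Rough model of why two interleaved scans cost more than two
    # back-to-back scans. All numbers are illustrative assumptions.
    SEEK_MS = 2.0    # assumed cost of a long head movement
    NEAR_MS = 0.1    # assumed cost when the next page is nearby

    def scan_cost_ms(pages, interleaved):
        # Interleaved reads pay a full seek almost every time; a lone
        # sequential scan mostly pays the cheap near-track cost.
        return pages * (SEEK_MS if interleaved else NEAR_MS)

    pages = 5000  # pages each query touches (assumption)
    print(f"back-to-back: {2 * scan_cost_ms(pages, False) / 1000:.0f}s")  # ~1s
    print(f"interleaved:  {2 * scan_cost_ms(pages, True) / 1000:.0f}s")   # ~20s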



This is where the solid state (flash) drive is valuable.



Hard drives rate seek time as the time it takes to move the read head across
1/3 of the disk surface. Typical values are 8-12 milliseconds. In the real
world, seeks rarely move that far: disk drives have several surfaces,
defrag software tries to keep parts of a file close together, and your
puny 10GB database file only takes up 1/50 of your 500GB, $65 hard drive
anyway. So maybe the real-world time is more like 2 milliseconds. But there
can still be a lot of seeks, and they add up.
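
For a sense of how fast they add up (the page-read count here is
hypothetical):

    # Even at an optimistic ~2ms per real-world seek, seeks add up fast.
    seek_ms = 2.0        # assumed real-world average seek
    page_reads = 10_000  # hypothetical random page reads for one query
    print(f"{page_reads} seeks x {seek_ms}ms = {page_reads * seek_ms / 1000:.0f}s")
    # -> 10000 seeks x 2.0ms = 20s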



There's also the issue of rotational latency: on average, the sector
you want is halfway around the disk when you want it. A 7200 RPM drive
spins 120 times per second, so you wait an average of 1/240th of a second
(about 4ms) for the data if it's truly random access. It generally isn't -
Firebird sorts pages and seeks in page order (as I understand it, anyway),
so it's likely the next sector you want will be the one coming up next.
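
The 4ms figure falls straight out of the geometry:

    # Average rotational latency: on average the sector you want is
    # half a revolution away when the head arrives.
    rpm = 7200
    revs_per_sec = rpm / 60                      # 120 revolutions per second
    avg_latency_ms = 1000 * 0.5 / revs_per_sec   # half a revolution
    print(f"{avg_latency_ms:.2f}ms")             # ~4.17ms, i.e. 1/240th of a second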



In contrast, SSDs rate seek time as the time it takes for the controller to
figure out where in the flash memory cells your "sector" is stored, and
typical values are 1/10 of a millisecond or less. So even assuming an
unfragmented file occupying a small fraction of the disk, the SSD is still
at least 20 times faster than the magnetic drive, and counting rotational
latency, it could be 100+ times faster.
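
Putting the two sets of numbers side by side (these are the rough figures
from above, not benchmarks):

    # HDD vs SSD per-access latency, using the rough figures above.
    ssd_ms = 0.1                 # typical flash "seek"
    hdd_best_ms = 2.0            # short real-world seek, no rotation
    hdd_worst_ms = 10.0 + 4.17   # nominal 1/3-stroke seek + rotational latency
    print(f"best case:  {hdd_best_ms / ssd_ms:.0f}x faster")    # ~20x
    print(f"worst case: {hdd_worst_ms / ssd_ms:.0f}x faster")   # ~140x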



Also note that this situation isn't improved by RAID arrays. Although the
effect is small, a RAID array has to wait for the longest seek time among
its drives. Since the drives aren't synchronized, the rotational latency can
be worse than for a single drive. Also, if there's anything that makes one
drive take longer than the others (periodic thermal recalibration, for
example), all of them have to wait, so a RAID array of 4 drives will
experience that extra wasted time 4 times more often.
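
You can see the unsynchronized-rotation effect with a tiny Monte Carlo
sketch (modeling each drive's latency as uniform over one revolution is a
simplifying assumption):

    import random

    # A striped read must wait for the slowest drive. Model each
    # drive's rotational latency as uniform over one revolution
    # (0 to ~8.33ms at 7200 RPM); purely illustrative.
    REV_MS = 60_000 / 7200   # one revolution, ~8.33ms

    def avg_wait_ms(drives, trials=100_000):
        return sum(max(random.uniform(0, REV_MS) for _ in range(drives))
                   for _ in range(trials)) / trials

    print(f"1 drive : {avg_wait_ms(1):.2f}ms")   # ~4.17ms (half a revolution)
    print(f"4 drives: {avg_wait_ms(4):.2f}ms")   # ~6.67ms (4/5 of a revolution)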



Of course, if you have enough RAM, none of this matters. Current Core i7
motherboards can be loaded up with 12GB of RAM for a few hundred
dollars. If your database is 10GB, the whole thing will fit in the OS's disk
cache, and seek time (and transfer speed) becomes irrelevant.



Anyway, the point of all this is that the primary benefit of putting
different parts of the data on different drives is to keep the disk heads
from having to move around as much, minimizing wasted seek time. SSDs
and/or lots of RAM totally eliminate that problem (for reads, anyway; I
won't talk about writes because this is already too long).




