Subject: Re: [Firebird-Architect] Databases on NFS shares
Author: Paul Beach
Post date: 2003-06-16T07:35:32Z
FYI
<<As of this date (Tue Nov 20 1990), we do not use, or work with, NFS
so much as we *work around* its deficiencies. Here's why:
1. While NFS provides a way for machines to access remote files as if
they were local, it does not provide a way to *coordinate* that
access so that multiple writers don't clobber each other's work. There
is a facility called the Lock Daemon, but it is inadequate for InterBase's
needs.
At the very least, it does not provide the different locking
modes that we need. So, we have our own lock manager that doles out
locks on database files, relations in the databases, and so forth. Since
this process must run on the node where the database resides, application
code must connect to it via TCP. Having gone that far, it seems reasonable
to just put a request server on that same node, and have it do all the
access *and* talk with the local lock manager.
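For comparison, the only thing POSIX-style advisory locking -- the
interface the lock daemon arbitrates -- can express is a shared or an
exclusive byte-range lock. A minimal C sketch, purely illustrative and
not InterBase code:

#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>

/* POSIX advisory locking, the mechanism the NFS lock daemon (lockd)
 * arbitrates. There are only two lock types, F_RDLCK (shared) and
 * F_WRLCK (exclusive); a database lock manager typically needs richer
 * modes, which is one reason this interface falls short here. */
int lock_range(int fd, off_t start, off_t len, int exclusive)
{
    struct flock fl;
    fl.l_type   = exclusive ? F_WRLCK : F_RDLCK;  /* only two choices */
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;
    /* F_SETLK fails with EACCES or EAGAIN if another process already
     * holds an incompatible lock on the range. */
    return fcntl(fd, F_SETLK, &fl);
}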
As of this date, research and development has been done on a "page/lock
manager" that would respond to requests for locks and pages but do no
actual relational requests. For now, work on that has been stopped.
It's entirely possible that in the future, a page/lock server, or a
lock server, may be made available for use in certain circumstances.
2. The internal structure or format of the database -- what we call the
On-Disk Structure, or ODS -- depends on the architecture of the machine
that is accessing the database. This is primarily because different machines
have different alignment requirements. For example, the Sun 3 is fairly
flexible in this regard, while the Sparc has quite stringent requirements.
Data that is closely packed in the database file can be loaded directly
into memory on a Sun 3, but the same data packed the same way and loaded
directly into memory on a Sparc would cause alignment faults (bus errors). So, a database
created by a Sun 3 cannot be read (or written!) by a Sparc, and vice versa.
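The hazard is easy to illustrate in C (illustrative only, not InterBase
code): reading a 4-byte integer out of a tightly packed buffer by
pointer cast assumes the CPU tolerates misaligned loads.

#include <stdint.h>
#include <string.h>

/* Reading a packed 32-bit value: the cast works on a lenient machine
 * such as a Sun 3, but on a strict-alignment machine such as a Sparc
 * it raises a bus error (SIGBUS) when buf is not 4-byte aligned. */
uint32_t read_u32_unsafe(const unsigned char *buf)
{
    return *(const uint32_t *)buf;      /* faults on strict CPUs */
}

/* The portable alternative copies the bytes instead, at some cost. */
uint32_t read_u32_portable(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);          /* safe at any alignment */
    return v;
}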
To guarantee that the correct access method always accesses a particular
database, if we notice that a database file is located in an NFS-mounted
partition, we instead establish a TCP connection to gds_inet_server on the
node where the database actually resides, and that server will know how
to read and write that database.
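In modern Linux terms the check might look like the sketch below; the
actual 1990 mechanism would have differed, and statfs and
NFS_SUPER_MAGIC here are the Linux spellings.

#include <sys/vfs.h>        /* statfs */
#include <linux/magic.h>    /* NFS_SUPER_MAGIC */

/* Return 1 if the file lives on an NFS mount, 0 if not, -1 if we
 * cannot tell. A positive answer means: do not touch the file
 * directly, connect to the server on the node that holds it. */
static int path_is_on_nfs(const char *path)
{
    struct statfs sb;
    if (statfs(path, &sb) != 0)
        return -1;
    return sb.f_type == NFS_SUPER_MAGIC;
}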
Of course, in a truly homogeneous NFS environment, this would not be
a problem, and this reason for bypassing NFS would be irrelevant.
However, since we cannot know a priori that this is so, we uniformly
use this approach.
3. Direct access to NFS-mounted remote databases would in many, if not most,
cases be inefficient. Consider two nodes, A and B, with a database
located on node B whose file is accessible on node A by means of NFS.
Suppose that the database is large -- perhaps 100,000 records of 100 bytes
apiece. That's at least 10,000,000 bytes (10 MB). Now suppose that the request is
something like
SELECT A, B FROM R WHERE R.A < 500
and suppose that this represents only 1% of all the records,
and only 20 bytes of each record that matches.
To get those 20,000 bytes, the access method will have to read *the entire
10 megabytes of the database* each time this request is issued.
Those 10 MB will have to be transferred over the Ethernet (or whatever
network underlies TCP), clogging up the network at all levels.
Then node A will have to do 10 MB worth of network I/O to receive it,
and still have to crunch through all 10 MB to find the 1,000 matching records
and extract the 20 bytes the user wanted from each one. What a waste!
Instead, by going through a request-based server, a small request (under 1 KB)
is sent from node A to the server on node B. The server on node B does
the grunt work of fishing through the entire database, taking advantage
of efficient local disk-block caching, to find the 1,000 matching records
and return from each one the 20 bytes the user asked for!
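Spelling out the arithmetic (every figure is the one given above):

#include <stdio.h>

int main(void)
{
    long records      = 100000;                   /* 100,000 records  */
    long record_bytes = 100;                      /* 100 bytes apiece */
    long db_bytes     = records * record_bytes;   /* 10,000,000       */

    long matches      = records / 100;            /* 1% -> 1,000 rows */
    long result_bytes = matches * 20;             /* 20 bytes each    */

    printf("via NFS:    %ld bytes cross the network\n", db_bytes);
    printf("via server: %ld bytes (plus a request under 1 KB)\n",
           result_bytes);
    printf("ratio:      %ldx\n", db_bytes / result_bytes);  /* 500x */
    return 0;
}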
So, direct db access via NFS would be enormously wasteful of network
bandwidth and of machine resources. Using a server avoids this.
A side benefit is that the client machines, which are probably many,
need not be very powerful and hence not very expensive, because
none of them does much of the hard work. Only the server
machine(s), of which there are probably few, need to be powerful and
hence expensive. If direct NFS access were allowed, they would *all*
have to be powerful enough to do the grunt work of accessing the database
directly.
Of course, this method *does* result in a potential bottleneck,
since *all* the grunt work is being done by the hapless machine on which
the server is running. For this reason (among others), the alternative
server architectures described above may some day be resurrected.>>
Regards
Paul