Subject | Very large BLOB-dominant databases |
---|---|
Author | Robby Dermody |
Post date | 2004-10-18T19:05:34Z |
Hey guys, I hope this is the appropriate list for this question. :)
I was wondering if anyone on this list has heavily made use of BLOBs in
Firebird. I have a requirement of the construction of a voice audio file
logging and storage system. Individual audio files will compressed (ogg
or G.729a) and be probably in the range of 400KB - 3MB. The system must
be able to scale up to offer storage of hundreds of thousand of these
files. To do this, I was considering the following 3 possibilities:
1. Store the files straight on a database, using a journaling filesystem
that deals with small files pretty well such as ReiserFS. Files will be
stored in a hierarchy that allows quick traversal to the desired file by
the file's recorded date. Other methods of seeking (i.e. the entity ID
that recorded the file) could be kept in a separate filesystem hierarchy
and symlinked appropriately. This solution leads much to be desired, and
is suboptimal due to the potential for external misuse/abuse, potential
complexity of the directory structure, and lack of true metadata storage.
2. This solution would use Firebird (or some other RDBMS) in conjunction
with straight files on a file system. Metadata would be stored on the
database, and each DB record would reference the path and filename of
where the actual data on the filesystem is stored. This approach still
has multiple problems, one of the biggest being that the data on the
disk and the data on the database can get out of synch and/or change
independently of each other, with no strict versioning/transaction
control between them without something being bolted on (i.e. extra
work). However, with this approach over #1, we gain the ability to use
the richness of SQL to query the metadata. (A BIG plus.)
3. The 3rd solution would use Firebird all the way. Each column would
store both the metadata for a recorded entry, and the audio data itself.
Like #2, this allows richness of metadata (i.e. can handle how many
times the audio file has been requested, top requesters, etc) along with
the ability to query this metadata, good versioning support and data
integrity controls. The area that I wonder about is how much slower this
implementation would be in real use to #1 or #2, and if Firebird is
really up to the task of handling so much data. The fact sheet says that
it can potentially have databases of up to 64TB, and I would stick the
database files on something that is good for handling large files, such
as XFS. Ultimately data integrity is much more important to me than
speed. The read-write ratio will be around 1:5 (one select for every 5
inserts). If Firebird can be pretty speedy about BLOB serialization and
BLOB retrieval with databases that could reach hundreds of gigabytes in
size, and not really tax the system, this sounds like a good solution.
As it is now, I'm thinking #3 is the way to go....any input?
Thanks,
Robby
I was wondering if anyone on this list has heavily made use of BLOBs in
Firebird. I have a requirement of the construction of a voice audio file
logging and storage system. Individual audio files will compressed (ogg
or G.729a) and be probably in the range of 400KB - 3MB. The system must
be able to scale up to offer storage of hundreds of thousand of these
files. To do this, I was considering the following 3 possibilities:
1. Store the files straight on a database, using a journaling filesystem
that deals with small files pretty well such as ReiserFS. Files will be
stored in a hierarchy that allows quick traversal to the desired file by
the file's recorded date. Other methods of seeking (i.e. the entity ID
that recorded the file) could be kept in a separate filesystem hierarchy
and symlinked appropriately. This solution leads much to be desired, and
is suboptimal due to the potential for external misuse/abuse, potential
complexity of the directory structure, and lack of true metadata storage.
2. This solution would use Firebird (or some other RDBMS) in conjunction
with straight files on a file system. Metadata would be stored on the
database, and each DB record would reference the path and filename of
where the actual data on the filesystem is stored. This approach still
has multiple problems, one of the biggest being that the data on the
disk and the data on the database can get out of synch and/or change
independently of each other, with no strict versioning/transaction
control between them without something being bolted on (i.e. extra
work). However, with this approach over #1, we gain the ability to use
the richness of SQL to query the metadata. (A BIG plus.)
3. The 3rd solution would use Firebird all the way. Each column would
store both the metadata for a recorded entry, and the audio data itself.
Like #2, this allows richness of metadata (i.e. can handle how many
times the audio file has been requested, top requesters, etc) along with
the ability to query this metadata, good versioning support and data
integrity controls. The area that I wonder about is how much slower this
implementation would be in real use to #1 or #2, and if Firebird is
really up to the task of handling so much data. The fact sheet says that
it can potentially have databases of up to 64TB, and I would stick the
database files on something that is good for handling large files, such
as XFS. Ultimately data integrity is much more important to me than
speed. The read-write ratio will be around 1:5 (one select for every 5
inserts). If Firebird can be pretty speedy about BLOB serialization and
BLOB retrieval with databases that could reach hundreds of gigabytes in
size, and not really tax the system, this sounds like a good solution.
As it is now, I'm thinking #3 is the way to go....any input?
Thanks,
Robby