Subject | Re: Very large BLOB-dominant databases |
---|---|
Author | sboydlns |
Post date | 2004-10-19T15:41:38Z |
I have implement a document imaging system with requirements very
similar to yours. I went for option #3, the all firebird route.
Overall I have been please with the result. Image blobs run from 30K
to about 2MB. Respone time is certainly adequate. I have had
occasional database corruption problems but these can usually be
traced back to end user stupidity. These system are installed in
small companies with no IT staff and not even a real computer room so
the systems are prone to all kinds of abuse. I had to write my own
icremental backup routines though. The largest DB right now is in the
200MB range and it is impractical to back up that much data every day.
--- In Firebird-general@yahoogroups.com, Robby Dermody <robbyd@p...>
wrote:
similar to yours. I went for option #3, the all firebird route.
Overall I have been please with the result. Image blobs run from 30K
to about 2MB. Respone time is certainly adequate. I have had
occasional database corruption problems but these can usually be
traced back to end user stupidity. These system are installed in
small companies with no IT staff and not even a real computer room so
the systems are prone to all kinds of abuse. I had to write my own
icremental backup routines though. The largest DB right now is in the
200MB range and it is impractical to back up that much data every day.
--- In Firebird-general@yahoogroups.com, Robby Dermody <robbyd@p...>
wrote:
> Hey guys, I hope this is the appropriate list for this question. :)storage.
>
> I was wondering if anyone on this list has heavily made use of BLOBs in
> Firebird. I have a requirement of the construction of a voice audio file
> logging and storage system. Individual audio files will compressed (ogg
> or G.729a) and be probably in the range of 400KB - 3MB. The system must
> be able to scale up to offer storage of hundreds of thousand of these
> files. To do this, I was considering the following 3 possibilities:
>
> 1. Store the files straight on a database, using a journaling filesystem
> that deals with small files pretty well such as ReiserFS. Files will be
> stored in a hierarchy that allows quick traversal to the desired file by
> the file's recorded date. Other methods of seeking (i.e. the entity ID
> that recorded the file) could be kept in a separate filesystem hierarchy
> and symlinked appropriately. This solution leads much to be desired, and
> is suboptimal due to the potential for external misuse/abuse, potential
> complexity of the directory structure, and lack of true metadata
>
> 2. This solution would use Firebird (or some other RDBMS) in conjunction
> with straight files on a file system. Metadata would be stored on the
> database, and each DB record would reference the path and filename of
> where the actual data on the filesystem is stored. This approach still
> has multiple problems, one of the biggest being that the data on the
> disk and the data on the database can get out of synch and/or change
> independently of each other, with no strict versioning/transaction
> control between them without something being bolted on (i.e. extra
> work). However, with this approach over #1, we gain the ability to use
> the richness of SQL to query the metadata. (A BIG plus.)
>
> 3. The 3rd solution would use Firebird all the way. Each column would
> store both the metadata for a recorded entry, and the audio data itself.
> Like #2, this allows richness of metadata (i.e. can handle how many
> times the audio file has been requested, top requesters, etc) along with
> the ability to query this metadata, good versioning support and data
> integrity controls. The area that I wonder about is how much slower this
> implementation would be in real use to #1 or #2, and if Firebird is
> really up to the task of handling so much data. The fact sheet says that
> it can potentially have databases of up to 64TB, and I would stick the
> database files on something that is good for handling large files, such
> as XFS. Ultimately data integrity is much more important to me than
> speed. The read-write ratio will be around 1:5 (one select for every 5
> inserts). If Firebird can be pretty speedy about BLOB serialization and
> BLOB retrieval with databases that could reach hundreds of gigabytes in
> size, and not really tax the system, this sounds like a good solution.
>
> As it is now, I'm thinking #3 is the way to go....any input?
>
> Thanks,
>
> Robby