Subject: "Check out" procedure on table for multiple clients
Author: blueridge77772000
Post date: 2005-09-04T16:53:02Z
This question probably has a horrifically simple answer but I'm having
difficulty conceptualizing the proper way to handle it.
Here's an analogy to keep things simple:
The job is to spellcheck books. A master table keeps a list of
authors (assume uniqueness). A second table contains a record for
each book title and a blob containing the text of each book
(author-to-books is simply one-to-many, so each book record also
carries a key back to the author in the master table).
A client needs to "check out" an author and process all books by that
author as a batch (i.e., held in the client's memory). The reason for
the author-level "granularity" is that some of the processing depends
on data from the other books by the same author in the batch.
After spell checking is completed on all books by the author, an
update will be performed to revise each book record's blob with the
corrected text. When all books from a given author have been
committed, a new author from the master table will be selected by the
client.
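In other words, each client's cycle would look roughly like this (the
table and column names are just invented for the analogy):

SELECT BOOK_ID, BOOK_TEXT
FROM BOOKS
WHERE AUTHOR_ID = :author_id;

/* spellcheck the whole batch in client memory, then for each book: */
UPDATE BOOKS
SET BOOK_TEXT = :corrected_text
WHERE BOOK_ID = :book_id;

COMMIT;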
To start, five clients will be processing at the same time. The
check-out process means there needs to be a way of flagging the
author's record so that each client is guaranteed not to duplicate
work already checked out by another client. Processing time will vary
by author; Dostoevsky will take longer to process than Dr. Seuss.
Whatever the timing, though, the process needs to be bulletproof:
there must be no chance of two clients selecting the same author,
even when their requests are simultaneous.
The easy way out would be to break up the table of authors and
assign them to the machines up front. However, I need to be able
to add clients as time goes by. Ideally, a new machine would be put
online and jump into the fray by just selecting the next unprocessed
author from the master table.
(Btw, in reality each "book" in my analogy will average a few hundred
bytes of non-text data, each will take about 4 seconds to process,
and there are 30 million records in the "books" table.)
Is the proper way to handle this from the client or server side?
From the client side, is it simply a matter of adding a checkedout
field to the author table, combined with an update/select statement
inside a transaction to provide locking? Or is it better handled
server side, with Firebird's stored procedure/trigger functionality
and a generator that doles out the next author upon a client's
request? Rough sketches of both ideas follow.
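For the client-side idea, I picture something like this, run in its
own short transaction (checkedout being a flag column I'd add to the
author table):

UPDATE AUTHORS
SET CHECKEDOUT = 1
WHERE AUTHOR_ID = :candidate_id
AND CHECKEDOUT = 0;

COMMIT;

The hope being that if two clients grab the same candidate at once,
the second one's UPDATE either hits a lock conflict or touches zero
rows, and that client just retries with the next author. Whether that
is actually bulletproof under Firebird's concurrency model is what I
can't quite convince myself of.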
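For the stored procedure/generator idea, I imagine something along
these lines (GEN_AUTHOR_QUEUE is a generator I'd create, and this
assumes the authors are numbered consecutively by AUTHOR_ID):

CREATE GENERATOR GEN_AUTHOR_QUEUE;

SET TERM ^ ;
CREATE PROCEDURE CHECKOUT_NEXT_AUTHOR
RETURNS (AUTHOR_ID INTEGER)
AS
BEGIN
  /* generators increment outside transaction control,
     so no two callers should ever get the same value */
  AUTHOR_ID = GEN_ID(GEN_AUTHOR_QUEUE, 1);
  SUSPEND;
END ^
SET TERM ; ^

Each client would then claim its next author with
SELECT AUTHOR_ID FROM CHECKOUT_NEXT_AUTHOR.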
Any guidance appreciated.
Steve