Subject | Re: [ib-support] Serialization? |
---|---|
Author | Matteo Giacomazzi |
Post date | 2002-07-05T15:42:22Z |
Hi Martijn,
Friday, July 05, 2002, you wrote:
MT> There are many ways to solve such a problem. One way to make sure
MT> that only ONE thread is downloading from a single host is simply
MT> putting a constraint on the host name and continuing to the next
MT> host if you cannot insert into the table - that way you know that
MT> some other thread is already searching that host.
Oh, sorry, I guess I've been misunderstood.
When a thread gets a document, it parses it and extracts all the links
it contains. Then it inserts those links into the database for later
retrieval.
So the fact that a host has been inserted into the table doesn't mean
that a thread is retrieving documents from it! It only means that one
or more URLs point to that host.
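A minimal sketch of that flow, using Python's sqlite3 module only so it runs standalone. The HOST and URL column names, `store_links` and the sample URLs are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (InterBase would need its own way of skipping duplicate keys). What it shows: storing links registers each host once, and every host stays `FREE = 'Y'` because no thread has claimed it yet.

```python
import sqlite3

# Hypothetical schema (not from the post): one row per host, FREE = 'Y'
# meaning no thread is fetching from it, plus one row per queued URL.
SCHEMA = """
CREATE TABLE HOST (
    HOST_NAME TEXT PRIMARY KEY,
    FREE      TEXT NOT NULL DEFAULT 'Y'
);
CREATE TABLE URL (
    URL       TEXT PRIMARY KEY,
    HOST_NAME TEXT NOT NULL REFERENCES HOST(HOST_NAME)
);
"""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def store_links(conn, links):
    """Queue (url, host) pairs extracted from a parsed document."""
    with conn:  # commit the whole batch as one transaction
        for url, host in links:
            # Registering a host only records that URLs exist on it;
            # it stays FREE = 'Y' until some thread claims it.
            conn.execute("INSERT OR IGNORE INTO HOST (HOST_NAME) VALUES (?)", (host,))
            conn.execute("INSERT OR IGNORE INTO URL (URL, HOST_NAME) VALUES (?, ?)",
                         (url, host))

conn = open_db()
store_links(conn, [
    ("http://a.example/1", "a.example"),
    ("http://a.example/2", "a.example"),
    ("http://b.example/", "b.example"),
])
print(conn.execute("SELECT HOST_NAME, FREE FROM HOST ORDER BY HOST_NAME").fetchall())
# [('a.example', 'Y'), ('b.example', 'Y')]
```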
That's the source of my problem: a URL can be used for retrieval only
if its HOST is marked as "free". But if two threads try to get a URL
at the same time, both of them can update the HOST row to mark it as
"no longer free" without any error! That's because both may have
found it free with the first query.
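The race can be replayed deterministically. This sketch runs the two threads' steps by hand on a single sqlite3 connection instead of using real threads; the table, `find_free_host` and `mark_busy` are hypothetical. Because the UPDATE does not re-check FREE, both "threads" end up believing they own the same host.

```python
import sqlite3

# Minimal stand-in table (hypothetical) with one host currently free.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HOST (HOST_NAME TEXT PRIMARY KEY, FREE TEXT NOT NULL)")
conn.execute("INSERT INTO HOST VALUES ('a.example', 'Y')")
conn.commit()

def find_free_host(conn):
    """Step 1 of check-then-act: read a host that looks free."""
    row = conn.execute("SELECT HOST_NAME FROM HOST WHERE FREE = 'Y'").fetchone()
    return row[0] if row else None

def mark_busy(conn, host):
    """Step 2: unconditional update; it 'succeeds' even if someone got there first."""
    conn.execute("UPDATE HOST SET FREE = 'N' WHERE HOST_NAME = ?", (host,))
    conn.commit()
    return True

# Unlucky interleaving, replayed by hand: both threads run step 1
# before either of them runs step 2.
host_a = find_free_host(conn)  # thread A sees 'a.example' as free
host_b = find_free_host(conn)  # thread B also sees it as free
claimed_by_a = mark_busy(conn, host_a)
claimed_by_b = mark_busy(conn, host_b)

print(host_a, host_b, claimed_by_a, claimed_by_b)  # a.example a.example True True
```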
Maybe I can solve the problem with a SELECT ... FOR UPDATE?
SELECT whatever_I_need
FROM HOST
WHERE FREE = 'Y'
FOR UPDATE
Could it work?
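A different approach, swapped in here rather than proposed in the thread, avoids depending on what FOR UPDATE actually locks: let one conditional UPDATE do both the check and the claim, then look at how many rows it changed. The sketch uses Python's sqlite3 purely so it runs anywhere; the table, `claim_host` and `release_host` are hypothetical, and how InterBase reports the affected-row count would need checking separately.

```python
import sqlite3

# Minimal stand-in table (hypothetical); SQLite is used only so the sketch runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HOST (HOST_NAME TEXT PRIMARY KEY, FREE TEXT NOT NULL)")
conn.execute("INSERT INTO HOST VALUES ('a.example', 'Y')")
conn.commit()

def claim_host(conn, host):
    """Flip FREE from 'Y' to 'N' in one statement; True only for the caller that did it."""
    cur = conn.execute(
        "UPDATE HOST SET FREE = 'N' WHERE HOST_NAME = ? AND FREE = 'Y'", (host,))
    conn.commit()
    # The FREE = 'Y' test and the write happen in the same UPDATE, so a
    # later caller finds no matching row and rowcount is 0.
    return cur.rowcount == 1

def release_host(conn, host):
    """Mark the host free again once the thread has finished with it."""
    conn.execute("UPDATE HOST SET FREE = 'Y' WHERE HOST_NAME = ?", (host,))
    conn.commit()

# Unlucky interleaving replayed by hand: both threads saw the host as free...
seen_a = conn.execute("SELECT HOST_NAME FROM HOST WHERE FREE = 'Y'").fetchone()[0]
seen_b = conn.execute("SELECT HOST_NAME FROM HOST WHERE FREE = 'Y'").fetchone()[0]
# ...but only one of them can win the claim.
won_a = claim_host(conn, seen_a)
won_b = claim_host(conn, seen_b)
print(won_a, won_b)  # True False: thread B should move on to another host

release_host(conn, seen_a)
print(claim_host(conn, seen_b))  # True: the host is free again after A released it
```

The design choice is that no thread ever acts on what an earlier SELECT told it; the SELECT only suggests a candidate, and the UPDATE's row count decides who actually got it.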
Kind regards,
--
Matteo
mailto:matteo.giacomazzi@...
ICQ# 24075529