Subject | Re: Google-like scoring in databases |
---|---|
Author | Roman Rokytskyy |
Post date | 2003-07-01T16:12:54Z |
> ... The page ranking is a second score that attempts
> to weight the results with regard to what other sites thought
> useful...

Which happens to be the hand-made ranking (even by a vast number of
hands) :)

> Massive number of duplicates is a difficult problem -- search
> refinement is almost always the right solution. Searching the
> IBPhoenix for "blob" or "blobs" is going get a large number of
> duplicates hard to differentiate, the "blob seek" or "blob
> subtypes" or "blob filters" will probably do the trick. But
> even in the case of "blobs", a scoring scheme based on the
> references weighted by the inverse of the word number will
> do a very good just of find general discussion articles.

And you can change the weight depending on the place where you have
found catch words (for example, headline match would get more weight
than content match).
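
A minimal sketch of that kind of position-dependent weighting, using only lexical information. Everything here is an assumption made for illustration: the field names `headline` and `content`, the 3:1 weights, and the reading of "inverse of the word number" as an inverse-document-frequency factor. It is not how ht://dig or any particular engine computes its score.

```python
import math
import re
from collections import Counter

# Hypothetical field weights: a headline hit counts more than a content hit.
FIELD_WEIGHTS = {"headline": 3.0, "content": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Give rare words a higher factor than words found in most documents."""
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        terms = set()
        for field in FIELD_WEIGHTS:
            terms.update(tokenize(doc.get(field, "")))
        doc_freq.update(terms)
    return {term: math.log(1 + n / count) for term, count in doc_freq.items()}


def score(doc, query, idf):
    """Sum field weight * occurrences * rarity factor over all query words."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        for term in tokenize(query):
            total += weight * counts[term] * idf.get(term, 0.0)
    return total


def rank(docs, query):
    """Return the documents ordered from best to worst match."""
    idf = build_idf(docs)
    return sorted(docs, key=lambda d: score(d, query, idf), reverse=True)
```

With two documents that each mention "blob" once, the one that has it in the headline ranks first; no references between documents are involved.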

I think a separate test is needed to see whether we can get pretty good
selectivity/relevance from database indexing using only lexical
information. From my experience integrating ht://dig with the
CoreMedia content management system, you can get good results without
using references between documents.

Does anybody have suggestions regarding the test database?
Best regards,
Roman Rokytskyy