Subject | Re: Google-like scoring in databases |
---|---|
Author | Roman Rokytskyy |
Post date | 2003-07-01T14:40:09Z |
Hi,
> Google is designed to avoid disk seeks whenever possible, and this
> has had a considerable influence on the design of the data
> structures. A lexicon in memory is implemented in two parts: a list
> of words (concatenated together but separated by nulls) and a hash
> table of pointers to doclists.

From my point of view this is an implementation detail. They do store the
lexicon somewhere and then load it into memory. I doubt that they
build it from scratch each time.
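
To make the quoted two-part layout concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how Google implements it: the function names, the use of CRC32 as the hash, and pickle as the storage format are all hypothetical choices. It shows the null-separated word list, a hash table pointing to doclists, and the "store once, load into memory" step.

```python
import pickle
import zlib


def build_lexicon(postings):
    """Build the two-part lexicon from a dict of word -> list of document ids."""
    blob = bytearray()  # part 1: words concatenated together, separated by nulls
    table = {}          # part 2: hash(word) -> [(offset into blob, doclist), ...]
    for word, doclist in postings.items():
        raw = word.encode("utf-8")
        offset = len(blob)
        blob += raw + b"\0"
        table.setdefault(zlib.crc32(raw), []).append((offset, doclist))
    return bytes(blob), table


def word_at(blob, offset):
    """Read the null-terminated word that starts at the given offset."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("utf-8")


def lookup(blob, table, word):
    """Return the doclist for a word, or an empty list if it is unknown."""
    raw = word.encode("utf-8")
    for offset, doclist in table.get(zlib.crc32(raw), []):
        if word_at(blob, offset) == word:  # guard against hash collisions
            return doclist
    return []


def serialize_lexicon(blob, table):
    """Turn the lexicon into bytes that could be kept in a file or a BLOB."""
    return pickle.dumps((blob, table))


def load_lexicon(data):
    """Restore a previously serialized lexicon instead of rebuilding it."""
    return pickle.loads(data)
```

The point of `serialize_lexicon` and `load_lexicon` is the one made above: the structure is built once, kept somewhere, and loaded on startup.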
> The Google search engine has two important features that help it
> produce high precision results. First, it makes use of the link
> structure of the Web to calculate a quality ranking for each web
> page. This ranking is called PageRank. The main goal of Google is to
> improve the quality of web search engines.
>
> The PageRank of a page A is given as follows:
> PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
>
> We assume page A has pages T1...Tn which point to it. The parameter
> d is a damping factor which can be set between 0 and 1. We usually
> set d to 0.85. Also C(A) is defined as the number of links going out
> of page A.

And that was exactly my question. In databases we do not have this
information stored somewhere explicitly (except, probably, in foreign key
relationships).
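
For reference, the quoted formula can be evaluated by simple iteration. The sketch below is a minimal Python illustration, not a proposal for Firebird: the function name `page_rank`, the fixed iteration count, and the idea of feeding it foreign key references as "links" are my assumptions, and the table names in the usage example are hypothetical.

```python
def page_rank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all T that point to A.

    links maps each node to the list of nodes it points to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # C(T): number of links going out of T
    out_count = {n: len(links.get(n, [])) for n in nodes}

    # incoming[A]: the nodes T1..Tn that point to A
    incoming = {n: [] for n in nodes}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    pr = {n: 1.0 for n in nodes}  # arbitrary starting values
    for _ in range(iterations):
        pr = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[a])
            for a in nodes
        }
    return pr


# Hypothetical schema: each entry lists the tables a table references
# through foreign keys, treated here as outgoing "links".
fk_links = {
    "ORDER_ITEMS": ["ORDERS", "PRODUCTS"],
    "ORDERS": ["CUSTOMERS"],
}
ranks = page_rank(fk_links)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 4))
```

With this input, CUSTOMERS ends up ranked highest and ORDER_ITEMS, which nothing references, lowest. Whether such a number says anything useful about the relevance of the text stored in those tables is exactly the open question.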
So I question whether PageRank is crucial for text search in relational
databases. If it is, how are we going to calculate our RelationRank?
If we say that PageRank is crucial and we do not find a way to model
it with relations (our RelationRank), then the whole idea of having
Google-like search in Firebird is questionable: the server will not
"understand" the content, and I suspect that we will not be able to
manipulate it either from within stored procedures and UDFs or from
regular statements.
Best regards,
Roman Rokytskyy