Subject Re: Google-like scoring in databases
Author Roman Rokytskyy
Hi,

> Google is designed to avoid disk seeks whenever possible, and this
> has had a considerable influence on the design of the data
> structures. A lexicon in memory is implemented in two parts: a list
> of words (concatenated together but separated by nulls) and a hash
> table of pointers to doclists.

From my point of view this is an implementation detail. They do store
the lexicon somewhere and then load it into memory. I doubt that they
build it from scratch each time.
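
To illustrate what I mean by "store and load", here is a minimal sketch
in Python. It is only an assumption about how such a lexicon could be
persisted, not how Google does it: the function names, the use of
pickle, and the list index standing in for a "pointer" are all
hypothetical.

```python
import pickle

def serialize_lexicon(index):
    """Pack a word -> doclist mapping into bytes that can be stored on disk."""
    words = sorted(index)
    # Word list: all words concatenated, separated by null bytes.
    blob = b"\0".join(w.encode("utf-8") for w in words)
    # Doclists kept in the same order as the words.
    doclists = [list(index[w]) for w in words]
    return pickle.dumps((blob, doclists))

def load_lexicon(data):
    """Rebuild the in-memory lexicon from stored bytes, without re-indexing."""
    blob, doclists = pickle.loads(data)
    words = blob.decode("utf-8").split("\0") if blob else []
    # Hash table: word -> position of its doclist (stands in for a pointer).
    table = {w: i for i, w in enumerate(words)}
    return table, doclists

def lookup(table, doclists, word):
    """Return the doclist for a word, or an empty list if it is unknown."""
    pos = table.get(word)
    return doclists[pos] if pos is not None else []
```

The bytes returned by serialize_lexicon can be written to a file once;
at startup only load_lexicon has to run.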

> The Google search engine has two important features that help it
> produce high precision results. First, it makes use of the link
> structure of the Web to calculate a quality ranking for each web
> page. This ranking is called PageRank. The main goal of Google is to
> improve the quality of web search engines.
>
> The PageRank of a page A is given as follows:
> PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
>
> We assume page A has pages T1...Tn which point to it. The parameter
> d is a damping factor which can be set between 0 and 1. We usually
> set d to 0.85. Also C(A) is defined as the number of links going out
> of page A.

And that was exactly my question. In databases we do not have this
information stored anywhere explicitly (except, perhaps, in foreign key
relationships).

So I question whether PageRank is crucial for text search in relational
databases. If it is, how are we going to calculate our RelationRank?
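
To make the question concrete, here is a minimal sketch in Python that
applies the quoted formula, PR(A) = (1-d) + d (PR(T1)/C(T1) + ...),
to rows instead of pages. It assumes that a foreign key reference from
one row to another is treated as a "link"; whether that is a meaningful
measure of quality is exactly what I doubt. The function name
relation_rank and the row identifiers are hypothetical.

```python
def relation_rank(links, d=0.85, iterations=50):
    """Iteratively apply PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    links maps a row id to the list of row ids it references
    (assumption: one entry per foreign key reference).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # C(T): number of outgoing references of each row.
    out_count = {n: len(links.get(n, [])) for n in nodes}

    # For each row, the rows T1...Tn that reference it.
    incoming = {n: [] for n in nodes}
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - d) + d * sum(rank[t] / out_count[t] for t in incoming[n])
            for n in nodes
        }
    return rank

# Hypothetical data: three order rows referencing two customer rows.
ranks = relation_rank({
    "order:1": ["customer:1"],
    "order:2": ["customer:1"],
    "order:3": ["customer:2"],
})
```

In this toy data the customer referenced by two orders ends up with a
higher rank than the one referenced by a single order, and the orders
themselves, which nothing references, stay at the minimum of 1-d.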

If we say that PageRank is crucial and we do not find a way to model
it with relations (our RelationRank), then the whole idea of having
Google-like search in Firebird is questionable: the server will not
"understand" the content, and I suspect that we will not be able to
manipulate it either from within stored procedures and UDFs or from
regular statements.

Best regards,
Roman Rokytskyy