Subject Re: [firebird-support] Re: Dynamic Variable Instantiation Within A Stored Procedure
Author Alexandre Benson Smith
Adam wrote:

>>:-)
>>
>>I was talking about the "Word x Record" table, it will have a lot of
>>duplicates since a lot of records will have the word "Alexandre" for
>>example.
>>
>>
>
>LOL,
>
>Yes, the problem you will face here is there will be a foreign key to
>the words table, and because there have got to be no less than 100000
>books with the title "Alexandre" in any collection, you are not going
>to be able to use the PK at the end work around I don't think.
>
>
Or don't use a declarative FK and enforce it procedurally with triggers.
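
Just a rough sketch of what I mean, not tested against a real database. WORDS, BOOKWORD and WORDSID come from your query; the exception and trigger names are made up for the example. Without the declarative FK you are free to build your own, more selective index on BOOKWORD (for example WORDSID plus the document id) instead of the automatic one.

```sql
SET TERM ^ ;

/* Hypothetical exception names, used only in this sketch */
CREATE EXCEPTION E_WORD_NOT_FOUND 'Word does not exist in WORDS'^
CREATE EXCEPTION E_WORD_IN_USE 'Word is still referenced by BOOKWORD'^

/* Child side: refuse a BOOKWORD row that points to a missing word */
CREATE TRIGGER BOOKWORD_BIU FOR BOOKWORD
ACTIVE BEFORE INSERT OR UPDATE POSITION 0
AS
BEGIN
  IF (NOT EXISTS (SELECT 1 FROM WORDS W WHERE W.ID = NEW.WORDSID)) THEN
    EXCEPTION E_WORD_NOT_FOUND;
END^

/* Parent side: refuse to delete a word that is still in use */
CREATE TRIGGER WORDS_BD FOR WORDS
ACTIVE BEFORE DELETE POSITION 0
AS
BEGIN
  IF (EXISTS (SELECT 1 FROM BOOKWORD BW WHERE BW.WORDSID = OLD.ID)) THEN
    EXCEPTION E_WORD_IN_USE;
END^

SET TERM ; ^
```

Keep in mind that a trigger only sees what its own transaction can see, so two concurrent transactions could still slip past these checks. A declarative FK does not have that hole.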

And I am not talking about book titles. As I understand it, full text
search has its benefits when the text itself is indexed too, so I think
there will be a bunch of words that appear a gazillion times (like "the",
"of", "by" and so on). These words should be treated as "too common" to be
searched, but any word will appear a thousand times (even words that are
not common) if the "text" of an article or book is indexed.

>I imagine you could create a wordexception table, where the word is
>only added to words if it is not in the wordexception table. Using
>this approach, common words like Alexandre will not influence search
>results.
>
>
Yes, I created one in my pet project.
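
Roughly along these lines. This is a simplified sketch and not the actual code of my project. The procedure name, the generator GEN_WORDS_ID and the column sizes are assumptions, and I call the text column WORD_TEXT here because, if I remember correctly, VALUE is a reserved word in Firebird.

```sql
/* Words considered too common to be indexed */
CREATE TABLE WORDEXCEPTION (
  WORD_TEXT VARCHAR(60) NOT NULL PRIMARY KEY
);

SET TERM ^ ;

/* Returns the id of the word, creating it when needed.
   Returns NULL when the word is in the exception table. */
CREATE PROCEDURE ADD_WORD (AWORD VARCHAR(60))
RETURNS (WORD_ID INTEGER)
AS
BEGIN
  WORD_ID = NULL;

  /* Too common: do not add it to WORDS at all */
  IF (EXISTS (SELECT 1 FROM WORDEXCEPTION WE
              WHERE WE.WORD_TEXT = :AWORD)) THEN
    EXIT;

  /* Reuse the word if it is already known */
  SELECT W.ID FROM WORDS W
  WHERE W.WORD_TEXT = :AWORD
  INTO :WORD_ID;

  IF (WORD_ID IS NULL) THEN
  BEGIN
    /* GEN_WORDS_ID is a hypothetical generator for WORDS.ID */
    WORD_ID = GEN_ID(GEN_WORDS_ID, 1);
    INSERT INTO WORDS (ID, WORD_TEXT) VALUES (:WORD_ID, :AWORD);
  END
END^

SET TERM ; ^
```

The import would call it once per word (EXECUTE PROCEDURE ADD_WORD) and skip the BOOKWORD insert whenever WORD_ID comes back NULL.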

>The import could be done with nothing in the word exception table.
>
>Then a query like
>
>select first 100 w.value, count(*)
>from words w
>join bookword bw on (w.id = bw.wordsid)
>group by w.value
>having count(*) > 10
>order by 2 desc
>
>
>
I did something like that too, but used a value far higher than 10.
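
Once a cutoff is chosen, the clean-up after the import could look something like the sketch below. It assumes a WORDEXCEPTION table with a single WORD_TEXT column and the same WORD_TEXT column name in WORDS (both names are my assumptions), and 1000 is only a placeholder, not the value I really used.

```sql
/* 1) Flag every word that appears in more than 1000 word x document rows */
INSERT INTO WORDEXCEPTION (WORD_TEXT)
SELECT W.WORD_TEXT
FROM WORDS W
JOIN BOOKWORD BW ON (W.ID = BW.WORDSID)
GROUP BY W.WORD_TEXT
HAVING COUNT(*) > 1000;

/* 2) Remove the links of the flagged words first ... */
DELETE FROM BOOKWORD
WHERE WORDSID IN (SELECT W.ID
                  FROM WORDS W
                  JOIN WORDEXCEPTION WE ON (WE.WORD_TEXT = W.WORD_TEXT));

/* 3) ... and then the words themselves */
DELETE FROM WORDS
WHERE WORD_TEXT IN (SELECT WE.WORD_TEXT FROM WORDEXCEPTION WE);
```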

>Would start to give you some idea about which words should be
>considered exceptions.
>
>
In my pet project I indexed 101 articles about Firebird, written in
Portuguese.

Too Common Words = 81
Total Words (unique) = 13717
Total Word x Document = 115081

The numbers above are after removing the too common words. I don't have
the numbers with them included any more, but they were quite a bit higher!

After removing them, the most common words are:
database = 787 times
table = 670 times
server = 612 times
transaction = 470 times

Here you can see a very simple web interface to search the articles:
http://www.thorsoftware.com.br/ric

It's in Portuguese, but you could try some common English words
(transaction, procedure, classic, super, server and so on) to see the
results. After the first run the response gets fast.

Try putting this in the search field:
classic super server

The procedure that searches and ranks by relevance still needs a lot of work.
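
To give an idea of the simplest possible ranking (this is NOT the procedure behind the site, just a naive sketch): order the documents by how many of the searched words they contain. BOOKID and WORD_TEXT are assumed column names, and I assume BOOKWORD holds one row per word x document pair with words stored in lower case.

```sql
/* Documents that contain the most search words come first */
SELECT BW.BOOKID, COUNT(*) AS MATCHED_WORDS
FROM BOOKWORD BW
JOIN WORDS W ON (W.ID = BW.WORDSID)
WHERE W.WORD_TEXT IN ('classic', 'super', 'server')
GROUP BY BW.BOOKID
ORDER BY 2 DESC;
```

A document matching all three words gets 3, and so on. It ignores how often a word appears inside a document, which is part of the work still to be done.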

Jokes aside...

Alexandre is an *extremely* common name in Brazil, that is why I used it :-)

>Adam
>
>
>
See you!

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br