Subject | Re: [firebird-support] Re: Dynamic Variable Instantiation Within A Stored Procedure |
---|---|
Author | Alexandre Benson Smith |
Post date | 2005-11-09T01:07:58Z |
Adam wrote:
>>:-) Or don't use declarative FK and make it procedural with triggers.
>>
>>I was talking about the "Word x Record" table, it will have a lot of
>>duplicates since a lot of records will have the word "Alexandre" for
>>example.
>>
>>
>
>LOL,
>
>Yes, the problem you will face here is that there will be a foreign key
>to the words table, and because there have got to be no fewer than 100000
>books with the title "Alexandre" in any collection, I don't think you
>are going to be able to use the "PK at the end" workaround.
>
>
And I am not talking about book titles. As I understand it, full text search
has its benefits when the text itself is indexed too, so I think there will
be a bunch of words that appear a gazillion times (like "the", "of", "by"
and so on). These words should be treated as "too common" to be searched.
But any word will appear a thousand times (even words that are not common,
if the "text" of an article or book is indexed).
>I imagine you could create a wordexception table, where the word is
>only added to words if it is not in the wordexception table. Using
>this approach, common words like Alexandre will not influence search
>results.
>

Yes, I created it in my pet project.

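A minimal sketch of the exception-table idea, written in Python rather than Firebird SQL so it can be run standalone. The names `WORD_EXCEPTIONS`, `tokenize` and `index_document` are hypothetical, the stop-word list is a tiny stand-in for the wordexception table, and the tokenizer is deliberately naive (ASCII letters only, so accented Portuguese words would need more care). It is not the code used in the pet project.

```python
import re

# Hypothetical stop-word set standing in for the wordexception table.
WORD_EXCEPTIONS = {"the", "of", "by"}

def tokenize(text):
    """Split text into lowercase words (naive: ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def index_document(doc_id, text, exceptions=WORD_EXCEPTIONS):
    """Return the (word, doc_id) pairs that would go into the word x record table,
    skipping every word listed in the exception set."""
    return {(word, doc_id) for word in tokenize(text) if word not in exceptions}

pairs = index_document(1, "The history of the Firebird server, by example")
print(sorted(pairs))
# [('example', 1), ('firebird', 1), ('history', 1), ('server', 1)]
```

Using a set of pairs means a word is stored once per document no matter how often it occurs, which keeps the word x record table smaller; storing a per-document count instead would be the alternative if relevance ranking needs it.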
>The import could be done with nothing in the word exception table.
>
>Then a query like
>
>select first 100 w.value, count(*)
>from words w
>join bookword bw on (w.id = bw.wordsid)
>group by w.value
>having count(*) > 10
>order by 2 desc
>
>Would start to give you some idea about which words should be
>considered exceptions.
>

I did something like that too, but used a value far higher than 10.

In my pet project I indexed 101 articles about Firebird in the Portuguese
language.
Too Common Words = 81
Total Words (unique) = 13717
Total Word x Document = 115081
The above numbers are after removing the too common words. I don't have
the numbers with them included any more, but they were a bit high!
After removing them, the most common words are:
database = 787 times
table = 670 times
server = 612 times
transaction = 470 times
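For illustration, the kind of count Adam's query produces can be sketched in Python as follows. The function name `common_words`, its parameters and the sample data are all made up for this example; the defaults (threshold 10, limit 100) simply mirror the `having count(*) > 10` and `first 100` in the quoted query, and the figures above were not produced with this code.

```python
from collections import Counter

def common_words(word_doc_pairs, threshold=10, limit=100):
    """Count rows per word and return up to `limit` (word, count) tuples
    whose count exceeds `threshold`, most frequent first."""
    counts = Counter(word for word, _doc_id in word_doc_pairs)
    frequent = [(word, n) for word, n in counts.most_common() if n > threshold]
    return frequent[:limit]

# Made-up (word, doc_id) rows, one per word x document combination.
pairs = (
    [("database", d) for d in range(12)]
    + [("table", d) for d in range(11)]
    + [("gbak", d) for d in range(3)]
)
print(common_words(pairs))
# [('database', 12), ('table', 11)]
```

Raising `threshold` narrows the list to the words most worth reviewing as candidates for the exception table.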
Here you can see a very simple web interface to search the articles:
http://www.thorsoftware.com.br/ric
It's in Portuguese, but you could try some common English words
(transaction, procedure, classic, super, server and so on) to see the
results. After the first run it responds fast.
Try putting this in the search field:
classic super server
The procedure that searches and ranks by relevance still needs a lot of work.
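One simple way to rank by relevance, shown only as a sketch: order documents by how many distinct search words they contain, then by the total number of occurrences. This is an assumption about what "relevance" could mean, not the procedure behind the web interface; the function `rank` and the sample `occurrences` data are hypothetical.

```python
from collections import defaultdict

def rank(query, occurrences):
    """occurrences maps (word, doc_id) -> times the word appears in that document.
    Returns doc ids ordered by distinct query words matched, then by total
    occurrences of those words, then by doc id as a stable tie-break."""
    words = set(query.lower().split())
    matched = defaultdict(int)
    total = defaultdict(int)
    for (word, doc_id), n in occurrences.items():
        if word in words:
            matched[doc_id] += 1
            total[doc_id] += n
    return sorted(matched, key=lambda d: (-matched[d], -total[d], d))

# Made-up sample data.
occurrences = {
    ("classic", 1): 4, ("super", 1): 2, ("server", 1): 9,
    ("classic", 2): 1, ("server", 2): 20,
    ("server", 3): 5,
}
print(rank("classic super server", occurrences))
# [1, 2, 3]
```

Document 1 comes first because it contains all three words, even though document 2 mentions "server" more often.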
Jokes aside...
Alexandre is an *extremely* common name in Brazil, which is why I used it :-)
>Adam
>

See you!

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br