Subject | Re: [firebird-support] Re: Full Text Searching For Firebird |
---|---|
Author | Alexandre Benson Smith |
Post date | 2005-11-04T00:49:58Z |
martinthrelly wrote:
I had made a pilot project to do full text search.
The project goal is very simple, just on table.
I wrote a program to populate the database that consists in:
Insert the Document record
Decompose the document text and insert in a Document x Word Table
It works well, but are not generic... it works for that table in
special, and the documents should be inserted using that interface (the
program I wrote)
I will outline what I think that could be a better approach in a very
macro view:
1.) Create your tables normally
2.) Create a table that hold's what field for what table should be indexed
3.) Create a Table of Words, a table of "Words x Records" (that will
hold the Table Name, the Record ID and the word it contains and some
other relevant data to make the full text search, like position of this
word in the field to measure distance between words, relevance, etc.)
4.) Create an APP that will let you mark wich field in each table should
be indexed, that App will generate automatically triggers that posts
events for another application to listen
5.) Write an App that will listen for events and when receive an event
search for the record, look in the fields that should be indexed and
decompose the text into words populating the "Words x Records" Table
6.) Write an Stored Procedure that will receive some parameters like
word to search, bollean criteria for the search, in wich table to search
(or a list of tables, or all tables and so on) this Stored Procedure
will Search the "Words x Records" table and return the TableName, the
RecordID and the relevance of the data found
With this result of this Procedure you could in your "Search
Application" present a briefing of the data found and provide ways to go
to the real records
Other things you will need to consider:
Sinonyms
plural x singular
soundex or metaphone search
too common words (there is no sense to search for the word "the")
I am sure I forget to think about a bunch of other things that I
overlooked or even don't know about. And I am sure that this is far more
complex than what I think it is...
How well or if it will even work I don't know, the mechanism I did works
well (in my experiments) with small quantity of data and a fixed table,
don't know if with a LOT of data it will crash.
see you !
--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br
>hello alexandre thanks for the replyJust a comment,
>
>our application is held on a server. ive researched a bit about
>dotlucene. however this article put me off a bit.
>
>http://discuss.joelonsoftware.com/default.asp?joel.3.230182.22
>
>basically they are saying that dotlucene is fine if your data is
>static. but if its being hit and updated all the time you risk
>concurrency issues between the database data and the dotlucene data.
>this for me is a big worry.
>
>what do you make of this? do you have any other suggestions? do you
>know if firebird intend to include a full text search in a future
>release? thanks
>
>
I had made a pilot project to do full text search.
The project goal is very simple, just on table.
I wrote a program to populate the database that consists in:
Insert the Document record
Decompose the document text and insert in a Document x Word Table
It works well, but are not generic... it works for that table in
special, and the documents should be inserted using that interface (the
program I wrote)
I will outline what I think that could be a better approach in a very
macro view:
1.) Create your tables normally
2.) Create a table that hold's what field for what table should be indexed
3.) Create a Table of Words, a table of "Words x Records" (that will
hold the Table Name, the Record ID and the word it contains and some
other relevant data to make the full text search, like position of this
word in the field to measure distance between words, relevance, etc.)
4.) Create an APP that will let you mark wich field in each table should
be indexed, that App will generate automatically triggers that posts
events for another application to listen
5.) Write an App that will listen for events and when receive an event
search for the record, look in the fields that should be indexed and
decompose the text into words populating the "Words x Records" Table
6.) Write an Stored Procedure that will receive some parameters like
word to search, bollean criteria for the search, in wich table to search
(or a list of tables, or all tables and so on) this Stored Procedure
will Search the "Words x Records" table and return the TableName, the
RecordID and the relevance of the data found
With this result of this Procedure you could in your "Search
Application" present a briefing of the data found and provide ways to go
to the real records
Other things you will need to consider:
Sinonyms
plural x singular
soundex or metaphone search
too common words (there is no sense to search for the word "the")
I am sure I forget to think about a bunch of other things that I
overlooked or even don't know about. And I am sure that this is far more
complex than what I think it is...
How well or if it will even work I don't know, the mechanism I did works
well (in my experiments) with small quantity of data and a fixed table,
don't know if with a LOT of data it will crash.
see you !
--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br