Subject | Re: [Firebird-Architect] Full Text Search |
---|---|
Author | Lester Caine |
Post date | 2005-02-05T20:46:19Z |
Jim Starkey wrote:
And I want a mechanism that can be expanded to cover content of external
files.
for comparison - and how not to do it ;)
table,field,record_no and position in record?
order and contains exact phrase has the highest score?
grouping function?
important. Find the recent occurrences first.
for this with the results list, and then allow tighter searches?
added/updated in the database - with the extension that allows scanning
an external file when it's name or modified date is changed.
additional functions to help an extension to the core engine rather than
something that is a totally integral function. So that application
specific changes can be included?
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services
> Lester Caine wrote:I think we agree there
>>Hopefully this starts fleshing out the question - "What is Full Text
>>Search?"
>
> Most database people approach search as a SQL boolean. This is too
> restrictive to be useful. Think Google, not SQL.
And I want a mechanism that can be expanded to cover content of external
files.
> Unless Monty has had a major change in attitude, don't use MySQL as a model.I think I'd be looking MSSQL rather than MySQL I just included the notes
for comparison - and how not to do it ;)
> Base on my experience with database based search:Limit the fields that are 'indexed' - location of a word has
>
> 1. The general case is multi-table, multi-column search, which will
> need an API extension.
> 2. Search should be restrictable to an explicit set of tables
table,field,record_no and position in record?
> 3. Multi-term and word proximity search must be supported (thisContains all words has a lower score than contains all words in the same
> includes phrases)
order and contains exact phrase has the highest score?
> 4. Word stem matching should be supported (ie support*)In addition to soundex
> 5. Alta Vista rules for case matching are nice, but probably notCharacter sets add to the fun :)
> required
> 6. Support for application specific "stop" words (words too common toPerhaps a flag on the 'dictionary? Or a separate table with some sort of
> fetch from index -- a, the, and, etc.) is important
grouping function?
> 7. Result ordering by "score" is very important.Another area of 'fun' - last modified date of a record could be as
important. Find the recent occurrences first.
> 8. A filter boolean operator that can be mixed with SQL booleanIs that my - search existing results? I think we need a temporary table
> operators is important.
for this with the results list, and then allow tighter searches?
> 9. Search indexing should be html-awareAnd XML - filter to remove tags as an extension to the stop words?
> I use a model where any field can be declared "searchable" (other fieldsWhat does the search index consist of?
> are not considered in search). I keep a single search index per
> database. I'm considering changing this to search index per schema, but
> haven't yet made up my mind.
>>A simple starting point from my viewpoint is a convention onI would expect to add entries to the search index as records are
>>numbering/identifying words in a document, be that a BLOB or an external
>>file, so that we have a standard pointer method. Doc27/Word32 - so that
>>if two words are being looked for we can match adjacent occurrences or
>>any occurrence of the words.
>
> Exhaustive search is too expensive to even think about. Given a word
> index with word position, it's cheaper to evaluate a given record
> against the index than to apply an operator against a blob. Computing a
> record number bitmap from the word index is definitely the way to go.
added/updated in the database - with the extension that allows scanning
an external file when it's name or modified date is changed.
>>How does this fit in with other peoples thinking?I think we agree there as well, but aren't we just talking about
>>
> This is an area where the database world has dropped the ball big time.
> The entire world navigates the web starting at Google, but no mainstream
> database system supports the most basic multi-table search. Did the
> database world miss the web altogether?
additional functions to help an extension to the core engine rather than
something that is a totally integral function. So that application
specific changes can be included?
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services