Subject | Re: Full Text Search |
---|---|
Author | Roman Rokytskyy |
Post date | 2005-02-06T11:17:59Z |
Jim,
We have discussed this issue approx. 1,5 years ago. The next round :)
other approaches do no work, people want to search the database, not
tables. My argument against your solution was that the result of the
search is not a relation, or more correctly, not a relation in first
normal form.
I have probably similar, but, from my point of view, more correct
idea. The full-text index should cover entities (remember ER-model?).
We index entities, we formulate queries against the entities, we
retrive entities. The question is: how to translate that into SQL?
What do you think about this approach?
to achieve. If that content is stored in the BLOB, we have already a
concept of BLOB filter. Just define a "searchable" BLOB type and
corresponding "HTML", "PDF", "RTF" BLOB types. The conversion between
that datatypes is done by filter.
Roman
We have discussed this issue approx. 1,5 years ago. The next round :)
> 1. The general case is multi-table, multi-column search, whichCan you explain this requirement? The last time you have argued that
> will need an API extension.
other approaches do no work, people want to search the database, not
tables. My argument against your solution was that the result of the
search is not a relation, or more correctly, not a relation in first
normal form.
I have probably similar, but, from my point of view, more correct
idea. The full-text index should cover entities (remember ER-model?).
We index entities, we formulate queries against the entities, we
retrive entities. The question is: how to translate that into SQL?
What do you think about this approach?
> 2. Search should be restrictable to an explicit set of tablesEasy.
> 3. Multi-term and word proximity search must be supported (thisEasy.
> includes phrases)
> 4. Word stem matching should be supported (ie support*)Easy.
> 5. Alta Vista rules for case matching are nice, but probably notDo not know yet.
> required
> 6. Support for application specific "stop" words (words tooEasy.
> common to fetch from index -- a, the, and, etc.) is important
> 7. Result ordering by "score" is very important.Relatively easy.
> 8. A filter boolean operator that can be mixed with SQL booleanDo not know yet.
> operators is important.
> 9. Search indexing should be html-awareNot only HTML-aware, but also XML, RTF, MS Word, etc. But that is easy
to achieve. If that content is stored in the BLOB, we have already a
concept of BLOB filter. Just define a "searchable" BLOB type and
corresponding "HTML", "PDF", "RTF" BLOB types. The conversion between
that datatypes is done by filter.
> I use a model where any field can be declared "searchable" (otherThat's what I have been thinking about too.
> fields are not considered in search).
Roman