Subject Re: Full Text Search
Author Roman Rokytskyy
Jim,

We have discussed this issue approx. 1,5 years ago. The next round :)

> 1. The general case is multi-table, multi-column search, which
> will need an API extension.

Can you explain this requirement? The last time you have argued that
other approaches do no work, people want to search the database, not
tables. My argument against your solution was that the result of the
search is not a relation, or more correctly, not a relation in first
normal form.

I have probably similar, but, from my point of view, more correct
idea. The full-text index should cover entities (remember ER-model?).
We index entities, we formulate queries against the entities, we
retrive entities. The question is: how to translate that into SQL?

What do you think about this approach?

> 2. Search should be restrictable to an explicit set of tables

Easy.

> 3. Multi-term and word proximity search must be supported (this
> includes phrases)

Easy.

> 4. Word stem matching should be supported (ie support*)

Easy.

> 5. Alta Vista rules for case matching are nice, but probably not
> required

Do not know yet.

> 6. Support for application specific "stop" words (words too
> common to fetch from index -- a, the, and, etc.) is important

Easy.

> 7. Result ordering by "score" is very important.

Relatively easy.

> 8. A filter boolean operator that can be mixed with SQL boolean
> operators is important.

Do not know yet.

> 9. Search indexing should be html-aware

Not only HTML-aware, but also XML, RTF, MS Word, etc. But that is easy
to achieve. If that content is stored in the BLOB, we have already a
concept of BLOB filter. Just define a "searchable" BLOB type and
corresponding "HTML", "PDF", "RTF" BLOB types. The conversion between
that datatypes is done by filter.

> I use a model where any field can be declared "searchable" (other
> fields are not considered in search).

That's what I have been thinking about too.

Roman