Subject Re: [Firebird-Architect] Full Text Search
Author Lester Caine
unordained wrote:

> When I helped a normally mysql-only guy set up full-text indexing of public documents (using
> Firebird), I found the whole endeavour frustrating. Firebird worked great; users however just want
> full-text indexing because they're too lazy to enter good meta-data. Oddly, it's hard to find
> documents by date when you're searching only the text of the document, and have no idea how the
> date might be formatted ("February 1st, 2005"), particularly if your index excludes short words
> ("1st") and symbols (the slashes if they're even entering the date 'normally'.)

We have been having this discussion, for other reasons, on the
GenealogyXML list. Establishing dates is obviously important for that
sort of data. The current thinking is that date information gets
normalised into a DATE element, which can then be indexed. I see no
reason why one could not have, in effect, a calendar of identified dates
in much the same way as one has an index of identified words. But that
would probably be an addition to the basic 'word' search rather than
part of it.
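For what it is worth, here is a minimal Python sketch of that idea. It assumes a toy in-memory index, not anything Firebird actually provides. Words go into one inverted index, with short tokens such as the "st" left over from "1st" dropped. Recognised dates are normalised to datetime.date values and go into a separate calendar index. The two date patterns, the DocumentIndex class and its min_word_len parameter are all hypothetical illustrations.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical month table and date patterns; a real system would need many more formats.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
LONG_DATE = re.compile(r"\b([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b")  # "February 1st, 2005"
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")  # "2005-02-01"
WORD = re.compile(r"[a-z]+")


def extract_dates(text):
    """Normalise every recognised date in text to a datetime.date."""
    candidates = []
    for month, day, year in LONG_DATE.findall(text):
        if month.lower() in MONTHS:
            candidates.append((int(year), MONTHS[month.lower()], int(day)))
    for year, month, day in ISO_DATE.findall(text):
        candidates.append((int(year), int(month), int(day)))
    found = set()
    for y, m, d in candidates:
        try:
            found.add(date(y, m, d))
        except ValueError:  # impossible dates such as "February 30th" are skipped
            pass
    return found


class DocumentIndex:
    """Toy index: words in one inverted index, normalised dates in a 'calendar'."""

    def __init__(self, min_word_len=3):
        self.min_word_len = min_word_len
        self.words = defaultdict(set)     # word -> ids of documents containing it
        self.calendar = defaultdict(set)  # date -> ids of documents mentioning it

    def add(self, doc_id, text):
        for w in WORD.findall(text.lower()):
            if len(w) >= self.min_word_len:  # short tokens such as "st" are dropped
                self.words[w].add(doc_id)
        for d in extract_dates(text):
            self.calendar[d].add(doc_id)

    def docs_with_word(self, word):
        return set(self.words.get(word.lower(), ()))

    def docs_between(self, start, end):
        return {doc for d, docs in self.calendar.items() if start <= d <= end for doc in docs}
```

With this, a date-range query still works even though the word index has thrown away the pieces of the date ("1st", the slashes and so on).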

> We sorted the results based on how often words occurred, weighting each word based on its rarity in
> the overall document set. (We only had to deal with AND and NOT searches.) How would firebird
> return something useful to people who want the results sorted according to other rules? Rarity of
> words, number of occurrences, position in the document, relative position to each other in the
> document, ... and that's just the tip of the proverbial ice cube.

THAT is why I was suggesting on a previous thread that a simple
'one size fits all' solution is probably of little use. This is an area
where processing the results of the search is as important as the search
itself, so hooks that allow further processing of the 'result set' are
important. As Jim has already said, a found/not-found result is only
half of the problem.
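To make the 'hooks' point a little more concrete, here is a rough Python sketch. Again it is a toy and not a proposed Firebird API. The boolean AND/NOT match produces the result set, and a separate ranking function is plugged in afterwards. rarity_weighted approximates the scheme quoted above: occurrences are weighted by a log(N/df) rarity factor, which amounts to a basic TF-IDF score. earliest_position is a second, hypothetical rule. All the function names and the Ranker signature are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")

# Hypothetical hook signature: (query terms, matching doc ids, corpus) -> ordered ids.
Ranker = Callable[[List[str], Set[int], Dict[int, List[str]]], List[int]]


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def search(corpus: Dict[int, List[str]], must: List[str], must_not: List[str],
           ranker: Ranker) -> List[int]:
    """AND/NOT matching decides membership; the ranker alone decides order."""
    hits = {doc for doc, toks in corpus.items()
            if all(t in toks for t in must) and not any(t in toks for t in must_not)}
    return ranker(must, hits, corpus)


def rarity_weighted(terms: List[str], hits: Set[int], corpus: Dict[int, List[str]]) -> List[int]:
    """Sum over terms of occurrences * log(N / df): a basic TF-IDF score."""
    n = len(corpus)
    df = {t: sum(1 for toks in corpus.values() if t in toks) for t in terms}

    def score(doc: int) -> float:
        counts = Counter(corpus[doc])
        return sum(counts[t] * math.log(n / df[t]) for t in terms if df[t])

    return sorted(hits, key=lambda d: (-score(d), d))  # ties broken by doc id


def earliest_position(terms: List[str], hits: Set[int], corpus: Dict[int, List[str]]) -> List[int]:
    """Prefer documents in which any query term appears earliest."""
    def first_pos(doc: int) -> int:
        toks = corpus[doc]
        return min((toks.index(t) for t in terms if t in toks), default=len(toks))

    return sorted(hits, key=lambda d: (first_pos(d), d))
```

Swapping the ranker changes the order of the same result set without touching the search itself. That separation is what a hook for processing the 'result set' would buy.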

--
Lester Caine
-----------------------------
L.S.Caine Electronic Services