Subject Index tales - part 2 - Keyword FTS
Author m_theologos
Hi,

Studying a little the FB indexes and some issuses about when the
indexes are 'good' or 'bad' I have some small ideeas to put in your
attention. (I broke this thema in more independent messages).

1. Keywords

Defining a structure like:

CREATE KEYWORD INDEX IDX1 ON T1(TextField1) SEPARATORS '
.,!@#$%^&*()' MIN_LETTERS 2 BANNED_WORDS 'AND', 'OR', 'THE' ...

(semantics subject to change)

we'll have a quite good keyword full-text-seacrch system (ie. will
find 'Yemanov' but not 'emanov' from the Dmitry's name...).

When the engine will index a record with a field indexed in such a
way then the engine will trigger a small function, let's call it
'get_word', which will parse the 'TextField1' from the table T1 and
get out all the words (according with given sepparators, the word
must be at least 3 letters in our example, must not be 'AND', 'OR' or
'THE') and to insert these words in a normal index structure.
Deleting/updating index nodes will be in the same manner, calling the
'get_word' and deleting/updating the keys which are bounded to the
rec which must be deleted/updated.

Searching:

Define a family of predicate(s) like:

KEYWORD = <text_val>
KEYWORD > <text_val>
KEYWORD < <text_val>
KEYWORD >= <text_val>
KEYWORD <= <text_val>
KEYWORD BETWEEN... <text_val1> AND <text_val2>
KEYWORD STARTING WITH... <text_val>
KEYWORD CONTAINING... (hummm.... hummmmmm.... hummm...)

Which if

1. We have the index then will use the index to find the <text_val>
2. We DON'T have the index then will do a natural scan on the table
using the 'get_word' function (which implies that the SPARATORS and
the other clauses are rater column properties and _not_ bounded to
the index).

Comments?

hth,

m.th.