Subject | Index tales - part 2 - Keyword FTS |
---|---|
Author | m_theologos |
Post date | 2006-10-09T09:21:41Z |
Hi,
Studying a little the FB indexes and some issuses about when the
indexes are 'good' or 'bad' I have some small ideeas to put in your
attention. (I broke this thema in more independent messages).
1. Keywords
Defining a structure like:
CREATE KEYWORD INDEX IDX1 ON T1(TextField1) SEPARATORS '
.,!@#$%^&*()' MIN_LETTERS 2 BANNED_WORDS 'AND', 'OR', 'THE' ...
(semantics subject to change)
we'll have a quite good keyword full-text-seacrch system (ie. will
find 'Yemanov' but not 'emanov' from the Dmitry's name...).
When the engine will index a record with a field indexed in such a
way then the engine will trigger a small function, let's call it
'get_word', which will parse the 'TextField1' from the table T1 and
get out all the words (according with given sepparators, the word
must be at least 3 letters in our example, must not be 'AND', 'OR' or
'THE') and to insert these words in a normal index structure.
Deleting/updating index nodes will be in the same manner, calling the
'get_word' and deleting/updating the keys which are bounded to the
rec which must be deleted/updated.
Searching:
Define a family of predicate(s) like:
KEYWORD = <text_val>
KEYWORD > <text_val>
KEYWORD < <text_val>
KEYWORD >= <text_val>
KEYWORD <= <text_val>
KEYWORD BETWEEN... <text_val1> AND <text_val2>
KEYWORD STARTING WITH... <text_val>
KEYWORD CONTAINING... (hummm.... hummmmmm.... hummm...)
Which if
1. We have the index then will use the index to find the <text_val>
2. We DON'T have the index then will do a natural scan on the table
using the 'get_word' function (which implies that the SPARATORS and
the other clauses are rater column properties and _not_ bounded to
the index).
Comments?
hth,
m.th.
Studying a little the FB indexes and some issuses about when the
indexes are 'good' or 'bad' I have some small ideeas to put in your
attention. (I broke this thema in more independent messages).
1. Keywords
Defining a structure like:
CREATE KEYWORD INDEX IDX1 ON T1(TextField1) SEPARATORS '
.,!@#$%^&*()' MIN_LETTERS 2 BANNED_WORDS 'AND', 'OR', 'THE' ...
(semantics subject to change)
we'll have a quite good keyword full-text-seacrch system (ie. will
find 'Yemanov' but not 'emanov' from the Dmitry's name...).
When the engine will index a record with a field indexed in such a
way then the engine will trigger a small function, let's call it
'get_word', which will parse the 'TextField1' from the table T1 and
get out all the words (according with given sepparators, the word
must be at least 3 letters in our example, must not be 'AND', 'OR' or
'THE') and to insert these words in a normal index structure.
Deleting/updating index nodes will be in the same manner, calling the
'get_word' and deleting/updating the keys which are bounded to the
rec which must be deleted/updated.
Searching:
Define a family of predicate(s) like:
KEYWORD = <text_val>
KEYWORD > <text_val>
KEYWORD < <text_val>
KEYWORD >= <text_val>
KEYWORD <= <text_val>
KEYWORD BETWEEN... <text_val1> AND <text_val2>
KEYWORD STARTING WITH... <text_val>
KEYWORD CONTAINING... (hummm.... hummmmmm.... hummm...)
Which if
1. We have the index then will use the index to find the <text_val>
2. We DON'T have the index then will do a natural scan on the table
using the 'get_word' function (which implies that the SPARATORS and
the other clauses are rater column properties and _not_ bounded to
the index).
Comments?
hth,
m.th.