Subject Reverse Index Performance
Author Salim Naufal
I have a 40 GB books database that contains about 5,000,000 items. I have
parsed every book and generated a keywords table containing the word and the
pkey of the book record. This resulted in a 300,000,000 row table. So far,
Firebird performs gracefully.
To build the search query, the following is done:

SELECT K1.PKEY
FROM KEYWORDS K1
JOIN KEYWORDS K2 ON (K1.PKEY = K2.PKEY)
JOIN KEYWORDS K3 ON (K1.PKEY = K3.PKEY)
WHERE (K1.KWORD = 'first keyword')
AND (K2.KWORD = 'second keyword')
AND (K3.KWORD = 'third keyword')

For this, I have created three indexes:
Index1: KEYWORDS(KWORD)
Index2: KEYWORDS(PKEY)
Index3: KEYWORDS(KWORD, PKEY) <- UNIQUE

Given the large number of rows in the KEYWORDS table, how should the
keywords be ordered?
1) set the 'first keyword' that results in the smallest result set?
2) set the 'third keyword' that results in the smallest result set?

Second Question:
I have created a secondary table that contains the keyword words and number
of occurrences. Ultimately such a table can be populated with a simple
query:

INSERT INTO KEYWORDS_STATS
SELECT COUNT(*), KWORD
FROM KEYWORDS

I launched this on Saturday on a Dual Xeon 2.4 with 4GB RAM and 15KRPM SCSI
drives, SUSE Linux 9.1, Classic version of Firebird 1.5.1). Three days have
passed and it is still not finished. Is there a better way to populate this
table? Please note that there are no indexes have yet been created on the
new table "KEYWORDS_STATS".

Thanks

Salim