Subject index key question and garbage collecting
Author Dmitry Kuzmenko
Hello, All!

About a week ago I've found (for myself) why garbage collecting index keys
takes so much time. Looking into ODS.H:

typedef struct btn {
UCHAR btn_prefix; /* size of compressed prefix */
UCHAR btn_length; /* length of data in node */
UCHAR btn_number [4]; /* page or record number */
UCHAR btn_data [1];

typedef struct btr {
struct pag btr_header;
SLONG btr_sibling; /* right sibling page */
SLONG btr_left_sibling; /* left sibling page */
SLONG btr_prefix_total; /* sum of all prefixes on page */
USHORT btr_relation; /* relation id for consistency */
USHORT btr_length; /* length of data in bucket */
UCHAR btr_id; /* index id for consistency */
UCHAR btr_level; /* index level (0 = leaf) */
struct btn btr_nodes [1];
} *BTR;

As you can see, there is no transaction id like in record's structure:
typedef struct rhd {
SLONG rhd_transaction;/* transaction id */
SLONG rhd_b_page; /* back pointer */
...

In this case, if somebody deleted many records from table with non-unique index,
and there are many duplicate keys in this index, IB needs to
(at least when select count(*) is executed)

1. find record version that must be garbage collected
2. pick up ALL keys from index, that
3. delete key version that corresponds to the garbage collected record
4. go to the point 1 if there is a next record needs to be garbage collected

I.e. here is multiplication of record count to be garbage collected and
number of duplicate keys for each garbage collected record.
(please forgive me if I'm wrong, I am Pascal programmer and can't
read freely C sources where these things are performed).

For example, if we have table with 10K records and 10 different values for
an index, there will be ~1000 equal keys of each value in index.
If I will delete 1000 records, than there will be
~ (1000 * 1000) = 1,000,000 cycles. Isn't it?

Why I'm talking about it: other servers (Oracle and MS) uses index scan to
calculate record count. It is easy way even for
select count(*) from sometable, because there are less pages in index than
in table, and scanning index pages will be much faster than scanning data pages.
But, you see, index key does not have transaction id, and IB can't understand
what key versions can be seen by some transaction.
Slow garbage collection is another point of the same issue, I think.

The long story, the short question: what about including transaction id into
BTN structure? Is it possible? Right now I see only the following cases to
speed up with transaction id in a key:
1. select count
2. index garbage collection
3. group by (when index is used).

What do you think?

--
Dmitry Kuzmenko, Epsylon Technologies.