Subject: RE: [IB-Architect] Fw: Slow deletion on large tables
Author: Ann Harrison
At 03:25 AM 6/1/00 -0400, Claudio Valderrama C. wrote:

> > It's a consequence of the architecture, which was designed before the
> > implementation of referential integrity. The initial architecture
> > certainly didn't foresee the magnitude of today's duplicate index
> > chains, and even if it did, it probably assumed user error or bad
> > application design.
>
> Umh, from what I've read on this list more than a month ago, I had
> assumed these problems weren't real. AFAICT, I understood Jim to say
> there's no evidence of performance problems in connection with indexes
> holding a great number of duplicate entries. May have misread the
> explanation, tho.

Jim said, I hope, that we don't know of a performance problem with
large numbers of duplicates for insert or select. We all know about
the problem with garbage collection after an index value in a long
duplicate chain is deleted or changed.

> > Assuming that all users can't be educated, or for some reason can't
> > avoid those large numbers of duplicates, there has been some research
> > and thinking on the issue.
>
> Well, if you want to start, please educate me. :-) Let's assume I have
> 40 thousand people and I keep the region as a foreign key. (A region in
> my country is like a state in the US.) There are only 13 regions here,
> so the selectivity is rather poor, but the index will be created as
> part of the FK relationship. What's the solution? Don't define the FK
> but perform the lookup in a BEFORE INSERT trigger with a select/exists
> construction?

No, referential integrity is more reliable than triggers because it
is not bound to transaction context. One solution is to make the
indexes better at handling long duplicate chains. At the leaf level,
the index key value is paired with the "record number", aka db_key.
In the upper levels of the index, the key value is paired with the
page number (I think) of the next level down in the index. The
problem with duplicates is that you must read across the leaf level
to find the row you're looking for. Once you hit the first value in
the chain, the upper levels are of no use.
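
To make that concrete, here is a toy model in Python. It is not
InterBase's actual page format -- LEAF_SIZE, build_index, delete, and
the data set are all invented -- just a sketch of why the upper levels
stop helping once every entry in sight carries the same key:

    LEAF_SIZE = 4   # entries per leaf page, tiny for illustration

    def build_index(entries):
        """entries: sorted (key, db_key) pairs -> (upper, leaves)."""
        leaves = [entries[i:i + LEAF_SIZE]
                  for i in range(0, len(entries), LEAF_SIZE)]
        # Upper level: (first key on page, page number).  The db_key
        # is NOT part of the upper-level entry -- the root of the
        # garbage-collection problem.
        upper = [(page[0][0], n) for n, page in enumerate(leaves)]
        return upper, leaves

    def delete(upper, leaves, key, db_key):
        """Remove one (key, db_key) pair; count leaf entries touched."""
        # Descend: the last page starting before `key` is the first
        # page that can contain it.
        page_no = 0
        for first_key, n in upper:
            if first_key < key:
                page_no = n
            else:
                break
        # Now crawl across the leaf level, entry by entry, until we
        # find the exact (key, db_key) pair.
        touched = 0
        for page in leaves[page_no:]:
            for entry in list(page):
                touched += 1
                if entry == (key, db_key):
                    page.remove(entry)
                    return touched
                if entry[0] > key:   # ran past the chain: not found
                    return touched
        return touched

    # Claudio's case: 40,000 people spread over 13 regions.
    rows = sorted((person % 13, person) for person in range(40_000))
    upper, leaves = build_index(rows)
    print(delete(upper, leaves, key=12, db_key=39_987))  # prints 3080

Deleting one row near the end of region 12's chain touches some three
thousand leaf entries, and that is the cost the garbage collector pays
on every delete or update of such a key.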

What Charlie is proposing is to treat the db_key as part of the
index key value and propagate it upward in the index. Thus if
you wanted to eliminate row 65432, whose index value is ABC,
you'd descend through the upper levels of the index to A, then AB,
then ABC, then to ABC65000, for example. If the next value at that
level was ABC78901, you'd go to the next level down, and so on to
the leaf level, just as if the index were unique.
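
Continuing the toy model above, folding the db_key into the
upper-level entries (again invented names, not the real page layout)
turns the same deletion into a single-page visit:

    def build_index_v2(entries):
        leaves = [entries[i:i + LEAF_SIZE]
                  for i in range(0, len(entries), LEAF_SIZE)]
        # Upper level now carries full (key, db_key) pairs -- the
        # ABC65000 vs. ABC78901 comparison from the text.
        upper = [(page[0], n) for n, page in enumerate(leaves)]
        return upper, leaves

    def delete_v2(upper, leaves, key, db_key):
        target = (key, db_key)
        page_no = 0
        for first_entry, n in upper:
            if first_entry <= target:
                page_no = n
            else:
                break
        # The descent pins down one leaf page, duplicates or not.
        touched = 0
        for entry in list(leaves[page_no]):
            touched += 1
            if entry == target:
                leaves[page_no].remove(entry)
                break
        return touched

    upper2, leaves2 = build_index_v2(rows)
    print(delete_v2(upper2, leaves2, key=12, db_key=39_987))  # prints 4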


> Does this mean that the index tree is more balanced with this
> addition, or does its depth change?

This may make the index depth greater. It may also slow the insertion
of duplicates slightly, or require a slightly larger cache to hold more
of the upper levels. Not significant, would be my guess.
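
For a rough sense of the depth question, with completely made-up sizes
(4 KB pages, 10-byte keys, 4-byte page pointers and record numbers):

    from math import ceil, log

    def depth(n_rows, page=4096, key=10, ptr=4, extra=0):
        fanout = page // (key + extra + ptr)   # entries per upper page
        return ceil(log(n_rows, fanout))

    print(depth(40_000))            # 2 levels without the record number
    print(depth(40_000, extra=4))   # still 2 levels with it

At this scale the wider key rarely buys an extra level, which is why
I'd guess the cost is not significant.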

> I won't continue piggybacking the kinobi.performance NG here unless
> ISC people want to continue here in parallel.

No, I'll go back to the performance list.

Thanks,

Ann