Subject RE: [IB-Architect] Data clustering
Author David Schnepper
There is lots of value in localizing data --

For instance, a few months back I made a change to a 63M row table
(um, Oracle)

was: create table ratings (userid number, movieid number, rating
number, primary key (userid, movieid) )
became: create table ratings (userid number, movieid number, rating
number, primary key (userid, movieid ) )
organization index compress;

each user id had typically 150 to 5000 rows. All data for a user is
loaded at the same time.

Changing the table to an index organized table had two significant
improvements
a) PK index & table data were folded together, reducing accesses and
dataspace requirements.
b) each user's data was localized to a smaller number of pages.

Together, these resulted in a 50% average reduction in time to load a
user.

Before converting to the indexed table, I experimented with a manual
packing of data -- this by itself reduced access time by 25%.

As for Ann's comments -- I vote for the add-options-to-gbak method.

Note: I have this vague memory that GBAK makes it's own "system table"
GBAK$something ? Scratch that, I'm thinking QLI$something. But why
not have GBAK store options in a "gbak system" table ? This would
save from the "lost script" problem -- and be compatible with existing
metadata.

Dave

> -----Original Message-----
> From: Ann W. Harrison [mailto:aharrison@...]
> Sent: Thursday, June 28, 2001 9:59 AM
> To: IB-Architect@yahoogroups.com
> Subject: RE: [IB-Architect] Data clustering
>
>
> At 12:06 PM 6/28/2001 -0400, Leyne, Sean wrote:
>
> >Could you please explain what advantage there would be to
> having the
> >table initially stored in a sorted order. Once any
> updates took place
> >the 'table order' would be lost.
>
> Updates wouldn't be a problem - unless, of course, you altered the
> "clustering" fields. New data would not be clustered until the next
> backup/restore cycle. Worst case would be no worse than the current
> case. If the data were reasonably stable, it might help in
> cases like
> this...
>
> You have 10,000 customers. On the average each customer
> has placed 15
> orders with you over 2 years. 90% of the time, you want to
> see orders
> associated with a customer rather than orders associated
> with a date.
> However, the orders were entered by date. The probability
> is that the
> 15 orders from any one customer will be on 15 different pages. This
> pseudo-clustering would put each customer's orders together
> on a page
> or two.
>
> Jim says that giving gbak an input script is the way to go.
> Contrary
> beast.
>
>
> Regards,
>
> Ann
> www.ibphoenix.com
> We have answers.
>
>
> To unsubscribe from this group, send an email to:
> IB-Architect-unsubscribe@onelist.com
>
>
>
> Your use of Yahoo! Groups is subject to
> http://docs.yahoo.com/info/terms/
>
>
>
>