Subject | Re: [IB-Architect] Foreign Key indexes |
---|---|
Author | Joseph Alba |
Post date | 2000-04-17T14:11:21Z |
I was using Interbase 4 for Linux, on RedHat Linux 5.2
A simplified data structure is:
table AREA (about 16 records)
AREACODE CHAR(2) PRIMARY KEY
AREANAME CHAR(15)
table CLASS (about 18 records)
CLASSCODE CHAR(2) PRIMARY KEY
CLASSNAME CHAR(15)
table MEMBER (about 40,000 to 50,000 records)
ACCOUNTNO CHAR(10) PRIMARY KEY
AREACODE CHAR(2) references AREA (AREACODE)
CLASSCODE CHAR(2) references CLASS (CLASSCODE)
(other non key fields)
table MEMBILL (about 2,000,000 records)
BILLNO PRIMARY KEY
ACCOUNTNO CHAR(10) references MEMBER (ACCOUNTNO)
PERIOD DATE
AREACODE CHAR(2) REFERENCES AREA( AREACODE )
CLASSCODE CHAR(2) REFERENCES CLASS (CLASSCODE)
KWHRUSED
POWERBILL
SE_AMT
ADJUSTMENTS
OTHERS
table COLLECT (about 1,800,000 records)
DOCNO PRIMARY KEY
DOCDATE
BILLNO REFERENCES MEMBILL (BILLNO)
ACCOUNTNO references MEMBER (ACCOUNTNO)
AMOUNT
I think, if Bill K remembers, I had also sent the API level code that I did,
and I also think that he found nothing wrong with the code, except the
probable addition of Application.Processmessages code. (please excuse me if
I misspell some commands, I'm kinda sleepy). Note that I did not use any
TTable or even any TQuery.
Instead, I used the API level to send the constructed INSERT string to the
server. I had a label counter running so I knew the number of records that
were being processed and had an idea of how fast the transfer was going on.
I was not also SELECTing anything. Everything was just an INSERT INTO
statement. There wasn't even any parameters because I constructed an ALL
STRING insert statement.
With Member, the code goes through because it appears that it has not hit
the lower boundery for the worst case scenario. It just zooms away at
transferring.
But with Membill, the processing zooms for the first hundred thousand
records, then it somewhat starts hitting a wall, and slowly drops down to an
untolerable pace. We really tried to just test how far the transfer could
go, but after four days of continuous running, but finally, the whole thing
ground to a screeching one record per hour.
I also tried revising my program so that it would just process 100,000
records per batch, then quit. Then, the next time it restarts, it will
continue inserting from where it left off. I also tried turning off the
server before restarting the client insert process, back-up/restore, - the
works so as to eliminate any potential garbage lying around at the server
level, but the speed on the insert program made no difference (I mean if it
was crawling at 1 insert record per minute at 400,000 records, before, it
was the same thing with the batched insert program.)
After I got a hint about selectivity on indexes, I realized that Areacode
foreign key on MEMBILL and also Classcode on Membill had severe selectivity
problems with the index.
So, what I did was, I reconstructed the database so that there were no
indexes, and no foreign key references. just all:plain attributes.
And, the inserts got through within a few hours. instead of days. Also, the
speed as constant and linear. There were no, how do you call it,
asymptotic(?) zooming up of time to insert one record, after the hundred
thousand record mark was reached.
Also, I took a hint from Sybase's Powerbuilder 60 day demo, and implemented
the foreign keys using triggers (so that I only had the index on the AREA
side, not on the MEMBILL side). I noticed that there was no degradation in
performance. If I really needed to index on a referencing column, I now
append a primary key suffix on it. So, I guess the culprit was the index
automatically created on the referencing column which would have
automatically selectivity problems.
I'll try to dig up my code. Also, if I have time next month, I'll try to
re-enact the problem set.
Thanks for your patience. I do enjoy corresponding with people who have the
Socratic instinct for wisdom.
(For those who are not too interested with philosophy, Legend has it that
the Oracle at Delphi declared Socrates as the wisest man in the world.
Instead of being proud, Socrates was puzzled, so he went around asking other
people - artists, politicians, poets, mathematicians -- people whom he
believed was definitely wiser than he. But in the end, after questioning all
of them, he realized that he was indeed the wisest man. Why? Because while
the others think that they know, Socrates realized that, "I KNOW THAT I DO
NOT KNOW."
For indeed, the knowledge of ignorance is the start of knowledge. And it is
more dangerous and pitiful to be ignorant of ignorance. So, that's Socrates
for me and thanks again for your patience.
instinct.
Joseph Alba
jalba@...
-----Original Message-----
From: Ann Harrison <harrison@...>
To: IB-Architect@egroups.com <IB-Architect@egroups.com>;
IB-Architect@egroups.com <IB-Architect@egroups.com>
Date: Monday, April 17, 2000 2:32 AM
Subject: Re: [IB-Architect] Foreign Key indexes
A simplified data structure is:
table AREA (about 16 records)
AREACODE CHAR(2) PRIMARY KEY
AREANAME CHAR(15)
table CLASS (about 18 records)
CLASSCODE CHAR(2) PRIMARY KEY
CLASSNAME CHAR(15)
table MEMBER (about 40,000 to 50,000 records)
ACCOUNTNO CHAR(10) PRIMARY KEY
AREACODE CHAR(2) references AREA (AREACODE)
CLASSCODE CHAR(2) references CLASS (CLASSCODE)
(other non key fields)
table MEMBILL (about 2,000,000 records)
BILLNO PRIMARY KEY
ACCOUNTNO CHAR(10) references MEMBER (ACCOUNTNO)
PERIOD DATE
AREACODE CHAR(2) REFERENCES AREA( AREACODE )
CLASSCODE CHAR(2) REFERENCES CLASS (CLASSCODE)
KWHRUSED
POWERBILL
SE_AMT
ADJUSTMENTS
OTHERS
table COLLECT (about 1,800,000 records)
DOCNO PRIMARY KEY
DOCDATE
BILLNO REFERENCES MEMBILL (BILLNO)
ACCOUNTNO references MEMBER (ACCOUNTNO)
AMOUNT
I think, if Bill K remembers, I had also sent the API level code that I did,
and I also think that he found nothing wrong with the code, except the
probable addition of Application.Processmessages code. (please excuse me if
I misspell some commands, I'm kinda sleepy). Note that I did not use any
TTable or even any TQuery.
Instead, I used the API level to send the constructed INSERT string to the
server. I had a label counter running so I knew the number of records that
were being processed and had an idea of how fast the transfer was going on.
I was not also SELECTing anything. Everything was just an INSERT INTO
statement. There wasn't even any parameters because I constructed an ALL
STRING insert statement.
With Member, the code goes through because it appears that it has not hit
the lower boundery for the worst case scenario. It just zooms away at
transferring.
But with Membill, the processing zooms for the first hundred thousand
records, then it somewhat starts hitting a wall, and slowly drops down to an
untolerable pace. We really tried to just test how far the transfer could
go, but after four days of continuous running, but finally, the whole thing
ground to a screeching one record per hour.
I also tried revising my program so that it would just process 100,000
records per batch, then quit. Then, the next time it restarts, it will
continue inserting from where it left off. I also tried turning off the
server before restarting the client insert process, back-up/restore, - the
works so as to eliminate any potential garbage lying around at the server
level, but the speed on the insert program made no difference (I mean if it
was crawling at 1 insert record per minute at 400,000 records, before, it
was the same thing with the batched insert program.)
After I got a hint about selectivity on indexes, I realized that Areacode
foreign key on MEMBILL and also Classcode on Membill had severe selectivity
problems with the index.
So, what I did was, I reconstructed the database so that there were no
indexes, and no foreign key references. just all:plain attributes.
And, the inserts got through within a few hours. instead of days. Also, the
speed as constant and linear. There were no, how do you call it,
asymptotic(?) zooming up of time to insert one record, after the hundred
thousand record mark was reached.
Also, I took a hint from Sybase's Powerbuilder 60 day demo, and implemented
the foreign keys using triggers (so that I only had the index on the AREA
side, not on the MEMBILL side). I noticed that there was no degradation in
performance. If I really needed to index on a referencing column, I now
append a primary key suffix on it. So, I guess the culprit was the index
automatically created on the referencing column which would have
automatically selectivity problems.
I'll try to dig up my code. Also, if I have time next month, I'll try to
re-enact the problem set.
Thanks for your patience. I do enjoy corresponding with people who have the
Socratic instinct for wisdom.
(For those who are not too interested with philosophy, Legend has it that
the Oracle at Delphi declared Socrates as the wisest man in the world.
Instead of being proud, Socrates was puzzled, so he went around asking other
people - artists, politicians, poets, mathematicians -- people whom he
believed was definitely wiser than he. But in the end, after questioning all
of them, he realized that he was indeed the wisest man. Why? Because while
the others think that they know, Socrates realized that, "I KNOW THAT I DO
NOT KNOW."
For indeed, the knowledge of ignorance is the start of knowledge. And it is
more dangerous and pitiful to be ignorant of ignorance. So, that's Socrates
for me and thanks again for your patience.
>I don't know, but I think we'd better find out.It is really nice to know that Interbase has a president with Socratic
instinct.
Joseph Alba
jalba@...
-----Original Message-----
From: Ann Harrison <harrison@...>
To: IB-Architect@egroups.com <IB-Architect@egroups.com>;
IB-Architect@egroups.com <IB-Architect@egroups.com>
Date: Monday, April 17, 2000 2:32 AM
Subject: Re: [IB-Architect] Foreign Key indexes
>At 11:08 PM 4/7/00 +0800, Joseph Alba wrote:
>>...
>>Several months back, I migrated an XBase setup to Interbase. (the database
>>contained millions of billing/collection records ...The first few minutes
>>processed thousands of records per minute. But as the record count reached
>>the hundred thousand records it ground to a very slow 1 record per minute
>>insertion, after four days, the computer ground to a halt of one record
>>insertion every hour..
>>
>>I tried going down to the API level, but still the same. Finally, ...I
>>realized that I had a foreign key referencial integrity constraint on
>>these millions of billing records, because the AREACODE field of these
>>records was referencing an AREA table which had only 15 tuples.
>>
>>I cancelled the referential integrity constraint, and disabled all indexes
>>and the transfer got through in a few hours.
>>
>>So, is there a problem with just one type of indexes or what?
>
>I don't know, but I think we'd better find out. According to my
>understanding of the index structure, adding rows to a duplicate
>chain should be fast. If you would send me the details ... were
>you loading only one table? What version of Interbase? What indexes
>were defined? Could you send the ddl for the target table(s)? and
>indexes?
>
>Thanks,
>
>Ann
>
>
>
>------------------------------------------------------------------------
>Join Garden.com's affiliate program and enjoy numerous benefits.
>To learn more click here:
>http://click.egroups.com/1/2753/3/_/830676/_/955909853/
>------------------------------------------------------------------------
>
>To unsubscribe from this group, send an email to:
>IB-Architect-unsubscribe@onelist.com
>
>
>