Subject | Re: [firebird-support] First insert in large table takes extreme time |
---|---|
Author | Ann W. Harrison |
Post date | 2010-12-17T16:55:53Z |
Kjell,
for each test, so every tested query pays for database start up, right?
The easiest way to make a query look bad is to run it first so it's
time includes opening the database file, reading the system tables,
building in-memory metadata structures, and loading the cache with
index and data pages.
And here's a really stupid question: you haven't by any chance deleted
the indexes on the system tables, have you?
select count(*) from rdb$indices where rdb$relation_name starting 'RDB';
That returns 46 or something like that? Unlikely to be the problem
since other statements are fast but...
Indexes with lots of duplicates shouldn't be a problem unless you're
running Firebird 1.0 or something like that. As Svein said, in the
bad old days people built compound indexes to make non-selective
field indexes less of a performance problem. However, the bad
performance was on index garbage collection, not insert.
Skip this paragraph if you've read about the index duplicate problem
and its solution. The problem was that Firebird put new duplicate
entries at the beginning of the duplicate chain at the leaf level.
Unfortunately most applications delete records oldest first. So,
taking a duplicate out of the index normally required reading all
the newer duplicates before finding the instance that matched the
record number that was going away. For two or three - or even 200
or 300 instances of a value that wasn't so bad. And for a while we
thought there were never more than about 32K duplicates in any database.
Then we changed the counter to a 32bit integer and, well, there were
databases with lots of duplicates. The solution was to order
duplicates by record number in the lowest level, and promote both
the key value and the record number to the next level up. What that
means is that all indexes behave like unique indexes for garbage
collection. And the change was a long time ago.
And of course, lots of duplicates and unbalanced distribution make
it very difficult to optimize selects since Firebird doesn't do
histograms.
what firebird does on an insert.
First it finds a page in the table that has room for the record. The
pointer pages for the table have a bitvector at the bottom indicating
whether a page does or does not have room for another record. Once the
table is up and loaded, Firebird keeps track of the last place it found
space and doesn't look at pointer pages that don't have space. To get
an idea of how many pointer page it has to look at before it gets to
one with space,
select count(*) from rdb$pages
where rdb$relation_id = 185 and rdb$page_type = 6
As it happens, this table has a number of pages with space, and it
should find one through the pointer pages. If it doesn't for some
reason, it then has to allocate a new data page. That requires
reading page inventory pages which occur about every 64K pages.
Since this is a new database, the first free page is likely to be
on the last one, so it'll read a number of PIPs.
And of course, once the data is on disk, each index has to be
located, traversed, and updated. I guess that whole thing could take
39 seconds. Look elsewhere for the remaining 19 minutes and 21 seconds.
Cheers,
Ann
>Let me just confirm a couple of things. You start with no connections
> Insert in a different table was fast. Selects, including from "Uppgift"
> are fast, as was noted using select from mon$statements when my
> application was running. Insert into "Uppgift" takes... 39 seconds.
for each test, so every tested query pays for database start up, right?
The easiest way to make a query look bad is to run it first so it's
time includes opening the database file, reading the system tables,
building in-memory metadata structures, and loading the cache with
index and data pages.
And here's a really stupid question: you haven't by any chance deleted
the indexes on the system tables, have you?
select count(*) from rdb$indices where rdb$relation_name starting 'RDB';
That returns 46 or something like that? Unlikely to be the problem
since other statements are fast but...
Indexes with lots of duplicates shouldn't be a problem unless you're
running Firebird 1.0 or something like that. As Svein said, in the
bad old days people built compound indexes to make non-selective
field indexes less of a performance problem. However, the bad
performance was on index garbage collection, not insert.
Skip this paragraph if you've read about the index duplicate problem
and its solution. The problem was that Firebird put new duplicate
entries at the beginning of the duplicate chain at the leaf level.
Unfortunately most applications delete records oldest first. So,
taking a duplicate out of the index normally required reading all
the newer duplicates before finding the instance that matched the
record number that was going away. For two or three - or even 200
or 300 instances of a value that wasn't so bad. And for a while we
thought there were never more than about 32K duplicates in any database.
Then we changed the counter to a 32bit integer and, well, there were
databases with lots of duplicates. The solution was to order
duplicates by record number in the lowest level, and promote both
the key value and the record number to the next level up. What that
means is that all indexes behave like unique indexes for garbage
collection. And the change was a long time ago.
And of course, lots of duplicates and unbalanced distribution make
it very difficult to optimize selects since Firebird doesn't do
histograms.
> Deleting the record again was instantaneous. So why does it take foreverI don't even speculate about problems above the Firebird API. Here's
> (20-60 minutes) when doing it from my application?
what firebird does on an insert.
First it finds a page in the table that has room for the record. The
pointer pages for the table have a bitvector at the bottom indicating
whether a page does or does not have room for another record. Once the
table is up and loaded, Firebird keeps track of the last place it found
space and doesn't look at pointer pages that don't have space. To get
an idea of how many pointer page it has to look at before it gets to
one with space,
select count(*) from rdb$pages
where rdb$relation_id = 185 and rdb$page_type = 6
As it happens, this table has a number of pages with space, and it
should find one through the pointer pages. If it doesn't for some
reason, it then has to allocate a new data page. That requires
reading page inventory pages which occur about every 64K pages.
Since this is a new database, the first free page is likely to be
on the last one, so it'll read a number of PIPs.
And of course, once the data is on disk, each index has to be
located, traversed, and updated. I guess that whole thing could take
39 seconds. Look elsewhere for the remaining 19 minutes and 21 seconds.
Cheers,
Ann