Subject Re: [firebird-support] Re: Performance on big tables : JOINS
Author Adomas Urbanavicius
Hello,
As far as I have noticed, FB tries to join everything together first, and
only then applies any rules,
so after a simple left (or any) join of your two tables,
the operations are done in this sequence:
a)
1. Join 8 mil + 8 mil /* TOTAL OVERHEAD */
2. Your summing: scanning through 8 mil * 2 records.


b) Now, if you had only one table:
1. Your summing: scanning through 16 mil records.
c)
Your SQL could be rewritten as something like:
select
sum(a1) + (select sum(a1) from tb2)
from tb1 group by bla.bla.bla (OR using an SP)
Here, too, 16 mil records are scanned.
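The rewrite in c) can be sanity-checked outside Firebird. A minimal sketch
using SQLite from Python (table names tb1/tb2 and column a1 as above; the
data is invented), showing the scalar-subselect form that sums each table
with one scan and no join at all:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tb1 (id INTEGER, a1 INTEGER)")
cur.execute("CREATE TABLE tb2 (id INTEGER, a1 INTEGER)")
cur.executemany("INSERT INTO tb1 VALUES (?, ?)", [(1, 10), (2, 20)])
cur.executemany("INSERT INTO tb2 VALUES (?, ?)", [(1, 5), (3, 7)])

# grand total without any join: one sequential scan per table,
# then a single addition of the two scalar subselect results
total = cur.execute(
    "SELECT (SELECT SUM(a1) FROM tb1) + (SELECT SUM(a1) FROM tb2)"
).fetchone()[0]
print(total)  # 42
```

The same shape applies with 12 tables: 12 scalar subselects added
together, instead of an 11-way join being materialized first.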

As you can see, sample a) carries the full overhead of joining two tables
(and now imagine 12 such tables joined, which is a significant overhead).
With approach b) or c) you get no join overhead at all.
However, aggregates like SUM, COUNT and AVG scan ALL records, so an index
is of no use in this case.
BUT if you are using the COALESCE function, remember that it is applied
to all of the records as well; that means over 8 mil * 12 operations,
most of them useless. It is much better to ask only for the records which
are not null (and if that column is indexed, you will see a big
performance gain),
something like :
select
sum(a1)
from tb1 where tb1.a1 is not null
group by tb1.a1
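That the NOT NULL filter does not change the result can be checked the
same way. A small sketch (SQLite again, invented data): since SUM skips
NULLs anyway, filtering them out up front gives the same total while
letting the engine prune those rows (via an index, where the engine
supports that) instead of touching every record:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tb1 (id INTEGER, a1 INTEGER)")
cur.executemany("INSERT INTO tb1 VALUES (?, ?)",
                [(1, 10), (2, None), (3, 32), (4, None)])

# SUM ignores NULLs, so both queries return the same total;
# the filtered one gives the optimizer a chance to skip NULL rows
full = cur.execute("SELECT SUM(a1) FROM tb1").fetchone()[0]
filtered = cur.execute(
    "SELECT SUM(a1) FROM tb1 WHERE a1 IS NOT NULL"
).fetchone()[0]
print(full, filtered)  # 42 42
```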



Adomas
>>Hi everybody.
>>
>>
>
>Hi Rodrigo!
>
>I've no experience with tables this big (our tables are generally
>about 1-2m records), but still: Why use 12 different tables when they
>are identical and you want to look at them as one entity? Have you
>discovered some problems with having only one big table? Have you
>tried using a view to combine them?
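[An aside on the view idea above: combining the monthly tables with
UNION ALL sidesteps both the FULL OUTER JOIN and the COALESCE entirely,
since an id missing from one month simply contributes no row. A minimal
sketch in SQLite with two invented monthly tables:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
for t in ("table1", "table2"):
    cur.execute(f"CREATE TABLE {t} (id INTEGER, value INTEGER)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 10), (2, 20)])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 5), (3, 7)])

# one view over all monthly tables; ids absent from a month simply
# contribute no row, so no outer join and no COALESCE are needed
cur.execute("""CREATE VIEW all_months AS
               SELECT id, value FROM table1
               UNION ALL
               SELECT id, value FROM table2""")
rows = cur.execute(
    "SELECT id, SUM(value) FROM all_months GROUP BY id ORDER BY id"
).fetchall()
print(rows)  # [(1, 15), (2, 20), (3, 7)]
```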
>
>Anyway, to solve your problem, I'd first take a look at your plan
>(that's the part of Firebird that I partially understand, so I always
>answer that...). Doing changes to the configuration may of course
>help, but I'd be surprised if they would be enough for you to be
>satisfied.
>
>If there are many duplicates, adding the pk to the end of the index
>definition could improve performance for updates and deletes. I do not
>know whether 2500 out of 8 million records fall into the category
>'many' or not.
>
>Firebird always tries to do what is asked, so in the end you would have
>gotten a reply - even though for complex queries that could take days.
>.. The plan will reveal whether Firebird tries to solve your request in
>a manner best described as sensible or nonsense.
>
>If optimising the query doesn't work, then another option would be to
>write a stored procedure.
>
>HTH,
>Set
>
>
>
>>I'm migrating from MySql to Firebird 1.5.1 (SS) on win2k (P4, 2Ghz,
>>646Mb RAM). I've been reading all Firebird documentation and now I'm
>>very excited about all the features shown there. Being a newbie on
>>this, so far I've managed to dump all my data in a comfortable way
>>using perl and external tables, and then re-inserting it into the
>>final tables. Right now I have 12 "almost" identical monthly tables
>>of 8000000 rows containing (mostly) numerical data in 99 columns
>>each, all indexed in a common field "id". I'm trying to do some
>>statistics reports with them. What I need is to retrieve the annual
>>sum of a certain column "value" using a multiple full outer join and
>>a group by. I'm using coalesce and iNvl to deal with nulls.
>>
>>My query looks like this:
>>
>>select
>> coalesce(a1.id,a2.id,...,a12.id) as id,
>> sum(iNvl(a1.value,0)+iNvl(a2.value,0)+...+iNvl(a12.value,0))
>> as value
>>from table1 a1
>> full outer join table2 a2 on a1.id = a2.id
>> full outer join table3 a3 on a1.id = a3.id
>> ...
>> full outer join table12 a12 on a1.id = a12.id
>>group by coalesce(a1.id,a2.id,...,a12.id)
>>
>>I tried this on a small set of tables and it works like a charm, but
>>on the bigger ones it takes forever. The query eats all CPU power
>>(like 99%) so I have to shutdown the server to kill the query (I've
>>waited more than 3 hours!!).
>>
>>I have not touched firebird.conf, so I guess it's using its
>>defaults. I do not make any inserts, deletes or updates (yet), just
>>selects. I've created the table just a week ago and I'm working
>>alone, locally. Excepting "id" which is INTEGER NOT NULL, almost all
>>columns are defined as NUMERIC(18,0) NOT NULL.
>>
>>I've tried disabling the sweep interval and changing buffers with gfix
>>to improve performance, but nothing helped. By the way, memory usage
>>seems to stay low all the time during the query.
>>
>>What should I check to improve performance?
>>
>>Thanks in advance,
>>
>>Rodrigo Abt B.
>>Statistical advisor, SII,
>>Chile.
>>
>>P.S.: The data I'm using contains a not so small amount (about 2500)
>>of duplicates on the index field in each table, which I should not
>>remove.
>>
>>
>