Subject Re: In memory sorting consumes alot of space
Author haris_p_sw
--- In firebird-support@yahoogroups.com, Svein Erling Tysv�r <svein.erling.tysvaer@...> wrote:
>
> >Hi,
> >
> >I 'd like to share my recent testing experiences with you hoping to help in Firebird development.
> >
> >I 've been using Firebird (classic server > 70 connections for web serving) for 13 years now.
> >
> >Recently, I decided to test certain suspicious select queries more carefully. Here are my conclusions
> >and I hope you prove me wrong:
> >
> >1. If the query plan uses indexing, everything works OK and very fast.
> >
> >2. If the query plan goes the "NATURAL" way then sorting happens which is totally expected. However,
> >if the tuning variable TempCacheLimit has a value that is not big enough for the dataset the sorting
> >process happens on hard disk and is extremely slow. it depends on the machine and the dataset but it
> >has to be a very powerful machine to manage it under 10 secs. On my new 64-bit laptop, Firebird 2.5.2
> >for a dataset around 30-40 MB I got 35-40 seconds.
> >The NATURAL plan was unavoidable since there was a need of ordering by two columns from two different
> >tables.
> >I started tuning TempCacheLimit, LockMemSize, DefaultDbCachePages and FileSystemCacheThreshold. I
> >discovered that caching in memory does not speed up things so I gave FileSystemCacheThreshold a big
> >value. I managed to get an execution time around 2 seconds(4 for the first run) when TempCacheLimit
> >was: 536870912(classic super server. With plain classic server you need more memory).
> >
> >I know that the above value is only a limit. The problem is that during query execution I noticed that
> >200 up to 480 MB were actually occupied(depending on the run).
> >
> >I don't know if things are better with Super Server. It is not an option for me because I am using
> >FreeBSD.
> >
> >As a conclusion, I was disappointed when I tested postgres 9.2 with the same query. It finishes under
> >a second without using extra memory. It is obvious that postgres also does extra work for my query but
> >in a much more efficient way.
> >
> >Why Firebird sorts datasets loading whole rows into memory when the plan is NATURAL? Couldn't it sort
> >only primary keys for example? There might be something I 'm missing...
>
> Hi Haris!
>
> I've no clue about your actual question, but am curious as to what your query looks like and what size of tables and result sets we're talking about. Going NATURAL on the first table in a plan is quite normal and usually not too time consuming unless we talk about large tables (though I've no experience with Classic...), but NATURAL on other tables in the plan can be quite ugly performancewise. And Firebird can (or at least could) be very slow if you issue the wrong kind of query against it (e.g. ... WHERE <somefield> IN (SELECT DISTINCT <AnotherField> ... wasn't recommended).
>
> Set
>

Hi Set,
Thanks for your reply.

Here is my query which is a join between three tables. prdInCatTab holds the keys to product and prdCategory and it's a one-to-many relationship:

SELECT FIRST 10 SKIP 500 *
FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
WHERE prdCategory.catShowFlag = '1'
AND product.prdHideFlag != '1'
AND prdInCatTab.PINCPrimary = '1'
ORDER BY prdInCatTab.PINCRank DESC, product.nameUp ASC;

And here is the plan:

PLAN SORT (JOIN (PRDINCATTAB NATURAL, PRDCATEGORY INDEX (PK_PRDCATEGORY), PRODUCT INDEX (PK_PRODUCT)))

Well, my actual question could be: Why does Firebird consume such a large amount of sorting space? It is almost impossible for a web server which has over 100 connections to allocate such amounts of memory space. As I wrote in my first post here are my numbers:

1. Table product's size is between 30 and 40 MB. The other two have small size.

2. TempCacheLimit = 536870912 with Classic Super Server if you want to have an acceptable run(4-5 secs first run on my laptop, around 2 secs all next runs).

3. Firebird server does actually need from 200 to 450 MB for each run.

Although my production servers are much faster than my laptop, I replaced the above query with another that uses a subquery as a virtual table. It turned out very fast and not memory consuming:

SELECT * FROM product
JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
JOIN
(
SELECT FIRST 10 SKIP 500 product.prdId as bprdid
FROM product JOIN prdInCatTab ON prdInCatTab.PINCPrdId = product.prdId
JOIN prdCategory ON prdCategory.catId = prdInCatTab.PINCCatId
WHERE catShowFlag = '1'
AND product.prdHideFlag != '1'
AND prdInCatTab.PINCPrimary = '1'
ORDER BY PINCRank DESC, product.nameUp ASC
) AS b ON product.prdid = b.bprdid
AND prdInCatTab.PINCPrimary = '1'

and the plan:
PLAN JOIN (JOIN (SORT (JOIN (B PRDINCATTAB NATURAL, B PRDCATEGORY INDEX (PK_PRDCATEGORY), B PRODUCT INDEX (PK_PRODUCT))), JOIN (PRDINCATTAB INDEX (FK_PINC_PRODUCT), PRODUCT INDEX (PK_PRODUCT))), PRDCATEGORY INDEX (PK_PRDCATEGORY))

But you can't always substitute your queries easily, can you?
And it still bothers me as a Firebird user how postgres does it so well.

Regards,
Haris