Subject Re: Performance of Firebird vs. other DBMS
Author Adam
--- In firebird-support@yahoogroups.com, "laurenz_brein"
<laurenz_brein@y...> wrote:
> Adam wrote:
> > Firstly, David was not joking when he implied that select count(*)
> > based queries is a possible weakness in any MGA database.
>
> Thank you and David for enlightening me.
>
> It is now clear to me why all SELECT COUNT(*) will be expensive,
> even the ones that I thought would do an index scan.
>
> I am afraid that I didn't make it clear enough that with a
> 'simple select' I meant a statement like
> SELECT NAME FROM PARENT WHERE ID = 42
> These queries also perform badly, when running them together
> with the COUNT(*) queries 4 times as badly as on other DBMS.
>
> However, when I run only these 'simple selects', the performance
> is a mere 50% worse than that of the worst competitor.

That is interesting, but I personally haven't encountered ill
performance on simple queries. I do know that it is a bit inefficient
in its protocol, but this will improve (the main complexity is
maintaining backwards compatibility).

> I guess that the extra load the COUNT(*)-statements imply will
> explain the factor 4, and the 50% may be accounted for by the
> record versioning overhead, right?
>

I wouldn't think so. Certainly there is some overhead but due to the
design this is very minimal. In a system where transactions are well
managed, you actually don't get lots of record versions sitting there.


> > So you don't get the impression it is a weird Firebird thing, it
> > is a problem for Postgres as well.
> >
> > (http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php)
>
> This is interesting.
>
> Firstly, in what I call the 'index scan' queries, I DO have a where
> clause involving an indexed column, and at least sometimes the
> optimizer should figure that using the index could save time.
>

It does use the index if it can. Do you have a tool to check the
queries plan? I personally use IBPlanalyzer because it is so lite, but
others are equally useful. The index is used to "knock out" records
that don't meet the criteria, but no what are you left with? A bunch
of records that may or may not be visible to your transaction. The
data pages are then checked to see whether to count them or not.

> Secondly, one of the other databases I tested IS Postgres, and
> a) all selects, even the COUNT(*) ones, perform reasonably well, and
> b) the results of the 'index scan' queries was much better than for
> the 'full table scan' queries.
>
> This should not be if Postgres worked like Forebird, right?

Again, I am not claiming that Firebird works identically to Postgres
here, nor do I claim it is superior at this point. But discussion
lists for Postgres show they need to deal with the same problem, or at
least a problem that is significantly similar. Some of the ideas about
using a "dirty bit" in the index to denote whether the data value
needs to be considered are an interesting sideline.


> > > Next, there are some good techniques in a MGA system such as
> > Firebird for improving the speed considerably. I wont repeat it,
> > but it is right here
> >
> > (http://groups.yahoo.com/group/firebird-support/message/56457)
>
> Sorry to disagree, and I guess it is also a matter of taste,
> but I am very opposed to introducing redundancy into a database.
> It can lead to inconsistencies. This 'good technique' is what I
> would prefer to call a kludge.

Call it what you want, it is no more "redundant" than an index. I
would certainly consider it an "overhead", but there is nothing kludgy
about it. There is no-one here that thinks the select count(*) thing
is a good side effect of MGA, but it is an architecture decision we
live with. We recognise that the performance gains we get outweigh the
overheads.

Of course you could implement it with a single count record if you
didn't want to worry about record locks effectively becoming a table
lock. But used correctly, the technique allows for an accurate count
for every transaction with very little overhead.

>
> > Thirdly, I would certainly hope that for you 50% of the requirement
> > of your database is count(*) related, because you are testing and
> > basing your judgements (possibly unintentionally) on that parameter.
> >
> > If you run queries that actually have to fetch and return data,
> > the results will be significantly closer, and if you run the
> > inserts concurrently to the selects, you may find the readers and
> > writers not blocking each other actually gives MGA a real edge.
>
> Yes, you are absolutely right.
> I am well aware how 'meaningful' the result of a test like this is.
> That is why I struck up this discussion to gain insight.
> BTW, only less than 1% of all the statements in my test where
> COUNT(*) queries, which I thought was reasonable.

Thats fine, you only posted a small section of code that seemed to be
primarily testing select count performance.

>
> > Fouthly (and finally), there is one difference in Firebird's index
> > structure that you should be aware of. It is directional, not
> > bi-directional. This allows a greater density, and can still help
> > with most operations, but certainly not all. An index is ascending
> > unless you specifically declare it to be descending. That means
> > that a normal index on a field qty could not help you run the
> > following
> >
> > select max(qty) from tablea
>
> Sorry to be so dense, but I need an explanation for this.
>
> Isn't an index a B*-Tree, so that all you have to do when finding
> the maximum is to descend from each parent node to the leaf node
> that contains the largest values?

Sort of, IBPhoenix has a few excellent articles on the nitty gritty
details of Firebird that can help you.

(http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_expert1)

Storing an index in this manner makes it more dense. This is a good
thing, because you have fewer I/O reads in using the index. It is also
bad, because it makes it impossible to reversly traverse (is that a
word), go up the index. Trade offs are always a part of the fun.

I am willing to accept that some of the performance issues are to do
with the inefficiencies in the interface, as well as not having the
appropriate directions for each index (a gotcha if you like). Also,
something I didn't even think of earlier, it is possible that the
other systems handle caching better. Unfortunately you can't have your
cake and eat it. Out of interest have you taken a look at Firebird 2
(it is alpha, not production ready). There are certainly good reports
on the performance front.

Cheers

Adam