Subject: Yet another "which architecture" question
Author: Tim Ward
Post date: 2015-02-17T16:51:14Z
Sorry, you must all be really bored with these, but I just can't get my
head round it.
We're looking to upgrade our (Linux) operating system, and with that is
likely to come an upgrade from 32 bit Firebird 2.1 Superserver to 64 bit
Firebird 2.5, but what architecture?
We need more performance than we're getting at the moment; a factor of
10 would be a nice start.
We've got two types of workload:
(a) Web accesses - short-lived connections that run large queries, which
in our experience really do benefit from having thousands of pages in
the Firebird cache. There are typically multiple queries in the same
connection that use the same data. Concurrent queries from multiple
users, or subsequent queries from the same users, are also very likely
to use some thousands of the same pages that were used by immediately
preceding queries. There are a small number of simultaneous users, say
from two to a dozen.
(b) Persistent links to other systems. I believe that these need
considerably less in the cache, but share some of their cache
requirements with the web accesses. These typically perform large
numbers of small queries, as opposed to the small number of large
queries that the web accesses involve. There are only a handful (single
figures) of these connections.
We observe that when we've got performance problems one core is
saturated, for seconds or even minutes at a time, and that seems to be
the bottleneck. (Yes we keep nibbling away, and sometimes taking large
bites, at schema design and query optimisation etc, and we will continue
to do so, but we don't think that can be the complete solution.)
So for 64 bit 2.5 the choices seem to be:
- Superserver, which keeps our large shared cache, but will (still)
only use one core (of the many available on the hardware)?
- one of the other two architectures (Classic or Superclassic), which
will use more cores, but may simply occupy them re-reading the same
thousands of pages time and time again from the database into each
connection's private cache?
Have I got that right so far?
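For reference, my understanding is that the relevant knob is
DefaultDbCachePages in firebird.conf, and that the crucial difference is
that under Superserver it's one cache shared by all connections, whereas
under Classic/Superclassic it's allocated per connection. A sketch (the
page count is illustrative, not a recommendation):

    # firebird.conf - illustrative value, not a recommendation
    # Superserver: one cache of this many pages shared by every
    # attachment to a database.
    # Classic/Superclassic: this many pages *per connection*, so a
    # dozen connections each get their own separate cache this size.
    DefaultDbCachePages = 8192

(I believe a per-database default can also be stored in the database
header with "gfix -buffers 8192 /path/to/database.fdb".)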
I've read stuff about keeping persistent connections from PHP to the
database to maintain the state of the private cache in a non-Superserver
setup, and how that causes all sorts of problems if one web access falls
over, resulting in a failure to roll back the transaction automatically
on script termination. So, not recommended. I've also read stuff about
how not having a large shared cache doesn't actually matter, because
you're not reading from the disk into cache anyway, you're only reading
from the file system's cache into the database server's cache. This
latter point is consistent with some testing and measuring we've done,
but surely it still takes lots of CPU cycles to copy thousands of pages
from one cache into the other over and over again?
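To spell out the pattern we'd be left with on the first point: a plain
non-persistent connection per web request, with explicit cleanup,
roughly like this in PHP (the connection details are placeholders and
the query is just an example):

    <?php
    // One short-lived, non-persistent connection per web request, so a
    // dying script can't leave a long-running transaction dangling.
    $db = ibase_connect('server:/path/to/database.fdb', 'SYSDBA', 'masterkey');
    $trans = ibase_trans(IBASE_DEFAULT, $db);

    $result = ibase_query($trans, 'SELECT rdb$relation_name FROM rdb$relations');
    if ($result === false) {
        ibase_rollback($trans);      // roll back explicitly on failure
        ibase_close($db);
        die(ibase_errmsg());
    }
    while ($row = ibase_fetch_assoc($result)) {
        // ... do something with $row ...
    }
    ibase_free_result($result);
    ibase_commit($trans);
    ibase_close($db);                // under Classic the private cache
                                     // dies with the connection here
    ?>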
Yes I know we could "just try the various architectures and measure the
results", but it's not as simple as that. We want to do some
restructuring of our system anyway, and options include putting bits of
it onto different boxes, and splitting some tables into separate
databases, and possibly putting different database servers onto
different boxes, and stuff like that, which gives rise to an impractical
number of combinations of systems to design, code and test. Only having
to design and code one system would go down best with management.
So ... ... ... ???
Oh, and if we do use one of the private-non-persistent-cache
architectures, can we set the cache size differently to suit different
connections with radically different types of workload?
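(I notice that PHP's ibase_connect() takes a buffers argument as its
fifth parameter, which I believe maps to isc_dpb_num_buffers, so under
the per-connection-cache architectures each connection could presumably
ask for its own cache size. Something like this, with made-up figures:)

    <?php
    // Per-connection cache sizing via ibase_connect()'s buffers
    // argument (number of cache pages); figures are illustrative only.

    // Web access: big private cache for the large, page-hungry queries.
    $web  = ibase_connect('server:/path/to/database.fdb', 'SYSDBA',
                          'masterkey', 'UTF8', 16384);

    // Persistent link: small cache for large numbers of small queries.
    $link = ibase_connect('server:/path/to/database.fdb', 'SYSDBA',
                          'masterkey', 'UTF8', 512);
    ?>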
Or have I completely misunderstood the whole business?
--
Tim Ward