Subject Re: [firebird-support] Scalability principles: Can someone confirm or correct ...
Author David Johnson
On Tue, 2005-02-08 at 20:57, Dalton Calford wrote:
>
>
> Hi David,
>
snip ...

> So, hardware, I have to deal with it day in and day out. But, I have
> always thrown our developers at the oldest, slowest and most pitiful
> equipment I can find. I just had a party, we retired our test box #6, a
> 486dx66 - running linux and interbase 4.0g.

I agree with initial low-level application performance testing on slow
boxes. For most applications, if it performs adequately on older, slower
hardware, it should be good for a few years on modern or future
hardware. It is also much easier to see performance improvements on
slow hardware.

However, high level tuning is a different issue. A well designed
application is tunable to maximize its use of the hardware facilities
when performance is really critical. A better designed application is
self tuning - but that is much harder to implement. Firebird seems to
implement a little of both approaches - configuration for coarse tuning,
then internal instrumentation for fine tuning.

I have one process right now that, when tuned to make optimal use of my
desktop PC, uses less than 25% of the capacity of a server-class
machine. When tuned to make optimal use of the server-class machine, it
bogs down on my desktop. The tuning is all in a configuration file, so
the solution is obvious, but the lesson here is the same one you pointed
out about my original question - you can't generalize.

>
> Our list of design specifications is called the bible, and more than a
> few fights have been started over a developer trying to cut time by
> skipping a few steps, only to lose their bonus, and possibly their job.

We have one of those too, but it is a typical committee document :o(
It is concerned with producing paper that no one reads, not product.

>
> The approach we take is very mechanical: we take a result set that the
> process would return, insert it into a simple table, and perform a
> simple select from the table.
> This gives us a flat-line benchmark based upon drive speed, i/o speed,
> network speed and client speed.
> Every process that will manipulate the data is measured against this
> benchmark.

If you don't mind, I will submit this simple gem to our performance and
tuning team.
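
For what it's worth, here is a minimal sketch of how I read that
approach, in plain Firebird SQL; the table, column and source names
below are placeholders of my own, not anything from your system:

/* 1. A throwaway table shaped like the result set the real process
      returns. */
CREATE TABLE bench_baseline (
  id        BIGINT NOT NULL,
  acct_no   VARCHAR(20),
  amount    NUMERIC(18,4),
  posted_at TIMESTAMP
);

/* 2. Load it once with the rows the process would have produced. */
INSERT INTO bench_baseline (id, acct_no, amount, posted_at)
SELECT id, acct_no, amount, posted_at
FROM some_source_table;  /* placeholder for the real source */

/* 3. The flat-line number: time a bare select of the whole set. This
      measures only drive speed, i/o speed, network speed and client
      speed. */
SELECT id, acct_no, amount, posted_at FROM bench_baseline;

Every real process that manipulates the same data then gets timed
against that bare select; whatever it costs beyond the flat line is pure
processing overhead.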

> We aim to have enough hardware resources that once the system enters
> production, that our CPU and drive utilization do not exceed 25% of
> total load.

In contrast, our CPU utilization target is no less than 90% and our DASD
utilization is not published.

> But, none of this has anything to do with scalability, as it is specific
> to the current immediate needs of the design.
> The true test is when you have deployed applications to 250 desktops,
> and a new requirement for a small department comes up.
> You should be able to make the changes, deploy the new application, and
> not need to replace any of the existing applications.

Agreed - in one sense. Scalability is an overused term that means many
things, depending on the context. You are using it in the context of
the data/application model being responsive to the business needs. I am
using it in the sense that a well defined model that already meets this
definition can support the raw processing needs of 1000 to 8000
concurrent users of real-time data, plus automated real time feeds from
all of our equipment, plus trade partners, plus ... well, you get the
idea.

> Now, you need to ensure that the users who depend upon laptops are able
> to work offline, again, the design should simply scale to the new
> requirements.

Agreed ... except that our business needs dictate that all transactions
must be online. You can't get real-time information about equipment X
in Chicago if you are off-line in Anaheim. Of course, in Anaheim you
probably would be doing everything possible to avoid getting on-line
(oops, I dropped my laptop in the Pirates of the Caribbean).

> That is the key, scaling to the changing needs of the environment,
> whether it is pure horsepower needed due to increasing load, or scaling
> to support totally new uses.
> Hardware, Operating Systems, Specifications, all are irrelevant.

Not always true. With some problems, the ability to harness raw
horsepower efficiently becomes the central issue. My last two projects
(and the next one coming up) are prime examples of this.

In my next project, I have to move somewhere between 240,000,000 and
9,600,000,000 rows of real-time data from our current system to a very
different third-party system, changing the database schema along the
way, in a two-hour window. (How's that for a "precise" scope
definition?) The translation scope will not be fully known until a few
hours before (or after) we actually go live. Every minute of downtime
after the go-live time costs us tens of thousands of dollars. A
relatively small number of incremental optimizations makes a huge
impact.

Moving this application from a desktop to a single Intel Hyperthreaded
server with the same clock speed more than doubles its throughput and
leaves cycles to spare. A dual Hyperthreaded-CPU server doubles the
potential performance yet again (if I had sufficient test cases ready to
really work the box).

The move from Wintel to AIX/PPC4 architectures has been empirically
tested to give an order-of-magnitude improvement for this application -
so my process that, fully optimized, would take 20 hours to run on
Wintel hardware will run in 2 hours on the PPC4 platform. If we open up
the LPAR and give my process all of the CPUs, we can improve the
throughput dramatically yet again.

In short, I can make incremental performance improvements through
software changes. I can improve performance on small enough datasets (a
few gigabytes) by distributing the work across a couple of dozen
server-class machines (horizontal scaling). But for single datasets that
are too large to handle on a server-class machine in two hours, I can
improve throughput by an order of magnitude simply by moving to a more
capable box that costs the equivalent of 4 minutes of downtime (vertical
scaling).

>
> As for marketing, it is important, but, less so for open source tools
> than for commercial tools.

I will agree to disagree here. :o)

> The basic truth is, there are very few programmers out there. The
> schools churn out people who have no natural ability and who are best
> promoted into management as they are useless as developers. (in case you
> are wondering, that is what has happened to me)

Sad, but true.

>
> So, the best marketing we can do right now is make tools and manuals
> that teach good design practices.

That is a good step, but if we are to get the tools accepted in the
corporations we need to be able to sell them to our non-technical
managers. They expect "glossy 1" and a set of cheat ... er ... industry
benchmarks.

> If you want to update the Some Solutions document, I would be happy to
> help; unfortunately, life has a bad habit of overriding my best
> intentions (my wife and two daughters have a priority on my time)

Hear, hear!

>
> best regards
>
> Dalton
>
>
> David Johnson wrote:
>
> >Thanks ... I missed your message - I apologize. One of the drawbacks of
> >a good mailing list is that solid messages like yours sometimes get
> >missed in the sheer volume of mail. I will certainly read the article
> >you have so kindly referred me to.
> >
> >Your response to my original question is accurate and to the point.
> >Without knowing my particular application's parameters, accurate
> >information for scaling and limits is impossible to generate.
> >
> >I also work in a large-scale environment, so I have seen both good and
> >bad examples of scalability in design. I concur, wholeheartedly, that
> >bad application and data model design is a greater limit to scalability
> >than anything else.
> >
> >My recent experience of local RAID versus SAN is exactly counter to
> >yours - by an order of magnitude. We must be seeing very different
> >hardware configurations. Rather than providing a point of contention,
> >this highlights the limits that "generalities" place on the guideline
> >that I would like to see.
> >
> >Once the data model and application software are optimized as far as
> >they can go, the hardware capabilities, or more specifically the DBMS'
> >ability to maximize resource usage, plays a very significant role in the
> >scalability of a database application.
> >
> >Each connection uses _X_ amount of memory. A three level B+ tree search
> >with compressed index headers requires _Y_ CPU cycles on a reference
> >architecture. In order to achieve a response time of 1 second for
> >performing a reference query joining three reference tables on primary
> >keys only for _N_ concurrent requests, you need (some set) of resources.
> >
> >Your telephony application is the sort of large scale application that
> >catches my attention. Once I have read your article, may I post
> >directly to you if I have questions?
> >
> >Re "... overall generalities do nothing but satify the needs of a
> >marketing dept.":
> >
> >(1) the marketing department needs to sell the product if we are to get
> >paid, and (2) someone needs to be able to sell the DBMS to the
> >management types if we are to use it in our corporate environment.
> >
> >A company or product that is built solely on technical merit is more
> >likely to fail than one that has little technical merit but good
> >marketing. OS/2 versus Windows illustrates that issue very
> >dramatically. Firebird's history and sojourn at Borland is another more
> >immediate and equally dramatic example of this principle in action.
> >
> >On Tue, 2005-02-08 at 19:38, Dalton Calford wrote:
> >
> >
> >>David,
> >>
> >>Have you read the "Some Solutions to Old Problems" document? Helen
> >>Borrie was kind enough to edit a bunch of my pondering on the subject of
> >>scalability.
> >>It covers some hardware and software design criteria depending upon the
> >>needs of the user. That document is getting dated, but, it does cover
> >>some of your requirements.
> >>
> >>I did respond to your earlier email. I stated what sort of environment
> >>I work and develop in. I have real world working experience with
> >>managing terabytes of data, in intensive work environments and multiple
> >>concurrent users - all with interbase 5.6, firebird 1.0 and firebird 1.5x.
> >>
> >>I know the capabilities of the system and its limitations, and I can say
> >>that your estimates are out to lunch. The CPU and memory estimates are
> >>totally wrong, while the underlying subsystem and its design is more
> >>important than the CPU or RAM. SAN storage is too slow for performance
> >>systems, while RAID has both benefits and drawbacks.
> >>
> >>You are asking for information about hardware scalability when I have
> >>seen the simple elimination of auto generated domains make a significant
> >>performance change.
> >>
> >>Scalability has very little to do with hardware. It has to do with
> >>system design. A proper design will allow a system to scale into uses
> >>not envisioned by the original design team. A simple design rule that
> >>specifies that all primary keys are to be surrogate keys and that no
> >>links are to be based upon user viewed or modifiable data will give
> >>amazing growth capabilities. The rule that no end user application may
> >>touch a table directly, and that all data is to be modified via
> >>procedures, allows new functionality to be added to the database without
> >>any fear of having an impact on legacy client applications.
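
(As an aside: a minimal sketch of those two rules as I would write them
in Firebird, with generator, table and procedure names invented purely
for illustration:

CREATE GENERATOR gen_customer_id;

CREATE TABLE customer (
  customer_id BIGINT NOT NULL PRIMARY KEY, /* surrogate key, never shown to users */
  cust_code   VARCHAR(20),                 /* user-visible, freely changeable */
  cust_name   VARCHAR(80)
);

SET TERM ^ ;

/* Clients never touch the table; they call a procedure instead, so the
   table can be reshaped later without breaking legacy applications. */
CREATE PROCEDURE add_customer (p_code VARCHAR(20), p_name VARCHAR(80))
RETURNS (new_id BIGINT)
AS
BEGIN
  new_id = GEN_ID(gen_customer_id, 1);
  INSERT INTO customer (customer_id, cust_code, cust_name)
  VALUES (:new_id, :p_code, :p_name);
END^

SET TERM ; ^

A client would call EXECUTE PROCEDURE add_customer('C-1001', 'Acme Ltd')
and get the surrogate key back without ever seeing the table itself.)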
> >>
> >>Please, specify exactly what you need to know. If you want to know what
> >>hardware is needed, give me the current needs and I can specify the
> >>hardware. When the needs change, you can easily buy different hardware
> >>and migrate a good database design onto the new platform without
> >>problem. If you have a bad design, no amount of hardware is going to
> >>make it sing.
> >>
> >>Now, as to multi-threading and symmetrical multi-processing, well, that
> >>is as flexible as the hardware is. You decide upon the environment that
> >>meets the current needs.
> >>If the needs change, a good design will migrate without any changes - it
> >>will simply work.
> >>
> >>Now, do you begin to understand why your questions about scalability are
> >>meaningless? Any answer given will only apply to a small very select
> >>set of environments and that will be misleading to anyone who is not in
> >>that small group.
> >>
> >>Whereas a question as to how to design scalable systems that can move
> >>between different architectures without a problem - that sort of
> >>information is useful.
> >>
> >>best regards
> >>
> >>Dalton
> >>
> >>PS, our largest system has 4 processors and 6 GB of RAM, and that
> >>system processes more records in an hour than most systems process in a
> >>year........
> >>
> >>
> >>David Johnson wrote:
> >>
> >>
> >>
> >>>Although no one has responded directly to my original question about
> >>>scalability, there have been a number of answers posted to other
> >>>questions that pertain to my question, and articles on the web that
> >>>indirectly touch on it. Can someone confirm or correct what I think I
> >>>am learning?
> >>>
> >>>The "classic" model defines the heavy weight vertically scalable
> >>>configuration for firebird. It is most appropriate for scaling for
> >>>environments that are architecturally similar to the original VAX
> >>>cluster, such as some recent developments in the Opteron world. It is
> >>>also more appropriate where: higher performance is required, the number
> >>>of concurrent connections is on the same order as the number of CPU's,
> >>>and there is lots of available memory to run the heavyweight connection
> >>>processes.
> >>>
> >>>The "superserver" model defines a light weight scaling configuration
> >>>that is most appropriate for environments that are architecturally more
> >>>similar to the Intel hyperthreading model, or at least dissimilar to
> >>>the VAX cluster architecture. Superserver will allow a well built
> >>>application to function against a lighter weight server instance, with
> >>>only a limited performance penalty. Superserver is less robust, and
> >>>demands that application code pay more attention to thread safety since
> >>>connections and connection resources tend to be pooled.
> >>>
> >>>Based on these principles, recent notes in the mailing list that a low
> >>>end server class machine can safely handle 50 users with 2 CPU's and 2
> >>>GB of memory, and empirical knowledge of some relative platform scales,
> >>>I have some preliminary "guesstimates" for scaling the classic model:
> >>>
> >>>user count    CPU count/arch   RAM size     OS              Storage
> >>>50 users      2/Intel          2 GB RAM     linux, Windows  RAID
> >>>100 users     2/IBM PPC4       4 GB RAM     linux, AIX      RAID
> >>>500 users     6/IBM PPC4       6 GB RAM     linux, AIX      SAN
> >>>1000 users    16/IBM PPC4      8 GB RAM     linux, AIX      SAN
> >>>10000 users   ?/ES9000*?       128 GB RAM   linux, AIX      SAN
> >>>
> >>>* Note that scaling to the ES/9000 may not be at all straightforward
> >>>
> >>>
> >>>
> >>>