Subject: Re: [firebird-support] Scalability principles: Can someone confirm or correct ...
Author: Dalton Calford
Hi David,

I have 4 racks of SAN devices; we use them strictly for historic
databases. Our hardware requirements are very precise and tested for
months before they are allowed in the test lab (five 9's reliability
forces us to be very unforgiving to any hardware).
We found that a RAID array, used as it is meant to be, performs excellently
for data retrieval and failure recovery - although, unlike most
companies, our RAIDs are mirrored, RAIDed RAIDs (a RAID 1 across five
RAID 5 arrays of 5 drives each, or more clearly, a rack full of drives
with gigabytes of cache and dedicated canisters). The off-the-shelf SAN
solutions (we have gone through a few) are not accessed correctly with
Classic, and their network speed is not sufficient when dealing with a
query that may process 30 GB of data.

It is funny: our POP in Toronto has an air-conditioning system that is
bigger than my office, yet our landlord is just barely meeting our
temperature requirements.

So, hardware - I have to deal with it day in and day out. But I have
always thrown our developers at the oldest, slowest and most pitiful
equipment I can find. I just had a party: we retired our test box #6, a
486DX66 running Linux and InterBase 4.0g.

Our list of design specifications is called the bible, and many fights
have been started over a developer trying to cut time by skipping a few
steps, only to lose their bonus, and possibly their job.

The approach we take is very mechanical: we take a result set that the
process would return, insert it into a simple table, and perform a
simple select from that table.
This gives us a flat-line benchmark based upon drive speed, I/O speed,
network speed and client speed.
Every process that will manipulate the data is measured against this
benchmark.
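That flat-line measurement can be sketched roughly as follows. This is only an illustration: Python's built-in sqlite3 stands in for the real database engine so the example is runnable, and the table name, columns and row count are invented.

```python
import sqlite3
import time

def flat_line_benchmark(conn, rows):
    """Load a pre-computed result set into a plain table, then time a
    simple SELECT over it.  The elapsed time is the floor set by drive,
    I/O, network and client speed, with no processing logic involved."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE benchmark_result (id INTEGER, payload TEXT)")
    cur.executemany("INSERT INTO benchmark_result VALUES (?, ?)", rows)
    conn.commit()

    start = time.perf_counter()
    fetched = cur.execute("SELECT id, payload FROM benchmark_result").fetchall()
    elapsed = time.perf_counter() - start
    return elapsed, len(fetched)

# Invented sample result set of 10,000 rows, held in memory.
conn = sqlite3.connect(":memory:")
rows = [(i, "row-%d" % i) for i in range(10_000)]
baseline, count = flat_line_benchmark(conn, rows)
# Every real process that produces this result set is then timed the
# same way; its ratio to `baseline` is the cost of the processing itself.
```

Each production process that yields the same result set would then be timed identically and compared against `baseline`.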
We aim to have enough hardware resources that once the system enters
production, our CPU and drive utilization do not exceed 25% of
total load.
But, none of this has anything to do with scalability, as it is specific
to the current immediate needs of the design.
The true test is when you have deployed applications to 250 desktops,
and a new requirement for a small department comes up.
You should be able to make the changes, deploy the new application, and
not need to replace any of the existing applications.
Now, you need to ensure that the users who depend upon laptops are able
to work offline; again, the design should simply scale to the new
requirements.
That is the key: scaling to a changing environment, whether it is pure
horsepower needed due to increasing load, or scaling to support totally
new uses.
Hardware, operating systems, specifications - all are irrelevant.

As for marketing, it is important, but less so for open source tools
than for commercial tools.
The basic truth is, there are very few programmers out there. The
schools churn out people who have no natural ability and who are best
promoted into management as they are useless as developers. (In case you
are wondering, that is what has happened to me.)

So, the best marketing we can do right now is to make tools and manuals
that teach good design practices.
If you want to update the Some Solutions document, I would be happy to
help; unfortunately, life has a bad habit of overriding my best
intentions (my wife and two daughters have a priority on my time).

best regards

Dalton


David Johnson wrote:

>Thanks ... I missed your message - I apologize. One of the drawbacks of
>a good mailing list is that solid messages like yours sometimes get
>missed in the sheer volume of mail. I will certainly read the article
>you have so kindly referred me to.
>
>Your response to my original question is accurate and to the point.
>Without knowing my particular application's parameters, accurate
>information for scaling and limits is impossible to generate.
>
>I also work in a large scaled environment, so I have seen both good and
>bad examples of scalability in design. I concur, wholeheartedly, that
>bad application and data model design is a greater limit to scalability
>than anything else.
>
>My recent experience of local RAID versus SAN is exactly counter to
>yours - by an order of magnitude. We must be seeing very different
>hardware configurations. Rather than providing a point of contention,
>this highlights the limits that "generalities" place on the guideline
>that I would like to see.
>
>Once the data model and application software are optimized as far as
>they can go, the hardware capabilities, or more specifically the DBMS'
>ability to maximize resource usage, plays a very significant role in the
>scalability of a database application.
>
>Each connection uses _X_ amount of memory. A three level B+ tree search
>with compressed index headers requires _Y_ CPU cycles on a reference
>architecture. In order to achieve a response time of 1 second for
>performing a reference query joining three reference tables on primary
>keys only for _N_ concurrent requests, you need (some set) of resources.
>
>Your telephony application is the sort of large scale application that
>catches my attention. Once I have read your article, may I post
>directly to you if I have questions?
>
>Re "... overall generalities do nothing but satisfy the needs of a
>marketing dept.":
>
>(1) the marketing department needs to sell the product if we are to get
>paid, and (2) someone needs to be able to sell the DBMS to the
>management types if we are to use it in our corporate environment.
>
>A company or product that is built solely on technical merit is more
>likely to fail than one that has little technical merit but good
>marketing. OS/2 versus Windows illustrates that issue very
>dramatically. Firebird's history and sojourn at Borland is another more
>immediate and equally dramatic example of this principle in action.
>
>On Tue, 2005-02-08 at 19:38, Dalton Calford wrote:
>
>
>>David,
>>
>>Have you read the "Some Solutions to Old Problems" document? Helen
>>Borrie was kind enough to edit a bunch of my pondering on the subject of
>>scalability.
>>It covers some hardware and software design criteria depending upon the
>>needs of the user. That document is getting dated, but, it does cover
>>some of your requirements.
>>
>>I did respond to your earlier email. I stated what sort of environment
>>I work and develop in. I have real world working experience with
>>managing terabytes of data, in intensive work environments and multiple
>>concurrent users - all with InterBase 5.6, Firebird 1.0 and Firebird 1.5x.
>>
>>I know the capabilities of the system, and its limitations and I can say
>>that your estimates are out to lunch. The cpu and memory estimates are
>>totally wrong while the underlying subsystem and its design is more
>>important than the CPU or RAM. SAN storage is too slow for performance
>>systems while raid has both benefits and drawbacks.
>>
>>You are asking for information about hardware scalability when I have
>>seen the simple elimination of auto generated domains make a significant
>>performance change.
>>
>>Scalability has very little to do with hardware. It has to do with
>>system design. A proper design will allow a system to scale into uses
>>not envisioned by the original design team. A simple design rule that
>>specifies that all primary keys are to be surrogate keys and that no
>>links are to be based upon user viewed or modifiable data will give
>>amazing growth capabilities. The rule that no end user application may
>>touch a table directly, and that all data is to be modified via
>>procedures, allows new functionality to be added to the database without
>>any fear of having an impact on legacy client applications.
>>
>>Please, specify exactly what you need to know. If you want to know what
>>hardware is needed, give me the current needs and I can specify the
>>hardware. When the needs change, you can easily buy different hardware
>>and migrate a good database design onto the new platform without
>>problem. If you have a bad design, no amount of hardware is going to
>>make it sing.
>>
>>Now, as to multi-threading and symmetrical multi-processing, well, that
>>is as flexible as the hardware is. You decide upon the environment that
>>meets the current needs.
>>If the needs change, a good design will migrate without any changes - it
>>will simply work.
>>
>>Now, do you begin to understand why your questions about scalability are
>>meaningless? Any answer given will only apply to a small very select
>>set of environments and that will be misleading to anyone who is not in
>>that small group.
>>
>>Whereas, a question as to how to design scalable systems that can move
>>between different architectures without a problem - that sort of
>>information is useful.
>>
>>best regards
>>
>>Dalton
>>
>>PS: our largest system has 4 processors and 6 GB of RAM, and that
>>system processes more records in an hour than most systems process in a
>>year.
>>
>>
>>David Johnson wrote:
>>
>>
>>
>>>Although no one has responded directly to my original question about
>>>scalability, there have been a number of answers posted to other
>>>question that pertain to my question, and articles on the web that
>>>indirectly touch on it. Can someone confirm or correct what I think I
>>>am learning?
>>>
>>>The "classic" model defines the heavy weight vertically scalable
>>>configuration for firebird. It is most appropriate for scaling for
>>>environments that are architecturally similar to the original VAX
>>>cluster, such as some recent developments in the Opteron world. It is
>>>also more appropriate where: higher performance is required, the number
>>>of concurrent connections is on the same order as the number of CPUs,
>>>and there is lots of available memory to run the heavyweight connection
>>>processes.
>>>
>>>The "superserver" model defines a light weight scaling configuration
>>>that is most appropriate for environments that are architecturally more
>>>similar to the Intel hyperthreading model, or at least dissimilar to
>>>the VAX cluster architecture. Superserver will allow a well built
>>>application to function against a lighter weight server instance, with
>>>only a limited performance penalty. Superserver is less robust, and
>>>demands that application code pay more attention to thread safety since
>>>connections and connection resources tend to be pooled.
>>>
>>>Based on these principles, recent notes in the mailing list that a low
>>>end server class machine can safely handle 50 users with 2 CPU's and 2
>>>GB of memory, and empirical knowledge of some relative platform scales,
>>>I have some preliminary "guesstimates" for scaling the classic model:
>>>
>>>user count     CPU count/arch   RAM size     OS               Storage
>>>50 users       2 / Intel        2 GB         Linux, Windows   RAID
>>>100 users      2 / IBM PPC4     4 GB         Linux, AIX       RAID
>>>500 users      6 / IBM PPC4     6 GB         Linux, AIX       SAN
>>>1000 users     16 / IBM PPC4    8 GB         Linux, AIX       SAN
>>>10000 users    ? / ES9000*      128 GB       Linux, AIX       SAN
>>>
>>>* Note that scaling to the ES/9000 may not be at all straightforward
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>