firebird-support - Re: [firebird-support] MS SQL Vs. Firebird Again

Subject	Re: [firebird-support] MS SQL Vs. Firebird Again
Author	David Johnson
Post date	2004-02-01T11:15Z

----- Original Message -----
From: Helen Borrie
To: firebird-support@yahoogroups.com
Sent: Saturday, January 31, 2004 6:44 PM
Subject: Re: [firebird-support] MS SQL Vs. Firebird Again

At 07:01 PM 30/09/2004 -0700, you wrote:
>I am starting to be of the opinion that the best option is to just use the
>GDS32.dll directly. The components are cute, and work well for
>prototyping, but they appear to be lacking adequate stability for serious
>enterprise scale uses, IMHO.

Sounds a fairly uninformed statement...

dj
Spoken by someone who has used and sworn by (and occasionally at) Borland Pascal Products for almost 20 years, and has been up and down both the published and portions of the unpublished source code in the VCL library. Delphi is still my favorite hammer, but it is not my only one. When you spend more time coding around the bugs than using the tool, it's time to change tools.

What I am trying to accomplish right now seems to be exposing all of the bugs and design weaknesses in Borland's database connectivity approach. Much of my own current frustration comes from trying to use capabilities that I know Interbase/Firebird and other RDBMS have, but that the VCL design principles preclude.

I am still investigating options, but i am becoming disenchanted with the VCL architecture for some purposes including Interbase/Firebird connectivity.
/dj

>Also bear in mind that the timestamp conversion function in gds32.dll
>drops the milliseconds, so you may want to write your own replacement for
>that.

That is not necessary. An external function GetExactTimestamp() is
available in the fbudf library which is deployed with Firebird.
dj
I will check this out - but I think I may have been unclear in my statement. Check below.
/dj

>It's not a big deal, but it'll save you a lot of hunting when you know in
>advance that the fractional seconds are truncated at the GDS32 layer from
>all times that are stored.

Actually, they are truncated on the server - a kind of "lowest common
denominator" to genericise the return value to support OS's that can't
return subseconds. You can store subseconds down to ten-thousandths of a
second and, if you do, the server has no problem processing them and the
client has no problem returning them to the application. It seems to me
you are deriving a lot of your "wisdom" from using bad client tools...
dj
I would hesitate to use wisdom to describe my opinions, but I would agree that my findings to date are that many client side tools are not very good at all. I am quite new to Firebird use, but I am not a novice with RDBMS in general.

The database data structure clearly supports fractional seconds. But you must not use the isc_encode_xxx and isc_decode_xxx functions in the gds for times (the method used by the Borland connectivity options) if you want to preserve that precision in your own code. Notice that the fractional seconds are explicitly not translated. I was not certain whether it was by design or oversight. You are suggesting it was by design.

Here are the code snippets from gds.cpp (Firebird 1.5 RC8 source):

void API_ROUTINE isc_decode_timestamp(GDS_TIMESTAMP * date, void *times_arg)
{
/**************************************
*
* i s c _ d e c o d e _ t i m e s t a m p
*
**************************************
*
* Functional description
* Convert from internal timestamp format to UNIX time structure.
*
* Note: the date arguement is really an ISC_TIMESTAMP -- however the
* definition of ISC_TIMESTAMP is not available from all the source
* modules that need to use isc_encode_timestamp
*
**************************************/
SLONG minutes;
struct tm *times;

times = (struct tm *) times_arg;
memset(times, 0, sizeof(*times));

ndate(date->timestamp_date, times);
times->tm_yday = yday(times);
if ((times->tm_wday = (date->timestamp_date + 3) % 7) < 0)
times->tm_wday += 7;

minutes = date->timestamp_time / (ISC_TIME_SECONDS_PRECISION * 60);
times->tm_hour = minutes / 60;
times->tm_min = minutes % 60;
times->tm_sec = (date->timestamp_time / ISC_TIME_SECONDS_PRECISION) % 60;
}

void API_ROUTINE isc_encode_timestamp(void *times_arg, GDS_TIMESTAMP * date)
{
/**************************************
*
* i s c _ e n c o d e _ t i m e s t a m p
*
**************************************
*
* Functional description
* Convert from UNIX time structure to internal timestamp format.
*
* Note: the date arguement is really an ISC_TIMESTAMP -- however the
* definition of ISC_TIMESTAMP is not available from all the source
* modules that need to use isc_encode_timestamp
*
**************************************/
struct tm *times;

times = (struct tm *) times_arg;

date->timestamp_date = nday(times);
date->timestamp_time =
((times->tm_hour * 60 + times->tm_min) * 60 +
times->tm_sec) * ISC_TIME_SECONDS_PRECISION;
}

/dj

> worst case). Firebird appears to keep its indexes
> well balanced because there was no observable
> performance difference between that experienced
> following an insert of 4,000,000 rows and that
> experienced after dropping and rebuilding the indexes.

It's not always the case. It depends on the index.

dj
I'll have to do more tests to identify where it does not balance indeces.
/dj

> Caution: Beware of the DBXpress and IBXpress
> components used by some Delphi and BCB apps. My
> testing has exposed memory leaks and other serious
> issues with the middle layers,

Agreed, both of these layers have been very poorly architected.

>and the Delphi VCL
> architecture imposes minimum 3% maximum 30% overheads
> on top of the database performance.

That's misleading. The VCL architecture + the Borland implementations for
generic (vendor-independent) data access cause such problems. The
Firebird/IB world is richly provided with vendor-specific connectivity
layers between Delphi/Kylix/BCPP that -- to varying degrees -- avoid the
overhead imposed by the generic VCL data access components. Anyone
considering using the Borland client development environments owes it to
him/herself to evaluate the best of breed from this comprehensive collection.

dj
The source for the 30% was the time spent in moving date first into variants in TParam objects, and then from TParam objects into the connectivity tool's internal parameters analog, before moving the internal params analog into the GDS buffers during an insert operation. The time moving data from the internal params analog into the gds buffers was considered part of the normal cost of any system, equivalent to moving the original data into the buffers, rather than VCL overheads and is not part of that measure.

The overheads are essentially linear with the number of parameters in the query.

Anything using the VCL approach will be limited by the design constraints of the VCL. It provides a sound basis for desktop apps, but the database connectivity architecture does not support reentrancy like a solution architected for server must, and it is not designed for speed.

Since the parameters and the result set are both a part of the query object, only one query instance may be used at a time. A server environment requires that the query code be reentrant so multiple users can address the same (prepared) query concurrently. This is the "stateful versus stateless" arguement of OO programming rehashed, and is really independent of the question of Firebird as a paltfom of choice.

Interbase, on the other hand, is designed for both speed and reentrancy.
/dj

> The MAX, MIN, and Count operators are not as well
> optimized as they could be either. MAX and MIN can at
> least be optimized somewhat be ascending and
> descending indeces where they are required. Count on
> an entire table will always do a table space scan.

That is true; and it behoves the developer to learn the language and
discard a lot of the itchy old tricks that s/he used to have to resort to
in MSSQL. RDBMS's are not meant to be dependent on storage row order so
systems that don't support important relational language features like
existential predicates have to store row-counts and compromise multi-userness.

dj
Like DB2?? :o)

Sorry ... the multi-userness of DB2 on 390 architecture is pretty well documented. My daily work has me dealing with a real-time system that services 8,000 concurrent users on multiple platforms, including Delphi client server, CICS COBOL, J2EE, AS/400, AIX, and others. As a design trade off I can agree that the Interbase and DB2 teams made different decisions, and neither decision is clearly superior to the other in all ways, but "multiuserness" is an excuse rather than a reason. The row counts in DB2, rather than compromising multiuserness, are a key part of their multi-user optimizer strategy. I think that the "compromise multi-userness" part of your statement may need to be struck.

I do agree that the exists predicate against a table or index is an oversight that IBM engineers should rectify.

I also agree, avoid platform specific tricks whenever possible.

Count, at worst, _should_ do an index space scan on the primary key rather than a table space scan. Min and max should be able to operate equally well against ascending and descending indeces. There typically is only one or two more I/O (10 to 20 milliseconds) involved in locating the last entry in a balanced index versus the first entry. These are obvious optimizations, and when I have time I will dig into the optimizer and identify the places to correct this, I will post the suggested change back here. Note that "improve the optimizer" is on the todo list on the Firebird sourceforge page.
/dj

It's a source of comedy to look at so-called "benchmark tests" involving
MSSQL and other DBMS's that poorly implement standards, again Firebird and
others that implement standards better. There is always a "SELECT
COUNT(*)" test in there that the lemmings take to be an indication of
comparative performance. It says nothing about performance, of course, but
it is a self-documenting commentary on the competence of the test designers.

dj
What test designers are looking at in the select count(*) is an indication of the speed of internal lookups of something expected to be in memory already. Essentially, all they expect to see, and what they are trying to isolate, is overheads in the communication subsystem. The actual select is presumed to be so trivial that it would not be measurable.

The reviewer of those benchmarks needs to be aware of what is actually being measured and ask questions to become more informed of what the results actually mean. In the case of Firebird/Interbase, that figure is not kept in memory so it must be computed on demand. Instead of the expected turnaround on the order of nanoseconds, there is a significant delay(10ms per page) while the DBMS performs a table space scan to count records. In these platforms, the test is measuring the speed of a forced table space scan.

Instead of measuring only what it was intended to, it is also serving to identify where the implementers of the different DBMS's made very different design decisions. Identifying those differences in technology is often more important than the raw performance figures.

In this case, may also be identifying a minor weakness in the optimizer. Without changing the basic design premise, Count(*) should run against the index with the smallest footprint that is guaranteed to include pointers to all rows rather than the table space itself.

Another misunderstood benchmark is the random read. Given a sufficiently large table, and in conjunction with other tests, it isolates the indexing subsystem. It is not an indicator of "real life" application, because in real life your selections tend to be clustered together physically. But it is one fairly reliable way of testing the quality of the indexing on a table after various operations since it renders most of the lookahead technology useless. The test has other purposes too, when combined with other tests.

Benchmark tests rarely make sense in broad terms because they are designed to isolate subsystems within a larger system. In real applications you are rarely concerned with a single sub-system. However, when identifying or anticipating bottlenecks and problems, they are useful because they allow you to pinpoint issues based on comparative performance, and to make informed decisions based on your current needs. The key is to become familiar with what the tests are _actually_ measuring rather than to naively presume upon what they appear to be measuring on the surface.
/dj

[Non-text portions of this message have been removed]