Subject: RE: [firebird-support] Re: At my wits end
Author: Epstein, Ed
>>Are you sure that the bottleneck is the DBMS? Run time trials with all of
the DBMS calls commented out and see what the difference is. I have found,
for example, that the Borland TDataset components impose a 30% overhead on
my inserts, as compared to a direct DLL call. ADO components would have a
similar overhead. The overhead is almost all in moving data into and out of
the variant type "Value" members of the TParam objects.

Are you using parameterized queries or string concatenation? The prepare
can take longer than the query, so parameterizing is a must where
performance is important. Also, how often do you "prepare"? You should
only need to prepare the query once. Make sure that you measure this to
ensure that the prepare is not being called repeatedly.>>

I have already tried tests with code written to access the API directly and
perform all of the work, which gets rid of all the ADO overhead. That,
however, turned out to be no different. The ADO code prepared each query
only once, and I tried both parameterized queries and string concatenation;
both methods turned out to be slow. The API tests, however, used only
string concatenation.
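
For reference, here is roughly what a prepare-once, parameterized insert
loop looks like against the raw ISC API. This is only a sketch under
assumed names (the table, the columns, and the next_record/fill_buffers
helpers are invented for illustration), with error checking omitted:

    /* Sketch: prepare once, execute many (Firebird C API, ibase.h).  */
    /* Assumes db and trans were attached/started elsewhere.          */
    #include <ibase.h>
    #include <stdlib.h>

    extern isc_db_handle db;
    extern isc_tr_handle trans;
    extern int  next_record(void);          /* hypothetical parser API */
    extern void fill_buffers(XSQLDA *in);   /* hypothetical            */

    void bulk_insert(void)
    {
        ISC_STATUS      status[20];
        isc_stmt_handle stmt = 0;
        char *sql = "INSERT INTO ADDRESSES (ID, EMAIL) VALUES (?, ?)";
        XSQLDA *in = (XSQLDA *) malloc(XSQLDA_LENGTH(2));

        in->version = SQLDA_VERSION1;
        in->sqln    = 2;

        isc_dsql_allocate_statement(status, &db, &stmt);
        /* the expensive part happens exactly once ...                 */
        isc_dsql_prepare(status, &trans, &stmt, 0, sql,
                         SQL_DIALECT_V6, NULL);
        isc_dsql_describe_bind(status, &stmt, 1, in);
        /* point in->sqlvar[i].sqldata at your own field buffers here  */

        while (next_record())
        {
            fill_buffers(in);               /* copy parsed fields in   */
            /* ... and only the execute is paid per record             */
            isc_dsql_execute(status, &trans, &stmt, 1, in);
        }
        isc_commit_transaction(status, &trans);
        free(in);
    }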

>>Is it critical that each file be handled at 1500 records per second, or
would it be satisfactory for three files handled concurrently at 500 records
per second each? Given your figures it sounds like it takes roughly 1 hour
per file to submit data to the DBMS.>>

If I could split the job up into 3 or 4 different processes, that would be
fine. I only need the overall throughput across the file contents to exceed
1,500 records per second, which works out to about 5.4 million records per
hour.

>>This is a place where a stored procedure may make sense. Write the parser
into a DLL and then call it from a stored procedure. This minimizes the
chitchat between the server and your application, and brings everything into
a place where it can be used optimally. Ann or Helen could tell you better
how robust the Firebird/Interbase stored procedure system is ... but I
suspect that it will support multiple concurrent requests nicely in the
superserver architecture.>>

The parser DLL idea sounds good. However, I believe it would have to be
written in C, if I am not mistaken. That is not exactly where my skills
lie, and I don't know whether I can invest the time to do it.
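
That said, from what I have seen the C side of such a UDF can stay very
small. Below is only a sketch of the shape (the function name, the sizes,
and the trivial parsing logic are all invented; the DDL in the comment is
the one-time declaration Firebird would need):

    /* parse_udf.c -- sketch of a Firebird UDF entry point.           */
    /* Build as a DLL and put it where the server loads UDFs.         */
    /* Declared to the engine once with DDL along these lines:        */
    /*                                                                */
    /*   DECLARE EXTERNAL FUNCTION CSV_FIELD                          */
    /*       CSTRING(1024), INTEGER, CSTRING(255)                     */
    /*       RETURNS PARAMETER 3                                      */
    /*       ENTRY_POINT 'csv_field' MODULE_NAME 'parse_udf';         */
    /*                                                                */
    /* SQL then calls CSV_FIELD(line, n); the engine supplies out.    */
    #include <string.h>

    /* copies the n-th comma-separated field of line into out */
    char *csv_field(char *line, int *n, char *out)
    {
        char *p = line, *q;
        int   i;

        for (i = 0; i < *n && p; i++)   /* skip ahead to field n */
        {
            p = strchr(p, ',');
            if (p) p++;
        }
        if (!p) { out[0] = '\0'; return out; }

        q = strchr(p, ',');
        if (q) { memcpy(out, p, (size_t)(q - p)); out[q - p] = '\0'; }
        else    strcpy(out, p);
        return out;                     /* real code must respect the  */
    }                                   /* declared CSTRING sizes      */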

>>Would you benefit from multi-threading your parser? If you are parsing
text files to the DBMS, then I imagine that there is actually a lot of idle
time where the process is waiting on I/O. Monitor the directory that the
data is feeding into. When a file is available, kick off an instance
of the parser to process the file. There will come a point of diminishing
returns, but parallelism can work wonders.>>

Parsing the file actually happens very fast. I know it's VB, but I use
some pretty fast file methods, and I don't believe this is where the
bottleneck lies. The last time I tested the parsing engine, it handled more
than 10,000 records per second. The entire process works in binary format,
with no conversions from strings back to numbers. It would be even faster
if VB did not store strings in memory in Unicode format.
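
Still, if the parallel route ever looks worthwhile, the kick-off loop
described above is cheap to sketch on Win32. Everything concrete below
(the drop directory, the file mask, parser.exe) is invented for
illustration, and real code would also move each file aside so it is not
picked up twice:

    /* Sketch: watch a drop directory, spawn one parser per file. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        WIN32_FIND_DATA fd;
        HANDLE h = FindFirstChangeNotification(
            "C:\\feeds", FALSE, FILE_NOTIFY_CHANGE_FILE_NAME);
        if (h == INVALID_HANDLE_VALUE) return 1;

        for (;;)
        {
            HANDLE f;
            WaitForSingleObject(h, INFINITE); /* wait for a new file */

            f = FindFirstFile("C:\\feeds\\*.txt", &fd);
            if (f != INVALID_HANDLE_VALUE)
            {
                do
                {
                    STARTUPINFO si = { sizeof si };
                    PROCESS_INFORMATION pi;
                    char cmd[2 * MAX_PATH];
                    sprintf(cmd, "parser.exe \"C:\\feeds\\%s\"",
                            fd.cFileName);

                    /* fire and forget: one parser process per file */
                    if (CreateProcess(NULL, cmd, NULL, NULL, FALSE,
                                      0, NULL, NULL, &si, &pi))
                    {
                        CloseHandle(pi.hThread);
                        CloseHandle(pi.hProcess);
                    }
                } while (FindNextFile(f, &fd));
                FindClose(f);
            }
            FindNextChangeNotification(h);    /* re-arm the watch */
        }
    }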

>>On high end IBM 390 architecture hardware, we peak at 12,000,000
transactions per hour running flat out, which is twice your requirement.
Yes, it's also doing other stuff and the database is several terabytes. But
... your application may ultimately demand something with more oomph than a
wintel box. A RISC box with DASD may be more in line with the process
requirements than a wintel box with RAID.>>

A lot of people have made similar statements. My only puzzlement is that
the other dept., which is entirely C++ based, gets these performance levels
with less hardware than I have, which indicates to me that it is not the
hardware, no matter how much sense the hardware argument makes. On another
note, it is not my decision to get better hardware around here, so if the
hardware ultimately proves to be lacking for this application, it is not my
fault; I have already indicated that, as time goes on, this will require
significant investment in hardware.

Edwin Epstein
Continuity Partners Inc.
2753 S. Highland Dr.
Suite 1010
Las Vegas, Nevada 89109
(702) 851-7850 Ext. 247
eepstein@...

CONFIDENTIALITY NOTICE: The information contained in this message may be
privileged and confidential and protected from disclosure. If the reader
of this message is not the intended recipient, or an employee or agent
responsible for delivering this message to the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited. If you have received this
communication in error, please notify us immediately by replying to the
message and deleting it from your computer.

-----Original Message-----
From: David Johnson [mailto:d_johnson@...]
Sent: Saturday, March 13, 2004 5:00 AM
To: firebird-support@yahoogroups.com
Subject: Re: [firebird-support] Re: At my wits end


Are you sure that the bottleneck is the DBMS? Run time trials with all of
the DBMS calls commented out and see what the difference is. I have found,
for example, that the Borland TDataset components impose a 30% overhead on
my inserts, as compared to a direct DLL call. ADO components would have a
similar overhead. The overhead is almost all in moving data into and out of
the variant type "Value" members of the TParam objects.

Are you using parameterized queries or string concatenation? The prepare
can take longer than the query, so parameterizing is a must where
performance is important. Also, how often do you "prepare"? You should
only need to prepare the query once. Make sure that you measure this to
ensure that the prepare is not being called repeatedly.

Is it critical that each file be handled at 1500 records per second, or
would it be satisfactory for three files handled concurrently at 500 records
per second each? Given your figures it sounds like it takes roughly 1 hour
per file to submit data to the DBMS.

This is a place where a stored procedure may make sense. Write the parser
into a DLL and then call it from a stored procedure. This minimizes the
chitchat between the server and your application, and brings everything into
a place where it can be used optimally. Ann or Helen could tell you better
how robust the Firebird/Interbase stored procedure system is ... but I
suspect that it will support multiple concurrent requests nicely in the
superserver architecture.

Would you benefit from multi-threading your parser? If you are parsing text
files to the DBMS, then I imagine that there is actually a lot of idle time
where the process is waiting on I/O. Monitor the directory that the data is
feeding into. When a file is available, kick off an instance of the
parser to process the file. There will come a point of diminishing returns,
but parallelism can work wonders.

I have seen a shift to Linux double performance on back-end processing. If
this holds true in your case, then it may be worth considering.

On high end IBM 390 architecture hardware, we peak at 12,000,000
transactions per hour running flat out, which is twice your requirement.
Yes, it's also doing other stuff and the database is several terabytes. But
... your application may ultimately demand something with more oomph than a
wintel box. A RISC box with DASD may be more in line with the process
requirements than a wintel box with RAID.

To Ann and Helen ... does Firebird support a batch insert like DB2 does?
Essentially, you prepare your insert query with a certain option that I
forget now, and then, when you open the connection for the insert at the API
layer, you just keep populating and feeding the buffer over and over with
new data until you run out; then you close the buffer and the transaction
normally. This eliminates a lot of communication overhead.
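
As a rough sketch of that technique's shape in DB2's CLI (array binding;
the table, the sizes, and the fill_arrays helper are all invented for
illustration):

    /* Sketch: batch insert via array-bound parameters (DB2 CLI/ODBC). */
    #include <sqlcli1.h>           /* sql.h + sqlext.h for plain ODBC  */

    #define ROWS 1000
    SQLINTEGER ids[ROWS];
    SQLCHAR    emails[ROWS][64];

    extern int fill_arrays(SQLINTEGER *ids, SQLCHAR (*emails)[64]);

    void batch_insert(SQLHSTMT hstmt)
    {
        SQLPrepare(hstmt, (SQLCHAR *)
            "INSERT INTO ADDRESSES (ID, EMAIL) VALUES (?, ?)", SQL_NTS);

        /* bind whole arrays, so each execute ships ROWS rows at once */
        SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE,
                       (SQLPOINTER) ROWS, 0);
        SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_SLONG,
                         SQL_INTEGER, 0, 0, ids, 0, NULL);
        SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR,
                         SQL_VARCHAR, 63, 0, emails, 64, NULL);

        while (fill_arrays(ids, emails))  /* hypothetical: next chunk */
            SQLExecute(hstmt);            /* one round trip per chunk */
    }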

After all of this rambling, I think the best bet is to:

1. Move the parser into a DLL that can be called from a stored procedure and
is multi-thread safe. The parser will simply read a file and create binary
record buffers that can be used by IB/FB. It has (and needs) no
"connectivity" components.
2. Write the insertion process as a stored procedure that calls the parser
DLL to read and extract data from a file then uses IB/FB internals to write
the record buffer contents to the database.

The record buffer does not need to be an InterBase-internal buffer, only
something from which InterBase can populate the parameters of the insert
statement within the stored procedure.
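
In other words, a plain C struct would do; for example (the field layout
is invented):

    /* Sketch: a parser-owned record buffer; nothing engine-specific. */
    typedef struct
    {
        long  id;
        char  email[64];
        short email_null;   /* null indicator, if a field can be null */
    } parsed_record;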

If you need to migrate to Linux in the future, refactoring the CSV parser is
the only programming work that should be required. That is straightforward
in any language.
