| Subject | Re: [firebird-support] Re: At my wits end |
|---|---|
| Author | David Johnson |
| Post date | 2004-03-13T13:00:20Z |
Are you sure that the bottleneck is the DBMS? Run timing trials with all of the DBMS calls commented out and see what the difference is. I have found, for example, that the Borland TDataset components impose a 30% overhead on my inserts compared to a direct DLL call. ADO components would have a similar overhead. The overhead is almost all in moving data into and out of the variant-typed "Value" members of the TParam objects.
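To make that comparison concrete, here is a rough sketch of the kind of timing trial I mean, in Java just to keep it self-contained; the file name, the --with-inserts switch, and the placeholder insert branch are all made up for illustration:

```java
// A minimal timing-trial sketch: run it once without and once with the DBMS
// work enabled, and the gap between the two numbers is the DBMS cost.
import java.nio.file.*;
import java.util.List;

public class TimingTrial {
    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get("input.csv"));
        boolean withInserts = args.length > 0 && args[0].equals("--with-inserts");

        long start = System.nanoTime();
        for (String line : lines) {
            String[] fields = line.split(",");   // stand-in for the real parser
            if (withInserts) {
                // ... the real DBMS insert would go here ...
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println((withInserts ? "parse+insert: " : "parse only: ") + elapsedMs + " ms");
    }
}
```

Run it both ways against the same file; the difference between the two numbers is the most that any DBMS tuning can ever buy you.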
Are you using parameterized queries or string concatenation? The prepare can take longer than the query itself, so parameterizing is a must where performance is important. Also, how often do you "prepare"? You should only need to prepare the query once. Measure this to make sure the prepare is not being called repeatedly.
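For example, with a JDBC-style driver such as Jaybird, "prepare once, execute many" looks roughly like this; the table, columns, connection URL, and credentials are placeholders, not anything from your setup:

```java
// Hedged sketch: a single prepared INSERT reused for every row.
import java.sql.*;

public class PreparedInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/C:/data/loads.fdb",
                "SYSDBA", "masterkey")) {
            con.setAutoCommit(false);

            // Prepared exactly once, outside the loop; only the parameter
            // values change per row, so the statement is parsed and planned once.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO import_rows (id, payload) VALUES (?, ?)")) {
                for (int id = 1; id <= 100_000; id++) {
                    ps.setInt(1, id);
                    ps.setString(2, "row " + id);
                    ps.executeUpdate();
                }
            }
            con.commit();
        }
    }
}
```

Contrast that with concatenating the values into the SQL string for every row, which forces the server to parse and prepare a brand-new statement each time.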
Is it critical that each file be handled at 1500 records per second, or would it be satisfactory for three files handled concurrently at 500 records per second each? Given your figures it sounds like it takes roughly 1 hour per file to submit data to the DBMS.
This is a place where a stored procedure may make sense. Write the parser into a DLL and then call it from a stored procedure. This minimizes the chatter between the server and your application and brings everything into a place where it can be used optimally. Ann or Helen could tell you better how robust the Firebird/InterBase stored procedure system is ... but I suspect it will support multiple concurrent requests nicely in the superserver architecture.
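For what it's worth, the hook Firebird/InterBase gives you for calling into a DLL is the external function (UDF) declaration. Something like the following maps an exported function to a name that PSQL can call; the library name, entry point, and signature are invented for the example, and only the declaration step is shown, not the parser DLL itself:

```java
// Hedged sketch: declaring a hypothetical parser DLL entry point as a UDF.
import java.sql.*;

public class DeclareParserUdf {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/C:/data/loads.fdb",
                "SYSDBA", "masterkey");
             Statement st = con.createStatement()) {
            // Maps the exported parse_field() in csv_parser.dll (which must sit
            // in Firebird's UDF directory) to a SQL-callable name. FREE_IT
            // assumes the DLL allocates the result with ib_util_malloc.
            st.execute(
                "DECLARE EXTERNAL FUNCTION parse_field " +
                "  CSTRING(1024), INTEGER " +
                "  RETURNS CSTRING(255) FREE_IT " +
                "  ENTRY_POINT 'parse_field' MODULE_NAME 'csv_parser'");
            // Once declared, stored procedures can call parse_field(line, n)
            // like any built-in function.
        }
    }
}
```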
Would you benefit from multi-threading your parser? If you are parsing text files into the DBMS, then I imagine there is actually a lot of idle time where the process is waiting on I/O. Monitor the directory that the data feed writes into; when a file is available, kick off an instance of the parser to process it. There will come a point of diminishing returns, but parallelism can work wonders.
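A rough sketch of that pattern: a drop directory watched by a small worker pool, with one parser instance per file. The directory, pool size, and processFile() body are placeholders:

```java
// Hedged sketch of the parallel-parser idea using a directory watcher and a
// fixed-size worker pool.
import java.nio.file.*;
import java.util.concurrent.*;

public class ParallelLoader {
    public static void main(String[] args) throws Exception {
        Path dropDir = Paths.get("C:/data/incoming");
        // The right pool size is workload-specific; start small and measure.
        ExecutorService pool = Executors.newFixedThreadPool(3);

        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dropDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take();              // block until a file arrives
                for (WatchEvent<?> ev : key.pollEvents()) {
                    Path file = dropDir.resolve((Path) ev.context());
                    pool.submit(() -> processFile(file));   // one parser instance per file
                }
                key.reset();
            }
        }
    }

    static void processFile(Path file) {
        // Parse the file and feed its rows to the DBMS on this worker's own connection.
        System.out.println("processing " + file);
    }
}
```

Give each worker its own connection and transaction so the parsers do not serialize on a shared handle.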
I have seen a shift to Linux double performance on back-end processing. If this holds true in your case, then it may be worth considering.
On high-end IBM System/390 hardware, we peak at 12,000,000 transactions per hour running flat out, which is roughly twice your requirement. Yes, it's also doing other stuff and the database is several terabytes. But ... your application may ultimately demand something with more oomph than a Wintel box. A RISC box with DASD may be more in line with the process requirements than a Wintel box with RAID.
To Ann and Helen ... does Firebird support a batch insert like DB2 does? Essentially you prepare your insert query with a certain option that I forget now, and then when you open the connection for the insert at the API layer, you just keep populating and feeding the buffer over and over with new data until you run out, then you close the buffer and the transaction normally. This eliminates a lot of communication overhead.
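At the client API level, the pattern I mean looks like JDBC's addBatch()/executeBatch(); whether the Firebird driver ships it as a true wire-level batch or just emulates it client-side depends on the driver and engine version, and the table name and batch size below are placeholders:

```java
// Hedged illustration of client-side batch accumulation with periodic flushes.
import java.sql.*;

public class BatchInsert {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/C:/data/loads.fdb",
                "SYSDBA", "masterkey")) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO import_rows (id, payload) VALUES (?, ?)")) {
                for (int id = 1; id <= 100_000; id++) {
                    ps.setInt(1, id);
                    ps.setString(2, "row " + id);
                    ps.addBatch();                 // queue the row locally
                    if (id % 1000 == 0) {
                        ps.executeBatch();         // ship 1000 rows at a time
                    }
                }
                ps.executeBatch();                 // flush the remainder
            }
            con.commit();                          // close the transaction normally
        }
    }
}
```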
After all of this rambling, I think the best bet is to:
1. Move the parser into a DLL that can be called from a stored procedure and is multi-thread safe. The parser will simply read a file and create binary record buffers that can be used by IB/FB. It has (and needs) no "connectivity" components.
2. Write the insertion process as a stored procedure that calls the parser DLL to read and extract data from a file, then uses IB/FB internals to write the record buffer contents to the database.
The record buffer does not need to be an InterBase-internal structure; it only needs to be something from which the stored procedure can populate the parameters of its insert statement.
If you need to migrate to Linux in the future, refactoring the CSV parser is the only programming work that should be required. That is straightforward in any language.