Subject | Re: Blob Parsing Stored Procedure
---|---
Author | lucas_jessee |
Post date | 2007-07-05T13:25:38Z |
Sorry, the example I quoted at 70 seconds was actually around 250000
inserts. The more common case of around 22000 inserts I just tested
at 5 seconds. But I would still like any input on optimizations for
the large cases to avoid unnecessary waits.
In either case, when I serialize all of the 22000 or 250000 records
into a single BLOB and do one insert, it is much faster, which tells
me that the round trips to the database through SQLExecute for each
record are a large part of the performance overhead. Maybe the
additional overhead is in the ODBC driver's implementation of
SQLExecute. However, I have no choice but to use ODBC, unless there
are other cross-platform, multi-database C++ interfaces that I'm not
aware of.
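
One thing I may experiment with (completely untested, and I don't know
whether the Firebird ODBC driver supports it or just loops internally)
is binding arrays of parameters so that a single SQLExecute covers a
whole batch of rows. Roughly, with made-up names, where hstmt is the
prepared insert statement from the code further down:

const size_t BATCH_ROWS = 1000;
SQLUBIGINT ParentIds[BATCH_ROWS];
SQLUBIGINT IntOnes[BATCH_ROWS];
SQLLEN ParentIdInd[BATCH_ROWS];
SQLLEN IntOneInd[BATCH_ROWS];
SQLULEN rowsProcessed = 0;

// column-wise parameter arrays: one SQLExecute per BATCH_ROWS rows
SQLSetStmtAttr(hstmt, SQL_ATTR_PARAM_BIND_TYPE,
(SQLPOINTER)SQL_PARAM_BIND_BY_COLUMN, 0);
SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE,
(SQLPOINTER)BATCH_ROWS, 0);
SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMS_PROCESSED_PTR, &rowsProcessed, 0);

// bind whole arrays instead of single variables (other columns alike)
SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_UBIGINT, SQL_BIGINT,
0, 0, ParentIds, 0, ParentIdInd);
SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_UBIGINT, SQL_BIGINT,
0, 0, IntOnes, 0, IntOneInd);

// fill the arrays with the next BATCH_ROWS rows of data, then:
SQLRETURN ret = SQLExecute(hstmt); // one call per batch, if supported
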
Thanks,
Luke
--- In firebird-support@yahoogroups.com, "lucas_jessee"
<lucas_jessee@...> wrote:
> I'll let you all look under the hood a bit more. Here is some terribly
> edited pseudo-C++ code that I created based on my situation. Please
> ignore any silly syntax errors as I threw this together quickly. Also
> assume that some of the variables (particularly the ones starting with
> m_) are class variables and are thus not defined in the scope shown.
> The data is in a large vector (its size varies greatly... for the example
> I described it is about 25000, but I'd say it can be anywhere from
> 0-200000 easily).
>
> When the connection is initially made (only once), I do this so that
> it does not autocommit.
> ret = SQLSetConnectAttr(m_dbc, SQL_ATTR_AUTOCOMMIT,
> (SQLPOINTER)SQL_AUTOCOMMIT_OFF, 0);
>
> Then this function is called and assume it has the vector.
> PARENT_ITEM_ID is an FK into the other, larger table. ITEM_TYPE_ID is
> an FK into a small type table (i.e. 1 = "Type One", 2 = "Type Two", etc.).
>
> ErrorType dbClass::storeItems()
> {
> SQLRETURN ret;
> SQLCHAR *stmt;
> SQLHANDLE hstmt;
> UINT64 ParentItemId = m_ID;
> UINT32 ItemBool;
> UINT32 ItemType = m_Type;
> UINT64 ItemIntOne;
> UINT64 ItemIntTwo;
> SQLINTEGER ParentItemIdSize = sizeof(ParentItemId);
> SQLINTEGER ItemBoolSize = sizeof(ItemBool);
> SQLINTEGER ItemTypeSize = sizeof(ItemType);
> SQLINTEGER ItemIntOneSize = sizeof(ItemIntOne);
> SQLINTEGER ItemIntTwoSize = sizeof(ItemIntTwo);
>
> if(m_itemList.size() < 1)
> return SUCCESS;
>
> ret = SQLAllocHandle(SQL_HANDLE_STMT,
> m_dbConnect->getDBHandle(), &hstmt);
>
> stmt = (SQLCHAR *)"INSERT INTO MY_TABLE(PARENT_ITEM_ID,
> ITEM_TYPE_ID, ITEM_INT_ONE, ITEM_INT_TWO, ITEM_BOOL) VALUES(?, ?, ?,
> ?, ?);";
>
> ret = SQLPrepare(hstmt, stmt, SQL_NTS);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> ret = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_UBIGINT,
> SQL_BIGINT, 0, 0, &ParentItemId, 0, &ParentItemIdSize);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> ret = SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_LONG,
> SQL_SMALLINT, 0, 0, &ItemType, 0, &ItemTypeSize);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> ret = SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_UBIGINT,
> SQL_BIGINT, 0, 0, &ItemIntOne, 0, &ItemIntOneSize);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> ret = SQLBindParameter(hstmt, 4, SQL_PARAM_INPUT, SQL_C_UBIGINT,
> SQL_BIGINT, 0, 0, &ItemIntTwo, 0, &ItemIntTwoSize);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> ret = SQLBindParameter(hstmt, 5, SQL_PARAM_INPUT, SQL_C_LONG,
> SQL_SMALLINT, 0, 0, &ItemBool, 0, &ItemBoolSize);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
>
> for(size_t i = 0; i < m_itemList.size(); i++)
> {
> ItemBool = m_itemList[i].itemBool ? 1:0;
> ItemIntOne = m_itemList[i].itemIntOne;
> ItemIntTwo = m_itemList[i].itemIntTwo;
> ret = SQLExecute(hstmt);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
> }
>
> SQLFreeHandle(SQL_HANDLE_STMT,hstmt);
>
> return SUCCESS;
> }
>
> Then some time later, when the application terminates, this is called.
>
> SQLEndTran(SQL_HANDLE_DBC, myDB->getDBHandle(), SQL_COMMIT);
>
> I have worked on and researched this quite a bit, and I really feel that
> the above code is optimal, but feel free to pick apart my theory.
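>
> One variation I have not actually tried is committing in batches the way
> Helen suggested earlier (something like 10,000 rows per transaction)
> instead of once at shutdown. A rough, untested sketch of my loop with
> that change (the batch size is a guess, and I'm assuming the driver
> keeps the prepared statement valid across the commit; if it doesn't, it
> would need a re-prepare after each SQLEndTran):
>
> const size_t COMMIT_EVERY = 10000;
> for(size_t i = 0; i < m_itemList.size(); i++)
> {
> ItemBool = m_itemList[i].itemBool ? 1 : 0;
> ItemIntOne = m_itemList[i].itemIntOne;
> ItemIntTwo = m_itemList[i].itemIntTwo;
> ret = SQLExecute(hstmt);
> if(!SQL_SUCCEEDED(ret))
> { cout << m_dbConnect->getStatementError(hstmt) << endl;
> return ERROR;
> }
> // commit every COMMIT_EVERY rows instead of only at shutdown
> if((i + 1) % COMMIT_EVERY == 0)
> SQLEndTran(SQL_HANDLE_DBC, m_dbConnect->getDBHandle(), SQL_COMMIT);
> }
> // commit whatever is left over
> SQLEndTran(SQL_HANDLE_DBC, m_dbConnect->getDBHandle(), SQL_COMMIT);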
>
> Helen, I don't doubt that my performance issues may be caused by the
> ODBC driver itself. The driver seems to have a problem with Embedded
> Databases. Unfortunately that's what I'm using in this case. If I
> have time (which I don't right now) I might try this same test on a
> local super-server database and see if the performance is the same. I
> will continue to try to get those guys to answer my questions and
> follow up.
>
> Thanks in advance,
> Luke
>
> --- In firebird-support@yahoogroups.com, Helen Borrie <helebor@> wrote:
> >
> > At 11:13 PM 3/07/2007, you wrote:
> > >Hello all,
> > >I've got a lot of small inserts to do and it's just taking much longer
> > >than I'm willing to wait for (even with an embedded DB). Basically
> > >these inserts are small records (roughly 4 BIGINT's) that have an FK
> > >to another larger table.
> > >
> > >If I'm reading everything correctly, Firebird does NOT support bulk
> > >inserts and anything close to it is kind of a hack.
> >
> > No, that is quite untrue. However, it is not always advisable to
> > push huge numbers of inserts through in a single transaction if
> > you're fairly limited on resources. It can be a good practice to
> > batch your inserts at around 10,000 per transaction.
> >
> > You don't seem to have provided any comprehensible information so
> > far, such as where the data for the inserts is coming from or how
> > many records you are processing.
> >
> > >I was able to
> > >create a stored procedure that took 10x the parameters and did 10
> > >inserts at once. I saw a decent performance gain from this.
> >
> > Sorry, I don't understand the statement above.
> >
> >
> > >What I would really like to do is to send a single BLOB to a stored
> > >procedure. This blob would be a Binary data structure that would
> > >contain N 32 byte blocks for this example. The first 32 byte block
> > >would be the first 4 BIGINT's to insert into the table, the second
> > >would be the next 4 BIGINT's, etc.
> > >
> > >Can a stored procedure be written to parse the blob in this manner?
> >
> > Yes, it could be; but I would have the strongest doubts that it
> > could be faster than direct DML from the client using a prepared
> > parameterised statement and applying values to your parameters in a
> > read-from-source/write-to-target loop. To be readable by the engine,
> > your blob would have to be a text blob, so that it could be converted
> > to a string....then you would have to apply some parsing routine to
> > the string and you would *still* need to use a parameterised insert
> > statement (or an equivalent cursor operation) for each block of four
> > values you extracted.
> >
> > On the other hand, if the source of the data were a text file that
> > you had formatted as fixed-length records, there could be a major
> > performance advantage to making this file an external table and
> > processing the entire thing in a stored procedure. We just don't
> > have any information from you about where your data is coming from.
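> >
> > To make that concrete, here is a rough, untested sketch of the
> > external-table route driven over the same ODBC connection (the file
> > name, external table name and column widths are invented and would
> > have to match your real fixed-length record layout; the server's
> > ExternalFileAccess setting has to allow reading the file; and a single
> > INSERT ... SELECT is used here instead of a stored procedure just to
> > keep it short):
> >
> > // untested sketch; names and widths below are made up
> > const char *ddl =
> > "CREATE TABLE EXT_ITEMS EXTERNAL FILE 'C:/data/items.txt' ("
> > " PARENT_ITEM_ID CHAR(20), ITEM_TYPE_ID CHAR(6),"
> > " ITEM_INT_ONE CHAR(20), ITEM_INT_TWO CHAR(20),"
> > " ITEM_BOOL CHAR(1), EOL CHAR(2))"; // CR+LF closing each record
> > const char *copy =
> > "INSERT INTO MY_TABLE (PARENT_ITEM_ID, ITEM_TYPE_ID, ITEM_INT_ONE,"
> > " ITEM_INT_TWO, ITEM_BOOL)"
> > " SELECT CAST(PARENT_ITEM_ID AS BIGINT), CAST(ITEM_TYPE_ID AS SMALLINT),"
> > " CAST(ITEM_INT_ONE AS BIGINT), CAST(ITEM_INT_TWO AS BIGINT),"
> > " CAST(ITEM_BOOL AS SMALLINT)"
> > " FROM EXT_ITEMS";
> > SQLExecDirect(hstmt, (SQLCHAR *)ddl, SQL_NTS);
> > // DDL generally has to be committed before the new table is usable
> > SQLEndTran(SQL_HANDLE_DBC, myDB->getDBHandle(), SQL_COMMIT);
> > SQLExecDirect(hstmt, (SQLCHAR *)copy, SQL_NTS);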
> >
> > >Basically I want to pass in an arbitrarily sized blob (I can pass in
> > >size as a parameter also if necessary...) and break it up into the 8
> > >byte blocks (32 bytes per row = 8bytes * 4 values) that actually
> > >represent the data to be stored in the database.
> >
> > Expensive, compared to a parameterised DSQL statement.
> >
> > >OR - Can I pass in the values as text and parse it in the same
> > >manner? I could convert the 64bit numbers to 20 bytes of text and
> > >grab the numbers off in 20 byte chunks if necessary.
> >
> > Of course you could do something like that if it were needed. But to
> > me it looks like a whole heap of unnecessary processing and
> > conversion happening on both sides of the interface for no
> > particular purpose.
> >
> > Putting aside the obvious (processing a file), let's suppose that it
> > is really true that you have no option but to pre-process the data in
> > your client application. Let's assume you have some loop on the
> > client where each iteration generates the set of values for the
> > record you want to insert.
> >
> > Outside the loop, you begin with the parameterised insert statement,
> > e.g.,
> >
> > insert into atable (col1, col2, col3, col4)
> > values (?, ?, ?, ?)
> >
> > First, you start a transaction (which might be automatic with ODBC)
> >
> > Next, you prepare this statement (for which the ODBC driver will have
> > a method that defines the parameter set and prepares the structures
> > that its Prepare call returns from the server).
> >
> > Now, once the statement is prepared, you begin your loop:
> >
> > for each (input instance) ....
> > begin
> >   munge the input into variables of the appropriate data type
> >   apply each variable to the appropriate parameter (the order is
> >     crucial if your driver doesn't create its own parameter variables;
> >     the driver probably has a method that can do the above in one step)
> >   execute the statement
> > end
> >
> > Finally, when all of the input has been processed, you commit the
> > transaction.
> >
> > Your workflow is not apparent from any of your emails, but get past
> > the belief that you have to invent some kind of hack to perform bulk
> > inserts. It is conventional SQL you are looking at here.
> >
> > From your posting to the ODBC list, I see that you are getting
> > memory exceptions from your attempts so far to do [[ whatever you are
> > doing ]]. You do need to persevere with this...but help yourself AND
> > the ODBC gurus by providing enough information to understand the
> > problem. The doctor can't help if the only information the patient
> > is willing to impart is "I don't feel well today."
> >
> > I suspect that you are getting AVs from referring to unassigned
> > objects. Likely sources of these problems would be unreferenced
> > parameters (if you are using parameters) or an invalidated statement
> > (which happens if you perform a total replacement of the statement
> > inside the loop without re-preparing -- a highly undesirable approach
> > even if you were properly taking care of the state !!).
> >
> > So - provide more information.
> >
> > ./heLen
> >
>