Subject | Re: Blob Parsing Stored Procedure |
---|---|
Author | lucas_jessee |
Post date | 2007-07-05T12:33:34Z |
I'll let you all under the hood a bit more. Here is some terribly
edited pseudo-C++ code that I created based on my situation. Please
ignore any silly syntax errors as I threw this together quickly. Also
assume that some of the variables (particularly the ones starting with
m_) are class variables and are thus not defined in the scope shown.
The data is in a large vector (size varies greatly....for the example
I described it is about 25000, but I'd say it can be anywhere from
0-200000 easily).
When the connection is initially made (only once), I do this so that
it does not autocommit.
ret = SQLSetConnectAttr(m_dbc, SQL_ATTR_AUTOCOMMIT,
SQL_AUTOCOMMIT_OFF, 0);
Then this function is called and assume it has the vector.
PARENT_ITEM_ID is a FK into the other, larger table. ITEM_TYPE_ID is
an FK into a small type table (ie 1 = "Type One", 2 = "Type Two", etc).
ErrorType dbClass::storeItems()
{
SQLRETURN ret;
SQLCHAR *stmt;
SQLHANDLE hstmt;
UINT64 ParentItemId = m_ID;
UINT32 ItemBool;
UINT32 ItemType = m_Type;
UINT64 ItemIntOne;
UINT64 ItemIntTwo;
SQLINTEGER ParentItemIdSize = sizeof(ParentItemId);
SQLINTEGER ItemBoolSize = sizeof(ItemBool);
SQLINTEGER ItemTypeSize = sizeof(ItemType);
SQLINTEGER ItemIntOneSize = sizeof(ItemIntOne);
SQLINTEGER ItemIntTwoSize = sizeof(ItemIntTwo);
SQLRETURN ret = SQLAllocHandle(SQL_HANDLE_STMT,
m_dbConnect->getDBHandle(), &hstmt);
if(m_itemList.size() < 1)
return SUCCESS;
stmt = (SQLCHAR *)"INSERT INTO MY_TABLE(PARENT_ITEM_ID,
ITEM_TYPE_ID, ITEM_INT_ONE, ITEM_INT_TWO, ITEM_BOOL) VALUES(?, ?, ?,
?, ?);";
ret = SQLPrepare(hstmt, stmt, SQL_NTS);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ParentItemId, 0, &ParentItemIdSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_LONG,
SQL_SMALLINT, 0, 0, &ItemType, 0, &ItemTypeSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ItemIntOne, 0, &ItemIntOneSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 4, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ItemIntTwo, 0, &ItemIntTwoSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 5, SQL_PARAM_INPUT, SQL_C_LONG,
SQL_SMALLINT, 0, 0, &ItemBool, 0, &ItemBoolSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
for(int i=0; i<m_itemList.size();i++)
{
ItemBool = m_itemList[i].itemBool ? 1:0;
ItemIntOne = m_itemList[i].itemIntOne;
ItemIntTwo = m_itemList[i].itemIntTwo;
ret = SQLExecute(hstmt);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
}
SQLFreeHandle(SQL_HANDLE_STMT,hstmt);
return SUCCESS;
}
Then some time later, when the application terminates, this is called.
SQLEndTran(SQL_HANDLE_DBC, myDB->getDBHandle(), SQL_COMMIT);
I worked and researched this quite a bit, I really feel that the above
code is optimal, but feel free to pick apart my theory.
Helen, I don't doubt that my performance issues may be caused by the
ODBC driver itself. The driver seems to have a problem with Embedded
Databases. Unfortunately that's what I'm using in this case. If I
have time (which I don't right now) I might try this same test on a
local super-server database and see if the performance is the same. I
will continue to try to get those guys to answer my questions and
follow up.
Thanks in advance,
Luke
edited pseudo-C++ code that I created based on my situation. Please
ignore any silly syntax errors as I threw this together quickly. Also
assume that some of the variables (particularly the ones starting with
m_) are class variables and are thus not defined in the scope shown.
The data is in a large vector (size varies greatly....for the example
I described it is about 25000, but I'd say it can be anywhere from
0-200000 easily).
When the connection is initially made (only once), I do this so that
it does not autocommit.
ret = SQLSetConnectAttr(m_dbc, SQL_ATTR_AUTOCOMMIT,
SQL_AUTOCOMMIT_OFF, 0);
Then this function is called and assume it has the vector.
PARENT_ITEM_ID is a FK into the other, larger table. ITEM_TYPE_ID is
an FK into a small type table (ie 1 = "Type One", 2 = "Type Two", etc).
ErrorType dbClass::storeItems()
{
SQLRETURN ret;
SQLCHAR *stmt;
SQLHANDLE hstmt;
UINT64 ParentItemId = m_ID;
UINT32 ItemBool;
UINT32 ItemType = m_Type;
UINT64 ItemIntOne;
UINT64 ItemIntTwo;
SQLINTEGER ParentItemIdSize = sizeof(ParentItemId);
SQLINTEGER ItemBoolSize = sizeof(ItemBool);
SQLINTEGER ItemTypeSize = sizeof(ItemType);
SQLINTEGER ItemIntOneSize = sizeof(ItemIntOne);
SQLINTEGER ItemIntTwoSize = sizeof(ItemIntTwo);
SQLRETURN ret = SQLAllocHandle(SQL_HANDLE_STMT,
m_dbConnect->getDBHandle(), &hstmt);
if(m_itemList.size() < 1)
return SUCCESS;
stmt = (SQLCHAR *)"INSERT INTO MY_TABLE(PARENT_ITEM_ID,
ITEM_TYPE_ID, ITEM_INT_ONE, ITEM_INT_TWO, ITEM_BOOL) VALUES(?, ?, ?,
?, ?);";
ret = SQLPrepare(hstmt, stmt, SQL_NTS);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ParentItemId, 0, &ParentItemIdSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_LONG,
SQL_SMALLINT, 0, 0, &ItemType, 0, &ItemTypeSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 3, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ItemIntOne, 0, &ItemIntOneSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 4, SQL_PARAM_INPUT, SQL_C_UBIGINT,
SQL_BIGINT, 0, 0, &ItemIntTwo, 0, &ItemIntTwoSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
ret = SQLBindParameter(hstmt, 5, SQL_PARAM_INPUT, SQL_C_LONG,
SQL_SMALLINT, 0, 0, &ItemBool, 0, &ItemBoolSize);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
for(int i=0; i<m_itemList.size();i++)
{
ItemBool = m_itemList[i].itemBool ? 1:0;
ItemIntOne = m_itemList[i].itemIntOne;
ItemIntTwo = m_itemList[i].itemIntTwo;
ret = SQLExecute(hstmt);
if(!SQL_SUCCEEDED(ret))
{ cout << m_dbConnect->getStatementError(hstmt) << endl;
return ERROR;
}
}
SQLFreeHandle(SQL_HANDLE_STMT,hstmt);
return SUCCESS;
}
Then some time later, when the application terminates, this is called.
SQLEndTran(SQL_HANDLE_DBC, myDB->getDBHandle(), SQL_COMMIT);
I worked and researched this quite a bit, I really feel that the above
code is optimal, but feel free to pick apart my theory.
Helen, I don't doubt that my performance issues may be caused by the
ODBC driver itself. The driver seems to have a problem with Embedded
Databases. Unfortunately that's what I'm using in this case. If I
have time (which I don't right now) I might try this same test on a
local super-server database and see if the performance is the same. I
will continue to try to get those guys to answer my questions and
follow up.
Thanks in advance,
Luke
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@...> wrote:
>
> At 11:13 PM 3/07/2007, you wrote:
> >Hello all,
> >I've got a lot of small inserts to do and its just taking much longer
> >than I'm willing to wait for (even with an embedded DB). Basically
> >these inserts are small records (roughly 4 BIGINT's) that have a FK to
> >another larger table.
> >
> >If I'm reading everything correctly, Firebird does NOT support bulk
> >inserts and anything close to it is kind of a hack.
>
> No, that is quite untrue. However, it is not always advisable to
> push huge numbers of inserts through in a single transaction if
> you're fairly limited on resources. It can be a good practice to
> batch your inserts at around 10,000 per transaction.
>
> You don't seem to have provided any comprehensible information so
> far, such as where the data for the inserts is coming from or how
> many records you are processing.
>
> >I was able to
> >create a stored procedure that took 10x the parameters and did 10
> >inserts at once. I saw a decent performance gain from this.
>
> Sorry, I don't understand the statement above.
>
>
> >What I would really like to do is to send a single BLOB to a stored
> >procedure. This blob would be a Binary data structure that would
> >contain N 32 byte blocks for this example. The first 32 byte block
> >would be the first 4 BIGINT's to insert into the table, the second
> >would be the next 4 BIGINT's, etc.
> >
> >Can a stored procedure be written to parse the blob in this manner?
>
> Yes, it could be; but I would have the strongest doubts that it
> could be faster than direct DML from the client using a prepared
> parameterised statement and applying values to your parameters in a
> read-from-source/write-to-target loop. To be readable by the engine,
> your blob would have to be a text blob, so that it could be converted
> to a string....then you would have to apply some parsing routine to
> the string and you would *still* need to use a parameterised insert
> statement (or an equivalent cursor operation) for each block of four
> values you extracted.
>
> On the other hand, if the source of the data were a text file that
> you had formatted as fixed-length records, there could be a major
> performance advantage to making this file an external table and
> processing the entire thing in a stored procedure. We just don't
> have any information from you about where your data is coming from.
>
> >Basically I want to pass in an arbitrarily sized blob (I can pass in
> >size as a parameter also if necessary...) and break it up into the 8
> >byte blocks (32 bytes per row = 8bytes * 4 values) that actually
> >represent the data to be stored in the database.
>
> Expensive, compared to a parameterised DSQL statement.
>
> >OR - Can I pass in the values as text and parse it in the same manner.
> > I could convert the 64bit numbers to 20 bytes of text and grab the
> >numbers off in 20 byte chunks if necessary.
>
> Of course you could do something like that if it were needed. But to
> me it looks like a whole heap of unnecessary processing and
> conversion happening on both sides of the interface for no
particular purpose.
>
> Putting aside the obvious (processing a file), let's suppose that it
> is really true that you have no option but to pre-process the data in
> your client application. Let's assume you have some loop on the
> client where each iteration generates the set of values for the
> record you want to insert.
>
> Outside the loop, you begin with the parameterised insert statement,
e.g.,
>
> insert into atable (col1, col2, col3, col4)
> values (?, ?, ?, ?)
>
> First, you start a transaction (which might be automatic with ODBC)
>
> Next, you prepare this statement (for which the ODBC driver will have
> a method that defines the parameter set and prepares the structures
> that its Prepare call returns from the server).
>
> Now, once the statement is prepared, you begin your loop:
>
> for each (input instance) ....
> begin
> munge the input into variables of the appropriate data type
> apply each variable to the appropriate parameter (the order is
> crucial, if your
> driver doesn't create its own parameter variables; the
> driver probably has
> a method that can do the above in one step)
> execute the statement
> end
>
> Finally, when all of the input has been processed, you commit the
transaction.
>
> Your workflow is not apparent from any of your emails, but get past
> the belief that you have to invent some kind of hack to perform bulk
> inserts. It is conventional SQL you are looking at here.
>
> From your posting to the ODBC list, I see that you are getting
> memory exceptions from your attempts so far to do [[ whatever you are
> doing ]]. You do need to persevere with this...but help yourself AND
> the ODBC gurus by providing enough information to understand the
> problem. The doctor can't help if the only information the patient
> is willing to impart is "I don't feel well today."
>
> I suspect that you are getting AVs from referring to unassigned
> objects. Likely sources of these problems would be unreferenced
> parameters (if you are using parameters) or an invalidated statement
> (which happens if you perform a total replacement of the statement
> inside the loop without re-preparing -- a highly undesirable approach
> even if you were properly taking care of the state !!).
>
> So - provide more information.
>
> ./heLen
>