Subject: Re: [firebird-support] Aborting isc_dsql_execute calls from another thread (Win32)
Author: Richard Wesley
Hi Helen -

Thanks for the reply. Before we go on, I think I should describe the
usage scenario in some detail...

Tableau (the product) is a data visualisation tool for Windows that
connects to a number of SQL and data cube engines. OLAP with SQL is
often slow (which is why data cubes were invented) but many people
still need to do OLAP with SQL engines. For various use cases (off-
line work, hostile user IT departments, etc.) we are adding a local
SQL engine (i.e. Firebird 2) to the product. The queries that we
generate are sometimes slow, even for well-maintained "big iron" back
ends like Oracle, so one of the features of the product is allowing
the user to cancel a query that is slow/incorrect/etc. for any engine
(including FB).

I am currently trying to implement this feature for arbitrary user
queries generated against arbitrary datasets imported into FB. I
have some heuristics for defining indexes that are used when the data
set is created, but an OLAP query on a 6M-row table is going to take a
while. I would like to know what the most efficient method of
aborting such a query would be.

According to Dimitry, I have to orphan the query because there is no
way to interrupt it. Some empirical testing shows that in a Prepare/
Execute/Fetch* sequence, the lion's share of the time is spent in the
Execute and the first Fetch. So my current strategy is to spawn a
thread to perform the Execute/Fetch sequence while the UI spin-waits
with a cancel dialog in front and to implement abort by leaving the
thread to complete without any further interaction with the
application (orphaning). After orphaning, no subsequent fetches are
made.
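
In case it helps, here is a minimal sketch of that strategy. The
QueryJob contents and CancelPressed() are hypothetical placeholders,
the real wrapper calls are only shown as comments, and the hand-off of
ownership to an orphaned thread is elided:

// Sketch of the orphaning strategy described above.
#include <windows.h>
#include <process.h>

extern bool CancelPressed();   // hypothetical: pumps the cancel dialog,
                               // returns true when the user gives up

struct QueryJob {
    bool hasNext;
    // ... statement/bindings would live here so that an orphaned
    //     thread can clean up after itself ...
};

static unsigned __stdcall RunQuery( void* arg )
{
    QueryJob* job = static_cast<QueryJob*>( arg );
    // m_tr->Execute( m_stmt );                     <- the slow part
    // job->hasNext = m_stmt->Fetch( m_bindings );  <- the slow first fetch
    job->hasNext = false;
    return 0;
}

bool WaitOrOrphan( QueryJob* job )
{
    unsigned id;
    HANDLE h = (HANDLE)_beginthreadex( 0, 0, RunQuery, job, 0, &id );

    while ( WaitForSingleObject( h, 100 ) == WAIT_TIMEOUT ) {
        if ( CancelPressed() ) {   // user cancelled: orphan the thread and
            CloseHandle( h );      // let it run to completion on its own
            return false;
        }
    }
    CloseHandle( h );              // thread finished normally
    return job->hasNext;
}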

My question, really, is: how well will this approach work for the use
case described, and what improvements could be made?

On Mar 22, 2006, at 17:47, Helen Borrie wrote:

>> I guess the question is "Where is all the time spent executing a slow
>> query"? I had assumed that all Fetch did was read a data stream that
>> had already been created. Is that not the case?
>
> No, it's not the case. Fetch gets one row per call. Your program
> has to provide a loop mechanism to fetch multiple rows.

Of course. But it appears from observation that the first Fetch can
be quite expensive.

> If you want
> a user to be able to stop the loop then you will need to build in
> some kind of callback mechanism between the user interface and your
> fetch loop to stop the fetching and roll back the transaction.

Yes, the upper layers can stop reading the stream if they like. But
they have no way to stop FB from blocking the UI thread during the
query start-up processing.
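
For the fetch loop itself, something along these lines is what I have
in mind at the raw API level; the cancel flag and the names are
placeholders, and error handling is elided:

#include <ibase.h>

extern volatile bool g_cancelRequested;   // hypothetical: set by the UI thread

// Fetch one row per call; stop early if the UI asked us to, then close
// the cursor and roll back once no fetch is in progress any more.
void FetchLoop( isc_stmt_handle& stmt, isc_tr_handle& tr, XSQLDA* out )
{
    ISC_STATUS status[20];
    ISC_STATUS rc;

    while ( (rc = isc_dsql_fetch( status, &stmt, 1, out )) == 0 ) {
        // ... consume one row from 'out' ...
        if ( g_cancelRequested )
            break;
    }
    // rc == 100 means end of cursor; anything else is an error.

    isc_dsql_free_statement( status, &stmt, DSQL_close );  // close the cursor
    isc_rollback_transaction( status, &tr );                // now safe to roll back
    // (commit instead if the loop ran to completion and nobody cancelled)
}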

> You are getting confused between "executing" and "preparing". The
> first time a SELECT query is run, you have to prepare it. How long
> the prepare takes depends on various factors, like the complexity of
> the query and the complexity of the metadata.

That may be. Does this mean that Prepare is also potentially
time-consuming?

> Then, once the query is run, the time it takes to send the first
> packet of output depends on factors like the efficiency of the
> statement, the state of the metadata and whether optimal indexes
> are used.
>
> So prepare + an inefficient query could take a lot of time to deliver
> the first row to your FETCH loop. Then it is a question of managing
> your buffers so that you have somewhere to put the output and
> manage it.
>
> Subsequent "runs" of the same query will be faster, as long as you
> keep the query prepared.

Sadly, that is not practical right now. Fortunately, we do cache the
results of queries for all data sources, including FB, so I expect
this is not a problem.

> When a query that ought to be fast seems slow, you should suspect
> your indexes. You can run your query in isql with SET PLAN to get the
> query plan back and see what indexes are being used. You can use
> gstat to look at the stats for those indexes.

We will try to tune index generation during the beta cycle, but the
nature of the product is such that we cannot hand-tune each table/
query/index.
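
Since the queries are machine generated, one thing we may do is log the
plan programmatically instead of going through isql - as far as I can
tell, the statement info call with isc_info_sql_get_plan returns the
same text that SET PLAN shows. A rough sketch (the helper name and
buffer size are arbitrary, and truncation handling is omitted):

#include <string>
#include <ibase.h>

// Ask the server for the optimizer plan of an already-prepared statement.
std::string GetPlan( isc_stmt_handle& stmt )
{
    ISC_STATUS status[20];
    char items[] = { isc_info_sql_get_plan };
    char buffer[2048];

    if ( isc_dsql_sql_info( status, &stmt, sizeof(items), items,
                            sizeof(buffer), buffer ) )
        return std::string();                    // info call failed

    if ( buffer[0] != isc_info_sql_get_plan )
        return std::string();                    // e.g. isc_info_truncated

    // The item byte is followed by a 2-byte length and the plan text.
    const int len = isc_vax_integer( &buffer[1], 2 );
    return std::string( &buffer[3], len );
}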

> You can't "abort" something in one thread from another thread. To be
> thread-safe, each thread must have its own distinct "session" with
> the database, viz. connection, transactions and queries. You can't
> "abort" a transaction at all: you can start it, and you can end it
> by committing it or rolling it back. And you can neither commit nor
> rollback the transaction while fetching is still going on. Period.

That is what I was afraid of. It appears that FB2 catches attempts
to do this ;-)

> Once you start fetching from a statement, if you want a means to
> abort the fetching (and thereby make it possible to rollback the
> transaction) then you have to write that code.

Do you mean "inside the FB server code base"?

> You mentioned somewhere that you were "using embedded". You'll need
> to be more explicit about what you mean. "Embedded" in Linux and
> Windows are two entirely different animals.

Windows.

The code sequence is:

// Foreground
m_tr->Prepare( m_stmt, inCmd, m_bindings );
m_stmt->DescribeOutputBindings( m_bindings );
m_bindings->PrepareBuffer();

// Background
m_tr->Execute( m_stmt );
m_hasNext = m_stmt->Fetch( m_bindings );

// Foreground
m_hasNext = m_stmt->Fetch( m_bindings );

As I mentioned, we can abort the Fetch loop easily in the UI, but
aborting the slow start-up of the query (Prepare/Execute/first Fetch)
is my task. From your comments, it sounds like it would make sense to
move the first three calls into the background thread as well?
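
Concretely, I imagine the restructured sequence would be something
like this (same wrapper calls as above):

// Background (everything that can block the UI)
m_tr->Prepare( m_stmt, inCmd, m_bindings );      // prepare can be slow too
m_stmt->DescribeOutputBindings( m_bindings );
m_bindings->PrepareBuffer();
m_tr->Execute( m_stmt );
m_hasNext = m_stmt->Fetch( m_bindings );         // the expensive first fetch

// Foreground (cheap row-at-a-time fetches, which the upper layers
// can already abandon whenever they like)
m_hasNext = m_stmt->Fetch( m_bindings );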

TIA,
________________________________________________________
Richard Wesley Software Engineer
Tableau Software
http://www.tableausoftware.com/
hawkfish tableausoftware com