Subject Vulcan statement cancel operation
Author Tom Cole
In our use of Vulcan as an embedded database, we need to be able to offer our users the ability to cancel long-running operations. For us, the logical unit of work to cancel is a statement, since all the heavy-lifting is done by DSQL. As such, we have implemented the following in our instance of Vulcan, and I'm posting this to ask if it is generally useful enough to push to the open source code base.

In particular, since we use Vulcan exclusively in an embedded environment, there may be issues here that I don't know about that would effect conventional remote usage scenarios. We embed Vulcan within our own federated data server, so some functions are handled in our architecture differently than a "pure" Vulcan server.


Here's the entry definition as implemented in why/entrypoints.cpp:


ISC_STATUS isc_dsql_cancel_statement( ISC_STATUS* userStatus,
DsqlHandle * dsqlHandle,
USHORT * flags );


Where

userStatus Standard status vector mechanism
dsqlHandle A handle to a previously allocated statement
flags New flags setting. Previous flags returned.

This implementation assumes that a thread in the server process makes the determination that a cancel is required (presumably responding to a client request, or perhaps an administrative function, etc.) and invokes the cancel function to request that the statement be aborted. This requires that the statement handle be known to the canceling thread, even though it is possible that the statement was created and is being used by another thread. In fact, that's the only really interesting case for the embedded server, I think. In our case, a "watcher" thread handles administrative requests for the server, including handling cancel requests.

The flags parameter is used to control the state of the abort operation. When the call is made, the flags parameter is returned as the *previous* value of the flags setting.

The flags have two bit-field values. The values are arbitrary.

req_stmt_abort = 1
req_stmt_aborting = 8

Passing in the req_stmt_abort flag initiates the abort request. The returned value of flags will tell the caller if an abort has already been requested, and if the abort has been acknowledged by the engine for the statement and an "unwind" is in progress.

Note that the abort operation isn't instantaneous; it requires carefully chosen polling operations within the engine to examine requests to see if the associated statements have an abort request. As such, the cancel operation returns more-or-less immediately, and any active call on the same statement handle will eventually return with an isc_aborted error.

Because of the asynchronous nature of abort operations, the statement stays in an "aborted" state until explicitly reset with a new call that clears the isc_stmt_abort flag. As long as that flag is set, subsequent calls on the statement for anything except release will return the isc_aborted error code. The general usage pattern is that when a thread gets the isc_aborted return code, it is responsible for any cleanup. If (depending on what it was doing) it thinks it can reasonably re-use the statement (by doing a fresh prepare, for example) then it can just clear the flag. If the statement is not usefully restartable (mid-stream in executing an ad-hoc query) then it may choose to just release the statement and be done with it.

A key element here is that the server must know how to use this functionality correctly to avoid problems from re-use of statement handle values. In our case with an administrative watcher thread, the statement handle is "registered" in such a way that the administrative thread knows what statement handle is being cancelled for what active connection, etc. The client-server logic in Vulcan is likely sufficiently different from our server design that I don't presume to offer an idea of how this would be used in the Vulcan scheme for connection management, etc.

As an aside, we've also implemented code that lets us set a configuration option that will "simulate" aborts. That is, you indicate how many polling attempts you want to occur before an abort is triggered artificially in the code. We've used this to test the correctness of the unwind logic and how fast an abort is responded to given various interesting cases (stored procedures were interesting, for example).

We've also used it when a user complained of incorrect behavior when an asynchronous abort occurred, to recreate (via rather brute-force empirical testing) the approximate moment in processing when the abort was generated, so we can debug with a more deterministic abort event.

A future enhancements that we've talked about but not yet implemented allows the return code from the aborted statement to tell the owner if it's safe to restart the operation or not. This would be most relevant while returning large result sets; if you abort the operation with an incomplete result set, is it possible to "restart where you left off"? This would largely depend on the state of the request(s) when the statement was aborted. Needless to say, this one is harder and we haven't solved it yet.

Because the running request has to poll a bit flag to determine if it should abort, we were concerned about performance. However, so far the cost of doing the polling has resulted in no statistically significant performance cost. The polling is frequent enough that most queries respond to the abort request within a fraction of a second of the request bit being set on a typical test configuration. Sorts within stored procedures returning results to a SELECT are slightly slower to notice and unwind, but we're working on that now.

I can go into more detail about where we've put the polling points and what changes were required to link requests to statements and such. But before going any further, I wanted to find out if (a) this is appropriate for Vulcan, and (b) if you'd like to know something specific about it before I start pushing the code.


--
Tom Cole, Platform R&D, SAS Institute Inc.
Tom's Cartesian Corollary: "I am nothing, if not considerate."
*POOF*