Subject | Re: [firebird-support] Re: effect of CREATE VIEW not visible to transaction? |
---|---|
Author | unordained |
Post date | 2005-06-07T10:16:29Z |
It's not like I'm under oath not to reveal what I'm doing, so I might as well share. However, the
question has been answered -- no, views are not visible until you start a new transaction. So I'll
just live with it.
Note: it's 3 in the morning, and I'm a bad communicator on a good day.
--Background--
I have a user who's complained about not being able to run her own queries, right? Okay, so first
off, I'm making a query builder for her. Second, we have a lot of queries coming up for various
healthcare-related grants that involve things like "people who match all of the following
conditions" or "people who match two of the following three conditions", etc. Now, if we just had a
few, here and there, I wouldn't mind doing it by hand. But when you've got dozens, and the specs
change constantly, you want an easy way to do things like "none of the following", "all of the
following", etc. on re-usable queries -- it's a time-saver even for programmers.
So I've got it built. My first attempt involved pulling all the data from the database, for each
bit of the query, and doing in-memory joins of id's, pushing stuff back out to the database to do
joins (as the "IN" clause can't take more than 1500 elements, and OR isn't terribly efficient, and
it's nasty regardless, doing an indexed join seemed smart) ... it worked, technically. I think the
dead rabbit I saw on the road is probably faster. Enormous memory usage, lots of network traffic,
generally ... just ... bad. (And BCB5's memory management problems weren't helping. I swear I know
how to use the STL efficiently.)
--How it now works--
So I redesigned it (over the weekend) so each query element would return the name of a view that
expressed its result. It could be a view directly to the data (no cache) or it could be to a
temporary table that contains cached data in the form of id values. (query_id + id). Regardless,
everything uses view names to express stuff. Those names then get piped into components ("all
of", "none of", "some of", "any of", as well as changing-table elements such as "people who have
diseases" -- which returns people id's, but receives diagnosis record id's) and ... so forth.
And it has a really ugly background color.
The point is, this has the potential to be really quite efficient, considering how general-purpose
it is. I can let Firebird optimize statements, so long as I keep everything as the result of views.
If I force things into temp tables, I tie its hands ... but that can't always be safely avoided.
Things like "any of" can be expressed as the union of all the views, "none of" takes a "whole" view
and left-joins to all the other views, and checks to make sure none match (this is a bit slow, my
test case takes about 10 seconds -- if anyone has a better way to do a "NONE" query, I'd be very
interested), "all of" does inner joins between the first view (doesn't matter which) and the rest,
and "some of" either does one of the above, or winds up using a temp table and doing a group by,
counting how many of the views a particular record matched, returning only those id's that match
the correct number ...
(Example of "some of" query:)
People that
- Match 1 to 2 of the following queries:
- Is 65 or older
- Has diabetes
- Lives in our area
- Has had at least two procedures that
- All of:
- Is one of the following codes [.......]
- Any of:
- Was performed this year
- Was performed more than three years ago
(don't ask me why I'd run that -- and yes, they can be very deeply nested ... which is how I found
out that Windows or the VCL has some sort of max-depth on re-parenting controls inside each other,
somewhere around 48.)
I can post a screen-shot if anybody cares, but I doubt it's relevant.
--Views rather than pure select--
Since I'm mostly doing joins, and firebird doesn't yet let me put selects directly in the FROM
clause, and since I'd like to avoid (as much as possible -- I've tried it, and it -usually- slows
me down) selecting each query into the temp table (indexed) and then always joining against the
temp table ... views were just a natural solution? Yes, you could probably rig something where only
one sql statement was built, with every query object touching it, but ... I'm just not smart enough
for that.
(The base class of the query components is capable of selecting the view into a temp table and then
turning around and returning a view to the cached data transparently, though I currently have no
good way to decide whether or not this should be done.)
I have it committing right after creating the views, and it works. Each component takes care of
cleaning up any data it put in temp tables, and all the view names (and names of views they use)
are inserted into a graph so their dependencies can be figured out later. (The query system avoids
recreating views it already has, so there may be multiple dependencies. I figure that all out at
the end.) I just would have prefered not to have to do that -- currently, as I'm prototyping, my
bad code sometimes crashes. And when it does, I wind up with 2, 3, or a dozen view names created
and not destroyed. And I -would- be so smart as to use pointers as unique numbers ... so doing a
bunch of
drop view temp_view_query_33579248;
statements ... gets a little old. (And magically, from one try to the next, my query objects always
seem to get assigned the same memory addresses -- which means I almost always get "view already
exists" and can't procrastinate deleting them.)
But it still would have been handy to auto-rollback those views, particularly when this goes into
production and connections could fail for any reason, leaving the database with random view names
that might collide -- and the client application might not be able to reconnect to delete them.
I'll switch to using a generator for the unique names, to avoid the collisions, but I can't avoid
polluting the namespace over time. And it's a pain to clean up.
--Integrity--
And the view definitions cannont possibly affect database integrity, right? No other transaction
could use them until I committed, and views can't let me do anything to the database that I
shouldn't ... which means ... temporary views should be safe? Normally I would entirely agree about
metadata changes: I'm surprised Firebird lets me make any metadata changes at all without kicking
absolutely everyone offline. It's like changing the rules of the game while you're playing. Except
here, I'm talking about temporary views, which no other user can see, and which can't allow me to
circumvent integrity checks. Which means I'm not changing the rules, I'm just giving myself handy
shortcuts.
Does that answer the question?
(But then, I also ask for things like concurrent sub-transactions, and nobody believes they're a
good idea either ...)
-Philip
---------- Original Message -----------
From: Helen Borrie <helebor@...>
question has been answered -- no, views are not visible until you start a new transaction. So I'll
just live with it.
Note: it's 3 in the morning, and I'm a bad communicator on a good day.
--Background--
I have a user who's complained about not being able to run her own queries, right? Okay, so first
off, I'm making a query builder for her. Second, we have a lot of queries coming up for various
healthcare-related grants that involve things like "people who match all of the following
conditions" or "people who match two of the following three conditions", etc. Now, if we just had a
few, here and there, I wouldn't mind doing it by hand. But when you've got dozens, and the specs
change constantly, you want an easy way to do things like "none of the following", "all of the
following", etc. on re-usable queries -- it's a time-saver even for programmers.
So I've got it built. My first attempt involved pulling all the data from the database, for each
bit of the query, and doing in-memory joins of id's, pushing stuff back out to the database to do
joins (as the "IN" clause can't take more than 1500 elements, and OR isn't terribly efficient, and
it's nasty regardless, doing an indexed join seemed smart) ... it worked, technically. I think the
dead rabbit I saw on the road is probably faster. Enormous memory usage, lots of network traffic,
generally ... just ... bad. (And BCB5's memory management problems weren't helping. I swear I know
how to use the STL efficiently.)
--How it now works--
So I redesigned it (over the weekend) so each query element would return the name of a view that
expressed its result. It could be a view directly to the data (no cache) or it could be to a
temporary table that contains cached data in the form of id values. (query_id + id). Regardless,
everything uses view names to express stuff. Those names then get piped into components ("all
of", "none of", "some of", "any of", as well as changing-table elements such as "people who have
diseases" -- which returns people id's, but receives diagnosis record id's) and ... so forth.
And it has a really ugly background color.
The point is, this has the potential to be really quite efficient, considering how general-purpose
it is. I can let Firebird optimize statements, so long as I keep everything as the result of views.
If I force things into temp tables, I tie its hands ... but that can't always be safely avoided.
Things like "any of" can be expressed as the union of all the views, "none of" takes a "whole" view
and left-joins to all the other views, and checks to make sure none match (this is a bit slow, my
test case takes about 10 seconds -- if anyone has a better way to do a "NONE" query, I'd be very
interested), "all of" does inner joins between the first view (doesn't matter which) and the rest,
and "some of" either does one of the above, or winds up using a temp table and doing a group by,
counting how many of the views a particular record matched, returning only those id's that match
the correct number ...
(Example of "some of" query:)
People that
- Match 1 to 2 of the following queries:
- Is 65 or older
- Has diabetes
- Lives in our area
- Has had at least two procedures that
- All of:
- Is one of the following codes [.......]
- Any of:
- Was performed this year
- Was performed more than three years ago
(don't ask me why I'd run that -- and yes, they can be very deeply nested ... which is how I found
out that Windows or the VCL has some sort of max-depth on re-parenting controls inside each other,
somewhere around 48.)
I can post a screen-shot if anybody cares, but I doubt it's relevant.
--Views rather than pure select--
Since I'm mostly doing joins, and firebird doesn't yet let me put selects directly in the FROM
clause, and since I'd like to avoid (as much as possible -- I've tried it, and it -usually- slows
me down) selecting each query into the temp table (indexed) and then always joining against the
temp table ... views were just a natural solution? Yes, you could probably rig something where only
one sql statement was built, with every query object touching it, but ... I'm just not smart enough
for that.
(The base class of the query components is capable of selecting the view into a temp table and then
turning around and returning a view to the cached data transparently, though I currently have no
good way to decide whether or not this should be done.)
I have it committing right after creating the views, and it works. Each component takes care of
cleaning up any data it put in temp tables, and all the view names (and names of views they use)
are inserted into a graph so their dependencies can be figured out later. (The query system avoids
recreating views it already has, so there may be multiple dependencies. I figure that all out at
the end.) I just would have prefered not to have to do that -- currently, as I'm prototyping, my
bad code sometimes crashes. And when it does, I wind up with 2, 3, or a dozen view names created
and not destroyed. And I -would- be so smart as to use pointers as unique numbers ... so doing a
bunch of
drop view temp_view_query_33579248;
statements ... gets a little old. (And magically, from one try to the next, my query objects always
seem to get assigned the same memory addresses -- which means I almost always get "view already
exists" and can't procrastinate deleting them.)
But it still would have been handy to auto-rollback those views, particularly when this goes into
production and connections could fail for any reason, leaving the database with random view names
that might collide -- and the client application might not be able to reconnect to delete them.
I'll switch to using a generator for the unique names, to avoid the collisions, but I can't avoid
polluting the namespace over time. And it's a pain to clean up.
--Integrity--
And the view definitions cannont possibly affect database integrity, right? No other transaction
could use them until I committed, and views can't let me do anything to the database that I
shouldn't ... which means ... temporary views should be safe? Normally I would entirely agree about
metadata changes: I'm surprised Firebird lets me make any metadata changes at all without kicking
absolutely everyone offline. It's like changing the rules of the game while you're playing. Except
here, I'm talking about temporary views, which no other user can see, and which can't allow me to
circumvent integrity checks. Which means I'm not changing the rules, I'm just giving myself handy
shortcuts.
Does that answer the question?
(But then, I also ask for things like concurrent sub-transactions, and nobody believes they're a
good idea either ...)
-Philip
---------- Original Message -----------
From: Helen Borrie <helebor@...>
>------- End of Original Message -------
> So, like others, I'm really curious to understand why you think it's
> important to be able to create views on the fly...
>
> ./heLen