Subject Re: Ideas anyone?
Author Adam

--- In, "tomconlon7777777"
<Tomc@2...> wrote:
> Hi Adam,
> Thanks for the comprehensive feedback, especially as we've come up
> against similar challenges.
> We just could not see any way to perform certain queries solely (and
> adequately) using SQL with or without sp's etc, etc. - hence the need
> for techniques and solutions we are discussing.
> Your method is interesting and could possibly be the way for us to go,
> especially as:
> 1. New requirements mean different sorts of sources for the query.
> 2. Ability to issue parts of the query in threads.
> 3. Start returning results to user quicker.
> 4. Allow (what appears to the user) to be a query 'cancellation'.
> Q's:
> - Why did you go the memory dataset route rather than internal list
> (TList, TStringlist, etc)?

I would like to say it is because the TList implementations of sort and
locate would be quite expensive for keeping the list unique, while
kbmMemTable has very efficient FindKey and RecordCount operations.
That might be true, but the truth is that when the original function
was built, the input data we had was already in memory tables, so it
was the logical way to do it.

There is nothing stopping you from implementing the exact same thing
inside a TList derivative, but you will want to do some testing to
make sure it delivers adequate performance.
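For illustration only, here is a minimal sketch (in Python, not Delphi) of the sorted-unique-list idea a TList derivative would need: keep the values sorted and use a binary search, playing the role of kbmMemTable's FindKey, to reject duplicates before inserting. The class and method names are invented for the example.

```python
import bisect

class SortedUniqueList:
    """Sorted list that rejects duplicates via binary search (FindKey-style)."""

    def __init__(self):
        self.items = []

    def find_key(self, value):
        """Return True if value is already present; one O(log n) probe."""
        i = bisect.bisect_left(self.items, value)
        return i < len(self.items) and self.items[i] == value

    def add(self, value):
        """Insert value only if absent, keeping the list sorted."""
        if not self.find_key(value):
            bisect.insort(self.items, value)

ids = SortedUniqueList()
for v in [7, 3, 7, 1, 3, 9]:
    ids.add(v)
# ids.items is now [1, 3, 7, 9]
```

The point of the test you would run is whether the O(log n) probe plus the O(n) insert beats a dump-then-sort-then-dedupe pass for your typical list sizes.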

> > The other options I have seen and attempted is to insert the records
> > into a holding table with the current transactions ID, then to join
> to
> > that table. The downside to that approach is the number of inserts
> per
> > second you can practically achieve.
> This is the method we are currently using. Generally it performed
> well, but nine months ago two new 'master table' columns also started
> generating IN() criteria that hit the system. Also, one of the sites
> is remote, using a 2k link with increasing numbers of users (30+) at
> that site and this is also stressing the system. Lastly, it puts the
> processing on the server (where it ought to belong!) and may be the
> easiest method for a generic solution (that is also reusable).

I believe the insert speed will be somewhere around 1000 records per
second, give or take, depending on hardware. Add a slow network
connection and it will not be pretty. If your condition was based on,
say, 1000 records, the query is going to take at least one second
longer before it can even run.
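The holding-table technique under discussion can be sketched as follows, with Python's built-in sqlite3 standing in for the real server; the table and column names are invented for the example. Members are inserted tagged with the current transaction ID, and the query joins to them instead of carrying a huge IN() list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE holding (txn_id INTEGER, member_id INTEGER)")
conn.executemany("INSERT INTO master VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 11)])

# Insert the condition members tagged with the current transaction ID.
# This is the per-query insert cost the text is talking about: one row
# per member, so 1000 members means roughly 1000 inserts up front.
txn_id = 42
members = [2, 5, 9]
conn.executemany("INSERT INTO holding VALUES (?, ?)",
                 [(txn_id, m) for m in members])

# The actual query then joins to the holding table instead of using IN().
rows = conn.execute(
    "SELECT m.id, m.name FROM master m "
    "JOIN holding h ON h.member_id = m.id "
    "WHERE h.txn_id = ? ORDER BY m.id", (txn_id,)).fetchall()
# rows -> [(2, 'row2'), (5, 'row5'), (9, 'row9')]
```

A batched insert (executemany or a prepared statement) keeps the per-row cost down, but the round trips still dominate on a slow link, which is the complaint above.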

> > 50 constants is nothing :)
> Ok. Seeing the index used 50 times is just ever-so-little-bit
> disconcerting (!) plus in actual fact, a user-defined area/territory
> feature is sometimes generating maybe in the hundreds.
> > To save you some time, here are some important things you should
> know
> > about the IN predicate.
> >
> > One index hit per member:
> Q's:
> - Is this behaviour changed in any of the future versions of FB?
> (2.0, 3.0?)
> - Is this behaviour usual?
> (i.e do other RDBMS's have the same challenge to overcome)

I don't know, but I imagine it is the same. Unless you are not planning
to release until after those versions ship, that is how it works now.

> >
> > 64K query size
> >
> > From memory this will kick in somewhere between 12000 and 13000
> > members in the IN. (and remember that is a lot of unnecessary hits
> on
> > the index).
> > This is how we solved this problem...
> > We have a function along the lines
> > DatasetToConditionClause(ds : TDataset; ...
> Q's
> - Are you very happy with your solution?

Yes, it is sort of a best of both worlds: the simplicity of the "IN"
approach but the performance of a better-constructed condition clause.

> - level of performance achieved?

We did not do it primarily for performance reasons but rather as a
workaround for the 1499-member limit. One of the slowest links in the
chain will be the network comms, and I can assure you that the solution
generates a smaller SQL statement for any realistic input.

If you had 1000 members in the IN predicate, you are looking at a
statement that is at least a few KB larger than the equivalent set of
BETWEEN clauses.
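To make the idea concrete, here is a sketch of what a DatasetToConditionClause-style function does; this is Python for illustration, not the actual Delphi implementation, and the function name and empty-set fallback are my own. It collapses sorted unique IDs into consecutive runs and emits BETWEEN for each run, with a plain equality for singletons.

```python
def ids_to_condition(field, ids):
    """Collapse integer ids into BETWEEN ranges for a WHERE clause.

    Sketch of the idea only; the real DatasetToConditionClause works
    on a TDataset and is written in Delphi.
    """
    ids = sorted(set(ids))
    if not ids:
        return "1=0"  # empty member set matches nothing
    runs, start, prev = [], ids[0], ids[0]
    for v in ids[1:]:
        if v == prev + 1:
            prev = v          # extend the current consecutive run
        else:
            runs.append((start, prev))
            start = prev = v  # begin a new run
    runs.append((start, prev))
    parts = [f"{field} = {a}" if a == b else f"{field} BETWEEN {a} AND {b}"
             for a, b in runs]
    return "(" + " OR ".join(parts) + ")"

print(ids_to_condition("ID", [1, 2, 3, 7, 10, 11]))
# (ID BETWEEN 1 AND 3 OR ID = 7 OR ID BETWEEN 10 AND 11)
```

Each run costs one index range scan instead of one index hit per member, and the generated text is far shorter whenever the IDs cluster, which is why it stays under the 64K statement limit.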

I will do some measurements when I am next in front of my development
machine.
> - aprox. amount of data being queried?

It varies from one or two records to over ten thousand. Again, we would
never have bothered with such an elaborate solution if it wasn't used
over 50 times within our application in various capacities.