Subject | Re: [firebird-support] Generating the WHERE clause |
---|---|
Author | Andy Canfield |
Post date | 2007-02-22T05:03:32Z |
I think I have it working....
These are live examples; ADAMSFARMER stores permanent information about
our farmers such as their addresses, ANNUALFARMER stores annual data
about each farmer such as this year's quota, and a CHANGWAT is a Thai
province ("state").
This is a summary of the information that the user has asked for, output
by the program itself:
For ANNUALFARMER SEASONIDENT = 2006
count distinct ANNUALFARMER FARMERIDENT
sum of ANNUALFARMER QUOTAKILO
sort by CHANGWAT ENGLISHNAME
Based on that, and a fairly detailed description of the database, the
program generates this SQL statement:
select CHANGWAT.ENGLISHNAME,
count(distinct ANNUALFARMER.FARMERIDENT),
sum(ANNUALFARMER.QUOTAKILO)
from CHANGWAT, ANNUALFARMER, ADAMSFARMER
where ANNUALFARMER.ADAMSFARMERIDENT = ADAMSFARMER.ADAMSFARMERIDENT
and ADAMSFARMER.CHANGWATIDENT = CHANGWAT.CHANGWATIDENT
and ANNUALFARMER.SEASONIDENT=2006
group by CHANGWAT.ENGLISHNAME
order by CHANGWAT.ENGLISHNAME
The selected fields, group by, and order clauses by were of course
trivial. The tricky part was getting the software to search the database
description to discover that ANNUALFARMER and CHANGWAT can be linked
together through the intermediate table ADAMSFARMER.
It is done by building a network of tables and links between tables,
expanding that network until the network is fully connected, then
discarding any tables that are not necessary to maintain that
connection.
No, sorry, the tricky part was doing it in PHP 4. I understand that PHP
5 has much better support for objects.
I did not use JOIN in this initial construction because FROM and WHERE
are independent of the order of the tables, but JOIN requires the tables
to be in sequence by the connections. Now that I have the easier task I
can look into the more difficult JOIN version.
I don't use parameterized queries because I very much <REFWORD2>
statements which are not complete in themselves but have critical
<REFWORD1> elsewhere. IMHO a parameterized query is only useful if it
will be used at least <REFWORD3> times.
<REFWORD1=chunks>
<REFWORD2=dislike>
<REFWORD3=two>
The whole purpose of this exercise is to separate the structure of the
database from the information that the manager must provide. For years I
have gotten management requests for special one-off reports; this query
capability is being created so that management can do many of those
themselves. The structure of the database does not change; the structure
of the query changes every time the manager sits down at the keyboard.
He doesn't want to fit new parameters into an old question, he wants to
ask a new question. The structure of a query is not static if the
question changes in every session.
Indeed, if the user interface is done well then it's easier for him to
ask a question as if it were new than it would be for him to find the
old question in a list. The current code does not even have the
capability of storing a query and re-using it later. If there is demand
I can add that feature. But every feature adds complexity, which
irritates managers trying to use it.
The user interface appears to be unstructured because in my humble
experience managers' minds are unstructured.
Yes, there are large classes of SQL queries which this code cannot
generate. I know that. I can expand it in some ways. I started with
single-table inquries, expanded it to directly connected multi-table
inquries, and have now gotten to connectable multi-table inquiries. But
without natural language parsing it is difficult to get a complex
question into a form that the program can understand, let alone map that
data structure into an SQL statement. And we don't want to accept
queries that are so complex that the manager himsself doesn't understand
what he's asking for. (;>)=
Thank you for your input.
These are live examples; ADAMSFARMER stores permanent information about
our farmers such as their addresses, ANNUALFARMER stores annual data
about each farmer such as this year's quota, and a CHANGWAT is a Thai
province ("state").
This is a summary of the information that the user has asked for, output
by the program itself:
For ANNUALFARMER SEASONIDENT = 2006
count distinct ANNUALFARMER FARMERIDENT
sum of ANNUALFARMER QUOTAKILO
sort by CHANGWAT ENGLISHNAME
Based on that, and a fairly detailed description of the database, the
program generates this SQL statement:
select CHANGWAT.ENGLISHNAME,
count(distinct ANNUALFARMER.FARMERIDENT),
sum(ANNUALFARMER.QUOTAKILO)
from CHANGWAT, ANNUALFARMER, ADAMSFARMER
where ANNUALFARMER.ADAMSFARMERIDENT = ADAMSFARMER.ADAMSFARMERIDENT
and ADAMSFARMER.CHANGWATIDENT = CHANGWAT.CHANGWATIDENT
and ANNUALFARMER.SEASONIDENT=2006
group by CHANGWAT.ENGLISHNAME
order by CHANGWAT.ENGLISHNAME
The selected fields, group by, and order clauses by were of course
trivial. The tricky part was getting the software to search the database
description to discover that ANNUALFARMER and CHANGWAT can be linked
together through the intermediate table ADAMSFARMER.
It is done by building a network of tables and links between tables,
expanding that network until the network is fully connected, then
discarding any tables that are not necessary to maintain that
connection.
No, sorry, the tricky part was doing it in PHP 4. I understand that PHP
5 has much better support for objects.
I did not use JOIN in this initial construction because FROM and WHERE
are independent of the order of the tables, but JOIN requires the tables
to be in sequence by the connections. Now that I have the easier task I
can look into the more difficult JOIN version.
I don't use parameterized queries because I very much <REFWORD2>
statements which are not complete in themselves but have critical
<REFWORD1> elsewhere. IMHO a parameterized query is only useful if it
will be used at least <REFWORD3> times.
<REFWORD1=chunks>
<REFWORD2=dislike>
<REFWORD3=two>
The whole purpose of this exercise is to separate the structure of the
database from the information that the manager must provide. For years I
have gotten management requests for special one-off reports; this query
capability is being created so that management can do many of those
themselves. The structure of the database does not change; the structure
of the query changes every time the manager sits down at the keyboard.
He doesn't want to fit new parameters into an old question, he wants to
ask a new question. The structure of a query is not static if the
question changes in every session.
Indeed, if the user interface is done well then it's easier for him to
ask a question as if it were new than it would be for him to find the
old question in a list. The current code does not even have the
capability of storing a query and re-using it later. If there is demand
I can add that feature. But every feature adds complexity, which
irritates managers trying to use it.
The user interface appears to be unstructured because in my humble
experience managers' minds are unstructured.
Yes, there are large classes of SQL queries which this code cannot
generate. I know that. I can expand it in some ways. I started with
single-table inquries, expanded it to directly connected multi-table
inquries, and have now gotten to connectable multi-table inquiries. But
without natural language parsing it is difficult to get a complex
question into a form that the program can understand, let alone map that
data structure into an SQL statement. And we don't want to accept
queries that are so complex that the manager himsself doesn't understand
what he's asking for. (;>)=
Thank you for your input.
On Tue, 2007-01-30 at 15:34 +1100, Helen Borrie wrote:
>
> Well, for a start, don't use WHERE clauses to specify JOIN
> criteria. That is very ancient SQL that has been deprecated since
> about 1992! Use explicit JOIN syntax and use your WHERE clause only
> for your search predicates, which you should parameterise.
>
> Secondly, try to understand parameterised queries.
>
> Thirdly, review your concept of ad hoc queries to separate the
> *structure* of the set from the values of the search criteria. By
> this means, your user can decide separately the data he wants to
> retrieve and how to order and aggregate it (structure) and the
> criteria for selecting it (search predicates).
>
> Fourthly, and relevant to your example here, is the need to
> understand the rules for aggregation.
>
> With these in your armory, your technique for building the queries
> becomes mechanical and (compared to your totally unstructured
> approach) simple.
>
> The structural side of a query specification is static; the
> selection criteria are defined structurally but parameterising them
> makes them dynamically. You can repeatedly run the same query with
> different values in the search parameters without the need to rebuild
> the query or re-prepare it.
>
> Here's what your example might look like, once you are ready to
> prepare it, and assuming that you're just looking for a total by
> STATE:
>
> SELECT
> STATE.NAME,
> SUM( EMPLOYEE.INCOME )
> FROM state
> JOIN PERSON
> on person.statecode = state.statecode
> JOIN EMPOYEE
> on employee.name = person.name
> WHERE <search clause to restrict the rows considered>
> GROUP BY STATE.NAME
> ORDER BY STATE.NAME
>
> For a given DDL the WHERE clause might have to be
> WHERE EMPLOYEE.NAME = PERSON.NAME
> AND PERSON.STATECODE = STATE.STATECODE
>
> (although one would certainly hope the user had better normalisation
> to work with in a real database!!)
>
> >Can you give me a URL of something that will help me?
>
> A t.y. book on standard SQL and retrieval algorithms? such as "SQL
> Puzzles" (I think it's by Joe Celko). Or just Google for "SQL
> algorithms" and you might well come across some useful tutorials on-
> line.
>
> ./heLen