Subject | Re: [Firebird-Architect] ORs in Where statement |
---|---|
Author | Jim Starkey |
Post date | 2003-06-13T16:01:08Z |
We have been discussing two questions here. The first concerns the order and
completeness of sub-expression evaluation in the boolean operators AND and or.
The second is whether or not it is permissible for a database product to
support
two or more mutually incompatible sets of semantics. I will address each.
Boolean expressions are used at least three places within the Firebird data
manipulation language. The first use is record filtering -- restricting sets
of records. The second use is in an IF statement to control execution
flow control. The final use is in a validation statement to determine the
validity of a prospective record update.
I hope we can agree that the semantics of the three usages should be
consistent. If you disagree, please raise your hand now so we can settle
this quickly.
The argument has been made a boolean sub-expression may have a beneficial
effect, say logging, and incomplete evaluation may lead to inconsistent
results. Unfortunately, his argument precludes optimization. Firebird, like
every SQL database, performs index retrievals to pre-select records before
applying a boolean filter. If it were necessary to guarantee consistent
evaluation of a boolean expression, it would not be possible to perform
index operations, it would be necessary to generate the full cross product
during join evaluation, and performance of the relational model becomes
untenable. Specification of boolean sub-expression semantics is
incompatible with record retrieval optimization.
If it is not possible to specify sub-expression evaluation order for
record filters, it is unreasonable and inconsistent to specify them
for execution flow control and record validation.
This brings us to the second question, should a database system
support a number of mutually incompatible semantics? There are
a number of ways to get into this sort of trouble. The most
straightforward is that users sometime build dependencies on non-
architectural artifacts of implementation that change from time
to time. Another is the that SQL specification itself has changed.
The most difficult to handle is an incompatible change necessary
to bring a product into compliance with the published standard.
In each case it's tempting to build in a "compatibility mode".
And in each case, the temptation should be resisted, or if
absolutely necessary, bounded. Branching semantics within a
product in unmaintainable. If a compatibility mode is unavoidable,
the following rules should apply:
1. The default must be the architecturally defined behavior. It
is not permissible to have settable build, installation, or
configuration options. A program operating over an API must
see a consistent environment to operation.
2. New features should never, ever, but added to compatibility
modes. Compatibility modes exist for one reason only, which
is backwards capability. Compatibility mode should be seen
as a bridge and not as an option.
3. A compatibility mode should have a clear sunset, at most a major
version. If someone wants to upgrade, they must accept the
responsibility to bring their application in compliance.
4. A compatibility modes don't branch. The purpose of a compatibility
mode is to mimic the behavior of a system at a specific point in
time, not to chose among a menu of options.
5. Compatibility modes should be limited to changes of explicit
specification. If a user depended on an non-architectural
artifact, he shouldn't have done that, and it's his problem.
It's hard, but it's happened a hundred thousand times in the
short history of computing, and that's the way it has to be.
The current Firebird implementation of complete evaluation of boolean
sub-expressions is the result of an error of reasoning; it does not
provide the benefit intended. It should be eliminated from product
and not retained as a build, installation, or configuration option.
Jim Starkey
completeness of sub-expression evaluation in the boolean operators AND and or.
The second is whether or not it is permissible for a database product to
support
two or more mutually incompatible sets of semantics. I will address each.
Boolean expressions are used at least three places within the Firebird data
manipulation language. The first use is record filtering -- restricting sets
of records. The second use is in an IF statement to control execution
flow control. The final use is in a validation statement to determine the
validity of a prospective record update.
I hope we can agree that the semantics of the three usages should be
consistent. If you disagree, please raise your hand now so we can settle
this quickly.
The argument has been made a boolean sub-expression may have a beneficial
effect, say logging, and incomplete evaluation may lead to inconsistent
results. Unfortunately, his argument precludes optimization. Firebird, like
every SQL database, performs index retrievals to pre-select records before
applying a boolean filter. If it were necessary to guarantee consistent
evaluation of a boolean expression, it would not be possible to perform
index operations, it would be necessary to generate the full cross product
during join evaluation, and performance of the relational model becomes
untenable. Specification of boolean sub-expression semantics is
incompatible with record retrieval optimization.
If it is not possible to specify sub-expression evaluation order for
record filters, it is unreasonable and inconsistent to specify them
for execution flow control and record validation.
This brings us to the second question, should a database system
support a number of mutually incompatible semantics? There are
a number of ways to get into this sort of trouble. The most
straightforward is that users sometime build dependencies on non-
architectural artifacts of implementation that change from time
to time. Another is the that SQL specification itself has changed.
The most difficult to handle is an incompatible change necessary
to bring a product into compliance with the published standard.
In each case it's tempting to build in a "compatibility mode".
And in each case, the temptation should be resisted, or if
absolutely necessary, bounded. Branching semantics within a
product in unmaintainable. If a compatibility mode is unavoidable,
the following rules should apply:
1. The default must be the architecturally defined behavior. It
is not permissible to have settable build, installation, or
configuration options. A program operating over an API must
see a consistent environment to operation.
2. New features should never, ever, but added to compatibility
modes. Compatibility modes exist for one reason only, which
is backwards capability. Compatibility mode should be seen
as a bridge and not as an option.
3. A compatibility mode should have a clear sunset, at most a major
version. If someone wants to upgrade, they must accept the
responsibility to bring their application in compliance.
4. A compatibility modes don't branch. The purpose of a compatibility
mode is to mimic the behavior of a system at a specific point in
time, not to chose among a menu of options.
5. Compatibility modes should be limited to changes of explicit
specification. If a user depended on an non-architectural
artifact, he shouldn't have done that, and it's his problem.
It's hard, but it's happened a hundred thousand times in the
short history of computing, and that's the way it has to be.
The current Firebird implementation of complete evaluation of boolean
sub-expressions is the result of an error of reasoning; it does not
provide the benefit intended. It should be eliminated from product
and not retained as a build, installation, or configuration option.
Jim Starkey