Subject | Re: [firebird-support] (Repost) Problem with SUM() on a computed column |
---|---|
Author | Helen Borrie |
Post date | 2003-07-18T11:39:36Z |
At 03:26 AM 18/07/2003 +0000, you wrote:
as good evidence that the advice NOT to incorporate dependencies on other
tables into the definitions of computed columns is SOUND and should be heeded.
Here is where the immediate problem lies:
ALTER TABLE X
ADD AMOUNT COMPUTED BY
((SELECT (SELECT COALESCE(SUM(AMOUNT), 0)
FROM Y
WHERE DATETIME = X.DATETIME AND
ISRETURNED = 0) -
(SELECT COALESCE(SUM(AMOUNT), 0)
FROM Y
WHERE DATETIME = X.DATETIME AND
ISRETURNED <> 0)
FROM RDB$DATABASE));
The search clause WHERE DATETIME = X.DATETIME is, I think,
problemmatical. This is a "sort of" correlated subquery that "sort of"
works in the column definition because it happens to be in the context of
the current row. In this sense I think it is a bug in the implementation
of COMPUTED BY, insofar as the parser fails to behave consistently with the
ambiguity restrictions introduced in Firebird 1. The parser should have
rejected that expression, but it failed to do so. (I know, because I ran
your script here on 1.4 RC).
However, when you move away from the row-level context of the computed
field and into the domain of the aggregation of all rows in the table, it
stops "sort of" working. You are getting a result which looks like a
cross-join, i.e unpredictable results. From a bug report submitted to
fb-devel today, regarding the failure of CASE constructs to aggregate
correctly, it's probable that these new conditional functions will need
some tidying.
Going back to the principle at issue here, I urge you to reconsider your
design with regard to maintaining this computed column. It's not the first
time I have seen unpredictable results from computed columns having
dependencies on other tables. Firebird SQL has much more robust alternatives.
If you are *lucky*, the worst that happens is that part of your data
actually get structurally corrupted. Then, you will have to repair your
database (possibly at considerable expense) and you will be forced to fix
up the design.
Let's suppose that structural corruption doesn't occur and you decide to
ignore it, because it "sort of" works. Among the Really Bad Things will be
that the engine happily stores hard data from your applications, based on
this unpredictable calculation, and the internal integrity of your data is
wrecked without physically damaging any structures. How will you fix that?
No. 1 Rule for me is "Don't store data that you can calculate for an output
set". I can relax the rule if I a computed column can be used to reprocess
some non-nullable hard data stored elsewhere on the same row to provide a
convenient shortcut. Sometimes, it is convenient to output a string or
some calculated value, that you don't need to index, for use in a report or
display somewhere. Something like
ALTER TABLE X
ADD BIRTH_YEAR
COMPUTED BY (EXTRACT (YEAR FROM BIRTHDATE)),
ADD BIRTHDAY
COMPUTED BY(
CAST(EXTRACT(MONTH FROM BIRTHDATE) AS CHAR2) ||'/]||
CAST(EXTRACT(DAY FROM BIRTHDATE) AS CHAR2));
I won't store summary values as computed columns. Even in regular columns
I won't do it unless it is absolutely unavoidable (and, yes, there are
still some disgustingly awful reporting packages out there that can't
handle aggregations...for these I would still prefer to make a view or a SP
to generate the output sets, in preference to storing summary data...).
Earlier this week I was watching a thread where someone was struggling to
use computed columns with other-table dependencies as a way to denormalise
his entire database - trying to make a Firebird database into a bunch of
spreadsheets. He was strongly advised by several people to reconsider this
design strategy, but it seems he's really serious about it! He actually
put in a feature request to have gbak "fixed" to enable him to make such a
monstrosity restorable.
Keep your stored data abstract. Don't fear joins and correlated
subqueries. They are the arteries of a well-designed RDBMS.
My 0.02c
heLen
>Hi all,Yes, I think so, and I don't think there is a way to make it work. Take it
>
>I have a master (X) and a detail (Y) table.
>Table X contains a timestamp and the sum of
>the amounts of its details. Table Y on the
>other hand, contains detail information such
>as description, quantity, unit price, amount
>and the ISRETURNED column, which indicates
>whether this item is returned (1) or not (0).
>All amounts are positive values.
>
>Now, the AMOUNT fiels of table X is computed
>by subtracting all detail records where
>ISRETURNED = 1 from detail records having
>ISRETURNED = 0.
>
>Using the script at the bottom of this message,
>I executed the ff. query:
>
>SELECT * FROM X;
>
>and yielded...
>
>1/1/2003 3323.75
>2/1/2003 1441.75
>3/1/2003 379
>
>...which is correct. But when I queried the
>SUM() of the AMOUNT column:
>
>SELECT SUM(AMOUNT) FROM X;
>
>I got 9971.25 instead of 5144.5.
>
>Computed fields with simple SUM() selects are
>OK, only selects like the AMOUNT column in
>table X.
>
>Anything wrong with my script?
as good evidence that the advice NOT to incorporate dependencies on other
tables into the definitions of computed columns is SOUND and should be heeded.
Here is where the immediate problem lies:
ALTER TABLE X
ADD AMOUNT COMPUTED BY
((SELECT (SELECT COALESCE(SUM(AMOUNT), 0)
FROM Y
WHERE DATETIME = X.DATETIME AND
ISRETURNED = 0) -
(SELECT COALESCE(SUM(AMOUNT), 0)
FROM Y
WHERE DATETIME = X.DATETIME AND
ISRETURNED <> 0)
FROM RDB$DATABASE));
The search clause WHERE DATETIME = X.DATETIME is, I think,
problemmatical. This is a "sort of" correlated subquery that "sort of"
works in the column definition because it happens to be in the context of
the current row. In this sense I think it is a bug in the implementation
of COMPUTED BY, insofar as the parser fails to behave consistently with the
ambiguity restrictions introduced in Firebird 1. The parser should have
rejected that expression, but it failed to do so. (I know, because I ran
your script here on 1.4 RC).
However, when you move away from the row-level context of the computed
field and into the domain of the aggregation of all rows in the table, it
stops "sort of" working. You are getting a result which looks like a
cross-join, i.e unpredictable results. From a bug report submitted to
fb-devel today, regarding the failure of CASE constructs to aggregate
correctly, it's probable that these new conditional functions will need
some tidying.
Going back to the principle at issue here, I urge you to reconsider your
design with regard to maintaining this computed column. It's not the first
time I have seen unpredictable results from computed columns having
dependencies on other tables. Firebird SQL has much more robust alternatives.
If you are *lucky*, the worst that happens is that part of your data
actually get structurally corrupted. Then, you will have to repair your
database (possibly at considerable expense) and you will be forced to fix
up the design.
Let's suppose that structural corruption doesn't occur and you decide to
ignore it, because it "sort of" works. Among the Really Bad Things will be
that the engine happily stores hard data from your applications, based on
this unpredictable calculation, and the internal integrity of your data is
wrecked without physically damaging any structures. How will you fix that?
No. 1 Rule for me is "Don't store data that you can calculate for an output
set". I can relax the rule if I a computed column can be used to reprocess
some non-nullable hard data stored elsewhere on the same row to provide a
convenient shortcut. Sometimes, it is convenient to output a string or
some calculated value, that you don't need to index, for use in a report or
display somewhere. Something like
ALTER TABLE X
ADD BIRTH_YEAR
COMPUTED BY (EXTRACT (YEAR FROM BIRTHDATE)),
ADD BIRTHDAY
COMPUTED BY(
CAST(EXTRACT(MONTH FROM BIRTHDATE) AS CHAR2) ||'/]||
CAST(EXTRACT(DAY FROM BIRTHDATE) AS CHAR2));
I won't store summary values as computed columns. Even in regular columns
I won't do it unless it is absolutely unavoidable (and, yes, there are
still some disgustingly awful reporting packages out there that can't
handle aggregations...for these I would still prefer to make a view or a SP
to generate the output sets, in preference to storing summary data...).
Earlier this week I was watching a thread where someone was struggling to
use computed columns with other-table dependencies as a way to denormalise
his entire database - trying to make a Firebird database into a bunch of
spreadsheets. He was strongly advised by several people to reconsider this
design strategy, but it seems he's really serious about it! He actually
put in a feature request to have gbak "fixed" to enable him to make such a
monstrosity restorable.
Keep your stored data abstract. Don't fear joins and correlated
subqueries. They are the arteries of a well-designed RDBMS.
My 0.02c
heLen