Subject: Re: How can I skip an Insert ?
Author: Adam
Post date: 2005-08-14T06:25:05Z
Hi Stefan,
Something about this post was troubling me, and it has just clicked.
If I understand you correctly, your logic is something like the
following: before inserting, say, ID 5, you run
select 1
from tableA
where ID = 5
Then if 1 is not returned, you assume it is safe to insert.
insert into TableA (id, field1, field2) values (5, 6, 7);
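In client code, that check-then-insert pattern looks something like the sketch below. It is written in Python against SQLite purely for illustration (the post is about Firebird); the table and values mirror the post's example.

```python
import sqlite3

# The check-then-insert pattern described above -- the pattern the rest
# of this post argues against. SQLite stands in for Firebird here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table TableA (id integer primary key, field1 integer, field2 integer)"
)

def insert_if_absent(conn, id_, field1, field2):
    # Step 1: the existence check. It runs inside this connection's
    # transaction, so it can only see committed data plus our own changes.
    row = conn.execute("select 1 from TableA where id = ?", (id_,)).fetchone()
    if row is not None:
        return False  # assume it is not safe to insert
    # Step 2: the insert. Another transaction may already have inserted
    # the same id without committing -- the check above cannot see that.
    conn.execute(
        "insert into TableA (id, field1, field2) values (?, ?, ?)",
        (id_, field1, field2),
    )
    return True

print(insert_if_absent(conn, 5, 6, 7))  # True: inserted
print(insert_if_absent(conn, 5, 6, 7))  # False: the check found the row
```

With a single connection this appears to work, which is exactly what makes the pattern deceptive.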
This does not work because the query is run from within the context of
a transaction. That means the following could happen:
Start Tx1
Start Tx2
Tx2: insert ID 5
Tx1: check if ID 5 exists --> no it doesn't because Tx2 hasn't committed
Tx1: insert ID 5
Until Tx2 commits, no other transaction even knows that value exists,
so how can it accurately decide the value is allowed to be inserted?
In this case, assuming there is a PK on the ID field, there would be a
PK violation when Tx1 attempted to insert, even though there is no
record that Tx1 can see for that ID. If you only ever had one
transaction, you might get away with such logic, but even then there is
a second problem: the performance penalty.
When Tx1 runs
select 1
from tableA
where ID = 5
The optimiser notices that there is an index on that field with
perfect selectivity, so it uses it. It identifies the entry for ID 5,
but because your transaction may not be able to see that record, it
then needs to locate the record on the data page. On visiting the data
page, it realises that the record belongs to an uncommitted transaction
that you are not allowed to see, so it returns no rows.
no records. So Firebird has just run a useless query, wasting its
resources and your time because it made the "wrong" conclusion.
Because of this, Firebird cannot take your word that a value is
unique. If the field is declared unique, Firebird needs to convince
itself that it is.
This is where a unique constraint differs. Internally, it is handled
as an index. Imagine the following scenario:
Start Tx1
Start Tx2
Tx2: insert ID 5
Tx1: insert ID 5
This time Tx1 does not even try to run a query to check the existence
of the record. Because there is a unique constraint on ID (implicit
because it is the primary key), Firebird will look at the index on
that field and, from what I can gather, should be able to immediately
judge whether there is a matching entry. It may still need to visit
the data page to check that the record hasn't been deleted by a
committed transaction, but in any case you are not doing the same work
twice.
If there is, an exception is raised on the insert statement. From your
client program, you simply check what the exception is (PK Violation
in this case), and then decide on the behaviour.
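A minimal sketch of that attempt-and-handle approach, once more in Python with SQLite for illustration: under sqlite3 a primary-key violation surfaces as sqlite3.IntegrityError, whereas with a Firebird client library you would inspect the error for the key-violation code instead, but the shape of the client code is the same, with no prior existence check.

```python
import sqlite3

# Attempt the insert and let the unique constraint do the checking.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table TableA (id integer primary key, field1 integer, field2 integer)"
)

def insert_or_skip(conn, id_, field1, field2):
    try:
        conn.execute(
            "insert into TableA (id, field1, field2) values (?, ?, ?)",
            (id_, field1, field2),
        )
        return True   # inserted
    except sqlite3.IntegrityError:
        return False  # row already existed; decide on the behaviour here

print(insert_or_skip(conn, 5, 6, 7))  # True: inserted
print(insert_or_skip(conn, 5, 8, 9))  # False: PK violation caught, insert skipped
```

Only one round of work happens per call: the engine consults the unique index while inserting, and the client merely interprets the outcome.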
> I too was wondering whether I should make this exist test at all. My
> train of thought: The database is doing exactly the same test before
> it inserts the row anyway so I'm doing redundant work and thus waste
> time. However, I disliked the idea to always try to add a new entry
> and catch the error if it already exists. Usually raised errors are
> quite time consuming and in general I believed so far this is a bad
> programming habit.

To the contrary, unhandled exceptions are bad programming habits.
Exceptions allow you to clearly write out what your program should be
doing, and to deal separately with the unusual (exceptional) cases.
That's not to say you should deliberately run a query you know will
give you a PK violation etc. A good DBMS should be able to maintain
"consistency", so IMO ideally it should not be up to the programmer to
remember what valid data looks like. If you let them store it in your
table, it must be valid. If it is not valid, do not let them store it
in your table. It makes for far fewer bugs when UIs do not respond so
well to "strange" data.
Adam