Subject Modelling XML data in a Firebird database
Author Matthias Hanft
Hello,

I'm getting a daily electronic account statement from my bank (format
'MT940' for those of you who deal with that kind of data, too). Amongst
others, the account statement includes all money transfers from my
customers which I write into a Firebird database - each money transfer
information consists of just a few well-defined data fields which can
easily mapped to an SQL data row (including fields like name/account
number/bank code of sender, amount, reason for payment etc.).

While the new SEPA bank standard is currently introduced, a bank
customer can switch his electronic statement from 'MT940' to
'camt.053' format. 'camt.053' is a hierarchical XML file, where
many, many, many fields are defined (from which about 10% are
currently used, though). Depending on which bank you are
using and from which other bank the money transfer from your
customer comes, you can't foresee at all which possible fields
are used and which are not.

There *is* some way to convert some camt.053 fields into the
old MT940 fields (and, consequently, keep the old FB database
"as is"), but you'd loose some important information then -
so you don't really want to strip the new format down to the
old one.

Thinking about a new database scheme for camt.053, the following
comes to my mind:

a) provide database fields for *all* defined XML fields (which
would lead to 90% NULL fields and make the whole database
somewhat unmanageable)
b) provide just database fields for already used XML fields and
subsequently add new fields when banks start to use new XML
fields (which would require slight database and client changes
each time)
c) create a unique ID for each money transfer and store an open-
end "key"-"value" pair for that event (with the event ID as
foreign key), separating hierarchical field names e.g. by a
slash ("Document/Stmt/Ntry/BookgDt" for the "booking date of
the entry of the statement of the document"). Would be pretty
universal (and future-proof), but could just be implemented
with universal string data types as values, even if the value
is an amount of money or a date (so it's hard to search events
'between' some dates or get all money transfers above a certain
amount or something like that)
d) store the whole XML account statement "as-is" in a text blob;
retains *all* information without any loss (and would be future-
proof as well), but can't be searched at all (and might be slow,
and complicated to handle)
e) any other idea?

What do you suggest, thinking about flexibility, speed, and space?

For the moment, I tend to model c), maybe extending a key-value row
to "key-integervalue-stringvalue-datevalue-currencyvalue" and just
filling the approprate value field, leaving the others NULL. And/or
using an additional "type" field to access the specific value field
directly. (On the other hand, from the key name, it should be possible
to deduce the type automatically - this would have to be done while
filling the database anyway.)

Thank you for your advice - and for reading until here :-)

-Matt