Subject Re: [firebird-support] Storing Object Data
Author unordained
---------- Original Message -----------
From: "Steve Miller" <Steve_Miller@...>
> I am working on an application that has object data mapped to tables.
> The mapping was done many years ago, before the likes of Hibernate came
> along. It basically maps a class to a table. Object properties with
> multiple values get their own tables. There are other interesting
> mappings, but that's the overall concept.
>
> We have a guy that has been doing research on an alternate approach.
> He's storing the object data in an XML string. The XML string is stored
> in the only field of a table. The table is the only one in the system.
> He's reporting huge performance increases with this approach.
>
> Have any of you heard of anything like this? What do you see are the
> pros and cons of such an approach?
>
> Steve Miller
> Language Software Development
> SIL International
------- End of Original Message -------

If you never need to use the database to manipulate the data, that's fine. Treat it as an "opaque"
field -- it's text, but you don't need to know what's inside. You get a transactional storage
system for "stuff"; it's kind of a waste of a good database engine, but it's not a violation of any
rules, and it's not dangerous. Relational databases don't care if you don't use their features.

If you need to run reports or queries off of this data, find another solution. Not only will you
have trouble with speed (in terms of indexing), you'll have problems with even querying the data
properly; I suppose you could load up some UDFs that let you run XQuery against a blob field,
but ... is that really a good idea? XQuery is slow, compared to searches on a proper relational
database. It's fine enough for small documents, one at a time, but that just doesn't scale when
you're searching thousands of objects looking for a few that match a specific pattern.

My wife worked with some guys who stored their data as XML the way you indicate; they wound up
writing code to always insert both raw XML, and a few choice fields too, to make reports easier,
then complained about the effort involved in keeping fields and XML synchronized, adding new fields
to the XML (and therefore needing to back-fill relational fields for old rows), etc. They hadn't
ever really seen a database "fly" before, so they assumed that they were getting good performance
when they ran "queries" that pulled every single BLOB from the database, parsed it into a DOM, ran
a query against it, and either accepted or rejected the "row". They had nothing better to compare
it to, so they never knew what they were missing.

If you're finding that putting objects back together from the database is slow, maybe consider
whether or not you could load several objects at once, with one query, and get some benefits from
doing more at once? If the current process is something like "get an id, then fetch fields from
each table, for that id", and it's often called in a loop, look for ways to do your joins and pull
data back in datasets all sorted by the id; that way you can do control-break (master/detail)
processing to rip through the datasets in parallel and re-assemble all the objects at once,
assuming they're of similar type.

If you're looking for raw speed of saving and loading "objects" (of unknown contents), don't use
XML; it may be faster than doing a bunch of joins to put one object at a time back together from a
few dozen tables, but even faster would be a binary representation of some sort; look at java's
object serialization, for example. If not java, maybe look into CORBA, see if some of its
serialization technology could help you. XML is wasteful.

XML is intended as a data-interchange format, particularly for B2B, not a data storage format. The
whole point was that it was self-documenting, human-readable data, in a general-purpose format that
libraries would assist with: that way your programmers on each end of the connection can look at it
and have some clue how to consume and produce the stuff when talking to each other, and they don't
have to code string parsing routines by hand to do it. But that doesn't mean it's the perfect
universal data storage format for objects.

Also, if you're often saving and loading individual objects like this ... are you perhaps working
in some sort of message-queue environment, where a different product might be better suited to your
needs? Something like ActiveMQ, for example?

-Philip