Subject | Re: IBM moves the database goalposts - xml related |
---|---|
Author | Roman Rokytskyy |
Post date | 2004-12-09T22:07:27Z |
> Like what? XML does nothing that a relational DBMS can't do.
OO does nothing that a relational DBMS can't. Do you claim that the OO
concept should be deprecated?
> XPath/XQuery is nothing but patches to something that
> should be used for data exchange. Not storage.
Wrong. XML introduces a [new?] data model and provides appropriate
methods for its manipulation.
>> Don't think about relational data. XML is used to store structured and
>> semi-structured data of a different nature.
> The same with relational databases , you can throw anything at it (a
> whole filesistem for example, movies, ..etc)
Not really. In the relational model you can have only data that conforms
to one schema. In XML you can merge two schemas together without
additional effort and the result would still be a valid XML document.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <title>SuperBook</title>
  <chapter>
    <header>Chapter 1.</header>
    <para>
      ....
    </para>
  </chapter>
  ....
</document>
Now your database is filled with "instances" of that XML file.
Now assume that a manager comes to you and tells you that in your
database of documents he wants to add some additional semi-structured
information to some instances, though he cannot define which
information will be added to which instance. That additional
information must be queryable and accessible with a standard query
language.
How are you going to solve that in the relational model? Define a
generic schema? Extend your schema for each case? Design an additional
database?
In XML you define an additional namespace. Now you can have the
following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="urn::document">
  <auth:author xmlns:auth="urn::author">
    <auth:firstName>John</auth:firstName>
    <auth:lastName>Doe</auth:lastName>
  </auth:author>
  <title>SuperBook</title>
  <chapter>
    <header>Chapter 1.</header>
    <para>
      ....
    </para>
  </chapter>
  ....
</document>
And you can add information about the author to one document and
reviews to another one, or you can add both. An application that adds
this information does not need to understand the original data schema,
but only its own. All applications that worked before with the previous
document structure will continue to work with it: they simply do not
see the additional information. New applications can handle that
additional information together with the main one, or they can just
process the information they understand. And all additional information
is available with the "old" query language; no changes/extensions are
needed.
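
The point above can be sketched in Python with the standard library's
xml.etree.ElementTree. This is only an illustration under stated
assumptions: both versions of the document are assumed to declare the
default namespace urn::document (the first example above declares none),
and the function names old_app_titles and new_app_authors are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used by the queries below.
NS = {"doc": "urn::document", "auth": "urn::author"}

# Assumption: the original document already uses the default namespace.
ORIGINAL = """<document xmlns="urn::document">
  <title>SuperBook</title>
  <chapter><header>Chapter 1.</header><para>....</para></chapter>
</document>"""

# Same document, extended with author data from a second namespace.
EXTENDED = """<document xmlns="urn::document">
  <auth:author xmlns:auth="urn::author">
    <auth:firstName>John</auth:firstName>
    <auth:lastName>Doe</auth:lastName>
  </auth:author>
  <title>SuperBook</title>
  <chapter><header>Chapter 1.</header><para>....</para></chapter>
</document>"""


def old_app_titles(xml_text):
    """An 'old' application: it only knows the document namespace."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("doc:title", NS)]


def new_app_authors(xml_text):
    """A 'new' application: it only knows the author namespace."""
    root = ET.fromstring(xml_text)
    return [
        (a.findtext("auth:firstName", namespaces=NS),
         a.findtext("auth:lastName", namespaces=NS))
        for a in root.findall("auth:author", NS)
    ]


if __name__ == "__main__":
    print(old_app_titles(ORIGINAL))
    print(old_app_titles(EXTENDED))
    print(new_app_authors(EXTENDED))
```

The old query returns the same titles for both documents, while the new
query finds the author only where one was added.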
Where do you need this?
For example, your email client. If each email is an XML file, its
header and body are represented by a standard structure. However,
currently attachments in emails are represented by a "universal"
structure that contains only the name of the file, its MIME type and
the base64-encoded binary content. Now assume that each attachment is
represented by an appropriate XML structure from an appropriate
namespace. Word files would have a document title, keywords, etc.
Images would have their size in pixels, resolution, etc. Executables
would have a signature. You can add other emails as attachments. And all
this information is available with the query language:
//email[attachment/format='application/msword' and
contains(attachment/word:keywords, 'Firebird')]
will select all emails that have attachments in Word format whose
keywords contain "Firebird". And no MS Word document parser is
needed. (To run this query, the "word" prefix has to be bound to a
namespace; contains() itself is a standard XPath 1.0 function, and
most XPath/XQuery implementations also allow extending the set of
available functions.)
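
A rough Python approximation of this query, using only the standard
library. ElementTree supports just a subset of XPath (no "and", no
function calls), so the filtering is done in Python instead. The mailbox
layout, the id attribute, the urn::word namespace URI and the function
name are all assumptions made up for this sketch. Unlike the XPath
above, it requires both conditions to hold for the same attachment.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for Word-specific attachment metadata.
NS = {"word": "urn::word"}

# Hypothetical mailbox; real email XML would look different.
MAILBOX = """<mailbox xmlns:word="urn::word">
  <email id="1">
    <attachment>
      <format>application/msword</format>
      <word:keywords>Firebird, release</word:keywords>
    </attachment>
  </email>
  <email id="2">
    <attachment>
      <format>image/png</format>
    </attachment>
  </email>
  <email id="3">
    <attachment>
      <format>application/msword</format>
      <word:keywords>Oracle</word:keywords>
    </attachment>
  </email>
</mailbox>"""


def emails_with_word_keyword(xml_text, keyword):
    """Return ids of emails with a Word attachment whose keywords contain keyword."""
    root = ET.fromstring(xml_text)
    hits = []
    for email in root.iter("email"):
        for att in email.findall("attachment"):
            is_word = att.findtext("format") == "application/msword"
            keywords = att.findtext("word:keywords", default="", namespaces=NS)
            if is_word and keyword in keywords:
                hits.append(email.get("id"))
                break  # one matching attachment is enough
    return hits


if __name__ == "__main__":
    print(emails_with_word_keyword(MAILBOX, "Firebird"))
```

No Word parser is involved: the keywords are plain XML data sitting next
to the attachment.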
How are you going to implement this in the relational model without
designing a schema that would handle all possible cases? What will
you do with your data model and application if emails with some new
attachment type have to be processed?
Some things that are natural in XML are completely unnatural in the
relational model. For example, the relation between document and author
has to be implemented by introducing synthetic keys. If you have two
authors, in the relational model you have to add a position field. In
XML it is just there.
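
To make the comparison concrete, here is a small Python sketch that
uses sqlite3 for the relational side and ElementTree for the XML side.
The table and column names, the second author "Jane Roe" and both
function names are invented for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET


def authors_relational():
    """Author order needs synthetic keys plus an explicit position column."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE document_author (
            document_id INTEGER REFERENCES document(id),
            author_id INTEGER REFERENCES author(id),
            position INTEGER
        );
        INSERT INTO document VALUES (1, 'SuperBook');
        INSERT INTO author VALUES (10, 'John Doe');
        INSERT INTO author VALUES (11, 'Jane Roe');
        -- Rows are inserted out of order; only 'position' keeps the order.
        INSERT INTO document_author VALUES (1, 11, 2);
        INSERT INTO document_author VALUES (1, 10, 1);
    """)
    rows = con.execute(
        "SELECT a.name FROM document_author da "
        "JOIN author a ON a.id = da.author_id "
        "WHERE da.document_id = 1 ORDER BY da.position"
    ).fetchall()
    con.close()
    return [name for (name,) in rows]


def authors_xml():
    """In XML the order is simply the document order of the elements."""
    root = ET.fromstring(
        "<document>"
        "<author>John Doe</author>"
        "<author>Jane Roe</author>"
        "<title>SuperBook</title>"
        "</document>"
    )
    return [a.text for a in root.findall("author")]


if __name__ == "__main__":
    print(authors_relational())
    print(authors_xml())
```

Both functions return the authors in the same order, but only the
relational version needs extra keys and a position column to get there.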
You can argue that this is bad, not performant, etc. But nobody argues
that XML should replace relational databases. Just compare apples to
apples, oranges to oranges. XML is a completely different data model.
When it is used the same way the relational model is used, it is
extremely inconvenient, slow and resource-expensive. But when it is
used appropriately, it is much more convenient than SQL.
Also, Marius and Martijn, your replies that XML is only an
information exchange format suggest that you simply do not understand
the applicability of XML. That is similar to a situation where I
would claim that triggers, SPs and referential integrity are not needed,
since the J2EE container-managed persistence specification (also the JDO
specification, Hibernate, etc.) does not support them.
Roman