XML is a markup language for documents containing structured information.
Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure.
A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents.
XML can thus be treated as a complex data type, but it is not a data model. XML was designed for maximum expressive power, maximum teachability, and maximum ease of implementation. The language is not backward-compatible with existing HTML documents.
XML derives from a philosophy that data belongs to its creators and that content providers are best served by a data format that does not bind them to particular script languages, authoring tools, and delivery engines but provides a standardized, vendor-independent, level playing field upon which different authoring and delivery tools may freely compete.
A main direction of XML's development is its convergence with relational databases, a long journey that begins with SQL/XML hybridization. Starting with SQL Server 2000, Microsoft began to provide support for XML data. SQL Server 2005 significantly extended this support, adding XML data columns, XML variables, and XML indexes.
Object databases, a new software species at the time, did emerge and did popularize the notion of persistence -- that is, the capability of storing and retrieving programming-language objects without arduous translation to and from relational tables.
According to Microsoft, the forthcoming Yukon edition of SQL Server will be capable of persisting .NET objects. The first step in the long journey of SQL/XML hybridization was to publish relational data as XML.
Imagine a purchase order flowing through a business process some time in the future. It's an XML document, created with a tool such as InfoPath, carrying a mixture of core data and contextual metadata. The core data, including the item number and department code, will wind up in the columns of a relational table. The contextual metadata, which might include a threaded discussion made from comments injected by the requester, the reviewer, and the approver, will remain in document form.
"This human context is never stored in the RDBMS today," says Kingsley Idehen, CEO of Burlington, Mass.-based OpenLink. Yet it's the key to understanding how the data got there and what it means.
Once written, the purchase order is injected into a workflow orchestrated on top of a Web services network. A security service may enforce authorization policy by updating a SOAP header; a choreography service may search for sets of documents that have SOAP headers that contain the same correlation ID. These active intermediaries will need some kind of database technology to manage the XML that lives transiently in their queues, but it probably won't be a job for Oracle or DB2.
Here, a specialized XML database, such as Software AG's Tamino or Sleepycat Software's Berkeley DB XML, may be better suited to the task. They're fast and, as Mike Champion, senior R&D advisor at Software AG in Darmstadt, Germany, notes, they're built to work well with dynamic XML documents even when those documents lack the schemas the RDBMS SQL/XML mappers rely on.
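The correlation step mentioned above, in which a choreography service looks for documents whose SOAP headers carry the same correlation ID, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CorrelationId` header element, its `urn:example:workflow` namespace, and the helper names are hypothetical, and a real intermediary would work against a queue or an XML database rather than an in-memory list.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WF_NS = "urn:example:workflow"  # hypothetical namespace for the workflow header


def make_message(correlation_id, note):
    """Build a tiny SOAP envelope with a (hypothetical) CorrelationId header."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:wf="{WF_NS}">'
        f"<soap:Header><wf:CorrelationId>{correlation_id}</wf:CorrelationId></soap:Header>"
        f"<soap:Body><wf:Note>{note}</wf:Note></soap:Body>"
        "</soap:Envelope>"
    )


def correlation_id(message):
    """Read the correlation ID from the SOAP header, or None if it is absent."""
    root = ET.fromstring(message)
    return root.findtext(f"{{{SOAP_NS}}}Header/{{{WF_NS}}}CorrelationId")


def group_by_correlation(messages):
    """Group messages that share the same correlation ID."""
    groups = defaultdict(list)
    for message in messages:
        groups[correlation_id(message)].append(message)
    return dict(groups)


messages = [
    make_message("po-17", "created"),
    make_message("po-42", "created"),
    make_message("po-17", "approved"),
]
groups = group_by_correlation(messages)
```

Here `groups` maps each correlation ID to the list of envelopes that carry it, which is the set a choreography service would act on.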
During the workflow and after it has been completed, the document will be accessible to interested parties via a URL. That URL might resolve to a projection of the document -- from a hybrid SQL/XML RDBMS to an intranet Web server or a WebDAV repository such as Oracle's. Alternatively, the URL might resolve to the underlying instance of the document stored natively in the RDBMS. Either way, the state of the business process -- both core data and contextual metadata -- will be visible at all times to anyone who's interested in looking at it and is authorized to do so. What's more, both flavors of data carried in the document will be accessible to queries that reach across the enterprise, joining SQL and XML sources to create consolidated views.
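The split described for the purchase order, with core data landing in relational columns and contextual metadata staying in document form, can be sketched with Python's standard library. This is a minimal sketch, not how any particular hybrid RDBMS does it: the element names, the `orders` table, and the use of SQLite with a plain text column for the XML fragment are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element names are invented for illustration.
PURCHASE_ORDER = """
<purchaseOrder>
  <itemNumber>A-1001</itemNumber>
  <departmentCode>ENG</departmentCode>
  <discussion>
    <comment role="requester">Needed for the lab build-out.</comment>
    <comment role="reviewer">Budget confirmed.</comment>
    <comment role="approver">Approved.</comment>
  </discussion>
</purchaseOrder>
""".strip()


def store_order(conn, xml_text):
    """Put core data in columns and keep the discussion as an XML fragment."""
    root = ET.fromstring(xml_text)
    item = root.findtext("itemNumber")
    dept = root.findtext("departmentCode")
    discussion = ET.tostring(root.find("discussion"), encoding="unicode")
    conn.execute(
        "INSERT INTO orders (item_number, department_code, discussion_xml) "
        "VALUES (?, ?, ?)",
        (item, dept, discussion),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (item_number TEXT, department_code TEXT, discussion_xml TEXT)"
)
store_order(conn, PURCHASE_ORDER)

# One query returns both flavors of data: relational columns and the XML fragment.
row = conn.execute(
    "SELECT item_number, department_code, discussion_xml FROM orders"
).fetchone()
roles = [c.get("role") for c in ET.fromstring(row[2]).findall("comment")]
```

The relational columns stay queryable with ordinary SQL, while the threaded discussion survives intact and can be parsed again whenever the human context is needed.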
A major shift in the style of enterprise data management is under way, and there are huge architectural issues yet to be resolved. Oracle, not surprisingly, wants you to store everything in a centralized hybrid DBMS. IBM says it would rather enable you to federate data across a range of sources. Each strategy has merit, and most enterprises will wind up pursuing both -- in different ways, for various reasons.
While storing XML data in the database is a terrific feature, the ability to present XML data as relational data is essential for the large majority of data processing needs. This is where OPENXML enters the equation. OPENXML is a SQL Server rowset function that takes a handle to a parsed XML document and provides an in-memory rowset view of the XML data.
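The idea of exposing XML as a rowset can be illustrated outside SQL Server. The sketch below is a rough Python analogue, not OPENXML itself: in T-SQL, OPENXML works on a document handle obtained from `sp_xml_preparedocument` together with a row pattern and a column mapping, whereas here the `shred` helper, its `@attr` convention, and the sample `orders` document are hypothetical and exist only to show the shape of the transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
ORDERS_XML = """
<orders>
  <order id="1" customer="ACME"><total>120.50</total></order>
  <order id="2" customer="Globex"><total>75.00</total></order>
</orders>
""".strip()


def shred(xml_text, row_path, columns):
    """Return one dict per element matching row_path.

    columns maps a column name to either '@name' (an attribute of the row
    element) or a child element name whose text becomes the value.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(row_path):
        row = {}
        for name, source in columns.items():
            if source.startswith("@"):
                row[name] = elem.get(source[1:])
            else:
                row[name] = elem.findtext(source)
        rows.append(row)
    return rows


rows = shred(
    ORDERS_XML,
    "order",
    {"id": "@id", "customer": "@customer", "total": "total"},
)
```

Each `order` element becomes one row, and each mapped attribute or child element becomes one column. All values come back as strings; type conversion is left to the caller.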
XML specifies neither semantics nor a tag set. In fact, XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there's no predefined tag set, there can't be any preconceived semantics. All of the semantics of an XML document will be defined either by the applications that process it or by style sheets.
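A small Python sketch may make this concrete. The `recipe` vocabulary below is invented on the spot, the parser accepts it without any predefined tag set, and the meaning of `title` and `step` exists only in the hypothetical `render` function that the application supplies.

```python
import xml.etree.ElementTree as ET

# An invented vocabulary: XML itself knows nothing about recipes.
RECIPE = "<recipe><title>Pancakes</title><step>Mix</step><step>Fry</step></recipe>"


def outline(xml_text):
    """List (depth, tag) pairs; the parser needs no predefined tag set."""
    result = []

    def walk(elem, depth):
        result.append((depth, elem.tag))
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return result


def render(xml_text):
    """Application-defined semantics: a title is a heading, steps are numbered."""
    root = ET.fromstring(xml_text)
    lines = [root.findtext("title").upper()]
    lines += [f"{i}. {step.text}" for i, step in enumerate(root.findall("step"), 1)]
    return lines


structure = outline(RECIPE)
rendered = render(RECIPE)
```

`outline` sees only structure, while `render` is where the tags acquire meaning. A different application could interpret the same document in an entirely different way.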
Since XML is a key part of the Office 2007 story, the MS Office team will have a big presence at the Microsoft booth, demoing the just-released product and discussing the OpenXML format. People from the Data Programmability / XML team will also be in the booth to demonstrate some upcoming technology, including the XML Schema designer that Stan's team is developing, the LINQ to XSD technology that Ralf's team just previewed, and the underlying LINQ to XML API, shown in action, that will be released in the next version of Visual Studio. Please stop by, let us show you our stuff, and by all means let us know if you share our enthusiasm and want to help take XML technology to the next level.
When working with XML-based applications, developers often find themselves facing the requirement to generate XML-encoded data structures on the fly. PHP’s DOM API provides functions for programmatically generating a complete, well-formed XML document from scratch and saving it. This is a very promising development direction.
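The same approach, building a document node by node and then saving it, can be sketched with Python's `xml.dom.minidom`, which offers a W3C-style DOM comparable in spirit to the PHP API mentioned above. This is a Python analogue rather than PHP code, and the `catalog` and `book` vocabulary, the `build_catalog` helper, and the output file name are assumptions made for illustration.

```python
import os
import tempfile
from xml.dom import minidom


def build_catalog(books):
    """Build a DOM document from (id, title) pairs."""
    doc = minidom.Document()
    root = doc.createElement("catalog")
    doc.appendChild(root)
    for book_id, title in books:
        book = doc.createElement("book")
        book.setAttribute("id", book_id)
        title_el = doc.createElement("title")
        # Text nodes are escaped on output, so "&" becomes "&amp;" in the file.
        title_el.appendChild(doc.createTextNode(title))
        book.appendChild(title_el)
        root.appendChild(book)
    return doc


doc = build_catalog([("b1", "XML & Databases"), ("b2", "DOM Basics")])

# Save the document to a temporary directory with an XML declaration.
path = os.path.join(tempfile.mkdtemp(), "catalog.xml")
with open(path, "w", encoding="utf-8") as f:
    doc.writexml(f, addindent="  ", newl="\n", encoding="utf-8")
```

Because every element and text node goes through the DOM, the saved file is well-formed by construction, including the escaping of special characters in text content.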