Electronic Dissertations Library
XML: the future of web markup? by Elliot Pritchard
BENEFITS AND DRAWBACKS
Is XML good news for the Web? I have identified some of the key features and implications of XML in order to determine whether each of these should be counted as a benefit or a drawback. After I have finished, a simple count-up of the verdicts should decide the answer to my question.
'Metadata' is data about data. Let us take the example of looking for a specific book in a library. The first thing that we would try to find is the library-record for that book, which would tell us where we might find it. That record is a good example of metadata: it contains details about the book such as its title, author, publisher and ISBN. XML markup is another example of metadata, since each set of tags describes the information that it contains. And such metadata can prove very useful. As Rath says, "metadata is added value to the information content itself." A big difference between our example of the library-record and XML is summed up by Freter (b) when he says that "XML markup provides metadata for all components of a document, not merely the object that contains the document itself." XML therefore has the potential to tell us far more about a document than the library-record can tell us about a book.
XML provides a way to incorporate metadata into the body of a file, and therefore it can provide us with something very important: context. Why is context important? Hogan gives us an excellent example: "1000 might be a good price, in dollars for a state-of-the-art laptop. However, it would be a very bad number of days required for delivery. As a result, the tag that puts the 1000 in context is critical." In other words, if you have context, then you can make proper comparisons: "XML allows consumers to compare apples to apples, rather than apples to oranges" (John Rosenfeld quoted in Oberndorf (1 April 1999)). One might object that this is unnecessary: when I visit a web-site it does not just throw figures at me. It tells me what the price is, and it tells me what the delivery-time is. But what about when I am searching? Any web-user knows that searching can be frustrating. Matsumura (February 1998) offers this analogy: "Siphoning information from the Web is, for some of us, an experience akin to drinking from a spewing firehouse."
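Hogan's point about the number 1000 can be made concrete with a short sketch in Python, using the standard library's `xml.etree.ElementTree`. The element names (`laptop`, `model`, `price`, `deliveryDays`) and the model name are hypothetical and chosen only for illustration. The same figure appears twice, but each element name tells a program which occurrence is a price and which is a delivery time.

```python
import xml.etree.ElementTree as ET

# Hypothetical laptop listing; the element names are illustrative only.
LISTING = """
<laptop>
  <model>Example 500</model>
  <price currency="USD">1000</price>
  <deliveryDays>1000</deliveryDays>
</laptop>
"""

def labelled_values(xml_text):
    """Map each child element's name to its text, so every value keeps its label."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

values = labelled_values(LISTING)
# Both values are the string "1000", but the tags say what each one means.
print(values["price"], "is the price")
print(values["deliveryDays"], "is the delivery time in days")
```

A keyword index would record only that the text "1000" appears on the page. The element names are the context that lets a program distinguish the two values.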
If one were to type 'laptop price 1000' into a search engine, it would retrieve every page containing those keywords. A page describing a laptop that cost 5000 dollars and had a 1000-day delivery time would be retrieved along with the rest, because the search terms lack context. Reichard tells us that "search engines work by brute force, by indexing key words found in Web documents. There's no context to these searches, and users are forced to either enter a precise search term (and let's face it, most users are not well-versed in the intricacies of Boolean searches) or wade through hundreds of thousands of Web pages." Metadata can help this situation. Rath uses the following example: "The search in Alta Vista for an SGML book written by Mr. Pepper brings more than 200 hits and none of the first twenty hits are useful...[a] search at amazon.com lists the book immediately, but here the search is done in a database with metadata and not in HTML pages." The example of 'amazon.com', an on-line bookseller, is a particularly pertinent one, as it is in the area of e-commerce that XML metadata is most likely to be used. If one wanted to buy a particular book, it would be useful to be able to find the web-site that would sell it most cheaply (be it amazon.com or another site). If these sites used an industry-specific DTD (Document Type Definition, which specifies which elements may appear in an XML document and how they may be nested), then a search engine familiar with that DTD could look for a specific title by a specific author and tell you where to find the cheapest price. Reichard points to this when he says that "by storing product information with XML data, it can be more easily categorized by a search engine on your site or on the Internet. Users will find it easier to track down your product listings, with (you hope) the end result a large increase in both sales and satisfied customers."
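The following sketch shows what such a price-comparison search might look like, assuming several shops mark up their listings with the same element names. The shared names (`book`, `title`, `author`, `price`) stand in for an agreed industry DTD. The shop addresses, the title and the prices are invented. No DTD validation is performed here, because `ElementTree` does not validate. The sketch simply relies on every listing using the same vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings from three shops that all use the same agreed element names.
# Shop addresses, titles and prices are invented for illustration.
LISTINGS = {
    "shop-a.example": "<book><title>Example Title</title><author>Pepper</author><price>24.95</price></book>",
    "shop-b.example": "<book><title>Example Title</title><author>Pepper</author><price>21.50</price></book>",
    "shop-c.example": "<book><title>Another Title</title><author>Pepper</author><price>9.99</price></book>",
}

def cheapest(listings, title, author):
    """Return (site, price) for the lowest-priced listing matching title and author, or None."""
    best = None
    for site, xml_text in listings.items():
        book = ET.fromstring(xml_text)
        # Because every shop uses <title>, <author> and <price>, the fields can be
        # compared directly instead of guessed at from free text.
        if book.findtext("title") == title and book.findtext("author") == author:
            price = float(book.findtext("price"))
            if best is None or price < best[1]:
                best = (site, price)
    return best

print(cheapest(LISTINGS, "Example Title", "Pepper"))
```

The search matches on title and author separately, so a listing by the right author but with a different title (shop C here) is correctly ignored even though it is cheaper.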
That kind of searching would be far more sophisticated, and far more successful, than the searching we have on the Web today. Glave (15 April 1998) quotes Tim Berners-Lee (creator of the Web) as suggesting that a search engine may one day be able to satisfactorily answer the query "Is there a green car for sale for $15,000 in Queensland?" All of that presupposes, of course, that an industry can agree on a specific DTD, or indeed that its members would want to. Above I quoted the suggestion that this would lead to an increase in sales, but surely that increase would go only to whoever was selling the product most cheaply. If, as Trommer tells us, "purchasers can quickly search across suppliers sites for the lowest price, rather than visiting each site separately," then many companies stand to lose out by coding their sites in XML and adhering to an industry-wide DTD. Mougayar explains that "in a Web-centric model, sellers are in control. Their goal is to generate traffic at their own Web site. They don't want you to compare products available elsewhere; you must purposefully go to another site." So the potential is great, but whether it will be embraced remains to be seen. Verdict: Benefit
As long as the tag-names (or 'element-names' in XML-speak) are clearly chosen, one might suggest that XML is much clearer, and therefore easier to implement, than HTML. But this is not the case. One of the great benefits of HTML is that it is quick and easy to write. XML, however, although not too difficult to learn, is just one piece of a larger puzzle. Someone wishing to write web pages in XML may also have to learn the associated specifications XSL, XLink and XPointer. Taken together, this package is neither quick nor easy to master. Bosak (October 1998) says as much: "the combination of XML and XSL is potentially vastly more complex and difficult to work with than today's HTML, so its use will at first be the domain of a few experts working on a large, specialized publishing applications by hand. These will be the applications that demand the highest level of automation and media independence - newspapers, business directories, encyclopaedias, commercial catalogs, television schedules, and so on." The extra sophistication that XML brings to web markup inevitably brings added complexity too, and this may well hinder its widespread use. Verdict: Drawback
Whereas HTML tags describe the appearance of information, XML tags describe the meaning of information. This does not mean that appearance is no longer important. Appearance is as crucial as ever, and this is where the stylesheet language XSL (eXtensible Stylesheet Language) comes in. XSL tells a browser how to present the information found in an XML file. Because the stylesheet is kept separate from the XML file, style and content are handled separately. Why is this important? It matters whenever the style needs to stay the same while the content changes, or indeed the other way round. Johnson gives us a good example of the former: "If the data from which the document was produced changes, the entire HTML translation needs to be redone. Web sites that show the current weather around the globe, around the clock, usually handle this automatic reformatting very well. The content and the presentation style of the document are separated, because the system designers understand that their content (the temperatures, forecasts, and so on) changes constantly." Another example might be a stylesheet that sets out a corporate document-template. Every time a new document was written and added to the company web-site, it could use the same stylesheet and therefore conform to the same format (the same-size heading etc.). The benefits here, then, are clear. But the benefits are even greater when we think of examples of the latter. Here are three:
It should be noted that stylesheets can also be used with HTML: the Cascading Style Sheets (CSS) language serves this purpose. However, this is not often done, and it is likely that XML (which requires the use of a stylesheet) will be key in encouraging the separation of style and content on the Web. Verdict: Benefit
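Johnson's weather example can be illustrated with a minimal sketch. XSL processing is not part of Python's standard library, so two plain Python functions stand in for stylesheets here. They are not XSL, and they only mimic the idea of a presentation layer kept apart from the data. The weather data and element names are hypothetical. The same data is rendered in two styles, and when the data changes, the rendering code does not.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical weather data: the content, which changes constantly.
WEATHER = """
<weather>
  <city name="London"><temp unit="C">12</temp></city>
  <city name="Sydney"><temp unit="C">24</temp></city>
</weather>
"""

def readings(xml_text):
    """Extract (city, temperature, unit) triples from the data."""
    root = ET.fromstring(xml_text)
    return [(c.get("name"), c.find("temp").text, c.find("temp").get("unit"))
            for c in root.iter("city")]

# Two stand-in "stylesheets": plain Python functions, not XSL.
def as_html(xml_text):
    """Present the readings as an HTML list."""
    items = "".join(f"<li>{escape(name)}: {escape(t)}&#176;{escape(u)}</li>"
                    for name, t, u in readings(xml_text))
    return f"<ul>{items}</ul>"

def as_text(xml_text):
    """Present the same readings as plain text, one city per line."""
    return "\n".join(f"{name.upper()} {t} {u}" for name, t, u in readings(xml_text))

print(as_html(WEATHER))
print(as_text(WEATHER))

# New content, same presentation code: only the data string changes.
updated = WEATHER.replace(">12<", ">13<")
print(as_html(updated))
```

A real XSL stylesheet would express these rules declaratively and be applied by an XSL processor. The separation is the same, though: the data never mentions presentation, and the presentation code never hard-codes the data.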
Currently, web-browsers will read even broken HTML code. The same will not be true of XML. Judge and Ogg (28 November 1997) explain why this might be the case: "The XML committee apparently had a joint memo, strongly worded, from both Microsoft and Netscape, demanding that they make XML handle errors strictly - that is XML processors should reject bad XML." It is understandable that they did so. The extra programming involved in handling errors in HTML code has contributed significantly to the unwieldy file-size of web-browsers. Tolerating errors is not something that the browser-vendors want to encourage. By contrast, "XML is one of the world's easiest formats to parse, and its error-handling rules mean that you don't have to write miles of bozo-correction code" (Bray (18 December 1998)). Is this a good thing? Well, it depends on your point of view. It certainly is for the vendors: it saves them a great deal of time and effort. But what about the web-page creators? Matsumura (February 1998) tells us that "variance is not allowed, and even one error will prevent the entire document from being processed." XML, then, will not tolerate slip-ups. That should not be a problem for the professional web-designers of this world; after all, they are paid to get these things right. But a great strength of HTML was that anyone could give it a go and, more often than not, build themselves a homepage. If they accidentally forgot to close one of their tags, the page would still be displayed. The strictness of XML error-handling, however, is definitely not amateur-friendly. Verdict: Drawback
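The difference in strictness can be seen with two parsers from Python's standard library. The lenient `html.parser.HTMLParser` stands in for a forgiving browser here. It is not a browser, but like one it carries on past an unclosed tag. `xml.etree.ElementTree`, like any conforming XML parser, rejects the whole document. The sample page and the helper names `lenient_parse` and `strict_parse` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A page where the author forgot to close the <b> tag.
BROKEN = "<p>Welcome to my <b>homepage</p>"

class TextCollector(HTMLParser):
    """Collects text content; HTMLParser does not complain about unclosed tags."""
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

def lenient_parse(markup):
    """Return the text of the markup, tolerating tag errors."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text)

def strict_parse(markup):
    """Return the root element, or None if the markup is not well-formed XML."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        print("XML parser rejected the document:", err)
        return None

print(lenient_parse(BROKEN))  # the lenient parser still recovers the text
print(strict_parse(BROKEN))   # the XML parser refuses the whole document
```

A single missing `</b>` is enough for the XML parser to refuse the document outright. This is the "even one error" behaviour Matsumura describes.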
XML documents are broken up into unique 'elements'. Because each element can be individually identified, the document has a degree of granularity that an HTML document lacks. An HTML document can only be dealt with as one unit, but because XML documents are broken up into elements you can be quite flexible when handling them. For example, a stylesheet can rearrange the element structure so that an XML document is presented in a different order from the one in which it is written. There are also advantages in terms of updating. Microsoft (3 April 1998) make this point when they say that "data may be granularly updated with XML, eliminating the need to resend an entire structured data set each time a portion of the data changes. Only the changed element must be sent from the server to the client, and the changed data can be displayed without refreshing the entire user interface." Suppose, for example, that a web-site is displaying train times and you decide that you want to catch an earlier train. Currently, if you click on a link for an earlier time, the whole page has to be reloaded just to display it. If the web-site is coded in XML, only the element that contains the train time will have to be reloaded. The process will be much quicker, and you may just catch that earlier train after all. Verdict: Benefit
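The following sketch models the granular update Microsoft describes. Under hypothetical assumptions, it changes one train's departure time and serializes only that element, which is the piece a server would need to send. The timetable, its element and attribute names, and the function `update_departure` are invented for illustration. No actual client-server exchange is modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical timetable; element and attribute names are invented for illustration.
TIMETABLE = """
<timetable>
  <train id="t1"><departs>09:15</departs></train>
  <train id="t2"><departs>09:45</departs></train>
  <train id="t3"><departs>10:15</departs></train>
</timetable>
"""

def update_departure(root, train_id, new_time):
    """Change one train's departure time in place and return just that element,
    serialized, as the only piece that would need to be sent to the client."""
    train = root.find(f"train[@id='{train_id}']")
    if train is None:
        raise KeyError(train_id)
    train.find("departs").text = new_time
    # strip() removes surrounding whitespace, such as the element's tail text,
    # which ElementTree includes when serializing a single element.
    return ET.tostring(train, encoding="unicode").strip()

doc = ET.fromstring(TIMETABLE)
fragment = update_departure(doc, "t2", "09:30")
print(fragment)  # only the changed element
print(len(ET.tostring(doc, encoding="unicode")), "characters in the full document")
```

Because each `train` element can be addressed by its `id`, the update touches and transmits one element rather than the whole timetable.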
There are three 'Benefit' verdicts as opposed to only two 'Drawback' ones. Furthermore, the benefits listed are significant ones. We can therefore conclude that XML is indeed good news for the Web.
MSc in Information Management, 1998/1999. Electronic Dissertations Library © University of Sheffield, Department of Information Studies (All Rights Reserved)