Electronic Dissertations Library

XML: the future of web markup?, by Elliot Pritchard

BENEFITS AND DRAWBACKS

Is XML good news for the Web? I have identified some of the key features and implications of XML and, for each one, I decide whether it should be counted as a benefit or a drawback. A simple tally of the verdicts should then answer my question.


CONTEXT

'Metadata' is data about data. Take the example of looking for a specific book in a library. The first thing we would try to find is the library-record for that book, which tells us where we might find it. That record is a good example of metadata: it contains details about the book such as title, author, publisher, and ISBN. XML markup is another example of metadata, because each pair of tags describes the information it contains. Such metadata can prove very useful. As Rath says, "metadata is added value to the information content itself." A big difference between the library-record and XML is summed up by Freter (b) when he says that "XML markup provides metadata for all components of a document, not merely the object that contains the document itself." In other words, XML has the potential to tell us far more about a document than a library-record can tell us about a book.
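To make the contrast concrete, here is the library-record from the example above written as XML. This is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the element names (`Book`, `Title`, `Author`, `Publisher`, `ISBN`) and all of the values are invented for illustration and do not come from any real catalogue standard. It simply shows that every component of the record carries its own descriptive tag, which a program can read back.

```python
import xml.etree.ElementTree as ET

# A hypothetical library-record marked up in XML. Each component of the
# record (title, author, publisher, ISBN) is wrapped in a descriptive tag.
RECORD = """<Book>
  <Title>An Example Title</Title>
  <Author>A. N. Author</Author>
  <Publisher>Example Press</Publisher>
  <ISBN>0-000-00000-0</ISBN>
</Book>"""


def describe(record_xml):
    """Return (tag, text) pairs: the metadata attached to each component."""
    root = ET.fromstring(record_xml)
    return [(child.tag, child.text) for child in root]


for tag, text in describe(RECORD):
    print(f"{tag}: {text}")
```

The tag names here are chosen to be self-explanatory; XML itself places no restriction on what they are called.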

XML provides a way to incorporate metadata into the body of a file, and therefore it can provide us with something very important: context. Why is context important? Hogan gives us an excellent example: "1000 might be a good price, in dollars for a state-of-the-art laptop. However, it would be a very bad number of days required for delivery. As a result, the tag that puts the 1000 in context is critical." In other words, if you have context, then you can make proper comparisons: "XML allows consumers to compare apples to apples, rather than apples to oranges" (John Rosenfeld quoted in Oberndorf (1 April 1999)). But surely this is unnecessary: when I visit a web-site, it does not just throw figures at me. It tells me what the price is, and it tells me what the delivery-time is. But what about when I am searching?
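Hogan's point can be sketched in a few lines of Python (standard library only). The offer below is hypothetical, as are the element names `Price` and `DeliveryDays` and the limits passed to `assess`; the sketch only illustrates that, once the markup is stripped away, the two figures are indistinguishable, whereas the tags tell a program which limit to apply to which number.

```python
import xml.etree.ElementTree as ET

# A hypothetical offer in which the number 1000 appears twice. Stripped of
# its markup the text is just "1000 1000"; the tags supply the context.
OFFER = ("<Offer>"
         "<Price currency='USD'>1000</Price>"
         "<DeliveryDays>1000</DeliveryDays>"
         "</Offer>")


def plain_text(xml):
    """What is left if the markup is thrown away."""
    return " ".join(ET.fromstring(xml).itertext())


def assess(xml, max_price=1500, max_days=14):
    """Judge each figure against the limit its tag calls for (limits are illustrative)."""
    offer = ET.fromstring(xml)
    price = int(offer.findtext("Price"))
    days = int(offer.findtext("DeliveryDays"))
    return {"price_ok": price <= max_price, "delivery_ok": days <= max_days}


print(plain_text(OFFER))  # 1000 1000
print(assess(OFFER))      # {'price_ok': True, 'delivery_ok': False}
```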

Any web-user knows that searching can be frustrating. Matsumura (February 1998) offers this analogy: "Siphoning information from the Web is, for some of us, an experience akin to drinking from a spewing firehouse." If one were to type 'laptop price 1000' into a search engine, it would retrieve every page containing those keywords. A page describing a laptop that cost 5000 dollars and had a 1000-day delivery time would be retrieved just the same, because the search terms lack context. Reichard tells us that "search engines work by brute force, by indexing key words found in Web documents. There's no context to these searches, and users are forced to either enter a precise search term (and let's face it, most users are not well-versed in the intricacies of Boolean searches) or wade through hundreds of thousands of Web pages." Metadata can help this situation. Rath uses the following example: "The search in Alta Vista for an SGML book written by Mr. Pepper brings more than 200 hits and none of the first twenty hits are useful...[a] search at amazon.com lists the book immediately, but here the search is done in a database with metadata and not in HTML pages."
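The difference between brute-force keyword matching and a search that knows the context can be sketched as follows. This is a hypothetical Python illustration, not a description of how any real search engine works: the two "pages", their element names and the helper functions `keyword_search` and `element_search` are invented for the example. Both pages contain the words "laptop" and "1000", but only one of them offers a laptop priced at 1000.

```python
import xml.etree.ElementTree as ET

# Two hypothetical product pages. Both mention "Laptop" and "1000", but only
# page-a offers a laptop whose price is 1000 dollars.
PAGES = {
    "page-a": "<Product><Name>Laptop</Name><Price>1000</Price>"
              "<DeliveryDays>3</DeliveryDays></Product>",
    "page-b": "<Product><Name>Laptop</Name><Price>5000</Price>"
              "<DeliveryDays>1000</DeliveryDays></Product>",
}


def keyword_search(pages, *terms):
    """Brute force: a page matches if every term appears among its words."""
    hits = []
    for name, xml in pages.items():
        words = set(" ".join(ET.fromstring(xml).itertext()).lower().split())
        if all(t.lower() in words for t in terms):
            hits.append(name)
    return sorted(hits)


def element_search(pages, tag, value):
    """With context: the value must appear inside the named element."""
    return sorted(name for name, xml in pages.items()
                  if ET.fromstring(xml).findtext(tag) == value)


print(keyword_search(PAGES, "laptop", "1000"))  # ['page-a', 'page-b']
print(element_search(PAGES, "Price", "1000"))   # ['page-a']
```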

The example of 'amazon.com', an on-line bookseller, is a particularly pertinent one, as it is in the area of e-commerce that XML metadata is most likely to be used. If one wanted to buy a particular book, it would be useful to be able to find the web-site that would sell it most cheaply (be it amazon.com or another site). If there were an industry-specific DTD (Document Type Definition, which specifies the elements and attributes an XML document may use and how they may be nested) which these sites used, then a search engine familiar with that DTD could search for a specific title by a specific author and tell you where to find the cheapest price. Reichard indicates as much when he says that "by storing product information with XML data, it can be more easily categorized by a search engine on your site or on the Internet. Users will find it easier to track down your product listings, with (you hope) the end result a large increase in both sales and satisfied customers." That kind of searching would be far more sophisticated, and far more likely to succeed, than what we have today on the Web. Glave (15 April 1998) quotes Tim Berners-Lee (creator of the Web) as suggesting that a search engine may one day be able to satisfactorily answer the query "Is there a green car for sale for $15,000 in Queensland?"
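Assuming, purely for illustration, that booksellers had agreed on a shared DTD along the lines shown in the comment below, a search tool could compare prices across sites with very little code. The DTD, the seller names and the prices are all hypothetical. Python's `xml.etree.ElementTree` does not validate documents against a DTD, so the sketch simply assumes that every listing conforms, and it ignores currency conversion.

```python
import xml.etree.ElementTree as ET

# A hypothetical industry-wide DTD that all booksellers agree to follow:
#   <!ELEMENT BookOffer (Title, Author, Price)>
#   <!ELEMENT Title  (#PCDATA)>
#   <!ELEMENT Author (#PCDATA)>
#   <!ELEMENT Price  (#PCDATA)>
#   <!ATTLIST Price currency CDATA #REQUIRED>
# ElementTree does not validate against a DTD; conformance is assumed.


def listing(title, author, price):
    """Build one offer in the shared (hypothetical) vocabulary."""
    return (f"<BookOffer><Title>{title}</Title><Author>{author}</Author>"
            f"<Price currency='USD'>{price}</Price></BookOffer>")


# Offers as three hypothetical sellers might publish them.
LISTINGS = {
    "seller-a.example": listing("An Example Title", "A. N. Author", "42.00"),
    "seller-b.example": listing("An Example Title", "A. N. Author", "38.50"),
    "seller-c.example": listing("Another Title", "A. N. Author", "20.00"),
}


def cheapest(listings, title, author):
    """Return (price, site) for the lowest offer on this title, or None."""
    offers = []
    for site, xml in listings.items():
        offer = ET.fromstring(xml)
        if offer.findtext("Title") == title and offer.findtext("Author") == author:
            # float is adequate for a comparison sketch; all prices assumed USD
            offers.append((float(offer.findtext("Price")), site))
    return min(offers) if offers else None


print(cheapest(LISTINGS, "An Example Title", "A. N. Author"))
# (38.5, 'seller-b.example')
```

Because every seller uses the same element names, the search needs no site-specific code; that shared vocabulary is exactly what an agreed DTD would provide.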

All of that presupposes, of course, that an industry can agree on a specific DTD, or indeed that its members would want to. Above I quoted the suggestion that this would lead to an increase in sales, but surely that would only be true for whoever was selling the product more cheaply than anyone else. If, as Trommer tells us, "purchasers can quickly search across suppliers sites for the lowest price, rather than visiting each site separately," then many companies stand to lose out by coding their sites in XML and adhering to an industry-wide DTD. Mougayar explains that "in a Web-centric model, sellers are in control. Their goal is to generate traffic at their own Web site. They don't want you to compare products available elsewhere; you must purposefully go to another site." So, the potential is great, but whether it will be embraced remains to be seen.

Verdict: Benefit


COMPLEXITY


Anyone who knows how to write HTML will not be daunted when faced with a block of XML code: it uses the same familiar left and right angle brackets (< and >) to enclose tags. One could argue that, to someone unfamiliar with markup, XML will at first glance be much more understandable because, unlike HTML with its fixed set of terse tags, XML lets authors choose their own tag-names, and these tend to be natural-language words. To give an example, when a newcomer to markup is confronted with the HTML tag <hr>, it is unlikely to be intuitively apparent to them what it signifies. They would probably need to be told, or to see the effect that it has when viewed through a browser. However, if they were to see an XML tag such as <TelephoneNumber>, it would probably be fairly obvious to them that the text it encloses is a telephone number (a concept that it is safe to say most people are familiar with). It is this that Bosak and Bray (May 1999) are pointing out when they say: "Unlike most computer data formats, XML markup also makes sense to humans, because it consists of nothing more than ordinary text."

As long as the chosen tag-names (or 'element-names' in XML-speak) are clear, one might conclude that XML is much clearer, and therefore easier to implement, than HTML. But this does not follow. One of the great benefits of HTML is that it is quick and easy to write. XML, however, although not too difficult to learn, is just one piece of a larger puzzle. Someone wishing to write web pages in XML may have to learn the associated specifications XSL, XLink, and XPointer as well, and taken together this package is neither quick nor easy to master. Bosak (October 1998) tells us as much when he says that "the combination of XML and XSL is potentially vastly more complex and difficult to work with than today's HTML, so its use will at first be the domain of a few experts working on a large, specialized publishing applications by hand. These will be the applications that demand the highest level of automation and media independence - newspapers, business directories, encyclopaedias, commercial catalogs, television schedules, and so on." The extra sophistication that XML brings to web markup inevitably brings added complexity, and this may well hinder its widespread use.

Verdict: Drawback


SEPARATING STYLE FROM CONTENT

Whereas HTML tags largely describe the appearance of information, XML tags describe its meaning. But this does not mean that appearance is no longer important. Appearance is as crucial as ever, and this is where the stylesheet language XSL (eXtensible Stylesheet Language) comes in. An XSL stylesheet tells a browser how to present the information found in an XML file. Because the stylesheet is kept apart from the XML file, dealing with style and dealing with content have been separated.

Why is this important? It matters whenever the style needs to stay the same while the content changes, or the other way round. Johnson gives us a good example of the former: "If the data from which the document was produced changes, the entire HTML translation needs to be redone. Web sites that show the current weather around the globe, around the clock, usually handle this automatic reformatting very well. The content and the presentation style of the document are separated, because the system designers understand that their content (the temperatures, forecasts, and so on) changes constantly." Another example might be a stylesheet which sets out a corporate document-template. Every time a new document was written and added to the company web-site, it could use the same stylesheet and therefore conform to the same format (the same heading sizes, and so on). The benefits here are clear. But the benefits are even greater when we think of examples of the latter. Here are three:

  • Personalised web-sites. Sharpe tells us that with XML "the site generation can be done off-line or on demand, the latter creating personalized web pages in response to user requests."
  • Web-lite. "On the hardware front, one of the applications being developed in XML is a 'lite' version of the Web designed for use on a small screen on a cellular telephone. This document type allows for simple navigation (for example, 'Press one for FAQ, Press two for Company Info') using the standard telephone interface, as well as a way to effectively translate the text into an audible speech format, if necessary" (Matsumura (February 1998)). A small version of the Web such as this would be far less feasible if it meant creating and maintaining two different versions of every web-site. With XML it would simply be a case of applying a different stylesheet to the same content.
  • Accessibility. In the scramble to make web-sites as visually appealing as possible, many designers have forgotten that their web-sites may not be accessible to all. The technology used by visually-impaired people to browse the web, for example, is often unable to cope with frames. Separating the style from the content means that they will still be able to get at the content. Bosak and Bray (May 1999) support this idea when they say that "people with visual disabilities gain a free benefit from this approach to publishing. Stylesheets will let them render XML into Braille or audible speech."
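Applying different stylesheets to the same content can be sketched as follows. Python's standard library has no XSL processor, so two ordinary functions stand in for two stylesheets here; in practice an XSL processor would typically do this work. The company document, its element names and both output formats are hypothetical. The point is only that the content is stored once and each "stylesheet" decides how it is presented: as HTML for a desktop browser, or as a short spoken-style menu for a telephone.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: a hypothetical company page with no presentation details.
CONTENT = """<Company>
  <Name>Example Ltd</Name>
  <FAQ>We are open from nine to five.</FAQ>
  <Info>Example Ltd makes example products.</Info>
</Company>"""


def to_html(root):
    """Stand-in for a desktop stylesheet: headings and paragraphs."""
    name, faq, info = (escape(root.findtext(t)) for t in ("Name", "FAQ", "Info"))
    return (f"<h1>{name}</h1>"
            f"<h2>FAQ</h2><p>{faq}</p>"
            f"<h2>Company Info</h2><p>{info}</p>")


def to_phone_menu(root):
    """Stand-in for a 'Web-lite' stylesheet: a short menu suited to speech."""
    return (f"{root.findtext('Name')}. "
            "Press one for FAQ. Press two for Company Info.")


doc = ET.fromstring(CONTENT)  # the content is parsed once...
print(to_html(doc))           # ...and presented in two different ways
print(to_phone_menu(doc))
```

Adding a third presentation, say one intended for Braille output, would mean writing another "stylesheet" function while leaving `CONTENT` untouched.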

It should be noted that stylesheets can also be used with HTML: the Cascading Style Sheets (CSS) language serves this purpose. However, this is rarely done in practice, and it is likely that XML (which relies on a stylesheet for its presentation) will be key in encouraging the separation of style and content on the Web.

Verdict: Benefit


ERROR-HANDLING

Currently, web-browsers will read even broken HTML code. The same will not be true of XML. Judge and Ogg (28 November 1997) explain why this might be the case: "The XML committee apparently had a joint memo, strongly worded, from both Microsoft and Netscape, demanding that they make XML handle errors strictly - that is XML processors should reject bad XML." It is understandable that the vendors did so. The extra programming involved in handling errors in HTML code has contributed significantly to the unwieldy file-size of web-browsers, and lenient error-correction is not something that the browser-vendors want to perpetuate. As it is, "XML is one of the world's easiest formats to parse, and its error-handling rules mean that you don't have to write miles of bozo-correction code" (Bray (18 December 1998)).

Is this a good thing? Well, it depends on your point of view. It certainly is for the vendors: it saves them a great deal of time and effort. But what about the web-page creators? Matsumura (February 1998) tells us that "variance is not allowed, and even one error will prevent the entire document from being processed." XML, then, will not tolerate slip-ups. That should not be a problem for the professional web-designers of this world; after all, they are paid to get these things right. But a great strength of HTML was that anyone could give it a go and, more often than not, build themselves a homepage. And if they accidentally forgot to close one of their tags, that would not prevent the page from being displayed. The strictness of XML error-handling, however, is definitely not amateur-friendly.
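The difference in strictness is easy to demonstrate with Python's standard library, which contains both an XML parser and a lenient HTML parser. The page below is hypothetical and contains one slip: the `Heading` element is never closed. The XML parser (`xml.etree.ElementTree`, built on expat) rejects the whole document with a `ParseError`, while `html.parser.HTMLParser`, much like a forgiving browser, carries on and extracts the text. Python's `html.parser` is not a browser, so this is an analogy rather than a model of browser behaviour.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A hypothetical page with one slip: <Heading> is never closed.
BROKEN = "<Page><Heading>My homepage<Text>Welcome!</Text></Page>"
FIXED = "<Page><Heading>My homepage</Heading><Text>Welcome!</Text></Page>"


def strict_xml(markup):
    """XML rules: a document that is not well-formed is rejected outright."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return f"rejected ({err})"
    return "processed"


class TextCollector(HTMLParser):
    """A lenient parser: it records text and does not complain about tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_html(markup):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks


print(strict_xml(FIXED))     # processed
print(strict_xml(BROKEN))    # rejected, with expat's error message
print(lenient_html(BROKEN))  # ['My homepage', 'Welcome!']
```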

Verdict: Drawback


GRANULARITY

XML documents are broken up into discrete 'elements'. Because each element can be individually identified, the document has a degree of granularity that an HTML document lacks. An HTML page is, in practice, usually handled as a single unit, but because XML documents are broken up into elements you can be quite flexible when handling them. For example, stylesheets can transform the element structure so that an XML document is presented in a different order from the one in which it is written. There are also advantages in terms of updating. Microsoft (3 April 1998) describe this when they say that "data may be granularly updated with XML, eliminating the need to resend an entire structured data set each time a portion of the data changes. Only the changed element must be sent from the server to the client, and the changed data can be displayed without refreshing the entire user interface." For example, if a web-site is displaying train times for you and you decide that you want to catch an earlier train, at present clicking on a link for an earlier time means that the whole page has to be reloaded just to display it. If the web-site uses XML, only the element that contains the train time need be sent and redisplayed. The process will be much quicker, and you may just catch that earlier train after all.
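The train-times example can be sketched as follows. This is a hypothetical Python illustration: the `Journey` document, the `id` attribute and the `apply_update` helper are invented, and the network transfer from server to client is not shown. The client keeps its parsed copy of the page and, when a small fragment for the departure time arrives, replaces only the matching element rather than the whole document.

```python
import xml.etree.ElementTree as ET

# The client's copy of a hypothetical journey page.
PAGE = """<Journey>
  <Route>Station A to Station B</Route>
  <Departure id="dep">09:40</Departure>
  <Notes>No change required.</Notes>
</Journey>"""


def apply_update(root, fragment):
    """Replace only the child element whose id matches the fragment's id."""
    new = ET.fromstring(fragment)
    for i, old in enumerate(root):
        if old.get("id") == new.get("id"):
            new.tail = old.tail  # keep the original whitespace layout
            root[i] = new
            return True
    return False


page = ET.fromstring(PAGE)
# The user asks for an earlier train; only this fragment is sent.
update = '<Departure id="dep">08:55</Departure>'
apply_update(page, update)
print(page.findtext("Departure"))  # 08:55
print(page.findtext("Route"))      # Station A to Station B (untouched)
```

The design choice is simply that each updatable element carries an identifier, so the client can tell which part of its copy a fragment replaces.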

Verdict: Benefit


CONCLUSION

There are three 'Benefit' verdicts as opposed to only two 'Drawback' ones. Furthermore, the benefits listed are significant ones. We can therefore conclude that XML is indeed good news for the Web.




XML: the future of web markup?,
MSc in Information Management, 1998/1999
© University of Sheffield - Department of Information Studies (All Rights Reserved)