Electronic Dissertations Library

XML: the future of web markup?, by Elliot Pritchard

SGML > XML

The evolution of Standard Generalized Markup Language (SGML) can be traced back to work done at IBM in the late 1960s. IBM developed a predecessor called Generalized Markup Language (GML) for its internal publishing. The language introduced 'tags' to describe to typists how a document should appear. For example, tags could specify which part of a document should be underlined. This formed the basis of the markup languages in use to the present day. In 1978 the American National Standards Institute (ANSI) created its first version of SGML as an attempt to standardize the way of defining and using markup in documents. Work continued on this language until 1986, when it became an ISO (International Organization for Standardization, based in Geneva, Switzerland) standard. SGML is best described as a metalanguage: a way to define and enact specific 'document types' that describe the structure and content of electronic documents.

When Tim Berners-Lee wanted to create a language to describe the presentation of information on a new network known as the World Wide Web, he looked to SGML. HTML, the language he created, is one specific document type, one specific application, of SGML that fashions information for presentation on the Web. When people started to become disillusioned with HTML, they looked again at SGML. 'Why use one specific document type,' they asked, 'when SGML holds the potential for so much more?' Straightforward use of SGML on the Web was unlikely to happen, however, due to its great complexity. When the World Wide Web Consortium (W3C) created XML, it created it as a subset of SGML, leaving out complicated or unnecessary parts specifically in order to make it usable over the Web. For example, ETHOS tell us that "XML allows no use of the markup minimization features that allow an SGML document to omit those parts of the markup that can be implied from the document type definition." It is cuts like this that allow the formal definition of XML to take up 33 printed pages, as opposed to the approximately 500 pages used to define SGML: a staggering difference.
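
The effect of dropping markup minimization can be illustrated with a short sketch. It is not taken from any of the sources cited here; it assumes Python's standard xml.etree.ElementTree parser as a stand-in for a generic XML processor, and the two sample fragments and the helper name is_well_formed are invented for illustration. The first fragment omits its list-item end tags, as an SGML application such as HTML permits; the second writes every end tag out in full, as XML requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment relying on SGML-style tag omission:
# the </li> end tags are left for the parser to imply.
minimized = "<ul><li>first<li>second</ul>"

# The same content with every end tag written out explicitly.
explicit = "<ul><li>first</li><li>second</li></ul>"


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(minimized))  # False: the omitted end tags are not implied
print(is_well_formed(explicit))   # True
```

An XML parser never consults a document type definition to guess at missing tags, which is one reason the XML specification can be so much shorter than that of SGML.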

It was not just the problems people were experiencing with HTML that made them look again at SGML; it was also the vested interest that many had in it. SGML had been in use for over ten years, and Walsh (18 February 1998) says that "most of the people involved in the XML effort come from organizations that have a large, in some cases staggering, amount of material in SGML. XML was designed pragmatically, to be compatible with existing standards such as SGML." Because of the heritage of XML, companies with SGML material now find that their information is much more 'web-compatible' than in the past. Bremser (5 October 1998) suggests that "for many companies already experienced with SGML, adopting XML DTDs may be primarily a matter of converting legacy SGML DTDs to XML."

Not only has XML meant a shrewd simplification of SGML for the Web, it has meant a shrewd re-naming. Denison (2 October 1997) quotes Tim Bray (one of the editors of the XML 1.0 Specification) as saying: "There were several acronyms that we considered. I believe there was MGML, for Minimal Generalized Markup Language, and something called SIMPL for Simple Internet Markup Protocol, or something like that. Eventually we voted, and XML - for Extensible Markup Language - won out. It was short and sweet, and people liked it." Some may think the name inconsequential, but in fact it plays a very important part in marketing this new standard. If the name were of no importance, the standard could equally have been called 'SGML-on-the-Web'. But it was not, and Judge and Ogg (28 November 1997) have a good idea why: "Where SGML is a 'square' thing for document handlers, XML is a 'hip' thing for Web monkeys. The XML people, most of whom are old SGML hands, are blinking to discover that third year computer science students are writing XML products - SGML never had that kind of street cred." Kimber agrees that XML avoids the stigma of the SGML name, and thereby has become "hip, happening, now." As a marketing strategy, it has worked incredibly well. Although XML consists of principles that precede the creation of HTML, it has the reputation of 'new kid on the block' and has brought about a fresh wave of excitement about web markup. As Zelnick (30 March 1998) cynically puts it: "Appropriating XML is a way to get the attention of reporters deluged with press releases... Why do SGML vendors call themselves XML companies today?"


XML: the future of web markup?,
MSc in Information Management, 1998/1999
© University of Sheffield - Department of Information Studies (All Rights Reserved)