XML TUTORIAL - XML DOCUMENTS


You will find that constructing an XML document is not too difficult once you familiarise yourself with the components which you can use. We will detail those components, and then give an example of a basic XML document.


THE XML DECLARATION

The XML declaration goes at the very beginning of an XML document; nothing, not even a space or a blank line, may come before it. Strictly speaking it is optional in XML 1.0, but it is good practice to include it. It is mainly a way of letting the processing software know that the file is coded in XML, and in which version. XML 1.0 is by far the most widely used version, but a later version, XML 1.1, does exist, and the declaration makes it clear which one you are using. Here is an example of an XML declaration:

<?xml version="1.0"?>

Pretty simple, then. Make sure that you include the question marks when you use this declaration.
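
If you would like to check these rules for yourself, here is a rough sketch in Python. It assumes Python 3 and its standard-library module xml.etree.ElementTree, which this tutorial does not otherwise use; the helper name is_well_formed is made up for illustration. The parser raises ParseError when it rejects a document, so the helper simply reports whether parsing succeeded.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts the string as a document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A correct declaration followed by a trivial root element.
good = '<?xml version="1.0"?>\n<Tutorial/>'
# No declaration at all: allowed in XML 1.0, because the declaration is optional.
no_declaration = '<Tutorial/>'
# The closing question mark has been left out of the declaration.
missing_question_mark = '<?xml version="1.0">\n<Tutorial/>'
# A blank line before the declaration, so it is no longer the very first thing.
not_at_start = '\n<?xml version="1.0"?>\n<Tutorial/>'

for label, doc in [("good", good),
                   ("no declaration", no_declaration),
                   ("missing ?", missing_question_mark),
                   ("not at start", not_at_start)]:
    print(f"{label}: {is_well_formed(doc)}")
```

With these inputs the first two should be accepted and the last two rejected.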


ELEMENTS

These are the main components of an XML document. Here is an example of an XML element:

<CompanyDirector>Andy King</CompanyDirector>

Elements are made up of an opening tag containing the element name, the content, and a closing tag containing the same element name preceded by a slash. Every XML document must have exactly one 'root element': a pair of tags which enclose and describe the whole document. Picking a name for the root element is usually straightforward. For example, if I were constructing this whole tutorial as an XML document, my root element would probably be called 'Tutorial'.

In our example element above, the content is what is called 'character data', a term for information that contains no markup. As well as character data, an element can contain other elements, a mixture of other elements and character data, or nothing at all. Here is an example of an element containing another element:

<Staff><CompanyDirector>Andy King</CompanyDirector></Staff>
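
To see how a parser views this nesting, here is a small sketch, again assuming Python 3's standard-library xml.etree.ElementTree (not something the tutorial relies on). It reads the 'Staff' example as a tree and also shows that two top-level elements, with no single root element around them, are rejected.

```python
import xml.etree.ElementTree as ET

staff = ET.fromstring("<Staff><CompanyDirector>Andy King</CompanyDirector></Staff>")

root_tag = staff.tag                               # name of the root element
child_tags = [child.tag for child in staff]        # names of its direct children
director_name = staff.findtext("CompanyDirector")  # character data of that child

print(root_tag)       # Staff
print(child_tags)     # ['CompanyDirector']
print(director_name)  # Andy King

# Two elements side by side at the top level: there is no single root element.
try:
    ET.fromstring("<Staff></Staff><Staff></Staff>")
    two_roots_accepted = True
except ET.ParseError as err:
    two_roots_accepted = False
    print("rejected:", err)
```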

An example of an element containing another element and character data is as follows:

<CompanyDirector>Andy <Nickname>'The Fish King'</Nickname> King</CompanyDirector>

The important point to note from these examples is that when elements contain other elements, your information starts to take on a hierarchical structure. An XML document has a single root element, inside which other elements may appear, inside which further elements may appear, and so on. This is why you will find people saying that XML documents have a 'logical tree structure'. Our last example is of an element without content. You will remember that we gave an example of this in the last section. Here it is again:

<Company name="Andy King Fish Ltd."/>

This example is slightly different from the others in that it also includes an example of an attribute, which we will now go on to describe.
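
Before moving on, the mixed-content and empty-element examples above can be examined with a parser. This sketch assumes Python 3's xml.etree.ElementTree; note that its way of storing mixed content (character data before a child goes in the parent's .text, data after it in the child's .tail) is a design choice of that library, not part of XML itself.

```python
import xml.etree.ElementTree as ET

director = ET.fromstring(
    "<CompanyDirector>Andy <Nickname>'The Fish King'</Nickname> King</CompanyDirector>"
)
nickname = director.find("Nickname")

print(repr(director.text))  # 'Andy '  (character data before the child element)
print(repr(nickname.text))  # "'The Fish King'"  (the child's own content)
print(repr(nickname.tail))  # ' King'  (character data after the child element)

# itertext() yields all the character data in document order.
all_text = "".join(director.itertext())
print(all_text)  # Andy 'The Fish King' King

# The empty element: no children and no content, only an attribute.
company = ET.fromstring('<Company name="Andy King Fish Ltd."/>')
print(len(company), company.text)  # 0 None
print(company.get("name"))         # Andy King Fish Ltd.
```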


ATTRIBUTES

Attributes are properties of elements. In the example above, 'Company' is the element and the attribute 'name' is a property of that element with the value 'Andy King Fish Ltd.'. Elements with attributes, then, take the form:

<ElementName attribute-name="attribute-value">Content</ElementName>

(Note that the previous example has a trailing slash only because it is an empty element.)

An element can have more than one attribute; for example, 'telephone-number' could have been another attribute of the element 'Company'. You are also allowed to give a different element an attribute of the same name (an element 'Product' could also have the attribute 'name', for instance), but to save yourself any confusion when it comes to defining your attributes in a DTD, it is best to try to avoid this within the same document. You could instead give the element 'Product' the attribute 'title', which would be just as fitting. Your main concern with attributes is deciding when you should be using them at all. If we go back to our company name example once more, we can see that we could express the same thing using just elements. It would take this form:

<Company><Name>Andy King Fish Ltd.</Name></Company>
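
A quick way to convince yourself that both forms carry the same information is to read each one back. This is a sketch assuming Python 3's xml.etree.ElementTree; the second company name in the duplicate-attribute test is made up. It also shows a rule worth knowing: an element may have several attributes, but never the same attribute twice.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<Company name="Andy King Fish Ltd."/>')
as_element = ET.fromstring("<Company><Name>Andy King Fish Ltd.</Name></Company>")

# Attributes are read with get(); child elements with find()/findtext().
name_from_attribute = as_attribute.get("name")
name_from_element = as_element.findtext("Name")
print(name_from_attribute == name_from_element)  # True

# Repeating an attribute on the same element is not well-formed.
try:
    ET.fromstring('<Company name="Andy King Fish Ltd." name="Another Name"/>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```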

So the question is: when should we use attributes and when should we use elements? The answer is that it is up to us, but we should bear in mind that whereas elements can contain other elements, attributes cannot contain other attributes (or elements); an attribute value is just text. One final thing to note when writing attributes is that attribute values must be surrounded by quotation marks, which can be either single or double. If you use double quotes, you can use single quotes within the value without any problem; similarly, if you use single quotes, you can use double quotes within. An example of the former is as follows:

<CompanyDirector name="Andy 'The Fish King' King"/>
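
Here is a sketch of both quoting styles being read back, assuming Python 3's xml.etree.ElementTree. The parser strips the outer quotes and keeps the inner ones as part of the value; an unquoted value is rejected outright.

```python
import xml.etree.ElementTree as ET

# Double quotes outside, single quotes inside (the example above).
double_outside = ET.fromstring("""<CompanyDirector name="Andy 'The Fish King' King"/>""")
# Single quotes outside, double quotes inside.
single_outside = ET.fromstring("""<CompanyDirector name='Andy "The Fish King" King'/>""")

print(double_outside.get("name"))  # Andy 'The Fish King' King
print(single_outside.get("name"))  # Andy "The Fish King" King

# Leaving out the quotation marks altogether is not allowed.
try:
    ET.fromstring("<CompanyDirector name=Andy/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)  # False
```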


ENTITIES

Entities allow us to insert information into a particular place, or several places, in an XML document. If we put an 'entity-reference' at a certain point, then when the XML file is processed, the parser replaces that reference with the 'entity-content'. The entity-content could be a word or phrase, or even an entire XML document. Entity-references take the form:

&EntityName;

It is in a DTD, though, that we must define what will replace an entity reference, so I shall leave the main discussion of entities until our coverage of DTDs in the next section. It is worth noting here, however, that five entity-names are predefined by XML and are automatically replaced with specific content. They exist so that you have a way of including characters (such as '<') which a parser would otherwise interpret as markup. These names, and their entity-content, are as follows:

&lt;     <   (less-than sign)
&gt;     >   (greater-than sign)
&amp;    &   (ampersand)
&apos;   '   (apostrophe)
&quot;   "   (double quotation mark)
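
All five predefined entity-references (&lt;, &gt;, &amp;, &apos; and &quot;) can be seen at work in this sketch, which assumes Python 3's xml.etree.ElementTree and, for the reverse direction, xml.sax.saxutils.escape from the same standard library. It also shows that a bare '<' in character data is rejected.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Each predefined entity-reference is replaced by the character it stands for.
decoded = ET.fromstring(
    "<Note>&lt;Tag&gt; &amp; &apos;single&apos; &quot;double&quot;</Note>"
).text
print(decoded)  # <Tag> & 'single' "double"

# A bare '<' in character data is taken as the start of markup, so this fails.
try:
    ET.fromstring("<Sum>(2 + 2) < 5</Sum>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False

# Going the other way: escape() swaps &, < and > for their entity-references.
print(escape("(2 + 2) < 5 & 4 > 3"))  # (2 + 2) &lt; 5 &amp; 4 &gt; 3
```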


COMMENTS

Comments start with '<!--' and end with '-->'. Use them if you want to annotate your source code for any reason; the parser does not treat them as part of the document's content. A comment can contain any characters except the string '--' (two hyphens in a row). You can use comments to make notes to yourself, or to others who might be looking at the source code. I have included an example in the XML document at the bottom of this page.
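
Here is a sketch of both points, assuming Python 3's xml.etree.ElementTree. Its default tree builder simply drops comments (other tools, or ElementTree's TreeBuilder with insert_comments=True in Python 3.8 and later, can keep them), and a comment containing '--' is rejected.

```python
import xml.etree.ElementTree as ET

catalogue = ET.fromstring(
    "<ProductCatalogue>"
    "<!-- 'ProductCatalogue' is the root element of this document -->"
    "<Product/>"
    "</ProductCatalogue>"
)
# The comment does not show up as a child: only the Product element remains.
child_tags = [child.tag for child in catalogue]
print(child_tags)  # ['Product']

# Two hyphens in a row inside a comment make the document not well-formed.
try:
    ET.fromstring("<ProductCatalogue><!-- prices -- in sterling --></ProductCatalogue>")
    double_hyphen_ok = True
except ET.ParseError:
    double_hyphen_ok = False
print(double_hyphen_ok)  # False
```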


CDATA

You can add CDATA sections to XML documents if you want, although it is only in special circumstances that you might need to do so. A CDATA section begins with '<![CDATA[' and ends with ']]>'. CDATA stands for 'character data', and the parser (processing software) will not attempt to interpret the contents of a CDATA section as markup. This means you can include characters that might otherwise be taken as part of the document markup, such as the left-angle bracket '<'. The obvious exception is the string that ends the CDATA section, ']]>', which cannot appear inside it. CDATA sections are an alternative to using the predefined entity-references mentioned above. Here is an example:

<![CDATA[ (2 + 2) < 5 ]]>
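
To see that a CDATA section and an entity-reference come to the same thing once parsed, here is a sketch assuming Python 3's xml.etree.ElementTree; the wrapping 'Sum' element is made up, since a CDATA section has to sit inside an element. Note that ElementTree does not remember that CDATA was used: when it writes the element back out, it escapes the '<' instead.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<Sum><![CDATA[ (2 + 2) < 5 ]]></Sum>")
with_entity = ET.fromstring("<Sum> (2 + 2) &lt; 5 </Sum>")

# Inside the CDATA section the '<' is kept as an ordinary character.
print(repr(with_cdata.text))                # ' (2 + 2) < 5 '
print(with_cdata.text == with_entity.text)  # True

# On output, ElementTree uses the entity-reference rather than CDATA.
print(ET.tostring(with_cdata, encoding="unicode"))  # <Sum> (2 + 2) &lt; 5 </Sum>
```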


EXAMPLE XML DOCUMENT

The following is an example of a well-formed XML document. It is a product catalogue for a made-up company called 'Andy King Fish Ltd.'.

<?xml version="1.0"?>

<!-- 'ProductCatalogue' is the root element of this document -->

<ProductCatalogue>

	<Product>

		<ProductName>Andy's Frozen Fishcakes</ProductName>

		<Slogan>'The Family Favourite'</Slogan>

		<Fisher-Price currency="sterling">1.99</Fisher-Price>

	</Product>

	<Product>

		<ProductName>Andy King's King Prawns</ProductName>

		<Slogan>'A Right Royal Treat'</Slogan>

		<Fisher-Price currency="sterling">3.99</Fisher-Price>

	</Product>

	<Product>

		<ProductName>King Trout</ProductName>

		<Slogan>'A Right Royal Trout'</Slogan>

		<Fisher-Price currency="sterling">2.99</Fisher-Price>

	</Product>

	<Product>

		<ProductName>Andy's Plaice</ProductName>

		<Slogan>'Hungry? Go To Andy's Plaice'</Slogan>

		<Fisher-Price currency="sterling">2.99</Fisher-Price>

	</Product>

	<Product>

		<ProductName>The King's Fingers</ProductName>

		<Slogan>'They Come In 'Andy'</Slogan>

		<Fisher-Price currency="sterling">0.99</Fisher-Price>

	</Product>

</ProductCatalogue>
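
If you want to check a document like this one, a parser will do it for you. The sketch below assumes Python 3's xml.etree.ElementTree and decimal modules. It uses a copy of the catalogue in which the price element is named 'Fisher-Price', because an apostrophe (as in Fisher'sPrice) is not allowed in an XML name, whereas a hyphen is; the last lines confirm that the apostrophe version is rejected. It then lists each product and adds up the prices.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

CATALOGUE = """<?xml version="1.0"?>
<!-- 'ProductCatalogue' is the root element of this document -->
<ProductCatalogue>
  <Product>
    <ProductName>Andy's Frozen Fishcakes</ProductName>
    <Slogan>'The Family Favourite'</Slogan>
    <Fisher-Price currency="sterling">1.99</Fisher-Price>
  </Product>
  <Product>
    <ProductName>Andy King's King Prawns</ProductName>
    <Slogan>'A Right Royal Treat'</Slogan>
    <Fisher-Price currency="sterling">3.99</Fisher-Price>
  </Product>
  <Product>
    <ProductName>King Trout</ProductName>
    <Slogan>'A Right Royal Trout'</Slogan>
    <Fisher-Price currency="sterling">2.99</Fisher-Price>
  </Product>
  <Product>
    <ProductName>Andy's Plaice</ProductName>
    <Slogan>'Hungry? Go To Andy's Plaice'</Slogan>
    <Fisher-Price currency="sterling">2.99</Fisher-Price>
  </Product>
  <Product>
    <ProductName>The King's Fingers</ProductName>
    <Slogan>'They Come In 'Andy'</Slogan>
    <Fisher-Price currency="sterling">0.99</Fisher-Price>
  </Product>
</ProductCatalogue>"""

root = ET.fromstring(CATALOGUE)

# Collect (name, price, currency) for every Product; Decimal avoids float rounding.
products = []
for product in root.findall("Product"):
    price = product.find("Fisher-Price")
    products.append((product.findtext("ProductName"),
                     Decimal(price.text),
                     price.get("currency")))

for name, price, currency in products:
    print(f"{name}: {price} ({currency})")

total = sum(price for _, price, _ in products)
print(total)  # 12.95

# An apostrophe in an element name makes the document not well-formed.
try:
    ET.fromstring("<Fisher'sPrice currency=\"sterling\">1.99</Fisher'sPrice>")
    apostrophe_name_ok = True
except ET.ParseError:
    apostrophe_name_ok = False
print(apostrophe_name_ok)  # False
```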

You should now be able to construct your own well-formed XML document. Just try to think of the best structure for your information, and the best terms to describe it. Do not forget that the names you pick must conform to the restrictions mentioned in the last section about XML names. And do not forget to make sure that your tags nest properly. Let us move on to the next section (or if you would prefer, you can go back to the last section).