XML Basics

The purpose of this assignment was to:

The description and my solution of the XML Basics - assignment can be viewed at the bottom of this page.

What is XML?

Unlike HTML - which was designed for displaying data and focusing on how data is presented - XML was designed for describing data and focusing on what data is.

While HTML is a markup language for displaying information, XML is a markup language for describing information.

It's important to understand that XML is a complement to - not a replacement for - HTML (or XHTML). It's likely that XML will eventually be used solely for describing data, while HTML (or XHTML) will be used to format and display that same data.

XML Example Document

<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>

Since XML - unlike HTML - is for describing data, this document isn't very useful by itself. It's simply pure information contained in XML elements. To make it useful, software to send or receive it should be created.
Of course, software to display it could be created as well, but the main benefit of using XML as a markup language is still its ability to send and receive data in a cross-platform manner.
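
As a rough sketch of what such receiving software could look like, the Python snippet below parses the message document with the standard library's xml.etree.ElementTree module. The function name read_message and the printed "To:/From:/Subject:" layout are my own assumptions for illustration, not part of the assignment.

```python
import xml.etree.ElementTree as ET

# The message document from the example above, as bytes so the parser
# can honour the encoding named in the XML declaration.
MESSAGE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>"""


def read_message(xml_bytes):
    """Return the child elements of <message> as a tag -> text dictionary."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: child.text for child in root}


message = read_message(MESSAGE_XML)
print(f"To: {message['receiver']}")
print(f"From: {message['sender']}")
print(f"Subject: {message['subject']}")
print(message["content"])
```

The same dictionary could just as well be handed to a mail program or stored in a database; the XML itself says nothing about how the data is used.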

Note that although most browsers can display XML documents, this ability is mostly used for making sure a document has the correct syntax (is well-formed). You can try the XML Example Document here:

XML tags aren't Predefined

Unlike HTML tags, XML tags aren't predefined. Tags in XML must be invented. Hence, the opening and closing tags of an element could be named my_doc, bla-bla, myfirstdoc, or whatever (names starting with "xml" are reserved, though). A self-describing name, however, is preferable.
The tags in the message example above are "invented" by me, and I tried to invent names that were as self-descriptive as possible.

Benefits of XML - data exchange

With XML, data can be exchanged between incompatible systems.

Computer systems and databases often contain data in incompatible formats. One of the more common problems for developers has always been to exchange data between data systems that contain data in different formats. Storing data in XML can reduce this complexity and create data that can be read by many different types of software.

Because XML is independent of hardware and software, data can be made available to applications other than browsers. Applications can, for example, access your XML files as data sources, just like they access databases. Because of XML's cross-platform abilities, data can be made available to all kinds of "reading applications".

XML elements have closing tags

Unlike HTML, but just like XHTML, elements in XML must have closing tags.

Note that the XML declaration is not an XML element. It does not have a closing tag.

XML Tags are Case Sensitive

Unlike HTML, but just like XHTML, opening and closing tags in XML must be written with the same case. XHTML is defined to use all lowercase. In XML there is no such restriction. The only important thing to remember is to be consistent and use the same case for both the opening tag and the closing tag.

Uppercase letters are allowed as long as the opening and closing tags match, but it's more common to use lowercase letters throughout all XML element names - just like in XHTML.

Note: The pure text content stored in elements can, of course, be in both upper- and lowercase.
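
To see the case rule in action, here is a small sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


same_case = "<Message><Sender>Lenny</Sender></Message>"
mixed_case = "<Message><Sender>Lenny</sender></Message>"

print(is_well_formed(same_case))   # True
print(is_well_formed(mixed_case))  # False

# Lookups are case-sensitive too: "Sender" and "sender" are different names.
root = ET.fromstring(same_case)
print(root.find("Sender") is not None)  # True
print(root.find("sender") is not None)  # False
```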

XML Elements Must be logically Nested

Correct nesting of tags in XML is important.

In HTML, browsers tolerate some elements being illogically nested within each other, like this:

<strong> <em>This text is strong and emphasized</strong> </em>

In XML all elements must be logically nested within each other, like this:

<strong> <em>This text is strong and emphasized</em> </strong>
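
A parser rejects the crossed version outright. The sketch below, again assuming Python's xml.etree.ElementTree, reports where the first syntax error was found; find_syntax_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def find_syntax_error(xml_text):
    """Return the (line, column) of the first syntax error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None


crossed = "<strong><em>This text is strong and emphasized</strong></em>"
nested = "<strong><em>This text is strong and emphasized</em></strong>"

print(find_syntax_error(nested))   # None, the document is well-formed
print(find_syntax_error(crossed))  # a (line, column) tuple
```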

XML Documents have a Root Element

All XML documents must have a root element (an opening and a closing tag) surrounding all other elements.

All elements can have child elements. Child elements must be correctly nested within their parent element:

<root_element>
  <child_element>
    <child_of_child_element> ..... </child_of_child_element>
  </child_element>
</root_element>
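
The parent/child structure can be walked recursively. The following sketch (the function name outline is my own) prints one indented line per element, and also shows that a document with two top-level elements, and therefore no single root, does not parse.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


document = (
    "<root_element>"
    "<child_element>"
    "<child_of_child_element>.....</child_of_child_element>"
    "</child_element>"
    "</root_element>"
)
tree_lines = outline(ET.fromstring(document))
print("\n".join(tree_lines))

# Two top-level elements means there is no single root, so parsing fails.
try:
    ET.fromstring("<first/><second/>")
    has_single_root = True
except ET.ParseError:
    has_single_root = False
print(has_single_root)  # False
```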

What does Extensible mean?

Look at the following XML message example:

<message>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <content>asdf bla hoho lala</content>
</message>

Let's say you've created an application that extracted the <receiver>, <sender>, and <content> elements from the XML document to produce this output:


From: Boobo

asdf bla hoho lala

Now, let's say the author of the XML document added some extra information, like this:

<message>
  <date>2002-08-01</date>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <subject>Reminder</subject>
  <content>asdf bla hoho lala</content>
</message>

Although the XML document is different from before, the application should still be able to find the <receiver>, <sender>, and <content> elements in the XML document and hence produce the same output.

This ability to be extended without making "reading applications" break is what is meant by Extensible in the Extensible Markup Language.
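
A minimal sketch of such a "reading application" in Python follows. It assumes the extended document keeps the <receiver>, <sender>, and <content> elements and only adds new ones; the function name render and the "To:/From:" layout are my own choices for the example.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<message>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <content>asdf bla hoho lala</content>
</message>"""

# Assumed extended version: <date> and <subject> are added, while the
# three elements the reader relies on are left in place.
EXTENDED = """<message>
  <date>2002-08-01</date>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <subject>Reminder</subject>
  <content>asdf bla hoho lala</content>
</message>"""


def render(xml_text):
    """Build the output from the three known elements, ignoring the rest."""
    root = ET.fromstring(xml_text)
    receiver = root.findtext("receiver")
    sender = root.findtext("sender")
    content = root.findtext("content")
    return f"To: {receiver}\nFrom: {sender}\n\n{content}"


print(render(EXTENDED))
print(render(ORIGINAL) == render(EXTENDED))  # True
```

Because render looks elements up by name instead of by position, the extra <date> and <subject> elements are simply skipped.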

XML Document Example - a book

The XML document below describes a book:

<book>
  <title>Learn XML</title>
  <product id="777" media="paper"> </product>
  <chapter>XML Basics - introduction
    <paragraph> What is HTML/XHTML? </paragraph>
    <paragraph> What is XML? </paragraph>
  </chapter>
  <chapter>XML Syntax
    <paragraph> Elements must have a closing tag </paragraph>
    <paragraph> Elements must be logically nested </paragraph>
  </chapter>
</book>

In this example "book" is the root element. Title, product, and chapter are child elements of the book element. Book is the parent element of title, product, and chapter. Title, product, and chapter are sister elements (also called siblings) because they have the same parent.

The chapter element is the parent element of the paragraph element. It is therefore both a child element (child of book), and a parent element (parent of paragraph).
Each paragraph element is a child element with no child elements of its own; its only sibling is the other paragraph element in the same chapter.
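
These relationships can be checked in code. The sketch below uses Python's xml.etree.ElementTree; since its elements don't keep a link to their parent, a small parent_of dictionary (my own helper, not a library feature) is built first.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>Learn XML</title>
  <product id="777" media="paper"></product>
  <chapter>XML Basics - introduction
    <paragraph>What is HTML/XHTML?</paragraph>
    <paragraph>What is XML?</paragraph>
  </chapter>
  <chapter>XML Syntax
    <paragraph>Elements must have a closing tag</paragraph>
    <paragraph>Elements must be logically nested</paragraph>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)

# Map every element to its parent by visiting each parent's children.
parent_of = {child: parent for parent in book.iter() for child in parent}

# Child elements of the root element.
children = [child.tag for child in book]
print(children)

# Siblings of <title>: the other elements sharing the same parent.
title = book.find("title")
title_siblings = [e.tag for e in parent_of[title] if e is not title]
print(title_siblings)

# The parent of the first <paragraph> is a <chapter>.
first_paragraph = book.find("chapter/paragraph")
paragraph_parent = parent_of[first_paragraph].tag
print(paragraph_parent)
```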

XML Elements Content Types

XML Elements can have different content types.

An XML element can have element content, mixed content, simple content, or empty content.
It can also contain attributes within its opening tag.

In the book example, book has element content, because it contains other elements. chapter has mixed content because it contains both text and other elements. paragraph has simple content (or text content) because it contains only text. product has empty content, because it carries no information between its opening and closing tags.

The product element has attributes. The attribute named "id" has the value "777", and the attribute named "media" has the value "paper". 
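
The four content types can be told apart programmatically. This is only a sketch: content_type is a hypothetical helper, and it treats whitespace-only text as "no text", which is my assumption rather than something XML itself prescribes.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>Learn XML</title>
  <product id="777" media="paper"> </product>
  <chapter>XML Basics - introduction
    <paragraph>What is XML?</paragraph>
  </chapter>
</book>"""


def content_type(element):
    """Classify an element as element, mixed, simple, or empty content."""
    has_children = len(element) > 0
    # Text directly inside the element sits before the first child (.text)
    # or after one of the children (that child's .tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK_XML)
product = book.find("product")
chapter = book.find("chapter")
paragraph = chapter.find("paragraph")

for element in (book, chapter, paragraph, product):
    print(element.tag, "->", content_type(element))

# Attributes are available as a dictionary on the element.
print(product.attrib)
```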

XML Element Naming

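XML elements must follow a few naming rules: names can contain letters, digits, and other characters such as underscores, hyphens, and periods; names must start with a letter or an underscore, not with a digit or a punctuation character; names must not contain spaces; and names starting with the letters "xml" (in any case) are reserved.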

All other names can be used, but the idea is to make names as self-descriptive as possible.

Examples: <job_title>, <favorite_language>.

It's good practice to avoid "-" and "." in names. If, for example, you name something "job-title", it could be problematic if the software in use tries to subtract title from job. Or if you name something "job.title", the software might think that "title" is a property of the object "job". This shouldn't be a problem if the software is created by a reasonably good programmer, but to be on the safe side I recommend not using hyphens or periods.

Best practice is to use names that are short and simple, like this: <job_title>.

Not like this: <title_of_the_job>. 

Non-English letters are legal in XML element names, but make sure the software that should read (parse) the XML documents supports them.

The ":" should not be used in element names because it's reserved for XML namespaces.
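
The naming advice can be turned into a rough check. The sketch below is deliberately simplified and ASCII-only: it assumes the usual XML rules (start with a letter or underscore, no spaces, names beginning with "xml" reserved) and it rejects non-English letters and ":" even though XML itself allows them. Both function names are made up for this example.

```python
import re

# Simplified, ASCII-only pattern: a letter or underscore first, then
# letters, digits, underscores, hyphens, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_usable_name(name):
    """Return True if name passes the simplified naming rules."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" are reserved
    return NAME_PATTERN.fullmatch(name) is not None


def is_recommended_name(name):
    """Usable, and also free of the "-" and "." advised against above."""
    return is_usable_name(name) and "-" not in name and "." not in name


for candidate in ("job_title", "job-title", "job.title", "1st_job", "job title"):
    print(candidate, is_usable_name(candidate), is_recommended_name(candidate))
```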

XML Attributes

XML elements can have attributes within the opening tag.

From XHTML, and stricter versions of HTML, you hopefully remember markup like this: <img src="hacker.jpg" />. The src attribute provides information about the img element. In fact, the img element in XHTML has no content at all (empty content); its only purpose is to give information on where to find content by providing the URL for an image.

Attributes are usually used for information not part of the data. In the example below, the file format doesn't matter much for the end user, but is important to the application receiving and parsing the element:

<file format="jpg">hacker.jpg</file>
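
For a receiving application, reading the attribute and the content are two separate steps. A short sketch with Python's xml.etree.ElementTree (the "size" attribute is hypothetical and only there to show a default value):

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<file format="jpg">hacker.jpg</file>')

file_name = element.text             # the element's text content
file_format = element.get("format")  # the value of the format attribute

# get() returns a default when the attribute is absent.
file_size = element.get("size", "unknown")

print(file_name, file_format, file_size)
```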

Elements or Attributes?

Data can be stored in either attributes or elements.

Below are examples of use of attributes and elements:

<person sex="male">
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>

<person>
  <sex>male</sex>
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>

In the first example sex is an attribute with the value "male". In the second, sex is a child element with text content.
Both examples provide the same information.

Although there are no rules about when to use attributes and when to use child elements, it's in my opinion best to use child elements if the information feels like the actual data, and to use attributes if the information feels like data about the actual data (metadata, sort of).
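
To illustrate that both forms carry the same information, here is a small sketch of a reader that accepts either one. The function name read_person and its fallback from attribute to child element are my own design, not a standard technique.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="male">
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>"""

AS_ELEMENT = """<person>
  <sex>male</sex>
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>"""


def read_person(xml_text):
    """Read a person, taking sex from the attribute or the child element."""
    root = ET.fromstring(xml_text)
    sex = root.get("sex")
    if sex is None:
        sex = root.findtext("sex")
    return {
        "sex": sex,
        "given_name": root.findtext("given_name"),
        "surname": root.findtext("surname"),
    }


print(read_person(AS_ATTRIBUTE) == read_person(AS_ELEMENT))  # True
```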

Quoted Attribute Values

In XML, attribute values must have quotation marks.

XML elements can have attributes in name/value pairs. In XML - just like in XHTML - the attribute value must be quoted. In these two XML documents the first one is incorrect, and the second correct:

<?xml version="1.0" encoding="ISO-8859-1"?>
<message date=09/10/2006>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
</message>

<?xml version="1.0" encoding="ISO-8859-1"?>
<message date="09/10/2006">
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
</message>

The error in the first document is that the date attribute in the message element isn't quoted, which it must be in order to follow the correct XML syntax, i.e. to be well-formed.

This is correct syntax: date="24/03/2006".

This is incorrect syntax: date=24/03/2006.
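
A parser makes the difference visible. In this sketch, read_date is a hypothetical helper that returns the date attribute, or None when the document is not well-formed.

```python
import xml.etree.ElementTree as ET


def read_date(xml_text):
    """Return the date attribute, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("date")
    except ET.ParseError:
        return None


quoted = '<message date="24/03/2006"><receiver>Balou</receiver></message>'
unquoted = '<message date=24/03/2006><receiver>Balou</receiver></message>'

print(read_date(quoted))    # 24/03/2006
print(read_date(unquoted))  # None
```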

Assignment Description

Create a CV marked up with XML. No DTD is needed.

My solution, Assignment Files

I haven't put in the actual content of my own CV. Instead I have created the elements and attributes that could be used for creating a CV in XML. Kind of a simple CV-template.

If you don't get an error message when displaying the file below, it means the document is well-formed, i.e. has correct syntax.
