XML Basics
The purpose of this assignment was to:
- Learn the basics of XML.
- Learn how to create a simple CV in XML.
The description of the XML Basics assignment and my solution can be found at the bottom of this page.
What is XML?
Unlike HTML, which was designed to display data and focuses on how data looks, XML was designed to describe data and focuses on what data is.
While HTML is a markup language for displaying information, XML is a markup language for describing information.
It's important to understand that XML is a complement to, not a replacement for, HTML (or XHTML). XML is likely to be used mainly for describing data, while HTML (or XHTML) is used to format and display that same data.
- XML stands for eXtensible Markup Language.
- Just like HTML, XML is a markup language.
- XML was designed for describing data.
- Unlike HTML (or XHTML), XML has no predefined tags; you define your own.
- An XML document can use an XML Schema or a DTD (Document Type Definition) to describe its structure.
- An XML document together with an XML Schema or a DTD is designed to be self-describing.
XML Example Document
<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>
- The first line in this example is called the XML declaration. It defines the XML version and the character encoding used. In this case the document conforms to version 1.0 of the XML specification and uses the ISO-8859-1 character set. The XML declaration is not an XML element.
- The second line is the opening tag of the document's root element. The root element, sometimes called the "document element", describes what the document is. In this case it's a message, but since tags aren't predefined in XML, I could just as well have named it myxmldoc, hatexml, wrapper, and so on. The next four lines (the receiver, sender, subject, and content elements) are contained within the root element, so they are child elements of the root.
- The third line describes the receiver element of the message and the fourth line the sender element. The fifth line describes the subject element and the sixth line the content element. As mentioned, these are child elements of the root element because they appear between the message opening tag and the message closing tag.
Since XML, unlike HTML, is for describing data, this document doesn't do anything by itself. It's simply information wrapped in XML elements. To make it useful, someone has to write software that sends or receives it. Software to display it could of course be written as well, but the main benefit of using XML as a markup language is still its ability to send and receive data across platforms.
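As a minimal sketch of such receiving software, assuming Python 3 and its standard xml.etree.ElementTree module, the snippet below parses the message example and collects each child element of the root into a dictionary. The function name read_message is hypothetical.

```python
import xml.etree.ElementTree as ET

# The message example from above, as bytes so the parser honours
# the ISO-8859-1 encoding named in the XML declaration.
MESSAGE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>"""


def read_message(data: bytes) -> dict:
    """Parse a <message> document and return its child elements as a dict."""
    root = ET.fromstring(data)
    # Each direct child of the root element becomes one tag/text pair.
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    msg = read_message(MESSAGE)
    print(f"To: {msg['receiver']}, from: {msg['sender']}")
    print(f"{msg['subject']}: {msg['content']}")
```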
Note that although most browsers can display XML documents, this ability is mostly used to check that a document has correct syntax (is well-formed).
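The same kind of syntax check can be done in code. This sketch uses Python's xml.etree.ElementTree (the helper name is_well_formed is hypothetical) and only tests well-formedness: like a browser, it does not validate the document against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<message><receiver>Buck</receiver></message>"))  # True
    print(is_well_formed("<message><receiver>Buck</message>"))  # False
```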
You can try the XML Example Document here:
message-example.xml
XML tags aren't Predefined
Unlike HTML tags, XML tags aren't predefined. Tags in XML must be invented.
Hence, the opening and closing tags of an element could be named my_doc, bla-bla, myfirstdoc, or anything else. A self-describing name, however, is preferable. The tags in the message example above were "invented" by me, and I tried to make them as self-descriptive as possible.
Benefits of XML - data exchange
With XML, data can be exchanged between incompatible systems.
Computer systems and databases often store data in incompatible formats, and exchanging data between such systems has long been one of the more common problems for developers. Storing data in XML can reduce this complexity and produce data that many different types of software can read.
Because XML is independent of hardware and software, data can be made available to applications other than browsers. Applications can, for example, use XML files as data sources, much as they use databases, so the same data can serve all kinds of "reading applications".
XML elements have closing tags
Unlike HTML, but just like XHTML, elements in XML must have closing tags.
Note that the XML declaration is not an XML element. It does not have a closing tag.
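A quick way to see this rule in action is to hand both a broken and a correct document to a parser. In this Python sketch (the helper parses and the element name ping are hypothetical), the unclosed sender element makes the first document fail. An element with no content may also use the self-closing form <ping/>, which serves as both opening and closing tag, the same syntax XHTML uses for <img ... />.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if ElementTree accepts the document, False on a syntax error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The <sender> element is never closed, so this is not well-formed.
missing_close = "<message><receiver>Buck</receiver><sender>Lenny</message>"
# Every element is closed; <ping/> is an empty element in self-closing form.
all_closed = "<message><receiver>Buck</receiver><ping/></message>"

print(parses(missing_close))  # False
print(parses(all_closed))     # True
```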
XML Tags are Case Sensitive
Unlike HTML, but just like XHTML, opening and closing tags in XML must be written with the same case. XHTML is defined to use all lowercase. In XML there is no such restriction. The only important thing to remember is to be consistent and use the same case for both the opening tag and the closing tag.
Uppercase letters are allowed as long as the opening and closing tags match, but it's more common to write XML element names entirely in lowercase, just like in XHTML.
Note: The pure text content stored in elements can, of course, be in both upper- and lowercase.
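The sketch below, using Python's xml.etree.ElementTree, shows both consequences of case sensitivity: tags that differ only in case don't match, and <Receiver> and <receiver> are treated as two different element names.

```python
import xml.etree.ElementTree as ET

# Opening and closing tags differ only in case: not well-formed.
try:
    ET.fromstring("<Message>Hi</message>")
    mixed_case_ok = True
except ET.ParseError:
    mixed_case_ok = False

# Matching tags parse fine, but <Receiver> and <receiver> are different names.
root = ET.fromstring("<message><Receiver>Buck</Receiver></message>")

print(mixed_case_ok)               # False
print(root.find("receiver"))       # None: there is no lowercase <receiver>
print(root.find("Receiver").text)  # Buck
```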
XML Elements Must be logically Nested
Correct nesting of tags in XML is important.
In HTML, some elements may be improperly nested within each other, like this:
<strong><em>This text is strong and emphasized</strong></em>
In XML, all elements must be properly nested within each other, like this:
<strong><em>This text is strong and emphasized</em></strong>
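Feeding both versions to a parser shows the difference. In this Python sketch (xml.etree.ElementTree) the improperly nested version is rejected because </strong> arrives while <em> is still open, while the properly nested version parses with em as a child of strong.

```python
import xml.etree.ElementTree as ET

improper = "<strong><em>This text is strong and emphasized</strong></em>"
proper = "<strong><em>This text is strong and emphasized</em></strong>"

try:
    ET.fromstring(improper)
    improper_ok = True
except ET.ParseError:
    # </strong> closes while <em> is still open: a "mismatched tag" error.
    improper_ok = False

root = ET.fromstring(proper)
print(improper_ok)   # False
print(root[0].tag)   # em, the only child of <strong>
print(root[0].text)  # This text is strong and emphasized
```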
XML Documents have a Root Element
All XML documents must have exactly one root element (an opening and a closing tag) surrounding all other elements.
All elements can have child elements. Child elements must be correctly nested within their parent element:
<root_element>
  <child_element>
    <child_of_child_element>
      .....
    </child_of_child_element>
  </child_element>
</root_element>
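This Python sketch (xml.etree.ElementTree) rejects a document with two top-level elements, then walks the properly nested skeleton above from the root element down through its descendants.

```python
import xml.etree.ElementTree as ET

# Two elements at the top level and no single root: not well-formed.
no_root = "<root_element>one</root_element><root_element>two</root_element>"
# One root element with correctly nested descendants.
with_root = (
    "<root_element><child_element>"
    "<child_of_child_element>...</child_of_child_element>"
    "</child_element></root_element>"
)

try:
    ET.fromstring(no_root)
    no_root_ok = True
except ET.ParseError:
    # Expat reports "junk after document element" at the second element.
    no_root_ok = False

root = ET.fromstring(with_root)
print(no_root_ok)                     # False
# iter() yields the root first, then its descendants in document order.
print([elem.tag for elem in root.iter()])
# ['root_element', 'child_element', 'child_of_child_element']
```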
What does Extensible mean?
Look at the following XML message example:
<message>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <content>asdf bla hoho lala</content>
</message>
Let's say you've created an application that extracts the <receiver>, <sender>, and <content> elements from the XML document to produce this output:
MESSAGE
Receiver: Balou
Sender: Boobo
asdf bla hoho lala
Now, let's say the author of the XML document added some extra information, like this:
<message>
  <date>2002-08-01</date>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <subject>Reminder</subject>
  <content>Hoho blabla lala</content>
</message>
Although the XML document has changed, the application should still be able to find the <receiver>, <sender>, and <content> elements and hence produce the same kind of output.
This ability to be extended without breaking "reading applications" is what is meant by Extensible in Extensible Markup Language.
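A minimal sketch of such an application in Python, assuming both versions of the document use <content> for the message text (that is the element the application looks for). The function name render is hypothetical. Because it looks up elements by name with findtext(), it ignores the added <date> and <subject> elements and produces the MESSAGE listing for both documents.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<message>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <content>asdf bla hoho lala</content>
</message>"""

EXTENDED = """<message>
  <date>2002-08-01</date>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
  <subject>Reminder</subject>
  <content>Hoho blabla lala</content>
</message>"""


def render(document: str) -> str:
    """Build the MESSAGE output from the three elements the application knows."""
    root = ET.fromstring(document)
    # findtext() looks up a child by name, so extra or reordered
    # elements such as <date> and <subject> are simply ignored.
    return "\n".join([
        "MESSAGE",
        "Receiver: " + root.findtext("receiver"),
        "Sender: " + root.findtext("sender"),
        root.findtext("content"),
    ])


print(render(ORIGINAL))
print(render(EXTENDED))
```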
XML Document Example - a book
The XML document below describes a book:
<book>
  <title>Learn XML</title>
  <product id="777" media="paper"></product>
  <chapter>XML Basics - introduction
    <paragraph>What is HTML/XHTML?</paragraph>
    <paragraph>What is XML?</paragraph>
  </chapter>
  <chapter>XML Syntax
    <paragraph>Elements must have a closing tag</paragraph>
    <paragraph>Elements must be logically nested</paragraph>
  </chapter>
</book>
In this example, "book" is the root element. The title, product, and chapter elements are child elements of book, and book is the parent element of title, product, and chapter. Title, product, and chapter are sister elements (also called siblings) because they have the same parent.
The chapter element is the parent element of the paragraph elements. It is therefore both a child element (child of book) and a parent element (parent of paragraph). The paragraph elements have no children of their own, and the two paragraphs within each chapter are siblings of each other.
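These relationships can be explored with Python's xml.etree.ElementTree, as sketched below. ElementTree elements don't store a reference to their parent, so the sketch builds a child-to-parent dictionary first; parent_of is a hypothetical name.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>Learn XML</title>
  <product id="777" media="paper"></product>
  <chapter>XML Basics - introduction
    <paragraph>What is HTML/XHTML?</paragraph>
    <paragraph>What is XML?</paragraph>
  </chapter>
  <chapter>XML Syntax
    <paragraph>Elements must have a closing tag</paragraph>
    <paragraph>Elements must be logically nested</paragraph>
  </chapter>
</book>"""

root = ET.fromstring(BOOK)

# Children of the root element, in document order.
print([child.tag for child in root])  # ['title', 'product', 'chapter', 'chapter']

# Map every element to its parent by iterating over each element's children.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_paragraph = root.find("chapter/paragraph")
print(parent_of[first_paragraph].tag)  # chapter

# Siblings are the other children of the same parent.
siblings = [e.text for e in parent_of[first_paragraph] if e is not first_paragraph]
print(siblings)  # ['What is XML?']
```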
XML Elements Content Types
XML elements can have different content types. An XML element can have element content, mixed content, simple content, or empty content. It can also have attributes in its opening tag.
In the book example, book has element content because it contains only other elements. chapter has mixed content because it contains both text and other elements. paragraph has simple content (or text content) because it contains only text. product has empty content because it carries no information between its opening and closing tags.
The product element has attributes. The attribute named "id" has the value "777", and the attribute named "media" has the value "paper".
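As a rough sketch, the four content types can be told apart by checking whether an element has child elements and whether it has non-whitespace text. The classifier below uses Python's xml.etree.ElementTree; content_type is a hypothetical helper, and it treats whitespace-only text as no text. Attributes are available as a dictionary through .attrib.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>Learn XML</title>
  <product id="777" media="paper"> </product>
  <chapter>XML Basics - introduction
    <paragraph>What is HTML/XHTML?</paragraph>
    <paragraph>What is XML?</paragraph>
  </chapter>
</book>"""


def content_type(elem: ET.Element) -> str:
    """Classify an element as element, mixed, simple, or empty content."""
    has_children = len(elem) > 0
    # Text can sit before the first child (elem.text) or after a child (child.tail).
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


root = ET.fromstring(BOOK)
types = {
    "book": content_type(root),
    "chapter": content_type(root.find("chapter")),
    "paragraph": content_type(root.find("chapter/paragraph")),
    "product": content_type(root.find("product")),
}
print(types)  # {'book': 'element', 'chapter': 'mixed', 'paragraph': 'simple', 'product': 'empty'}
print(root.find("product").attrib)  # {'id': '777', 'media': 'paper'}
```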
XML Element Naming
XML elements must follow a few naming rules:
- Names can contain letters, numbers, and other characters
- Names cannot start with a number or a punctuation character (an underscore is allowed as the first character)
- Names should not start with the letters xml in any combination of upper- and lowercase, e.g. xML, XML, Xml, or xmLdocs; such names are reserved by the XML specification
- Names cannot contain spaces
All other names can be used, but the idea is to make names as self-descriptive as possible.
Examples: <job_title>, <favorite_language>.
It's good practice to avoid "-" and "." in names. If, for example, you name something "job-title", software might try to subtract title from job. If you name something "job.title", software might think that "title" is a property of the object "job". A reasonably well-written program shouldn't have these problems, but to be on the safe side I recommend avoiding periods and hyphens.
Best practice is to use names that are short and simple, like this: <job_title>.
Not like this: <title_of_the_job>.
Non-English letters are legal in XML element names, but make sure the software that reads (parses) the XML documents supports them.
The ":" should not be used in element names because it's reserved for XML namespaces.
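A simple way to test the rules is to let a parser try a name as an element. In this Python sketch, is_legal_name and is_reserved are hypothetical helpers built on xml.etree.ElementTree. A parser will typically still accept names that start with xml, so the reservation is checked separately. Note that job-title passes the parser even though I recommend against hyphens, and job:title fails because ElementTree treats the colon as an undeclared namespace prefix.

```python
import xml.etree.ElementTree as ET


def is_legal_name(name: str) -> bool:
    """Return True if the parser accepts `name` as an element name."""
    try:
        ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    return True


def is_reserved(name: str) -> bool:
    """Names starting with 'xml' in any case are reserved by the XML spec."""
    return name.lower().startswith("xml")


for name in ("job_title", "favorite_language", "1st_job", "job title",
             ".job", "job-title", "job:title", "xmlDocs"):
    print(f"{name!r}: legal={is_legal_name(name)}, reserved={is_reserved(name)}")
```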
XML Attributes
XML elements can have attributes within the opening tag.
From XHTML, and stricter versions of HTML, you hopefully remember markup like this: <img src="hacker.jpg" />. The src attribute provides information about the img element. In fact, the img element in XHTML has no content at all (empty content); its only purpose is to point to content by providing the URL of an image.
Attributes are usually used for information not part of the data. In the example below, the file format doesn't matter much for the end user, but is important to the application receiving and parsing the element:
<file format="jpg">hacker.jpg</file>
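In code, the element's text and its attributes are read separately. A small Python sketch with xml.etree.ElementTree; the size attribute is hypothetical and only shows how a missing attribute behaves.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<file format="jpg">hacker.jpg</file>')

# The element's text is the data itself...
print(elem.text)                    # hacker.jpg
# ...while the attribute carries information about that data.
print(elem.get("format"))           # jpg
# get() returns None, or the supplied default, for a missing attribute.
print(elem.get("size", "unknown"))  # unknown
```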
Elements or Attributes?
Data can be stored in either attributes or elements.
Below are examples of use of attributes and elements:
<person sex="male">
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>

<person>
  <sex>male</sex>
  <given_name>Martin</given_name>
  <surname>Carlsson</surname>
</person>
In the first example sex is an attribute with the
value "male". In the second, sex is a child
element with text content.
Both examples provide the same information.
Although there are no rules about when to use attributes and when to use child elements, in my opinion it's best to use child elements if the information feels like the actual data, and attributes if the information feels like data about the data (metadata, roughly speaking).
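Whichever form is chosen affects the code that reads it: an attribute is read with get(), a child element with findtext(). This Python sketch (read_sex is a hypothetical helper) accepts either form of the person example.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = ('<person sex="male"><given_name>Martin</given_name>'
                '<surname>Carlsson</surname></person>')
AS_ELEMENT = ('<person><sex>male</sex><given_name>Martin</given_name>'
              '<surname>Carlsson</surname></person>')


def read_sex(document: str):
    """Return the person's sex whether it is stored as an attribute or a child element."""
    person = ET.fromstring(document)
    # Check the attribute first, then fall back to a <sex> child element.
    return person.get("sex") or person.findtext("sex")


print(read_sex(AS_ATTRIBUTE))  # male
print(read_sex(AS_ELEMENT))    # male
```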
Quoted Attribute Values
In XML, attribute values must be quoted.
XML elements can have attributes in name/value pairs, and, just like in XHTML, each value must be in quotation marks. Of the two XML documents below, the first is incorrect and the second correct:
<?xml version="1.0" encoding="ISO-8859-1"?>
<message date=09/10/2006>
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
</message>
<?xml version="1.0" encoding="ISO-8859-1"?>
<message date="09/10/2006">
  <receiver>Balou</receiver>
  <sender>Boobo</sender>
</message>
The error in the first document is that the date attribute in the message element isn't quoted, which it must be in order to follow the correct XML syntax, i.e. to be well-formed.
This is correct syntax: date="24/03/2006".
This is incorrect syntax: date=24/03/2006.
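A parser enforces this rule. In this Python sketch (xml.etree.ElementTree, with the XML declarations left out for brevity) the unquoted date is rejected, while the quoted one is read back as a plain string. XML accepts single quotes as well as double quotes.

```python
import xml.etree.ElementTree as ET

UNQUOTED = "<message date=09/10/2006><receiver>Balou</receiver><sender>Boobo</sender></message>"
QUOTED = '<message date="09/10/2006"><receiver>Balou</receiver><sender>Boobo</sender></message>'

try:
    ET.fromstring(UNQUOTED)
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

root = ET.fromstring(QUOTED)
print(unquoted_ok)       # False
print(root.get("date"))  # 09/10/2006

# Single quotes are just as valid as double quotes.
single = ET.fromstring("<message date='09/10/2006'/>")
print(single.get("date"))  # 09/10/2006
```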
Assignment Description
Create a CV marked up with XML. No DTD is needed.
My solution, Assignment Files
I haven't included the actual content of my own CV. Instead, I have created the elements and attributes that could be used to build a CV in XML: essentially a simple CV template.
If you don't get an error message when displaying the file below, it means the document is well-formed, i.e. has correct syntax.
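For anyone working on a similar assignment, here is a minimal sketch that builds an empty CV template with Python's xml.etree.ElementTree and checks that the result parses. It is not the solution file referred to above; every element and attribute name in it (cv, personal, education, degree, job, start, end, and so on) is hypothetical. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_cv() -> ET.Element:
    """Build an empty CV template (all element and attribute names are hypothetical)."""
    cv = ET.Element("cv")

    personal = ET.SubElement(cv, "personal")
    ET.SubElement(personal, "given_name")
    ET.SubElement(personal, "surname")
    ET.SubElement(personal, "email")

    education = ET.SubElement(cv, "education")
    # Dates are treated as data about each entry, so they become attributes.
    degree = ET.SubElement(education, "degree", start="", end="")
    ET.SubElement(degree, "school")
    ET.SubElement(degree, "subject")

    experience = ET.SubElement(cv, "experience")
    job = ET.SubElement(experience, "job", start="", end="")
    ET.SubElement(job, "employer")
    ET.SubElement(job, "job_title")
    return cv


cv = build_cv()
ET.indent(cv)  # pretty-print with two-space indentation (Python 3.9+)
xml_text = ET.tostring(cv, encoding="unicode")
print(xml_text)

# Round-trip through the parser to confirm the output is well-formed.
assert ET.fromstring(xml_text).tag == "cv"
```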
