DTD and XML, Part 1 - Introduction
The purpose of this assignment was to:
- Learn what DTD is.
- Learn how and why to put constraints on an XML document by using a DTD.
What is a DTD?
A DTD (Document Type Definition) defines the structure of a document with a list of allowed elements and attributes.
Why should/could a DTD be used?
There are several advantages to using DTDs, and they become more obvious as the size and complexity of the XML grows. Because almost all non-trivial software that uses XML benefits from a DTD, it is essential for document authors to understand how to write one.
There are two main reasons for XML authors to use DTDs for their XML documents:
- Documentation. A developer can look at the DTD of an XML document and immediately understand its structure. This makes it easy for independent groups to agree upon a common DTD for interchanging data.
- Validation. Document validation means passing an XML document through a validating XML parser, which reads the DTD and compares it with the XML markup to ensure that elements appear in the correct order, that mandatory elements and attributes are in place, and that no undeclared elements or attributes have been inserted. Working with validated data makes life much easier for a developer. If a document is known to be valid, its structure is completely predictable. There is far less need to clutter the code with structural error checks or assertions; if the document validates, it can be taken for granted that the data will be there in the expected layout.
DTD Declaration
A DTD can be declared as an internal reference (i.e. inline in your XML document), or as an external reference (points to a separate file).
Internal DOCTYPE declaration
If a DTD is included directly in the XML document, a DOCTYPE definition with the following syntax should be used:
<!DOCTYPE root-element [element-declarations]>
Example of an XML document with an internal DTD declaration:
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (receiver,sender,subject,content)>
  <!ELEMENT receiver (#PCDATA)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
]>
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>
The DTD is interpreted by an XML parser like this:
!DOCTYPE message (second row) declares that the root element of this document is message.
!ELEMENT message (third row) defines the message element to contain exactly these four child elements, in this order: receiver, sender, subject, content.
!ELEMENT receiver (fourth row) defines the receiver element to be of the type "#PCDATA", i.e. parsed character data (plain text).
!ELEMENT sender (fifth row) defines the sender element to be of the type "#PCDATA".
!ELEMENT subject (sixth row) defines the subject element to be of the type "#PCDATA".
!ELEMENT content (seventh row) defines the content element to be of the type "#PCDATA".
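To tie this back to validation, the following sketch reuses the message DTD from the example above with a document body that is well-formed but not valid. The priority element is a hypothetical name introduced only for this illustration. A validating parser would be expected to reject the document for the reasons noted in the comments; the exact error messages depend on the parser used.

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (receiver,sender,subject,content)>
  <!ELEMENT receiver (#PCDATA)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
]>
<message>
  <!-- Wrong order: the DTD requires receiver before sender -->
  <sender>Lenny</sender>
  <receiver>Buck</receiver>
  <subject>Welcome</subject>
  <!-- Undeclared element: priority (hypothetical) is not in the DTD -->
  <priority>high</priority>
  <!-- Missing element: the mandatory content element is absent -->
</message>
```

A non-validating parser would accept this document, since every tag is properly opened and closed. Only the comparison against the DTD reveals the problems.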
External DOCTYPE declaration
If the DTD is kept in a separate (external) .dtd file, a DOCTYPE definition with the following syntax should be used:
<!DOCTYPE root-element SYSTEM "URI/URL or System path to .dtd file">
or
<!DOCTYPE root-element PUBLIC "Public identifier" "URI/URL or System path to .dtd file">
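A well-known instance of the PUBLIC form is the DOCTYPE declaration used by XHTML 1.0 Strict documents, shown here purely as an illustration. The first quoted string is a name that identifies the DTD, and the second is the location from which the DTD can be fetched.

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

The SYSTEM form is usually sufficient for a DTD that is private to one project, while the PUBLIC form is typically used for widely shared DTDs such as this one.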
The same message document as before, but now with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <receiver>Buck</receiver>
  <sender>Lenny</sender>
  <subject>Welcome</subject>
  <content>Welcome Buck!</content>
</message>
And this is a copy of the external .dtd file "message.dtd", containing the DTD:
<!ELEMENT message (receiver,sender,subject,content)>
<!ELEMENT receiver (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
This was part 1 of the DTD and XML assignment. In part 2 you will learn about the components of XML documents seen from a DTD perspective, and how to use them for the markup declarations in the DTD.
