14-Jun-06 (Created: 14-Jun-06)

An XSD primer and a few references

Let us start with data


(article)
	(title/)
	(author/)
	(keywords/)
	(publish-date/)
(/article)

Because this article writes about XML inside XML, I have used round brackets in place of angle brackets. If you cut and paste the code, change them back to angle brackets.

Basic XSD definition with one element


(?xml version="1.0" encoding="UTF-8"?)
(!--
A schema is itself an XML document. The XML declaration above, when present,
must be the very first thing in the file. Not even a comment may come before it.
--)

(!--The root element of an xml schema --)
(xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
	elementFormDefault="qualified" 
	attributeFormDefault="unqualified")
	(xs:element name="article")
		(xs:annotation)
			(xs:documentation)
				Comment describing your root element
			(/xs:documentation)
		(/xs:annotation)
	(/xs:element)	
(/xs:schema)

The XML Schema vocabulary lives in a namespace identified by the URL at w3.org, bound here to the prefix "xs". Setting "elementFormDefault" to "qualified" requires locally declared elements in the target XML document to be namespace qualified, while setting "attributeFormDefault" to "unqualified" means attributes do not have to be qualified with a namespace. Since this schema declares no target namespace, neither setting has a visible effect yet. So far the schema defines one element, called "article", which carries a comment in the form of an xs:annotation.
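
As a rough illustration, a target document can point a validator at this schema with the xsi:noNamespaceSchemaLocation hint. The file name "article.xsd" is an assumption made for this sketch; nothing above says where the schema is stored.

(?xml version="1.0" encoding="UTF-8"?)
(article xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="article.xsd"/)

Since "article" has no declared type yet, a validator should accept any content inside it at this stage.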

Next level XSD with children for the article element added


(?xml version="1.0" encoding="UTF-8"?)
(xs:schema 
   xmlns:xs="http://www.w3.org/2001/XMLSchema"   
   elementFormDefault="qualified" 
   attributeFormDefault="unqualified")
	(!--The root element of your xml document --)
	(xs:element name="article")
	  (xs:complexType)
	     (xs:all)
		(xs:element name="author" type="xs:string"/)
		(xs:element name="title" type="xs:string"/)
		(xs:element name="publish-date" type="xs:date"/)
		(xs:element name="keywords" minOccurs="0" type="xs:string"/)
	   (/xs:all)
	 (/xs:complexType)
	(/xs:element)
(/xs:schema)

Loosely paraphrased in programming-language terms, this is equivalent to the following:


public class article
{
	complextype all of (author, title, publish-date, keywords)
}

The element article is defined as a collection of the named child elements. The "xs:all" group has specially defined semantics and appears to be a shortcut for a common usage pattern: the children can appear in any order, but no child can occur more than once. A child is made optional by setting its "minOccurs" attribute to 0. Here the keywords element is the optional element in the group.
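
A sketch of a target document that should validate against this schema, assuming the optional attribute is spelled minOccurs. The values are made up; the children are deliberately in a different order from the schema, and keywords is left out.

(article)
	(title)An XSD primer(/title)
	(publish-date)2006-06-14(/publish-date)
	(author)Some Author(/author)
(/article)

Adding a second author element to this document would make it invalid, because xs:all allows each child at most once.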

To force an order on those elements we can use xs:sequence


(?xml version="1.0" encoding="UTF-8"?)
(xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema")
	(!--The root element of your xml document --)
	(xs:element name="article")
	  (xs:complexType)
	    (xs:sequence)
		(xs:element name="author" type="xs:string"/)
		(xs:element name="title" type="xs:string"/)
		(xs:element name="publish-date" type="xs:date"/)
		(xs:element name="keywords" minOccurs="0" type="xs:string"/)
	   (/xs:sequence)
	  (/xs:complexType)
	(/xs:element)
(/xs:schema)

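With xs:sequence the elements have to appear in the order in which they are declared. One point of flexibility is that an element in a sequence can be allowed to occur more than once (by raising its maxOccurs), as long as the repetitions stay in that element's declared position.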
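
A small sketch of that flexibility, assuming we want to allow several authors. The maxOccurs="unbounded" on author is added here for illustration and is not part of the schema above.

(xs:sequence)
	(xs:element name="author" type="xs:string" maxOccurs="unbounded"/)
	(xs:element name="title" type="xs:string"/)
	(xs:element name="publish-date" type="xs:date"/)
(/xs:sequence)

A matching target document repeats author, but only in the author position:

(article)
	(author)First Author(/author)
	(author)Second Author(/author)
	(title)An XSD primer(/title)
	(publish-date)2006-06-14(/publish-date)
(/article)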

The story of minOccurs and maxOccurs

According to the spec, if both are omitted then the element must occur once and only once. This means minOccurs and maxOccurs are both 1 by default. So if you set only an element's minOccurs to 0, it becomes optional and can occur at most once. maxOccurs also accepts the special value "unbounded", which means there is no upper limit. Both attribute names are case sensitive and must be written exactly as minOccurs and maxOccurs.
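
The combinations can be sketched side by side. The element names "subtitle", "keyword" and "reviewer" are hypothetical, chosen only to give each case its own declaration.

(xs:sequence)
	(!-- Neither attribute given: exactly one title --)
	(xs:element name="title" type="xs:string"/)
	(!-- Optional: zero or one subtitle --)
	(xs:element name="subtitle" type="xs:string" minOccurs="0"/)
	(!-- Between zero and five keyword elements --)
	(xs:element name="keyword" type="xs:string" minOccurs="0" maxOccurs="5"/)
	(!-- Zero or more reviewer elements, with no upper limit --)
	(xs:element name="reviewer" type="xs:string" minOccurs="0" maxOccurs="unbounded"/)
(/xs:sequence)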

An alternative to inline or anonymous typing of XML elements


(?xml version="1.0" encoding="UTF-8"?)
(xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema")

	(!--Define your type first --) 
	(xs:complexType name="ArticleType")
		(xs:sequence)
			(xs:element name="author" type="xs:string"/)
			(xs:element name="title" type="xs:string"/)
			(xs:element name="publish-date" type="xs:date"/)
			(xs:element name="keywords" minOccurs="0" type="xs:string"/)
		(/xs:sequence)
	(/xs:complexType)
	
	(!-- Then define your element --)
	(xs:element name="article" type="ArticleType"/)
(/xs:schema)

This code first defines a complex type called "ArticleType" and then declares an element of that type. Because the type has a name, it is reusable. Beyond reuse, separating out named types makes the schema easier to read, as you don't have to nest the definitions.
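
To sketch the reuse, a second element can be declared with the same named type. The element name "draft" is hypothetical and used only for illustration.

(xs:element name="article" type="ArticleType"/)
(xs:element name="draft" type="ArticleType"/)

Both elements then share the same content model, and a change to ArticleType applies to both.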

Namespaces and XSD

There is an annoying amount of detail in how namespaces are organized between an XSD and the target XML document. The first of Aaron Skonnard's articles listed below explains this detail as well as anyone could. This being a primer, I have decided not to tackle that area at this time.

References

Part 1 of A Quick Guide to XML Schema by Aaron Skonnard

Part 2 of A Quick Guide to XML Schema by Aaron Skonnard

XML Schema Part 0

XML Schema Part 1: Structures

XML Schema Part 2: Datatypes

XML Knowledge Folder where I collect XML related information