CSCI 5733
XML Application Development
Spring 2006
Homework #2

Due date: Feb 16, 2006 (Thursday)

(1) Data types of XML Schema (20%) This question concerns the mapping of XML Schema structures to Java’s built-in constructs and objects, a mapping that some applications can benefit from. For example, the built-in XML Schema type xsd:string can simply be mapped to Java’s String object.

If an XML Schema feature is not directly supported by Java, compliance must be checked at runtime, which reduces reliability and efficiency. Alternatively, user-defined objects can be used, but they also add complexity and overhead.

For example, consider the XML Schema <choice> element in the following fragment:

<xsd:complexType name="PurchaseOrder">
  <xsd:sequence>
    <xsd:choice>
      <xsd:element name="BritishAddress" type="xsd:string"/>
      <xsd:element name="USAddress" type="xsd:string"/>
    </xsd:choice>
    <xsd:element name="items" type="xsd:string" minOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>

A language that supports variant records, such as Pascal, provides direct support for <choice>. Java, however, has no support for variant records.
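
To make the point concrete, the <choice> above could be emulated in Java with a user-defined class that enforces the "exactly one alternative" rule at runtime. The following is only a sketch under that assumption; the class name AddressChoice and its methods are hypothetical and are not part of any standard schema-to-Java mapping.

```java
/**
 * Hypothetical stand-in for the xsd:choice inside PurchaseOrder.
 * Java's type system cannot express "exactly one of these two elements",
 * so the rule is checked when the object is created.
 */
final class AddressChoice {
    private final String britishAddress; // non-null only for the British variant
    private final String usAddress;      // non-null only for the US variant

    private AddressChoice(String britishAddress, String usAddress) {
        this.britishAddress = britishAddress;
        this.usAddress = usAddress;
    }

    /** Creates a choice; exactly one argument must be non-null. */
    static AddressChoice of(String britishAddress, String usAddress) {
        // Runtime compliance check that a variant record would make unnecessary.
        if ((britishAddress == null) == (usAddress == null)) {
            throw new IllegalArgumentException(
                "exactly one of BritishAddress and USAddress must be present");
        }
        return new AddressChoice(britishAddress, usAddress);
    }

    boolean isBritish() {
        return britishAddress != null;
    }

    String getAddress() {
        return isBritish() ? britishAddress : usAddress;
    }
}
```

Note that both the extra class and the check inside of() are overhead of the kind described above: neither would be needed if the language supported the construct directly.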

List at least two more examples of XML Schema constructs that have no direct support in Java.

(2) (20%) Consider Google’s sitemap protocol, which helps its search engine crawl Web pages: https://www.google.com/webmasters/sitemaps/docs/en/protocol.html.  Sitemap files are XML documents that conform to the following XML Schema: http://www.google.com/schemas/sitemap/0.84/sitemap.xsd

Study the protocol.

(a) Give two examples of Google sitemap constraints (as described by the sitemap protocol, including the FAQ) that are not specified by the sitemap XML Schema. For example, the 10MB size limit on a sitemap is not, and cannot be, specified by the XML Schema.

(b) If relative URLs were to be allowed, describe how you would modify the Google sitemap XML Schema to accommodate them.

(c) Describe a major trade-off consideration for allowing relative URLs.

(3) (60%) Write a standalone Java program, XPathCounter.java, that uses SAX (no other parser API is acceptable) to accept an XML file and return a list of the unique XPaths of its elements and attributes, together with the number of occurrences of each. The basic XPath syntax is relatively simple, and the output format can be understood from the following examples.

For:

java courses.xml.Spring2006.XPathCounter XPathCounterTest_1.xml > XPathCounterTest_Output_1.txt

The input and output files are XPathCounterTest_1.xml and XPathCounterTest_Output_1.txt respectively.

For:

java courses.xml.Spring2006.XPathCounter XPathCounterTest_2.xml > XPathCounterTest_Output_2.txt

The input and output files are XPathCounterTest_2.xml and XPathCounterTest_Output_2.txt respectively.
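
As a possible starting point for the SAX requirement in question (3), the event-driven style of the API can be sketched as below. This is not the assignment's solution and does not follow its output format: it only counts element names, ignores attributes, and builds no XPaths. The class name ElementNameCounter and the inline XML string are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that counts how often each element name starts. */
final class ElementNameCounter extends DefaultHandler {
    // TreeMap keeps the names sorted, so the printed result is deterministic.
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // SAX calls this once per start tag, in document order.
        counts.merge(qName, 1, Integer::sum);
    }

    /** Parses the given XML text and returns element name -> occurrences. */
    static Map<String, Integer> count(String xml) {
        ElementNameCounter handler = new ElementNameCounter();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/><note>rush</note></order>";
        System.out.println(count(xml)); // prints {item=2, note=1, order=1}
    }
}
```

Because SAX delivers events one at a time and keeps no tree in memory, any context a handler needs, such as where the current element sits in the document, has to be tracked by the handler itself.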

Turn in your program listing and put your program XPathCounter.java in the hw directory.