CSCI 5733.01
XML Application Development
Summer 2004
Mid-Term Examination

Name:  _________________________________

Time allowed: one hour and 30 minutes.  Total score: 30 points.

Open: lecture notes, every file I wrote and posted on my Web page, and your project assignments, but no books.

Answer all questions.  Turn in both question and answer sheets.  Plan your time well.

The academic honesty policy will be followed strictly.  Cheating will result in a failing grade of D or below and a permanent academic record!

(1) [8 points] Write a Java program, ConvertToCsv.java, using JAXP/SAX (other techniques are not acceptable) that reads an XML file and writes its contents to the standard output as comma-separated values.

Your program needs to handle only input XML documents that are validated by the following DTD, authors.dtd:

<!ELEMENT authors (author*)>
<!ELEMENT author (name, book+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (#PCDATA)>

Your program should output a comma-separated values (CSV) document with one author per line. The format of each line should be:

name,book,...,book

Each value is enclosed in a pair of double quotes ("). Inside a quoted value, a double quote is represented by two double quotes ("").
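
The quoting rule alone can be sketched as follows. This is only an illustration of the rule, not part of the required SAX program; the class name CsvQuoteSketch and the helper quote are hypothetical names that do not appear in the skeleton.

```java
public class CsvQuoteSketch {
    // Hypothetical helper: doubles every embedded double quote,
    // then wraps the whole value in a pair of double quotes.
    public static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // A value without quotes is simply wrapped: "Bun Yue"
        System.out.println(quote("Bun Yue"));
        // Embedded quotes are doubled: """His Life"""
        System.out.println(quote("\"His Life\""));
    }
}
```

Note that the SAX parser delivers entity references such as &quot; already expanded, so a helper of this kind would see the character " rather than the entity.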

Thus, running

java ConvertToCsv authors.xml

where the content of authors.xml is:

<?xml version="1.0"?>
<!DOCTYPE authors SYSTEM "authors.dtd">
<authors>
  <author>
    <name>Bun Yue</name>
    <book>&quot;His Life&quot;</book>
    <book>&lt;xml&gt; for &quot;dummy&quot;</book>
  </author>
  <author>
    <name>Sadegh Davari</name>
    <book>Real-time Systems</book>
    <book>The guide to Computer Science Degrees</book>
    <book>Internet Development</book>
  </author>
</authors>

the output of your program (two lines) should be:

"Bun Yue","""His Life""","<xml> for ""dummy"""
"Sadegh Davari","Real-time Systems","The guide to Computer Science Degrees","Internet Development"

A skeleton is provided for you; you only need to define the handler methods that update the private static data member String csv, which stores the comma-separated values under construction.

import java.io.*;

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class ConvertToCsv extends DefaultHandler
{
    public static void main(String argv[])
    {
        if (argv.length != 1) {
            System.err.println("Usage: java ConvertToCsv filename");
            System.exit(1);
        }
  
        // Use an instance of ourselves as the SAX event handler
        DefaultHandler handler = new ConvertToCsv();
        // Use the default (non-validating) parser
        SAXParserFactory factory = SAXParserFactory.newInstance();

        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), handler);
            // Print the output
            System.out.println(csv);
        } catch (Throwable t) {
            t.printStackTrace();
        }
   }
  
   //   Data members
   static private StringBuffer text = new StringBuffer();
   static private String csv = "";

  // Define your handler methods here
  // ...



   //   Helping methods.
   public void characters(char[] ch,
      int start,
      int length)
      throws SAXException
   {   text.append(new String(ch, start, length));
   }   //   characters

   public void ignorableWhitespace(char[] ch,
      int start,
      int length)
      throws SAXException
   {   text.append(new String(ch, start, length));
   }   //   ignorableWhitespace

   public String getText() {
      String result = text.toString();
      text.setLength(0);
      return result;
   }
}   //   ConvertToCsv.

(2) [8 points] Write a CGI-Perl program (other techniques are not acceptable) that reads the XML document stored at the URL:

http://dcm.uhcl.edu/yue/authors.xml

The XML document authors.xml has the following DTD, authors.dtd:

<!ELEMENT authors (author*)>
<!ELEMENT author (name, book+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (#PCDATA)>

Your program should print the textual content of the XML document by removing all PI, DOCTYPE and element tags. For simplicity, assume that there are no comments or CDATA sections in authors.xml.

If the content of authors.xml is:

<?xml version="1.0"?>
<!DOCTYPE authors SYSTEM "authors.dtd">
<authors>
  <author>
    <name>Bun Yue</name>
    <book>&quot;His Life&quot;</book>
    <book>&lt;xml&gt; for &quot;dummy&quot;</book>
  </author>
  <author>
    <name>Sadegh Davari</name>
    <book>Real-time Systems</book>
    <book>The guide to Computer Science Degrees</book>
    <book>Internet Development</book>
  </author>
</authors>

the output of your program should be:


 
    Bun Yue
    "His Life"
    <xml> for "dummy"
 
 
    Sadegh Davari
    Real-time Systems
    The guide to Computer Science Degrees
    Internet Development
 

Note that there are two blank lines before "Bun Yue" because removing the element tags <authors> and <author> before the first <name> element does not remove the two line breaks that follow them.

Your program should set the MIME type to text/plain. You may assume that there are no line breaks in the element values of authors.xml.

The first four lines are done for you:

use LWP::Simple;
use strict;
my $url = "http://dcm.uhcl.edu/yue/authors.xml";
my $urlContents = get($url);


(3) [5 points] True or false. No justification is necessary.

(a) In Java's DOM binding, Node is a superclass of ProcessingInstruction.
(b) In XML Schema, an element may have a type of simpleType.
(c) The XML Schema's element <attribute> can be used inside the declaration of a simpleType.
(d) All W3C DOM standards apply to XML.
(e) It is not possible to use the entity &nbsp; in any XML document.

(4) Short questions

(a) [3 points] Consider the following DTD, x.dtd:

<!ELEMENT a (b,c)>
<!ELEMENT b ANY>
<!ELEMENT c (#PCDATA|b)*>

Will it validate the following XML document, x.xml?

<?xml version="1.0"?>
<!DOCTYPE b SYSTEM "x.dtd">
<b>
  <a>
    <b>
      <c>hello<b /><b><d /></b></c><c />
    </b>
  </a>
</b>

If there are errors, point them out.

(b) [3 points] Provide the definition of the type DivisibleByFourType in XML Schema, which will accept only strings representing integers divisible by 4: 0, 4, 8, 12, ...

Hint: you may need to analyze the last two digits. To make things simpler, leading zeros are acceptable.
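
The reason the last two digits are enough is that 100 = 4 x 25, so any integer n = 100q + r has the same remainder modulo 4 as its last two digits r. The Java sketch below only illustrates this arithmetic fact; it is not an XML Schema definition. The class name DivisibleByFourSketch and the method divisibleByFour are hypothetical, and the method assumes its argument is a non-empty string of decimal digits.

```java
public class DivisibleByFourSketch {
    // Assumption: s is a non-empty string of decimal digits (leading zeros allowed).
    // Only the last two digits are examined, since 100 is a multiple of 4.
    public static boolean divisibleByFour(String s) {
        String tail = s.length() <= 2 ? s : s.substring(s.length() - 2);
        return Integer.parseInt(tail) % 4 == 0;
    }

    public static void main(String[] args) {
        String[] samples = {"0", "4", "12", "13", "100", "0008", "1234567890123456789012"};
        for (String s : samples) {
            System.out.println(s + " -> " + divisibleByFour(s));
        }
    }
}
```

Because only two characters are ever parsed, the check works for digit strings far longer than any built-in integer type could hold.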

(c) [3 points] Consider the following DTD declarations:

<!ELEMENT P (#PCDATA|Q)*>
<!ELEMENT Q EMPTY>

Assume that the Java variable a is of the interface Node in Java's DOM and refers to a <P> element. Give the Java code for printing the textual content of the node a.