CSCI 5733.1
XML Application Development
Summer 2004
Mid-Term Examination

Name:  _________________________________

Time allowed: one hour and 30 minutes.  Total score: 30 points.

Open: lecture notes, every file I wrote and posted on my Web page, and your project assignments, but no books.

Answer all questions.  Turn in both question and answer sheets.  Plan your time well.

Academic honesty policy will be followed strictly.  Cheating will result in a failing grade of D or below and a permanent academic record! 

(1) [8 points] Write a Java program, XmlToText.java, using JAXP/SAX (other techniques are not acceptable) to read an XML file and write its text contents to the standard output. The program simply outputs the character content of all elements (i.e. it removes all processing instructions, element tags, comments and the DOCTYPE declaration).

For example, running the program

java XmlToText xmlToText1.xml

where xmlToText1.xml is:

<?xml version='1.0' encoding='us-ascii'?>
<slideshow>
    <slide type="all">
      <title>Wake up to Ice cream!</title>
    </slide>
    <slide type="all">
      <title>Overview</title>
      <item>Why <em>Ice cream</em> are great</item>
      <item/>
      <item>Who <em>buys</em> Ice cream</item>
    </slide>
</slideshow>

produces the output:



      Wake up to Ice cream!


      Overview
      Why Ice cream are great

      Who buys Ice cream

Note that, for example, there are two lines before "Wake up to Ice cream!" because removing the element tags <slideshow> and <slide> does not remove the line breaks that come after them.
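
The whitespace behavior described in the note can be seen in a minimal sketch such as the one below. It is an illustration only, not a model answer: the class name XmlToTextSketch and the helper extractText are hypothetical, the input is an in-memory string rather than a file named on the command line, and it assumes no DTD is involved (with a DTD, some parsers report whitespace in element content through ignorableWhitespace() instead of characters()).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class; the exam asks for XmlToText.java reading a file.
final class XmlToTextSketch {

    // Returns only the character data of the document. Tags, comments and
    // processing instructions never reach characters(), so they are dropped.
    static String extractText(String xml) {
        final StringBuilder out = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Line breaks and indentation between tags arrive here too.
                        out.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<slideshow>\n  <slide>\n    <title>Overview</title>\n  </slide>\n</slideshow>";
        System.out.print(extractText(xml));
    }
}
```

Because the handler overrides only characters(), the line breaks and indentation that follow each removed tag are passed through unchanged.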

(2) [8 points] Write a CGI-Perl program (other techniques are not acceptable) to read the XML document stored at the URL:

http://dcm.uhcl.edu/yue/authors.xml

The XML document authors.xml has the following DTD, authors.dtd:

<!ELEMENT authors (author*)>
<!ELEMENT author (name, book+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT book (#PCDATA)>

Your program should output a comma-separated values (CSV) document, storing one author per line. The format should be:

author,book,...,book.

Each value is enclosed in a pair of double quotes ("). Inside a quoted value, a literal " is represented by two consecutive double quotes ("").

If the content of authors.xml is:

<?xml version="1.0"?>
<!DOCTYPE authors SYSTEM "authors.dtd">
<authors>
  <author>
    <name>Bun Yue</name>
    <book>&quot;His Life&quot;</book>
    <book>&lt;xml&gt; for &quot;dummy&quot;</book>
  </author>
  <author>
    <name>Sadegh Davari</name>
    <book>Real-time Systems</book>
    <book>The guide to Computer Science Degrees</book>
    <book>Internet Development</book>
  </author>
</authors>

the output of your program (two lines) should be:

"Bun Yue","""His Life""","<xml> for ""dummy"""
"Sadegh Davari","Real-time Systems","The guide to Computer Science Degrees","Internet Development"
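
The quoting rule can be checked against the first expected line above with a small sketch. It is written in Java only to illustrate the rule (the question itself requires CGI-Perl); the names CsvQuoteSketch, quote and toCsvLine are hypothetical, and the values are assumed to have had their XML entities (&quot;, &lt;, &gt;) decoded already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the CSV quoting rule only.
final class CsvQuoteSketch {

    // Doubles every embedded double quote, then wraps the value in double quotes.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Quotes each value and joins the results with commas to form one CSV line.
    static String toCsvLine(List<String> values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quote(values.get(i)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Values as they read after entity decoding of the first <author>.
        List<String> author = Arrays.asList("Bun Yue", "\"His Life\"", "<xml> for \"dummy\"");
        System.out.println(toCsvLine(author));
    }
}
```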

Your program should set the MIME type to text/plain. You may assume that there are no line breaks in the element values of authors.xml.

The first four lines are done for you:

use LWP::Simple;
use strict;
my $url = "http://dcm.uhcl.edu/yue/authors.xml";
my $urlContents = get($url);


(3) [5 points] True or false. No justification is necessary.

(a) DocumentFragment is a lightweight class in Java's DOM binding.
(b) In XML Schema, a complexType cannot have simpleContent as its content.
(c) maxOccurs is an attribute of the XML Schema element <attribute>.
(d) All W3C DOM standards apply to both XML and HTML.
(e) An XML document with CDATA sections can always be converted to an equivalent one without CDATA sections.

(4) Short questions

(a) [3 points] Consider the following DTD, x.dtd:

<!ELEMENT a (b,c)>
<!ELEMENT b ANY>
<!ELEMENT c (#PCDATA|b)*>

Will it validate the following XML document, x.xml?

<?xml version="1.0"?>
<!DOCTYPE b SYSTEM "x.dtd">
<b>
  <b>
    <c>hello<b /><b /></c><c />
  </b>
</b>

If there are errors, correct x.xml so it can be validated by x.dtd.

(b) [3 points] Provide the definition of the type OddNumberType in XML Schema, which will accept only strings representing odd numbers: ... -13, -11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, ...

(c) [3 points] Consider the following DTD declaration:

<!ELEMENT P (Q)>
<!ELEMENT Q (#PCDATA)>

Assume that the Java variable a is declared with the interface type Node from Java's DOM binding and refers to a <P> element. Give the Java code for printing out the textual content of the node a.
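
The setting of this question can be reproduced with a minimal sketch like the following, offered as one possible illustration rather than the expected answer. The class DomTextSketch and the helpers parseRoot and textOfP are hypothetical. The sketch assumes the parser may leave whitespace text nodes around <Q>, so it skips non-element children of <P> instead of relying on getFirstChild() alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: builds a <P> node and reads the text inside its <Q> child.
final class DomTextSketch {

    // Parses an XML string and returns its root element as a Node.
    static Node parseRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Concatenates the text and CDATA children of every element child of a.
    // Under the DTD above, the only element child of <P> is a single <Q>.
    static String textOfP(Node a) {
        StringBuilder text = new StringBuilder();
        for (Node child = a.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes around <Q>
            }
            for (Node t = child.getFirstChild(); t != null; t = t.getNextSibling()) {
                short type = t.getNodeType();
                if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                    text.append(t.getNodeValue());
                }
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Node a = parseRoot("<P>\n  <Q>hello world</Q>\n</P>");
        System.out.println(textOfP(a));
    }
}
```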