CSCI 5733
XML Application Development
Spring 2006
Homework #3

Due date: March 21, 2006 (Tuesday)

(1) This programming assignment emphasizes integrating XML parsing with rigorous testing. DOM must be used, but the idea applies to any XML parser.

Write a Java program, EqualXML.java, using the DOM parser, to test whether two input XML files have equal contents. The command line should be of the following format:

java EqualXML EqualXmlTest1a.xml EqualXmlTest1b.xml

which tests whether EqualXmlTest1a.xml and EqualXmlTest1b.xml are equal.
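As a starting point, loading the two files with the DOM parser might look like the sketch below. It uses only the standard JAXP classes. The class name LoadSketch and the helpers loadRoot and loadBoth are hypothetical names chosen for illustration; they are not part of the assignment.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class for illustration; the assignment requires EqualXML.java.
final class LoadSketch {

    // Parses one XML file with the JAXP DOM parser and returns its root element.
    static Element loadRoot(String path) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File(path));
        return doc.getDocumentElement();
    }

    // Checks that exactly two file names were given and loads both roots.
    static Element[] loadBoth(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("two XML file names are required");
        }
        return new Element[] { loadRoot(args[0]), loadRoot(args[1]) };
    }
}
```

A main method would typically pass its args array to loadBoth and then compare the two returned root elements.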

The output format should be:

Contents (elements, attributes and text) of the two input XML files, EqualXmlTest1a.xml and EqualXmlTest1b.xml, are different.

if the XML files are not equal.

It should be:

Contents (elements, attributes and text) of the two input XML files, EqualXmlTest1a.xml and EqualXmlTest1b.xml, are equal.

if the XML files are equal.
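Because the output is checked against an exact format, it may help to build the message in one place. The following is a small sketch; the class ReportSketch and the method report are hypothetical names, and only the message wording comes from the text above.

```java
// Hypothetical helper; only the message wording is taken from the assignment.
final class ReportSketch {

    // Builds the one-line result message for the two file names.
    static String report(String fileA, String fileB, boolean equal) {
        return "Contents (elements, attributes and text) of the two input XML files, "
                + fileA + " and " + fileB + ", are "
                + (equal ? "equal" : "different") + ".";
    }
}
```

The program would then print the returned string with System.out.println.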

The program compares only the root elements of the two input XML documents, that is, their element contents (including attributes) and text contents. Two elements are equal if they have equal child elements in the same order and equal attribute sets. For the comparison, text nodes consisting solely of whitespace are ignored. This is sometimes called normalization.
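One possible shape for the recursive comparison is sketched below. It is not a complete solution: it looks only at element nodes and non-whitespace text nodes, compares text exactly (an assumption, since trimming is not specified), and leaves comments, processing instructions and CDATA sections to be handled separately. The names CompareSketch, significantChildren, sameAttributes and equalElements are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: handles elements, attributes and plain text nodes only.
final class CompareSketch {

    // Convenience parser for in-memory XML strings (used here for experiments).
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Keeps child elements and text nodes that are not whitespace-only.
    static List<Node> significantChildren(Element e) {
        List<Node> out = new ArrayList<>();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean isElement = c.getNodeType() == Node.ELEMENT_NODE;
            boolean isText = c.getNodeType() == Node.TEXT_NODE
                    && !c.getNodeValue().trim().isEmpty();
            if (isElement || isText) {
                out.add(c);
            }
        }
        return out;
    }

    // Same number of attributes, and each name maps to the same value.
    static boolean sameAttributes(Element a, Element b) {
        NamedNodeMap x = a.getAttributes();
        NamedNodeMap y = b.getAttributes();
        if (x.getLength() != y.getLength()) {
            return false;
        }
        for (int i = 0; i < x.getLength(); i++) {
            Node attr = x.item(i);
            Node other = y.getNamedItem(attr.getNodeName());
            if (other == null || !other.getNodeValue().equals(attr.getNodeValue())) {
                return false;
            }
        }
        return true;
    }

    // Equal tag names, equal attribute sets, and pairwise-equal children in order.
    static boolean equalElements(Element a, Element b) {
        if (!a.getTagName().equals(b.getTagName()) || !sameAttributes(a, b)) {
            return false;
        }
        List<Node> ca = significantChildren(a);
        List<Node> cb = significantChildren(b);
        if (ca.size() != cb.size()) {
            return false;
        }
        for (int i = 0; i < ca.size(); i++) {
            Node x = ca.get(i);
            Node y = cb.get(i);
            if (x.getNodeType() != y.getNodeType()) {
                return false;
            }
            if (x.getNodeType() == Node.ELEMENT_NODE) {
                if (!equalElements((Element) x, (Element) y)) {
                    return false;
                }
            } else if (!x.getNodeValue().equals(y.getNodeValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Looking attributes up by name, rather than by position, keeps the comparison independent of attribute order.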

Comments are ignored entirely. Processing instructions are not compared but they serve as separators for text nodes. For example, for

<a>x<?pi ?>y</a>

there are two text nodes in <a>: "x" and "y".

On the other hand, for

<a>x<!-- some comment -->y</a>

there is only one text node: "xy".
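The two examples above can be reproduced with a small sketch that walks the children of an element, joins text across comments, and starts a new text segment at any other node such as a processing instruction or a child element. This is one possible approach under those assumptions, not the required one; TextSegmentsSketch, textSegments and flush are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of splitting an element's text into comparable segments.
final class TextSegmentsSketch {

    // Convenience parser for in-memory XML strings.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Returns the text segments directly inside e, in document order.
    static List<String> textSegments(Element e) {
        List<String> segments = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            switch (c.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    run.append(c.getNodeValue()); // text keeps accumulating
                    break;
                case Node.COMMENT_NODE:
                    break;                        // comments do not split text
                default:
                    flush(run, segments);         // PIs and elements act as separators
            }
        }
        flush(run, segments);
        return segments;
    }

    // Stores the accumulated text unless it is empty or whitespace-only.
    private static void flush(StringBuilder run, List<String> segments) {
        if (run.toString().trim().length() > 0) {
            segments.add(run.toString());
        }
        run.setLength(0);
    }
}
```

With this sketch, textSegments returns two segments for the processing-instruction example and a single segment for the comment example.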

User-defined entity references are ignored entirely. CDATA sections are simply text contents and are not compared separately. Since attributes form a set, their order should not be considered during comparison.
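The sketch below illustrates two of these points, assuming the standard JAXP factory: setCoalescing(true) asks the parser to turn CDATA sections into ordinary text, and copying attributes into a sorted map makes the comparison independent of attribute order. The names NormalizeSketch, parse and attributeMap are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical sketch: CDATA treated as text, attributes treated as a set.
final class NormalizeSketch {

    // Parses an XML string with CDATA sections converted to plain text nodes.
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true); // CDATA becomes text, merged with adjacent text
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Copies the attributes into a map keyed by name, so order no longer matters.
    static Map<String, String> attributeMap(Element e) {
        Map<String, String> map = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            map.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        return map;
    }
}
```

Two elements then have equal attribute sets exactly when their maps are equal under Map.equals.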

Your program does not need to handle namespaces.

You may assume that the input XML documents are well-formed and are not using any DTD.

Test cases

Test run your program with the following test cases:

  1. EqualXmlTest1a.xml and EqualXmlTest1b.xml.
  2. EqualXmlTest2a.xml and EqualXmlTest2b.xml.
  3. EqualXmlTest3a.xml and EqualXmlTest3b.xml.
  4. EqualXmlTest4a.xml and EqualXmlTest4b.xml.
  5. EqualXmlTest5a.xml and EqualXmlTest5b.xml.
  6. EqualXmlTest6a.xml and EqualXmlTest6b.xml.
  7. EqualXmlTest7a.xml and EqualXmlTest7b.xml.
  8. EqualXmlTest8a.xml and EqualXmlTest8b.xml.
  9. EqualXmlTest9a.xml and EqualXmlTest9b.xml.

Make sure that you save the source of each test case rather than copying and pasting what IE displays in the browser window. Furthermore, study the test cases carefully enough to determine the correct output for each case.

Turn in

Create a subdirectory hw/hw3 directly under your dcm account (not under the pages subdirectory, as it is not a Web application). Put your program (EqualXML.java) and the test files, and no other files, under this subdirectory. You may put the .class file in the subdirectory, but it will be overwritten during grading. The TA is going to test run your program by compiling it and running it with the given nine test cases, as well as additional test cases. Following this instruction is thus very important. Failure to do so may result in no grading and/or a failing grade.

To simplify grading, do not put your class under any package.

javac EqualXML.java

should be the compilation command for your program.

Turn in listings of your account username, program source code and the output. It is not necessary to turn in the sources of the test cases.



Dr. Kwok-Bun Yue
Professor, Computer Science and Computer Information Systems
Chair, Division of Computing and Mathematics
University of Houston-Clear Lake
2700 Bay Area Boulevard
Houston, TX 77058
Yue's home page     Email: yue@uhcl.edu     Phone: 281-283-3864