CSCI 5733
XML Application Development
Summer 2009
Homework #1

Due date: June 22, 2009 (Monday)

XML servers serve XML content. They can be simple or complicated. The objective of this programming assignment is to expose you to the issues involved in writing XML servers. You will write a simple, specialized XML server that serves technical news headlines. An XML server gathers data from input sources and prepares it to serve data requests in specific XML formats. Many issues are involved in developing XML servers; this program deals with transforming the input data sources into the output XML format.

The bonus section deals with performance enhancement by caching data fetched from the data sources.

Write a server-side program that generates technical news in XML format using the REST (Representational State Transfer) architectural style. Name the program h1 with the proper file extension for your language (for example, .pl or .cgi for CGI-Perl, .php for PHP, .jsp for JSP, and .aspx for ASP.NET) and put it in a directory named hw. Thus, if you use CGI-Perl, the URL for accessing your XML server should be:

http://dcm.uhcl.edu/your_account/hw/h1.pl

You must follow the URL and HTTP parameter naming conventions strictly; otherwise, your program may not be graded.
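As an illustration of reading the single topic parameter, here is a minimal Python sketch. It assumes a CGI deployment, in which the web server passes the raw query string in the QUERY_STRING environment variable; the function name get_topic is hypothetical, and Perl, PHP, JSP, and ASP.NET each offer their own equivalent.

```python
import os
from urllib.parse import parse_qs


def get_topic(query_string=None):
    """Return the value of the 'topic' HTTP parameter, or None if it is absent.

    Hypothetical helper: when no string is given, it reads the raw query
    string that a CGI server places in the QUERY_STRING environment variable.
    """
    if query_string is None:
        query_string = os.environ.get("QUERY_STRING", "")
    # parse_qs maps each parameter name to a list of values;
    # empty values such as "topic=" are dropped by default.
    values = parse_qs(query_string).get("topic")
    return values[0] if values else None
```

For the URL h1.pl?topic=software, get_topic() would return "software" under these assumptions.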

There are two input sources for your XML server.

(1) CNet Technical News in RSS 2.0.

(2) Moreover's Technical News in CSV (Comma Separated Values).

The CSV format has the following four fields, in this order:

News_id,news_title,url,time
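Because a news title may itself contain commas or double quotes, splitting each line on commas is not safe. A minimal Python sketch using the standard csv module follows. It assumes the feed has no header row and treats the time field as an opaque string, since its format is not specified here; the name parse_news_csv is hypothetical.

```python
import csv
import io

# Field names taken from the CSV specification above.
FIELDS = ["News_id", "news_title", "url", "time"]


def parse_news_csv(text):
    """Parse the CSV feed into a list of dicts, one per news item.

    csv.reader honors quoting, so a quoted title containing commas or
    doubled quotes ("") stays a single field.
    """
    items = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(FIELDS):
            # Skip blank or malformed lines rather than guessing at them.
            continue
        items.append(dict(zip(FIELDS, row)))
    return items
```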

Your program should not access these external sites directly. Instead, it should access the following servers:

(1) CNet Technical News:

http://dcm.uhcl.edu/yue/courses/xml/notes/general/cnetnews.pl

The server accepts one HTTP parameter, topic, with the following acceptable values:

The server returns an RSS 2.0 document with no news items if any other value is supplied.

(2) Moreover's Technical News:

http://dcm.uhcl.edu/yue/courses/xml/notes/general/newsCSV.pl

The server also accepts one HTTP parameter, topic, with the following acceptable values:

The server returns an empty document with no news items if any other value is supplied.
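Both source servers are queried with a plain GET request that carries topic as its only parameter. The Python sketch below builds such a request URL and fetches the response body. The helper names are hypothetical, and decoding the body as UTF-8 is an assumption, since the assignment does not state the encoding of either source.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Source server URLs given in the assignment.
CNET_URL = "http://dcm.uhcl.edu/yue/courses/xml/notes/general/cnetnews.pl"
MOREOVER_URL = "http://dcm.uhcl.edu/yue/courses/xml/notes/general/newsCSV.pl"


def source_url(base, topic):
    """Build a GET URL passing topic as the single HTTP parameter."""
    return base + "?" + urlencode({"topic": topic})


def fetch(url, timeout=10):
    """Fetch a source document (assumes UTF-8; replaces undecodable bytes)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Since unlisted topic values yield an empty result rather than an error, a typo in a source topic value fails silently, so it is worth checking the values you send against the lists above.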

Your program should also accept a single HTTP parameter, topic. The following table shows its acceptable values and the sources for these values.

topic        CNet topic values   Moreover topic values
-----------  ------------------  -------------------------------
software     software            database, java, pc_software, os
hardware     hardware            handheld
security     security            security
network      network             handhelds
internet     internet            (none)
personal     personal            personal
database     (none)              database
enterprise   software, hardware  enterprise
java         (none)              java
os           (none)              os
pc_software  (none)              pc_software
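The topic table amounts to a lookup from this server's topic to the topic values sent to each source. One possible Python transcription is sketched below. The names TOPIC_MAP and source_topics are hypothetical, an empty list means that source is not queried, and returning None for an unlisted topic is an assumption, because the assignment does not say how your server should respond to one.

```python
# This server's topic -> (CNet topic values, Moreover topic values),
# transcribed from the topic table; [] means "do not query this source".
TOPIC_MAP = {
    "software":    (["software"], ["database", "java", "pc_software", "os"]),
    "hardware":    (["hardware"], ["handheld"]),
    "security":    (["security"], ["security"]),
    "network":     (["network"], ["handhelds"]),
    "internet":    (["internet"], []),
    "personal":    (["personal"], ["personal"]),
    "database":    ([], ["database"]),
    "enterprise":  (["software", "hardware"], ["enterprise"]),
    "java":        ([], ["java"]),
    "os":          ([], ["os"]),
    "pc_software": ([], ["pc_software"]),
}


def source_topics(topic):
    """Return (cnet_topics, moreover_topics) for a topic, or None if unlisted."""
    return TOPIC_MAP.get(topic)
```

For enterprise, for instance, this lookup implies two CNet requests (software and hardware) and one Moreover request (enterprise). How the merged items are ordered should be checked against the example output rather than taken from this sketch.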

Study the following example of output XML very carefully since your program will be required to faithfully reproduce the output in every detail.

For the URL:

http://dcm.uhcl.edu/your_account/hw/h1.pl?topic=software

For the following snapshot when

the output of your program should be identical to software_output.xml, provided the data is obtained directly from the sources.

Hints and notes:

  1. Your program is a Web server-side application, so it should generate the proper HTTP response header (a Content-Type header) to specify XML content.
  2. Although an XML parser is a natural fit for the CNet source, it is not strictly necessary; regular expressions or finite state machines should be sufficient. Many languages have built-in or library support for parsing CSV, so there is no need to reinvent the wheel.
  3. Be sure to handle special characters in both XML (for example, &, < and >, which must be escaped as &amp;, &lt; and &gt;) and CSV (for example, commas and double quotes inside quoted fields).
  4. Standard documentation for the CS/CIS department should be followed.
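Hints 1 to 3 can be illustrated with a small Python sketch. It emits a Content-Type header for XML, pulls item fields out of an RSS 2.0 document with regular expressions instead of a parser, and escapes text before it is written into the output XML. The helper names are hypothetical; the regex approach assumes a well-formed feed without comments or nested item elements; and which fields you actually need is determined by software_output.xml, not by this sketch.

```python
import html
import re
from xml.sax.saxutils import escape


def xml_header():
    """CGI response header declaring XML content; the blank line ends the headers."""
    return "Content-Type: text/xml\n\n"


ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.DOTALL)
CDATA_RE = re.compile(r"<!\[CDATA\[(.*)\]\]>", re.DOTALL)


def tag_text(fragment, tag):
    """Return the unescaped text of the first <tag>...</tag> in fragment, or ''."""
    m = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", fragment, re.DOTALL)
    if not m:
        return ""
    text = m.group(1).strip()
    cdata = CDATA_RE.fullmatch(text)
    if cdata:
        return cdata.group(1)  # CDATA content is already raw text
    return html.unescape(text)  # turn &amp;, &lt;, &#39; etc. back into characters


def rss_items(rss):
    """Extract title, link and pubDate from each RSS 2.0 <item> using regexes."""
    return [
        {"title": tag_text(body, "title"),
         "link": tag_text(body, "link"),
         "pubDate": tag_text(body, "pubDate")}
        for body in ITEM_RE.findall(rss)
    ]


def xml_text(value):
    """Escape &, <, > and double quotes so value is safe in XML text or attributes."""
    return escape(value, {'"': "&quot;"})
```

Unescaping on input and escaping once on output keeps the two steps symmetric, so text such as "A &amp; B" in the feed is neither double-escaped nor left raw in your output.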

Turn in your work in an envelope with your name, section and student ID clearly specified:

Bonus section:

To improve performance, your program will use an MS Access database to cache results obtained from the sources. If the cached copy in the database is younger than a certain refresh time threshold, the request will be served from the content cached in the database, and no request will be sent to the source servers. Set the refresh time threshold to 3 minutes so that the TA can grade your work more effectively.
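The caching rule reduces to a timestamp comparison against the refresh threshold. The assignment requires MS Access; the Python sketch below swaps in SQLite (the standard sqlite3 module) purely to show the logic, and its table and column names are hypothetical. Keying the cache by source URL is one design choice: topics that share a source query (software and enterprise both use CNet's software topic) can then reuse the same cached entry.

```python
import sqlite3
import time

REFRESH_SECONDS = 3 * 60  # refresh time threshold required by the assignment


def open_cache(path=":memory:"):
    """Open (or create) a cache table keyed by source URL (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " url TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def cached_fetch(conn, url, fetch, now=None):
    """Return the body for url, calling fetch(url) only when the cache is stale.

    A copy younger than REFRESH_SECONDS is served from the database; otherwise
    the source is queried and the cached row is replaced with the new copy.
    """
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and now - row[1] < REFRESH_SECONDS:
        return row[0]
    body = fetch(url)
    conn.execute(
        "INSERT OR REPLACE INTO cache (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, now),
    )
    conn.commit()
    return body
```

Passing now explicitly makes the threshold easy to test without waiting three minutes.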