CSCI 6838 Capstone Projects
Spring 2001
Project Description

by K. Yue

1. A Caching Tool for Web-Centric Programming Assignments

Mentor: K. Yue
Number of team members: 3-5.

As the computing world becomes increasingly Web-centric, computing departments at many universities will offer more programming assignments in which data is retrieved not locally but from the Internet. For example, in CSCI 4230, Fall 2000, Homework 4, students were asked to write a CGI-Perl program that displays a form accepting a language name (e.g., Perl, Java, HTML) and gauges interest in that language on the Web by counting the number of pages that mention it, as reported by Google.com.

Unfortunately, allowing beginning students to send HTTP requests directly from the university server to Internet sites carries risks. A student program may send so many requests that the traffic looks like a denial-of-service attack on the destination host. In fact, the assignment mentioned above caused unexpected problems for Google, as the following excerpt from an email sent by Google attests:

...
Someone at UH has been spamming Google with repetitive queries this weekend,
from 129.7.163.196 (DCM.uhcl.edu), in violation of our terms of service: http://www.google.com/terms_of_service.html

We saw over 60,000 such queries yesterday (Saturday), and over 45,000 in the first two hours today (Pacific time)...

Google has since blocked access from dcm.uhcl.edu.

We found no evidence that any student intentionally abused the server to stage a denial-of-service attack; the excess traffic was more likely caused by programming errors. Unlike the real world, where Web server-side development is done by (hopefully) well-trained professionals, university programming assignments are written by students who are not necessarily well trained in the technology. Indeed, the objective of these assignments may well be to develop students' Web server-side skills, and mistakes should be expected as part of the learning process.

It is thus desirable to create a caching tool that stores pages from Web sites, so that student HTTP requests are not sent to the destination host but are instead answered with local pages created by the tool. When a request to a URL is made with new parameter values, the tool fetches the page from the destination site, stores it in a database with a timestamp, and returns it. Subsequent requests with the same parameter values are served from the database unless the refresh criteria are met, in which case the page is fetched again. The refresh criteria may be time based, for example, refetching a page once its stored copy is older than a set age.
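The fetch, store, and reuse cycle described above might be sketched as follows. This is only an illustration under stated assumptions, not a design for the tool: it assumes SQLite (through Python's standard `sqlite3` module) as the database, a purely time-based refresh criterion (`max_age_seconds`), and a pluggable fetch function. The names `PageCache`, `cache_key`, `get`, and `http_fetch` are hypothetical and introduced here for illustration.

```python
import sqlite3
import time
import urllib.request
from typing import Callable, Dict
from urllib.parse import urlencode


def http_fetch(url: str) -> str:
    """Hypothetical default fetcher: retrieve a page from the remote host."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class PageCache:
    """Illustrative time-based page cache backed by SQLite (a sketch, not the tool)."""

    def __init__(self, fetch: Callable[[str], str] = http_fetch,
                 max_age_seconds: float = 3600.0,
                 db_path: str = ":memory:",
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch                # only this function contacts the destination host
        self.max_age = max_age_seconds    # assumed time-based refresh criterion
        self.clock = clock                # injectable clock so staleness can be tested
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            " url TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    @staticmethod
    def cache_key(base_url: str, params: Dict[str, str]) -> str:
        # Sort parameters so the same values given in a different order share one entry.
        return base_url + "?" + urlencode(sorted(params.items()))

    def get(self, base_url: str, params: Dict[str, str]) -> str:
        key = self.cache_key(base_url, params)
        row = self.db.execute(
            "SELECT body, fetched_at FROM pages WHERE url = ?", (key,)
        ).fetchone()
        now = self.clock()
        if row is not None and now - row[1] < self.max_age:
            return row[0]  # fresh stored copy: no request reaches the destination host
        # New parameter values, or the stored copy is stale: fetch, timestamp, store.
        body = self.fetch(key)
        self.db.execute(
            "INSERT OR REPLACE INTO pages (url, body, fetched_at) VALUES (?, ?, ?)",
            (key, body, now),
        )
        self.db.commit()
        return body
```

With a fake fetch function and `max_age_seconds=100`, two calls to `get("http://example.com/search", {"q": "Perl"})` trigger only one fetch, while a call made more than 100 seconds after the first triggers a refetch. Keying the cache on the full URL plus sorted parameters matches the text's rule that only new parameter values cause a remote request; a student program stuck in a loop would then hit the local database rather than the remote site.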

The tool should allow the user (instructor) to:

The tool is targeted to be an open source project to be distributed to other interested universities/institutions.