An Introduction to the Common Gateway Interface (CGI)
by K. Yue, copyright 2000
Revised: September 17, 2000
Introduction
-
HTML files are (mostly) static.
-
When a request for an HTML file is sent to the web server, the server locates
the file in its file system and sends the file to the client.
-
On the other hand, many applications require the dynamic generation of
contents based on different user input.
-
Common Gateway Interface (CGI) is a simple protocol for communication
between the client site, web server and the program that generates HTML
dynamically.
-
CGI access must be configured on the web server. Different web servers
have different mechanisms for configuring CGI access.
-
CGI is not a programming language. CGI programs refer to programs
that use the CGI protocol.
-
CGI programs commonly use the .cgi or the .pl (Perl) extension.
Data From Web Server To Gateway Programs
-
A CGI program (script) can be written in any language that can read STDIN,
write to STDOUT, and read environment variables. Examples: C, C++,
Visual Basic, Perl, shells, etc.
-
The typical sequence of steps for a CGI program includes:
-
Read user's input using the CGI protocol, if necessary.
-
Process the data.
-
Write HTML responses to the standard output stream STDOUT.
-
There are three ways to send data from web servers to gateway programs
using CGI.
-
Command line arguments: used by GET method from an ISINDEX query (mostly
deprecated).
-
Standard Input: Data is read from the standard input in the POST method.
-
Environment variables: Data is put into special environment variables.
This is used by both the GET and POST methods.
-
All information from the client, except POSTed data, is passed to the CGI
program in environment variables.
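The typical sequence of steps listed above (read, process, write) can be sketched as a very small shell CGI program. This is only an illustration: the function name handle_request is hypothetical, the sketch handles the GET case only (input taken from QUERY_STRING), and the "processing" is deliberately trivial.

```shell
#!/bin/sh
# Hypothetical handler illustrating the three steps; not part of the original notes.
handle_request() {
    # Step 1: read the user's input (here, the raw query string of a GET request).
    input="${QUERY_STRING:-}"

    # Step 2: process the data (here, simply count its characters).
    length=${#input}

    # Step 3: write the response to STDOUT: header line, blank line, HTML contents.
    echo "Content-Type: text/html"
    echo
    echo "<HTML><BODY>"
    echo "<P>The query string is $length characters long.</P>"
    echo "</BODY></HTML>"
}

handle_request
```

The sketch prints only the length of the input rather than the input itself, since echoing raw user data into HTML would require escaping.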
Dynamic HTML Contents from Gateway Program To Web Servers
-
The basic method is to write the following to the standard output in order:
-
A collection of server directives and response header fields to be used
by the Web server to compose the response header. Note that it is
not necessary for the CGI program to output the entire response header.
This part usually includes the line: Content-Type: text/html
-
A blank line indicating the end of server directives.
-
The HTML contents.
-
Other less frequently used server directives are:
-
Location: Redirected-URL: which is used for redirection to another URL.
-
Status: code explanation: which is used to set the HTTP status code and explanation.
-
Other response header fields are not processed by the server and will be
sent directly to the client.
-
The output of gateway programs whose names begin with nph- (non-parsed header)
is not processed by the server: the entire output of the CGI program is sent
directly to the client, so the program must write the complete response header
itself. Such programs are usually more efficient.
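The output order described above (directives and header fields, a blank line, then the HTML contents) can be sketched with two small shell functions. The names send_html and send_redirect are hypothetical helpers introduced for illustration, and the sketch assumes an ordinary (parsed-header) CGI program, not an nph- program.

```shell
#!/bin/sh
# Hypothetical helper: emit a parsed-header CGI response.
# $1 = status code and explanation (e.g. "404 Not Found"), $2 = HTML body.
send_html() {
    printf '%s\n' "Status: $1"                # server directive
    printf '%s\n' "Content-Type: text/html"   # response header field
    printf '\n'                               # blank line ends the header part
    printf '%s\n' "$2"                        # the HTML contents
}

# Hypothetical helper: ask the server to redirect the client to URL $1.
send_redirect() {
    printf '%s\n' "Location: $1"
    printf '\n'
}

# Example use: report a missing entry.
send_html "404 Not Found" "<HTML><BODY><P>No such entry.</P></BODY></HTML>"
```

A program would call only one of the two helpers per request, since each one writes a complete header part.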
Data as Command Line Arguments
-
Data is passed from the web server to the CGI program as command line
arguments only for ISINDEX queries.
-
Since there is no name-value pair in ISINDEX, there is no "=" in the query
string. This allows the web server to determine that ISINDEX is used.
Example (adapted from the HTML sourcebook).
#!/bin/sh
# CGI header, followed by a blank line
echo "Content-Type: text/html"
echo
if [ $# -eq 0 ]    # is the number of arguments == 0 ?
then
    # do this part if there are NO arguments
    echo "<HEAD>"
    echo "<TITLE>Local Phonebook Search</TITLE>"
    echo "<ISINDEX>"
    echo "</HEAD>"
    echo "<BODY>"
    echo "<H1>Local Phonebook Search</H1>"
    echo "Enter your search in the search field.<P>"
    echo "This is a case-insensitive substring search: thus"
    echo "searching for 'ian' will find 'Ian' and 'Adriana'."
    echo "</BODY>"
else
    # this part if there ARE arguments
    echo "<HEAD>"
    echo "<TITLE>Result of search for \"$*\".</TITLE>"
    echo "</HEAD>"
    echo "<BODY>"
    echo "<H1>Result of search for \"$*\".</H1>"
    echo "<PRE>"
    for i in "$@"
    do
        # -i ignores case; "--" stops a keyword that starts with "-"
        # from being read as a grep option
        grep -i -- "$i" /users/ns-home/docs/yue/phonebk.dat
    done
    echo "</PRE>"
    echo "</BODY>"
fi
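The rule mentioned before the example, that a query string without "=" signals an ISINDEX query, can be sketched in shell as follows. The function names is_isindex_query and split_keywords are hypothetical. The sketch also assumes the usual CGI convention that ISINDEX keywords are separated by "+", which the notes above do not state.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the query string $1 looks like an ISINDEX query.
is_isindex_query() {
    case "$1" in
        "")  return 1 ;;   # empty: no search terms were sent
        *=*) return 1 ;;   # contains "=": name-value pairs from a form
        *)   return 0 ;;   # no "=": treated as an ISINDEX keyword list
    esac
}

# Hypothetical helper: print the keywords of $1 one per line
# (assumes "+" separates keywords).
split_keywords() {
    printf '%s\n' "$1" | tr '+' '\n'
}

# Example use with a sample query string.
if is_isindex_query "ian+yue"; then
    split_keywords "ian+yue"
fi
```

Each line printed by split_keywords corresponds to one command line argument that the script above would receive.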
Data Passed By Environment Variables
-
Before launching the CGI program, the web server initializes several environment
variables.
-
Different programming languages have different ways for getting the environment
variables.
-
In Perl, the environment variables are stored in the associative array
%ENV. For example, $ENV{"CONTENT_LENGTH"} gives the number of bytes
of the request body (in the POST method).
Example:
#!/bin/sh
echo "Content-Type: text/html"
echo
echo "<HTML>"
echo "<HEAD>"
echo "<TITLE>Not Really A Search</TITLE>"
echo "<ISINDEX>"
echo "</HEAD>"
echo "<BODY>"
echo "<H1> The Environment Variables </H1>"
echo "<PRE>"
# print the environment variables
echo " SERVER_SOFTWARE = $SERVER_SOFTWARE"
echo " SERVER_NAME = $SERVER_NAME"
echo " GATEWAY_INTERFACE = $GATEWAY_INTERFACE"
echo " SERVER_PROTOCOL = $SERVER_PROTOCOL"
echo " SERVER_PORT = $SERVER_PORT"
echo " REQUEST_METHOD = $REQUEST_METHOD"
echo " HTTP_ACCEPT = $HTTP_ACCEPT"
echo " PATH_INFO = $PATH_INFO"
echo " PATH_TRANSLATED = $PATH_TRANSLATED"
echo " SCRIPT_NAME = $SCRIPT_NAME"
echo " QUERY_STRING = $QUERY_STRING"
echo " REMOTE_HOST = $REMOTE_HOST"
echo " REMOTE_ADDR = $REMOTE_ADDR"
echo " REMOTE_USER = $REMOTE_USER"
echo " AUTH_TYPE = $AUTH_TYPE"
echo " CONTENT_TYPE = $CONTENT_TYPE"
echo " CONTENT_LENGTH = $CONTENT_LENGTH"
echo "</PRE>"
echo "</BODY>"
echo "</HTML>"
-
Among the most important environment variables are:
-
REQUEST_METHOD: GET, POST, HEAD, etc.
-
QUERY_STRING: Stores data for ISINDEX, ISMAP and the GET method.
Empty for the POST method.
-
PATH_INFO: Extra path information that follows the script name in the URL.
-
CONTENT_LENGTH: The length in bytes of the request body (for POST only).
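As a rough illustration of how QUERY_STRING is typically used, the sketch below pulls one field out of a GET query string of the form name1=value1&name2=value2. The function name get_param is hypothetical, and the sketch deliberately omits URL decoding, so "+" and %XX sequences in the value are returned unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print the raw (still URL-encoded) value of the
# field named $1 from QUERY_STRING; print nothing if the field is absent.
get_param() {
    printf '%s\n' "${QUERY_STRING:-}" | tr '&' '\n' |
    while IFS='=' read -r name value
    do
        if [ "$name" = "$1" ]; then
            printf '%s\n' "$value"
            break    # use the first occurrence only
        fi
    done
}

# Example use with a sample query string.
QUERY_STRING="name=Yue&password=mint"
get_param password
```

Splitting first on "&" and then on the first "=" keeps any further "=" characters inside the value.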
Exercise 1:
Consider a file that stores data in the following format for each line.
Name;password;h1-grade;h2-grade;h3-grade;final-grade
For example:
Kwok-Bun Yue;mint;89;92;95;A
indicates that Kwok-Bun Yue’s password is "mint" and that his grades are 89,
92, 95 and A, respectively.
Write the HTML file and the CGI Perl program that allow a student to
type in their name and password to find out their grades.
Data Passed By Standard Input
-
If the POST method is used, then
-
The environment variable QUERY_STRING is empty.
-
Instead, the query string is read from STDIN.
-
The variable CONTENT_LENGTH is set to the length of the query string.
-
Use POST for long query strings and for keeping form contents out of the URL.
-
Use GET for allowing users to bookmark the link (as the query is part of
the URL).
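The difference between the two methods can be sketched as a single shell function that returns the query string from whichever channel applies. The function name read_form_data is hypothetical. The sketch reads exactly CONTENT_LENGTH bytes for POST, on the assumption (standard for CGI) that the server may not signal end-of-file after the request body.

```shell
#!/bin/sh
# Hypothetical helper: print the raw query string for either method.
read_form_data() {
    if [ "${REQUEST_METHOD:-GET}" = "POST" ]; then
        # POST: read exactly CONTENT_LENGTH bytes from STDIN, no more.
        dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
    else
        # GET: the data is already in QUERY_STRING.
        printf '%s' "${QUERY_STRING:-}"
    fi
}

# Example use: capture the data once, then process it.
form_data=$(read_form_data)
```

Reading one byte at a time with dd is slow for large bodies, but it keeps the sketch portable across shells.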