Perl
Module 1: An Introduction and Data Structures
by K. Yue (copyright 2000)
Revised: August 22, 2000
Background
Invented and maintained by Larry Wall. One of the poster children
of the Open Source movement.
-
Perl stands for Practical Extraction and Report Language or, jokingly,
Pathologically Eclectic Rubbish Lister.
-
Perl combines elements of C, awk, sed and the Bourne shell. Perl
5.0 is also object-oriented. Perl fills the gap between C and shell.
-
Perl is an excellent tool for:
-
text and file processing.
-
system management.
-
CGI programming for Web pages.
-
Supported on many platforms, including UNIX and NT.
-
Latest version is 5.x, with object-oriented programming features.
-
Classical books for Perl:
-
Llama book: Randal Schwartz, "Learning Perl," O'Reilly & Associates,
Inc., ISBN 1-56592-042-2.
-
Camel book: Larry Wall & Randal Schwartz, "Programming Perl," O'Reilly
& Associates, Inc., ISBN 1-56592-149-6.
-
There are many newer books on (a) Perl and CGI programming, and (b) Perl
alone.
Getting Started
-
Perl is not a compiled language such as C or C++.
-
However, Perl is compiled into a fast internal format before execution.
Hence, it is faster than shell languages.
-
Perl programs usually (but not necessarily) end in the file extension .pl.
Ending with .pl is more important in Windows NT.
-
The first line of a Perl program is usually one of the following statements
to indicate the location of the Perl interpreter for Unix systems.
#!/usr/bin/perl
#!/usr/local/bin/perl
#!/opt/gnu/bin/perl
Exercise 1:
Type in the following Perl program and execute it:
#!/opt/gnu/bin/perl
# My first Perl program.
print "Hello, World.\n";
-
For simple tasks, Perl can be executed at the command line by using the
-e switch. Example:
$ perl -e 'print "Hello World.\n";'
-
The switch -n is used to loop through an input file, one line at a time.
Example:
$ perl -ne 'print;' quiz.html
# (prints the contents of quiz.html)
Perl Basics
-
Perl is a free-form language like C.
-
Like C, Perl is case sensitive.
-
Every statement in Perl must end with a semicolon (;).
-
As with shell scripts, a Perl program consists simply of the statements in
the file, executed from top to bottom. There is no counterpart to the main
function in C.
-
Text from a # to the end of the line is a Perl comment.
-
As in C, a block of statements in Perl is enclosed by {}.
-
Perl variables do not have to be declared. Perl is weakly typed.
-
Perl variables are typed and evaluated based on context.
Perl has three basic data types:
-
scalars: start with $.
-
arrays of scalars: start with @.
-
associative arrays of scalars: start with %.
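The three prefixes can be seen side by side in a minimal sketch. The variable names and values are made up for illustration, and the `use strict`, `use warnings` and `my` declarations are additions that these notes do not require.

```perl
use strict;
use warnings;

# The names and values below are illustrative only.
my $city   = "Houston";                  # scalar
my @primes = (2, 3, 5, 7);               # array of scalars
my %age    = ('Ann', 31, 'Bob', 27);     # associative array (hash)

# A single element of an array or hash is a scalar, so it takes the $ prefix.
print "$city\n";          # prints Houston
print "$primes[1]\n";     # prints 3
print "$age{'Bob'}\n";    # prints 27
```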
Scalar Data Types
-
A scalar can be an integer, a floating point number, or a string.
-
Scalar variables always have a dollar sign ($) prefix.
Example:
$str ="Hello.";
$num = 5;
$num = "abcde";
Numeric Data Types
-
Perl numbers are all internally stored as double-precision floating-point values.
-
Perl supports the complete set of float literals of C, as well as octal
and hexadecimal integers.
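As a short illustration of the literal forms mentioned above, the following sketch uses made-up variable names; the `my` declarations and the two pragmas are additions for safety rather than something the notes call for.

```perl
use strict;
use warnings;

my $float = 1.25e3;   # float literal with an exponent: 1250
my $octal = 0377;     # a leading 0 means octal: 255
my $hex   = 0xff;     # a leading 0x means hexadecimal: 255

print $float, "\n";   # prints 1250
print $octal, "\n";   # prints 255
print $hex, "\n";     # prints 255
print 7 / 2, "\n";    # prints 3.5 (division is not truncated as in C)
```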
String Data Types
Perl's string literals may be:
-
Single quoted literals: where \ is not interpreted as a control character
except for \' (for ') and \\ (for \).
-
Double quoted literals: where \ is interpreted as a control character similar
to C.
-
Double quoted literals are also variable interpolated, as in shell languages.
-
In variable interpolation, the string is scanned for the (longest) scalar
or array variable name to be replaced by its value.
-
To turn off variable interpolation, the $ sign must be preceded by \, or
a single-quoted string must be used.
Example:
$x = "there!";
$y = "Hi, $x"; # $y is 'Hi, there!'
$z = "Hi, \$x"; # $z is 'Hi, $x'
-
If the variable that is meant to be substituted is not the longest possible
one, enclose the variable with a pair of {}.
# stringsubstitution.pl
$x = "there!";
$xx = "somewhere!";
$y = "Hi, $x"; # $y is 'Hi, there!'
$z = "Hi, $xx"; # $z is 'Hi, somewhere!'
$w = "Hi, ${x}x"; # $w is 'Hi, there!x'
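The difference between the two quoting styles can be sketched as follows. The variable names are hypothetical, and `use strict`, `use warnings` and `my` are additions not used elsewhere in these notes.

```perl
use strict;
use warnings;

my $name = "World";

my $double = "Hello, $name\n";   # interpolated; \n is a newline
my $single = 'Hello, $name\n';   # taken literally; \n stays as two characters
my $quote  = 'It\'s a \\ here';  # \' and \\ are the only escapes

print $double;                   # prints Hello, World
print $single, "\n";             # prints Hello, $name\n
print $quote, "\n";              # prints It's a \ here
```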
-
Built-in string operations include comparisons (eq, ne, lt, gt, le, ge),
repetition (x), concatenation (. and .=), chop (removing the last character),
chomp (removing the last character only if it is a \n), and substr (returning
a substring).
Example:
chop $str;                # remove the last character of $str and return it.
substr("abcdefg", 3, 2);  # return "de"
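The operations listed above might be combined as in this sketch. All variable names are invented for the example, and the pragmas and `my` declarations are additions.

```perl
use strict;
use warnings;

my $line = "abc\n";
chomp($line);                          # removes the trailing newline: "abc"
chomp($line);                          # no newline left, so nothing changes

my $word    = "abc";
my $removed = chop($word);             # $word is "ab"; $removed is "c"

my $rule  = "-" x 5;                   # repetition: "-----"
my $greet = "Hi" . ", ";               # concatenation: "Hi, "
$greet   .= "there";                   # append: "Hi, there"
my $part  = substr("abcdefg", 3, 2);   # "de"

print "$line $rule $greet $part\n";    # prints abc ----- Hi, there de
print "less\n" if "apple" lt "banana"; # string comparison; prints less
print "same\n" if "10" eq "10.0";      # different strings; prints nothing
```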
-
An uninitialized variable evaluates to 0 in a numeric context and to an
empty string in a string context.
-
Perl variables are evaluated based on context.
-
String variables which happen to contain numeric characters are converted
to actual numeric values if used in a numeric context.
Example: context.pl: executing the code will print 20.
$x = 12; # an integer
$y = "8"; # a string
$z = $x+$y;
print $z, "\n";
-
String constants may also be specified by using the here document syntax
as in shell languages. Here documents start with a unique string
and continue until that string is seen again.
Example: (stringconstant.pl)
$msg = <<_LSTR_;
This is a long string.
In more than one line.
_LSTR_
-
Note that the terminating string must appear by itself on the terminating
line and start in the first column.
-
Here documents are variable interpolated.
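Interpolation in a here document can be sketched as below. The variable `$user` and its value are hypothetical. The single-quoted terminator form, which turns interpolation off, is standard Perl but is not covered in these notes.

```perl
use strict;
use warnings;

my $user = "Ann";     # hypothetical value

# An unquoted (or double-quoted) terminator interpolates variables.
my $msg = <<_LSTR_;
Dear $user,
welcome.
_LSTR_

# A single-quoted terminator turns interpolation off.
my $raw = <<'_LSTR_';
Dear $user,
_LSTR_

print $msg;    # prints "Dear Ann," and "welcome." on two lines
print $raw;    # prints "Dear $user," unchanged
```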
-
A file handle may be used for input and output. Reading from one in scalar
context returns a single line.
Example: (Echo.pl)
# Read a line from the standard input file.
$line = <STDIN>;
chomp($line);   # remove the trailing newline, if there is one
print "A line: <<$line>>\n";
Exercise 2:
Write a piece of Perl code to read in strings (one string per line)
from the standard input file. For each string, the code prints the
string, ==, and the string again on one line.
Array (List) Data Types
-
Perl's arrays may contain string or number elements.
-
Array variables are prefixed with the at symbol (@).
-
Array elements are referenced by index. As in C, Perl array
indices start at 0.
-
An array literal is a comma-separated list enclosed in parentheses.
Examples:
@num = (1, 3, 5, 7, 2, 8);
@str = ('one', 'three', 'two', 'eight');
$num[0] = 8; # change $num[0] from 1 to 8.
-
A slice of array elements can be accessed. Examples:
@num[1,2] = (3,4);           # $num[1] = 3; $num[2] = 4;
@num[1,2] = @num[2,1];       # swap $num[1] and $num[2];
@num[1,2] = @num[3,3];       # $num[1] = $num[3]; $num[2] = $num[3];
($num[1], $num[2]) = ($num[2], $num[1]);
                             # swap $num[1] and $num[2];
@num = (1,2,3,4,5)[3,2,1];   # @num = (4,3,2);
($first, @num) = @num;       # remove the first element of @num
(@num, $last) = @num;        # unexpected result: @num takes every
                             # element and $last is left undefined.
$length = @num;              # $length gets the length of the array @num.
($length) = @num;            # $length = $num[0];
-
Note that an array element is a scalar and is thus preceded by $.
-
The list constructor operator may be used in array literals. This
is done by specifying the lower limit and upper limit of the range, separated
by '..'.
(1, 3, 5..9)   # same as (1, 3, 5, 6, 7, 8, 9);
-
$#a gives the last valid index of the array @a. The special variable $[
holds the base index (0 by default).
-
Thus, the number of elements in an array @a is: $#a - $[ + 1.
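With the default base index of 0, the relation between the last index and the length can be checked in a small sketch. The variable names are illustrative, the pragmas and `my` are additions, and the negative index on the last line is standard Perl that these notes do not cover.

```perl
use strict;
use warnings;

my @num = (1, 3, 5, 7, 2, 8);

my $last_index = $#num;          # 5
my $count      = @num;           # 6: an array in scalar context gives its length
my $also_count = scalar(@num);   # 6: the same, made explicit

print "$last_index $count $also_count\n";   # prints 5 6 6
print "$num[$#num]\n";                      # prints 8, the last element
print "$num[-1]\n";                         # prints 8 as well
```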
Some important list operations:
-
push(@a, $b, $c); append $b and $c to the end of @a.
-
pop(@a); remove and return the last element of @a.
-
unshift(@a, $b, $c); insert $b and $c at the front of @a; return the
new size of @a.
-
shift(@a); remove and return the first element of @a.
-
reverse(@a); return the elements of @a in reverse order; @a itself is
not changed.
-
sort(@a); return the elements of @a in sorted order, comparing all
elements as strings; @a itself is not changed.
-
chop(@a); chop the last character of every element of @a.
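The list operations above can be traced step by step in the following sketch. The array names are made up. The numeric comparison block passed to sort is standard Perl but goes beyond what these notes cover; it is included only to contrast with the default string ordering.

```perl
use strict;
use warnings;

my @queue = (2, 3);

push(@queue, 4, 5);            # @queue is (2, 3, 4, 5)
my $tail = pop(@queue);        # $tail is 5; @queue is (2, 3, 4)
unshift(@queue, 0, 1);         # @queue is (0, 1, 2, 3, 4)
my $head = shift(@queue);      # $head is 0; @queue is (1, 2, 3, 4)

my @backward = reverse(@queue);   # (4, 3, 2, 1); @queue itself is unchanged

my @mixed     = (10, 9, 100, 1);
my @as_string = sort(@mixed);                # (1, 10, 100, 9)
my @as_number = sort { $a <=> $b } @mixed;   # (1, 9, 10, 100)

print "@queue | @backward\n";        # prints 1 2 3 4 | 4 3 2 1
print "@as_string | @as_number\n";   # prints 1 10 100 9 | 1 9 10 100
```

Because sort and reverse return new lists, the result has to be assigned back (for example `@a = sort(@a);`) when the original array should change.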
Exercise 3:
Find out and correct all errors of the following code.
# Read in lines and print out in sorted orders.
$a = <STDIN>
sort(@a)
print @a
Exercise 4:
Write a Perl program to read in and print out a list of strings.
After all strings are read, the list of strings is printed out again, first
in the order read and then in the reverse order.
Associative Arrays (Hashes)
-
The prefix for associative arrays is the percent sign (%). Elements
of associative arrays are indexed by using {}.
-
Hashes are like ordinary arrays except that the keys (indices) are strings
(numeric values are converted to strings), not necessarily integers.
-
Hashes facilitate key searching. Associative arrays are usually implemented
as hash tables and are thus also called hashes.
-
Hashes are usually created by element assignment.
Example:
$population{'San Antonio'} = 2200900;
print $population{'Houston'};
# print the population of Houston.
$population{'Houston'} += 9999;
# population of Houston increased by 9999.
$population{'Dallas'} = 3245672;
$population{'Houston'} = 4434545;
-
A hash can also be explicitly initialized by a hash literal, which is a
list of key-value pairs.
%population = ('Dallas', 3245672, 'Houston', 4434545,
               'San Antonio', 2200900);
-
Note that the order in which the implementation stores the key-value pairs is arbitrary.
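Since the stored order is arbitrary, one common way to get repeatable output is to sort the keys before printing. The sketch below reuses the population figures from the example above; the loop variable `$city` and the array `@cities` are hypothetical names, and the pragmas and `my` are additions.

```perl
use strict;
use warnings;

my %population = ('Dallas', 3245672, 'Houston', 4434545,
                  'San Antonio', 2200900);

# keys() returns the keys in no particular order, so sort them
# whenever the output order matters.
my @cities = sort(keys(%population));

foreach my $city (@cities) {
    print "$city: $population{$city}\n";
}
# prints Dallas, Houston and San Antonio, in that order, one per line
```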
-
There are built-in associative arrays in Perl. For example, %ENV
contains all environment variables of the calling environment. %ENV is
used extensively in CGI programming. Here is Perl code to check whether
X Windows is running:
if ($ENV{DISPLAY})
{ print "X is (probably) running.\n";
}
-
Some important associative array operators are given below.
keys(%a)
# return a list of all current keys in %a.
# Note that the order of the keys returned is arbitrary.
values(%a)
# return a list of all current values in %a.
# Note that the order of the values returned is arbitrary.
each(%a)
# iterate over %a and return the next key-value pair as a list.
# When every pair has been returned, return an empty list.
delete($a{$b})
# remove the key-value pair with key $b from %a.
Example:
# Print all key-value pairs of %a.
while (($key, $value) = each(%a)) {
print "Value of $key = $value\n";
}
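The remaining operators can be exercised in a similar way. The hash `%stock` and its contents are invented for this sketch. The `exists` function is standard Perl but is not part of the operator list above, so treat its use here as an addition.

```perl
use strict;
use warnings;

my %stock = ('apple', 3, 'pear', 0, 'plum', 7);   # hypothetical data

delete($stock{'plum'});                 # remove one key-value pair

my @names = sort(keys(%stock));         # ('apple', 'pear')
my $total = 0;
foreach my $n (values(%stock)) {        # 3 and 0, in arbitrary order
    $total += $n;
}

# exists() tests for the key itself, so it is true even when the value is 0.
print "pear is listed\n" if exists($stock{'pear'});    # prints pear is listed
print "plum is listed\n" if exists($stock{'plum'});    # prints nothing
print "@names $total\n";                               # prints apple pear 3
```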
Exercise 5:
Write a Perl program to read in an input file with one word per line
and print all these words in ascending string order. A word may appear
more than once in the input file but your program should only print out
every word once.
Suggested Solutions to Classwork Exercises
2. For example:
# exercise2.pl
while ($line = <STDIN>)
{ chomp $line;   # remove the trailing newline, if there is one
print "$line==$line\n";
}
3. For example:
#!/opt/gnu/bin/perl
# Read in lines and print out in sorted orders.
@a = <STDIN>;
@a = sort(@a);
print @a;
4. For example:
# exercise3.pl
@all_lines = ();
while ($line = <STDIN>)
{ push (@all_lines, $line);
}
print @all_lines;
while ($line = pop(@all_lines))
{ print $line;
}
Alternatively:
# exercise3alt.pl
@all_lines = <STDIN>;
print @all_lines;
@all_lines = reverse(@all_lines);
print @all_lines;
5. For example:
#!/opt/gnu/bin/perl
# exercise4.pl of module 1
@lines = <STDIN>;
foreach $line (@lines)
{ chomp($line);   # remove the trailing newline, if there is one
$wordcounts{$line}++;
}
@words = keys(%wordcounts);
@words = sort(@words);
foreach $word (@words)
{ print "$word ==> $wordcounts{$word}\n";
}