Perl Module 1
Introduction and Data Structures
K. Yue, copyright @2002
1. Introduction and Background
- Invented and maintained by Larry Wall. One of the poster children of
the Open Source movement.
- The name stands for Practical Extraction and Report Language or, jokingly,
Pathologically Eclectic Rubbish Lister.
- Perl combines elements of C, awk, sed and the Bourne shell. Perl
5.0 is also object-oriented. Perl fills the gap between C and shell.
- Perl is an excellent tool for:
- text and file processing.
- system management.
- CGI programming for Web pages.
- Supported on many platforms, including *NIX (http://www.perl.com) and
Windows (http://www.activestate.com).
- Latest version is 5.x, with object-oriented programming features.
- Classic books for Perl:
- Llama book: Randal Schwartz, et al., "Learning Perl," O'Reilly,
ISBN 1565922840.
- Camel book: Larry Wall, et al., "Programming Perl," third edition,
O'Reilly, ISBN 0596000278.
- There are many newer books on (a) Perl and CGI programming, and (b)
Perl alone.
2. Getting Started
- Perl is not a compiled language such as C or C++.
- However, Perl is compiled into a fast internal format before execution.
Hence, it is faster than shell languages.
- Perl programs usually (but not necessarily) end in the file extension .pl.
The extension matters more on Windows, where file extensions are used to
associate a file with the program that runs it.
- The first line of a Perl program is usually one of the following statements
to indicate the location of the Perl interpreter for Unix systems. It must
be the first line of the Perl program. It is not used in Windows.
#!/usr/bin/perl
#!/usr/local/bin/perl
#!/opt/gnu/bin/perl
Exercise 1:
Type in the following Perl program and execute it:
#!/opt/gnu/bin/perl
# My first Perl program.
print "Hello, World.\n";
- For simple tasks, Perl can be executed at the command line by using the
-e switch. Example:
$perl -e 'print "Hello World.\n";'
- The switch -n is used to loop through an input file, one line at a time.
Example:
$perl -ne 'print;' quiz.html
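The implicit loop added by -n can be sketched as an ordinary script. This is only an approximation of what the switch does; the file name loop_n.pl, the handle name IN and the sample text standing in for quiz.html are assumptions made for illustration.

```perl
# loop_n.pl (hypothetical file name)
# In-memory text standing in for quiz.html (assumed content).
$text = "<html>\n<body>Quiz</body>\n</html>\n";
open(IN, '<', \$text) or die "cannot open in-memory file: $!";

$count = 0;
while (<IN>) {      # each input line is read into $_
    print;          # the code given with -e; prints $_
    $count++;       # extra bookkeeping, not part of -n itself
}
close(IN);
```

With a real file, the open call would name the file instead of referring to a string.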
3. Perl Basics
- Perl is a free-form language like C.
- Like C, Perl is case sensitive.
- Every statement in Perl must end with a semicolon (;).
- Like a shell script, a Perl program consists of all the Perl statements in
it. There is no main function as in C.
- Text from a # to the end of the line is a Perl comment.
- Like C, a Perl block statement is enclosed by {}.
- Perl variables do not have to be declared. Perl is weakly
typed.
- Perl variables are typed and evaluated based on context.
Perl has three basic data types:
- scalars: start with $.
- arrays of scalars: start with @.
- associative arrays of scalars: start with %.
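A minimal sketch of the three data types side by side. The file name and the variable names are made up for illustration.

```perl
# datatypes.pl (hypothetical file name)
$city   = "Houston";                      # scalar
@cities = ("Houston", "Dallas");          # array of scalars
%rank   = ("Houston", 1, "Dallas", 2);    # associative array

# One element of an array or of an associative array is a scalar,
# hence the $ prefix when it is accessed.
$first = $cities[0];
$r     = $rank{"Dallas"};
print "$city, $first, $r\n";    # prints: Houston, Houston, 2
```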
4. Scalar Data Types
- A scalar can be an integer, a floating point number, a string or a reference.
- Scalar variables always have a dollar sign ($) prefix.
Example:
$str ="Hello.";
$num = 5;
$num = "abcde";
Numeric Data Types
- Perl numbers are all internally stored as double precision floats.
- Perl supports the complete set of float literals of C, as well as octal
and hexadecimal integers.
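A short sketch of the numeric literal forms mentioned above. The variable names are arbitrary; the last line shows that dividing two integers is not truncated as it would be in C.

```perl
# numbers.pl (hypothetical file name)
$dec   = 255;       # decimal integer
$oct   = 0377;      # octal (leading 0)
$hex   = 0xff;      # hexadecimal (leading 0x)
$float = 2.55e2;    # floating point with exponent
$ratio = 7 / 2;     # 3.5, not 3
print "$dec $oct $hex $float $ratio\n";   # prints: 255 255 255 255 3.5
```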
String Data Types
Perl's string literals may be:
- Single quoted literals: where \ is not interpreted as a control
(escape) character except for \' (for ') and \\ (for \).
- Double quoted literals: where \ is interpreted as a control character similar
to C.
- Double quoted literals are also variable interpolated,
as in shell languages.
- In variable interpolation, the string is scanned for the (longest) scalar
or array variable name to be replaced by its value.
- To turn off variable interpolation, the $ sign must be preceded by \, or
a single quoted string must be used.
Example:
$x = "there!";
$y = "Hi, $x"; # $y is 'Hi, there!'
$z = "Hi, \$x"; # $z is 'Hi, $x'
- If the variable that is meant to be substituted is not the longest possible
one, enclose the variable with a pair of {}. Some programmers always use {}.
# stringsubstitution.pl
$x = "there!";
$xx = "somewhere!";
$y = "Hi, $x"; # $y is 'Hi, there!'
$z = "Hi, $xx"; # $z is 'Hi, somewhere!'
$w = "Hi, ${x}x"; # $w is 'Hi, there!x'
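The quoting rules above can be compared in one place. This is a small sketch with arbitrary variable names; it also shows an array being interpolated, which the earlier examples do not cover.

```perl
# quoting.pl (hypothetical file name)
$x    = "there!";
@list = (1, 2, 3);

$d    = "Hi, $x";              # double quotes: Hi, there!
$s    = 'Hi, $x\n';            # single quotes: the 8 characters Hi, $x\n
$l    = "List: @list";         # array elements are joined by spaces
$mail = "user\@example.com";   # \@ keeps the @ literal
print "$d\n$s\n$l\n$mail\n";
```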
- Built-in string operations include comparisons (eq, ne, lt, gt, le, ge),
repetition (x), concatenation (. and .=), chop (removing the last character),
chomp (removing the last character only if it is a \n), and substr (returning
a substring).
- One common mistake in Perl is to confuse the numeric comparison operators
(==, !=, <, >, <=, >=) with the string comparison operators.
Example:
chop $str;
# remove the last character of $str and return the character.
substr("abcdefg", 3, 2); # return "de"
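The string operations listed above, and the difference between numeric and string comparison, can be tried out with a sketch such as the following. All variable names are illustrative.

```perl
# stringops.pl (hypothetical file name)
$s = "abc" . "def";          # concatenation: "abcdef"
$s .= "g";                   # append: "abcdefg"
$line = "-" x 5;             # repetition: "-----"
$part = substr($s, 3, 2);    # "de"

$input = "hello\n";
chomp($input);               # removes the trailing newline: "hello"
chomp($input);               # no newline left, so nothing is removed
$copy = $input;
chop($copy);                 # removes the last character regardless: "hell"

# The same operands can compare differently as numbers and as strings.
$num_equal = ("1.0" == "1")  ? "yes" : "no";   # yes: both are 1 numerically
$str_equal = ("1.0" eq "1")  ? "yes" : "no";   # no: the strings differ
$num_less  = (10 < 9)        ? "yes" : "no";   # no
$str_less  = ("10" lt "9")   ? "yes" : "no";   # yes: "1" sorts before "9"
print "$num_equal $str_equal $num_less $str_less\n";   # prints: yes no no yes
```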
- The default value for an uninitialized variable is 0 in a numeric context
and an empty string in a string context.
- Perl variables are evaluated based on context.
- String variables which happen to contain numeric characters are converted
to actual numeric values if used in a numeric context.
Example: context.pl: executing the code will print 20.
$x = 12; # an integer
$y = "8"; # a string
$z = $x+$y;
print $z, "\n";
- String constants (literals) may also be specified by using the here
document syntax as in shell languages. Here documents start with
'<<' and then a unique string and continue until that string is seen
again.
Example: (stringconstant.pl)
$msg = <<_LSTR_;
This is a long string.
In more than one line.
_LSTR_
- Note that the terminating string must appear by itself on the terminating
line and start in the first column.
- Here documents are variable interpolated.
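A sketch of interpolation inside a here document. The terminators _MSG_ and _RAW_ are arbitrary. The second form, with the terminator in single quotes, goes beyond the notes above: it behaves like a single quoted string, so nothing is interpolated.

```perl
# heredoc.pl (hypothetical file name)
$name = "World";

$greeting = <<_MSG_;
Hello, $name.
Price: \$5
_MSG_

$raw = <<'_RAW_';
Hello, $name.
_RAW_

print $greeting, $raw;
```

The first string contains "Hello, World." and "Price: $5" on separate lines, while the second keeps the text $name unchanged.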
- A file handle may be used for input and
output.
Example: (Echo.pl)
# Read a line from the standard input file.
$line = <STDIN>;
chop($line);
print "A line: <<$line>>\n";
Exercise 2:
Write a piece of Perl code to read in strings (one string per line) from the
standard input file. For each string, the code prints the string, ==, and
the string again on one line.
5. Array Data Type (List)
- Elements of Perl's arrays must be scalar.
- Array variables are prefixed with the at symbol (@).
- Array elements can be referenced through an index. Like C, Perl array
indices start at 0.
- An array literal is a comma separated list enclosed by parentheses.
Examples:
@num = (1, 3, 5, 7, 2, 8);
@str = ('one', 'three', 'two', 'eight');
$num[0] = 8; # change $num[0] from 1 to 8.
- A slice of array elements can be accessed. Examples:
@num[1,2] = (3,4); # $num[1] = 3; $num[2] = 4;
@num[1,2] = @num[2,1]; # swap $num[1] and $num[2];
@num[1,2] = @num[3,3];
# $num[1] = $num[3]; $num[2] = $num[3];
($num[1], $num[2]) = ($num[2], $num[1]);
# swap $num[1] and $num[2];
@num = (1,2,3,4,5)[3,2,1]; # @num = (4,3,2);
($first, @num) = @num; # remove the first element of @num
(@num, $last) = @num;
# unexpected result: @num takes all the elements; $last is undefined.
$length = @num;
# $length gets the length of the array @num.
($length) = @num; # $length = $num[0];
- Note that an array element is a scalar and is thus preceded by $.
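The less obvious assignments in the examples above can be checked with a small sketch. The names @rest, $count and $head are introduced here only for illustration.

```perl
# listassign.pl (hypothetical file name)
@num = (4, 3, 2);

(@rest, $last) = @num;    # @rest takes every element; $last is undefined
$count  = @num;           # scalar context: number of elements, 3
($head) = @num;           # list context: first element, 4

$last_state = defined($last) ? "defined" : "undefined";
print "rest=@rest last=$last_state count=$count head=$head\n";
# prints: rest=4 3 2 last=undefined count=3 head=4
```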
- The range operator (..) may be used in array literals to build a list. This
is done by specifying the lower limit and upper limit of the range, separated
by '..'.
(1,3, 5..9) # same as (1, 3, 5, 6, 7, 8, 9);
- $#a gives the last valid index of the array @a.
The variable $[ holds the base index (0 by default; $[ is deprecated in newer
versions of Perl and should not be changed).
- Thus, the number of elements in an array a is: $#a - $[ + 1.
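A brief sketch of ranges and of $#, assuming the default base index of 0. The variable names are arbitrary.

```perl
# range.pl (hypothetical file name)
@a = (1, 3, 5..9);          # (1, 3, 5, 6, 7, 8, 9)
$last_index = $#a;          # 6
$count      = $#a + 1;      # 7
$count2     = @a;           # 7, the array in scalar context
$last_elem  = $a[$#a];      # 9
@letters    = ('a'..'e');   # ranges also work on letters
print "$last_index $count $count2 $last_elem @letters\n";
# prints: 6 7 7 9 a b c d e
```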
Some important list operations:
- push(@a, $b, $c); append $b and $c to the end of @a.
- pop(@a); remove and return the last element of @a.
- unshift(@a, $b, $c); insert $b and $c at the front of @a; return the
new size of @a.
- shift(@a); remove and return the first element of @a.
- reverse(@a); return the elements of @a in reverse order; @a itself is
not changed.
- sort(@a); return the elements of @a in sorted order, comparing all elements
as strings; @a itself is not changed.
- chop(@a); chop the last character of every element of @a.
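The list operations above can be traced step by step with a sketch like this one. The array names are arbitrary. The last line uses a comparison block with the numeric operator <=>, which is not covered in these notes; it is shown only to contrast with the default string ordering.

```perl
# listops.pl (hypothetical file name)
@q = (2, 3);
push(@q, 4, 5);                 # @q is (2, 3, 4, 5)
$last  = pop(@q);               # $last is 5; @q is (2, 3, 4)
$size  = unshift(@q, 0, 1);     # @q is (0, 1, 2, 3, 4); $size is 5
$first = shift(@q);             # $first is 0; @q is (1, 2, 3, 4)

@r = reverse(@q);               # (4, 3, 2, 1); @q is unchanged

@n = (10, 9, 100, 1);
@as_strings = sort(@n);                 # (1, 10, 100, 9)
@as_numbers = sort { $a <=> $b } @n;    # (1, 9, 10, 100)
print "@q | @r | @as_strings | @as_numbers\n";
```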
Exercise 3:
Find out and correct all errors of the following code.
# Read in lines and print out in sorted orders.
$a = <STDIN>
sort(@a)
print @a
Exercise 4:
Write a Perl program to read in and print out a list of strings.
After all strings are read, the list of strings is printed out again, first
in the order read and then in the reverse order.
6. Hashes (Associative Arrays)
- The prefix for associative arrays is the percent sign (%). Elements
of associative arrays are indexed by using {}.
- Hashes are like ordinary arrays except that the keys (indices) are strings
(numeric values are converted into strings), not integers.
- Hash elements must be scalar.
- Hashes facilitate key searching. Associative arrays are usually implemented
as hash tables and are thus also called hashes.
- Hashes are usually created by element assignments.
Example:
$population{'San Antonio'} = 2200900;
$population{'Dallas'} = 3245672;
$population{'Houston'} = 4434545;
print $population{'Houston'};
# print the population of Houston.
$population{'Houston'} += 9999;
# population of Houston increased by 9999.
- A hash can also be explicitly initialized by a hash
literal, which is a list of key-value pairs.
%population = ('Dallas', 3245672, 'Houston', 4434545,
               'San Antonio', 2200900);
- Note that the order in which the key-value pairs are stored is arbitrary
and depends on the implementation.
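A sketch of a hash literal as a plain list of pairs. The => form is an addition to these notes: it is a synonym for the comma that also quotes a bare word on its left. The names %same, $n and @flat are illustrative.

```perl
# hashliteral.pl (hypothetical file name)
%population = ('Dallas', 3245672, 'Houston', 4434545);
%same       = (Dallas => 3245672, Houston => 4434545);

$n    = keys(%population);   # number of keys: 2
@flat = %population;         # 4 elements: key, value, key, value
                             # (the pairs come back in arbitrary order)
print "$n keys, Houston has $same{'Houston'}\n";
# prints: 2 keys, Houston has 4434545
```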
- There are built-in associative arrays in Perl. For example, %ENV
contains all environment variables of the calling environment. %ENV is used
extensively in CGI programming. Here is the Perl code to see if X Windows
is running:
if ($ENV{DISPLAY})
{ print "X is (probably) running.\n";
}
- Some important hash operators are given below.
keys(%a)
# return a list of all current keys in %a.
# Note that the order of the keys returned is arbitrary.
values(%a)
# return a list of all current values in %a.
# Note that the order of the values returned is arbitrary.
each(%a)
# Iterate over %a and return the next key-value pair as
# a list. After the last pair has been returned, return an empty list.
delete($a{$b})
# remove the key-value pair with key $b from %a.
Example:
# Print all key-value pairs of %a.
while (($key, $value) = each(%a)) {
print "Value of $key = $value\n";
}
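Because keys, values and each return their results in arbitrary order, a common approach is to sort the keys before printing. The following is a minimal sketch with made-up data; exists, which tests whether a key is present, is not listed above and is shown as an extra.

```perl
# hashops.pl (hypothetical file name)
%age = ('ann', 31, 'bob', 27, 'cy', 45);

delete($age{'bob'});                        # remove one key-value pair
$has_bob = exists($age{'bob'}) ? 1 : 0;     # 0: the key is gone

@names = sort(keys(%age));                  # ('ann', 'cy')
foreach $name (@names) {
    print "Value of $name = $age{$name}\n";
}

$total = 0;
foreach $v (values(%age)) {                 # order does not matter for a sum
    $total += $v;
}
print "Total = $total\n";                   # prints: Total = 76
```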
Exercise 5:
Write a Perl program to read in an input file with one word per line
and print all these words in ascending string order. A word may appear
more than once in the input file but your program should only print out
every word once.
7. Suggested Solutions to Exercises
2. For example:
# exercise2.pl
while ($line = <STDIN>)
{ chomp $line;
print "$line==$line\n";
}
3. For example:
#!/opt/gnu/bin/perl
# Read in lines and print out in sorted orders.
@a = <STDIN>;
@a = sort(@a);
print @a;
4. For example:
# exercise3.pl
@all_lines = ();
while ($line = <STDIN>)
{ push (@all_lines, $line);
}
print @all_lines;
while (defined($line = pop(@all_lines)))
{ print $line;
}
Alternatively:
# exercise3alt.pl
@all_lines = <STDIN>;
print @all_lines;
@all_lines = reverse(@all_lines);
print @all_lines;
5. For example:
#!/opt/gnu/bin/perl
# exercise4.pl of module 1
@lines = <STDIN>;
foreach $line (@lines)
{ chomp($line);
$wordcounts{$line}++;
}
@words = keys(%wordcounts);
@words = sort(@words);
foreach $word (@words)
{ print "$word ==> $wordcounts{$word}\n";
}