Perl Module 2
Control Structures and Regular Expressions
K. Yue copyright @2001
Revised: September 1, 2001
1. Operators and Comparators
Operator | Meaning |
** | Exponentiation |
**= | Exponentiation assignment |
() | Null list |
. | String concatenation |
.= | String concatenation assignment |
eq, ne, ge, gt, le, lt | String comparisons |
x | String repetition |
.. | Range |
-f, -x, -d, etc. | Unary file test operators. Perl can test various file attributes (plain file, executable, directory, and so on). |
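The operators in the table can be tried out with a short sketch like the one below. The variable names and the file name example.dat are illustrative assumptions, not part of the notes.

```perl
use strict;
use warnings;

my $power = 2 ** 10;                       # exponentiation: 1024
my $cube  = 3;
$cube **= 3;                               # exponentiation assignment: 27

my $greeting = "Hello" . ", " . "world";   # string concatenation
$greeting .= "!";                          # concatenation assignment

my $rule  = "-" x 5;                       # string repetition: "-----"
my @range = (1 .. 5);                      # range: 1, 2, 3, 4, 5
my @empty = ();                            # null list

# String comparison and numeric comparison can disagree.
my $as_strings = ("10" lt "9") ? 1 : 0;    # 1: the character "1" sorts before "9"
my $as_numbers = (10 < 9)      ? 1 : 0;    # 0

print "$power $cube $greeting $rule @range\n";
print "string: $as_strings, numeric: $as_numbers\n";

# File tests; "example.dat" is a hypothetical file name.
print "example.dat is a plain file\n" if -f "example.dat";
print ". is a directory\n"            if -d ".";
```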
2. Control Structures
Example: Boolean values
"" # false
"0" # false
"00" # true
$n - $n # "0": false
undef # undef (undefined) is converted to "": false.
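These truth values can be checked with a small sketch. The helper truth() is a hypothetical name introduced for illustration, and the "0.0" case is an extra example not listed above.

```perl
use strict;
use warnings;

# Report how Perl's conditionals see a value.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my $n = 42;
my @results = (
    truth(""),        # empty string
    truth("0"),       # the string "0"
    truth("00"),      # neither "" nor "0"
    truth($n - $n),   # the number 0
    truth(undef),     # undefined value
    truth("0.0"),     # a string other than "0", so it is true
);
print join(" ", @results), "\n";   # false false true false false true
```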
unless (some-condition)
{ action;
}
is equivalent to
if (! some-condition)
{ action;
}
DAILY_WORK:
while (1)
{ while (! &time_up_for_the_day)
{ last if &boss_let_go_early;
last DAILY_WORK if &win_lottery;
&work_a_while;
redo if &overtime_not_over;
} continue
{ &play_a_game_secretly_to_relax;
}
}
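A runnable sketch of the same loop controls, assuming made-up conditions in place of the subroutine calls above: a plain last leaves only the inner loop, last with a label leaves the labelled loop, and the continue block runs after each completed pass of the body.

```perl
use strict;
use warnings;

my @log;
my $day = 0;

DAY:
while (1) {
    $day++;
    my $hour = 0;
    while ($hour < 8) {
        last     if $day == 1 && $hour == 2;   # leave the inner loop only
        last DAY if $day == 3;                 # leave both loops
        push @log, "d${day}h$hour";
    } continue {
        $hour++;                               # skipped when "last" fires
    }
}

print scalar(@log), " entries, from $log[0] to $log[-1]\n";
# 10 entries, from d1h0 to d2h7
```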
&work unless &too_tired;
&work if &having_fun;
&work until &too_tired;
&work while &having_fun;
for (stmt_1; stmt_2; stmt_3)
{ stmt_4;
}
is equivalent to:
stmt_1;
while (stmt_2)
{ stmt_4;
stmt_3;
}
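The equivalence can be checked with a small sketch. One caveat worth knowing: if the body uses next, the while form needs the increment in a continue block to keep behaving like the for loop, which is the form shown here. The counters and bounds are illustrative.

```perl
use strict;
use warnings;

my @from_for;
for (my $i = 0; $i < 5; $i++) {
    next if $i == 2;          # $i++ still runs after "next"
    push @from_for, $i;
}

my @from_while;
my $j = 0;
while ($j < 5) {
    next if $j == 2;          # jumps to the continue block
    push @from_while, $j;
} continue {
    $j++;                     # plays the role of stmt_3
}

print "@from_for\n";          # 0 1 3 4
print "@from_while\n";        # 0 1 3 4
```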
foreach $num (@num_list)
{ print "$num\n";
}
Note: The variable $num is set to each element of @num_list in turn.
foreach $num (@num_list)
{ print "$num\n";
}
is the same as:
foreach (@num_list)
{ print "$_\n";
}
or
foreach (@num_list)
{ print;
print "\n";
}
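A short sketch of both foreach forms. It also shows a detail the loops above do not exercise: the loop variable is an alias for the current element, so assigning to it changes the array. The array contents are made up for illustration.

```perl
use strict;
use warnings;

my @num_list = (1, 2, 3);

# Named loop variable.
my @seen;
foreach my $num (@num_list) {
    push @seen, $num;
}

# Default variable $_; it aliases each element, so this modifies the array.
foreach (@num_list) {
    $_ *= 10;
}

print "@seen\n";       # 1 2 3
print "@num_list\n";   # 10 20 30
```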
do
{ &i_like_it_this_way;
print "interesting stuff\n";
};
do
{ &work;
} until &tired;
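A minimal sketch of how do ... until differs from an until modifier on a simple statement: the do block always runs once before the condition is tested. The counters are illustrative.

```perl
use strict;
use warnings;

my $runs = 0;
do {
    $runs++;
} until $runs >= 0;                  # condition is already true, yet the block ran once

my $skipped = 0;
$skipped++ until $skipped >= 0;      # simple statement: condition is tested first

print "$runs $skipped\n";            # 1 0
```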
if (&error) { die "ay-ya-ya\n"; }
die "ay-ya-ya\n" if &error;
&error && die "ay-ya-ya\n";
unless (&kiss_me) { &leave_me; }
&leave_me unless &kiss_me;
&kiss_me || &leave_me;
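The three styles can be compared side by side in a sketch like this one, which records which statements actually run. The variable $error stands in for the subroutine calls above.

```perl
use strict;
use warnings;

my @log;
my $error = 1;

# Three ways to act when the condition is true.
if ($error) { push @log, "if-block"; }
push @log, "if-modifier" if $error;
$error && push(@log, "and");

# Three ways to act when the condition is false; none of these run here.
unless ($error) { push @log, "unless-block"; }
push @log, "unless-modifier" unless $error;
$error || push(@log, "or");

print "@log\n";   # if-block if-modifier and
```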
Exercise 1:
Consider the following C statement:
switch (ch)
{ case 'a': a_ct++; break;
case 'e': e_ct++; break;
case 'i': i_ct++; break;
case 'o': o_ct++; break;
case 'u': u_ct++; break;
default : other_ct++; break;
}
Implement the statement in Perl in two different ways.
3. Regular Expressions
if (/good/)
{ print;
}
# if $_ contains the pattern "good", then $_ is printed.
Note:
\. .
\n newline
\r carriage return
\t tab
\f formfeed
\d a digit: [0-9]
\D a non-digit: [^0-9]
\w a word character (alphanumeric or underscore): [0-9a-zA-Z_]
\W a non-word character: [^0-9a-zA-Z_]
\s a white-space character: [ \t\f\r\n]
\S a non-white-space character: [^ \t\f\r\n]
\060 the character with the specified octal value: 060 ('0')
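A small sketch that counts matches of several of these single-character patterns in a sample string. The sample text is made up for illustration.

```perl
use strict;
use warnings;

my $text = "Room 42\tis on floor 3.\n";

my @digits = $text =~ /\d/g;            # every digit, one at a time
my $words  = () = $text =~ /\w+/g;      # runs of word characters
my $spaces = () = $text =~ /\s/g;       # spaces, the tab and the newline
my $dots   = () = $text =~ /\./g;       # a literal "." only
my $octal  = ("0" =~ /\060/) ? 1 : 0;   # octal 060 is the character '0'

print "@digits\n";                      # 4 2 3
print "$words $spaces $dots $octal\n";  # 6 6 1 1
```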
Exercise 2:
Give a single-character pattern that matches each of the following descriptions.
(a) all vowels,
(b) all non-vowels,
(c) all characters except lower case letters (other than 'a' to 'z'),
(d) the backspace character,
(e) carriage return or form feed,
(f) the character ^,
(g) any character in my name ("kwok-bun Yue").
Grouping Patterns
/ab1/ matches ab1
/a[aeiou]c/ matches aac, aec, aic, aoc and auc
/a.a/ matches an a, followed by any character and then another a.
* 0 or more times
+ 1 or more times
? 0 or 1 times (i.e., optional)
{5} exactly 5 times
{3,} 3 or more times
{2,6} 2 to 6 times
/xy{2,4}/ matches xyy, xyyy and xyyyy
/x+y*x+/ matches one or more x, followed by 0 or more y, followed by 1 or more x.
/ABC|ace/ matches ABC or ace
/[abc]{4}/ matches a string of 4 characters, each of which is a, b or c.
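The multipliers and alternation can be tested against candidate strings as sketched below. The patterns here are wrapped in ^ and $ (and a non-capturing group (?: ) for the alternation) so that the whole string must match; that anchoring and the candidate lists are additions for the test.

```perl
use strict;
use warnings;

# Keep only the candidates that match each anchored pattern.
my @range_hits = grep { /^xy{2,4}$/ }      qw(xy xyy xyyy xyyyy xyyyyy);
my @plus_star  = grep { /^x+y*x+$/ }       qw(xx xyx xxyyx x xy);
my @either     = grep { /^(?:ABC|ace)$/ }  qw(ABC ace abc ACE);
my @four_abc   = grep { /^[abc]{4}$/ }     qw(abca cccc abcd abc);

print "@range_hits\n";   # xyy xyyy xyyyy
print "@plus_star\n";    # xx xyx xxyyx
print "@either\n";       # ABC ace
print "@four_abc\n";     # abca cccc
```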
Consider the string “abccccbaccccba”.
The pattern /a.*ba/ matches the entire string, not abccccba.
/a.*ba.*/ matches the entire string, with the first .* matching bccccbacccc.
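This greedy behaviour can be observed by capturing what each pattern matched. The sketch also shows the non-greedy form .*? for contrast. The capturing parentheses are added only to expose the matched text.

```perl
use strict;
use warnings;

my $s = "abccccbaccccba";

my ($greedy) = $s =~ /(a.*ba)/;     # .* takes as much as it can
my ($lazy)   = $s =~ /(a.*?ba)/;    # .*? takes as little as it can
my ($first)  = $s =~ /a(.*)ba.*/;   # what the first .* ends up holding

print "$greedy\n";   # abccccbaccccba
print "$lazy\n";     # abccccba
print "$first\n";    # bccccbacccc
```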
/(.)a\1/ matches aaa, bab, xax, 5a5, etc., but not 5a6, etc.
/(.*)a\1/ matches abaab, a, cidacid, etc.
/([abc])x([de])y\2x\1/ matches axdydxa, etc.
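A sketch for experimenting with backreferences. The patterns are anchored with ^ and $ so the whole candidate must match, and the last pattern (a four-character palindrome test) is an extra illustration that is not from the list above.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first group matched.
my @same_ends = grep { /^(.)a\1$/ }  qw(aaa bab xax 5a5 5a6);
my @repeated  = grep { /^(.*)a\1$/ } qw(abaab a cidacid abab);

# Two groups: \2 and \1 refer to the second and first group.
my @mirrored  = grep { /^(\w)(\w)\2\1$/ } qw(abba noon abab);

print "@same_ends\n";   # aaa bab xax 5a5
print "@repeated\n";    # abaab a cidacid
print "@mirrored\n";    # abba noon
```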
Example:
/\bair\b/ matches air&, +air+, air, etc., but not hair, airs, etc.
/\bair\B/ matches airs, +airing, etc., but not air&, +air, air, etc.
/^air/ matches air, airs, etc., but not hair.
/^air$/ matches air only.
/a|bc*/ is equivalent to /(a)|((b)(c*))/
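The anchoring examples can be verified with a sketch like the following, which filters candidate strings through each pattern. The candidate lists are taken from the examples above.

```perl
use strict;
use warnings;

my @whole  = grep { /\bair\b/ } ("air&", "+air+", "air", "hair", "airs");
my @prefix = grep { /\bair\B/ } ("airs", "+airing", "air&", "+air", "air");
my @start  = grep { /^air/ }    ("air", "airs", "hair");
my @exact  = grep { /^air$/ }   ("air", "airs", "hair");

print "@whole\n";    # air& +air+ air
print "@prefix\n";   # airs +airing
print "@start\n";    # air airs
print "@exact\n";    # air
```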
Exercise 3:
Give a Perl pattern for each of the following:
(a) either “abcde” or “edcba”.
(b) at least two b followed by at least seven c.
(c) any number of *, followed by any number of $, followed by any number of +.
(d) a ^ at the beginning of a string, followed by three to four a.
(e) any ten characters, including newline, just before the end of the string.
(f) any string with the same word repeated two or more times in a row. A word is defined as a sequence of alphanumeric characters or '_', delimited by white space or by the beginning or end of the string.
if (/life is (.*)\./)
{ print $1;
}
if (@s = /love is (.*) and hatred is (.*)\./)
{ print "$s[0], not $s[1]";
}
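A self-contained sketch of both capture styles: $1 after a successful match, and a match in list context that returns all captured groups. The sample sentences are made up for illustration.

```perl
use strict;
use warnings;

# $1 holds the text matched by the first pair of parentheses.
my $line = "life is beautiful.";
my $what = "";
if ($line =~ /life is (.*)\./) {
    $what = $1;
}

# In list context a match returns the captured groups.
my $sentence = "love is patient and hatred is blind.";
my @s = $sentence =~ /love is (.*) and hatred is (.*)\./;

# A failed match returns the empty list, which is false in a condition.
my @none = "nothing here" =~ /love is (.*)\./;

print "$what\n";                    # beautiful
print "$s[0], not $s[1]\n";         # patient, not blind
print scalar(@none), "\n";          # 0
```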
# print all lines in the file example.dat that contain "[n]",
# where n is given by the user.
print "What is the index? ";
$index = <STDIN>;
chomp($index);
open(IN, "example.dat") or die "cannot open example.dat: $!\n";
while (<IN>)
{ if (/\[$index\]/)
{ print;
}
}
Exercise 4:
Write a Perl program that reads the file "a.a" and prints all lines that contain all four of the characters ‘a’, ‘c’, ‘e’ and ‘g’.
Matching Operators
/^\/usr\/bin\/perl/
m#^/usr/bin/perl#
# Print all lines from standard input that contain the
# string "perl" somewhere in the line, ignoring case.
while (<STDIN>)
{ if (/perl/i)
{ print;
}
}
while ($line = <STDIN>)
{ if ($line =~ /perl/i)
{ print $line;
}
}
...
print "Do you want to quit? [y/n]";
if (<STDIN> =~ /y/i)
{ die "bye, dear.";
}
print "I love you." if $letter !~ /hate/;
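The matching operators can be tried without reading standard input, as in this sketch. The sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @paths = ("/usr/bin/perl -w", "/usr/local/bin/perl");

# The same pattern written with / delimiters and with m# delimiters.
my @with_slashes = grep { /^\/usr\/bin\/perl/ } @paths;
my @with_hashes  = grep { m#^/usr/bin/perl# }   @paths;

# =~ binds a pattern to a variable other than $_; /i ignores case.
my $found = ("Learning PERL is fun" =~ /perl/i) ? "yes" : "no";

# !~ is true when the pattern does not match.
my $letter = "I adore you.";
my $reply  = ($letter !~ /hate/) ? "I love you." : "";

print "@with_slashes\n";   # /usr/bin/perl -w
print "@with_hashes\n";    # /usr/bin/perl -w
print "$found $reply\n";   # yes I love you.
```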
Substitution and other common operators using regular expressions
$_ = "I love you.";
s/love/hate/;
print; # print out "I hate you."
$_ = "I love you and you love me.";
s/love/hate/;
print; # print out "I hate you and you love me."
$_ = "I love you and you love me.";
s/love/hate/g;
print; # print out "I hate you and you hate me."
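A sketch of substitution applied to named variables through =~ rather than to $_. It also shows that s/// returns the number of replacements made. The copy-then-substitute idiom (my $copy = $text) =~ s/.../.../ keeps the original string intact.

```perl
use strict;
use warnings;

my $text = "I love you and you love me.";

# Copy first, then substitute in the copy.
(my $first_only = $text) =~ s/love/hate/;     # first occurrence only
(my $every      = $text) =~ s/love/hate/g;    # all occurrences

# Substituting in place returns how many replacements were made.
my $count = ($text =~ s/love/hate/g);

print "$first_only\n";   # I hate you and you love me.
print "$every\n";        # I hate you and you hate me.
print "$count\n";        # 2
```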
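The following commands run Perl directly from the command line. The -e switch supplies the program text on the command line, and the -n switch loops over each line of the input, running the program once per line. (The single quotes are for a Unix shell; Windows command shells need double quotes instead.)
$ perl -ne 's/love/hate/g; print;' love_letter.dat
$ perl -ne 's/\$i\b/\$count/g; print;' < ex1.pl > ex2.pl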
Example:
$line = 'kwok-bun Yue,123456789,Computer Science';
($name, $ssnum, $major) = split /,/, $line;
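A few further split behaviours, sketched with made-up input strings: a pattern that absorbs white space around the delimiter, the dropping of trailing empty fields, and the optional third argument that limits the number of fields.

```perl
use strict;
use warnings;

my $line = 'kwok-bun Yue,123456789,Computer Science';
my ($name, $ssnum, $major) = split /,/, $line;

my @trimmed = split /\s*,\s*/, "a , b,c";   # delimiter may carry white space
my @parts   = split /,/, "a,b,,";           # trailing empty fields are dropped
my @limited = split /,/, "a,b,c", 2;        # at most two fields

print "$name | $ssnum | $major\n";   # kwok-bun Yue | 123456789 | Computer Science
print scalar(@parts), "\n";          # 2
print "$limited[1]\n";               # b,c
```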
Exercise 5:
Write Perl code that reads the file "some.file" and breaks its contents into tokens. A token is a string of non-white-space characters; tokens are separated by white space. Store the tokens in the array @words.
$glue = ":";
@list = ("12", "05","59");
print join($glue, @list); # print "12:05:59"
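Since join is the inverse of split, a round trip makes a convenient check. The sketch below also formats numbers before joining them; the values are illustrative.

```perl
use strict;
use warnings;

my $time = join(":", "12", "05", "59");
my ($hours, $minutes, $seconds) = split /:/, $time;

# join accepts any list, including one produced by map.
my $padded = join(":", map { sprintf "%02d", $_ } (7, 5, 9));

print "$time\n";                        # 12:05:59
print "$hours $minutes $seconds\n";     # 12 05 59
print "$padded\n";                      # 07:05:09
```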
The first command swaps x and y. The second changes all lower case letters to upper case.
$ perl -ne 'tr/xy/yx/; print;' < e1.dat > e2.dat
$ perl -ne 'tr/a-z/A-Z/; print;' < emp1.dat > emp2.dat
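The same translations can be applied to variables inside a program, as sketched here with made-up strings. With an empty replacement list, tr changes nothing and simply returns the number of characters it matched.

```perl
use strict;
use warnings;

my $text = "xylophone";

(my $swapped = $text)  =~ tr/xy/yx/;     # swap x and y in a copy
(my $upper   = "perl") =~ tr/a-z/A-Z/;   # lower case to upper case

# Empty replacement list: count matching characters, leave $text unchanged.
my $vowels = ($text =~ tr/aeiou//);

print "$swapped $upper $vowels\n";       # yxlophone PERL 3
```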
Exercise 6:
Write a Perl program that removes all comments from an Ada program, "ex1.ada". In Ada, anything after -- on a line is ignored by the compiler. Print the Ada program without comments to standard output.
4. Suggested Solutions to Classwork Exercises
(1) For example,
{ $ch eq 'a' && ($a_ct++, last);
$ch eq 'e' && ($e_ct++, last);
$ch eq 'i' && ($i_ct++, last);
$ch eq 'o' && ($o_ct++, last);
$ch eq 'u' && ($u_ct++, last);
$other_ct++;
}
# or
{ ($a_ct++, last) if ($ch eq 'a');
($e_ct++, last) if ($ch eq 'e');
($i_ct++, last) if ($ch eq 'i');
($o_ct++, last) if ($ch eq 'o');
($u_ct++, last) if ($ch eq 'u');
$other_ct++;
}
# or
S1:
{ $ch eq 'a' && do {$a_ct++; last S1;};
$ch eq 'e' && do {$e_ct++; last S1;};
$ch eq 'i' && do {$i_ct++; last S1;};
$ch eq 'o' && do {$o_ct++; last S1;};
$ch eq 'u' && do {$u_ct++; last S1;};
$other_ct++;
}
(2)
(a) [aeiouAEIOU]
(b) [^aeiouAEIOU]
(c) [^a-z]
(d) \010
(e) [\r\f]
(f) \^
(g) [kwo\-bunYe]
(3)
(a) /abcde|edcba/
(b) /b{2,}c{7,}/
(c) /\**\$*\+*/
(d) /^\^a{3,4}/
(e) /(.|\n){10}$/
(f) /\b(\w+)\b(\s+\1\b)+/
(4) For example,
#!/usr/bin/perl
open(IN, "a.a");
while (<IN>)
{ if ((/a/) && (/c/) && (/e/) && (/g/))
{ print;
}
}
(5) For example,
# Decompose a file into tokens with
# white spaces as delimiters.
open(IN, "some.file");
while (<IN>)
{ chomp;
push(@words, split(' '));   # split(' ') skips leading white space
}
(6) For example,
#!/usr/bin/perl
# This does not handle the case of
# -- inside a string.
open(IN, "ex1.ada");
while (<IN>)
{ while (/(.*)--/)
{ $_ = $1 . "\n";
}
print;
}
or simply:
# This does not handle the case of
# -- inside a string.
perl -ne 'chomp; s/^(.*?)--.*/$1/; print "$_\n";' ex1.ada