Perl Module 2:  Perl’s Control Structures and Regular Expressions
by K. Yue (copyright 2000)
Revised September 1, 2000

Operators and Comparators

Operator                Meaning
**                      Exponentiation
**=                     Exponentiation assignment
()                      Null list
.                       String concatenation
.=                      String concatenation assignment
eq, ne, ge, gt, le, lt  String comparison
x                       String repetition
..                      Range
-f, -x, -d, etc.        Unary file test operators.  Perl can test various file
                        attributes (e.g., -f: plain file, -x: executable, -d: directory).
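A minimal runnable sketch of the operators in the table, assuming Perl 5 with strict and warnings (the "my" declarations are a Perl 5 feature).  It tests the current directory "." with -d, since that path always exists.

```perl
use strict;
use warnings;

my $cube   = 2 ** 3;                     # exponentiation: 8
my $n      = 2;
$n **= 10;                               # exponentiation assignment: 1024
my $greet  = "Hello" . ", " . "world";   # string concatenation
$greet    .= "!";                        # concatenation assignment
my $line   = "-" x 10;                   # string repetition: ten dashes
my @digits = (1 .. 5);                   # range: (1, 2, 3, 4, 5)
my @empty  = ();                         # null list
my $before = ("apple" lt "banana");      # string comparison: true

print "$cube $n $greet $line @digits\n";
print "apple sorts before banana\n" if $before;

# File test operator: -d is true if the path is a directory.
my $path = ".";                          # the current directory
print "$path is a directory\n" if -d $path;
```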

Control Structures

Example:  Truth values.  A scalar is false if its string value is "" or "0";
every other value is true.

""         # false
"0"        # false
"00"       # true
$n - $n    # "0": false
undef      # undef (undefined) is converted to "": false.
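The rule can be checked directly.  This sketch (assuming Perl 5; $n is set to an arbitrary 7) evaluates each value from the example in a boolean context and records the result.

```perl
use strict;
use warnings;

my $n = 7;
# Each value from the example, paired with a printable label.
my @cases = (
    [ '""',      ""      ],
    [ '"0"',     "0"     ],
    [ '"00"',    "00"    ],
    [ '$n - $n', $n - $n ],
    [ 'undef',   undef   ],
);

my %truth;
for my $case (@cases) {
    my ($label, $value) = @$case;
    $truth{$label} = $value ? "true" : "false";   # boolean context
    print "$label is $truth{$label}\n";
}
```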

unless (some-condition)
{  action;
}

is equivalent to

if (! some-condition)
{  action;
}

Example:

DAILY_WORK:
while (1)
{  while (! &time_up_for_the_day)
   {  last if &boss_let_go_early;
      last DAILY_WORK if &win_lottery;
      &work_a_while;
      redo if &overtime_not_over;
   }  continue
   {  &play_a_game_secretly_to_relax;
   }
}
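The example above calls hypothetical subroutines, so it cannot run as is.  The sketch below (assuming Perl 5) uses plain counters instead to show the same controls: next ends the current iteration, last leaves the innermost loop, last LABEL leaves the labeled loop, the continue block runs after every iteration (including after next, but not after last), and redo repeats the current iteration without advancing.

```perl
use strict;
use warnings;

my @log;
OUTER:
for my $day (1 .. 3) {
    for my $hour (1 .. 4) {
        next if $hour == 2;          # skip to the continue block
        last if $hour == 4;          # leave the inner loop only
        last OUTER if $day == 3;     # leave both loops
        push @log, "d$day h$hour";
    }
    continue {
        push @log, "break";          # runs after each inner iteration not ended by last
    }
}
print join(", ", @log), "\n";

# redo repeats the body for the same element, without the continue block.
my $tries = 0;
for my $task ("a", "b") {
    $tries++;
    redo if $task eq "a" && $tries < 3;
}
print "tries: $tries\n";
```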

Example:

&work unless &too_tired;
&work if &having_fun;
&work until &too_tired;
&work while &having_fun;

for (stmt_1; stmt_2; stmt_3)
{  stmt_4;
}

is equivalent to:

stmt_1;
while (stmt_2)
{  stmt_4;
   stmt_3;
}

foreach $num (@num_list)
{  print "$num\n";
}

Note: The variable $num is set to each element of @num_list in turn.  It is in fact an alias for that element, so assigning to $num changes the array.

Example:

foreach $num (@num_list)
{  print "$num\n";
}

is the same as each of the following:

foreach (@num_list)
{  print "$_\n";
}

or

foreach (@num_list)
{  print;
   print "\n";
}
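A short sketch (assuming Perl 5; the list values are arbitrary) showing that the loop variable, named or $_, is an alias: modifying it modifies @num_list.

```perl
use strict;
use warnings;

my @num_list = (1, 2, 3);

# $num is an alias for each element, so this changes the array itself.
foreach my $num (@num_list) {
    $num *= 10;
}

# Without a named variable, $_ plays the same role.
foreach (@num_list) {
    print;
    print "\n";
}
```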

Example:

do
{  &i_like_it_this_way;
   print "interesting stuff\n";
};

do
{  &work;
}  until &tired;

Example:

if (&error) { die "ay-ya-ya\n"; }
die "ay-ya-ya\n" if &error;
&error && die "ay-ya-ya\n";

unless (&kiss_me) { &leave_me; }
&leave_me unless &kiss_me;
&kiss_me || &leave_me;
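The && and || forms work because both operators short-circuit: the right side is evaluated only when the left side does not already decide the result.  In this sketch the subroutines are hypothetical stand-ins with fixed return values; each one records its name so the call order is visible.

```perl
use strict;
use warnings;

my @calls;
sub error    { push @calls, "error";    return 0 }   # hypothetical: reports no error
sub kiss_me  { push @calls, "kiss_me";  return 1 }   # hypothetical: always succeeds
sub leave_me { push @calls, "leave_me"; return 1 }   # hypothetical

&error && die "ay-ya-ya\n";   # error() is false, so die is never evaluated
&kiss_me || &leave_me;        # kiss_me() is true, so leave_me() is never called

print "@calls\n";             # prints "error kiss_me"
```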

Exercise 1:

Consider the following C statement:

switch (ch)
{  case 'a': a_ct++; break;
   case 'e': e_ct++; break;
   case 'I': i_ct++; break;
   case 'o': o_ct++; break;
   case 'u': u_ct++; break;
   default : other_ct++; break;
}

Implement the same statement in Perl in two ways.

Regular Expressions

Example:

if (/good/)
{  print;
}

# if $_ contains the pattern "good", then $_ is printed.

Note:

  1. A pattern is enclosed by two forward slashes (/).
  2. By default, the pattern is matched against $_.

Single-character patterns:

.         any character except \n
[abc]     a character class: matches 'a', 'b' or 'c'
[a-zA-Z]  a character class: matches any letter
[^abc]    negation of a character class: matches
          any character except 'a', 'b' or 'c'
\n        newline
\r        carriage return
\t        tab
\f        formfeed
\d        a digit: [0-9]
\D        a non-digit: [^0-9]
\w        a word character: [0-9a-zA-Z_]
\W        a non-word character: [^0-9a-zA-Z_]
\s        a white-space character: [ \t\f\r\n]
\S        a non-white-space character: [^ \t\f\r\n]
\060      the character with octal code 060, i.e., '0'

Most other backslashed characters match themselves.
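A runnable sketch (assuming Perl 5; the sample text is made up) applying several single-character patterns.  With the g modifier in list context, a match returns every matching character.

```perl
use strict;
use warnings;

my $text = "Room 101, tel. 555-0199\tok";

my @digits = $text =~ /\d/g;                  # every digit
my @spaces = $text =~ /\s/g;                  # the three spaces and the tab
my $first_word_char = ($text =~ /(\w)/)[0];   # first word character
my $has_vowel = ($text =~ /[aeiou]/) ? 1 : 0; # character class
my $zero = ($text =~ /\060/) ? 1 : 0;         # octal 060 is the character '0'

printf "%d digits, %d white-space characters\n",
    scalar(@digits), scalar(@spaces);
```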

Exercise 2:

Find the single character pattern that matches the following description.

(a) all vowels,
(b) all non-vowels,
(c) all characters except lower case letters (other than 'a' to 'z'),
(d) the backspace character,
(e) carriage return or form feed,
(f) the character ^,
(g) any character in my name ("kwok-bun Yue").

Grouping Patterns

Example:

/ab1/         matches “ab1”
/a[aeiou]c/   matches “aac”, “aec”, “aic”, “aoc” and “auc”
/a.a/         matches an “a”, followed by any character
              and then another “a”.

Multipliers (applied to the preceding item):

*      0 or more times
+      1 or more times
?      0 or 1 times (i.e., optional)
{5}    exactly 5 times
{3,}   3 or more times
{2,6}  2 to 6 times

Example:

/xy{2,4}/   matches “xyy”, “xyyy” and “xyyyy”
/x+y*x+/    matches one or more “x”, followed by 0
            or more “y”, followed by 1 or more “x”.
/abc|ace/   matches “abc” or “ace”
/[abc]{4}/  matches a string of 4 characters,
            each of which is ‘a’, ‘b’ or ‘c’.
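The patterns above can be tried against sample strings.  This sketch (assuming Perl 5; the test strings are made up) stores each pattern with qr// and checks whether it matches.  Note that a pattern matches if it is found anywhere in the string, which is why "cabba" matches /[abc]{4}/ through its substring "cabb".

```perl
use strict;
use warnings;

# pattern, test string, expected result (1 = match)
my @checks = (
    [ qr/xy{2,4}/,  "xyyy",  1 ],
    [ qr/xy{2,4}/,  "xy",    0 ],
    [ qr/x+y*x+/,   "xxyx",  1 ],
    [ qr/x+y*x+/,   "xyy",   0 ],
    [ qr/abc|ace/,  "ace",   1 ],
    [ qr/[abc]{4}/, "cabba", 1 ],   # matches the substring "cabb"
    [ qr/[abc]{4}/, "abc",   0 ],
);

my $failures = 0;
for my $check (@checks) {
    my ($re, $string, $expected) = @$check;
    my $got = ($string =~ $re) ? 1 : 0;
    $failures++ if $got != $expected;
    print "$re on '$string': ", ($got ? "match" : "no match"), "\n";
}
```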

Example:

Consider the string “abccccbaccccba”.

The pattern

/a.*ba/     matches the entire string,
            not “abccccba”.
/a.*ba.*/   matches the entire string;
            with the first “.*”
            matching “bccccbacccc”.
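Multipliers are greedy: they match as much as possible while still allowing the rest of the pattern to match.  Perl 5 also offers non-greedy forms such as .*?, which match as little as possible.  A sketch on the same string:

```perl
use strict;
use warnings;

my $s = "abccccbaccccba";

my ($greedy) = $s =~ /(a.*ba)/;    # .* takes as much as possible
my ($lazy)   = $s =~ /(a.*?ba)/;   # .*? takes as little as possible
my ($inner)  = $s =~ /a(.*)ba.*/;  # what the first .* matched

print "greedy: $greedy\nlazy:   $lazy\ninner:  $inner\n";
```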

Example:

/(.)a\1/    matches “aaa”, “bab”, “xax”, “5a5”,
            etc., but not “5a6”, etc.
/(.*)a\1/   matches “abaab”, “a”, “cidacid”, etc.
/([abc])x([de])y\2x\1/
            matches “axdydxa”, etc.
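In a pattern, \1 refers back to the text matched by the first parenthesized group, \2 to the second, and so on.  Because (.*) may match the empty string, /(.*)a\1/ matches any string containing an “a”, which is why “a” alone is listed.  A sketch checking the examples (assuming Perl 5; the non-matching strings are made up):

```perl
use strict;
use warnings;

# pattern, test string, expected result (1 = match)
my @tests = (
    [ qr/(.)a\1/,               "5a5",     1 ],
    [ qr/(.)a\1/,               "5a6",     0 ],
    [ qr/(.*)a\1/,              "cidacid", 1 ],
    [ qr/([abc])x([de])y\2x\1/, "axdydxa", 1 ],
    [ qr/([abc])x([de])y\2x\1/, "axdyexa", 0 ],
);

my $failures = 0;
for my $t (@tests) {
    my ($re, $string, $expected) = @$t;
    my $got = ($string =~ $re) ? 1 : 0;
    $failures++ if $got != $expected;
}
my $summary = $failures ? "$failures unexpected result(s)"
                        : "all back-reference checks passed";
print "$summary\n";
```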

^    matches the beginning of a string.
$    matches the end of a string.
\b   matches at a word boundary (i.e., between \w
     and \W, or between \w and the string's start or end).
\B   matches at a non-word boundary.

Example:

/\bair\b/  matches “ air&”, “+air+”, “air”, etc.,
           but not “hair”, “airs”, etc.
/\bair\B/  matches “airs”, “+airing”, etc.,
           but not “air&”, “+air”, “air”, etc.
/^air/     matches “air”, “airs”, etc., but not “hair”.
/^air$/    matches only “air” (and “air\n”, since $ also
           matches just before a final newline).
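A sketch (assuming Perl 5) that runs the four anchored patterns over the example strings and builds a four-character summary for each: B for /\bair\b/, b for /\bair\B/, ^ for /^air/, $ for /^air$/, and - where a pattern fails.

```perl
use strict;
use warnings;

my %results;
for my $s (" air&", "+air+", "air", "hair", "airs", "+airing") {
    $results{$s} = join "",
        ($s =~ /\bair\b/ ? "B"  : "-"),   # whole word "air"
        ($s =~ /\bair\B/ ? "b"  : "-"),   # "air" starting a longer word
        ($s =~ /^air/    ? "^"  : "-"),   # "air" at the start
        ($s =~ /^air$/   ? "\$" : "-");   # exactly "air"
    printf "%-10s %s\n", "'$s'", $results{$s};
}
```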
 

Precedence of pattern operators (highest first):

  1. Parentheses (): highest
  2. Multipliers: +, *, ?, {m,n}
  3. Sequence and anchoring: abc, ^, $, \b, \B
  4. Alternation: |

Example:

/a|bc*/  is equivalent to /(a)|((b)(c*))/

Exercise 3:

Give the Perl pattern for each of the following:

(a) either “abcde” or “edcba”.
(b) at least two b's followed by at least seven c's.
(c) any number of *, followed by any number of $, followed by any number of +.
(d) a ^ at the beginning of a string, followed by three or four a's.
(e) any ten characters, including newline, just before the end of the string.
(f) any string with the same word in a row for two or more times.  A word is defined as a sequence of alphanumeric or '_', enclosed by white spaces or beginning or end of a string.

Example:

if (/life is (.*)\./)
{  print $1;
}

if (@s = /love is (.*) and hatred is (.*)\./)
{  print "$s[0], not $s[1]";

}
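After a successful match, $1, $2, ... hold the text captured by each pair of parentheses; in list context the match itself returns all the captures.  A runnable version of the two examples (assuming Perl 5; the sample sentences are made up):

```perl
use strict;
use warnings;

$_ = "They say life is short.";
my $life;
if (/life is (.*)\./) {
    $life = $1;                    # text captured by the first (...)
}

my $line = "love is blind and hatred is loud.";
my @s = $line =~ /love is (.*) and hatred is (.*)\./;   # list context: all captures

print "$life\n";                   # prints "short"
print "$s[0], not $s[1]\n";        # prints "blind, not loud"
```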

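#  Print all lines in the file example.dat that contain "[n]",
#  where n is given by the user.
print "What is the index? ";
$index = <STDIN>;
chomp($index);                    # remove the trailing newline
open(IN, "example.dat") || die "cannot open example.dat: $!";
while (<IN>)
{  if (/\[\Q$index\E\]/)          # \Q...\E matches the input literally
   {  print;
   }
}
close(IN);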

The i modifier makes a match case-insensitive: the pattern /yue/i matches "yue", "YUE", "yUe", etc.

Exercise 4:

Write a Perl program to read in a file "a.a" and print out all lines that contain the characters ‘a’, ‘c’, ‘e’ and ‘g’.

Matching Operators

Example:  The following two matches are equivalent.  With the m operator you may choose another delimiter, such as #, so the slashes need not be escaped.

/^\/usr\/bin\/perl/
m#^/usr/bin/perl#
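A sketch (assuming Perl 5; the test string is made up) confirming that the same anchored pattern gives the same result with escaped slashes, with # as the delimiter, and with paired braces as the delimiter.

```perl
use strict;
use warnings;

my $shebang = "/usr/bin/perl -w";

my $slashes = ($shebang =~ /^\/usr\/bin\/perl/) ? 1 : 0;   # every / escaped
my $hashes  = ($shebang =~ m#^/usr/bin/perl#)   ? 1 : 0;   # no escaping needed
my $braces  = ($shebang =~ m{^/usr/bin/perl})   ? 1 : 0;   # paired delimiters also work

print "all three agree\n" if $slashes == $hashes && $hashes == $braces;
```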

Example:

# Print all lines from the standard input file that contain the
# string "perl" somewhere in the line, case ignoring.
while (<STDIN>)
{  if (/perl/i)
   {  print;
   }
}

Example:  The program above can be rewritten as follows (though this is less idiomatic Perl):

while ($line = <STDIN>)
{  if ($line =~ /perl/i)
   {  print $line;
   }
}
...
print "Do you want to quit? [y/n]";
if (<STDIN> =~ /y/i)
{  die "bye, dear.";
}

print "I love you." if $letter !~ /hate/;

Substitution and other common operators using regular expressions

Example:

$_ = "I love you.";
s/love/hate/;
print;     # print out "I hate you."
$_ = "I love you and you love me.";
s/love/hate/;
print;     # print out "I hate you and you love me."
$_ = "I love you and you love me.";
s/love/hate/g;
print;     # print out "I hate you and you hate me."

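The following are command line executions of Perl.  The switch -e gives the program text on the command line.  The switch -n wraps that program in a loop that reads each line of the files named on the command line (or of the standard input) into $_.  On Unix, single quotes keep the shell from interpreting $ and \.  The second command renames the variable $i to $count in ex1.pl.

$perl -ne 's/love/hate/g; print;' love_letter.dat
$perl -ne 's/\$i\b/\$count/g; print;' < ex1.pl > ex2.pl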

Example: (from the Llama’s book)

$line = "Merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/,$line);
# now fields is ("merlyn","","118","10","Randal",
#                   "/home/merlyn","/usr/bin/perl")

Exercise 5:

Write a piece of Perl code that reads the file "some.file" and breaks its contents into tokens.  A token is a string of non-white-space characters; tokens are separated by white space.  The tokens should be stored in the array @words.

Example:

$glue = ":";
@list = ("12", "05","59");
print join($glue, @list); # print "12:05:59"
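split and join are inverses for a fixed separator.  This sketch (assuming Perl 5) reuses a passwd-style line like the one in the split example.  By default split drops trailing empty fields; a negative limit keeps them.

```perl
use strict;
use warnings;

my $line   = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields = split(/:/, $line);        # 7 fields; the second is ""
my $back   = join(":", @fields);       # join reverses the split

my @tail = split(/:/, "a:b::");        # trailing empty fields are dropped
my @keep = split(/:/, "a:b::", -1);    # a negative limit keeps them

print scalar(@fields), " fields; round trip ",
      ($back eq $line ? "ok" : "failed"), "\n";
```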

Example:

The first perl command swaps x and y.  The second changes all lower case letters to upper case.

$perl -ne 'tr/xy/yx/; print;' < e1.dat > e2.dat
$perl -ne 'tr/a-z/A-Z/; print;' < emp1.dat > emp2.dat
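The same tr operations inside a program, as a sketch assuming Perl 5 (the sample strings are made up).  tr works on $_ by default, or on another variable through =~, and returns the number of characters it matched; with an empty replacement list it only counts.

```perl
use strict;
use warnings;

$_ = "xray yak";
tr/xy/yx/;                              # swap every x and y in $_
my $swapped = $_;

my $name = "kwok-bun yue";
(my $upper = $name) =~ tr/a-z/A-Z/;     # upper-case a copy
my $vowels = ($name =~ tr/aeiou//);     # count vowels; $name is unchanged

print "$swapped\n$upper\n$vowels vowels\n";
```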

Exercise 6:

Write a Perl program to get rid of all comments of an Ada program, "ex1.ada".  In Ada, anything after -- in a line is discarded by the compiler.  Print out the Ada program without comments to the standard output file.
 

Suggested Solutions to Classwork Exercises

(1) For example,

{  $ch eq 'a' && ($a_ct++, last);
   $ch eq 'e' && ($e_ct++, last);
   $ch eq 'I' && ($i_ct++, last);
   $ch eq 'o' && ($o_ct++, last);
   $ch eq 'u' && ($u_ct++, last);
   $other_ct++;
}

or

{  ($a_ct++, last) if ($ch eq 'a');
   ($e_ct++, last) if ($ch eq 'e');
   ($i_ct++, last) if ($ch eq 'I');
   ($o_ct++, last) if ($ch eq 'o');
   ($u_ct++, last) if ($ch eq 'u');
   $other_ct++;
}

or

S1:
{  $ch eq 'a' && do {$a_ct++; last S1;};
   $ch eq 'e' && do {$e_ct++; last S1;};
   $ch eq 'I' && do {$i_ct++; last S1;};
   $ch eq 'o' && do {$o_ct++; last S1;};
   $ch eq 'u' && do {$u_ct++; last S1;};
   $other_ct++;
}
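To try the labeled-block form, it can be wrapped in a loop over the characters of a string.  A sketch assuming Perl 5, with a made-up input string; each do block is an expression and so needs a terminating semicolon, and last S1 leaves the bare block, which Perl treats as a loop that runs once.

```perl
use strict;
use warnings;

my ($a_ct, $e_ct, $i_ct, $o_ct, $u_ct, $other_ct) = (0) x 6;

foreach my $ch (split //, "Io, a queue!") {   # one character at a time
    S1: {
        $ch eq 'a' && do { $a_ct++; last S1; };
        $ch eq 'e' && do { $e_ct++; last S1; };
        $ch eq 'I' && do { $i_ct++; last S1; };
        $ch eq 'o' && do { $o_ct++; last S1; };
        $ch eq 'u' && do { $u_ct++; last S1; };
        $other_ct++;                           # the "default" case
    }
}
print "a=$a_ct e=$e_ct I=$i_ct o=$o_ct u=$u_ct other=$other_ct\n";
```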

(2)

(a) [aeiouAEIOU]
(b) [^aeiouAEIOU]
(c) [^a-z]
(d) \010
(e) [\r\f]
(f) \^
(g) [kwo\-bunYe]

(3)

(a) /abcde|edcba/
(b) /b{2,}c{7,}/
(c) /\**\$*\+*/
(d) /^\^a{3,4}/
(e) /(.|\n){10}$/
(f) /(^|\s)(\w+)(\s+\2)+(\s|$)/

(4) For example,

#!/usr/bin/perl
open(IN, "a.a");
while (<IN>)
{  if ((/a/) && (/c/) && (/e/) && (/g/))
   {  print;
   }
}

(5) For example,

#   Decompose a file into tokens with
# white spaces as delimiters.
open(IN, "some.file");
while (<IN>)
{  chop;
     @words = (@words, split(/\s+/));
}

(6) For example,

#!/usr/bin/perl
# This does not take care of the problem of
# -- inside a string.
open(IN, "ex1.ada");
while (<IN>)
{  while (/(.*)--/)
   {  $_ = $1 . "\n";
   }
   print;
}

or simply:

# This does not take care of the problem of
# -- inside a string.
perl -ne 'chomp; s/^(.*?)--.*/$1/; print "$_\n";' ex1.ada