Perl Tutorial

Basics

Introduction

Perl was created by Larry Wall in 1987 as a general-purpose Unix scripting language that borrows ideas from C, shell scripting, awk, and sed. The name is commonly expanded as Practical Extraction and Report Language. Originally meant for text formatting and processing, Perl has grown over time to cover system administration, network, web and database programming, glue between systems and languages (system integration and rapid prototyping), bioinformatics, data mining, and even application development.

The main features of Perl are:

  • "Perl is the Swiss Army Knife of programming languages: powerful and adaptable." Perl is a mixture of C, Unix shell script, awk, sed, and more. Perl is much more expressive than these languages ("maximum expressivity", "There is more than one way to do it"). You can write a Perl program fairly quickly in a few lines of code.
  • Perl is an interpreted language and is largely platform-independent. You can run Perl scripts on any platform (Unix, Windows, Mac) where a Perl interpreter is available.
  • Perl provides a powerful regular expression facility to support text processing and report generation. Perl also has a symbolic debugger, built-in support for database management, and more.
  • Perl 5 supports Object-oriented programming.
  • Many Perl utilities and add-ons are available at CPAN (Comprehensive Perl Archive Network @ www.cpan.org).
  • Perl is free. Perl is open-source.

The Perl versions include:

  • 1.0 (1987)
  • 2.0 (1988)
  • 3.0 (1989)
  • 4.0 (1991)
  • 5.0 (1994),..., 5.5 (1998),..., 5.10 (2007), 5.11 (2009)
  • Perl 6 (coming soon @ perl6.org)

Popular sites for Perl are www.perl.org, www.perl.com, www.pm.org, www.perlmongers.org. Perl documentation is available at http://perldoc.perl.org.

Installing Perl

There are many ways to get the Perl Interpreter:

The path of the Perl interpreter "perl.exe" must be included in the PATH environment variable.

First Perl Program

Use a programming text editor (such as NotePad++, PSPad, TextPad) to enter the following source code and save it as "Hello.pl":

#!/usr/bin/env perl
use strict;                # Enforce variable declarations and other safety checks
use warnings;              # Report warnings about questionable constructs
print "Hello world!\n";                  # Print a message
print 'Hello world, ', 'Again!', "\n";   # Print another message
How It Works
  • Line 1, called the shebang, is meant for Unix systems; it specifies the location of the Perl interpreter. This line is ignored under Windows.
  • Lines 2 and 3 are directives (or pragmas) that make Perl stricter about questionable code. "use strict" makes Perl reject unsafe constructs, such as variables used without being declared, when it compiles the program. "use warnings" makes Perl report warnings about likely mistakes.
  • Lines 1 to 3 are optional, but recommended for writing robust programs.
  • A comment begins with a '#' and lasts until the end of the line. Comments are used to explain the code; they are ignored by the interpreter.
  • A Perl statement ends with a semicolon (;).
  • The print function prints the given strings to the console. \n denotes a newline. A function can take zero or more arguments (separated by commas). In Perl, you can enclose a function's arguments in parentheses () or omit them. For example, the following are equivalent:
    print 'Hello world, ', 'Again', "\n";    # Function arguments without parentheses
    print('Hello world, ', 'Again', "\n");   # Function arguments enclosed in parentheses
    
  • Strings can be enclosed in double quotes or single quotes. However, double quotes interpret variables and special characters (like \n for newline, \t for tab), but single quotes don't. For example,
    print "\n";     # print a newline
    print '\n';     # print \n literally
    
  • Extra white-spaces (blanks, tabs, new-lines) are ignored.
  • The file extension of ".pl" is not mandatory but recommended.
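
To see what "use strict" catches, here is a minimal sketch. It assumes a helper named strict_error (a hypothetical name, not part of Perl) that compiles a one-line snippet with a string eval, so that the compile error lands in $@ instead of stopping the script:

```perl
use strict;
use warnings;

# strict_error is a hypothetical helper for this illustration.
# The string eval compiles the snippet at run time under the same
# "use strict", so the error is captured in $@ rather than aborting.
sub strict_error {
    my $ok = eval q{ $undeclared_total = 1; 1 };
    return $ok ? '' : $@;
}

# Prints a 'Global symbol "$undeclared_total" requires explicit package name' error
print strict_error();
```

Declaring the variable with my inside the snippet would make it compile cleanly.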
Running In Windows

To run the program under Windows, start a cmd shell and issue the command:

> ... change directory to the directory containing Hello.pl ...
> perl Hello.pl
Hello world!
Hello world, Again!

Note: The path of the Perl interpreter “perl.exe” must be included in the PATH environment variable.

To display the Interpreter's help menu, issue:

> perl -h

To find the version of the Perl Interpreter, issue:

> perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
......
Running In Unix

For Unixes, include "#!/usr/bin/env perl" as the first line of the program (which specifies the location of the Perl Interpreter - just like any other Unix shell script). To run the program: first make the file executable (via change-mode command) and then execute the file:

$ ... change directory to the directory containing Hello.pl ...
$ chmod u+x Hello.pl
$ ./Hello.pl

Perl 5.10 Features

Perl 5.10 introduces many nice features. For example, the Hello-world program can be written as follows (save it as "Hello510.pl"):

#!/usr/bin/env perl
use strict;      # Enforce variable declarations and other safety checks
use warnings;    # Report warnings about questionable constructs
use 5.010;       # Use Perl 5.10 features
say 'Hello, world!';    # Print a message
> perl Hello510.pl
Hello, world!

Notes:

  • Line 4 instructs Perl to enable the new features in Perl 5.10. It could also be written as "use feature ':5.10'".
  • The function say is similar to print, except that say automatically appends a newline to the message, whereas print doesn't. say is available from Perl 5.10 onwards.

Basic Syntax

Comment

Comments are ignored by the Perl runtime but are very useful in explaining your code to others (and also to yourself three days later). You should use comments liberally to explain or document your code.

A comment begins with the symbol # and lasts until the end of the line. Perl has no dedicated multi-line comment syntax; put a # at the beginning of each line. (A POD block, which starts with a line such as =pod and ends with =cut, is sometimes used as a workaround.)

Statement & Block

A statement is a single instruction consisting of operators, variables and expressions. A Perl statement terminates with a semicolon (;).

Blanks, tabs and newlines are collectively called whitespaces. In Perl, extra white-spaces (blanks, tabs, newlines) are ignored (that is, multiple whitespaces are treated as one whitespace).

You can place many statements on a single line.

A block consists of zero or more statements enclosed in a pair of curly braces { ... }. No semicolon is needed after the closing brace.

Calling Perl's Built-in Functions

Perl has many built-in functions, which take a comma-separated list of arguments. You can enclose the arguments in parentheses or omit them, depending on your programming style. For example,

print 'Hello, world', "\n";    # Function arguments are separated by commas
print('Hello, world', "\n");   # Parentheses are optional 
say 'Hello, world';            # Function say (Perl 5.10) always prints a newline
say('Hello, world');

Variables, Literals & Data Types

A variable is a named storage location that holds a value, of a certain data type. A literal is a fixed value, e.g., 5566, 3.14, 'Hello', that can be assigned to a variable or form part of an expression.

Perl supports the following data types. It uses different initial symbols to denote and differentiate the various data types.

  • Scalar: begins with symbol $.
  • Array: begins with symbol @.
  • Hash or Associative Array: begins with symbol %.

An expression is a combination of variables, literals, operators, and sub-expressions that can be evaluated to produce a single value.

Scalar Variables and Contexts

A scalar is a single item. A scalar variable's name begins with symbol $, followed by a letter or underscore, followed by more letters, digits, or underscore. For example, $size, $_min_value, and $average.

Perl is case-sensitive. A $rose is not a $ROSE, and is not a $Rose.

Unlike statically typed languages such as C/C++/C#/Java, but like JavaScript and Unix shell scripts:

  • Without "use strict", Perl variables need not be declared before use, which often leads to hard-to-find bugs such as misspelled variable names. It is strongly recommended to declare every variable before use.
  • The actual type of a scalar (e.g., integer, floating-point number or string) need NOT be specified. Perl's scalar is simply a single item, which could take on context of number (integer or floating-point number), string, or boolean automatically.

You can assign a value (called a literal) to a scalar variable using the assignment operator (=). The scalar variable takes on the context of the literal assigned. For example, a variable takes on a string context if a string literal is assigned, and a numeric context if a numeric literal is assigned.

You can declare a lexically scoped (local) variable via the keyword my.

my $num = 123;      # numeric context
my $str = "Hello";  # string context

The context of the scalar is important because many operations expect a certain context, e.g., arithmetic operations (+, -, *, /) operate on numbers; the "." operator concatenates strings; logical operations (and, or, not) operate on boolean values. Perl automatically converts between the different contexts as needed to perform an operation. In other words, the context of a scalar is determined by the operation. For example:

#!/usr/bin/env perl
# ScalarContextTest.pl
use strict;
use warnings;
my $num1 = 11;           # Numeric context
my $num2 = 22;           # Numeric context
my $str1 = 'Hello';      # String context
my $str2 = 'world';      # String context
my $str3 = '33';         # String context
my $str4 = '44';         # String context
   
print $num1 + $num2 , "\n";       # + takes numbers
print 12 * 3.4 , "\n";            # * takes numbers
print $str1 + $str2 , "\n";       # + takes numbers - non-numeric strings become 0 (with warnings)
print $str1 . " " . $str2 , "\n"; # . takes strings
print $str3 + $str4 , "\n";       # + takes numbers - String converted to numeric context
print "5.5" - 5 , "\n";           # - takes numbers - String converted to number
print $num1 . $num2 , "\n";       # . takes strings - Numbers converted to string
> perl ScalarContextTest.pl
33
40.8
Argument "world" isn't numeric in addition (+) at ScalarContextTest.pl line 14.
Argument "Hello" isn't numeric in addition (+) at ScalarContextTest.pl line 14.
0
Hello world
77
0.5
1122

If you remove the "use warnings", the warning messages will not be shown, and you will have no idea that something went wrong.

How does Perl know that a variable is a number or a string? In fact, Perl does not know. Whenever a variable or string literal is used as an operand of an arithmetic operation (+, -, *, /), Perl tries to convert it to a number. If the string begins with a valid number, Perl uses that leading portion; otherwise it uses 0. You will not be warned unless you specify "use warnings" or turn on the -w (warning) flag!

A scalar variable that has not been assigned a value holds the special value undef.
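
The string-to-number conversion described above can be sketched as follows. The helper name to_number is an assumption made for this illustration; adding 0 is simply one way to force numeric context:

```perl
use strict;
use warnings;

# to_number is a hypothetical helper, not a Perl built-in.
sub to_number {
    my ($text) = @_;
    no warnings 'numeric';   # silence "isn't numeric" inside this sub only
    return $text + 0;        # adding 0 forces numeric context
}

print to_number('42'),    "\n";   # 42
print to_number('3.5kg'), "\n";   # 3.5 (leading numeric portion is used)
print to_number('abc'),   "\n";   # 0   (no leading number at all)
```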

Numeric Context and Operations

Perl has no separate integer and floating-point types at the language level. Internally, Perl stores a number as a native integer or as a double-precision floating-point value, and converts between the two automatically as needed.

Numeric literals include:

  • Floating-point literals: e.g., 3.1416, -0.8e18, 1.2E-5.
  • Integer literals: e.g., 5566, -128. You can delimit a long integer with underscore, e.g., 12_111_222_333.
  • Octal literals: begin with a leading 0 (zero), e.g., 0127.
  • Hexadecimal literals: begin with 0x, e.g., 0xABCD.
  • Binary literals: begin with 0b, e.g., 0b10110011.
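
As a quick check of these literal forms, the sketch below prints a few of them in decimal. The sub name literal_values is hypothetical and exists only to group the values:

```perl
use strict;
use warnings;

# literal_values is a hypothetical helper that returns the same number
# written in several literal notations.
sub literal_values {
    return (0127, 0xABCD, 0b10110011, 1_000_000);
}

print join(' ', literal_values()), "\n";   # 87 43981 179 1000000
```

Whatever notation is used in the source, print shows the value in decimal.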
Arithmetic Operators

Perl provides the following arithmetic operators for numbers. The following results are obtained assuming that $x=5, $y=2 before the operation.

OPERATOR  DESCRIPTION                      EXAMPLE                  RESULT
+         Addition                         $z = $x + $y;            $z is 7
-         Subtraction (or Unary Negation)  $z = $x - $y;            $z is 3
*         Multiplication                   $z = $x * $y;            $z is 10
/         Division                         $z = $x / $y;            $z is 2.5
%         Modulus (Division Remainder)     $z = $x % $y;            $z is 1
**        Exponentiation                   $z = $x ** $y;           $z is 25
++        Unary Pre- or Post-Increment     $y = $x++; $z = ++$x;    $y is 5, $z is 7, $x is 7
                                           Same as: $y = $x; $x = $x+1; $x = $x+1; $z = $x;
--        Unary Pre- or Post-Decrement     $y = --$x; $z = $x--;    $y is 4, $z is 4, $x is 3
                                           Same as: $x = $x-1; $y = $x; $z = $x; $x = $x-1;

The division operator / performs floating-point division. In other words, 1/2 gives 0.5 (whereas in C/Java, 1/2 gives 0). You can truncate a floating-point number to an integer via the built-in function int().

#!/usr/bin/env perl
# NumericOpTest.pl
use strict;
use warnings;
my $num1 = 11;
my $num2 = 22;
   
print $num1, "\n";           # 11
print $num2, "\n";           # 22
print $num1+$num2, "\n";     # 33
print $num1-$num2, "\n";     # -11
print $num1*$num2, "\n";     # 242
print $num1/$num2, "\n";     # 0.5
print ++$num1, ' ', $num1, "\n";   # 12 12
print $num2--, ' ', $num2, "\n";   # 22 21
   
$num1 -= $num2;
print $num1, "\n";           # -9
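
The truncation done by int() can be illustrated with a short sketch. The wrapper name whole_part is an assumption for this example; it only forwards to the built-in int:

```perl
use strict;
use warnings;

# whole_part is a hypothetical wrapper around the built-in int().
sub whole_part {
    my ($number) = @_;
    return int($number);   # drops the fractional part (truncates toward zero)
}

print 7 / 2, "\n";               # 3.5
print whole_part(7 / 2), "\n";   # 3
print whole_part(-7 / 2), "\n";  # -3
print 7 % 2, "\n";               # 1
```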
Arithmetic cum Assignment Operators

These are short-hand operators to combine two operations.

OPERATOR  DESCRIPTION                        EXAMPLE       RESULT
+=        Addition cum Assignment            $x += $y;     Same as: $x = $x + $y;
-=        Subtraction cum Assignment         $x -= $y;     Same as: $x = $x - $y;
*=        Multiplication cum Assignment      $x *= $y;     Same as: $x = $x * $y;
/=        Division cum Assignment            $x /= $y;     Same as: $x = $x / $y;
%=        Modulus cum Assignment             $x %= $y;     Same as: $x = $x % $y;
**=       Exponentiation cum Assignment      $x **= $y;    Same as: $x = $x ** $y;
Comparison Operators

Perl provides the following operators for comparing numbers:

OPERATOR  DESCRIPTION
==        Equal To
!=        Not Equal To
>         Greater Than
>=        Greater Than or Equal To
<         Less Than
<=        Less Than or Equal To

String Context and Operations

A string is a sequence of zero or more characters. String literals can be enclosed in single quotes or double quotes. However, the type of quotes is significant: double quotes interpret (or interpolate) variables and special characters (e.g., \n for newline, \t for tab, \\ for backslash), whereas single quotes don't. Perl looks for the longest possible variable name during interpolation (i.e., it is greedy). For example,

my $msg = 'Hello';
print "$msg world\n";       # print Hello world followed by newline
print '$msg world\n';       # print $msg world\n literally (no interpretation)

Use single quotes when the string does not need to be interpreted: this signals that the string is literal and avoids accidental interpolation of $ or @. Any speed difference is negligible.
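
Because interpolation is greedy, a variable name followed directly by letters needs curly braces to mark where the name ends. A minimal sketch, assuming a hypothetical sub named pluralize:

```perl
use strict;
use warnings;

# pluralize is a hypothetical helper for this illustration.
sub pluralize {
    my ($noun) = @_;
    # "$nouns" would refer to a different variable named $nouns;
    # "${noun}s" delimits the name so that the letter s stays literal.
    return "Two ${noun}s";
}

print pluralize('apple'), "\n";   # Two apples
```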

String Operators

Perl provides the following string operators:

OPERATOR  DESCRIPTION                     EXAMPLE                RESULT
.         String Concatenation            'Hello, ' . 'world'    'Hello, world'
x         Duplicate                       'ba' x 4               'babababa'
.=        Concatenation cum Assignment    $str .= $str1;         Same as $str = $str . $str1;
x=        Duplicate cum Assignment        $str x= $num;          Same as $str = $str x $num;
String Comparison Operators

Perl provides the following operators for comparing strings:

OPERATOR  DESCRIPTION
eq        String Equal To
ne        String Not Equal To
gt        String Greater Than
ge        String Greater Than or Equal To
lt        String Less Than
le        String Less Than or Equal To
cmp       String Compare To
  • cmp: returns 1 if the first string is greater than the second string; 0 if equal; -1 otherwise.
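
The difference between string and numeric comparison shows up clearly when sorting. The sketch below uses the built-in sort with cmp and with <=> (the numeric counterpart of cmp, which is not otherwise covered in this section); the two sub names are assumptions for this example:

```perl
use strict;
use warnings;

# Hypothetical helpers: sort the same list with string and numeric comparison.
sub sort_as_strings { return sort { $a cmp $b } @_; }
sub sort_as_numbers { return sort { $a <=> $b } @_; }

my @values = (10, 9, 100);
print join(' ', sort_as_strings(@values)), "\n";   # 10 100 9
print join(' ', sort_as_numbers(@values)), "\n";   # 9 10 100
```

As strings, '10' sorts before '9' because the character '1' comes before '9'.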
String Functions

Perl provides many built-in functions for manipulating strings:

  • substr(var, index, length): returns the substring from string var, starting from position index, of length. String index begins at 0. You can also use substr to modify the original string. For example,
    my $msg = 'Perl is fun!';
    my $adj = substr($msg, 8, 3);      # Extract a portion of string
    print $adj, "\n";                  # 'fun'
    print substr($msg, 8), "\n";       # 'fun!'
    substr($msg, 8, 3) = 'quite cool'; # Modify a portion of string
    print $msg, "\n";                  # 'Perl is quite cool!'
    
  • index(string, substring): returns the index of the first occurrence of substring in string, or -1 if not found.
  • rindex(string, substring): returns the index of the last occurrence of substring in string (searching from the right), or -1 if not found.
  • length(string): returns the number of characters in string.
  • lc(string): returns a lowercase string.
  • uc(string): returns an uppercase string.
  • lcfirst(string): returns a first-letter lowercase string.
  • ucfirst(string): returns a first-letter uppercase string.
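
A short sketch of the remaining string functions follows. The sample text and the helper name capitalize are assumptions made for this illustration:

```perl
use strict;
use warnings;

# capitalize is a hypothetical helper combining lc and ucfirst.
sub capitalize {
    my ($word) = @_;
    return ucfirst(lc($word));
}

my $text = 'Perl is fun, Perl is cool';
print length($text), "\n";           # 25
print index($text, 'Perl'), "\n";    # 0
print rindex($text, 'Perl'), "\n";   # 13
print index($text, 'Java'), "\n";    # -1
print uc('fun'), "\n";               # FUN
print capitalize('pERL'), "\n";      # Perl
```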
q and qq

Instead of using single quotes '...' or double quotes "...", you could also use q (for single quotes) or qq (for double quotes), as follows:

say q(Perl's cool);       # Generalized single quote - may contain single quote
say q|Perl's cool|;
say qq(Perl is "cool");   # Generalized double quote - may contain double quote
say qq|Perl is "cool"|;

Boolean Context and Operations

A scalar can take a boolean context of either true or false. "False" includes:

  • The number 0.
  • An empty string '' or "".
  • The string containing only a single zero (i.e., '0' or "0"). Other strings such as '00' or '0.0' are true.
  • A variable that has not been assigned a value (i.e., undef).
  • An empty array or hash (to be discussed later).

Anything else is considered as true.

Functions defined and undef: defined(var) returns true if the variable var is defined. undef(var) un-defines the variable var.
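
The rules for true and false values can be checked with a small sketch. The sub name truth is hypothetical; it simply evaluates its argument in boolean context:

```perl
use strict;
use warnings;

# truth is a hypothetical helper that reports how Perl treats a value
# in boolean context.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

print truth(0),     "\n";   # false
print truth(''),    "\n";   # false
print truth('0'),   "\n";   # false
print truth(undef), "\n";   # false
print truth('0.0'), "\n";   # true (a non-empty string other than '0')
print truth('00'),  "\n";   # true
print truth(' '),   "\n";   # true
```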

Boolean Operators

Perl provides the following boolean (or logical) operators:

OPERATOR  DESCRIPTION
&&        C-style Logical AND
||        C-style Logical OR
!         C-style Logical NOT
and       Perl's Logical AND
or        Perl's Logical OR
not       Perl's Logical NOT

Notes:

  • Perl's not, and, or carry out the same operations as C-style's !, &&, ||, but these logical operators have very low precedence (lower than assignment operator =) and can be useful in certain situations (but you can also use the parentheses to change the precedence). They are also easier to read than the C-style logical operators.
  • Logical operations are always short-circuited. That is, the operation is terminated as soon as the result is certain, e.g., false && ... is short-circuited to give false, true || ... gives true.
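
Short-circuit evaluation can be observed by counting how often the right-hand operand runs. The names noisy, calls_made and name_or_default below are assumptions made for this sketch:

```perl
use strict;
use warnings;

my $calls = 0;
sub noisy { $calls++; return 1; }   # hypothetical sub that records each call

# Returns how many times noisy() ran while evaluating "$left && noisy()".
sub calls_made {
    my ($left) = @_;
    $calls = 0;
    my $result = $left && noisy();
    return $calls;
}

# A common idiom built on short-circuiting: fall back to a default value.
sub name_or_default {
    my ($name) = @_;
    return $name || 'anonymous';
}

print calls_made(0), "\n";             # 0 (right-hand side skipped)
print calls_made(1), "\n";             # 1
print name_or_default(''), "\n";       # anonymous
print name_or_default('Larry'), "\n";  # Larry
```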

Input From Keyboard & Formatted Output

You can use <STDIN> (STDIN is a file handle, to be discussed in detail later in File IO) or the shorter <> to read a line of input from the keyboard. (<> reads from the files named on the command line, if any, and from standard input otherwise.) The input, however, contains the newline character (corresponding to the enter key), which can be stripped away via the function chomp.

Functions chomp and chop: chop removes the last character of a string and returns the removed character. chomp removes the trailing newline, if there is one, and returns the number of characters removed.

For example,

#!/usr/bin/env perl
# UserInputTest.pl
use strict;
use warnings;
print 'Enter your message: ';
my $msg = <>;                   # <> to read user's input
print "Your message is $msg";   # $msg includes a newline
   
print 'Enter your last name: ';
my $lastName = <>;
chomp $lastName;                  # Strip ending newline
print 'Enter your first name: ';
my $firstName = <>;
chomp $firstName;                 # Strip ending newline
my $fullName = $firstName . ' ' . $lastName;   # Concatenate
print "Your full name is $fullName\n";   # $fullname does not have newline
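
The return values of chomp and chop can be checked without keyboard input. In this sketch the sub name strip_newline is hypothetical:

```perl
use strict;
use warnings;

# strip_newline is a hypothetical helper: it returns the chomped text
# together with chomp's own return value.
sub strip_newline {
    my ($line) = @_;
    my $removed = chomp($line);   # number of characters removed (0 or 1 here)
    return ($line, $removed);
}

my ($text, $count) = strip_newline("hello\n");
print "$text $count\n";           # hello 1
($text, $count) = strip_newline('hello');
print "$text $count\n";           # hello 0

my $word = 'hello';
my $last = chop($word);           # chop returns the removed character
print "$word $last\n";            # hell o
```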

Functions printf and sprintf: C-style printf and sprintf (string printf) are supported in Perl for formatted output. For example,

my $str = 'Hello';
my $float = 1.2;
my $num = 33;
# %s for string, %f for floating-point number, %d for integer
printf "%10s %6.2f and %3d\n", $str, $float, $num;
my $pstr = sprintf "%10s %6.2f and %3d\n", $str, $float, $num;
print $pstr;    # $pstr already ends with a newline

Conditional Flow Control

Perl provides many variations of flow control constructs:

SYNTAX:
if (condition) { trueBlock; }
EXAMPLE:
if ($day eq 'sat' || $day eq 'sun') { print 'Super weekend!'; }

SYNTAX:
trueSingleStatement if condition;
EXAMPLE:
print 'Super weekend!' if ($day eq 'sat' || $day eq 'sun');

SYNTAX:
unless (condition) { falseBlock; }
same as:
if (!condition) { falseBlock; }
EXAMPLE:
unless ($day eq 'sat' || $day eq 'sun') { print 'It is a weekday'; }
unless ($day ne 'sat' && $day ne 'sun') { print 'Super weekend!'; }
unless ($error) { print 'Yes, Hello'; }

SYNTAX:
falseSingleStatement unless condition;
EXAMPLE:
print "It's a weekday" unless ($day eq 'sat' || $day eq 'sun');

SYNTAX:
if (condition) {
   trueBlock;
} else {
   falseBlock;
}
EXAMPLE:
if ($day eq 'sat' || $day eq 'sun') {
   print 'Super weekend!';
} else {
   print 'It is a weekday...';
}

SYNTAX:
if (condition1) {
   trueBlock1;
} elsif (condition2) {
   trueBlock2;
} elsif (condition3) {
   trueBlock3;
} elsif (...) {
   ...
} else {
   elseBlock;
}
EXAMPLE:
if ($day eq 'sat' || $day eq 'sun') {
   print 'Super weekend!';
} elsif ($day eq 'fri') {
   print "Thank God, it's friday!";
} else {
   print 'It is a weekday...';
}

SYNTAX:
condition ? trueExpression : falseExpression;
EXAMPLE:
$max = ($x > $y) ? $x : $y;
$abs = ($x >= 0) ? $x : -$x;

SYNTAX:
# Perl 5.10 switch-case (needs "use 5.010" or "use feature 'switch'";
# marked experimental since Perl 5.18 and deprecated in later releases):
given (variable) {
   when (value1) { ... }
   when (value2) { ... }
   ......
}
EXAMPLE:
given ($day) {
   when (['sat', 'sun']) { print 'Super weekend!'; }
   when (['mon', 'tue', 'wed', 'thu']) { print 'It is a weekday...'; }
   when ('fri') { print "Thank God, it's friday"; }
}

Notes:

  • The curly braces are mandatory even if there is only one statement in the block.
  • A negated version of if, called unless, is provided. It can be hard to read and should be used only for negative logic, e.g., unless ($error) { ... } could be better than if (not $error) { ... }.
  • A single statement can be placed before the if or unless clause (the statement-modifier form); a block must follow the clause.
  • The keyword for else-if is elsif.
  • A switch-case construct, given-when, is available from Perl 5.10.
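
The if-elsif-else chain and the conditional operator can be combined in a small runnable sketch. The sub names describe_day and larger are assumptions for this example:

```perl
use strict;
use warnings;

# describe_day is a hypothetical helper built on an if-elsif-else chain.
sub describe_day {
    my ($day) = @_;
    if ($day eq 'sat' || $day eq 'sun') {
        return 'Super weekend!';
    } elsif ($day eq 'fri') {
        return "Thank God, it's friday!";
    } else {
        return 'It is a weekday...';
    }
}

# larger is a hypothetical helper built on the conditional operator.
sub larger {
    my ($x, $y) = @_;
    return ($x > $y) ? $x : $y;
}

print describe_day('sun'), "\n";   # Super weekend!
print describe_day('fri'), "\n";   # Thank God, it's friday!
print describe_day('tue'), "\n";   # It is a weekday...
print larger(3, 8), "\n";          # 8
```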

Arrays

An array contains a list of zero or more scalars. An array variable begins with @, whereas a scalar variable begins with $. A @rose has nothing to do with a $rose.

An array can be assigned to and from a list of comma-separated scalars enclosed in parentheses. For example:

my @months = ('jan', 'feb', 'mar', 'apr');
my @days = qw(mon tue wed thu fri sat sun);  # single-quoted words
my ($first, $second, $third, $fourth) = @months;
print @months, "\n";    # janfebmarapr
print $first, "\n";     # jan
print $fourth, "\n";    # apr

You can mix numbers and strings (and undef) inside an array, e.g.,

my @mixmonths = ('jan', 2, 'mar', 4);
print @mixmonths, "\n";       # jan2mar4

You can use an array index in the form of $arrayName[index] to reference an individual element of an array. The array index starts at 0. Note that the scalar symbol $ is used for referencing an individual element, instead of the array symbol @. Accessing an array past its bounds gives undef.

You can also refer to a portion (or slice) of an array (i.e., sub-array) using an index range in the form of @arrayName[beginIndex..endIndex] or @arrayName[index1,index2,...]. For example,

my @months = ('jan', 'feb', 'mar', 'apr');
print $months[2], "\n";        # Scalar 'mar'
print @months[1..3], "\n";     # Array slice ('feb', 'mar', 'apr')
print @months[3,1], "\n";      # Array slice ('apr', 'feb')
print @months[2], "\n";        # One-element slice ('mar'); $months[2] is preferred
my @emptyArray = ();           # Empty array

Some functions, such as localtime, return a list or a scalar depending on the context, e.g.,

my ($sec, $min, $hour, $day, $month, $year, $weekday,$dayOfYear, $isdst) = localtime;
my ($m, $d, $y) = (localtime)[4,3,5];

my $dateTime = localtime;      # gives Tue Oct  6 19:04:44 2009
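
To get a repeatable result, the sketch below uses gmtime (the UTC counterpart of localtime, which returns the same fields) with a fixed timestamp of 0. The sub name date_parts is hypothetical. Note that the month field counts from 0 and the year field counts from 1900:

```perl
use strict;
use warnings;

# date_parts is a hypothetical helper: it slices day, month and year
# out of the list returned by gmtime and adjusts month and year.
sub date_parts {
    my ($epoch) = @_;
    my ($day, $month, $year) = (gmtime($epoch))[3, 4, 5];
    return ($year + 1900, $month + 1, $day);
}

printf "%04d-%02d-%02d\n", date_parts(0);   # 1970-01-01
print scalar(gmtime(0)), "\n";              # Thu Jan  1 00:00:00 1970
```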

In Perl, an array is not bounded in size. It is dynamically expanded when new elements are added. For example,

my @months = ('jan', 'feb', 'mar', 'apr');
@months[4..5]= ('may', 'jun');  # @months is ('jan', 'feb', 'mar', 'apr', 'may', 'jun')
$months[7] = 'aug';             # $months[6] gets undef

The scalar variable $#arrayName maintains the last index of the array @arrayName. You might be tempted to use $#arrayName+1 as the length of the array. This is not necessary, as Perl will return the length of the array if @arrayName is used in a scalar context (e.g., assign to a scalar, arithmetic and comparison operations). In other words, to reference the length of an array, you can simply assign @arrayName to a scalar context. For example,

my @months = ('jan', 'feb', 'mar', 'apr');
print $#months, "\n";                  # Gives 3
print $months[$#months], "\n";         # Gives 'apr'
$months[$#months + 1] = 'may';         # Append after the last element
my $size = @months;                    # Get the length of the array
print $size, "\n";
for (my $i = 0; $i < @months; $i++) {  # @months in scalar context
   print $months[$i], "\n";
}

A negative array index -n can be used to reference the nth element from the end of the array, e.g.,

my @months = ('jan', 'feb', 'mar', 'apr');
print $months[-1], "\n";    # Gives 'apr'
print $months[-2], "\n";    # Gives 'mar'

Array Functions

Perl provides many functions to manipulate arrays:

  • push(array, list): appends the list of elements to the end of the array.
  • pop(array): removes and returns the last element of the array.
  • shift(array): removes and returns the first element of the array.
  • unshift(array, list): add the list of the elements in front of the array.
  • splice(array, offset, length, list): removes and returns length elements from array, starting from offset, and optionally, replace them with list.
my @months = ('jan', 'feb', 'mar', 'apr');
push @months, 'may';            # @months = ('jan', 'feb', 'mar', 'apr', 'may')
print @months, "\n";
print pop @months, "\n"; # @months = ('jan', 'feb', 'mar', 'apr')
print pop @months, "\n"; # @months = ('jan', 'feb', 'mar')
push (@months, shift @months); # Move the first element to last
print @months, "\n"; # @months = ('feb', 'mar', 'jan')
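
The splice function is not exercised above, so here is a minimal sketch. The two sub names are assumptions; each works on a copy of the list it receives:

```perl
use strict;
use warnings;

# without_range is a hypothetical helper: removes $length items
# starting at $offset and returns what is left.
sub without_range {
    my ($offset, $length, @items) = @_;
    splice(@items, $offset, $length);
    return @items;
}

# with_inserted is a hypothetical helper: a length of 0 makes splice
# insert $new at $offset without removing anything.
sub with_inserted {
    my ($offset, $new, @items) = @_;
    splice(@items, $offset, 0, $new);
    return @items;
}

my @months = qw(jan feb mar apr may);
print join(' ', without_range(1, 2, @months)), "\n";     # jan apr may
print join(' ', with_inserted(1, 'xxx', @months)), "\n"; # jan xxx feb mar apr may
```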

Special Array Variable: The Command-Line Argument Array @ARGV

The command-line arguments (excluding the program name) are passed into the Perl program as an array named @ARGV. The function shift, which takes @ARGV as its default argument when called outside a subroutine, is often used to process the command-line arguments.
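
A minimal sketch of consuming @ARGV with shift follows. The sub name drain_args and the script name ArgsDemo.pl are assumptions made for this illustration:

```perl
use strict;
use warnings;

# drain_args is a hypothetical helper: it shifts arguments off @ARGV
# one at a time until none are left, and returns them in order.
sub drain_args {
    my @seen;
    while (defined(my $arg = shift @ARGV)) {
        push @seen, $arg;
    }
    return @seen;
}

# Run as, e.g.: perl ArgsDemo.pl alpha beta
print "Argument: $_\n" for drain_args();
```

Inside a subroutine, a bare shift works on @_ instead, which is why @ARGV is named explicitly here.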

Flow Control - Loops

Perl provides many types of loop constructs:

SYNTAX:
while (condition) {
   trueBlock;
}
EXAMPLE:
my $i = 0;
while ($i < 10) {
   print "$i\n";
   $i++;
}

SYNTAX:
do {
   trueBlock;
} while (condition);
EXAMPLE:
my $i = 0;
do {
   print "$i\n";
   $i++;
} while ($i < 10);

SYNTAX:
until (condition) {
   falseBlock;
}
same as:
while (!condition) { falseBlock; }
EXAMPLE:
my $i = 0;
until ($i >= 10) {
   print "$i\n";
   $i++;
}

SYNTAX:
foreach $scalarName ( @arrayName ) {
   statementBlock;
}
or
for $scalarName ( @arrayName ) {
   statementBlock;
}
EXAMPLE:
my @months = ('jan', 'feb', 'mar', 'apr');
foreach my $month (@months) {
   print $month, "\n";
}
for my $i (5, 4, 3, 2, 1) {
   print "$i ";
}

SYNTAX:
for (initialization; expression; postIncrement) {
   statementBlock;
}
EXAMPLE:
my @months = ('jan', 'feb', 'mar', 'apr');
for (my $i = 0; $i < @months; $i++) {
   print $months[$i], "\n";
}

Notes:

  • Again, the curly braces are mandatory, even if there is only one statement in the block.
  • The foreach loop is handy for visiting each item of the array. The loop variable is an alias for the current element, so assigning to it modifies the array.
  • The negation version until should be used only for negative logic, e.g., until ($done) { ... }.

Loop Control Statements

  • last: exits the innermost enclosing loop (similar to the break statement in C/Java).
  • next: aborts the current iteration and continues to the next iteration of the loop (similar to the continue statement in C/Java).
  • redo: restarts the current iteration from the opening brace, without re-evaluating the loop condition.
  • last, next and redo can take the label of an enclosing loop, declared in the form of labelName: ..., to act on that loop instead of the innermost one.

For example:

#!/usr/bin/env perl
# LoopTest.pl
use strict;
use warnings;
my $num = 1;
while (1) {                  # Always true
   $num++;
   next if ($num % 3) == 0;  # Continue to next num if num is divisible by 3
   last if $num == 17;       # Break the loop if num is 17
   if (($num % 2) == 0) {
      $num += 3;             # Add 3 for even number
   } else { 
      $num -= 3;             # Subtract 3 for odd number
   }
   print "$num ";
}
> perl LoopTest.pl
5 4 2 7 11 10 8 13 17 16
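
Loop labels matter once loops are nested. The following sketch assumes a hypothetical sub first_pair_with_sum and a label named OUTER; next OUTER and last OUTER act on the outer loop from inside the inner one:

```perl
use strict;
use warnings;

# first_pair_with_sum is a hypothetical helper: it returns the first
# pair ($i, $j) with $j <= $i whose sum equals $target, or an empty list.
sub first_pair_with_sum {
    my ($target, @numbers) = @_;
    my @found;
    OUTER: for my $i (@numbers) {
        for my $j (@numbers) {
            next OUTER if $j > $i;        # move on to the next $i
            if ($i + $j == $target) {
                @found = ($i, $j);
                last OUTER;               # leave both loops at once
            }
        }
    }
    return @found;
}

print join('+', first_pair_with_sum(7, 1, 2, 3, 4, 5)), "\n";   # 4+3
```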

Special Scalar Variable: The Default Scalar Variable $_

Perl provides a feature called the default variable, which is uncommon in other languages. The default scalar variable is named $_.

Many constructs and functions, such as the foreach loop and print, take $_ as the default argument. For example,

foreach my $month (@months) { print $month; }

can be rewritten as:

foreach (@months) { print; }    # same as: foreach $_ (@months) { print $_; }
# or
for (@months) { print; }

Another example:

while (<>) {                # same as: while (defined($_ = <>)), reads input line by line
   print;                   # print $_
   chomp;                   # chomp $_ to remove ending newline from $_
   last if ($_ eq 'done');  # break the loop if input is 'done'
}

Hash or Associative Array

We have so far covered two data types, scalar (which begins with $) for single item; and array (which begins with @) for a list of scalars. The third data type provided by Perl is called Hash or Associative Array, which begins with a %. Take note that %rose is not a @rose is not a $rose.

Hash stores key-value (or name-value) pairs. Hash is similar to regular array, except that regular arrays are indexed by numbers; but hashes are indexed by key-strings. Hash lets you associate one scalar to another, hence, it is also called associative array.

To initialize a hash, you can provide a list of key-value pairs in the form of (key1 => value1, key2 => value2, ...) or (key1, value1, key2, value2, ...). Keys must be unique.

You can retrieve the value associated to a key, in the scalar-context form of $hashName{keyName}. Recall that array uses square bracket with numerical index, $arrayName[index], whereas hash uses curly bracket and key-string index.

For example,

#!/usr/bin/env perl
# HashTest.pl
use strict;
use warnings;
# Declare and initialize a hash with key-value pairs.
my %countryCodes = ('us' => 'United States', 'sg' => 'Singapore');
   
# Use $hashName{keyName} (scalar context) to reference the value of an item.
print $countryCodes{'us'}, "\n";   # prints 'United States'
print $countryCodes{'sg'}, "\n";   # prints 'Singapore'
   
# Add in more key-value pairs
$countryCodes{'fr'} = 'France';
$countryCodes{'cn'} = 'China';
   
print %countryCodes, "\n";   # prints all items
   
my %emptyHash = ();   # an initially empty hash

You can convert a hash to an array and vice versa. The array stores the key-value pairs as sequential entries but in no particular order, e.g.,

# Assign Hash to Array
my %countryCodes = ('us' => 'United States', 'sg' => 'Singapore');  # Hash
my @countryArray = %countryCodes;        # Assign a Hash to an array
print $countryArray[0], "\n";            # Referencing array
print $countryArray[1], "\n";
# Assign an Array (a list of items) to a Hash
my %countryHash = ('us', 'United States', 'sg', 'Singapore');  # Hash
print $countryHash{'us'}, "\n";   # Referencing hash
print $countryHash{'sg'}, "\n";

Hash Functions

  • keys(hashName): returns a list containing all the keys in hashName.
  • values(hashName): returns a list containing all the values in hashName.
  • each(hashName): returns a 2-element list (key, value) containing the next key-value pair from hashName.
  • delete($hashName{keyName}): removes the key-value pair of keyName from hashName, and returns the deleted value.
  • exists($hashName{keyName}): returns true if keyName exists in hashName.
  • defined($hashName{keyName}): check if value of keyName is defined in hashName.

For example:

my %countryCodes = ('us' => 'United States', 'sg' => 'Singapore');
while (my ($key, $value) = each %countryCodes) {
   print "$key is associated with $value.\n";
}
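
The difference between exists and defined, and the return value of delete, can be sketched as follows. The sub name key_status and the sample data are assumptions for this illustration:

```perl
use strict;
use warnings;

# key_status is a hypothetical helper: it reports whether a key is
# absent, present with an undefined value, or present with a defined value.
sub key_status {
    my ($key, %hash) = @_;
    return 'missing'   unless exists $hash{$key};
    return 'undefined' unless defined $hash{$key};
    return 'defined';
}

my %stock = (apple => 3, plum => undef);   # made-up sample data
print key_status('apple', %stock), "\n";   # defined
print key_status('plum',  %stock), "\n";   # undefined
print key_status('kiwi',  %stock), "\n";   # missing

my $gone = delete $stock{'apple'};         # delete returns the removed value
print "$gone\n";                           # 3
print key_status('apple', %stock), "\n";   # missing
```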

Special Hash Variable: The Environment Variables Hash %ENV

A program can access an operating environment which contains information such as the current directory, the username, and so on. Perl stores the environment variables in a special hash named %ENV.

print $ENV{'PATH'};   # print environment variable PATH
   
while (my ($key, $value) = each %ENV) {  # prints all environment variables
   print "$key=$value\n";
}

%ENV hash is useful in writing server-side CGI Perl scripts.

Sorting the Hash

foreach my $key (sort keys %ENV) {   # returns array of sorted keys.
   print "$key=$ENV{$key}\n";        # get the value with the sorted keys
}
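
Sorting by value rather than by key needs a comparison block that looks up each key's value. A minimal sketch, assuming a hypothetical sub keys_by_value, made-up scores, and the numeric comparison operator <=>:

```perl
use strict;
use warnings;

# keys_by_value is a hypothetical helper: it returns the keys ordered
# by ascending numeric value.
sub keys_by_value {
    my (%hash) = @_;
    return sort { $hash{$a} <=> $hash{$b} } keys %hash;
}

my %score = (alice => 82, bob => 95, carol => 78);   # made-up sample data
foreach my $name (keys_by_value(%score)) {
    print "$name=$score{$name}\n";
}
# Prints carol=78, alice=82, bob=95 on separate lines
```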

Subroutines (or Functions)

You can define your own subroutine (or functions) by using the keyword sub with a processing block:

sub subroutineName {
   statementBlock;
   return aReturnValue;
}

In Perl, a subroutine returns a scalar, a list, or nothing, via the statement return aReturnValue (or the value of the last expression evaluated if there is no return statement).

You can invoke a subroutine by writing its name followed by parentheses, e.g., hello(). The older form, with an ampersand & before the subroutine name, also works. (Recall that $ identifies a scalar, @ identifies an array, and % identifies a hash.)

For example:

# Define subroutine
sub hello { return 'Hello, world'; }
   
# Invoke subroutine
print &hello, "\n";
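As a further sketch, the subroutines below return a scalar, a list, and the value of a last expression, and are called with the parenthesized form name(). The names weekend and answer are made up for illustration.

```perl
use strict;
use warnings;

sub hello   { return 'Hello, world'; }   # returns a scalar
sub weekend { return ('sat', 'sun'); }   # returns a list
sub answer  { 6 * 7 }                    # no return: the last expression is the result

print hello(), "\n";      # Hello, world
my @days = weekend();     # list assignment receives both values
print "@days\n";          # sat sun
print answer(), "\n";     # 42
```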

Passing arguments into subroutines

You can pass argument(s) into a subroutine. Perl places the arguments into a special array variable named @_. You can access the first element using $_[0], the second with $_[1], and so on. (Recall that $_ is the default scalar variable; it is unrelated to the elements of @_.)

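You can use the keyword my to declare lexical variables, which are visible only inside the enclosing block (such as the subroutine body), or the keyword local to give a global variable a temporary value that lasts until the block exits. Either way, the global version, if there is one, is hidden temporarily.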

For example,

# Define a subroutine add which takes zero or more arguments
sub add {
   my $sum = 0;
   foreach (@_) { $sum += $_; }
   return $sum;
}
   
# Invoke subroutine add with various number of arguments
print &add(1), "\n";
print &add(2, 3), "\n";
print &add(4, 5, 6), "\n";
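A common idiom is to copy @_ into named lexical variables on the first line of the subroutine. The sketch below shows that idiom, and also contrasts my with local. All the names in it ($level, showLevel, withMy, withLocal, power) are made up for illustration.

```perl
use strict;
use warnings;

our $level = 'global';          # a package (global) variable

sub showLevel { return $level; }

sub withMy {
   my $level = 'lexical';       # a new variable, visible only inside this block
   return showLevel();          # showLevel still sees the global
}

sub withLocal {
   local $level = 'localized';  # temporary value for the global itself
   return showLevel();          # showLevel sees the temporary value
}

sub power {
   my ($base, $exponent) = @_;  # copy the arguments into named lexicals
   return $base ** $exponent;
}

print withMy(), "\n";       # global
print withLocal(), "\n";    # localized
print showLevel(), "\n";    # global (the old value is restored)
print power(2, 10), "\n";   # 1024
```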

Perl's Built-in Functions

Mathematical Functions

  • sqrt(number): returns the square root of number.
  • abs(number): returns the absolute value of number.
  • sin(number): returns the sine of number, where number is in radians.
  • cos(number): returns the cosine of number, where number is in radians.
  • atan2(y, x): returns the arc-tangent of y/x in the range of -π to π radians.
  • exp(number): returns e raised to the power of number.
  • log(number): returns the natural logarithm of number.
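A short sketch of these functions in use. The value of π is not built in, so it is derived here from atan2; the last line uses the usual change-of-base formula, since log is the natural logarithm.

```perl
use strict;
use warnings;

my $pi = 4 * atan2(1, 1);                     # atan2(1, 1) is pi/4

printf "pi        = %.5f\n", $pi;             # 3.14159
printf "sqrt(16)  = %d\n",   sqrt(16);        # 4
printf "abs(-2.5) = %.1f\n", abs(-2.5);       # 2.5
printf "sin(pi/2) = %.1f\n", sin($pi / 2);    # 1.0
printf "exp(1)    = %.5f\n", exp(1);          # 2.71828
printf "log(e)    = %.1f\n", log(exp(1));     # 1.0

# Divide by log(10) to get a base-10 logarithm
printf "log10(1000) = %.1f\n", log(1000) / log(10);   # 3.0
```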

Converting between Number Bases

  • ord(character) returns the ASCII value of character.
  • chr(number) returns the character given its ASCII number.
  • oct(string) returns the decimal value of the octal number in string.
  • hex(string) returns the decimal value of the hexadecimal number in string.
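For instance, the conversions can be tried as follows. The last line goes the other way, using printf formats to write a number in octal, hexadecimal and binary.

```perl
use strict;
use warnings;

print ord('A'), "\n";        # 65
print chr(97), "\n";         # a
print oct('755'), "\n";      # 493
print hex('ff'), "\n";       # 255
print oct('0x1f'), "\n";     # 31 (oct also understands a 0x prefix)
print oct('0b101'), "\n";    # 5  (and a 0b prefix)

# Going the other way: printf formats a number in another base
printf "%o %x %b\n", 493, 255, 5;   # 755 ff 101
```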

Error Reporting Functions - exit, die, warn

  • exit(number): exits the program with the status number. Normal termination of program exits with number 0.
  • die(string): prints string to STDERR and exits the program with a non-zero status (the current value of the special variable $!, if it is non-zero).
  • warn(string): prints string to STDERR but does not terminate the program.

For example,

exit(1) unless open(HANDLE, $file);                      # exit with status 1 if open fails
open(HANDLE, $file) or die "Cannot open $file: $!\n";    # or print the reason and exit

Special Scalar Variable: Error Number $!

$! (also available as $ERRNO or $OS_ERROR if you use the English module) contains the most recent system error. In numeric context, it contains the error number; in string context, it contains the error string.
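The two contexts of $! can be seen in the sketch below. It assumes that the file no-such-file.txt (a made-up name) does not exist. The last statement uses eval to catch the die, so that the example can run to the end; the message is then available in $@.

```perl
use strict;
use warnings;

my $file = 'no-such-file.txt';   # hypothetical name, assumed not to exist

my ($errorNumber, $errorString) = (0, '');
unless (open(my $handle, '<', $file)) {
   $errorNumber = $! + 0;        # numeric context: the error number
   $errorString = "$!";          # string context: the error message
   warn "Cannot open $file: $errorString (error $errorNumber)\n";
}

# die raises an exception; eval catches it and leaves the message in $@
eval { open(my $handle, '<', $file) or die "Cannot open $file: $!\n"; 1 }
   or print "Caught: $@";
```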

Backquotes `command` and Function System

`command` executes command in a sub-shell and returns the command's output. For example,

my $today = `date`;
print $today, "\n";
  
my @dirlines = `dir`;         # Use `ls -l` for Unix
foreach (@dirlines) { print; }

system(program, args) executes the program with the arguments args and waits for it to finish. system is similar to backquotes. However, backquotes return the output of the program, whereas system returns the exit status of the program (where 0 indicates normal termination) and lets the command print directly to the console. The return value is the wait status; shift it right by 8 bits to get the program's exit code. For example,

print system('date'), "\n";
print system('dir'), "\n";
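To keep the next sketch independent of date and dir, it runs the Perl interpreter itself as the external command. That choice is an assumption made for portability; $^X is the special variable holding the path of the running perl.

```perl
use strict;
use warnings;

# Backquotes capture what the command prints
my $output = `"$^X" -e "print 6*7"`;
print "Captured: $output\n";                 # Captured: 42

# system returns the wait status; the exit code is in the high byte
my $status   = system($^X, '-e', 'exit 3');  # list form: program, then arguments
my $exitCode = $status >> 8;
print "Exit code: $exitCode\n";              # Exit code: 3
```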

Function sort

sort subroutine array sorts the array using the comparison function subroutine (a subroutine name or a block, with no comma after it) and returns the sorted list. Inside the subroutine, the scalar variables $a and $b are automatically set to the two elements to be compared. If sort is used without the subroutine, it sorts according to string order. (Caution: By default, numbers are sorted as strings, that is, the number 10 comes before 2 in string order.)

For example,

#!/usr/bin/env perl
use strict;
use warnings;
my @color = ('black', 'white', 'blue', 'green');
my @sorted = sort @color;
foreach (@sorted) { print "$_ "; }

#!/usr/bin/env perl
use strict;
use warnings;
  
# Define sorting subroutine
sub numerically { if ($a > $b) {1} elsif ($a < $b) {-1} else {0} }    # Compare numbers
  
my @price = (77, 100, 99, 55, 1);
my @sorted = sort numerically @price;
foreach (@sorted) { print "$_ "; }  # 1 55 77 99 100
   
# A "spaceship" operator as the shorthand for the above because it is used very often
@sorted = sort { $a <=> $b } @price;
     
@sorted = sort @price;              
foreach (@sorted) { print "$_ "; }  # 1 100 55 77 99

#!/usr/bin/env perl
use strict;
use warnings;
  
# Define sorting subroutine 
sub alphabetically { lc($a) cmp lc($b); } # Compare lowercase string
   
my @color = ('red', 'YELLOW', 'Blue', 'green');
my @sorted = sort alphabetically @color;
foreach (@sorted) { print "$_ "; }
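Two common variations are sketched below: a descending sort, which swaps $a and $b, and a sort on two criteria, which chains the comparisons with or. The data is made up.

```perl
use strict;
use warnings;

my @price = (77, 100, 99, 55, 1);

# Descending numeric order: swap $a and $b in the comparison
my @descending = sort { $b <=> $a } @price;
print "@descending\n";                  # 100 99 77 55 1

# Sort by length first, then alphabetically to break ties
my @color = ('green', 'red', 'blue', 'black');
my @byLength = sort { length($a) <=> length($b) or $a cmp $b } @color;
print "@byLength\n";                    # red blue black green
```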

Random Number Functions srand and rand

  • srand(seed): initializes the random number generator with the seed. Use it at most once, at the beginning of the program. If seed is omitted, Perl picks one for you; recent versions also call srand automatically on the first use of rand.
  • rand(number) returns a random floating-point number greater than or equal to 0 and less than number.

srand;
print rand(1), "\n";         # Generate a random number between 0.0 and 1.0
print int(rand(100)), "\n";  # Generate a random integer between 0 and 99
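Two typical uses, as a sketch: rolling a six-sided dice and picking a random element of an array. The variable names are made up.

```perl
use strict;
use warnings;

my @color = ('black', 'white', 'blue', 'green');

my $roll = 1 + int(rand(6));              # an integer from 1 to 6
my $pick = $color[ int(rand(@color)) ];   # rand(@color) uses the array's length, 4

print "Rolled $roll, picked $pick\n";
```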

Time Functions time, localtime, gmtime

  • time: returns the number of seconds since 00:00:00 on January 1, 1970, GMT (Greenwich Mean Time).
  • localtime(time): converts the numeric time to time/day/date fields in the local time zone.
  • gmtime(time): converts the numeric time to time/day/date fields in GMT.
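The fields can be unpacked as in the sketch below. It passes 0 to gmtime so that the result is the same on every run; note that the month counts from 0 and the year counts from 1900.

```perl
use strict;
use warnings;

# In list context, gmtime and localtime return nine fields
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = gmtime(0);

# $mon counts from 0 (January) and $year counts from 1900
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
       $year + 1900, $mon + 1, $mday, $hour, $min, $sec;   # 1970-01-01 00:00:00

# In scalar context, they return a ready-made string
my $stamp = gmtime(0);
print "$stamp\n";                       # Thu Jan  1 00:00:00 1970
print scalar(localtime(time)), "\n";    # the current local time
```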

Function sleep

sleep(number) makes the program wait for number of seconds before resuming execution.

Encryption Function crypt

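crypt(password, salt) computes a one-way hash of password using salt, and returns the hashed password. With the traditional DES-based algorithm, crypt takes only the first 8 characters of the password, and salt is 2 characters drawn from [./0-9A-Za-z] (12 bits). The first 2 characters in the hashed password are the salt. The salt is needed to verify a password: hash the candidate password with the stored hashed password as the salt, and compare the two results. The exact behavior depends on the crypt library of your system.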

Miscellaneous

do(...): Executing another Perl program

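For example, do(FILENAME) reads and executes the Perl code in FILENAME, and returns the value of the last expression evaluated. do(...) is loosely similar to #include in C/C++, except that the file is executed at run time.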
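A minimal sketch follows. To stay self-contained, it first writes a small Perl file with the standard File::Temp module (the temporary file is an assumption for illustration only; in practice the file would already exist), and then runs it with do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small Perl file to evaluate (temporary, removed when the program exits)
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh "my \$x = 6; \$x * 7;\n";
close($fh);

# do FILE runs the code and returns the value of its last expression
my $result = do $path;
die "Cannot run $path: ", ($@ || $!) unless defined $result;
print "Result: $result\n";    # Result: 42
```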

Bitwise Operations

OPERATOR   DESCRIPTION                                     EXAMPLE
<<         Left bit-shift (padded with 0's)                bitPattern << number
>>         Right bit-shift (padded with 0's by default)    bitPattern >> number
&          Bitwise AND                                     bitPattern1 & bitPattern2
|          Bitwise OR                                      bitPattern1 | bitPattern2
~          Bitwise NOT (1's complement)                    ~bitPattern
^          Bitwise XOR                                     bitPattern1 ^ bitPattern2

Notes:

  • You can also use the compound operators |=, &=, ^=, <<=, >>=. (There is no compound form of ~, because it is a unary operator.)
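The operators can be tried on small bit patterns, as sketched here. The 0b prefix writes a number in binary, and the %b format prints one in binary.

```perl
use strict;
use warnings;

my ($x, $y) = (0b1100, 0b1010);   # 12 and 10

printf "%04b\n", $x & $y;         # 1000
printf "%04b\n", $x | $y;         # 1110
printf "%04b\n", $x ^ $y;         # 0110
printf "%d\n",   1 << 4;          # 16
printf "%d\n",   256 >> 2;        # 64
printf "%08b\n", ~$x & 0xFF;      # 11110011 (low 8 bits of the complement)

my $flags = 0;
$flags |= 0b0100;                 # set a bit with a compound operator
$flags <<= 1;                     # shift in place
printf "%04b\n", $flags;          # 1000
```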

Debugging Perl Programs

[TODO]

Perl Documentation

Perl comes with thousands of pages of documentation @ http://perldoc.perl.org.

  • perlfaq: Perl frequently asked questions
  • perldata: Perl data types
  • perlsyn: Perl syntax
  • perlop: Perl operators and precedence
  • perlre: Perl regular expressions
  • perlrun: Perl execution and options
  • perlfunc: Perl builtin functions
  • perlvar: Perl predefined variables
  • perlsub: Perl subroutines
  • perlmod: Perl modules: how they work

Code Examples

Print Calendar

Given a month (e.g., mar) and the first day of the week of that month (e.g., wed), print the calendar of the month.

#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
   
# CalendarMonth.pl
# Given a month and the first day of the week of that month, 
# print the calendar for the month. For example,
# > perl CalendarMonth.pl mar wed

my @weekdays = ("sun", "mon", "tue", "wed", "thu", "fri", "sat");
my %daysInMonth = ("jan" => 31, "feb" => 28, "mar" => 31, "apr" => 30,
                   "may" => 31, "jun" => 30, "jul" => 31, "aug" => 31,
                   "sep" => 30, "oct" => 31, "nov" => 30, "dec" => 31);
   
# Get inputs from the command-line argument @ARGV, convert to lowercase.
my $theMonth = lc(shift);
my $firstWeekDay = lc(shift);
   
# Check valid input for the first week day of the month
my $weekDayNum;
for ($weekDayNum = 0; $weekDayNum < @weekdays; $weekDayNum++) {
   last if ($weekdays[$weekDayNum] eq $firstWeekDay) 
}
die "Error: Incorrect first weekday '$firstWeekDay'" if ($weekDayNum >= @weekdays);
   
# Check valid input for the month
die "Error: Incorrect month '$theMonth'" unless (exists $daysInMonth{$theMonth});
   
# Print heading - each day column takes 4 places (a blank plus 3 letters)
printf "%16s\n", uc($theMonth);   # Use C-style printf for formatted output
for my $day (@weekdays) {
   printf "%4s", ucfirst($day);
}
print "\n";
   
# Skip to the first day of the week
$weekDayNum = 0;
until ($firstWeekDay eq $weekdays[$weekDayNum]) {
   print "    ";
   $weekDayNum++;
}
   
# Printing the month
for (my $dayNum = 1; $dayNum <= $daysInMonth{$theMonth}; $dayNum++) {
   printf "%4d", $dayNum;
   $weekDayNum++;
   if ($weekDayNum == 7) {
      $weekDayNum = 0;
      print "\n";
   }
}

Given a year (e.g., 2009), print the calendar of the year.

#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
   
# CalendarYear.pl
# Given a year (>=1961), print the calendar for the year.
# > perl CalendarYear.pl 2009
   
my @weekdays = ("sun", "mon", "tue", "wed", "thu", "fri", "sat");
my @months = ('jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec');
my %daysInMonth = ("jan" => 31, "feb" => 28, "mar" => 31, "apr" => 30,
                   "may" => 31, "jun" => 30, "jul" => 31, "aug" => 31,
                   "sep" => 30, "oct" => 31, "nov" => 30, "dec" => 31);
                   
# Get inputs from the command-line argument @ARGV
my $theYear = shift;
my $startYear = 1961;
   
# Check valid inputs
die "Error: no year given" unless ($theYear);
die "Error: Incorrect year number '$theYear'" unless ($theYear >= $startYear);
   
# Knowing that Jan 1, 1961 is a Sunday,
#  compute the first week day of the given year
my $yearsDiff = $theYear - $startYear;
my $daysDiff = $yearsDiff * 365;
# Account for leap years
$daysDiff += int($yearsDiff / 4);
my $firstWeekDay = ($daysDiff + 0) % 7;   # +0 for Sunday
my $weekDayNum = 0;
   
# Print each month's heading - each day column takes 4 places (a blank plus 3 letters)
for my $month (@months) {
   # Print heading for month
   printf "%16s\n", uc($month);
   for my $day (@weekdays) {
      printf "%4s", ucfirst($day);
   }
   print "\n";
   
   # Skip to the first day of the week
   $weekDayNum = 0;
   until ($firstWeekDay == $weekDayNum) {
      print "    ";
      $weekDayNum++;
   }
   
   # Check for leap year - divisible by 4 but not divisible by 100, or divisible by 400
   if (((($theYear % 4) == 0) && (($theYear % 100) != 0)) || ($theYear % 400) == 0) {
      $daysInMonth{'feb'} = 29;
   }

   # Continue for the rest of the month
   for (my $dayNum = 1; $dayNum <= $daysInMonth{$month}; $dayNum++) {
      printf "%4d", $dayNum;
      $weekDayNum++;
      if ($weekDayNum == 7) {
         $weekDayNum = 0;
         print "\n";
      }
   }
   print "\n";
   print "\n" if ($weekDayNum != 0);
   $firstWeekDay = $weekDayNum;   # Continue for next month
}
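The program above estimates the number of leap years with int($yearsDiff / 4), which ignores the century rule. As a sketch of an alternative, the two subroutines below (their names are made up) put the leap-year test into a function and count the days year by year from the same reference date, Sunday, January 1, 1961.

```perl
use strict;
use warnings;

# Leap year: divisible by 4 but not by 100, or divisible by 400
sub isLeapYear {
   my ($year) = @_;
   return (($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0) ? 1 : 0;
}

# Weekday (0 = Sunday) of January 1 of $year, for $year >= 1961
sub firstWeekDayOfYear {
   my ($year) = @_;
   my $days = 0;
   for my $y (1961 .. $year - 1) {
      $days += isLeapYear($y) ? 366 : 365;
   }
   return $days % 7;
}

print firstWeekDayOfYear(2009), "\n";   # 4 (Thursday)
```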

REFERENCES & RESOURCES

  • Popular Perl sites, e.g., www.perl.org, www.perl.com, www.pm.org, www.perlmongers.org.
  • Perl's documentation @ http://perldoc.perl.org.
  • "Perlintro - A brief introduction and overview of Perl", available @ http://perldoc.perl.org.
  • CPAN (Comprehensive Perl Archive Network) @ www.cpan.org.
  • (The Camel Book) Larry Wall, Tom Christiansen and Jon Orwant, "Programming Perl", 3rd ed., 2000 - covers Perl 5.6.
  • (The Llama Book) Randal L. Schwartz, Tom Phoenix and brian d foy, "Learning Perl", 5th ed., 2008 - covers Perl 5.10.
  • (The Ram Book) Tom Christiansen and Nathan Torkington, "Perl Cookbook", 2nd ed., 2003 - recipes for common tasks.