Linguistics 158: Computer-aided methods in linguistics

John B. Lowe / Department of Linguistics
University of California, Berkeley - Spring 1997

Introduction to programming in PERL

Week / class: 8 / 15 : Lecture, Tue 11 Mar 1997

Preliminaries

Remarks on coding and debugging

Debugging

History and background of PERL

Our interest in PERL

Sources of PERL info

PERL syntax and semantics

Getting started : "console I/O"

#!/usr/bin/perl
print "Enter Hexadecimal Number: ";     # Ask for a number.
$answer = <STDIN>;                      # Input the number.
print hex($answer),"\n";                # Print out new number.
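The same conversion can be wrapped in a subroutine that tidies the input first. This is only a sketch: to_decimal is a made-up name, and the validation rule (hex digits only, with an optional 0x prefix) is an assumption about what counts as good input.

```perl
use strict;
use warnings;

# to_decimal is a hypothetical helper, not part of the lecture script.
sub to_decimal {
    my ($input) = @_;
    chomp $input;                                  # drop the newline that <STDIN> keeps
    $input =~ s/^0x//i;                            # strip an optional 0x prefix
    return undef unless $input =~ /^[0-9a-f]+$/i;  # reject anything but hex digits
    return hex $input;                             # hex string -> number
}

print to_decimal("ff\n"), " ", to_decimal("0x1A\n"), "\n";   # prints "255 26"
```

In an interactive script the argument would be the line read from STDIN.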

Statements

The only kind of simple statement is an expression evaluated for its side effects ... [which] must terminate with a semicolon. Simple statements may optionally be followed by a single modifier, just before the terminating semicolon. Possible modifiers are:
        if EXPR
        unless EXPR
        while EXPR
        until EXPR
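A small sketch of the four modifiers at work. The word list and variable names are invented for illustration.

```perl
use strict;
use warnings;

my @words = ('the', 'cat', '', 'sat');   # invented sample data
my @out;

for my $w (@words) {
    push @out, "empty" unless length $w;   # runs only when the test is false
    push @out, uc $w   if $w eq 'cat';     # runs only when the test is true
}

my $n = 0;
$n++ while $n < 3;       # repeat the statement while the test holds
my $m = 10;
$m-- until $m <= 7;      # repeat the statement until the test becomes true

print join(',', @out), " n=$n m=$m\n";     # prints "CAT,empty n=3 m=7"
```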

Flow control in PERL

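NB: when a while or until modifier is applied to a do-BLOCK, the block executes once before the conditional is evaluated. (On an ordinary simple statement the conditional is evaluated first.) This is so that you can write loops like: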
        do {
                $_ = <STDIN>;
                ...
        } until $_ eq ".\n";
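The difference between a modified do-BLOCK and a modified simple statement can be checked without any input at all. A minimal sketch, with invented counter names:

```perl
use strict;
use warnings;

my $count = 0;
do {
    $count++;            # the body runs before the test is ever checked
} until 1;               # the test is already true, yet the body ran once

my $plain = 0;
$plain++ until 1;        # simple statement: test first, so it never runs

print "do-block: $count, simple statement: $plain\n";   # prints "do-block: 1, simple statement: 0"
```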

Let's read some more PERL!

   # a simpleminded Pascal comment stripper
   # (warning: assumes no { or } in strings)
   line: while (<STDIN>) {
           while (s|({.*}.*){.*}|$1 |) {}
           s|{.*}| |;
           if (s|{.*| |) {
                   $front = $_;
                   while (<STDIN>) {
                           if (/}/) {      # end of comment?
                                   s|^|$front\{|;  # \{ so Perl 5 does not read $front{ as a hash
                                   redo line;
                           }
                   }
           }
           print;
   }

Labels in PERL

     line: while (<STDIN>) {
             next line if /^#/;      # discard comments
             ...
     }
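Labels matter most with nested loops, where next and last would otherwise act on the innermost loop only. A sketch with made-up input lines; the label is written in capitals here, which is a common convention rather than a requirement.

```perl
use strict;
use warnings;

# Invented input; a leading '#' marks a comment line.
my @lines = ("# header", "a b", "c STOP d", "e f");
my @seen;

LINE: for my $line (@lines) {
    next LINE if $line =~ /^#/;          # discard comments
    for my $word (split ' ', $line) {
        last LINE if $word eq 'STOP';    # leave both loops at once
        push @seen, $word;
    }
}

print "@seen\n";    # prints "a b c"
```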

Variables

Data conversion

Building arrays from strings

               while (<>) {
                       chop;   # avoid \n on last field
                       @array = split(/:/);
                       ...
               }
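A self-contained version of the same idea, run on one invented record in /etc/passwd layout. It uses chomp, which removes only a trailing newline, where the loop above uses chop.

```perl
use strict;
use warnings;

# A made-up record, not a real account.
my $record = "alice:x:501:20:Alice Example:/home/alice:/bin/sh\n";

chomp $record;                         # removes a trailing newline only
my @fields = split /:/, $record;       # seven colon-separated fields
my ($login, $home) = @fields[0, 5];    # an array slice picks out two of them

print scalar(@fields), " fields; login=$login home=$home\n";
# prints "7 fields; login=alice home=/home/alice"
```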

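Both of these do the same thing, except that do is more efficient and searches the include path for the file...

               do 'stat.pl';
               eval `cat stat.pl`;

And so do these, provided $_ holds a single line ending in a newline...

               s/\n// ;        # deletes the first newline in $_
               chop ;          # deletes the last character of $_, whatever it is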

String functions

chop
eval
crypt
index
rindex
length
q
substr
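A few of these functions applied to one sample string, as a quick sketch. The string is chosen only for illustration; offsets count from 0.

```perl
use strict;
use warnings;

my $s = "linguistics";                  # sample string

my $len   = length $s;                  # 11
my $first = index  $s, "i";             # 1: position of the first "i"
my $last  = rindex $s, "i";             # 8: position of the last "i"
my $mid   = substr $s, 3, 4;            # "guis": 4 characters from offset 3
my $raw   = q(no $interpolation here);  # q() quotes like single quotes

print "$len $first $last $mid\n$raw\n";
```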

Arrays and list functions

delete
each
grep
keys
join
pop
push
reverse
shift
sort
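Several of these functions come together in a word-frequency count, the kind of thing a corpus exercise needs. A sketch on a toy corpus; the data and variable names are invented.

```perl
use strict;
use warnings;

my @words = qw(the cat saw the dog the end);   # toy corpus
my %count;
$count{$_}++ for @words;                       # tally each word type

# Sort types by descending frequency, then alphabetically.
my @types = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

my @stack = @types;
my $top   = shift @stack;      # remove from the front: the most frequent type
my $low   = pop   @stack;      # remove from the back
push @stack, 'extra';          # append to the end

delete $count{end};            # drop one key from the hash
print "$top $low ", join('|', reverse @stack), " ", scalar(keys %count), "\n";
# prints "the saw extra|end|dog|cat 4"
```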

A couple of examples

    @foo = grep(!/^#/, @bar);    # weed out comments

    $_ = join(':',
                $login,$passwd,$uid,$gid,$gcos,$home,$shell);

Pattern matching

m/PATTERN/gio
/PATTERN/gio
?PATTERN?
s/PATTERN/REPLACEMENT/gieo

Some examples:

           s/\bgreen\b/mauve/g;                # don't change wintergreen
           ($foo = $bar) =~ s/bar/foo/;
           $_ = 'abc123xyz';
           s/\d+/$&*2/e;               # yields 'abc246xyz'
           s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
           s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
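The same operators in a runnable sketch, using lexical variables and capture groups instead of $&. The sample text is invented.

```perl
use strict;
use warnings;

my $text = "Green tea, wintergreen gum, GREEN light";   # invented sample

# /i ignores case, /g finds every match; \b keeps "wintergreen" out.
my @hits = $text =~ /\bgreen\b/gi;

# Copy, then substitute in the copy: $text itself is left unchanged.
(my $copy = $text) =~ s/\bgreen\b/mauve/gi;

# Parentheses capture; in list context the match returns the captures.
my ($first, $second) = $text =~ /^(\w+)\s+(\w+)/;

# /e evaluates the replacement as an expression.
(my $doubled = 'abc123xyz') =~ s/(\d+)/$1 * 2/e;

print scalar(@hits), "\n$copy\n$first/$second\n$doubled\n";
```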

study
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds

           $ARGV[1] =~ y/A-Z/a-z/;     # canonicalize to lower case
           $cnt = tr/*/*/;             # count the stars in $_
           $cnt = tr/0-9//;            # count the digits in $_
           tr/a-zA-Z//s;       # bookkeeper -> bokeper
           ($HOST = $host) =~ tr/a-z/A-Z/;
           y/a-zA-Z/ /cs;      # change non-alphas to single space
           tr/\200-\377/\0-\177/;# delete 8th bit
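A sketch of the same translations applied to copies of one invented string, so that each result can be inspected separately.

```perl
use strict;
use warnings;

my $s = "Bookkeeper 2024";             # invented sample

(my $lower = $s) =~ tr/A-Z/a-z/;       # "bookkeeper 2024"
my $digits = ($s =~ tr/0-9//);         # 4: an empty replacement list counts without changing $s
(my $squeezed = $lower) =~ tr/a-z//s;  # "bokeper 2024": runs of one letter collapse
(my $alpha = $s) =~ tr/a-zA-Z/ /cs;    # "Bookkeeper ": the non-letters become one space

print "$lower|$digits|$squeezed|$alpha|\n";
```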

Format strings

# a report on the /etc/passwd file
format STDOUT_TOP =
                        Passwd File
Name                Login    Office   Uid   Gid Home
------------------------------------------------------------------
.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name,              $login,  $office,$uid,$gid, $home
.
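Picture lines can be tried out in isolation with formline, the function that formats use internally; it appends its result to the accumulator $^A. This is a sketch, not the report above: the picture, the rows and the variable names are invented.

```perl
use strict;
use warnings;

# Invented sample rows: name, login, uid.
my @rows = (['Root User', 'root', 0], ['Alicia Example', 'alicia', 501]);

# One picture line: a 14-wide left-justified field, an 8-wide centered
# field and a 5-wide right-justified field. A field's width is the
# number of characters in its picture, counting the leading @.
my $picture = '@<<<<<<<<<<<<< @||||||| @>>>>' . "\n";

$^A = '';                            # clear the format accumulator
formline($picture, @$_) for @rows;   # append one formatted line per row
my $report = $^A;
$^A = '';

print $report;
```

With a real format declaration, write does this work and sends the result to the selected filehandle.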


$_ The default input and pattern-searching space.

The following pairs are equivalent:

               while (<>) {... # only equivalent in while!
               while ($_ = <>) {...

               /^Subject:/
               $_ =~ /^Subject:/

               y/a-z/A-Z/
               $_ =~ y/a-z/A-Z/

               chop
               chop($_)

               $foo{$a,$b,$c}
               $foo{join($;, $a, $b, $c)}
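The last pair can be verified directly. A sketch with a hypothetical bigram table; the names %freq, $sep and $key are invented.

```perl
use strict;
use warnings;

my %freq;                                   # hypothetical bigram counts
$freq{'the', 'cat'} = 2;                    # stored under one joined key

my $sep = $;;                               # the subscript separator, "\034" by default
my $key = join($sep, 'the', 'cat');         # the same key, built by hand

# Split the stored key to recover its two parts.
my ($w1, $w2) = split /\Q$sep\E/, (keys %freq)[0];

print "$freq{$key} $w1 $w2\n";              # prints "2 the cat"
```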

Special Variables

$ARGV
contains the name of the current file when reading from <>.
@ARGV
The array ARGV contains the command line arguments
$_
The default input and pattern-searching space.
$!
If used in a numeric context, yields the current value of errno
$#
output format for printed numbers.
$%
current page number
$&
string matched by the last successful pattern match (not counting any matches hidden within a
$'
string following whatever was matched by the last successful pattern match
$*
Set to 1 to do multiline matching within a string, 0 to tell perl that it can assume that strings contain a
$+
last bracket matched by the last search pattern.
$,
output field separator for the print operator.
$-
number of lines left on the page of the currently selected output channel.
$.
current input line number of the last filehandle that was read.
$/
input record separator, newline by default.
$0
Contains the name of the file containing the perl script being executed.
$:
current set of characters after which a string may be broken
$;
subscript separator for multi-dimensional array emulation.
$<
real uid of this process.
$=
current page length
$@
perl syntax error message from the last eval command.
$^L
What formats output to perform a formfeed.
$~
name of the current report format for the currently selected output channel. Default is name of the
$^
name of the current top-of-page format
$"
This is like $, except that it applies to array values interpolated into a double-quoted string
$^T
time at which the script began running, in seconds since the epoch.
$^W
current value of the warning switch.
$[
index of the first element in an array, and of the first character in a substring. Default is 0, but you
$\
output record separator for the print operator. Ordinarily the print operator simply prints out the
$`
string preceding whatever was matched by the last successful pattern match
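A sketch showing four of these variables at work: $" and $, and $\ on output, and $. on input. It reads from and writes to in-memory strings so that it needs no files; the data and names are invented.

```perl
use strict;
use warnings;

my @tags = qw(N V ADJ);                 # invented sample data

my $quoted;
{
    local $" = '-';                     # joins array elements inside "..."
    $quoted = "@tags";                  # N-V-ADJ
}

my $printed = '';
{
    local $, = '+';                     # goes between the items print is given
    local $\ = "!\n";                   # goes after the last item
    open my $out, '>', \$printed or die "cannot open in-memory file: $!";
    print $out @tags;
    close $out;
}

my @numbered;
open my $in, '<', \"one\ntwo\n" or die "cannot open in-memory file: $!";
while (<$in>) {
    chomp;
    push @numbered, join(':', $., $_);  # $. is the current input line number
}
close $in;

print "$quoted $printed@numbered\n";
```

Using local keeps each change confined to its block, so the rest of the script still sees the defaults.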

Homework 5 : Corpus analysis (due: Mar. 13)
Homework answers