Perl & More


Summary

Branching; looping; arrays; file manipulation; regular expressions; Perl and CGI scripts


Homework

HW # 9 (due end of Week 13): Write a Perl script that converts any input string by replacing all voiceless consonants with their voiced counterparts and vice versa. E-mail the script to me.

HW # 10 (due end of Week 14): Write a Perl script that extracts suffixes entered by the user from any given text corpus. E-mail the script to me.

Final Project (due end of Week 16): Export your dictionary from HW # 8 into a text file. Write an HTML form and a Perl program, to be used as a CGI script, that queries your database by headword, equivalent, POS tag and usage label. E-mail the dictionary text file, the HTML form and the Perl CGI script to me.


Installing Perl

You first have to get the Perl interpreter program and install it on your machine:
If you are using a Unix-based platform such as Linux, the interpreter is usually already installed. If you would like to switch to Linux, you can download it free of charge from the:
Linux Main Page
If you want to run Perl on Windows, you can get it from the ActiveState Pages.
Mac users should download the Perl interpreter from the:
MacPerl Pages
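Once the interpreter is installed, a tiny test script is a quick way to check that it runs. The sketch below is only an illustration; save it under any name (test.pl is a hypothetical example) and run it with perl test.pl. It prints a greeting and the interpreter's version, taken from Perl's built-in variable $^V.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $^V holds the version of the running interpreter;
# sprintf's %vd format turns it into a plain string such as "5.36.0".
my $version = sprintf("%vd", $^V);
print "Hello from Perl $version\n";
```

If the greeting appears, the interpreter is installed and working.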


Basic tasks

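Element	Example	Comment
Simple variable	$b, $word	Any name preceded by a $; stores both numeric and alphanumeric (string) values
Default input and pattern-searching variable	$_	Used automatically by while (<>), chomp, print and pattern matches when no other variable is named
Keyboard input/output	STDIN, STDOUT	Predefined filehandles: STDIN reads from the keyboard, STDOUT writes to the screen
File input/output	IN, OUT, FILE1	Any name you choose for a filehandle, conventionally written in uppercase
Array	@b, @array	Any name preceded by a @; stores ordered lists of numeric and alphanumeric values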
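A minimal sketch of the variable types in the table above, assuming Perl 5.8 or later (which can open a filehandle on a string in memory, so the example needs no file on disk). It adds use strict and use warnings, which the table does not mention but which catch misspelled variable names; under use strict each variable is declared with my. The file name corpus.txt in the comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A simple variable ($) holds one value: a number or a string.
my $word    = "lingua";
my $letters = length($word);        # 6

# An array (@) holds an ordered list; $words[0] is its first element.
my @words = ("perl", "regex", "corpus");
push @words, $word;                 # append "lingua" at the end
my $size = scalar(@words);          # number of elements: 4

# IN is a file input handle. To keep the sketch self-contained it reads
# from a string in memory; for a real file (corpus.txt is a hypothetical
# name) you would write: open(IN, '<', 'corpus.txt') or die ...
my $text = "first line\nsecond line\n";
open(IN, '<', \$text) or die "Cannot open input: $!";
my @lines;
while (<IN>) {        # each pass reads the next line into $_
    chomp;            # with no argument, chomp removes the newline from $_
    push @lines, $_;
}
close(IN);

print "$word has $letters letters; \@words now has $size elements\n";
print "Read ", scalar(@lines), " lines; the first is: $lines[0]\n";
```

Keyboard input works the same way as file input: my $answer = <STDIN>; reads one line typed by the user.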

Operator	Example	Comment
Diamond operator	<>	In while (<>), reads one line at a time into $_ from the files named on the command line, or from the keyboard if none are named
Math operators	+, -, *, /	addition, subtraction, multiplication, division, e.g. $a = $a + 1, or in short $a++
Logical operators	&&, ||, !	and, or, not
Comparison operators, numeric	==, !=, >, <, >=, <=	equal, not equal, greater than, less than, greater than or equal to, less than or equal to; used to compare numbers
Comparison operators, alphanumeric	eq, ne, gt, lt, ge, le	equal, not equal, greater than, less than, greater than or equal to, less than or equal to; used to compare strings (alphabetically)
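The difference between the numeric and the string comparison operators is easiest to see on values that look alike. A short sketch (the values are arbitrary examples):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = ("10", "10.0");

# == compares numbers: 10 and 10.0 are the same number.
my $same_number = ($x == $y) ? 1 : 0;          # 1
# eq compares strings character by character: "10" and "10.0" differ.
my $same_string = ($x eq $y) ? 1 : 0;          # 0

# Strings are compared alphabetically, character by character,
# so "9" comes after "10" (the character 9 follows the character 1) ...
my $after_as_string = ("9" gt "10") ? 1 : 0;   # 1
# ... while as numbers 9 is less than 10.
my $less_as_number = (9 < 10) ? 1 : 0;         # 1

# && (and), || (or) and ! (not) combine conditions.
my $weight   = 150;
my $in_range = ($weight >= 120 && $weight <= 180) ? 1 : 0;   # 1
my $outside  = !$in_range ? 1 : 0;                           # 0

$weight++;    # same as $weight = $weight + 1; now 151
print "numeric: $same_number, string: $same_string, in range: $in_range\n";
```

Using == on words, or eq on numbers written differently, is a common source of silent bugs; use warnings reports the first case when a string is not numeric.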

Instruction	Example	Comment
Start and End	{ }	{ opens a block of instructions, } closes it
Branching	if () {}	Performs the instructions in {} if the condition in () is met, e.g. if ($a eq "yes") {$b++;}
Multiple branching	if () {} elsif () {} else {}	Performs the if block when its condition is met; otherwise performs the first elsif block whose condition is met; if no condition is met, performs the else block, e.g. if ($weight < 120) {print "You are slim";} elsif ($weight > 180) {print "Hmmm!";} else {print "Just right.";}
For loops	for (a; b; c) {}	a sets the starting value, the loop repeats while condition b is true, and c updates the counter after each pass, e.g. for ($a=1; $a<=5; $a++) {print "$a ";} prints 1 2 3 4 5
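The examples from the instructions table can be combined into one runnable script. This sketch declares its variables with my and uses $i as the loop counter instead of $a, since $a and $b are special variables used by sort; the subroutine name describe_weight is just an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multiple branching: only the first block whose condition is true runs.
sub describe_weight {
    my ($weight) = @_;
    if ($weight < 120) {
        return "You are slim";
    } elsif ($weight > 180) {
        return "Hmmm!";
    } else {
        return "Just right.";
    }
}

# Simple branching on a string: eq compares; a single = would assign.
my $answer    = "yes";
my $yes_count = 0;
if ($answer eq "yes") {
    $yes_count++;
}

# for loop: start at 1, repeat while $i <= 5, add 1 after each pass.
my @numbers;
for (my $i = 1; $i <= 5; $i++) {
    push @numbers, $i;
}

print "@numbers\n";                  # 1 2 3 4 5
print describe_weight(150), "\n";    # Just right.
```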

Read this in-depth review of regular expressions.
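As a first taste before the review, here is a sketch of the three most common regular-expression operations in Perl: matching with m//, substituting with s/// and counting characters with tr///. The sentence is a made-up example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "The linguists were reading and writing papers.";

# m// with /g in list context returns every match; \b marks a word
# boundary and the parentheses capture the part of the match to keep.
my @ing_words = $sentence =~ /\b(\w+ing)\b/g;       # ("reading", "writing")

# Two capture groups split a word into a stem and an -ing ending.
my ($stem, $suffix) = "reading" =~ /^(\w+)(ing)$/;  # ("read", "ing")

# s/// substitutes; /g replaces every occurrence, not only the first.
(my $copy = $sentence) =~ s/papers/articles/g;

# tr/// works character by character; with an empty replacement list
# it leaves the string unchanged and returns the number of matches.
my $vowels = ($sentence =~ tr/aeiouAEIOU//);        # 14

print "@ing_words | $stem + $suffix | $copy | $vowels vowels\n";
```

Note that \w matches only ASCII letters, digits and underscores unless the text is read as Unicode.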


Support and examples

Numerous Perl tutorials are available online, such as:
Robert's Perl Tutorial | Perl Manual | C.Ball's Perl for Linguists
Support is also available through the Perl.org and Perl.com pages.

Here are two commented examples of short text-manipulating Perl programs:

Script	Source
Sequence Search Script	Sequence Search Source
Frequency Count Script	Frequency Count Source
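For comparison with the frequency count example, here is an independent minimal sketch of the same task (not the linked script): a hash counts each word, and the words are then sorted by frequency. It assumes words are runs of ASCII letters and apostrophes; texts with accented letters would need a Unicode-aware pattern such as \p{L}. The subroutine name word_frequencies is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts how often each word occurs in the lines read from a filehandle.
# Words are runs of ASCII letters (and apostrophes), compared in lowercase.
sub word_frequencies {
    my ($fh) = @_;
    my %freq;                              # word => number of occurrences
    while (my $line = <$fh>) {
        for my $word (lc($line) =~ /[a-z']+/g) {
            $freq{$word}++;
        }
    }
    return %freq;
}

# A tiny made-up corpus held in memory; a real script would open a file
# or read with while (<>) { ... } instead.
my $corpus = "The cat saw the dog.\nThe dog saw a cat.\n";
open(my $in, '<', \$corpus) or die "Cannot open corpus: $!";
my %freq = word_frequencies($in);
close($in);

# List words from most to least frequent; ties are sorted alphabetically.
for my $word (sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq) {
    print "$freq{$word}\t$word\n";
}
```

For the sample corpus this prints the with 3, then cat, dog and saw with 2 each, then a with 1.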


A more complex Perl script linked to an HTML page (both the script and its source) is available at the Lingo page.
Two examples of dictionary Perl scripts are available at:
S-Cr Synonym Dictionary Page and
S-Cr - Polish Dictionary Page

To help with your final course project, I have made available the Perl script behind the S-Cr Polish Dictionary and the dictionary database.


Numerous uncommented Perl scripts are also available. If your system does not locate the Perl interpreter automatically, add a shebang line giving the interpreter's path on your platform (for example #!/usr/bin/perl) as the first line of each script. Download these scripts and figure out which text files they need and what they do. The following scripts are currently available:
Simple concordance
More complex concordance
Double lines
Letter concordance
Frequency list
Character frequency
Replacing one way
Replacing other way
Sequences
Dictionary
Reverse frequency
Finding
Random questions, simpler version
Random questions, more complex version
Sequences again
Yet another sequences

CGI Scripts

To run CGI scripts on your own machine, you will also need a web server in addition to Perl. You can get the Apache server from http://httpd.apache.org/.

This is how you pass variables from an HTML form to a Perl script. The form below sends the name the user types to hi.pl, which prints Hello followed by that name.

hi.html file

<html>
<head>
<title>Hi</title>
</head>
<body>
<form action="http://cli.la.asu.edu/cgi-bin/hi.pl" method=post>
Your name: <INPUT name=name size=60> <INPUT type=submit value='Send'>
</form>
</body>
</html>

hi.pl file

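#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Create a CGI object; it parses the data sent by the form.
my $q = CGI->new;

# Read the value of the form field called "name" (empty if missing).
my $name = $q->param('name');
$name = '' unless defined $name;

# The HTTP header must be printed first; the charset matches the original.
print $q->header(-type => 'text/html', -charset => 'windows-1250');
print $q->start_html('Hi');
# escapeHTML keeps characters such as < and & from being read as markup.
print 'Hello ', $q->escapeHTML($name);
print $q->end_html;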
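The same param mechanism extends to several form fields, as the Final Project requires. Below is a minimal sketch of the lookup step only, under the assumption (not stated in the course materials) that the exported dictionary is a text file with one entry per line and four tab-separated fields: headword, equivalent, POS tag and usage label. The subroutine name lookup, the field names and the sample entries are hypothetical; the CGI wiring is shown only in comments so the sketch runs without a web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed (hypothetical) file format: one entry per line, four
# tab-separated fields in this order.
my @FIELDS = qw(headword equivalent pos label);

# Returns the entries (as hash references) whose fields equal every
# non-empty value in %criteria; empty or missing criteria match anything.
sub lookup {
    my ($fh, %criteria) = @_;
    my @hits;
    ENTRY: while (my $line = <$fh>) {
        chomp $line;
        next ENTRY if $line eq '';               # skip blank lines
        my %entry;
        @entry{@FIELDS} = split /\t/, $line;     # hash slice names each field
        for my $field (@FIELDS) {
            my $want = $criteria{$field};
            next unless defined $want && length $want;
            next ENTRY unless defined $entry{$field} && $entry{$field} eq $want;
        }
        push @hits, \%entry;
    }
    return @hits;
}

# Made-up sample entries standing in for the exported dictionary file.
my $dictionary = "casa\thouse\tN\tneutral\n"
               . "casita\tlittle house\tN\tdiminutive\n"
               . "escribir\twrite\tV\tneutral\n";

# In a CGI script the criteria would come from the form instead, e.g.
#   my $pos  = $q->param('pos');
#   my @hits = lookup($dict_fh, pos => $pos);
open(my $dict, '<', \$dictionary) or die "Cannot open dictionary: $!";
my @hits = lookup($dict, pos => 'N', label => 'neutral');
close($dict);

for my $entry (@hits) {
    print join("\t", @{$entry}{@FIELDS}), "\n";   # casa house N neutral
}
```

Treating empty fields as "match anything" lets the user fill in only the parts of the form they care about.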


What next

The following technologies are the ones most commonly used to perform tasks in the fields covered in this course. Once you master Perl, it will be relatively easy to learn any of them. The following tutorials are good starting points.

Books

There are two general computational linguistics textbooks most commonly used in the US.
The book by Jurafsky and Martin suits both linguists and programmers. It is an excellent, straightforward introduction to the field.
The book by Manning and Schütze is best used only if you have a strong mathematical/statistical background and interest.
Mike Hammond has published two excellent practical books on programming for linguists, one on Perl and the other on Java. I highly recommend both of them.

What you know (paste this into your resume):


Example

Last year's example project