Cmps 356 Assignment: Simple Tokenizer
Description
The goal of this assignment is to give you an introduction to using the regular expression capabilities in Perl. You can develop this program on any platform with a Perl processor. Given the discussion in class on regular expressions, construct a Perl program called token.pl that tokenizes the string on the command line. For example, the command,

perl token.pl "The quick brown fox jumped over the lazy dog"

displays

Parameter="The quick brown fox jumped over the lazy dog"
delimiter = ""
variable = "The"
delimiter = " "
variable = "quick"
delimiter = " "
variable = "brown"
delimiter = " "
variable = "fox"
delimiter = " "
variable = "jumped"
delimiter = " "
variable = "over"
delimiter = " "
variable = "the"
delimiter = " "
variable = "lazy"
delimiter = " "
variable = "dog"

The Log
Place in the log all of your actions, along with your thoughts and responses when running the program. Specifically,
  • Each time you run the Perl processor make an entry for each error message that you correct - include the reason for your correction.
  • When the program runs, note errors in the output and your responses to those errors.

This page was last modified on Sunday, 02-Jan-2000 12:08:46 EST
Submission
Due Date:
How to Submit
Email to beidler@cs.uofs.edu
Subject Entry
web token
Include
Attach your log, and attach or cut-and-paste your Perl program

Penalties
One submission per assignment (2 points)
Late fee (3+ points)
Suggestions
Getting the argument from the command line is accomplished with the ARGV array,

$input=$ARGV[0];
print "Parameter=\"$input\"\n"; 
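If the program is run with no argument, $ARGV[0] is undefined. The following is a minimal sketch of one way to guard against that, assuming you want a usage message in that case. The helper name first_argument is hypothetical and not part of the assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first command-line argument,
# or undef when no argument was supplied.
sub first_argument {
    my @args = @_;
    return @args ? $args[0] : undef;
}

my $input = first_argument(@ARGV);
if (defined $input) {
    print "Parameter=\"$input\"\n";
} else {
    print "usage: perl token.pl \"text to tokenize\"\n";
}
```

Checking with defined rather than comparing against "" keeps the empty string available as a legitimate argument.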

The heart of the program is a loop that finds the first non-word and word tokens,

while ($input ne ""){
  $input=~/(\W*)(\w*)/;
    . . .

  ##  print "$input\n";   # uncomment to trace what remains of $input
}
exit;

Note that eq and ne are used for string comparison, while == and != are used for numeric comparison. A single = is assignment, not comparison.
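The difference between the string operators (eq, ne) and the numeric operators (==, !=) can be seen in a small sketch. The variable names and sample values here are illustrative only.

```perl
use strict;
use warnings;

my $left  = "10";
my $right = "10.0";

# Numeric comparison: both strings are converted to the number 10.
print "numerically equal\n" if $left == $right;

# String comparison: the character sequences "10" and "10.0" differ.
print "different as strings\n" if $left ne $right;
```

Both messages are printed, which is why the loop test on $input uses ne rather than !=.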

The first set of parentheses captures the non-word text (the delimiter) in the $1 variable, and the second set captures the word that follows in $2. After printing the delimiter and token, use the substitute command, $input =~ s/???//, to remove the delimiter and token from $input.
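To see what the two capture groups hold after a single match, here is a small sketch on a fixed string. The helper name first_token and the sample string are hypothetical. It covers only the match, not the substitution step.

```perl
use strict;
use warnings;

# Hypothetical helper: match once and return the two captures,
# the leading non-word characters and the word that follows them.
sub first_token {
    my ($text) = @_;
    if ($text =~ /(\W*)(\w*)/) {
        return ($1, $2);
    }
    return;
}

my ($delimiter, $word) = first_token("--hello, world");
print "delimiter = \"$delimiter\"\n";   # the leading "--"
print "variable = \"$word\"\n";         # the word "hello"
```

Copying $1 and $2 into named variables right after the match is a common habit, since a later successful match overwrites them.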

Don't take this last step lightly. It seems obvious, but some of you will encounter problems. It is very important that you use the logs to express the ideas going through your mind and the things you tried to make it work.

Test your program with lines like,

perl token.pl "The&&&quick.*.+brown fox%#@$jumped"

and see what happens!!