Cmps 356 Assignment: Text Concordance
Description
The countwords.pl artifact counts the number of occurrences of each word in a text file. Running the program

perl countwords.pl "gettysburg.address"

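creates the file output.txt, which contains the numbered lines of the input file followed by an alphabetized list of each word and its number of occurrences: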


1:   Four score and seven years ago our fathers brought 
2: forth on this continent, a new nation, conceived in
3: Liberty, and dedicated to the proposition that all 
4: men are created equal.
5: 
6:   Now we are engaged in a great civil war, testing 
7: whether that nation or any nation so conceived and
8: so dedicated, can long endure.  We are met on a 
9: great battle-field of that war.  We have come to
10: dedicate a portion of that field, as a final resting
11: place for those who here gave their lives that that 
12: nation might live.  It is altogether fitting and proper 
13: that we should do this. 
14:  
15:   But, in a larger sense, we can not dedicate - we 
16: can not consecrate - we can not hallow - this
17: ground. The brave men, living and dead, who
18: struggled here, have consecrated it, far above our 
19: poor power to add or detract.  The world will little 
20: note, nor remember what we say here, but it
21: can never forget what we did here.  It is for us the 
22: living, rather, to be dedicated here to the unfinished
23: work which they fought here have thus far so
24: nobly advanced.  It is rather for us to be here 
25: dedicated to the great task remaining before us - 
26: that from these honored dead we take increased
27: devotion to that cause for which they gave the last
28: full measure of devotion - that we here highly
29: resolve that these dead shall not have died in vain -
30: that this nation, under God, shall have a new birth
31: of freedom - and that government of the people, by
32: the people, for the people, shall not perish from the 
33: earth. 
a  7
above  1
add  1
advanced  1
ago  1
all  1
altogether  1
and  6
any  1
are  3
as  1
battle  1
be  2
before  1
birth  1
brave  1
brought  1
but  2
by  1
can  5
 . . .

we  11
what  2
whether  1
which  2
who  2
will  1
work  1
world  1
years  1
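The page does not reproduce the source of countwords.pl. As a reference point for the modifications below, here is a minimal sketch of how a program could produce output in the format shown above. Everything in it is an assumption. The helper name count_words is hypothetical. A word is taken to be a run of letters, and upper case is folded to lower case, which the sample suggests because "battle-field" is counted as "battle" and "We" is counted with "we". Only the statement $word{$var}++; is quoted from the assignment.

```perl
#!/usr/bin/perl
# Hypothetical sketch of the counting approach; the real countwords.pl is
# not reproduced on this page. Assumes a word is a run of letters and that
# case is folded to lower case, as the sample output suggests.
use strict;
use warnings;

# Count the words in a list of lines; returns a hash reference word => count.
sub count_words {
    my @lines = @_;
    my %word;
    for my $line (@lines) {
        for my $var (split /[^a-z]+/, lc $line) {
            next if $var eq '';    # split yields an empty first field for leading blanks
            $word{$var}++;         # the counting statement the assignment asks you to replace
        }
    }
    return \%word;
}

# When given a file name, write numbered lines and sorted counts to output.txt.
if (@ARGV) {
    open my $in,  '<', $ARGV[0]     or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', 'output.txt' or die "Cannot create output.txt: $!";
    chomp(my @lines = <$in>);
    close $in;
    for my $i (0 .. $#lines) {
        printf {$out} "%d: %s\n", $i + 1, $lines[$i];
    }
    my $word = count_words(@lines);
    for my $w (sort keys %$word) {
        print {$out} "$w  $word->{$w}\n";
    }
    close $out or die "Cannot close output.txt: $!";
}
```

Run it the same way as the artifact, for example perl sketch.pl "gettysburg.address" (sketch.pl is a hypothetical file name).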
Your task is to modify this program as follows:
  • Replace the statement that counts occurrences by adding one to the indicated hash location, $word{$var}++;, with a command that forms a string at $word{$var} by appending the numbers of the lines on which the word occurs, separated by tab codes (\t).
  • Process the delimiters as well. Remove all blanks from each delimiter; if the delimiter is not empty afterward, include it in the hash with its line numbers separated by tab codes.
The result should look like this:
1:   Four score and seven years ago our fathers brought 
2: forth on this continent, a new nation, conceived in
3: Liberty, and dedicated to the proposition that all 
4: men are created equal.
5: 
6:   Now we are engaged in a great civil war, testing 
7: whether that nation or any nation so conceived and
8: so dedicated, can long endure.  We are met on a 
9: great battle-field of that war.  We have come to
10: dedicate a portion of that field, as a final resting
11: place for those who here gave their lives that that 
12: nation might live.  It is altogether fitting and proper 
13: that we should do this. 
14:  
15:   But, in a larger sense, we can not dedicate - we 
16: can not consecrate - we can not hallow - this
17: ground. The brave men, living and dead, who
18: struggled here, have consecrated it, far above our 
19: poor power to add or detract.  The world will little 
20: note, nor remember what we say here, but it
21: can never forget what we did here.  It is for us the 
22: living, rather, to be dedicated here to the unfinished
23: work which they fought here have thus far so
24: nobly advanced.  It is rather for us to be here 
25: dedicated to the great task remaining before us - 
26: that from these honored dead we take increased
27: devotion to that cause for which they gave the last
28: full measure of devotion - that we here highly
29: resolve that these dead shall not have died in vain -
30: that this nation, under God, shall have a new birth
31: of freedom - and that government of the people, by
32: the people, for the people, shall not perish from the 
33: earth. 
,  	2	2	3	6	8
	10	15	15	17	17
	18	18	20	20	22
	22	30	30	31	32	32
-  	9	15	16	16	25
	28	29	31
.  	4	8	9	12	13
	17	19	21	24	33
a  	2	6	8	10	10
	15	30
above  	18
add  	19
advanced  	24
ago  	1
all  	3
altogether  	12
and  	1	3	7	12	17	31
any  	7
are  	4	6	8
as  	10
battle  	9
be  	22	24
before  	25
birth  	30
brave  	17
brought  	1
but  	15	20
by  	31
can  	8	15	16	16	21
 . . .

to  	3	9	19	22	22	24
	25	27
under  	30
unfinished  	22
us  	21	24	25
vain  	29
war  	6	9
we  	6	8	9	13	15	15
	16	20	21	26	28
what  	20	21
whether  	7
which  	23	27
who  	11	17
will  	19
work  	23
world  	19
years  	1
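The two requested changes can be sketched as follows. This is not an official solution. The names build_concordance and print_concordance are hypothetical, the %word hash and $var follow the statement quoted in the task, and words are again assumed to be lower-cased runs of letters. Splitting with a capture group, split /([^a-z]+)/, returns the delimiters between words as separate fields, so a single loop can strip blanks from every piece, skip the empty ones, and append the line number to both words and delimiters.

```perl
#!/usr/bin/perl
# Hypothetical sketch of the two requested changes; build_concordance and
# print_concordance are illustrative names, not taken from countwords.pl.
use strict;
use warnings;

# Build a hash word => "\tline\tline..." from a list of lines (line 1 first).
sub build_concordance {
    my @lines = @_;
    my %word;
    for my $n (1 .. @lines) {
        # The capture group makes split return the delimiters between words too.
        for my $piece (split /([^a-z]+)/, lc $lines[$n - 1]) {
            (my $var = $piece) =~ s/\s+//g;    # remove all blanks
            next if $var eq '';                # skip empty words and delimiters
            $word{$var} .= "\t$n";             # append the line number instead of ++
        }
    }
    return \%word;
}

# Print each entry as "word  <tab>n<tab>n..." in ASCII order, so the
# delimiters , - . sort ahead of the letters.
sub print_concordance {
    my ($fh, $word) = @_;
    for my $w (sort keys %$word) {
        print {$fh} "$w  $word->{$w}\n";
    }
}

if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @lines = <$in>);
    close $in;
    print_concordance(\*STDOUT, build_concordance(@lines));
}
```

Because each line number is appended with a leading tab, the same "word, two spaces, value" print format that produced a  7 produces a  followed by <tab>2<tab>6 and so on, which matches the layout above. The sketch prints only the concordance, to standard output, one entry per line; it does not echo the numbered input lines or reproduce the wrapping of long entries seen in the sample.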

This page was last modified on Sunday, 02-Jan-2000 15:24:01 EST
Submission
Due Date:
How to Submit: Email to beidler@cs.uofs.edu
Subject Entry: web xref
Include: Attach your log, and attach or cut-and-paste your Perl script.
Penalties:
  • One submission per assignment (2 points)
  • Late fee (3+ points)

Artifacts
  • wordcount.pl: Warning: If you plan to use this file on a UNIX box, you must first fix the end-of-line codes. UNIX uses a single character (line feed), while Microsoft uses two characters (carriage return and line feed). A simple way to do this is to read the file into the pico editor, make a trivial edit (for example, insert and then remove a blank space), and save the file; pico will replace the Microsoft end-of-lines with UNIX end-of-lines.
  • gettysburg.address
  • Log file
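If pico is not available, the end-of-line fix described for wordcount.pl can also be done with a short Perl script. This is a sketch of an alternative, not part of the original instructions; the script name fixeol.pl and the helper to_unix_eol are hypothetical. It reads each named file in raw mode, replaces every carriage return/line feed pair with a single line feed, and writes the file back in place.

```perl
#!/usr/bin/perl
# Hypothetical alternative to the pico edit: rewrite each named file with
# UNIX (LF) line endings in place of Microsoft (CR LF) ones.
use strict;
use warnings;

# Replace every CR LF pair in a string with a single LF.
sub to_unix_eol {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Usage: perl fixeol.pl wordcount.pl   (each file is rewritten in place)
for my $file (@ARGV) {
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    $text = '' unless defined $text;      # an empty file reads as undef
    close $in;
    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} to_unix_eol($text);
    close $out or die "Cannot close $file: $!";
}
```

The common one-liner perl -pi -e 's/\r$//' wordcount.pl has the same effect on a UNIX system, since $ matches just before the newline at the end of each line read.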

The log
Record in the log all of your actions, along with your thoughts about and responses to running the program. Specifically,
  • Each time you run the Perl processor, make an entry for each error message that you correct, and include the reason for the correction.
  • When the program runs, note errors in the output and your responses to those errors.