The DATA DIVISION of a COBOL (sub)program may contain several sections; the two of interest here are the FILE SECTION and the WORKING-STORAGE SECTION. The latter is used to describe, via "data description entries" (level numbers, PICTURE clauses, etc.), the hierarchical structure of data items that exist during execution of the program. The former is used to describe, in a similar way, the layout of records in any files that the program uses. For each file that the program uses, the FILE SECTION contains a "file description entry", the beginning of which is signaled by the keyword FD. The typical form of such an entry (the general form includes a number of optional clauses not shown here) is as follows:
FD <file-name> [RECORD CONTAINS <integer-literal> CHARACTERS] [DATA RECORD IS <data-record-name>].
Note on notation: Square brackets surrounding an entity indicate that its appearance is optional.
Immediately after the file description entry comes the "data description entry" for the file's data record (beginning with the level number 01). Here is a typical example:
FD  Employee-File
    RECORD CONTAINS 65 CHARACTERS
    DATA RECORD IS Employee-Rec.
01  Employee-Rec.
    02  Employee-ID         PIC X(10).
    02  Employee-Name.
        03  Last-Name       PIC X(20).
        03  First-Name      PIC X(12).
        03  Middle-Init     PIC X.
    02  Job-Position.
        03  Job-Code        PIC X(4).
        03  Department      PIC X(3).
        03  Manager-ID      PIC X(10).
    02  Hourly-Pay          PIC 9(3)V99.
(The group item is named Job-Position rather than Position because POSITION is a COBOL reserved word.)
The above says that Employee-File is a file in which each record has a length of 65 characters, with the first ten containing an employee ID, the next twenty containing an employee's last name, etc., etc.
When a COBOL program executes, enough main memory is allocated to hold not only the data items described in the WORKING-STORAGE SECTION but also those described in the FILE SECTION (i.e., one data record from each file). Thus, one could view the data record of a file as being a one-record buffer for that file. When a record is retrieved from a file (via the READ verb), it is placed into the file's data record. Similarly, when a record is written to a file (via the WRITE verb), it is the contents of the file's data record that are written into the file.
Note:
From class discussion, you should recall that there are also file
buffers that are not directly accessible by the application programmer.
An input buffer holds (typically) several records that already have been
read in (physically) but are waiting to be read in logically (via READ) by
the COBOL program. An output buffer holds (typically) several records that
already have been written logically (via WRITE) by the COBOL program but are
waiting to be written (physically) into a file.
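COBOL directly supports three file organizations (SEQUENTIAL, INDEXED, and RELATIVE) and three access modes (SEQUENTIAL, RANDOM, and DYNAMIC).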
A file's organization (i.e., the way it is structured) imposes restrictions
upon how it can be accessed (i.e., upon which access modes are applicable to
it).
A file whose organization is SEQUENTIAL (which is the default) allows
only the SEQUENTIAL access mode, which means that its records may be
accessed (i.e., read or written) only in logical order, one after another.
(This restriction makes sense, as such a file has no index (or any other
auxiliary fast-search-enabling structure) associated with it to allow for
efficient access to arbitrary records.)
An INDEXED file is one for which an index exists, thereby making it possible
to locate a record quickly, given the value of its key field
(i.e., the indexing field). A RELATIVE file is one that allows access by
relative record number (RRN).
A file whose organization is INDEXED or RELATIVE allows any of the three
access modes to be applied to it: SEQUENTIAL, RANDOM, or DYNAMIC. The
notion of SEQUENTIAL access, as it applies to INDEXED and RELATIVE files,
is the same as with SEQUENTIAL files: records are accessed in their logical
order. In INDEXED files, the logical order of records corresponds to
increasing order of key field value. (For example, if Employee-ID
were the key field of the Employee-File described above, then the record
containing 'Jones00001' in that field would occur before the record
containing 'Simpson012', as the former value is less than the
latter according to COBOL's rules for ordering character strings.)
In RELATIVE files, the logical order of records corresponds to their RRNs,
with record i coming before record j if and only if i < j.
As for RANDOM access, in the case of an INDEXED file it means access
according to the value stored in the field that is specified as the key of
the file (in the SELECT statement for the file). (Such a file has an index
for which its key field is the indexing field.) For example, if
Employee-File (see below) has as its key the field
Employee-ID (an alphanumeric string of length ten),
we have the ability to READ or WRITE a record whose Employee-ID
field contains a specified value, such as 'Simpson032'.
In the case of a RELATIVE file, RANDOM access means access according to the
logical position of a record within the file. A position is given in terms
of a relative record number (RRN), which is simply a positive integer. For
example, we can issue a command to READ or WRITE the record in position 327.
DYNAMIC access mode is a combination of both SEQUENTIAL and RANDOM access.
That is, if a program is to access records from some file both sequentially
and randomly (e.g., the former in performing a range query and the latter in
performing a single-record fetch), DYNAMIC access mode is appropriate.
The organization of a file and the access mode to be used on that file by a
particular COBOL program are specified in a SELECT statement appearing in the
FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.
The form taken by the SELECT statement depends upon the file's organization.
(Note: In order to keep things simple, we do not describe the SELECT statement
in all its generality.) For a sequential file, it takes the first of the forms
listed at the end of these notes under "File Organization and Access Modes".
The default organization is SEQUENTIAL, so that if we omit the ORGANIZATION
clause, COBOL will interpret this to mean that the file is SEQUENTIAL.
The presence of the optional keyword OPTIONAL indicates that the
file may or may not already exist (when the program begins execution).
(The OPTIONAL phrase matters only for files opened in INPUT, I-O, or EXTEND
mode; a file opened in OUTPUT mode is created in any case.)
The data item specified in the FILE STATUS clause should be one defined with
a PIC X(2) picture clause. Each time an I/O operation is performed
on the file, a two-digit code, called the file status code, is placed into
this data item. The file status code indicates whether the operation
completed successfully (value "00") or whether something "unusual" occurred
(e.g., value "41" indicates an attempt to OPEN a file that was already open,
"10" indicates the end-of-file condition, etc., etc.). For more details, see
page 301 of Comprehensive COBOL.
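To see the file status code in action, here is a minimal sketch, assuming the GnuCOBOL compiler (cobc -x); the program name STATUS-DEMO and the scratch file status-demo.dat are hypothetical. It writes a one-record file and then provokes, in turn, the codes "00", "10", and "41" mentioned above, displaying the status after each operation. The codes named in the comments are the ones the COBOL standard assigns; how a particular compiler reacts to an error status when no FILE STATUS item is declared may differ.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STATUS-DEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Demo-File ASSIGN TO "status-demo.dat"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS Demo-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Demo-File.
       01  Demo-Rec            PIC X(10).
       WORKING-STORAGE SECTION.
       01  Demo-Status         PIC X(2).
       PROCEDURE DIVISION.
       Main-Para.
      *    Create the file with a single record.
           OPEN OUTPUT Demo-File
           MOVE "ONLY-REC" TO Demo-Rec
           WRITE Demo-Rec
           CLOSE Demo-File
      *    First READ succeeds: status "00".
           OPEN INPUT Demo-File
           READ Demo-File AT END CONTINUE END-READ
           DISPLAY "after 1st READ: " Demo-Status
      *    Second READ runs past the last record: status "10".
           READ Demo-File AT END CONTINUE END-READ
           DISPLAY "after 2nd READ: " Demo-Status
      *    OPEN of a file that is already open: status "41".
           OPEN INPUT Demo-File
           DISPLAY "after 2nd OPEN: " Demo-Status
           CLOSE Demo-File
           STOP RUN.
```

Because a FILE STATUS item is declared, the program itself decides what to do about each code; here it merely displays them.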
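The form taken by the SELECT statement when the file has INDEXED organization
is the second form listed at the end of these notes under "File Organization
and Access Modes" (note that its RECORD KEY clause is mandatory).
For example, the SELECT statement for the Employee file mentioned above
might look like the one given right after that form, which specifies RANDOM
access with Employee-ID as the RECORD KEY.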
The data-name specified in the RECORD KEY clause must be one of the fields
within the file's data record. If the file already exists, it must already
have an index on that field; if the file does not yet exist, that field
becomes its indexing field when the file is created.
The form taken by the SELECT statement when the file has RELATIVE organization
is one of the last two forms listed at the end of these notes under "File
Organization and Access Modes".
That is, for a RELATIVE file, if SEQUENTIAL access mode is chosen, specifying
its RELATIVE KEY is optional, but specifying the RELATIVE KEY is mandatory if
the access mode is RANDOM or DYNAMIC. (Even in SEQUENTIAL access mode the
RELATIVE KEY is not useless: after each successful READ or WRITE it holds the
RRN of the record just read or written, and the START verb uses it to
position the file.) Whenever random access is made to a RELATIVE file, the
contents of the field that was identified as its RELATIVE KEY are taken to be
the RRN of the record to be accessed.
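A minimal sketch of RANDOM access to a RELATIVE file follows, assuming GnuCOBOL; the program name RELATIVE-DEMO, the file seats.dat, and the data names are hypothetical. Records are written into slots 2 and 5 only, so a READ of slot 3 takes the INVALID KEY branch. Note that the RELATIVE KEY item, Seat-No, is declared in WORKING-STORAGE, outside the file's data record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RELATIVE-DEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Seat-File ASSIGN TO "seats.dat"
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS Seat-No
               FILE STATUS IS Seat-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Seat-File.
       01  Seat-Rec            PIC X(10).
       WORKING-STORAGE SECTION.
      *    The RELATIVE KEY must lie outside the record area.
       01  Seat-No             PIC 9(4).
       01  Seat-Status         PIC X(2).
       PROCEDURE DIVISION.
       Main-Para.
      *    Fill slots 2 and 5; every other slot stays empty.
           OPEN OUTPUT Seat-File
           MOVE 2 TO Seat-No
           MOVE "ALICE" TO Seat-Rec
           WRITE Seat-Rec
               INVALID KEY DISPLAY "WRITE failed " Seat-Status
           END-WRITE
           MOVE 5 TO Seat-No
           MOVE "BORIS" TO Seat-Rec
           WRITE Seat-Rec
               INVALID KEY DISPLAY "WRITE failed " Seat-Status
           END-WRITE
           CLOSE Seat-File
      *    Random READs: the RRN in Seat-No selects the record.
           OPEN INPUT Seat-File
           MOVE 5 TO Seat-No
           PERFORM Show-Seat
           MOVE 3 TO Seat-No
           PERFORM Show-Seat
           CLOSE Seat-File
           STOP RUN.
       Show-Seat.
           READ Seat-File
               INVALID KEY DISPLAY "seat " Seat-No " is empty"
               NOT INVALID KEY DISPLAY "seat " Seat-No ": " Seat-Rec
           END-READ.
```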
Note: Simply including the clause ORGANIZATION IS INDEXED (or RELATIVE),
when SELECT-ing a file, does not magically transform the specified file
into one having the appropriate structure. If, for example, you created
a file using a standard file editor and then tried to SELECT it using the
ORGANIZATION IS INDEXED (or RELATIVE) clause within the SELECT statement,
you would not achieve the desired results. Rather, to construct an INDEXED
(or RELATIVE) file, you would create it via the execution of some COBOL
program in which the file is opened for OUTPUT and records are written to
that file.
For an example, see the program on the course web pages that creates a new
INDEXED file, populating it with the records in an already-existing text file.
End of Note.
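The recipe just described can be sketched as follows, assuming a GnuCOBOL build with INDEXED-file support; the program name BUILD-INDEXED, the files emp-in.dat and emp-idx.dat, and the short 30-character record are hypothetical. So that the sketch is self-contained, it first creates the sequential input file itself; the third input record deliberately repeats a key, so its WRITE takes the INVALID KEY branch.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BUILD-INDEXED.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT In-File ASSIGN TO "emp-in.dat"
               ORGANIZATION IS SEQUENTIAL.
           SELECT Emp-File ASSIGN TO "emp-idx.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Emp-ID
               FILE STATUS IS Emp-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  In-File.
       01  In-Rec.
           05  In-ID           PIC X(10).
           05  In-Name         PIC X(20).
       FD  Emp-File.
       01  Emp-Rec.
           05  Emp-ID          PIC X(10).
           05  Emp-Name        PIC X(20).
       WORKING-STORAGE SECTION.
       01  Emp-Status          PIC X(2).
       01  Eof-Flag            PIC X VALUE "N".
           88  In-Eof                VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      *    Set-up only: a stand-in for the already-existing input
      *    file. The third record deliberately repeats a key.
           OPEN OUTPUT In-File
           MOVE "Simpson012Homer" TO In-Rec
           WRITE In-Rec
           MOVE "Jones00001Ann" TO In-Rec
           WRITE In-Rec
           MOVE "Simpson012Bart" TO In-Rec
           WRITE In-Rec
           CLOSE In-File
      *    Load pass: OPEN OUTPUT creates the INDEXED file, and each
      *    WRITE inserts one record according to its key.
           OPEN INPUT In-File
           OPEN OUTPUT Emp-File
           PERFORM UNTIL In-Eof
               READ In-File INTO Emp-Rec
                   AT END SET In-Eof TO TRUE
                   NOT AT END PERFORM Load-Emp
               END-READ
           END-PERFORM
           CLOSE In-File Emp-File
           STOP RUN.
       Load-Emp.
      *    In-Rec still holds the record just read, so the key is
      *    displayed from there.
           WRITE Emp-Rec
               INVALID KEY
                   DISPLAY "duplicate key " In-ID " " Emp-Status
               NOT INVALID KEY
                   DISPLAY "loaded " In-ID
           END-WRITE.
```

RANDOM access is used here so that the input records need not be sorted by key.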
There are four "file open modes": INPUT, OUTPUT, EXTEND, and I-O. A COBOL
program "announces its intention" to access a file by opening it, via the
OPEN verb. When opening a file, one of these four modes must be specified,
as in
OPEN INPUT Employee-File
When the program is finished using a file (perhaps only temporarily) it
closes it via the CLOSE verb, as in
CLOSE Employee-File
A file opened in INPUT mode is one that may be accessed only via the READ
verb (plus the START verb, if the file is INDEXED or RELATIVE). A file
opened in OUTPUT mode is one that may be accessed only via the WRITE verb;
furthermore, if the file existed prior to being opened, its contents are
destroyed (so that, when execution ends, the file contains only those records
written to the file during execution of the program). A file opened in
EXTEND mode, which applies only to SEQUENTIAL files, is one that may be
accessed only via the WRITE verb; furthermore, the file must have existed
prior to being opened (unless the word OPTIONAL appeared in the
SELECT statement for that file), and any records written to it
during execution are placed after the ones already there.
A file opened in I-O mode
is one on which both reading and writing of records may be carried out,
via the READ and REWRITE verbs. If the file is INDEXED or RELATIVE, the
DELETE verb may be applied too, as may START (in SEQUENTIAL or DYNAMIC access
mode) and WRITE (in RANDOM or DYNAMIC access mode).
Note that a file may be opened more than once during execution of a program,
possibly with different open modes each time. However, a file that is
open must be closed (via the CLOSE verb) before it can be opened again.
For example, a program may open a file for OUTPUT, write records into it,
close it, open it for INPUT, and then read records from it.
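The open-mode rules above can be observed in a short sketch, assuming GnuCOBOL; the program name OPEN-MODES and the file open-modes.dat are hypothetical. The second OPEN OUTPUT discards the record written after the first one, while OPEN EXTEND keeps the existing record and appends, so the final INPUT pass displays only SECOND and THIRD.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OPEN-MODES.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Log-File ASSIGN TO "open-modes.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  Log-File.
       01  Log-Rec             PIC X(6).
       WORKING-STORAGE SECTION.
       01  Eof-Flag            PIC X VALUE "N".
           88  Log-Eof               VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      *    OUTPUT creates the file (or empties an existing one).
           OPEN OUTPUT Log-File
           MOVE "FIRST" TO Log-Rec
           WRITE Log-Rec
           CLOSE Log-File
      *    Opening for OUTPUT again destroys the record "FIRST".
           OPEN OUTPUT Log-File
           MOVE "SECOND" TO Log-Rec
           WRITE Log-Rec
           CLOSE Log-File
      *    EXTEND keeps the contents and writes after them.
           OPEN EXTEND Log-File
           MOVE "THIRD" TO Log-Rec
           WRITE Log-Rec
           CLOSE Log-File
      *    Read everything back.
           OPEN INPUT Log-File
           PERFORM UNTIL Log-Eof
               READ Log-File
                   AT END SET Log-Eof TO TRUE
                   NOT AT END DISPLAY Log-Rec
               END-READ
           END-PERFORM
           CLOSE Log-File
           STOP RUN.
```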
The OPEN and CLOSE verbs were described above (although not in full
generality---see a COBOL reference for more details).
Here we consider the remaining verbs that may be applied to a SEQUENTIAL
file: READ, WRITE, and REWRITE. Which of these three operations is
applicable to a file depends upon the mode into which the file was opened, as
summarized in the table "I/O operations on SEQUENTIAL files" at the end of
these notes.
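The syntactic format of the READ verb applied to a file declared to
be accessed in SEQUENTIAL mode (and thereby necessarily opened in INPUT or
I-O mode), together with an example, is given at the end of these notes under
"Allowed operations on a file declared to be accessed SEQUENTIAL-ly".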
Note: As mentioned above, by the file's "data record" we mean the
01-level data item declared in the data description entry immediately
following the file's file description entry (the stuff coming
after the keyword FD). In the READ example (READ Employee-File ...), the
data record is Employee-Rec.
The presence or absence of the word NEXT within this form of the
READ statement makes no difference.
Note that, in COBOL, the end-of-file condition does not become true until
an attempt is made to READ beyond the last record in the file. (When this
attempt is made, the imperative statement following the AT END clause of the
READ statement is executed.) This is in contrast to Ada and Pascal, in which
the end-of-file condition becomes true immediately after the last record has
been read. For this reason, a typical file processing loop in COBOL has a
somewhat different form than an equivalent loop in Ada or Pascal.
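Consider the Ada-like pseudocode loop (WHILE not End_of_File(f) LOOP ...
END LOOP) given at the end of these notes, after the READ example, together
with the two "equivalent" COBOL forms that follow it.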
In order to make the left-hand COBOL form a little more concise, we could
place the READ statement into a separate paragraph (call it Read-f-Rec) and
then replace each of the two occurrences of the READ statement by
PERFORM Read-f-Rec.
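The concise variant just suggested might look like this as a complete program. It is a sketch assuming GnuCOBOL; the program name READ-LOOP, the file read-loop.dat, the flag F-Eof, the counter Rec-Count, and the three set-up records are hypothetical (COBOL is case-insensitive, so Read-F-Rec is the same name as Read-f-Rec). Note the priming PERFORM Read-F-Rec before the loop and the second one at the bottom of the loop body.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READ-LOOP.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT F ASSIGN TO "read-loop.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  F.
       01  F-Rec               PIC X(8).
       WORKING-STORAGE SECTION.
       01  Eof-Flag            PIC X VALUE "N".
           88  F-Eof                 VALUE "Y".
       01  Rec-Count           PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
       Main-Para.
      *    Set-up only: create a small file to read back.
           OPEN OUTPUT F
           MOVE "REC-1" TO F-Rec
           WRITE F-Rec
           MOVE "REC-2" TO F-Rec
           WRITE F-Rec
           MOVE "REC-3" TO F-Rec
           WRITE F-Rec
           CLOSE F
      *    Priming read, then one more read at the bottom of the loop.
           OPEN INPUT F
           PERFORM Read-F-Rec
           PERFORM UNTIL F-Eof
               ADD 1 TO Rec-Count
               DISPLAY "processing " F-Rec
               PERFORM Read-F-Rec
           END-PERFORM
           CLOSE F
           DISPLAY "records processed: " Rec-Count
           STOP RUN.
       Read-F-Rec.
           READ F
               AT END SET F-Eof TO TRUE
           END-READ.
```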
Syntactic format of the WRITE verb applied to a SEQUENTIAL file (necessarily
opened in OUTPUT or EXTEND mode):
WRITE <data-record-name> [FROM data-name]
Example:
WRITE Employee-Rec FROM Temp-Empl-Rec
The effect is that the file's data record (after the specified data item
has been copied into it, if the FROM clause is present) is written at the
end of the file (i.e., after the last record in the file).
Recall that opening a file in OUTPUT mode destroys
the file's previous contents, whereas opening a file in EXTEND mode leaves
its contents intact, allowing the program to write new records after the ones
already there.
Note that the WRITE verb cannot be applied to a SEQUENTIAL file opened in
I-O mode, as this mode allows REWRITE-ing but not WRITE-ing.
Syntactic format of the REWRITE verb applied to a SEQUENTIAL file (necessarily
opened in I-O mode):
REWRITE <data-record-name> [FROM data-name]
The effect is that the file's data record (or, if the FROM clause is present,
the specified data item) is written to the file, replacing the record most
recently read from the file. An example program that uses the REWRITE verb
appears within the course web pages.
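Independently of that program, here is a minimal sketch of REWRITE on a SEQUENTIAL file opened in I-O mode, assuming GnuCOBOL; the program name REWRITE-SEQ, the file rewrite-seq.dat, and the field names are hypothetical. Each REWRITE immediately follows the successful READ of the record it replaces, and the replacement has the same length as the original.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REWRITE-SEQ.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Acct-File ASSIGN TO "rewrite-seq.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  Acct-File.
       01  Acct-Rec.
           05  Acct-Name       PIC X(5).
           05  Acct-Flag       PIC X.
       WORKING-STORAGE SECTION.
       01  Eof-Flag            PIC X.
           88  Acct-Eof              VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      *    Set-up: three records, all flagged "N".
           OPEN OUTPUT Acct-File
           MOVE "ANN  N" TO Acct-Rec
           WRITE Acct-Rec
           MOVE "BOB  N" TO Acct-Rec
           WRITE Acct-Rec
           MOVE "CARL N" TO Acct-Rec
           WRITE Acct-Rec
           CLOSE Acct-File
      *    Update pass: READ each record, REWRITE the one for BOB.
           OPEN I-O Acct-File
           MOVE "N" TO Eof-Flag
           PERFORM UNTIL Acct-Eof
               READ Acct-File
                   AT END SET Acct-Eof TO TRUE
                   NOT AT END
                       IF Acct-Name = "BOB"
                           MOVE "Y" TO Acct-Flag
                           REWRITE Acct-Rec
                       END-IF
               END-READ
           END-PERFORM
           CLOSE Acct-File
      *    Verification pass.
           OPEN INPUT Acct-File
           MOVE "N" TO Eof-Flag
           PERFORM UNTIL Acct-Eof
               READ Acct-File
                   AT END SET Acct-Eof TO TRUE
                   NOT AT END DISPLAY Acct-Name " " Acct-Flag
               END-READ
           END-PERFORM
           CLOSE Acct-File
           STOP RUN.
```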
Note that, for reasons that I have never seen explained anywhere, the READ
verb refers to the file whereas the WRITE and REWRITE verbs refer to the
file's data record.
As noted above, an INDEXED file may have any of three access modes
(SEQUENTIAL, RANDOM, or DYNAMIC) and may be opened in any of three modes
(INPUT, OUTPUT, or I-O). Which I/O operations are applicable to an INDEXED
file depends upon both its access mode and its open mode, as summarized in the
table "I/O operations on INDEXED files" at the end of these notes.
As suggested by the remarks to the right of that table, each of the
READ, WRITE, and REWRITE verbs has two forms, one for sequential access
and one for random access.
This section pertains to an INDEXED file for which, in the program under
consideration, the ACCESS MODE has been specified to be either RANDOM or
DYNAMIC.
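To read a record with a specified value in its key field from an INDEXED file
opened in either INPUT or I-O mode, move that value into the file's RECORD KEY
field and then issue the random-access form of READ; its format and an example
appear at the end of these notes under "Random Access form of READ, WRITE, and
REWRITE for INDEXED files".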
The effect is that, if a record with the specified value in the key field
exists in the file, that record is read into the file's data record (and is
then copied into the data item specified in the INTO clause, if present), and,
if present, the imperative statement in the NOT INVALID KEY clause is executed.
Otherwise, if the INVALID KEY clause is present, the imperative statement there
is executed.
To write a record into an INDEXED file opened in I-O or OUTPUT mode, use the
random-access form of WRITE (its format and an example appear at the end of
these notes).
The effect is that, if the file contains no record whose key field matches
that currently in the file's data record (or, in the case that the FROM
clause is present, that currently in the specified data item), the data
record is written, as a new record, into the file, and, if the NOT INVALID
KEY clause is present, the imperative statement there is executed. Otherwise
(i.e., there exists a record in the file whose key field equals that of the
data record), if the INVALID KEY clause is present, the imperative statement
there is executed.
To replace a record in an INDEXED file opened in I-O mode, use the
random-access form of REWRITE (its format and an example appear at the end of
these notes).
The effect is that, if there exists a record in the file having the same value
in its key field as the file's data record (or, in the case that the FROM
clause is present, the data item specified there), that record is replaced by
the contents of the data record (or the FROM data item) and the NOT INVALID KEY
clause's imperative statement is executed. Otherwise, the INVALID KEY
clause's imperative statement is executed.
The difference between REWRITE and WRITE is that the former can only replace
an existing record whereas the latter can only insert a new record.
To delete a record in an INDEXED file opened in I-O mode, use the DELETE verb
(its format and an example appear at the end of these notes).
The effect is that, if the file contains a record whose key field matches that
of the file's data record, that record is deleted from the file and the
imperative statement in the NOT INVALID KEY clause, if present, is executed.
Otherwise, the imperative statement in the INVALID KEY clause, if present, is
executed.
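Note that, in RANDOM or DYNAMIC access mode, REWRITE and DELETE do not require
a preceding READ of the record: the record affected is the one whose key
equals the value currently in the RECORD KEY field. (The requirement that the
most recent I/O operation on the file be a successful READ of the record
applies in SEQUENTIAL access mode.)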
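The random-access verbs of this section are exercised together in the following sketch, assuming a GnuCOBOL build with INDEXED-file support; the program name RANDOM-IDX, the file random-idx.dat, the two-field record, and the edited item Pay-Out are hypothetical. In RANDOM access mode the REWRITE and DELETE below are not preceded by a READ of the affected record; the value in the RECORD KEY field, Emp-ID, alone selects it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RANDOM-IDX.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Emp-File ASSIGN TO "random-idx.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Emp-ID
               FILE STATUS IS Emp-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Emp-File.
       01  Emp-Rec.
           05  Emp-ID          PIC X(10).
           05  Hourly-Pay      PIC 9(3)V99.
       WORKING-STORAGE SECTION.
       01  Emp-Status          PIC X(2).
       01  Pay-Out             PIC ZZ9.99.
       PROCEDURE DIVISION.
       Main-Para.
      *    Set-up: create the file with two records.
           OPEN OUTPUT Emp-File
           MOVE "Jones00001" TO Emp-ID
           MOVE 12.50 TO Hourly-Pay
           PERFORM Write-Emp
           MOVE "Simpson012" TO Emp-ID
           MOVE 20.00 TO Hourly-Pay
           PERFORM Write-Emp
           CLOSE Emp-File
           OPEN I-O Emp-File
      *    WRITE with a key that already exists: INVALID KEY.
           MOVE "Jones00001" TO Emp-ID
           MOVE 99.99 TO Hourly-Pay
           PERFORM Write-Emp
      *    REWRITE: the record is chosen by the key in Emp-ID.
           MOVE "Jones00001" TO Emp-ID
           MOVE 15.00 TO Hourly-Pay
           REWRITE Emp-Rec
               INVALID KEY DISPLAY "REWRITE failed " Emp-Status
               NOT INVALID KEY DISPLAY "REWRITE ok"
           END-REWRITE
      *    DELETE: likewise chosen by the key in Emp-ID.
           MOVE "Simpson012" TO Emp-ID
           DELETE Emp-File
               INVALID KEY DISPLAY "DELETE failed " Emp-Status
               NOT INVALID KEY DISPLAY "DELETE ok"
           END-DELETE
      *    Random READs: the first finds a record, the second not.
           MOVE "Jones00001" TO Emp-ID
           PERFORM Read-Emp
           MOVE "Simpson012" TO Emp-ID
           PERFORM Read-Emp
           CLOSE Emp-File
           STOP RUN.
       Write-Emp.
           WRITE Emp-Rec
               INVALID KEY DISPLAY "WRITE failed " Emp-Status
               NOT INVALID KEY DISPLAY "WRITE ok"
           END-WRITE.
       Read-Emp.
           READ Emp-File
               INVALID KEY DISPLAY "no such record " Emp-Status
               NOT INVALID KEY
                   MOVE Hourly-Pay TO Pay-Out
                   DISPLAY Emp-ID " earns " Pay-Out
           END-READ.
```

The key is moved into Emp-ID again before each operation, because the contents of the record area are not guaranteed to survive a WRITE.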
This section pertains to an INDEXED file for which, in the program under
consideration, the ACCESS MODE has been specified to be either SEQUENTIAL or
DYNAMIC.
To position the file pointer (i.e., to seek) to the first record satisfying a
specified condition in an INDEXED file opened in I-O or INPUT mode, use the
START verb; its format and an example appear at the end of these notes under
"Sequential Access form of READ, WRITE, and REWRITE for INDEXED files".
NOTE: Some compilers may require that data-name
be declared as the RECORD KEY of the file (in the SELECT statement in the
ENVIRONMENT DIVISION). Some compilers require the INVALID KEY clause
to be present.
The effect is to position the file pointer at the first record (i.e., the one
having the smallest key value) satisfying the specified condition, so that the
next sequential READ reads that record. If no record satisfies the specified
condition (e.g., the key value sought is larger than any in the file), the
imperative statement in the INVALID KEY clause, if present, is executed.
Otherwise, the imperative statement in the NOT INVALID KEY clause, if present,
is executed.
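A complete version of the START-then-READ-NEXT pattern (a range query) might look like this. It is a sketch assuming a GnuCOBOL build with INDEXED-file support; the program name RANGE-QUERY, the file range-query.dat, and the five employee records are hypothetical. Because the access mode is SEQUENTIAL, the set-up writes the records in ascending key order, and the upper bound is tested on each record just read, so a record beyond the range is never processed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RANGE-QUERY.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Employee-File ASSIGN TO "range-query.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS SEQUENTIAL
               RECORD KEY IS Employee-ID
               FILE STATUS IS Emp-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Employee-File.
       01  Employee-Rec.
           05  Employee-ID     PIC X(10).
           05  First-Name      PIC X(10).
       WORKING-STORAGE SECTION.
       01  Emp-Status          PIC X(2).
       01  Done-Flag           PIC X VALUE "N".
           88  Finished              VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      *    Set-up: in SEQUENTIAL access mode, records must be
      *    written in ascending key order.
           OPEN OUTPUT Employee-File
           MOVE "Adams00007Amy" TO Employee-Rec
           PERFORM Write-Emp
           MOVE "Jones00001Bob" TO Employee-Rec
           PERFORM Write-Emp
           MOVE "Simpson012Cal" TO Employee-Rec
           PERFORM Write-Emp
           MOVE "Smith00003Dee" TO Employee-Rec
           PERFORM Write-Emp
           MOVE "Young00002Eve" TO Employee-Rec
           PERFORM Write-Emp
           CLOSE Employee-File
      *    Range query: START at the low key, then READ NEXT until
      *    a key beyond the high end of the range turns up.
           OPEN INPUT Employee-File
           MOVE "Jones00001" TO Employee-ID
           START Employee-File KEY IS NOT < Employee-ID
               INVALID KEY SET Finished TO TRUE
           END-START
           PERFORM UNTIL Finished
               READ Employee-File NEXT RECORD
                   AT END SET Finished TO TRUE
                   NOT AT END
                       IF Employee-ID > "Smith99999"
                           SET Finished TO TRUE
                       ELSE
                           DISPLAY Employee-ID " " First-Name
                       END-IF
               END-READ
           END-PERFORM
           CLOSE Employee-File
           STOP RUN.
       Write-Emp.
           WRITE Employee-Rec
               INVALID KEY DISPLAY "WRITE failed " Emp-Status
           END-WRITE.
```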
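To read "the next" record (i.e., the one following the record most recently
read, or the one "found" by an application of the START verb) in an INDEXED
file opened in either INPUT or I-O mode, use READ with the NEXT RECORD phrase;
its format and an example (a range query combining START and READ NEXT) appear
at the end of these notes.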
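To replace the record most recently read from an INDEXED file opened in I-O
mode, use the sequential-access form of REWRITE given at the end of these
notes. In SEQUENTIAL access mode, the REWRITE must be preceded by a successful
READ of that record, and its key field must not have been changed in the
meantime (otherwise the INVALID KEY condition occurs).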
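To write new records into an INDEXED file whose access mode is SEQUENTIAL,
open the file in OUTPUT mode and write the records in increasing order of key
value; a record that is out of sequence causes the INVALID KEY condition (file
status "21"). The format is given at the end of these notes. (In DYNAMIC
access mode, WRITE places each record according to its key, as in RANDOM mode,
so no particular order is required.)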
A syntax error occurs if you attempt to open an INDEXED file in EXTEND mode.
When a file whose SELECT statement specifies DYNAMIC access mode is
opened in OUTPUT or I-O mode:
Omitted for the moment. See Chapter 21 of Comprehensive COBOL.
File Organization and Access Modes
SELECT [OPTIONAL] <file-name-in-program>
ASSIGN TO <file-name-on-computer-system>
[ORGANIZATION IS SEQUENTIAL]
[ACCESS MODE IS SEQUENTIAL]
[FILE STATUS IS <data-name>]
SELECT [OPTIONAL] <file-name-in-program>
ASSIGN TO <file-name-on-computer-system>
ORGANIZATION IS INDEXED
[ACCESS MODE IS {SEQUENTIAL, RANDOM, DYNAMIC}]
RECORD KEY IS <data-name>
[FILE STATUS IS <data-name>]
Note on notation:
A list of items in curly braces indicates that exactly one of them is
to be chosen.
SELECT Employee-File
ASSIGN TO "Employees.dat"
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS Employee-ID.
SELECT [OPTIONAL] <file-name-in-program>
ASSIGN TO <file-name-on-computer-system>
ORGANIZATION IS RELATIVE
ACCESS MODE IS SEQUENTIAL
[RELATIVE KEY IS <data-name>]
[FILE STATUS IS <data-name>]
or
SELECT [OPTIONAL] <file-name-in-program>
ASSIGN TO <file-name-on-computer-system>
ORGANIZATION IS RELATIVE
ACCESS MODE IS {RANDOM, DYNAMIC}
RELATIVE KEY IS <data-name>
[FILE STATUS IS <data-name>]
File Open Modes and I/O Operations/Verbs
I/O operations on SEQUENTIAL files
           +--------------------------------------+
           |                M o d e               |
           +---------+---------+----------+-------+
Operation  |  INPUT  | OUTPUT  |  EXTEND  |  I-O  |
           +---------+---------+----------+-------+
READ       |    x    |         |          |   x   |
           +---------+---------+----------+-------+
WRITE      |         |    x    |    x     |       |
           +---------+---------+----------+-------+
REWRITE    |         |         |          |   x   |
           +---------+---------+----------+-------+
Allowed operations on a file declared to be accessed SEQUENTIAL-ly
READ <file-name> [NEXT] [INTO data-name]
AT END <imperative statement>
[NOT AT END <imperative statement>]
END-READ
READ Employee-File
AT END SET Employee-Eof TO TRUE
NOT AT END PERFORM Process-Employee
END-READ
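The effect of this command is to read the next record of Employee-File into
its data record, Employee-Rec. If there is no next record, the condition
Employee-Eof is set to TRUE; otherwise, paragraph Process-Employee is performed.
The Ada-like pseudocode referred to in the discussion of the end-of-file
condition: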
WHILE not End_of_File(f) LOOP
    Get(f, rec);   -- read next record of file f into rec
    <code to process rec>
END LOOP;
The "equivalent" code segment in COBOL would be written in either of these
two forms (given in COBOL-like pseudocode):
SET eof TO FALSE                  | SET eof TO FALSE
READ f                            | PERFORM UNTIL eof
    AT END SET eof TO TRUE        |     READ f
END-READ                          |         AT END SET eof TO TRUE
PERFORM UNTIL eof                 |         NOT AT END <code to process rec>
    <code to process rec>         |     END-READ
    READ f                        | END-PERFORM
        AT END SET eof TO TRUE    |
    END-READ                      |
END-PERFORM                       |
I/O operations on INDEXED files
            +-------------------------------------+
            |         |     O p e n   M o d e     |
File Access |  Verb   +---------+---------+-------+
Mode        |         |  INPUT  | OUTPUT  |  I-O  |
            +---------+---------+---------+-------+
SEQUENTIAL  | READ    |    x    |         |   x   |  (sequential form only)
            | WRITE   |         |    x    |       |  (sequential form only)
            | REWRITE |         |         |   x   |  (sequential form only)
            | DELETE  |         |         |   x   |
            | START   |    x    |         |   x   |  (surprising!)
            +---------+---------+---------+-------+
RANDOM      | READ    |    x    |         |   x   |  (random form only)
            | WRITE   |         |    x    |   x   |  (random form only)
            | REWRITE |         |         |   x   |  (random form only)
            | DELETE  |         |         |   x   |
            | START   |         |         |       |
            +---------+---------+---------+-------+
DYNAMIC     | READ    |    x    |         |   x   |  (either form)
            | WRITE   |         |    x    |   x   |  (either form)
            | REWRITE |         |         |   x   |  (either form)
            | DELETE  |         |         |   x   |
            | START   |    x    |         |   x   |
            +---------+---------+---------+-------+
Random Access form of READ, WRITE, and REWRITE for INDEXED files
READ <file-name> [INTO data-name]
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-READ
For example (with Course-ID the RECORD KEY of Course-File),
DISPLAY 'Enter course ID:' WITH NO ADVANCING
ACCEPT Course-ID
READ Course-File
INVALID KEY DISPLAY 'No such record'
NOT INVALID KEY PERFORM Display-Course-Rec
END-READ
WRITE <data-record> [FROM data-name]
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-WRITE
WRITE Employee-Rec FROM Temp-Empl-Rec
INVALID KEY DISPLAY 'Cannot WRITE; record with same key exists'
NOT INVALID KEY DISPLAY 'WRITE is successful'
END-WRITE
REWRITE <data-record-name> [FROM data-name]
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-REWRITE
REWRITE Employee-Rec
INVALID KEY DISPLAY 'Cannot REWRITE; no record with that key exists'
NOT INVALID KEY DISPLAY 'Record rewritten successfully'
END-REWRITE
DELETE <file-name>
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-DELETE
DISPLAY 'Enter Course ID of course to be cancelled:'
ACCEPT Course-ID
DELETE Course-File
INVALID KEY CONTINUE
NOT INVALID KEY DISPLAY 'Record deleted successfully'
END-DELETE
Sequential Access form of READ, WRITE, and REWRITE for INDEXED files
START <file-name> KEY IS { =, >, NOT <, >= } <data-name>
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-START
MOVE 'Jones00001' TO Employee-ID
START Employee-File KEY IS NOT < Employee-ID
INVALID KEY DISPLAY 'something wrong'
NOT INVALID KEY CONTINUE
END-START
READ <file-name> NEXT RECORD [INTO data-name]
[AT END <imperative statement>]
[NOT AT END <imperative statement>]
END-READ
MOVE 'Jones00001' TO Employee-ID
START Employee-File KEY IS NOT < Employee-ID
    INVALID KEY DISPLAY '*** Error ***'
    NOT INVALID KEY
        PERFORM UNTIL Finished
            READ Employee-File NEXT RECORD
                AT END SET Finished TO TRUE
                NOT AT END
                    IF Employee-ID > 'Smith99999'
                        SET Finished TO TRUE
                    ELSE
                        PERFORM Process-Empl-Rec
                    END-IF
            END-READ
        END-PERFORM
END-START
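REWRITE <data-record-name> [FROM data-name]
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-REWRITE
WRITE <data-record-name> [FROM data-name]
[INVALID KEY <imperative statement>]
[NOT INVALID KEY <imperative statement>]
END-WRITE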
Random observations based on program testing
Random Access form of READ, WRITE, and REWRITE for RELATIVE files