COBOL

Using Files

File Descriptors

The DATA DIVISION of a COBOL (sub)program typically contains two sections, the FILE SECTION and the WORKING-STORAGE SECTION. The latter is used to describe, via "data description entries" (level numbers, PICTURE clauses, etc.), the hierarchical structure of data items that exist during execution of the program. The former is used to describe, in a similar way, the layout of records in any files that the program uses. For each file that the program uses, the FILE SECTION contains a "file description entry", the beginning of which is signaled by the keyword FD. The typical form of such an entry (the general form includes a number of optional clauses not shown here) is as follows:

      FD <file-name>
          [RECORD CONTAINS <integer-literal> CHARACTERS]
          [DATA RECORD IS <data-record-name>].  

Note on notation: Square brackets surrounding an entity indicate that its appearance is optional.

Immediately after the file description entry comes the "data description entry" for the file's data record (beginning with the level number 01). Here is a typical example:

    FD Employee-File
         RECORD CONTAINS 65 CHARACTERS
         DATA RECORD IS Employee-Rec.

    01 Employee-Rec.
       02 Employee-ID        PIC X(10).
       02 Employee-Name.
          03 Last-Name       PIC X(20).
          03 First-Name      PIC X(12).
          03 Middle-Init     PIC X.
       02 Job-Position.
          03 Job-Code        PIC X(4).
          03 Department      PIC X(3).
          03 Manager-ID      PIC X(10).
       02 Hourly-Pay         PIC 9(3)V99.  

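The above says that Employee-File is a file in which each record has a length of 65 characters, with the first ten containing an employee ID, the next twenty containing the employee's last name, and so on. (The last field, Hourly-Pay, occupies five characters, as the V in its PICTURE clause marks an implied decimal point that takes up no storage.)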

When a COBOL program executes, enough main memory is allocated to hold not only the data items described in the WORKING-STORAGE SECTION but also those described in the FILE SECTION (i.e., one data record from each file). Thus, one could view the data record of a file as being a one-record buffer for that file. When a record is retrieved from a file (via the READ verb), it is placed into the file's data record. Similarly, when a record is written to a file (via the WRITE verb), it is the contents of the file's data record that are written into the file.

Note: From class discussion, you should recall that there are also file buffers that are not directly accessible by the application programmer. An input buffer holds (typically) several records that already have been read in (physically) but are waiting to be read in logically (via READ) by the COBOL program. An output buffer holds (typically) several records that already have been written logically (via WRITE) by the COBOL program but are waiting to be written (physically) into a file.

File Organization and Access Modes

COBOL directly supports

  1. three file organizations: SEQUENTIAL, INDEXED, and RELATIVE
  2. three file access modes: SEQUENTIAL, RANDOM, and DYNAMIC
  3. four file open modes: INPUT, OUTPUT, EXTEND, I-O
  4. seven I/O operations: OPEN, CLOSE, READ, WRITE, REWRITE, DELETE, START

A file's organization (i.e., the way it is structured) imposes restrictions upon how it can be accessed (i.e., upon which access modes are applicable to it).

A file whose organization is SEQUENTIAL (which is the default) allows only the SEQUENTIAL access mode, which means that its records may be accessed (i.e., read or written) only in logical order, one after another. (This restriction makes sense, as such a file has no index (or any other auxiliary fast-search-enabling structure) associated with it to allow for efficient access to arbitrary records.)

An INDEXED file is one for which an index exists, thereby making it possible to locate a record quickly, given the value of its key field (i.e., the indexing field). A RELATIVE file is one that allows access by relative record number (RRN).

A file whose organization is INDEXED or RELATIVE allows any of the three access modes to be applied to it: SEQUENTIAL, RANDOM, or DYNAMIC. The notion of SEQUENTIAL access, as it applies to INDEXED and RELATIVE files, is the same as with SEQUENTIAL files: records are accessed in their logical order. In INDEXED files, the logical order of records corresponds to increasing order of key field value. (For example, if Employee-ID were the key field of the Employee-File described above, then the record containing 'Jones00001' in that field would occur before the record containing 'Simpson012', as the former value is less than the latter according to COBOL's rules for ordering character strings.) In RELATIVE files, the logical order of records corresponds to their RRNs, with record i coming before record j if and only if i < j.

As for RANDOM access, in the case of an INDEXED file it means access according to the value stored in the field that is specified as the key of the file (in the SELECT statement for the file). (Such a file has an index for which its key field is the indexing field.) For example, if Employee-File (described above) has as its key the field Employee-ID (an alphanumeric string of length ten), we have the ability to READ or WRITE a record whose Employee-ID field contains a specified value, such as 'Simpson032'.

In the case of a RELATIVE file, RANDOM access means access according to the logical position of a record within the file. A position is given in terms of a relative record number (RRN), which is simply a positive integer. For example, we can issue a command to READ or WRITE the record in position 327.

DYNAMIC access mode is a combination of both SEQUENTIAL and RANDOM access. That is, if a program is to access records from some file both sequentially and randomly (e.g., the former in performing a range query and the latter in performing a single-record fetch), DYNAMIC access mode is appropriate.

The organization of a file and the access mode to be used on that file by a particular COBOL program are specified in a SELECT statement appearing in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION. The form taken by the SELECT statement depends upon the file's organization. (Note: In order to keep things simple, we do not describe the SELECT statement in all its generality.) For a sequential file, it looks like this:

   SELECT [OPTIONAL] <file-name-in-program>
      ASSIGN TO <file-name-on-computer-system>
      [ORGANIZATION IS SEQUENTIAL]
      [ACCESS MODE IS SEQUENTIAL]
      [FILE STATUS IS <data-name>]  

The default organization is SEQUENTIAL, so that if we omit the ORGANIZATION clause, COBOL will interpret this to mean that the file is SEQUENTIAL. The presence of the optional keyword OPTIONAL indicates that the file may or may not already exist (when the program begins execution). (OPTIONAL files may be opened in any mode except OUTPUT.)

The data item specified in the FILE STATUS clause should be one defined with a PIC X(2) picture clause. Each time an I/O operation is performed on the file, a two-digit code, called the file status code, is placed into this data item. The file status code indicates whether the operation completed successfully (value "00") or whether something "unusual" occurred (e.g., value "41" indicates an attempt to OPEN a file that was already open, and "10" indicates the end-of-file condition). For more details, see page 301 of Comprehensive COBOL.
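
As a rough illustration of the FILE STATUS clause, here is a minimal sketch of a complete program that tests the status code after an OPEN. It assumes a compiler that accepts fixed-format source (GnuCOBOL, for instance); the names Demo-File, Demo-Rec, Demo-Status, and Status-OK, as well as the file name "no-such-file.dat", are hypothetical and chosen only for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. StatusDemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Hypothetical file name, assumed not to exist.
           SELECT Demo-File
               ASSIGN TO "no-such-file.dat"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS Demo-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Demo-File.
       01  Demo-Rec              PIC X(20).
       WORKING-STORAGE SECTION.
       01  Demo-Status           PIC X(2).
           88  Status-OK         VALUE "00".
       PROCEDURE DIVISION.
       Main-Para.
           OPEN INPUT Demo-File
           IF Status-OK
               DISPLAY "File opened successfully"
               CLOSE Demo-File
           ELSE
      * A missing non-OPTIONAL file usually yields "35".
               DISPLAY "OPEN failed, file status = " Demo-Status
           END-IF
           STOP RUN.
```

Attaching a condition name (level 88) to the status item keeps the tests readable. Because a FILE STATUS item is declared, a failed OPEN should leave the program running, so that it can report the code itself.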

The form taken by the SELECT statement when the file has INDEXED organization is this:

   SELECT [OPTIONAL] <file-name-in-program>
      ASSIGN TO <file-name-on-computer-system>
      ORGANIZATION IS INDEXED 
      [ACCESS MODE IS {SEQUENTIAL, RANDOM, DYNAMIC}]
      RECORD KEY IS  <data-name>
      [FILE STATUS IS <data-name>]

Note on notation: A list of items in curly braces indicates that exactly one of them is to be chosen.

For example, the SELECT statement for the Employee file mentioned above might look like this:

   SELECT Employee-File
      ASSIGN TO "Employees.dat"
      ORGANIZATION IS INDEXED 
      ACCESS MODE IS RANDOM
      RECORD KEY IS Employee-ID.

The data-name specified in the RECORD KEY clause must be one of the fields within the file's data record. If the file already exists, it must have been created with an index on that field; if the file does not yet exist, that field becomes its indexing field when the file is created.

The form taken by the SELECT statement when the file has RELATIVE organization is one of these two:

   SELECT [OPTIONAL] <file-name-in-program>
      ASSIGN TO <file-name-on-computer-system>
      ORGANIZATION IS RELATIVE
      ACCESS MODE IS SEQUENTIAL
      [RELATIVE KEY IS <data-name>] 
      [FILE STATUS IS <data-name>]

or

   SELECT [OPTIONAL] <file-name-in-program>
      ASSIGN TO <file-name-on-computer-system>
      ORGANIZATION IS RELATIVE
      ACCESS MODE IS {RANDOM, DYNAMIC}
      RELATIVE KEY IS <data-name>
      [FILE STATUS IS <data-name>]

That is, for a RELATIVE file, if SEQUENTIAL access mode is chosen, specifying its RELATIVE KEY is optional (and seemingly useless!), but specifying the RELATIVE KEY is mandatory if the access mode is RANDOM or DYNAMIC. Whenever random access is made to a RELATIVE file, the contents of the field that was identified as its RELATIVE KEY are taken to be the RRN of the record to be accessed.
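
The following is a minimal sketch of random access to a RELATIVE file, assuming a fixed-format compiler such as GnuCOBOL. All names (Slot-File, Slot-Rec, Slot-RRN, Slot-Status, "slots.dat") are hypothetical. Note that the RELATIVE KEY item is declared in WORKING-STORAGE: the COBOL standard requires it to be an unsigned integer item that is not part of the file's own record description.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RelDemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Slot-File
               ASSIGN TO "slots.dat"
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS Slot-RRN
               FILE STATUS IS Slot-Status.
       DATA DIVISION.
       FILE SECTION.
       FD  Slot-File.
       01  Slot-Rec              PIC X(20).
       WORKING-STORAGE SECTION.
      * The RELATIVE KEY item lives outside the file's record.
       01  Slot-RRN              PIC 9(4).
       01  Slot-Status           PIC X(2).
       PROCEDURE DIVISION.
       Main-Para.
      * Create the file and store one record at position 3.
           OPEN OUTPUT Slot-File
           MOVE 3 TO Slot-RRN
           MOVE "third slot" TO Slot-Rec
           WRITE Slot-Rec
               INVALID KEY DISPLAY "WRITE failed: " Slot-Status
           END-WRITE
           CLOSE Slot-File
      * Reopen the file and fetch the record by its RRN.
           OPEN INPUT Slot-File
           MOVE 3 TO Slot-RRN
           READ Slot-File
               INVALID KEY DISPLAY "No record at RRN " Slot-RRN
               NOT INVALID KEY DISPLAY "RRN 3 holds: " Slot-Rec
           END-READ
           CLOSE Slot-File
           STOP RUN.
```

Positions 1 and 2 are left empty here; a random READ aimed at either of them should take the INVALID KEY branch.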

Note: Simply including the clause ORGANIZATION IS INDEXED (or RELATIVE), when SELECT-ing a file, does not magically transform the specified file into one having the appropriate structure. If, for example, you created a file using a standard file editor and then tried to SELECT it using the ORGANIZATION IS INDEXED (or RELATIVE) clause within the SELECT statement, you would not achieve the desired results. Rather, to construct an INDEXED (or RELATIVE) file, you would create it via the execution of some COBOL program in which the file is opened for OUTPUT and records are written to that file. For an example, see this program, which creates a new INDEXED file, populating it with the records in an already-existing text file. End of Note.
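
To make the preceding note concrete, here is a sketch (not the course's own program) of how an INDEXED file might be built from an existing text file. It assumes a compiler that supports the LINE SEQUENTIAL organization, which is a common extension rather than standard COBOL, and whose run-time system includes indexed-file support. The file names "courses.txt" and "courses.idx" and the record layout are hypothetical, and the text file is assumed to exist already.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BuildIdx.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * LINE SEQUENTIAL is an extension, not standard COBOL.
           SELECT Text-File
               ASSIGN TO "courses.txt"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT Course-File
               ASSIGN TO "courses.idx"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Course-ID.
       DATA DIVISION.
       FILE SECTION.
       FD  Text-File.
       01  Text-Rec              PIC X(38).
       FD  Course-File.
       01  Course-Rec.
           02  Course-ID         PIC X(8).
           02  Course-Title      PIC X(30).
       WORKING-STORAGE SECTION.
       01  Text-Eof-Flag         PIC X VALUE "N".
           88  Text-Eof          VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
           OPEN INPUT Text-File
           OPEN OUTPUT Course-File
           PERFORM UNTIL Text-Eof
               READ Text-File
                   AT END SET Text-Eof TO TRUE
                   NOT AT END PERFORM Copy-Rec
               END-READ
           END-PERFORM
           CLOSE Text-File
           CLOSE Course-File
           STOP RUN.
       Copy-Rec.
      * Each text line is assumed to hold an 8-character ID
      * followed by a title of up to 30 characters.
           WRITE Course-Rec FROM Text-Rec
               INVALID KEY
                   DISPLAY "Duplicate key skipped: " Course-ID
           END-WRITE.
```

RANDOM access is chosen so that the lines of the text file need not be sorted by key; with SEQUENTIAL access the records would have to arrive in increasing key order.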

File Open Modes and I/O Operations/Verbs

There are four "file open modes": INPUT, OUTPUT, EXTEND, and I-O. A COBOL program "announces its intention" to access a file by opening it, via the OPEN verb. When opening a file, one of these four modes must be specified, as in

OPEN INPUT Course-File

When the program is finished using a file (perhaps only temporarily) it closes it via the CLOSE verb, as in

CLOSE Course-File

A file opened in INPUT mode is one that may be accessed only via the READ verb (plus the START verb, if the file is INDEXED or RELATIVE). A file opened in OUTPUT mode is one that may be accessed only via the WRITE verb; furthermore, if the file existed prior to being opened, its contents are destroyed (so that, when execution ends, the file contains only those records written to the file during execution of the program). A file opened in EXTEND mode, which applies only to SEQUENTIAL files, is one that may be accessed only via the WRITE verb; furthermore, the file must have existed prior to being opened (unless the word OPTIONAL appeared in the SELECT statement for that file), and any records written to it during execution are placed after the ones already there. A file opened in I-O mode is one on which both reading and writing of records may be carried out, via the READ and REWRITE verbs. (The WRITE, DELETE, and START verbs may be applied, too, if the file is INDEXED or RELATIVE.)

Note that a file may be opened more than once during execution of a program, possibly with different open modes each time. However, a file that is open must be closed (via the CLOSE verb) before it can be opened again. For example, a program may open a file for OUTPUT, write records into it, close it, open it for INPUT, and then read records from it.

I/O operations on SEQUENTIAL files

The OPEN and CLOSE verbs were described above (although not in full generality---see a COBOL reference for more details).

Here we consider the remaining verbs that may be applied to a SEQUENTIAL file: READ, WRITE, and REWRITE. Which of these three operations are applicable to a file depends upon the mode into which the file was opened:

          +--------------------------------------+
          |               M o d e                |
Operation |                                      |
          |  INPUT     OUTPUT    EXTEND     I-O  |
          +---------+---------+----------+-------+
READ      |    x    |         |          |   x   |
          +---------+---------+----------+-------+
WRITE     |         |    x    |    x     |       |
          +---------+---------+----------+-------+
REWRITE   |         |         |          |   x   |
          +---------+---------+----------+-------+

Allowed operations on a file declared to be accessed SEQUENTIAL-ly

Syntactic format of the READ verb applied to a file declared to be accessed in SEQUENTIAL mode (and thereby necessarily opened in INPUT or I-O mode):

      READ <file-name> [NEXT] [INTO <data-name>]
         AT END <imperative statement>
         [NOT AT END <imperative statement>]
      END-READ

Example:

     READ Employee-File
        AT END SET Employee-Eof TO TRUE
        NOT AT END PERFORM Process-Employee
     END-READ

The effect of this command is as follows:

  1. If the end-of-file condition is off, then:
    1. If there is no "next" record (e.g., because the file is empty or its last record was previously read in), the end-of-file condition is turned on and the imperative statement following the AT END clause is executed.
    2. If there is a "next" record, it is read into the file's data record (and then copied into the data item specified in the INTO clause, if it is present). Also, if the NOT AT END clause is present, the imperative statement there is executed.
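  2. If the end-of-file condition is on (e.g., because a previous attempt at READing turned it on), the READ fails (file status "46") and, unless the program makes provision for handling the error, execution typically aborts.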

Note: As mentioned above, by the file's "data record" we mean the 01-level data item declared in the data description entry immediately following the file's file description entry (the stuff coming after the keyword FD). In the example above, the data record is Employee-Rec.

The presence or absence of the word NEXT within this form of the READ statement makes no difference.

Note that, in COBOL, the end-of-file condition does not become true until an attempt is made to READ beyond the last record in the file. (When this attempt is made, the imperative statement following the AT END clause of the READ statement is executed.) This is in contrast to Ada and Pascal, in which the end-of-file condition becomes true immediately after the last record has been read. For this reason, a typical file processing loop in COBOL has a somewhat different form than an equivalent loop in Ada or Pascal. Consider this Ada-like pseudocode:

                 WHILE not End_of_File(f) LOOP
                    Get(f, rec);  --read next record of file f into rec
                    <code to process rec>
                 END LOOP;

The "equivalent" code segment in COBOL would be written in either of these two forms (given in COBOL-like pseudocode):

   SET eof TO FALSE               |      SET eof TO FALSE
   READ f                         |      PERFORM UNTIL eof
      AT END SET eof TO TRUE      |         READ f
   END-READ                       |            AT END     SET eof TO TRUE
   PERFORM UNTIL eof              |            NOT AT END <code to process rec>
      <code to process rec>       |         END-READ
      READ f                      |      END-PERFORM
         AT END SET eof TO TRUE   |
      END-READ                    |
   END-PERFORM                    |

In order to make the program on the left a little more concise, we could place the READ statement into a separate paragraph ---call it Read-f-Rec--- and then replace each of the two occurrences of the READ statement by PERFORM Read-f-Rec.
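
As a sketch of the left-hand form with the READ moved into its own paragraph, consider the complete program below. It assumes a fixed-format compiler such as GnuCOBOL; the names Name-File, Name-Rec, Name-Eof, and Read-Name-Rec and the file name "names.dat" are hypothetical. So that it can run on its own, the program first writes two records and then reads them back.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ReadLoop.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Name-File
               ASSIGN TO "names.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  Name-File.
       01  Name-Rec              PIC X(12).
       WORKING-STORAGE SECTION.
       01  Eof-Flag              PIC X VALUE "N".
           88  Name-Eof          VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      * Build a small file first, so the sketch is self-contained.
           OPEN OUTPUT Name-File
           MOVE "Jones" TO Name-Rec
           WRITE Name-Rec
           MOVE "Simpson" TO Name-Rec
           WRITE Name-Rec
           CLOSE Name-File
      * A priming READ, then one more READ after each record
      * has been processed.
           OPEN INPUT Name-File
           PERFORM Read-Name-Rec
           PERFORM UNTIL Name-Eof
               DISPLAY "Processing " Name-Rec
               PERFORM Read-Name-Rec
           END-PERFORM
           CLOSE Name-File
           STOP RUN.
       Read-Name-Rec.
           READ Name-File
               AT END SET Name-Eof TO TRUE
           END-READ.
```

The end-of-file flag is an ordinary one-character item with a level-88 condition name, initialized to "N" by its VALUE clause, which avoids any need for SET ... TO FALSE.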

Syntactic format of the WRITE verb applied to a SEQUENTIAL file (necessarily opened in OUTPUT or EXTEND mode):

WRITE <data-record-name> [FROM <data-name>]

Example: WRITE Employee-Rec FROM Temp-Empl-Rec

The effect is that the file's data record (after the specified data item has been copied into it, if the FROM clause is present) is written at the end of the file (i.e., after the last record in the file). Recall that opening a file in OUTPUT mode destroys the file's previous contents, whereas opening a file in EXTEND mode leaves its contents intact, allowing the program to write new records after the ones already there.
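
A minimal sketch of EXTEND mode follows, assuming a fixed-format compiler such as GnuCOBOL; the names Log-File and Log-Rec and the file name "run-log.dat" are hypothetical. Because the file is declared OPTIONAL, the OPEN should succeed even on the first run, when the file does not yet exist.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. AppendLog.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT OPTIONAL Log-File
               ASSIGN TO "run-log.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  Log-File.
       01  Log-Rec               PIC X(30).
       PROCEDURE DIVISION.
       Main-Para.
      * Records already in the file are kept; the new record
      * is placed after them.
           OPEN EXTEND Log-File
           MOVE "program executed" TO Log-Rec
           WRITE Log-Rec
           CLOSE Log-File
           STOP RUN.
```

Each execution should lengthen the file by one 30-character record. Changing EXTEND to OUTPUT would instead leave exactly one record in the file, no matter how many times the program had been run.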

Note that the WRITE verb cannot be applied to a SEQUENTIAL file opened in I-O mode, as this mode allows REWRITE-ing but not WRITE-ing.

Syntactic format of REWRITE verb applied to a SEQUENTIAL file (necessarily opened in I-O mode):

REWRITE <data-record-name> [FROM <data-name>]

The effect is that the file's data record (or, if the FROM clause is present, the specified data item) is written to the file, replacing the record most recently read from the file. An example program that uses the REWRITE verb appears within the course web pages.
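
The sketch below (separate from the example on the course web pages) shows one way that READ and REWRITE might be combined to update every record of a SEQUENTIAL file in place. It assumes a fixed-format compiler such as GnuCOBOL; the names Pay-File, Pay-Rec, Pay-ID, Pay-Rate, Rate-Out, and Raise-Rate and the file name "pay.dat" are hypothetical. The program creates a two-record file first so that it can run on its own.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RaisePay.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Pay-File
               ASSIGN TO "pay.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  Pay-File.
       01  Pay-Rec.
           02  Pay-ID            PIC X(10).
           02  Pay-Rate          PIC 9(3)V99.
       WORKING-STORAGE SECTION.
       01  Eof-Flag              PIC X VALUE "N".
           88  Pay-Eof           VALUE "Y".
       01  Rate-Out              PIC ZZ9.99.
       PROCEDURE DIVISION.
       Main-Para.
      * Create a small file to work on.
           OPEN OUTPUT Pay-File
           MOVE "Jones00001" TO Pay-ID
           MOVE 20.00 TO Pay-Rate
           WRITE Pay-Rec
           MOVE "Simpson012" TO Pay-ID
           MOVE 25.50 TO Pay-Rate
           WRITE Pay-Rec
           CLOSE Pay-File
      * Reopen in I-O mode and update each record in turn.
           OPEN I-O Pay-File
           PERFORM UNTIL Pay-Eof
               READ Pay-File
                   AT END SET Pay-Eof TO TRUE
                   NOT AT END PERFORM Raise-Rate
               END-READ
           END-PERFORM
           CLOSE Pay-File
           STOP RUN.
       Raise-Rate.
      * Change the record just read, then write it back in place.
           ADD 1.00 TO Pay-Rate
           MOVE Pay-Rate TO Rate-Out
           DISPLAY Pay-ID " now earns " Rate-Out
           REWRITE Pay-Rec.
```

The DISPLAY is placed before the REWRITE because the standard does not guarantee that the contents of the data record remain available after the record has been released to the file.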

Note that, for reasons that I have never seen explained anywhere, the READ verb refers to the file whereas the WRITE and REWRITE verbs refer to the file's data record.

I/O operations on INDEXED files

As noted above, an INDEXED file may have any of three access modes (SEQUENTIAL, RANDOM, or DYNAMIC) and may be opened in any of three modes (INPUT, OUTPUT, or I-O). Which I/O operations are applicable to an INDEXED file depends upon both its access mode and its open mode:

           +-------------------------------------+
File       |         |     O p e n   M o d e     |
Access     |         |                           |
Mode       |   Verb  |  INPUT     OUTPUT    I-O  |
           +---------+---------+---------+-------+
SEQUENTIAL |    READ |    x    |         |   x   |   (sequential form only)
           |   WRITE |         |    x    |       |   (sequential form only)   
           | REWRITE |         |         |   x   |   (sequential form only)
           |  DELETE |         |         |   x   |
           |   START |    x    |         |   x   |   (surprising!)
           +---------+---------+---------+-------+
RANDOM     |    READ |    x    |         |   x   |   (random form only)
           |   WRITE |         |    x    |   x   |   (random form only)
           | REWRITE |         |         |   x   |   (random form only)
           |  DELETE |         |         |   x   |
           |   START |         |         |       |
           +---------+---------+---------+-------+
DYNAMIC    |    READ |    x    |         |   x   |   (either form)
           |   WRITE |         |    x    |   x   |   (either form)
           | REWRITE |         |         |   x   |   (either form)
           |  DELETE |         |         |   x   |
           |   START |    x    |         |   x   |
           +---------+---------+---------+-------+

As suggested in the remarks to the right of the table above, each of the READ, WRITE, and REWRITE verbs has two forms, one for sequential access and one for random access.

Random Access form of READ, WRITE, and REWRITE for INDEXED files

This section pertains to an INDEXED file for which, in the program under consideration, the ACCESS MODE has been specified to be either RANDOM or DYNAMIC.

To read a record ---with a specified value in its key field--- from an INDEXED file opened in either INPUT or I-O mode:

  1. Place desired value into the key field (in the file's data record)
  2. READ <file-name> [INTO data-name]
       [INVALID KEY      <imperative statement>]
       [NOT INVALID KEY  <imperative statement>]
    END-READ

For example,

     DISPLAY 'Enter course ID:' WITH NO ADVANCING
     ACCEPT Course-ID
     READ Course-File
        INVALID KEY DISPLAY 'No such record'
        NOT INVALID KEY PERFORM Display-Course-Rec
     END-READ 

The effect is that, if a record with the specified value in the key field exists in the file, that record is read into the file's data record (and is then copied into the data item specified in the INTO clause, if present), and, if present, the imperative statement in the NOT INVALID KEY clause is executed. Otherwise, if the INVALID KEY clause is present, the imperative statement there is executed.

To write a record into an INDEXED file opened in I-O or OUTPUT mode:

  1. Place desired contents into file's data record (or the data item specified in the FROM clause).
  2. WRITE <data-record>  [FROM data-name]
       [INVALID KEY      <imperative statement>]
       [NOT INVALID KEY  <imperative statement>]
    END-WRITE 

Example:

     WRITE Employee-Rec FROM Temp-Empl-Rec
        INVALID KEY DISPLAY 'Cannot WRITE; record with same key exists'
        NOT INVALID KEY DISPLAY 'WRITE is successful'
     END-WRITE

The effect is that, if the file contains no record whose key field matches that currently in the file's data record (or, in the case that the FROM clause is present, that currently in the specified data item), the data record is written, as a new record, into the file, and, if the NOT INVALID KEY clause is present, the imperative statement there is executed. Otherwise (i.e., there exists a record in the file whose key field equals that of the data record), if the INVALID KEY clause is present, the imperative statement there is executed.

To replace a record in an INDEXED file opened in I-O mode:

  1. READ the record to be replaced (into the file's data record).
  2. Change the contents of the data record (but not the key field).
  3. REWRITE <data-record-name>  [FROM data-name]
       [INVALID KEY      <imperative statement>]
       [NOT INVALID KEY  <imperative statement>]
    END-REWRITE  

Example:

     REWRITE Employee-Rec
        INVALID KEY  DISPLAY 'Cannot REWRITE; no record with that key exists'
        NOT INVALID KEY DISPLAY 'Record rewritten successfully'
     END-REWRITE

The effect is that, if there exists a record in the file having the same value in its key field as the file's data record (or, in the case that the FROM clause is present, the data item specified there), that record is replaced by the contents of the data record (or the FROM data item) and the NOT INVALID KEY clause's imperative statement is executed. Otherwise, the INVALID KEY clause's imperative statement is executed.

The difference between REWRITE and WRITE is that the former can only replace an existing record whereas the latter can only insert a new record.

To delete a record in an INDEXED file opened in I-O mode:

  1. READ the record (into the file's data record). (Note: Under RANDOM or DYNAMIC access it generally suffices to place the desired value into the key field of the file's data record, without necessarily doing so by reading the corresponding record. However, it is a good idea to READ first anyway, just to verify that the record to be deleted is really there.)
  2. DELETE <file-name>
       [INVALID KEY      <imperative statement>]
       [NOT INVALID KEY  <imperative statement>]
    END-DELETE 

Example:

     DISPLAY 'Enter Course ID of course to be cancelled:'
     ACCEPT Course-ID
     DELETE Course-File
        INVALID KEY CONTINUE
        NOT INVALID KEY DISPLAY 'Record deleted successfully'
     END-DELETE

The effect is that, if the file contains a record whose key field matches that of the file's data record, that record is deleted from the file and the imperative statement in the NOT INVALID KEY clause, if present, is executed. Otherwise, the imperative statement in the INVALID KEY clause, if present, is executed.
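
To tie the four random-access verbs together, here is a sketch of a complete program that inserts, fetches, replaces, and deletes a single record. It assumes a fixed-format compiler such as GnuCOBOL built with indexed-file support. The record layout of Course-File, the file name "courses.idx", and the course data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IdxDemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Course-File
               ASSIGN TO "courses.idx"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Course-ID.
       DATA DIVISION.
       FILE SECTION.
       FD  Course-File.
       01  Course-Rec.
           02  Course-ID         PIC X(8).
           02  Course-Title      PIC X(30).
       PROCEDURE DIVISION.
       Main-Para.
      * OUTPUT mode creates the file (or empties an existing one).
           OPEN OUTPUT Course-File
           CLOSE Course-File
           OPEN I-O Course-File
      * Insert a new record.
           MOVE "CS101" TO Course-ID
           MOVE "File Processing" TO Course-Title
           WRITE Course-Rec
               INVALID KEY DISPLAY "WRITE failed"
               NOT INVALID KEY DISPLAY "Record written"
           END-WRITE
      * A second WRITE with the same key should be rejected.
           MOVE "CS101" TO Course-ID
           MOVE "Another Title" TO Course-Title
           WRITE Course-Rec
               INVALID KEY DISPLAY "Duplicate key rejected"
           END-WRITE
      * Fetch by key, change a non-key field, and replace.
           MOVE "CS101" TO Course-ID
           READ Course-File
               INVALID KEY DISPLAY "No such record"
               NOT INVALID KEY
                   MOVE "File Structures" TO Course-Title
                   REWRITE Course-Rec
                       INVALID KEY DISPLAY "REWRITE failed"
                       NOT INVALID KEY DISPLAY "Record replaced"
                   END-REWRITE
           END-READ
      * Delete by key, then confirm that the record is gone.
           MOVE "CS101" TO Course-ID
           DELETE Course-File
               INVALID KEY DISPLAY "DELETE failed"
               NOT INVALID KEY DISPLAY "Record deleted"
           END-DELETE
           READ Course-File
               INVALID KEY DISPLAY "Record no longer exists"
           END-READ
           CLOSE Course-File
           STOP RUN.
```

The key field is set afresh before each operation, so the sketch does not rely on the data record retaining its contents from one I/O statement to the next.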

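Note that, under SEQUENTIAL access, the REWRITE and DELETE verbs may be applied to a record only if the most recent I/O operation on the file was a successful READ of that record. Under RANDOM or DYNAMIC access, the COBOL standard instead identifies the record by the value in the key field of the file's data record, so a preceding READ is not strictly required (although, as remarked above, it is a sensible precaution).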

Sequential Access form of READ, WRITE, and REWRITE for INDEXED files

This section pertains to an INDEXED file for which, in the program under consideration, the ACCESS MODE has been specified to be either SEQUENTIAL or DYNAMIC.

To position the file pointer (i.e., to seek) to the first record satisfying a specified condition in an INDEXED file opened in I-O or INPUT mode:

      START <file-name> KEY IS { =, >, NOT <, >= } <data-name>
         [INVALID KEY <imperative statement>]
         [NOT INVALID KEY <imperative statement>]
      END-START 

NOTE: Some compilers may require that data-name be declared as the RECORD KEY of the file (in the SELECT clause in the ENVIRONMENT DIVISION). Some compilers require the INVALID KEY clause to be present.

Example:

     MOVE 'Jones00001' TO Employee-ID
     START Employee-File KEY IS NOT < Employee-ID
        INVALID KEY DISPLAY 'something wrong'
        NOT INVALID KEY CONTINUE
     END-START 

The effect is to position the file pointer at the first record (i.e., the one having the smallest key value) satisfying the specified condition, so that a sequential READ will cause that to be the record read in. If no record satisfies the specified condition (e.g., the key value sought is larger than any in the file), the imperative statement in the INVALID KEY clause, if present, is executed. Otherwise, the imperative statement in the NOT INVALID KEY clause, if present, is executed.

To read "the next" record (i.e., the one following the record most recently read, or the one "found" by an application of the START verb) in an INDEXED file opened in either INPUT or I-O mode:

       READ <file-name> NEXT RECORD [INTO data-name]
          [AT END     <imperative statement>]
          [NOT AT END <imperative statement>]
       END-READ  

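Example (Finished is assumed to be false beforehand; the range test is made before each record is processed, so that the first record lying beyond 'Smith99999' is not processed):

     MOVE 'Jones00001' TO Employee-ID
     START Employee-File KEY IS NOT < Employee-ID
        INVALID KEY DISPLAY '*** Error ***'
        NOT INVALID KEY
           PERFORM UNTIL Finished
              READ Employee-File NEXT RECORD
                 AT END SET Finished TO TRUE
                 NOT AT END
                    IF Employee-ID > 'Smith99999'
                       SET Finished TO TRUE
                    ELSE
                       PERFORM Process-Empl-Rec
                    END-IF
              END-READ
           END-PERFORM
     END-START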
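
The following sketch shows how DYNAMIC access might combine both forms of READ in one program: a single-record fetch by key, followed by a scan that begins at a position found by START. It assumes a fixed-format compiler such as GnuCOBOL built with indexed-file support; the names Emp-File, Emp-Rec, Emp-ID, Emp-Dept, and Add-Emp, the file name "emp.idx", and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RangeScan.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT Emp-File
               ASSIGN TO "emp.idx"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS Emp-ID.
       DATA DIVISION.
       FILE SECTION.
       FD  Emp-File.
       01  Emp-Rec.
           02  Emp-ID            PIC X(10).
           02  Emp-Dept          PIC X(3).
       WORKING-STORAGE SECTION.
       01  Done-Flag             PIC X VALUE "N".
           88  Finished          VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
      * Load three records, deliberately not in key order.
           OPEN OUTPUT Emp-File
           MOVE "Simpson012ENG" TO Emp-Rec
           PERFORM Add-Emp
           MOVE "Adams00007MKT" TO Emp-Rec
           PERFORM Add-Emp
           MOVE "Jones00001ENG" TO Emp-Rec
           PERFORM Add-Emp
           CLOSE Emp-File
      * Random form: fetch one record by key.
           OPEN INPUT Emp-File
           MOVE "Simpson012" TO Emp-ID
           READ Emp-File
               INVALID KEY DISPLAY "No such employee"
               NOT INVALID KEY
                   DISPLAY "Found " Emp-ID " in " Emp-Dept
           END-READ
      * Sequential form: scan from the first key not less than "B".
           MOVE "B" TO Emp-ID
           START Emp-File KEY IS NOT < Emp-ID
               INVALID KEY SET Finished TO TRUE
           END-START
           PERFORM UNTIL Finished
               READ Emp-File NEXT RECORD
                   AT END SET Finished TO TRUE
                   NOT AT END DISPLAY "Scanned " Emp-ID
               END-READ
           END-PERFORM
           CLOSE Emp-File
           STOP RUN.
       Add-Emp.
           WRITE Emp-Rec
               INVALID KEY DISPLAY "Could not add " Emp-ID
           END-WRITE.
```

With this data the scan should visit Jones00001 and then Simpson012, skipping Adams00007, whose key is less than "B".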

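To replace the record most recently read from an INDEXED file opened in I-O mode:

      REWRITE <data-record-name> [FROM <data-name>]

To write a new record into an INDEXED file opened in OUTPUT mode (under SEQUENTIAL access, records must be written in increasing order of key value, so each new record must have a larger key than the one written before it; under DYNAMIC access, WRITE behaves as in the random form described earlier):

      WRITE <data-record-name> [FROM <data-name>]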

Random observations based on program testing

A syntax error occurs if you attempt to open an INDEXED file in EXTEND mode.

When a file whose SELECT clause specifies DYNAMIC ACCESS mode is opened in OUTPUT or I-O mode:

Random Access form of READ, WRITE, and REWRITE for RELATIVE files

Omitted for the moment. See Chapter 21 of Comprehensive COBOL.