Array-based Implementation of Positional Lists

The Naive Approach

One naive approach for using an array to implement a positional list is to store the list items in elements 0..n-1 of the array, where n is the current length of the list. We would also need to keep track of the current position, as well as the number of items currently in the list, and so we would introduce instance variables for each of these in our class, which might begin like this:

public class ListViaNaiveAry {

   private Object[] contents;  // array holding the list items 
   private int numItems;       // # of items in the list
   private int crrntPos;       // points to array element holding current item

Recall that a position in a list corresponds to one of the items in it, except for the rear position, which follows the last item. The purpose of the instance variable crrntPos is to indicate the array location holding the current item. As a special case, its value will be equal to numItems when the current position is the rear.

To explore this representation scheme, let's try to implement several of the operations that we included in our description of the ADT list.

Observers:

lengthOf(): return numItems;
isEmpty() : return lengthOf() == 0;
atFront() : return crrntPos == 0;
atRear()  : return crrntPos == numItems;
getObj()  : return contents[crrntPos];
Recall that getObj() has as its precondition !atRear(). If we had chosen to implement getObj() in accord with the defensive programming style (in which a method assumes responsibility for verifying that its precondition is met), we could have written it as follows:
getObj() : if (atRear())
              { throw an exception }
           else
              { return contents[crrntPos]; } 
So we see that this scheme gives us simple and efficient (constant-time) ways of implementing the observer operations. Let's turn to mutators.

Navigation Mutators:

toFront(): crrntPos = 0; 
toRear() : crrntPos = numItems;
toNext() : crrntPos = crrntPos + 1;
toPrev() : crrntPos = crrntPos - 1; 

Employing the defensive style in implementing toNext(), we get

toNext(): if (atRear())
             { throw an exception }
          else
             { crrntPos = crrntPos + 1; }
The defensive version of toPrev() is similar.

Content Mutators:

replace(x): contents[crrntPos] = x; 

We leave it to the reader to devise the defensive version of replace().

So far, everything looks fine. But we will see now that the insert() and remove() operations are problematic insofar as there appears to be no way to make them run any faster than in linear time (i.e., time proportional to the length of the list).

remove() : // shift contents[crrntPos+1..numItems-1] one place to the left
           for (int i = crrntPos; i != numItems-1;  i = i+1)
              { contents[i] = contents[i+1]; }
           numItems = numItems - 1;

insert(x) : // shift contents[crrntPos..numItems-1] one place to the right
            for (int i = numItems; i != crrntPos;  i = i-1)
               { contents[i] = contents[i-1]; }
            numItems = numItems + 1;
            contents[crrntPos] = x; 

We see that, to carry out remove(), the values in the segment contents[crrntPos+1..numItems-1] are shifted to the left one place. On average, then, we can expect that about half of the items in the list will be shifted, and, in the worst case, all of them will be shifted. It follows that this is a linear-time operation. The same reasoning applies to insert(), which shifts the segment contents[crrntPos..numItems-1] to the right.
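To make the cost of shifting concrete, here is a minimal Java sketch of the naive scheme. The class name NaiveListSketch, the fixed capacity, and the shifts counter are assumptions added for illustration; they are not part of the class outlined above.

```java
// Sketch of the naive scheme; names and the fixed capacity are assumptions.
class NaiveListSketch {
    private final Object[] contents;  // list items occupy contents[0..numItems-1]
    private int numItems = 0;         // # of items in the list
    private int crrntPos = 0;         // index of current item (== numItems at the rear)
    int shifts = 0;                   // hypothetical counter: element moves so far

    NaiveListSketch(int capacity) { contents = new Object[capacity]; }

    int lengthOf()   { return numItems; }
    boolean atRear() { return crrntPos == numItems; }
    void toFront()   { crrntPos = 0; }
    void toRear()    { crrntPos = numItems; }

    Object getObj() {
        if (atRear()) { throw new IllegalStateException("at rear"); }
        return contents[crrntPos];
    }

    // Shifts contents[crrntPos..numItems-1] one place to the right and
    // stores x at crrntPos, so the new item ends up at the current position.
    void insert(Object x) {
        if (numItems == contents.length) { throw new IllegalStateException("array is full"); }
        for (int i = numItems; i != crrntPos; i = i - 1) {
            contents[i] = contents[i - 1];
            shifts++;
        }
        contents[crrntPos] = x;
        numItems = numItems + 1;
    }

    // Shifts contents[crrntPos+1..numItems-1] one place to the left.
    void remove() {
        if (atRear()) { throw new IllegalStateException("at rear"); }
        for (int i = crrntPos; i != numItems - 1; i = i + 1) {
            contents[i] = contents[i + 1];
            shifts++;
        }
        numItems = numItems - 1;
        contents[numItems] = null;    // drop the stale reference
    }
}
```

Inserting or removing at the front of an n-item list moves n or n-1 elements, respectively, whereas doing so at the rear moves none.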

Given the constraints of the representation scheme that we have chosen, there seems to be no way to avoid having the insert() and remove() operations take time linear in the number of elements in the list. This leads us to wonder whether there exists a different array-based representation scheme that allows for more efficient versions of these operations.

A Better Approach

Recall that, in devising an efficient array-based representation for queues, the key idea was to allow the element at the front of the queue to be stored at any of the locations in the array, followed by all the others (in a "wrap-around" fashion). Taking a similar approach here will help. But we need to go a step further. Because insertions and deletions occur only at the ends of a queue, keeping a queue's items stored in a contiguous array segment never requires that a sub-segment be shifted in order to plug a hole caused by a deletion or to create a hole needed for an insertion. Such is not the case for lists, in which insertion and deletion may occur anywhere. To make array segment shifting unnecessary, we relax the condition that the list items must be stored in a contiguous segment. Indeed, we allow all of the list items (not only the first) to be stored at arbitrary locations in the array. But then how do we keep track of which item is first, which is second, and so on? The answer is that, for each item, we also store the locations of its predecessor and successor.

To illustrate such a scheme, consider again this list of animals:

       +---+    +---+    +---+    +---+    +---+    +---+    +---+
       | C |    | D |    | B |    | A |    | O |    | Y |    | C |
       | A |----| O |----| U |----| N |----| W |----| A |----| O |----x
       | T |    | G |    | G |    | T |    | L |    | K |    | W |    ^
       +---+    +---+    +---+    +---+    +---+    +---+    +---+    |
                  ^                                                   |
                  |                                                   |
               current                                              rear
               position 
Following the suggestions above, one possible representation would be the following, which uses three arrays, contents[], pred[], and succ[], as well as two int variables, frontPos and crrntPos.
      pred      contents     succ      frontPos      crrntPos
     +----+    +--------+   +----+   +----------+   +--------+
   0 |  8 |    |  null  |   | -1 |   |     5    |   |   1    |
     +----+    +--------+   +----+   +----------+   +--------+
   1 |  5 |    |  DOG   |   | 10 |
     +----+    +--------+   +----+
   2 |  3 |    |  OWL   |   |  7 |
     +----+    +--------+   +----+
   3 | 10 |    |  ANT   |   |  2 |
     +----+    +--------+   +----+
   4 |    |    |        |   |    |
     +----+    +--------+   +----+
   5 | -1 |    |  CAT   |   |  1 |
     +----+    +--------+   +----+
   6 |    |    |        |   |    |
     +----+    +--------+   +----+
   7 |  2 |    |  YAK   |   |  8 |
     +----+    +--------+   +----+
   8 |  7 |    |  COW   |   |  0 |
     +----+    +--------+   +----+
   9 |    |    |        |   |    |
     +----+    +--------+   +----+
  10 |  1 |    |  BUG   |   |  3 |
     +----+    +--------+   +----+
  11 |    |    |        |   |    |
     +----+    +--------+   +----+
  12 |    |    |        |   |    |
     +----+    +--------+   +----+ 

Notice that, for each item in the list, (a reference to) it is contained in one of the elements of contents[] and the corresponding elements of pred[] and succ[] "point to" that item's predecessor and successor, respectively. (We use -1 to denote a null pointer.) The values stored in array elements whose contents are not shown are irrelevant. The values of frontPos and crrntPos allow us to locate quickly the front and current positions, respectively, of the list. We don't need rearPos, because we will always use the zero-th element of each array to store information regarding the rear of the list.

Notice that it doesn't matter in which element of contents[] a particular list item is stored, as long as the corresponding elements of pred[] and succ[] correctly indicate the locations of that item's predecessor and successor, respectively, and as long as frontPos and crrntPos point to the right places. Indeed, within this scheme, the same list has many different possible representations, of which the above is only one example. (Because the seven items may occupy any seven of the twelve locations 1 through 12, there are 12!/5! of them.)
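As a check on the diagram, the following Java sketch loads the same values into three arrays and recovers the list order by following pointers. The class name AnimalListSketch and its two traversal methods are assumptions made for illustration.

```java
// Sketch only: the array values are copied from the diagram above.
class AnimalListSketch {
    static final int NULL_PTR = -1;   // the "null pointer" value
    static final int REAR = 0;        // location reserved for the rear

    final String[] contents = new String[13];
    final int[] pred = new int[13];
    final int[] succ = new int[13];
    int frontPos = 5;
    int crrntPos = 1;

    AnimalListSketch() {
        set(0, 8, null, NULL_PTR);
        set(1, 5, "DOG", 10);
        set(2, 3, "OWL", 7);
        set(3, 10, "ANT", 2);
        set(5, NULL_PTR, "CAT", 1);
        set(7, 2, "YAK", 8);
        set(8, 7, "COW", 0);
        set(10, 1, "BUG", 3);
        // Locations 4, 6, 9, 11, and 12 are unoccupied; their values are irrelevant.
    }

    private void set(int loc, int p, String item, int s) {
        pred[loc] = p;
        contents[loc] = item;
        succ[loc] = s;
    }

    // Items from front to rear, found by following succ pointers.
    String frontToRear() {
        StringBuilder sb = new StringBuilder();
        for (int loc = frontPos; loc != REAR; loc = succ[loc]) {
            if (sb.length() > 0) { sb.append(' '); }
            sb.append(contents[loc]);
        }
        return sb.toString();
    }

    // Items from rear to front, found by following pred pointers.
    String rearToFront() {
        StringBuilder sb = new StringBuilder();
        for (int loc = pred[REAR]; loc != NULL_PTR; loc = pred[loc]) {
            if (sb.length() > 0) { sb.append(' '); }
            sb.append(contents[loc]);
        }
        return sb.toString();
    }
}
```

Both traversals visit the items in list order (or its reverse) even though the items are scattered through the arrays.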

For a hint as to why this representation scheme is superior to the naive one that was explored earlier, consider the modifications needing to be made in order to get the result of applying remove(). The item at the current position is DOG. After removing it, the list looks like this:

       +---+    +---+    +---+    +---+    +---+    +---+
       | C |    | B |    | A |    | O |    | Y |    | C |
       | A |----| U |----| N |----| W |----| A |----| O |----x
       | T |    | G |    | T |    | L |    | K |    | W |    ^
       +---+    +---+    +---+    +---+    +---+    +---+    |
                  ^                                          |
                  |                                          |
               current                                      rear
               position 

With respect to our representation, then, the element of pred[] corresponding to BUG (which had been DOG's successor but is now CAT's) should be changed to point to CAT. Similarly, the element of succ[] corresponding to CAT should point to BUG. Also, crrntPos should be changed to point to BUG, which was the removed node's successor. That's all! Removing an item from a list is accomplished simply by modifying three int variables, which can be done in constant time!

Note: If the removed item were at the front of the list, we would have changed frontPos rather than the element of succ[] corresponding to that item's (non-existent) predecessor.

The updated representation is as follows:

      pred      contents     succ      frontPos      crrntPos
     +----+    +--------+   +----+   +----------+   +--------+
   0 |  8 |    |  null  |   | -1 |   |     5    |   |   10   |
     +----+    +--------+   +----+   +----------+   +--------+
   1 |    |    |        |   |    |
     +----+    +--------+   +----+
   2 |  3 |    |  OWL   |   |  7 |
     +----+    +--------+   +----+
   3 | 10 |    |  ANT   |   |  2 |
     +----+    +--------+   +----+
   4 |    |    |        |   |    |
     +----+    +--------+   +----+
   5 | -1 |    |  CAT   |   | 10 |
     +----+    +--------+   +----+
   6 |    |    |        |   |    |
     +----+    +--------+   +----+
   7 |  2 |    |  YAK   |   |  8 |
     +----+    +--------+   +----+
   8 |  7 |    |  COW   |   |  0 |
     +----+    +--------+   +----+
   9 |    |    |        |   |    |
     +----+    +--------+   +----+
  10 |  5 |    |  BUG   |   |  3 |
     +----+    +--------+   +----+
  11 |    |    |        |   |    |
     +----+    +--------+   +----+
  12 |    |    |        |   |    |
     +----+    +--------+   +----+

A list class based upon this representation scheme might begin like this:

public class PosListViaAry<T> implements PosList<T> {

   private T[] contents;       // array in which list items are stored
   private int[] pred;         // predecessor pointers
   private int[] succ;         // successor pointers
   private int frontPos;       // points to array element holding first item
   private int crrntPos;       // points to current position

Let's attempt to code some of the observer and navigation operations, just to verify that they can be done in constant time.

Observers:

isEmpty()   : return frontPos == 0;
lengthOf()  : ?
atFront()   : return crrntPos == frontPos; 
atRear()    : return crrntPos == 0;
getObj()    : return contents[crrntPos];

Navigational Mutators:

toFront() : crrntPos = frontPos; 
toRear()  : crrntPos = 0;
toNext()  : crrntPos = succ[crrntPos];
toPrev()  : crrntPos = pred[crrntPos]; 

Our implementation of isEmpty() is based upon the observation that a list is empty if and only if its front and rear positions coincide. As we always store information about the rear at location zero, it suffices to compare frontPos to zero. Our implementations of atFront(), atRear(), and getObj() are similarly obvious.

We run into a problem with lengthOf(), however. One way to compute the length of a list is to traverse the entire list (by going to the front and following the succ pointers until arriving at the rear, counting along the way). But this would take time proportional to the length of the list, which is exactly what we are trying to avoid!

Another approach is to examine each element of contents[] to determine whether or not it is "occupied". (The latter assumes that there is some way to make the distinction between occupied and unoccupied array elements. If we stipulate that an element of the succ[] array that is logically unoccupied is to contain the value -2, then such a distinction can be made.) However, this method requires time proportional to contents.length, which may be significantly larger than the length of the list!

If the representation scheme we have in mind does not admit a constant-time algorithm for lengthOf(), perhaps we can adjust the scheme so that it does! What if we simply introduce a new instance variable of type int, called numItems, to our class? As suggested by its name, the purpose of this variable is to store the number of items in the list, i.e., its length. This gives us a very simple (and constant-time) algorithm for lengthOf():

lengthOf() :  return numItems;

We could also rewrite isEmpty() to utilize the new variable, as follows:

isEmpty()  :  return numItems == 0;

Thus, we have solved the problem, right? Well, maybe! By introducing a new variable with the associated invariant numItems == length of list, we have imposed an extra computational burden upon all the operations that cause a list's length to change. It is conceivable that, in order to maintain this invariant, some operation that otherwise could have been accomplished in constant time will now require greater-than-constant time. A few moments' reflection, however, dispels this notion. The only operations that could change the length of a list are remove() and the various insert()'s. In each case, the modification needing to be made to numItems is trivial.

Now let's consider the remove() operation. As noted above, it requires changes to only three variables. However, it is critical that the changes be made in the correct order. (You will see that in working with pointers, it is often the case that you must be very careful about the order in which values are changed.)

  remove():
    if (atRear())
       { throw an exception }
    else {
       int predLoc = pred[crrntPos]; //location of predecessor of item being removed
       int succLoc = succ[crrntPos]; //location of successor of item being removed

       if (frontPos == crrntPos)     //if item being removed is at front,
          { frontPos = succLoc; }    //its successor becomes the front
       else
          { succ[predLoc] = succLoc; } //connect predecessor to successor

       pred[succLoc] = predLoc;        //connect successor to predecessor

       crrntPos = succLoc;             //successor becomes current position
       numItems = numItems - 1;
    }
Notice that the above takes constant time!
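For readers who want to run the algorithm, here is one way it might look in Java. The class name LinkedRemoveSketch and its constructor (which simply adopts arrays supplied by the caller) are assumptions made to keep the sketch short.

```java
// Sketch of the constant-time remove(); the class and constructor are assumptions.
class LinkedRemoveSketch {
    final Object[] contents;
    final int[] pred;
    final int[] succ;
    int frontPos;
    int crrntPos;
    int numItems;

    LinkedRemoveSketch(Object[] contents, int[] pred, int[] succ,
                       int frontPos, int crrntPos, int numItems) {
        this.contents = contents;
        this.pred = pred;
        this.succ = succ;
        this.frontPos = frontPos;
        this.crrntPos = crrntPos;
        this.numItems = numItems;
    }

    boolean atRear() { return crrntPos == 0; }

    void remove() {
        if (atRear()) { throw new IllegalStateException("at rear"); }
        int predLoc = pred[crrntPos];   // predecessor of the item being removed
        int succLoc = succ[crrntPos];   // successor of the item being removed

        if (frontPos == crrntPos) { frontPos = succLoc; }   // removed item was first
        else { succ[predLoc] = succLoc; }                   // connect predecessor to successor

        pred[succLoc] = predLoc;        // connect successor to predecessor
        contents[crrntPos] = null;      // optional: drop the stale reference
        crrntPos = succLoc;             // successor becomes current position
        numItems = numItems - 1;
    }
}
```

Applied to the animal list with crrntPos == 1 (DOG), this sets succ[5] to 10, pred[10] to 5, and crrntPos to 10, in agreement with the updated diagram shown earlier. No loop is involved, so the running time does not depend on the length of the list.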

What about insertions? To perform one, it suffices to

  1. Find an "unoccupied" array location k.
  2. Place into contents[k] (a reference to) the new item.
  3. Place into pred[k] a pointer to the item that is to become the new one's predecessor (or -1 if the new item is being placed at the front of the list).
  4. Place into succ[k] a pointer to the position that is to become the new item's successor.
  5. Place k (which points to the new item) into succ[predLoc], where predLoc points to the item that is to become the new one's predecessor (or, if the new item is to be at the front of the list, place k into frontPos).
  6. Place k (which points to the new item) into pred[succLoc], where succLoc points to the new item's successor.

Steps (2) through (6) are easy. Taking the case of the insert(newObj) operation, which inserts the new object immediately before the current position, we get

  1. k = ??? // we have to figure this out
  2. contents[k] = newObj;
  3. pred[k] = pred[crrntPos];
  4. succ[k] = crrntPos;
  5. if (atFront()) { frontPos = k; }
     else { succ[pred[k]] = k; }
  6. pred[succ[k]] = k;
  7. numItems = numItems + 1; // maintains the invariant numItems == length of list
Step (1) is problematic, however, because the representation scheme (so far as we have developed it) fails to include any information that is helpful in quickly locating an unoccupied array element. Even if we were to stipulate that, for any unused location i, succ[i] must contain -2, say, (which would enable us to distinguish between occupied and unoccupied array elements), finding an unoccupied element would still seem to require an exhaustive search, due to the fact that there is no restriction upon what set of array elements may be unoccupied (except that location zero is always occupied by the rear). Thus, we get a running time proportional to contents.length, which is no less than the length of the list. Luckily, this can be repaired!
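The exhaustive search just described might be sketched in Java as follows. The class and method names are assumptions, as is the convention of returning -1 when no location is free.

```java
// Sketch of the linear-time search for an unoccupied location.
class FreeSlotScanSketch {
    static final int FREE = -2;   // marker stored in succ[i] for an unused location i

    // Returns the first unoccupied location, or -1 if every location is in use.
    static int findFree(int[] succ) {
        // Location 0 always holds the rear, so the scan starts at 1.
        for (int i = 1; i < succ.length; i++) {
            if (succ[i] == FREE) { return i; }
        }
        return -1;
    }
}
```

In the worst case the loop examines every element of succ[], which is why this way of carrying out step (1) makes insertion take time proportional to contents.length.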

Solution 1: keep unoccupied elements at end of arrays

One solution is to maintain the following representation invariant: Elements 0 through numItems of the three arrays are occupied and the rest are not. (Recall that element zero is used for storing the rear, so we need numItems + 1 locations in total.) Thus, the first unoccupied element is always at location numItems+1. This makes finding an unoccupied element very easy, which makes step (1) in the informal insertion algorithm simple. But it complicates the implementation of remove(), because it must be modified so as to leave the arrays in a state satisfying the new representation invariant. For example, suppose that an application of remove() is to have the effect of removing, from a 57-item list, the one that happens to be stored at location 24 of the arrays. Upon completion, locations 0 through 56 of the arrays should be occupied and the rest unoccupied. But the straightforward implementation of remove() given above would leave location 57 occupied and location 24 unoccupied. How can it be fixed? One way is to transfer, in each of the three arrays, the value in location 57 to location 24. In addition, any pointer to location 57 should be changed to point to 24. Fortunately, the only such pointers are succ[pred[57]] (or else frontPos, if the item at location 57 happened to be at the front of the list), pred[succ[57]], and possibly crrntPos.

Let's rewrite remove() to take account of this (as well as the numItems variable that we added since discussing it):

  remove():
    if (atRear())
       { throw an exception }
    else {
       int predLoc = pred[crrntPos];   //loc. of predecessor of item to be removed
       int succLoc = succ[crrntPos];   //loc. of successor of item to be removed

       if (frontPos == crrntPos)     //if item being removed is first,
          { frontPos = succLoc; }    //its successor becomes the front
       else
          { succ[predLoc] = succLoc; } //connect predecessor to successor

       pred[succLoc] = predLoc;      // connect successor to predecessor

       int vacantLoc = crrntPos;
       crrntPos = succLoc;           // successor is now current pos

       if (vacantLoc == numItems) {
          // vacated element was the last occupied one; no action needed
       }
       else {
          // now adjust arrays by moving contents of last occupied element
          // into the element that was just vacated
          contents[vacantLoc] = contents[numItems];
          pred[vacantLoc] = pred[numItems];
          succ[vacantLoc] = succ[numItems];

          // now make all pointers to the last occupied element point to
          // the element that had been vacated
          if (pred[vacantLoc] == -1)
             { frontPos = vacantLoc; }
          else
             { succ[pred[vacantLoc]] = vacantLoc; }

          pred[succ[vacantLoc]] = vacantLoc;

          if (crrntPos == numItems)
             { crrntPos = vacantLoc; }
       }

       numItems = numItems - 1;
    }
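Putting the pieces of Solution 1 together, a complete (if minimal) class might look like the Java sketch below. The class name CompactPosListSketch, the fixed capacity, and the use of Object rather than a type parameter are assumptions made for brevity; insert() follows the six-step outline given earlier, with k taken to be numItems + 1.

```java
// Sketch of Solution 1: locations 0..numItems are always the occupied ones.
class CompactPosListSketch {
    private static final int NULL_PTR = -1;
    private final Object[] contents;
    private final int[] pred;
    private final int[] succ;
    private int frontPos = 0;   // in an empty list, front and rear coincide
    private int crrntPos = 0;
    private int numItems = 0;

    CompactPosListSketch(int capacity) {
        contents = new Object[capacity + 1];   // +1 for the rear at location 0
        pred = new int[capacity + 1];
        succ = new int[capacity + 1];
        pred[0] = NULL_PTR;
        succ[0] = NULL_PTR;
    }

    int lengthOf()    { return numItems; }
    boolean atFront() { return crrntPos == frontPos; }
    boolean atRear()  { return crrntPos == 0; }
    void toFront()    { crrntPos = frontPos; }
    void toNext()     { crrntPos = succ[crrntPos]; }
    Object getObj()   { return contents[crrntPos]; }

    // Inserts newObj immediately before the current position.
    void insert(Object newObj) {
        if (numItems + 1 == contents.length) { throw new IllegalStateException("list is full"); }
        int k = numItems + 1;                  // first unoccupied location
        contents[k] = newObj;
        pred[k] = pred[crrntPos];
        succ[k] = crrntPos;
        if (atFront()) { frontPos = k; } else { succ[pred[k]] = k; }
        pred[crrntPos] = k;
        numItems = numItems + 1;
    }

    // Removes the current item, then refills the vacated location.
    void remove() {
        if (atRear()) { throw new IllegalStateException("at rear"); }
        int predLoc = pred[crrntPos];
        int succLoc = succ[crrntPos];
        if (frontPos == crrntPos) { frontPos = succLoc; } else { succ[predLoc] = succLoc; }
        pred[succLoc] = predLoc;
        int vacantLoc = crrntPos;
        crrntPos = succLoc;
        int last = numItems;                   // last occupied location
        if (vacantLoc != last) {
            contents[vacantLoc] = contents[last];
            pred[vacantLoc] = pred[last];
            succ[vacantLoc] = succ[last];
            if (pred[vacantLoc] == NULL_PTR) { frontPos = vacantLoc; }
            else { succ[pred[vacantLoc]] = vacantLoc; }
            pred[succ[vacantLoc]] = vacantLoc;
            if (crrntPos == last) { crrntPos = vacantLoc; }
        }
        contents[last] = null;                 // drop the stale reference
        numItems = numItems - 1;
    }
}
```

Neither method contains a loop, so both run in constant time; the price is the extra bookkeeping in remove().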

Solution 2: use an avail chain (of unoccupied elements)

A different approach for solving the problem of quickly finding an unoccupied array element is to use an avail chain. Add a new instance variable avail to the class; its purpose is to point to the "first" unoccupied array element. For each unoccupied location i, succ[i] points to the "next" one (or has the null pointer value -1 if there is none). When an unoccupied element is needed (for an insertion), avail provides the location of one in constant time. To update avail, make it point to the "second" unoccupied element, which is given by succ[avail]. When an item is removed, the location i of the vacated element is placed at the beginning of the avail chain by setting succ[i] to avail and setting avail to i.

What is troubling about this approach is that construction of a brand new empty list takes time proportional to contents.length, as the initial avail chain must include every location other than zero (which holds the rear), and thus each corresponding element of succ[] must be initialized to point to some other element. (One way to achieve this is to set avail to contents.length - 1, set succ[1] to -1, and, for each i from 2 through contents.length - 1, set succ[i] to i-1.)

It seems rather ironic that our representation scheme is such that its most expensive operation is the construction of an empty list!

One slight advantage that the avail chain approach has over the earlier one is that, under it, remove() is much simpler.
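The avail chain mechanism can be sketched in isolation from the rest of the list class. In the Java below, the class name AvailChainSketch and the method names allocate() and release() are assumptions; the sketch reserves location 0 for the rear and chains the remaining locations together.

```java
// Sketch of an avail chain threaded through succ[]; names are assumptions.
class AvailChainSketch {
    static final int NULL_PTR = -1;
    final int[] succ;
    int avail;   // first unoccupied location, or NULL_PTR if there is none

    // Builds the chain length-1 -> length-2 -> ... -> 1 in linear time.
    AvailChainSketch(int length) {
        succ = new int[length];
        succ[0] = NULL_PTR;                       // location 0 holds the rear
        for (int i = 1; i < length; i++) {
            succ[i] = (i == 1) ? NULL_PTR : i - 1;
        }
        avail = (length > 1) ? length - 1 : NULL_PTR;
    }

    // Takes the first location off the avail chain (constant time).
    int allocate() {
        if (avail == NULL_PTR) { throw new IllegalStateException("no unoccupied location"); }
        int k = avail;
        avail = succ[k];
        return k;
    }

    // Puts location i at the beginning of the avail chain (constant time).
    void release(int i) {
        succ[i] = avail;
        avail = i;
    }
}
```

An insert() would call allocate() to obtain k, and a remove() would call release() on the vacated location after unlinking it. Only the constructor contains a loop.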

A Hybrid Approach Makes all Operations run in constant time

In order to get the advantages of both approaches, a hybrid approach is possible. Here, an avail chain is maintained, but it includes only unoccupied array elements that had been occupied at some point in the past. In doing an insertion, if the avail chain is empty, the first never-occupied array element (which will be the one at location numItems + 1) is used for storing the new item.
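A minimal Java sketch of the hybrid rule for choosing a location follows. The class name HybridAllocatorSketch and the method names allocate() and release() are assumptions, and the list operations themselves are omitted.

```java
// Sketch of the hybrid scheme: reuse freed locations first, else take numItems + 1.
class HybridAllocatorSketch {
    static final int NULL_PTR = -1;
    final int[] succ;
    int avail = NULL_PTR;   // chain of locations that were occupied and later freed
    int numItems = 0;       // number of locations currently occupied by list items

    // No initialization loop is written here; the avail chain starts out empty.
    HybridAllocatorSketch(int length) {
        succ = new int[length];
    }

    int allocate() {
        int k;
        if (avail != NULL_PTR) {
            k = avail;              // reuse a previously occupied location
            avail = succ[k];
        } else {
            k = numItems + 1;       // first never-occupied location
            if (k >= succ.length) { throw new IllegalStateException("arrays are full"); }
        }
        numItems = numItems + 1;
        return k;
    }

    void release(int i) {
        succ[i] = avail;            // vacated location goes to the head of the chain
        avail = i;
        numItems = numItems - 1;
    }
}
```

The rule works because, whenever the avail chain is empty, every location that has ever been used is still occupied, so the occupied locations are exactly 0 through numItems. One caveat specific to Java: creating an array zero-fills it, so the constructor avoids an explicit loop but the language still does work proportional to the array length.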