CMPS 340 File Processing
B-trees: Definition of and Algorithms for Insertion/Deletion

 

A B-tree is a rooted tree in which each node contains a sequence of keys. (With each key there may be a collection of associated information.) To say that a B-tree has order m means that

(a) No node contains more than m - 1 keys.
(b) No node, except possibly the root, contains fewer than ém/2ù - 1 keys. The root may have as few as one key.
(c) Each non-leaf node has exactly one more children than keys.
(d) All leaves are equidistant from the root.
(e) Letting the keys in a node be x1, x2, ..., xr (where x1 < x2 < ... < xr) and letting the subtrees rooted at its children be s0, s1, ... sr, it is the case that, for all i satisfying 0 < i ≤ r, all the keys in si-1 are less than xi and all the keys in si are greater than xi. (If duplicate keys are allowed, we could change the latter condition to say that all the keys in si are greater than or equal to xi.)
From (b) and (c), it follows that every non-leaf node, except possibly the root, has at least [m/2] (floor of m/2) children but no more than m children.

Below we describe algorithms for the insertion and deletion of a key into/from a B-tree of order m. To simplify the description, we assume that each node is (temporarily) capable of holding up to m keys and m+1 pointers to children. A node containing m keys is said to be overflowing. A non-root node containing fewer than ém/2ù - 1 keys is said to be underflowing. During the insertion process, a node may be temporarily overflowing. During the deletion process, a node may be temporarily underflowing. A node is said to be full if it contains the maximum allowable number of keys (m - 1). A node is said to be on the verge of underflowing if it contains the minimum allowable number of keys (ém/2ù - 1). The functions Left_Sibling_of(), Right_Sibling_of(), and Parent_of() have the obvious meanings.

Insert(z):
  BEGIN
    find leaf node M where z "belongs";
    place z into proper place within M;
    Adjust(M);
  END Insert; 

Delete(z):
  BEGIN
    find node M where z "belongs";
    IF z is not in M THEN
       do nothing (or report error, if appropriate);
    ELSE
       IF M is a leaf THEN
          remove z from M;
          Adjust(M);
       ELSE
          find N, the leftmost leaf in the right subtree of z;
          let z' be the smallest key in N;
          remove z' from N;
          replace z in M by z';
          Adjust(N);
       ENDIF
    ENDIF
  END Delete; 

Adjust(M):
  BEGIN
    IF M is overflowing THEN
      IF Right_Sibling_of(M) exists and is not full THEN
        LR-Redistribute(M);
      ELSIF Left_Sibling_of(M) exists and is not full THEN
        RL-Redistribute(M);
      ELSE --all of M's immediate siblings are full
        Split(M);
        Adjust(Parent_of(M));
      ENDIF;

    ELSIF M is underflowing THEN
      IF Left_Sibling_of(M) exists and is not on verge of underflowing THEN
        LR-Redistribute(Left_Sibling_of(M));
      ELSIF Right_Sibling_of(M) exists and is not on verge of underflowing THEN
        RL-Redistribute(Right_Sibling_of(M));
      ELSIF M is the root THEN
        IF M has only one child THEN
          child of M becomes root;
          dispose of M;
        ENDIF;
      ELSE 
        --To have arrived here, M must be underflowing, M must have
        --at least one sibling, and all of M's immediate siblings are on
        --the verge of underflowing.  It follows that concatenation 
        --(or "merging", if you prefer) is called for
        IF Right_Sibling_of(M) exists THEN
          Concatenate(M);
        ELSE --left sibling must exist
          Concatenate(Left_Sibling_of(M));
        ENDIF;
      ENDIF;

    ELSE --M is neither overflowing nor underflowing
      do nothing;
    ENDIF;
  END Adjust; 

Note: In any of the places where either of the two Redistribute procedures are called, it may be beneficial to move as many keys as necessary out of the ``giving'' node and into the ``receiving'' node so as to result in the two of them having (as close as possible to) an equal number of keys. To accomplish this, simply nest the call to the appropriate Redistribute procedure inside a loop that iterates an appropriate number of times. End of note.

The subprograms LR-Redistribute(), RL-Redistribute(), Split(), and Concatenate() are described via the pictures appearing below. Each one takes a node as its lone parameter. In the cases of LR-Redistribute() and RL-Redistribute(), the parameter corresponds to the node that is giving away one of its keys. In the case of Split(), the parameter corresponds to the node that gets split into two nodes. As for Concatenate(), the parameter corresponds to the sibling on the left among the two adjacent siblings that are combined into a single node. In the pictures, capital letters (e.g., A, B, etc.) refer to nodes and small letters (u,v, etc.) refer to keys.

LR-Redistribute(B):
       +-----+-+-+-+-----+                        +-----+-+-+-+-----+
     A | ... |u|x|z| ... |                      A | ... |u|w|z| ... |
       +-----+-+-+-+-----+                        +-----+-+-+-+-----+
              /   \                                      /   \
             /     \                                    /     \
            /       \                                  /       \
  +------+-+-+       +-+------+    ======>      +------+-+     +-+-+-----+
B | .... |v|w|     C |y| .... |               B | .... |v|   C |x|y| ... |
  +------+-+-+       +-+------+                 +------+-+     +-+-+-----+
         | | |       | |                               | |     | | |
         | | |       | |                               | |     | | |
         | | |       | |                               | |     | | |
         D E F       G H                               D E     F G H

RL-Redistribute(C): Reverse the direction of the arrow in the picture above.

Split(C):
          +-----+-+-+-----+                    +-----+-+-+-+-----+
        A | ... |u|y| ... |                  A | ... |u|w|y| ... |
          +-----+-+-+-----+                    +-----+-+-+-+-----+
                | | |                                | | | |
                | | |                                | | | |
                B | D                                B | | D
                  |                                   /   \
                  |           ======>                /     \ 
          +----+-+-+-+----+                 +-----+-+       +-+-----+
        C | .. |v|w|x| .. |               C | ... |v|       |x| ... | C'
          +----+-+-+-+----+                 +-----+-+       +-+-----+
                 | |                                |       |
                 | |                                |       |
                 E F                                E       F

Concatenate(C): Reverse the direction of the arrow in the picture above.




File translated from TEX by TTH, version 2.78.
On 9 Oct 2002, 15:09.