CMPS 340: Lecture notes on RAID

Based upon: RAID: High-Performance, Reliable Secondary Storage by Chen et al., ACM Computing Surveys, Vol. 26, No. 2, June 1994, pp. 145-185.

Semiconductor technology (e.g., processor speed, main memory capacity) has improved exponentially over the past decades (as predicted by Moore in 1965), while secondary storage technology has improved more slowly. As a result, a widening performance gap exists between the two: secondary storage is becoming more and more of a "bottleneck" that limits the rate at which overall system performance can improve.

Hence, much research has been directed towards discovering ways of configuring secondary storage systems to improve their performance.

Perhaps the best solution to have been developed so far is RAID (Redundant Array of Independent/Inexpensive Disks), which employs two orthogonal concepts: data striping, for improved performance, and redundancy, for improved reliability.

According to Chen et al.: "A number of different data-striping and redundancy schemes have been developed. The combinations and arrangements of these schemes lead to a bewildering set of options for users and designers of disk arrays. Each option presents subtle tradeoffs among reliability, performance, and cost ..."

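Most RAID organizations can be distinguished based upon two features: the granularity of data interleaving, and the method and pattern in which the redundant information is computed and distributed across the disk array.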

RAID levels (1 through 5 appeared in the original paper by Patterson et al.; 0 and 6 were added by others):

  1. Level 0 (Nonredundant): Data striping is used to exploit parallelism, but no redundant data is stored. Has the best performance on WRITE operations (because no redundant data need be written). But not the best performance on READ, because there is no choice as to which disk to get the data from, as there is in, say, Level 1.

  2. Level 1 (Mirroring/Shadowing): For each disk, use a mirror (or shadow). Thus, there are always two copies of all data. Each WRITE must write data on two disks. A READ can choose the one that can service the request more quickly. If a disk fails, its mirror is used for restoring the data.

  3. Level 2 (Memory-style ECC (Error Correcting Codes)): Use, for example, Hamming Codes for error detection/correction. For an array of four data disks, need three extra to store redundant info. In general, need about lg n + 1 "extra disks" for n data disks, so storage efficiency increases with the number of disks. Details of error detection/correction are beyond the scope of this document, but suffice it to say that by storing some extra bits for each byte (or block, or whatever unit) of data, it becomes possible to detect and/or correct errors in the bits. (Example: Storing one extra bit per byte so that each byte, including the one extra bit, has an even number of 1's, allows one to detect single-bit errors within bytes. This is commonly referred to as parity checking.)

  4. Level 3 (Bit-Interleaved Parity): Use only a single "extra" disk for storing parity info. (As the disk controller can determine which disk has failed, it is not necessary to be able to detect that info, as is possible in Level 2.) Conceptually, the data is stored on the disks interleaved on a per-bit basis. If a single disk fails, the rest can be used to recover its data.

    Each WRITE accesses all the disks, including the parity disk. Each READ accesses all but the parity disk.

  5. Level 4 (Block-Interleaved Parity): Similar to Level 3, except that data is interleaved across disks in blocks of arbitrary size (called the striping unit).

    READ requests smaller than the striping unit access only one disk. WRITE requests must update the requested data blocks, plus the corresponding parity blocks. The fact that a single disk is used for parity causes a bottleneck: every WRITE must access the parity disk.

  6. Level 5 (Block-Interleaved Distributed-Parity): Parity data is distributed onto all the disks.

    Has the best small-read, large-read, and large-write performance of any redundant disk array. Small WRITEs are not so good compared to mirroring because of the need to perform READ-MODIFY-WRITE operations to update parity. (This is its major weakness.)

  7. Level 6 (P + Q Redundancy): Has stronger error-correction capabilities to protect against simultaneous failure of two disks.
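
To make the striping idea behind Level 0 and the parity placement of Levels 4 and 5 concrete, here is a minimal Python sketch. The function names, the round-robin layout with a striping unit of one block, and the particular rotation used for parity are assumptions made for illustration; real arrays use a variety of layouts.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk, offset on that disk).

    Assumes simple round-robin striping with a striping unit of one
    block and no parity (Level 0 style).
    """
    disk = logical_block % num_disks
    offset = logical_block // num_disks
    return disk, offset


def parity_disk_level4(stripe, num_disks):
    """Level 4: parity for every stripe lives on one dedicated disk
    (assumed here to be the last one)."""
    return num_disks - 1


def parity_disk_level5(stripe, num_disks):
    """Level 5: the parity block rotates across the disks from stripe
    to stripe. This rotation is one possible placement, not the only one."""
    return (num_disks - 1 - stripe) % num_disks


if __name__ == "__main__":
    for block in range(8):
        print("logical block", block, "->", locate_block(block, 4))
    for stripe in range(6):
        print("stripe", stripe,
              "L4 parity disk:", parity_disk_level4(stripe, 5),
              "L5 parity disk:", parity_disk_level5(stripe, 5))
```

With five disks, the Level 4 function always names the same disk, which is the bottleneck noted above, while the Level 5 function spreads parity writes over all five.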
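
The parity used in Levels 3 through 5 is commonly computed as the bitwise XOR of the data blocks in a stripe. The following Python sketch, with hypothetical function names and tiny two-byte "blocks", shows how a failed disk's block can be recovered from the survivors, and how a small WRITE can update parity by a READ-MODIFY-WRITE of just the old data and old parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks):
    """Recover the block of the one failed disk by XORing all surviving
    blocks of the stripe (data and parity alike)."""
    return xor_blocks(surviving_blocks)


def small_write_parity(old_data, new_data, old_parity):
    """READ-MODIFY-WRITE: compute the new parity from the old data, the
    new data, and the old parity, without reading the other data disks."""
    return xor_blocks([old_data, new_data, old_parity])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # three data disks
    parity = xor_blocks(data)                         # the parity disk

    # Pretend disk 1 failed: rebuild its block from the others.
    print(reconstruct([data[0], data[2], parity]) == data[1])

    # Overwrite the block on disk 1 and update parity incrementally.
    new_block = b"\x00\xff"
    new_parity = small_write_parity(data[1], new_block, parity)
    print(new_parity == xor_blocks([data[0], new_block, data[2]]))
```

The small write touches two disks twice each (read old data and old parity, write new data and new parity), which is why it compares poorly with mirroring, where a small write is simply two writes.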
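
The parity-checking example given under Level 2 can be sketched in a few lines of Python. The function names are hypothetical. The last function is an added assumption: it counts check disks using the standard condition for a single-error-correcting Hamming code (r check bits suffice for m data bits when 2^r >= m + r + 1), which gives three extra disks for four data disks, in line with the figure quoted in the notes.

```python
def even_parity_bit(byte):
    """Extra bit chosen so that the byte plus this bit has an even number of 1's."""
    return bin(byte).count("1") % 2


def has_parity_error(byte, parity_bit):
    """True if the stored parity bit no longer matches the byte.
    Detects any single-bit error; a two-bit error goes unnoticed."""
    return even_parity_bit(byte) != parity_bit


def hamming_check_disks(data_disks):
    """Smallest r with 2**r >= data_disks + r + 1 (assumed Hamming-code model)."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    stored = 0b10110010
    p = even_parity_bit(stored)
    corrupted = stored ^ 0b00000100          # flip one bit
    print(has_parity_error(stored, p))       # no error in the intact byte
    print(has_parity_error(corrupted, p))    # single-bit error is detected
    for n in (4, 10, 25):
        print(n, "data disks need", hamming_check_disks(n), "check disks")
```

Note that parity alone only detects the error; it cannot say which bit flipped. Identifying the failed component is what the extra Hamming-code disks of Level 2 provide, and what Level 3 gets for free from the disk controller.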