CMPS 340: Lecture notes on RAID
Based Upon: RAID: High-Performance, Reliable Secondary Storage
by Chen et al., ACM Computing Surveys, Vol. 26, No. 2, June 1994,
pp. 145-185.
Semiconductor technology (i.e., processor speed, main memory capacity)
has improved exponentially over the past decades (as predicted by Moore
in the mid-1960s), while secondary storage technology has improved more
slowly, so a widening performance gap exists between the two. The effect
is that secondary storage is becoming more and more of a "bottleneck"
that limits the rate at which overall system performance can improve.
Hence, much research has been directed towards discovering ways of
configuring secondary storage systems to improve their performance.
Perhaps the best solution to have been developed so far is
RAID (Redundant Array of Independent/Inexpensive Disks), which employs two
orthogonal concepts:
- data striping for improved performance.
Data striping distributes data transparently over multiple disk units
to make them appear as one larger, faster disk unit.
This allows multiple disk I/Os (either from different requests or
from different parts of a single request) to be performed in parallel.
- redundancy for improved reliability. Using multiple disks,
rather than one, lowers the overall reliability (in terms of
MTTF (mean time to failure)), because the failure of any one disk
can lose data; thus, redundant data is stored in order to make
recovery possible when a small number of disks (e.g., one) in the
array fail.
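Both concepts can be made concrete with a minimal Python sketch. The
function names (locate_block, array_mttf) are hypothetical and chosen only
for illustration. The sketch assumes simple round-robin striping with a
one-block striping unit, and it assumes that disks fail independently at a
constant rate, so that a nonredundant array of N disks has roughly 1/N the
MTTF of a single disk.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a striping unit of one block.
    """
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need num_disks >= 1 and logical_block >= 0")
    return logical_block % num_disks, logical_block // num_disks


def array_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """Estimate the MTTF of a nonredundant array.

    Assumes independent failures at a constant rate: the array is
    considered failed as soon as any one of its disks fails.
    """
    if num_disks < 1:
        raise ValueError("need num_disks >= 1")
    return disk_mttf_hours / num_disks


if __name__ == "__main__":
    # Consecutive logical blocks land on different disks, so a request
    # for blocks 0..3 can be served by four disks in parallel.
    for block in range(8):
        print(block, locate_block(block, 4))
    # 100 disks, each with an MTTF of 200,000 hours.
    print(array_mttf(200_000, 100))
```

Under these assumptions, 100 disks with an MTTF of 200,000 hours each
give an array MTTF of only 2,000 hours, which is why striping alone is
not enough and redundancy is added.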
According to Chen et al.:
"A number of different data-striping and redundancy schemes have been
developed. The combinations and arrangements of these schemes lead to
a bewildering set of options for users and designers of disk arrays.
Each option presents subtle tradeoffs among reliability, performance,
and cost ..."
Most RAID organizations can be distinguished based upon two features:
- granularity of data interleaving
- fine-grained: (e.g., bit- or byte-level)
results in high data transfer rates for all I/Os, because every
request is spread over all the disks, but has the disadvantages that
(1) only one logical I/O request can be in service at any one time
(because all the disks are needed for each I/O), and
(2) all disks must spend time positioning (seeking) for every request
- coarse-grained: (e.g., block-level)
allows multiple small requests to be serviced simultaneously,
because each small request (e.g., for one block) touches only a
small number of disks; large requests can still use all the disks
- method and pattern in which the redundant data is computed
and distributed across the disk array
There are two orthogonal issues here:
- Which method to use for computing redundant data
(e.g., parity, Hamming, Reed-Solomon codes)
- How to distribute the redundant data across the array:
concentrate it on a "small" number of disks, or spread it over
all of them? The latter is generally better, because it tends
to spread the work evenly among the disks, which avoids
turning the redundancy disks into a bottleneck.
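The bottleneck argument for distributing redundant data can be illustrated
with a small Python sketch. It counts how many parity writes each disk
receives when one block is written in each of a series of stripes, first
with a dedicated parity disk and then with parity rotated across all
disks. All names are hypothetical, and the rotation rule used here is just
one simple choice made for illustration; it is not a layout taken from the
notes or from the paper.

```python
from typing import Callable, List


def parity_disk_dedicated(stripe: int, num_disks: int) -> int:
    """Always place parity on the last disk."""
    return num_disks - 1


def parity_disk_rotated(stripe: int, num_disks: int) -> int:
    """Rotate parity one disk to the left on each successive stripe.

    This particular rotation is an assumption made for illustration.
    """
    return (num_disks - 1 - stripe) % num_disks


def parity_write_load(placement: Callable[[int, int], int],
                      num_stripes: int, num_disks: int) -> List[int]:
    """Count parity writes per disk when every stripe is written once."""
    load = [0] * num_disks
    for stripe in range(num_stripes):
        load[placement(stripe, num_disks)] += 1
    return load


if __name__ == "__main__":
    print(parity_write_load(parity_disk_dedicated, 20, 5))
    print(parity_write_load(parity_disk_rotated, 20, 5))
```

With five disks and twenty stripes, the dedicated placement sends all
twenty parity writes to one disk, while the rotated placement sends four
to each disk.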
RAID levels (Levels 1 through 5 appeared in the original paper by
Patterson et al.; Levels 0 and 6 were added by others):
- Level 0 (Nonredundant): data striping is used to exploit parallelism,
but no redundant data is stored. Has the best performance on WRITE
operations (because no redundant data need be updated), but not
the best performance on READ, because there is only one copy of the
data and hence no choice as to where to get it, as there is in,
say, Level 1.
- Level 1 (Mirroring/Shadowing): For each disk, use a mirror
(or shadow). Thus, there are always two copies of
all data. Each WRITE must write data on two disks.
A READ can choose the one that can service the request more quickly.
If a disk fails, its mirror is used for restoring the data.
- Level 2 (Memory-style ECC (Error Correcting Codes)):
Use, for example, Hamming Codes for error detection/correction.
For an array of four data disks, three extra disks are needed to
store redundant info. In general, roughly lg n + 1 "extra disks"
are needed for n data disks, so storage efficiency increases with
the number of disks.
Details of error detection/correction are beyond the scope of this
document, but suffice it to say that by storing some extra bits
for each byte (or block, or whatever unit) of data, it becomes
possible to detect and/or correct errors in the bits.
(Example: Storing one extra bit per byte so that each byte,
including the one extra bit, has an even number
of 1's, allows one to detect single-bit errors within bytes.
This is commonly referred to as parity checking.)
- Level 3 (Bit-Interleaved Parity): Use only a single "extra" disk
for storing parity info. (Because the disk controller can determine
which disk has failed, the redundant data need not be able to
identify the failed disk, as it can in Level 2; it need only make
recovery of the lost data possible.)
Conceptually, the data is stored on the disks interleaved on a
per-bit basis.
If a single disk fails, the rest can be used to recover its data.
Each WRITE accesses all the disks, including the parity disk.
Each READ accesses all but the extra one.
- Level 4 (Block-Interleaved Parity):
Similar to Level 3, except that data is interleaved across disks
in blocks of arbitrary size (called the striping unit).
READ requests smaller than the striping unit access only one disk.
WRITE requests must update all indicated data blocks, plus each
corresponding parity block.
The fact that a single disk is used for parity causes a bottleneck:
every WRITE must access the parity disk.
- Level 5 (Block-Interleaved Distributed-Parity):
Parity data is distributed across all the disks, which removes the
parity-disk bottleneck of Level 4.
Has the best small-read, large-read, and large-write performance
of any redundant disk array.
Small WRITEs perform poorly compared to mirroring because each one
requires a READ-MODIFY-WRITE operation to update parity.
(This is its major weakness.)
- Level 6 (P + Q Redundancy):
Uses stronger error-correcting codes (Reed-Solomon) to protect
against simultaneous failure of up to two disks.
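The parity mechanism shared by Levels 3, 4, and 5 can be sketched in a few
lines of Python. This is a minimal illustration, not an actual controller
implementation: blocks are modeled as equal-length byte strings, parity is
their bytewise XOR, and the function names (xor_blocks, reconstruct,
small_write_parity) are hypothetical. The last function shows the
READ-MODIFY-WRITE update used for small writes: the old data and old
parity are read, and the new parity is computed without touching the
other data disks.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks; this is the parity block."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def reconstruct(surviving_blocks: Sequence[bytes]) -> bytes:
    """Rebuild the block of a single failed disk.

    surviving_blocks holds the blocks of every other disk in the
    stripe, including the parity block.
    """
    return xor_blocks(surviving_blocks)


def small_write_parity(old_data: bytes, new_data: bytes,
                       old_parity: bytes) -> bytes:
    """New parity after overwriting one data block (read-modify-write)."""
    return xor_blocks([old_parity, old_data, new_data])


if __name__ == "__main__":
    data = [b"\x0f\x10", b"\xf0\x01", b"\xaa\x55"]
    parity = xor_blocks(data)

    # Disk 1 fails: recover its block from the others plus parity.
    recovered = reconstruct([data[0], data[2], parity])
    print(recovered == data[1])

    # Small write to disk 0: update parity from old data and old parity.
    new_block = b"\x00\x00"
    new_parity = small_write_parity(data[0], new_block, parity)
    print(new_parity == xor_blocks([new_block, data[1], data[2]]))
```

The small write costs four disk accesses (read old data, read old
parity, write new data, write new parity), compared with two writes
under mirroring. Level 6 is not covered by this sketch, since
tolerating two failures requires a second, non-XOR code.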