Data Clinic Knowledgebase: Data Recovery and Hard Disk reference section
RAID computer systems - Primer
What does RAID stand for?
In 1987, Patterson, Gibson and Katz at the University of California,
Berkeley published a paper entitled "A Case for Redundant Arrays
of Inexpensive Disks (RAID)". This paper described various types
of disk arrays, referred to by the acronym RAID. The basic idea of RAID
was to combine multiple small, inexpensive disk drives into an array
of disk drives which yields performance exceeding that of a Single Large
Expensive Drive (SLED). Additionally, this array of drives appears to
the computer as a single logical storage unit or drive.
The Mean Time Between Failure
(MTBF) of the array will be equal to the MTBF of an individual drive,
divided by the number of drives in the array. Because of this, the MTBF
of an array of drives would be too low for many application requirements.
However, disk arrays can be made fault-tolerant by redundantly storing
information in various ways.
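For example (a minimal sketch in Python; the per-drive MTBF figure is assumed purely for illustration):

    # Assumed figures, for illustration only.
    drive_mtbf_hours = 300_000            # assumed MTBF of a single drive
    drives_in_array = 4
    array_mtbf = drive_mtbf_hours / drives_in_array
    print(array_mtbf)                     # 75000.0 hours for the whole array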
Five types of array architectures,
RAID-1 through RAID-5, were defined by the Berkeley paper, each providing
disk fault-tolerance and each offering different trade-offs in features
and performance. In addition to these five redundant array architectures,
it has become popular to refer to a non-redundant array of disk drives
as a RAID-0 array.
Data Striping
Fundamental to RAID is "striping", a method of combining
multiple drives into one logical storage unit. Striping involves partitioning
each drive's storage space into stripes which may be as small as one
sector (512 bytes) or as large as several megabytes. These stripes are
then interleaved round-robin, so that the combined space is composed
alternately of stripes from each drive. In effect, the storage space
of the drives is shuffled like a deck of cards. The type of application
environment, I/O or data intensive, determines whether large or small
stripes should be used.
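The round-robin interleaving can be expressed as a simple address calculation. The following is a minimal sketch in Python (function and variable names are illustrative, not from any particular implementation):

    def locate_stripe(logical_stripe, num_drives):
        """Map a logical stripe number to (drive index, stripe offset on that drive)."""
        drive = logical_stripe % num_drives     # round-robin across the drives
        offset = logical_stripe // num_drives   # position within the chosen drive
        return drive, offset

    # With 4 drives, consecutive logical stripes cycle through drives 0,1,2,3,0,1,...
    print([locate_stripe(s, 4) for s in range(8)])
    # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]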
Most multi-user operating
systems today, like NT, Unix and Netware, support overlapped disk I/O
operations across multiple drives. However, in order to maximize throughput
for the disk subsystem, the I/O load must be balanced across all the
drives so that each drive can be kept busy as much as possible. In a
multiple drive system without striping, the disk I/O load is never perfectly
balanced. Some drives will contain data files which are frequently accessed
and some drives will only rarely be accessed. In I/O intensive environments,
performance is optimized by striping the drives in the array with stripes
large enough so that each record potentially falls entirely within one
stripe. This ensures that the data and I/O will be evenly distributed
across the array, allowing each drive to work on a different I/O operation
and thus maximizing the number of simultaneous I/O operations the array
can perform.
In data intensive environments
and single-user systems which access large records, small stripes (typically
one 512-byte sector in length) can be used so that each record will
span across all the drives in the array, each drive storing part of
the data from the record. This causes long record accesses to be performed
faster, since the data transfer occurs in parallel on multiple drives.
Unfortunately, small stripes rule out multiple overlapped I/O operations,
since each I/O will typically involve all drives. However, operating
systems like DOS, which do not allow overlapped disk I/O, will not be
negatively impacted. Applications such as on-demand video/audio, medical
imaging and data acquisition, which utilize long record accesses, will
achieve optimum performance with small stripe arrays.
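The effect of stripe size on how a record is spread can be sketched with the same kind of address arithmetic (again in Python, with made-up sizes):

    def drives_touched(record_offset, record_len, stripe_size, num_drives):
        """Return the set of drives a record touches under round-robin striping."""
        first = record_offset // stripe_size
        last = (record_offset + record_len - 1) // stripe_size
        return {s % num_drives for s in range(first, last + 1)}

    # A 64 KB record on a 4-drive array:
    print(drives_touched(0, 65536, stripe_size=512, num_drives=4))
    # {0, 1, 2, 3}: small stripes spread the record across all drives in parallel
    print(drives_touched(0, 65536, stripe_size=131072, num_drives=4))
    # {0}: a large stripe keeps the record on one drive, leaving the others free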
A potential drawback to using
small stripes is that synchronized spindle drives are required in order
to keep performance from being degraded when short records are accessed.
Without synchronized spindles, each drive in the array will be at different
random rotational positions. Since an I/O cannot be completed until
every drive has accessed its part of the record, the drive which takes
the longest will determine when the I/O completes. The more drives in
the array, the more the average access time for the array approaches
the worst case single-drive access time. Synchronized spindles assure
that every drive in the array reaches its data at the same time. The
access time of the array will thus be equal to the average access time
of a single drive rather than approaching the worst case access time.
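This effect is easy to check with a small simulation. In the sketch below (Python; drive counts and trial count are arbitrary), each unsynchronized drive's rotational delay is modelled as uniform over one revolution, and the array must wait for the slowest drive:

    import random

    def mean_array_latency(num_drives, trials=50_000):
        """Average wait, in fractions of a revolution, for the slowest of N drives."""
        total = 0.0
        for _ in range(trials):
            total += max(random.random() for _ in range(num_drives))
        return total / trials

    for n in (1, 2, 4, 8):
        print(n, round(mean_array_latency(n), 3))
    # The expected maximum of N uniform delays is N/(N+1) of a revolution:
    # roughly 0.5, 0.667, 0.8 and 0.889, creeping toward the worst case of 1.0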
The different RAID levels
RAID-0
RAID Level 0 is not redundant, hence does not truly fit the "RAID"
acronym. In level 0, data is split across drives, resulting in higher
data throughput. Since no redundant information is stored, performance
is very good, but the failure of any disk in the array results in data
loss. This level is commonly referred to as striping.
RAID-1
RAID Level 1 provides redundancy by writing all data to two or more
drives. The performance of a level 1 array tends to be faster on reads
and slower on writes compared to a single drive, but if one drive
fails, no data is lost. This is a good entry-level redundant system,
since only two drives are required; however, since one drive is used
to store a duplicate of the data, the cost per megabyte is high. This
level is commonly referred to as mirroring.
RAID-2
RAID Level 2, which uses Hamming error correction codes, is intended
for use with drives which do not have built-in error detection. All
SCSI drives support built-in error detection, so this level is of little
use when using SCSI drives.
RAID-3
RAID Level 3 stripes data at a byte level across several drives, with
parity stored on one drive. It is otherwise similar to level 4. Byte-level
striping requires hardware support for efficient use.
RAID-4
RAID Level 4 stripes data at a block level across several drives, with
parity stored on one drive. The parity information allows recovery from
the failure of any single drive. The performance of a level 4 array
is very good for reads (the same as level 0). Writes, however, require
that parity data be updated each time. This slows small random writes,
in particular, though large writes or sequential writes are fairly fast.
Because only one drive in the array stores redundant data, the cost
per megabyte of a level 4 array can be fairly low.
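The parity used by levels 3, 4 and 5 is a simple bytewise XOR of the data blocks, which is what makes single-drive recovery possible. A minimal sketch in Python (the block contents are made up for illustration):

    def xor_blocks(*blocks):
        """Bytewise XOR of equally sized blocks."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                result[i] ^= b
        return bytes(result)

    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"    # data blocks on three drives
    parity = xor_blocks(d0, d1, d2)           # stored on the parity drive

    # If the drive holding d1 fails, its contents are the XOR of all the rest:
    assert xor_blocks(d0, d2, parity) == d1

    # The small-write penalty: changing one block means re-deriving parity as
    # new_parity = old_parity XOR old_data XOR new_data (extra reads and writes).
    new_d1 = b"bbbb"
    new_parity = xor_blocks(parity, d1, new_d1)
    assert new_parity == xor_blocks(d0, new_d1, d2)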
RAID-5
RAID Level 5 is similar to level 4, but distributes parity among the
drives. This can speed small writes in multiprocessing systems, since
the parity disk does not become a bottleneck. Because parity data must
be skipped on each drive during reads, however, the performance for
reads tends to be considerably lower than that of a level 4 array. The cost
per megabyte is the same as for level 4.
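One way to picture the distributed parity is as a rotation of the parity block from stripe to stripe. The sketch below (Python) shows one simple rotation scheme; real implementations use several different layouts, and this is only an illustrative variant:

    def parity_drive(stripe, num_drives):
        """One simple rotation: parity moves back one drive on each successive stripe."""
        return (num_drives - 1 - stripe) % num_drives

    # With 4 drives, parity lands on drives 3, 2, 1, 0, 3, 2, ... across stripes,
    # so no single drive absorbs every parity write as it would in level 4.
    print([parity_drive(s, 4) for s in range(8)])
    # [3, 2, 1, 0, 3, 2, 1, 0]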
Summary
RAID-0 is the fastest and
most efficient array type but offers no fault-tolerance.
RAID-1 is the array of choice for performance-critical, fault-tolerant
environments. In addition, RAID-1 is the only choice for fault-tolerance
if no more than two drives are desired.
RAID-2 is seldom used today since ECC is embedded in almost all modern
disk drives.
RAID-3 can be used in data intensive or single-user environments which
access long sequential records to speed up data transfer. However, RAID-3
does not allow multiple I/O operations to be overlapped and requires
synchronized-spindle drives in order to avoid performance degradation
with short records.
RAID-4 offers no advantages over RAID-5 and does not support multiple
simultaneous write operations.
RAID-5 is the best choice in multi-user environments which are not write-performance
sensitive. However, at least three, and more typically five,
drives are required for RAID-5 arrays.
Possible approaches to RAID
Hardware RAID
The hardware-based system manages the RAID subsystem independently from
the host and presents to the host only a single disk per RAID array.
This way the host doesn't have to be aware of the RAID subsystem(s).
The controller-based hardware solution
DPT's SCSI controllers are a good example of a controller-based RAID
solution.
The intelligent controller manages the RAID subsystem independently from
the host. The advantage over an external SCSI-to-SCSI RAID subsystem
is that the controller is able to span the RAID subsystem over multiple
SCSI channels, thereby removing the limiting factor of external RAID
solutions: the transfer rate over the SCSI bus.
The external hardware solution (SCSI-to-SCSI RAID)
An external RAID box moves all RAID handling "intelligence"
into a controller that sits in the external disk subsystem. The
whole subsystem is connected to the host via a normal SCSI controller
and appears to the host as a single disk or as multiple disks.
This solution has drawbacks compared to the controller-based solution:
the single SCSI channel used in this solution creates a bottleneck.
Newer technologies like Fibre Channel can ease this problem, especially
if they allow multiple channels to be trunked into a Storage Area Network.
Four SCSI drives can already completely flood a parallel SCSI bus, since
the average transfer size is around 4 KB and the command transfer overhead
- which even in Ultra SCSI is still done asynchronously - takes most
of the bus time.
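A rough back-of-envelope sketch (Python; every figure below is assumed for illustration, not measured) shows why a few drives are enough:

    # Assumed figures, for illustration only.
    bus_rate = 40e6              # data-phase rate of an Ultra Wide SCSI bus, bytes/s
    transfer_size = 4096         # average transfer size, per the text above
    command_overhead = 200e-6    # assumed asynchronous command/status time per I/O

    data_time = transfer_size / bus_rate        # ~102 microseconds of data transfer
    total_time = data_time + command_overhead   # ~302 microseconds of bus time per I/O
    print(round(1 / total_time))                # ~3300 I/Os per second the bus can carry

    # If each drive can service on the order of 1000 small I/Os per second,
    # four drives already demand more bus time than exists.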
Software RAID
- The MD driver in the Linux kernel is an example of a RAID solution
that is completely hardware independent. The Linux MD driver currently
supports RAID levels 0/1/4/5 plus linear mode.
- Under Solaris there are Solstice DiskSuite and Veritas Volume Manager,
which offer RAID-0, 1 and 5.
- Adaptec's AAA-RAID controllers are another example: they have no RAID
functionality whatsoever on the controller and depend on external
drivers to provide all RAID functionality.
They are basically just multiple AHA2940 controllers integrated on one
card; Linux detects them as AHA2940s and treats them accordingly.
Every OS needs its own special driver for this type of RAID solution,
which is error prone and limits compatibility.
Hardware vs. Software RAID
Just like any other application, software-based arrays occupy host system
memory, consume CPU cycles and are operating system dependent. By contending
for host CPU cycles and memory with other concurrently running applications,
software-based arrays degrade overall server performance.
Also, unlike hardware-based arrays, the performance of a software-based
array is directly dependent on server CPU performance and load.
Except for the array functionality,
hardware-based RAID schemes have very little in common with software-based
implementations. Since the host CPU can execute user applications while
the array adapter's processor simultaneously executes the array functions,
the result is true hardware multi-tasking. Hardware arrays also do not
occupy any host system memory, nor are they operating system dependent.
Hardware arrays are also
highly fault tolerant. Since the array logic is based in hardware, software
is NOT required to boot. Some software arrays, however, will fail to
boot if the boot drive in the array fails. For example, an array implemented
in software can only be functional when the array software has been
read from the disks and is memory-resident. What happens if the server
can't load the array software because the disk that contains the fault
tolerant software has failed? Software-based implementations commonly
require a separate boot drive, which is NOT included in the array.
The article published above is part of a complete article by Michael Neuffer.