Why RAID Matters
Every hard drive will eventually fail. The question is not whether, but when -- and whether your system can survive it. A single consumer drive typically lasts roughly 3-5 years under continuous load. In a data center with thousands of drives, failures are a daily occurrence. RAID (Redundant Array of Independent Disks) was invented to solve this fundamental problem: how do you build reliable storage out of unreliable components?
But RAID is not just about survival. By spreading data across multiple drives, RAID can also multiply read and write throughput far beyond what any single drive delivers. The genius of RAID is that it offers a spectrum of tradeoffs -- from pure speed with zero protection, to bulletproof redundancy that survives multiple simultaneous failures -- and lets you choose the balance that fits your needs.
RAID Levels Comparison
| Level | Min Disks | Capacity | Redundancy | Read Speed | Write Speed | Use Case |
|---|---|---|---|---|---|---|
| RAID 0 | 2 | 100% | None | Excellent | Excellent | Video editing, gaming |
| RAID 1 | 2 | 50% | 1 disk | Good | Normal | OS drives, critical data |
| RAID 5 | 3 | 66-94% | 1 disk | Good | Slow | File servers, NAS |
| RAID 6 | 4 | 50-88% | 2 disks | Good | Slower | Large arrays, archives |
| RAID 10 | 4 | 50% | 1-2 disks | Excellent | Good | Databases, VMs |
Understanding RAID Operations
Write Operations
RAID 0: Data split and written simultaneously to all disks
RAID 1: Same data written to all disks (mirrors)
RAID 5/6: Data + calculated parity written across disks
RAID 10: Data striped across mirror pairs
Failure Recovery
RAID 0: No recovery - total data loss
RAID 1: Read from surviving mirror
RAID 5: Reconstruct from data + parity
RAID 6: Can survive 2 disk failures
The Core Concepts: Striping, Mirroring, and Parity
Every RAID level is built from combinations of three fundamental techniques:
Striping splits data across multiple disks so that reads and writes happen in parallel. Think of it like distributing a deck of cards across several players -- dealing goes much faster than handing the whole deck to one person. Striping multiplies throughput but provides no protection: lose one disk, lose everything.
Mirroring writes identical copies of data to two or more disks. It is the simplest form of redundancy: if one disk dies, the other has a perfect copy. The cost is capacity -- you get only half of your total disk space. The benefit is instant recovery with no computation required.
Parity is the mathematical trick that makes RAID 5 and 6 possible. Using XOR operations, the array calculates a parity value from the data blocks. If any single disk is lost, its contents can be reconstructed from the remaining data and the parity. This provides redundancy without the 50% capacity penalty of mirroring.
Understanding RAID Levels
RAID 0: Pure Speed, Zero Safety
RAID 0 stripes data across all disks with no redundancy whatsoever. Every disk contributes its full capacity, and read/write speeds scale nearly linearly with the number of disks. Two disks give roughly double the throughput; four disks give quadruple.
The fatal flaw is reliability. Because data is spread across all disks, losing any single disk destroys the entire array. With N disks, you have roughly N times the probability of catastrophic failure compared to a single drive. RAID 0 is appropriate only for data that can be easily regenerated -- video editing scratch space, game installations, temporary processing buffers.
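The scaling of that risk can be sketched with a back-of-the-envelope model, assuming independent drive failures with a per-drive annual failure probability `p` (the function name is illustrative, not from any library):

```python
# Rough model of RAID 0 reliability, assuming independent drive
# failures with a per-drive annual failure probability p.
def raid0_annual_failure_probability(p: float, n: int) -> float:
    """Probability that a RAID 0 array of n disks loses data in a year."""
    # The array survives only if every disk survives.
    return 1 - (1 - p) ** n

# With a 3% annual failure rate per drive:
# 1 disk  -> 0.03
# 4 disks -> ~0.115, close to 4x the single-drive risk
```

For small failure rates the result is close to N times the single-drive probability, which is where the "N times the risk" rule of thumb comes from.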
RAID 1: The Mirror
RAID 1 writes every block to two (or more) disks simultaneously. Capacity equals that of a single disk regardless of how many mirrors you use, but read performance improves because the array can serve different read requests from different disks in parallel.
RAID 1 is the simplest redundancy scheme and the fastest to recover from. When a disk fails, the remaining mirror continues operating at full speed with no degradation. Rebuilding after replacement is a straightforward block-for-block copy. This simplicity makes RAID 1 the standard choice for boot drives and small critical systems where fast recovery matters more than capacity efficiency.
RAID 5: Distributed Parity
RAID 5 stripes data across all disks and distributes parity blocks among them. With N disks, you get (N-1) disks worth of usable capacity, and the array survives any single disk failure. The parity blocks rotate across all disks to prevent any one disk from becoming a bottleneck.
The critical concept is the XOR parity calculation. Given data blocks A and B, the parity P equals A XOR B. If disk B is lost, its contents are recovered by computing A XOR P -- the missing value falls out of the equation.
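A minimal sketch of this in code (the `xor_blocks` helper is hypothetical, written here just to demonstrate the property):

```python
# Minimal sketch of RAID 5 parity: XOR data blocks to get parity,
# then recover a lost block from the survivors.
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, group) for group in zip(*blocks))

data_a = b"\x0f\x33"
data_b = b"\xf0\x55"
parity = xor_blocks(data_a, data_b)        # P = A XOR B

# Disk B fails: its contents fall out of A XOR P.
assert xor_blocks(data_a, parity) == data_b
```

The same identity generalizes to any number of data disks: XOR all surviving blocks with the parity and the missing block is what remains.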
RAID 5 has an important weakness for write-heavy workloads: the write penalty. Every write to the array requires four I/O operations -- read old data, read old parity, write new data, write new parity. This makes RAID 5 roughly 4x slower for random writes compared to a single disk. It excels for read-heavy workloads like file servers and media streaming, but struggles under database transaction loads.
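The four operations are visible in the standard "read-modify-write" parity update: the new parity cancels the old data out of the old parity. A sketch, assuming disks are simple in-memory block lists (the function and layout are illustrative only):

```python
# Sketch of the RAID 5 small-write path ("read-modify-write").
# Each of the four I/Os is marked in a comment.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(disks, data_disk, parity_disk, stripe, new_data):
    old_data = disks[data_disk][stripe]        # I/O 1: read old data
    old_parity = disks[parity_disk][stripe]    # I/O 2: read old parity
    # New parity = old parity XOR old data XOR new data.
    new_parity = xor(xor(old_parity, old_data), new_data)
    disks[data_disk][stripe] = new_data        # I/O 3: write new data
    disks[parity_disk][stripe] = new_parity    # I/O 4: write new parity
```

Full-stripe writes avoid this penalty, because the array can compute parity from the new data alone -- one reason large sequential writes fare much better on RAID 5 than small random ones.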
Another concern is rebuild risk with large modern drives. Rebuilding a failed 8TB drive in a RAID 5 array takes many hours, during which a second failure would destroy all data. For arrays using drives larger than 2TB, RAID 6 is the safer choice.
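That rebuild-window risk can be estimated with a simple model, assuming independent failures at a constant annual failure rate (AFR). This ignores correlated failures and unrecoverable read errors, both of which make the real danger higher; the function is a hypothetical sketch, not a vendor formula:

```python
# Back-of-the-envelope estimate of the chance a second disk fails
# during a RAID 5 rebuild, assuming independent failures at a
# constant annual failure rate (AFR).
HOURS_PER_YEAR = 24 * 365

def second_failure_probability(afr: float, surviving_disks: int,
                               rebuild_hours: float) -> float:
    # Chance a single disk fails within the rebuild window.
    p_disk = 1 - (1 - afr) ** (rebuild_hours / HOURS_PER_YEAR)
    # Chance at least one of the surviving disks fails in that window.
    return 1 - (1 - p_disk) ** surviving_disks

# 3% AFR, 5 surviving disks, 24-hour rebuild: well under 1%, but
# nonzero -- and it grows with disk count and rebuild time.
```

The trend is the important part: doubling the rebuild time or the disk count roughly doubles the exposure, which is why large arrays of large drives push administrators toward RAID 6.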
RAID 6: Double Parity
RAID 6 extends RAID 5 with a second, independent parity calculation (typically using Reed-Solomon coding rather than simple XOR). This allows the array to survive any two simultaneous disk failures, at the cost of losing two disks worth of capacity instead of one.
The write penalty is even steeper -- six I/O operations per write instead of four -- but for the scenarios where RAID 6 is used (large archival arrays, backup storage, systems where rebuild times are measured in days), write performance is rarely the priority. RAID 6 is the standard for arrays of 8 or more disks, where the statistical likelihood of a second failure during rebuild becomes uncomfortably high.
RAID 10: Speed and Safety Combined
RAID 10 (also called RAID 1+0) combines mirroring and striping. Disks are organized into mirrored pairs, and data is striped across the pairs. This gives the read/write performance of RAID 0 with the redundancy of RAID 1 -- the array survives one failure per mirrored pair.
The capacity cost is 50%, the same as RAID 1, but the performance characteristics are superior to RAID 5 or 6, especially for random writes. There is no parity calculation, no write penalty, and rebuilds are fast because only the failed disk's mirror needs to be copied (not the entire array recomputed). RAID 10 is the standard choice for database servers, virtual machine hosts, and any workload that demands both high random I/O and reliability.
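The placement logic behind RAID 10 is simple enough to show directly: striping selects a mirrored pair, and mirroring writes the chunk to both disks in that pair. A sketch, assuming disks are numbered so that pair k holds disks 2k and 2k+1 (the function and numbering are illustrative):

```python
# Sketch of RAID 10 block placement: logical chunks are striped across
# mirrored pairs, and each chunk lands on both disks in its pair.
def raid10_targets(chunk_index: int, num_pairs: int):
    """Return (disk_a, disk_b, offset) for a logical chunk."""
    pair = chunk_index % num_pairs          # striping picks the pair
    offset = chunk_index // num_pairs       # position within each disk
    return 2 * pair, 2 * pair + 1, offset   # mirroring writes both disks

# Four disks = two pairs: chunk 0 -> disks 0 and 1, chunk 1 -> disks 2 and 3.
```

Because every write goes to exactly two disks with no parity arithmetic, there is nothing to read back first -- which is why RAID 10 avoids the RAID 5/6 write penalty entirely.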
RAID Level Comparison
| Property | RAID 0 | RAID 1 | RAID 5 | RAID 6 | RAID 10 |
|---|---|---|---|---|---|
| Minimum disks | 2 | 2 | 3 | 4 | 4 |
| Usable capacity | 100% | 50% | (N-1)/N | (N-2)/N | 50% |
| Read performance | Excellent | Good | Good | Good | Excellent |
| Write performance | Excellent | Fair | Poor (write penalty) | Poorest | Good |
| Disk failures tolerated | 0 | N-1 | 1 | 2 | 1 per pair |
| Rebuild speed | N/A | Fast | Slow (full parity rebuild) | Slower | Fast (mirror copy) |
Capacity Example
For an array of four 2 TB disks (8 TB raw), usable storage works out as follows:
| RAID Level | Usable | Efficiency | Fault Tolerance |
|---|---|---|---|
| RAID 0 | 8 TB | 100% | 0 disks |
| RAID 1 | 2 TB | 25% | 3 disks |
| RAID 5 | 6 TB | 75% | 1 disk |
| RAID 6 | 4 TB | 50% | 2 disks |
| RAID 10 | 4 TB | 50% | 1-2 disks |
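These figures follow directly from the capacity formulas. A small calculator, assuming n identical disks of disk_tb terabytes each (the function name is hypothetical):

```python
# Usable-capacity calculator for common RAID levels, assuming n
# identical disks of disk_tb terabytes each.
def usable_tb(level: str, n: int, disk_tb: float) -> float:
    if level == "raid0":
        return n * disk_tb            # no redundancy
    if level == "raid1":
        return disk_tb                # n-way mirror keeps one disk's worth
    if level == "raid5":
        return (n - 1) * disk_tb      # one disk's worth of parity
    if level == "raid6":
        return (n - 2) * disk_tb      # two disks' worth of parity
    if level == "raid10":
        return n // 2 * disk_tb       # half the disks hold mirror copies
    raise ValueError(f"unknown level: {level}")

# Four 2 TB disks:
# raid0 -> 8.0, raid1 -> 2.0, raid5 -> 6.0, raid6 -> 4.0, raid10 -> 4.0
```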
Hardware RAID vs Software RAID
RAID can be implemented either in hardware (a dedicated controller card with its own processor and memory) or in software (the operating system kernel manages the array using the system CPU).
Hardware RAID was historically preferred because early CPUs could not handle parity calculations without impacting application performance. Modern hardware controllers also include battery-backed write caches, which allow safe write-back caching -- the controller acknowledges writes immediately, buffers them in battery-protected RAM, and flushes to disk later. This hides the RAID 5/6 write penalty from applications without risking data loss on power failure.
Software RAID (Linux mdadm) has become the dominant choice for most deployments. Modern CPUs handle parity calculations with negligible overhead, and software RAID offers significant advantages: no vendor lock-in, easy migration between systems, full visibility into array state, and zero cost. The main disadvantage is the lack of battery-backed cache, making write-back caching unsafe without a UPS.
| Factor | Software RAID | Hardware RAID |
|---|---|---|
| Cost | Free | Hundreds to thousands of dollars |
| Write cache safety | Needs UPS | Battery-backed |
| Portability | Move disks to any Linux system | Requires same controller model |
| Controller failure | No single point of failure | Controller death can orphan data |
| CPU overhead | Minimal on modern hardware | Zero |
| Visibility | Full kernel-level monitoring | Proprietary tools required |
Filesystem-Integrated RAID
Modern copy-on-write filesystems like Btrfs and ZFS integrate RAID functionality directly, offering capabilities that traditional RAID cannot match.
ZFS RAID-Z eliminates the "write hole" problem that plagues traditional RAID 5 -- a scenario where a power failure during a write leaves parity inconsistent with data. RAID-Z also provides end-to-end checksumming and self-healing: if a read detects a checksum mismatch, ZFS automatically reconstructs the correct data from parity and repairs the corrupted copy. The traditional tradeoff is that RAID-Z vdevs cannot be grown one disk at a time; you must add an entirely new vdev (single-disk RAID-Z expansion only arrived with OpenZFS 2.3).
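The self-healing idea can be sketched in miniature: verify each copy against a stored checksum, serve the first good copy, and overwrite any bad ones. This is a toy model in the spirit of ZFS mirrors, not its actual implementation:

```python
# Toy model of checksum-based self-healing on a mirrored read,
# in the spirit of ZFS (not its actual implementation).
import hashlib

def read_with_repair(copies, checksum):
    """Return the first copy whose checksum matches, repairing bad copies."""
    good = None
    for data in copies:
        if hashlib.sha256(data).hexdigest() == checksum:
            good = data
            break
    if good is None:
        raise IOError("all copies corrupt; restore from backup")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good   # overwrite the corrupted copy in place
    return good
```

The key difference from traditional RAID is the independent checksum: a plain mirror can tell that two copies disagree, but without a checksum it cannot tell which one is right.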
Btrfs RAID offers flexible data and metadata redundancy, allowing different RAID levels for data versus metadata on the same volume. It supports online reshaping (adding and removing disks from a live filesystem) and per-file checksumming. However, Btrfs RAID 5/6 implementations remain unstable as of 2025 and should not be used for production data.
| Feature | mdadm | Btrfs RAID | ZFS RAID-Z |
|---|---|---|---|
| Data checksums | No | Yes | Yes |
| Self-healing | No | Yes (RAID 1) | Yes |
| Write hole protection | No | No (RAID 5/6) | Yes |
| Stability | Excellent | RAID 1 stable; 5/6 unstable | Excellent |
| Expansion flexibility | Easy | Easy | Must add full vdev |
| Memory requirements | Low | Low-Medium | High (ARC cache) |
See Btrfs and ZFS for deeper exploration of these filesystem-integrated approaches.
Performance Tuning Concepts
Two parameters have the greatest impact on RAID array performance:
Chunk size determines how much data is written to each disk before moving to the next. Large chunks (256KB-1MB) favor sequential workloads like video streaming, because each disk handles large contiguous reads. Small chunks (64KB) favor random I/O workloads by distributing small operations more evenly across disks.
Filesystem alignment ensures that the filesystem's allocation units line up with the RAID stripe boundaries. Misaligned writes can straddle two stripes, doubling the I/O required. Modern mkfs tools detect RAID geometry automatically, but manual configuration is sometimes necessary for optimal results.
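The effect of chunk size is easiest to see in the address mapping itself. A sketch for a plain striped (RAID 0) layout with no parity rotation (the function is illustrative; real arrays add parity placement on top of this):

```python
# How chunk size maps a logical byte offset onto a striped array,
# assuming a simple RAID 0 layout with no parity rotation.
def locate(offset: int, chunk_size: int, num_disks: int):
    """Return (disk, byte offset on that disk) for a logical offset."""
    chunk = offset // chunk_size
    disk = chunk % num_disks
    disk_offset = (chunk // num_disks) * chunk_size + offset % chunk_size
    return disk, disk_offset

# With 64 KiB chunks on 4 disks, a 256 KiB sequential write touches all
# four disks in parallel; a 4 KiB random write lands entirely on one disk.
```

This is also why alignment matters: a write that starts mid-chunk can straddle a stripe boundary and touch more disks than necessary.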
Common RAID Myths Debunked
"RAID is a backup." This is the most dangerous misconception in storage administration. RAID protects against hardware failure -- nothing more. It does not protect against accidental deletion, ransomware, software corruption, fire, theft, or any scenario that affects all disks simultaneously. Always maintain separate backups, preferably offsite.
"RAID 5 is dead for large disks." This is partially true. Rebuilding a large drive takes many hours, during which a second failure would be catastrophic. However, with proactive monitoring, hot spares that trigger automatic rebuild, and regular scrubbing to detect latent errors early, RAID 5 remains viable for moderate-sized arrays. For drives larger than 2TB or arrays with more than 6 disks, RAID 6 is the more conservative choice.
"Hardware RAID is always better." On modern systems, this is rarely true. Software RAID on a modern CPU matches or exceeds hardware RAID performance for most workloads. Hardware RAID's only clear advantage is battery-backed write caching for write-intensive RAID 5/6 workloads.
Choosing the Right RAID Level
| Scenario | Best RAID Level | Rationale |
|---|---|---|
| Gaming PC | RAID 0 | Maximum speed; games can be reinstalled |
| Boot drive | RAID 1 | Simple redundancy, fast recovery |
| Home NAS | RAID 5 or 6 | Good capacity-to-protection ratio |
| Web server | RAID 10 | Fast reads, reliable under load |
| Database server | RAID 10 | Fast random I/O, no write penalty |
| Backup/archive storage | RAID 6 | Maximum protection; write speed not critical |
| Video editing workspace | RAID 0 + separate backup | Speed for active projects, backup for safety |
Best Practices
- RAID is not backup. Maintain offsite backups independent of your RAID array. No RAID level protects against ransomware, accidental deletion, or fire.
- Monitor continuously. A degraded array running on borrowed time looks perfectly healthy to users. Automated monitoring with email alerts ensures failed disks are replaced before a second failure strikes.
- Keep hot spares ready. A hot spare begins rebuilding automatically the moment a disk fails, minimizing the window of vulnerability.
- Schedule regular scrubs. Periodic scrubbing reads every block in the array and verifies parity consistency, catching latent "bit rot" errors before they compound into data loss.
- Match your disks. Use drives with identical specifications (capacity, RPM, cache size) from the same product line. Mismatched drives cause the entire array to perform at the speed of the slowest member.
- Test your recovery procedure. Simulate a disk failure and verify that you can rebuild the array before you face a real emergency. Untested recovery plans are not plans.
