Why RAID Matters
Every hard drive will eventually fail. The question is not whether, but when -- and whether your system can survive it. A single consumer drive typically lasts roughly 3-5 years under continuous load. In a data center with thousands of drives, failures are a daily occurrence. RAID (Redundant Array of Independent Disks) was invented to solve this fundamental problem: how do you build reliable storage out of unreliable components?
But RAID is not just about survival. By spreading data across multiple drives, RAID can also multiply read and write throughput far beyond what any single drive delivers. The genius of RAID is that it offers a spectrum of tradeoffs -- from pure speed with zero protection, to bulletproof redundancy that survives multiple simultaneous failures -- and lets you choose the balance that fits your needs.
RAID Levels Comparison
| Level | Min Disks | Capacity | Redundancy | Read Speed | Write Speed | Use Case |
|---|---|---|---|---|---|---|
| RAID 0 | 2 | 100% | None | Excellent | Excellent | Video editing, gaming |
| RAID 1 | 2 | 50% | 1 disk | Good | Normal | OS drives, critical data |
| RAID 5 | 3 | 66-94% | 1 disk | Good | Slow | File servers, NAS |
| RAID 6 | 4 | 50-88% | 2 disks | Good | Slower | Large arrays, archives |
| RAID 10 | 4 | 50% | 1-2 disks | Excellent | Good | Databases, VMs |
Understanding RAID Operations
Write Operations
RAID 0: Data split and written simultaneously to all disks
RAID 1: Same data written to all disks (mirrors)
RAID 5/6: Data + calculated parity written across disks
RAID 10: Data striped across mirror pairs
Failure Recovery
RAID 0: No recovery - total data loss
RAID 1: Read from surviving mirror
RAID 5: Reconstruct from data + parity
RAID 6: Can survive 2 disk failures
The Core Concepts: Striping, Mirroring, and Parity
Every RAID level is built from combinations of three fundamental techniques:
Striping splits data across multiple disks so that reads and writes happen in parallel. Think of it like distributing a deck of cards across several players -- dealing goes much faster than handing the whole deck to one person. Striping multiplies throughput but provides no protection: lose one disk, lose everything.
Mirroring writes identical copies of data to two or more disks. It is the simplest form of redundancy: if one disk dies, the other has a perfect copy. The cost is capacity -- you get only half of your total disk space. The benefit is instant recovery with no computation required.
Parity is the mathematical trick that makes RAID 5 and 6 possible. Using XOR operations, the array calculates a parity value from the data blocks. If any single disk is lost, its contents can be reconstructed from the remaining data and the parity. This provides redundancy without the 50% capacity penalty of mirroring.
Understanding RAID Levels
RAID 0: Pure Speed, Zero Safety
RAID 0 stripes data across all disks with no redundancy whatsoever. Every disk contributes its full capacity, and read/write speeds scale nearly linearly with the number of disks. Two disks give roughly double the throughput; four disks give quadruple.
The fatal flaw is reliability. Because data is spread across all disks, losing any single disk destroys the entire array. With N disks, you have roughly N times the probability of catastrophic failure compared to a single drive. RAID 0 is appropriate only for data that can be easily regenerated -- video editing scratch space, game installations, temporary processing buffers.
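The scaling of that risk can be sketched with a back-of-the-envelope model, assuming independent drive failures with a per-drive annual failure probability `p` (the function name is illustrative, not from any library):

```python
# Rough model of RAID 0 reliability, assuming independent drive
# failures with a per-drive annual failure probability p.
def raid0_annual_failure_probability(p: float, n: int) -> float:
    """Probability that a RAID 0 array of n disks loses data in a year."""
    # The array survives only if every disk survives.
    return 1 - (1 - p) ** n

# With a 3% annual failure rate per drive:
# 1 disk  -> 0.03
# 4 disks -> ~0.115, close to 4x the single-drive risk
```

For small failure rates the result is close to N times the single-drive probability, which is where the "N times the risk" rule of thumb comes from.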
RAID 1: The Mirror
RAID 1 writes every block to two (or more) disks simultaneously. Capacity equals that of a single disk regardless of how many mirrors you use, but read performance improves because the array can serve different read requests from different disks in parallel.
RAID 1 is the simplest redundancy scheme and the fastest to recover from. When a disk fails, the remaining mirror continues operating at full speed with no degradation. Rebuilding after replacement is a straightforward block-for-block copy. This simplicity makes RAID 1 the standard choice for boot drives and small critical systems where fast recovery matters more than capacity efficiency.
RAID 5: Distributed Parity
RAID 5 stripes data across all disks and distributes parity blocks among them. With N disks, you get (N-1) disks worth of usable capacity, and the array survives any single disk failure. The parity blocks rotate across all disks to prevent any one disk from becoming a bottleneck.
The critical concept is the XOR parity calculation. Given data blocks A and B, the parity P equals A XOR B. If disk B is lost, its contents are recovered by computing A XOR P -- the missing value falls out of the equation.
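A minimal sketch of this in code (the `xor_blocks` helper is hypothetical, written here just to demonstrate the property):

```python
# Minimal sketch of RAID 5 parity: XOR data blocks to get parity,
# then recover a lost block from the survivors.
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, group) for group in zip(*blocks))

data_a = b"\x0f\x33"
data_b = b"\xf0\x55"
parity = xor_blocks(data_a, data_b)        # P = A XOR B

# Disk B fails: its contents fall out of A XOR P.
assert xor_blocks(data_a, parity) == data_b
```

The same identity generalizes to any number of data disks: XOR all surviving blocks with the parity and the missing block is what remains.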
RAID 5 has an important weakness for write-heavy workloads: the write penalty. Every write to the array requires four I/O operations -- read old data, read old parity, write new data, write new parity. This makes RAID 5 roughly 4x slower for random writes compared to a single disk. It excels for read-heavy workloads like file servers and media streaming, but struggles under database transaction loads.
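The four operations are visible in the standard "read-modify-write" parity update: the new parity cancels the old data out of the old parity. A sketch, assuming disks are simple in-memory block lists (the function and layout are illustrative only):

```python
# Sketch of the RAID 5 small-write path ("read-modify-write").
# Each of the four I/Os is marked in a comment.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(disks, data_disk, parity_disk, stripe, new_data):
    old_data = disks[data_disk][stripe]        # I/O 1: read old data
    old_parity = disks[parity_disk][stripe]    # I/O 2: read old parity
    # New parity = old parity XOR old data XOR new data.
    new_parity = xor(xor(old_parity, old_data), new_data)
    disks[data_disk][stripe] = new_data        # I/O 3: write new data
    disks[parity_disk][stripe] = new_parity    # I/O 4: write new parity
```

Full-stripe writes avoid this penalty, because the array can compute parity from the new data alone -- one reason large sequential writes fare much better on RAID 5 than small random ones.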
Another concern is rebuild risk with large modern drives. Rebuilding a failed 8TB drive in a RAID 5 array takes many hours, during which a second failure would destroy all data. For arrays using drives larger than 2TB, RAID 6 is the safer choice.
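That rebuild-window risk can be estimated with a simple model, assuming independent failures at a constant annual failure rate (AFR). This ignores correlated failures and unrecoverable read errors, both of which make the real danger higher; the function is a hypothetical sketch, not a vendor formula:

```python
# Back-of-the-envelope estimate of the chance a second disk fails
# during a RAID 5 rebuild, assuming independent failures at a
# constant annual failure rate (AFR).
HOURS_PER_YEAR = 24 * 365

def second_failure_probability(afr: float, surviving_disks: int,
                               rebuild_hours: float) -> float:
    # Chance a single disk fails within the rebuild window.
    p_disk = 1 - (1 - afr) ** (rebuild_hours / HOURS_PER_YEAR)
    # Chance at least one of the surviving disks fails in that window.
    return 1 - (1 - p_disk) ** surviving_disks

# 3% AFR, 5 surviving disks, 24-hour rebuild: well under 1%, but
# nonzero -- and it grows with disk count and rebuild time.
```

The trend is the important part: doubling the rebuild time or the disk count roughly doubles the exposure, which is why large arrays of large drives push administrators toward RAID 6.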
RAID 6: Double Parity
RAID 6 extends RAID 5 with a second, independent parity calculation (typically using Reed-Solomon coding rather than simple XOR). This allows the array to survive any two simultaneous disk failures, at the cost of losing two disks worth of capacity instead of one.
The write penalty is even steeper -- six I/O operations per write instead of four -- but for the scenarios where RAID 6 is used (large archival arrays, backup storage, systems where rebuild times are measured in days), write performance is rarely the priority. RAID 6 is the standard for arrays of 8 or more disks, where the statistical likelihood of a second failure during rebuild becomes uncomfortably high.
RAID 10: Speed and Safety Combined
RAID 10 (also called RAID 1+0) combines mirroring and striping. Disks are organized into mirrored pairs, and data is striped across the pairs. This gives the read/write performance of RAID 0 with the redundancy of RAID 1 -- the array survives one failure per mirrored pair.
The capacity cost is 50%, the same as RAID 1, but the performance characteristics are superior to RAID 5 or 6, especially for random writes. There is no parity calculation, no write penalty, and rebuilds are fast because only the failed disk's mirror needs to be copied (not the entire array recomputed). RAID 10 is the standard choice for database servers, virtual machine hosts, and any workload that demands both high random I/O and reliability.
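The placement logic behind RAID 10 is simple enough to show directly: striping selects a mirrored pair, and mirroring writes the chunk to both disks in that pair. A sketch, assuming disks are numbered so that pair k holds disks 2k and 2k+1 (the function and numbering are illustrative):

```python
# Sketch of RAID 10 block placement: logical chunks are striped across
# mirrored pairs, and each chunk lands on both disks in its pair.
def raid10_targets(chunk_index: int, num_pairs: int):
    """Return (disk_a, disk_b, offset) for a logical chunk."""
    pair = chunk_index % num_pairs          # striping picks the pair
    offset = chunk_index // num_pairs       # position within each disk
    return 2 * pair, 2 * pair + 1, offset   # mirroring writes both disks

# Four disks = two pairs: chunk 0 -> disks 0 and 1, chunk 1 -> disks 2 and 3.
```

Because every write goes to exactly two disks with no parity arithmetic, there is nothing to read back first -- which is why RAID 10 avoids the RAID 5/6 write penalty entirely.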
RAID Level Comparison
| Property | RAID 0 | RAID 1 | RAID 5 | RAID 6 | RAID 10 |
|---|---|---|---|---|---|
| Minimum disks | 2 | 2 | 3 | 4 | 4 |
| Usable capacity | 100% | 50% | (N-1)/N | (N-2)/N | 50% |
| Read performance | Excellent | Good | Good | Good | Excellent |
| Write performance | Excellent | Fair | Poor (write penalty) | Poorest | Good |
| Disk failures tolerated | 0 | N-1 | 1 | 2 | 1 per pair |
| Rebuild speed | N/A | Fast | Slow (full parity rebuild) | Slower | Fast (mirror copy) |
Capacity Example
For an array of four 2 TB disks (8 TB raw), usable storage works out as follows:
| RAID Level | Usable | Efficiency | Fault Tolerance |
|---|---|---|---|
| RAID 0 | 8 TB | 100% | 0 disks |
| RAID 1 | 2 TB | 25% | 3 disks |
| RAID 5 | 6 TB | 75% | 1 disk |
| RAID 6 | 4 TB | 50% | 2 disks |
| RAID 10 | 4 TB | 50% | 1-2 disks |
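These figures follow directly from the capacity formulas. A small calculator, assuming n identical disks of disk_tb terabytes each (the function name is hypothetical):

```python
# Usable-capacity calculator for common RAID levels, assuming n
# identical disks of disk_tb terabytes each.
def usable_tb(level: str, n: int, disk_tb: float) -> float:
    if level == "raid0":
        return n * disk_tb            # no redundancy
    if level == "raid1":
        return disk_tb                # n-way mirror keeps one disk's worth
    if level == "raid5":
        return (n - 1) * disk_tb      # one disk's worth of parity
    if level == "raid6":
        return (n - 2) * disk_tb      # two disks' worth of parity
    if level == "raid10":
        return n // 2 * disk_tb       # half the disks hold mirror copies
    raise ValueError(f"unknown level: {level}")

# Four 2 TB disks:
# raid0 -> 8.0, raid1 -> 2.0, raid5 -> 6.0, raid6 -> 4.0, raid10 -> 4.0
```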
Hardware RAID vs Software RAID
RAID can be implemented either in hardware (a dedicated controller card with its own processor and memory) or in software (the operating system kernel manages the array using the system CPU).
Hardware RAID was historically preferred because early CPUs could not handle parity calculations without impacting application performance. Modern hardware controllers also include battery-backed write caches, which allow safe write-back caching -- the controller acknowledges writes immediately, buffers them in battery-protected RAM, and flushes to disk later. This hides the RAID 5/6 write penalty from applications without risking data loss on power failure.
Software RAID (Linux mdadm) has become the dominant choice for most deployments. Modern CPUs handle parity calculations with negligible overhead, and software RAID offers significant advantages: no vendor lock-in, easy migration between systems, full visibility into array state, and zero cost. The main disadvantage is the lack of battery-backed cache, making write-back caching unsafe without a UPS.
| Factor | Software RAID | Hardware RAID |
|---|---|---|
| Cost | Free | Hundreds to thousands of dollars |
| Write cache safety | Needs UPS | Battery-backed |
| Portability | Move disks to any Linux system | Requires same controller model |
| Controller failure | No single point of failure | Controller death can orphan data |
| CPU overhead | Minimal on modern hardware | Zero |
| Visibility | Full kernel-level monitoring | Proprietary tools required |
Filesystem-Integrated RAID
Modern copy-on-write filesystems like Btrfs and ZFS integrate RAID functionality directly, offering capabilities that traditional RAID cannot match.
ZFS RAID-Z eliminates the "write hole" problem that plagues traditional RAID 5 -- a scenario where a power failure during a write leaves parity inconsistent with data. RAID-Z also provides end-to-end checksumming and self-healing: if a read detects a checksum mismatch, ZFS automatically reconstructs the correct data from parity and repairs the corrupted copy. The traditional tradeoff is that RAID-Z vdevs cannot be grown one disk at a time; you must add an entirely new vdev (single-disk RAID-Z expansion only arrived with OpenZFS 2.3).
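The self-healing idea can be sketched in miniature: verify each copy against a stored checksum, serve the first good copy, and overwrite any bad ones. This is a toy model in the spirit of ZFS mirrors, not its actual implementation:

```python
# Toy model of checksum-based self-healing on a mirrored read,
# in the spirit of ZFS (not its actual implementation).
import hashlib

def read_with_repair(copies, checksum):
    """Return the first copy whose checksum matches, repairing bad copies."""
    good = None
    for data in copies:
        if hashlib.sha256(data).hexdigest() == checksum:
            good = data
            break
    if good is None:
        raise IOError("all copies corrupt; restore from backup")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good   # overwrite the corrupted copy in place
    return good
```

The key difference from traditional RAID is the independent checksum: a plain mirror can tell that two copies disagree, but without a checksum it cannot tell which one is right.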
Btrfs RAID offers flexible data and metadata redundancy, allowing different RAID levels for data versus metadata on the same volume. It supports online reshaping (adding and removing disks from a live filesystem) and per-file checksumming. However, Btrfs RAID 5/6 implementations remain unstable as of 2025 and should not be used for production data.
| Feature | mdadm | Btrfs RAID | ZFS RAID-Z |
|---|---|---|---|
| Data checksums | No | Yes | Yes |
| Self-healing | No | Yes (RAID 1) | Yes |
| Write hole protection | No | No (RAID 5/6) | Yes |
| Stability | Excellent | RAID 1 stable; 5/6 unstable | Excellent |
| Expansion flexibility | Easy | Easy | Must add full vdev |
| Memory requirements | Low | Low-Medium | High (ARC cache) |
See Btrfs and ZFS for deeper exploration of these filesystem-integrated approaches.
Performance Tuning Concepts
Two parameters have the greatest impact on RAID array performance:
Chunk size determines how much data is written to each disk before moving to the next. Large chunks (256KB-1MB) favor sequential workloads like video streaming, because each disk handles large contiguous reads. Small chunks (64KB) favor random I/O workloads by distributing small operations more evenly across disks.
Filesystem alignment ensures that the filesystem's allocation units line up with the RAID stripe boundaries. Misaligned writes can straddle two stripes, doubling the I/O required. Modern mkfs tools detect RAID geometry automatically, but manual configuration is sometimes necessary for optimal results.
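The effect of chunk size is easiest to see in the address mapping itself. A sketch for a plain striped (RAID 0) layout with no parity rotation (the function is illustrative; real arrays add parity placement on top of this):

```python
# How chunk size maps a logical byte offset onto a striped array,
# assuming a simple RAID 0 layout with no parity rotation.
def locate(offset: int, chunk_size: int, num_disks: int):
    """Return (disk, byte offset on that disk) for a logical offset."""
    chunk = offset // chunk_size
    disk = chunk % num_disks
    disk_offset = (chunk // num_disks) * chunk_size + offset % chunk_size
    return disk, disk_offset

# With 64 KiB chunks on 4 disks, a 256 KiB sequential write touches all
# four disks in parallel; a 4 KiB random write lands entirely on one disk.
```

This is also why alignment matters: a write that starts mid-chunk can straddle a stripe boundary and touch more disks than necessary.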
Common RAID Myths Debunked
"RAID is a backup." This is the most dangerous misconception in storage administration. RAID protects against hardware failure -- nothing more. It does not protect against accidental deletion, ransomware, software corruption, fire, theft, or any scenario that affects all disks simultaneously. Always maintain separate backups, preferably offsite.
"RAID 5 is dead for large disks." This is partially true. Rebuilding a large drive takes many hours, during which a second failure would be catastrophic. However, with proactive monitoring, hot spares that trigger automatic rebuild, and regular scrubbing to detect latent errors early, RAID 5 remains viable for moderate-sized arrays. For drives larger than 2TB or arrays with more than 6 disks, RAID 6 is the more conservative choice.
"Hardware RAID is always better." On modern systems, this is rarely true. Software RAID on a modern CPU matches or exceeds hardware RAID performance for most workloads. Hardware RAID's only clear advantage is battery-backed write caching for write-intensive RAID 5/6 workloads.
Choosing the Right RAID Level
| Scenario | Best RAID Level | Rationale |
|---|---|---|
| Gaming PC | RAID 0 | Maximum speed; games can be reinstalled |
| Boot drive | RAID 1 | Simple redundancy, fast recovery |
| Home NAS | RAID 5 or 6 | Good capacity-to-protection ratio |
| Web server | RAID 10 | Fast reads, reliable under load |
| Database server | RAID 10 | Fast random I/O, no write penalty |
| Backup/archive storage | RAID 6 | Maximum protection; write speed not critical |
| Video editing workspace | RAID 0 + separate backup | Speed for active projects, backup for safety |
Best Practices
- RAID is not backup. Maintain offsite backups independent of your RAID array. No RAID level protects against ransomware, accidental deletion, or fire.
- Monitor continuously. A degraded array running on borrowed time looks perfectly healthy to users. Automated monitoring with email alerts ensures failed disks are replaced before a second failure strikes.
- Keep hot spares ready. A hot spare begins rebuilding automatically the moment a disk fails, minimizing the window of vulnerability.
- Schedule regular scrubs. Periodic scrubbing reads every block in the array and verifies parity consistency, catching latent "bit rot" errors before they compound into data loss.
- Match your disks. Use drives with identical specifications (capacity, RPM, cache size) from the same product line. Mismatched drives cause the entire array to perform at the speed of the slowest member.
- Test your recovery procedure. Simulate a disk failure and verify that you can rebuild the array before you face a real emergency. Untested recovery plans are not plans.
