
RAID: Redundant Arrays for Speed and Safety

RAID storage visualized: RAID 0, 1, 5, 6, and 10 levels explained. Learn how they work, when to use them, and how to recover from disk failures.

18 min read | linux · storage · raid

Why RAID Matters

Every hard drive will eventually fail. The question is not whether, but when -- and whether your system can survive it. A typical consumer drive lasts only about three to five years under continuous load. In a data center with thousands of drives, failures are a daily occurrence. RAID (Redundant Array of Independent Disks) was invented to solve this fundamental problem: how do you build reliable storage out of unreliable components?

But RAID is not just about survival. By spreading data across multiple drives, RAID can also multiply read and write throughput far beyond what any single drive delivers. The genius of RAID is that it offers a spectrum of tradeoffs -- from pure speed with zero protection, to bulletproof redundancy that survives multiple simultaneous failures -- and lets you choose the balance that fits your needs.

Interactive RAID Visualization

Explore different RAID levels below. Click on disks to simulate failures and watch how each level handles the loss differently:

RAID Level Explorer

[Interactive demo: select a RAID level and click a disk to simulate its failure.]

RAID 0: Striping

Data striped across disks for maximum speed.

Performance: ⚡⚡⚡⚡⚡ · Reliability: 💀 (none) · Capacity: 100%

Advantages
  • 2x speed
  • Full capacity
  • Simple setup
Disadvantages
  • No redundancy
  • Total data loss if any disk fails
  • Not for critical data

RAID Levels Comparison

Level    Min Disks  Capacity  Redundancy  Read Speed  Write Speed  Use Case
RAID 0   2          100%      None        Excellent   Excellent    Video editing, gaming
RAID 1   2          50%       1 disk      Good        Normal       OS drives, critical data
RAID 5   3          66-94%    1 disk      Good        Slow         File servers, NAS
RAID 6   4          50-88%    2 disks     Good        Slower       Large arrays, archives
RAID 10  4          50%       1-2 disks   Excellent   Good         Databases, VMs

Understanding RAID Operations

Write Operations

RAID 0: Data split and written simultaneously to all disks

RAID 1: Same data written to all disks (mirrors)

RAID 5/6: Data + calculated parity written across disks

RAID 10: Data striped across mirror pairs

Failure Recovery

RAID 0: No recovery - total data loss

RAID 1: Read from surviving mirror

RAID 5: Reconstruct from data + parity

RAID 6: Can survive 2 disk failures

The Core Concepts: Striping, Mirroring, and Parity

Every RAID level is built from combinations of three fundamental techniques:

Striping splits data across multiple disks so that reads and writes happen in parallel. Think of it like distributing a deck of cards across several players -- dealing goes much faster than handing the whole deck to one person. Striping multiplies throughput but provides no protection: lose one disk, lose everything.

Mirroring writes identical copies of data to two or more disks. It is the simplest form of redundancy: if one disk dies, the other has a perfect copy. The cost is capacity -- you get only half of your total disk space. The benefit is instant recovery with no computation required.

Parity is the mathematical trick that makes RAID 5 and 6 possible. Using XOR operations, the array calculates a parity value from the data blocks. If any single disk is lost, its contents can be reconstructed from the remaining data and the parity. This provides redundancy without the 50% capacity penalty of mirroring.
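The three techniques can be sketched in a few lines of Python. This is a toy model (byte strings stand in for disks; the chunk size and disk counts are illustrative), not how a real RAID driver is written:

```python
from functools import reduce

def stripe(data: bytes, disks: int, chunk: int = 4) -> list[bytes]:
    """Striping: deal fixed-size chunks round-robin across disks."""
    out = [bytearray() for _ in range(disks)]
    for i in range(0, len(data), chunk):
        out[(i // chunk) % disks] += data[i:i + chunk]
    return [bytes(d) for d in out]

def mirror(data: bytes, copies: int = 2) -> list[bytes]:
    """Mirroring: every disk holds an identical copy of the data."""
    return [data] * copies

def parity(blocks: list[bytes]) -> bytes:
    """Parity: XOR the blocks byte by byte. Any single lost block can be
    rebuilt by XOR-ing the parity with all surviving blocks."""
    return bytes(reduce(int.__xor__, col) for col in zip(*blocks))

print(stripe(b"ABCDEFGHIJKL", disks=3))   # chunks dealt across three disks
print(parity([b"\xd6", b"\xab"]).hex())   # 214 ^ 171 = 125 = 0x7d
```

Note how `parity` needs no knowledge of which block might fail later; the XOR of all blocks is enough to rebuild any one of them.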

Understanding RAID Levels

RAID 0: Pure Speed, Zero Safety

RAID 0 stripes data across all disks with no redundancy whatsoever. Every disk contributes its full capacity, and read/write speeds scale nearly linearly with the number of disks. Two disks give roughly double the throughput; four disks give quadruple.

The fatal flaw is reliability. Because data is spread across all disks, losing any single disk destroys the entire array. With N disks, you have N times the probability of catastrophic failure compared to a single drive. RAID 0 is appropriate only for data that can be easily regenerated -- video editing scratch space, game installations, temporary processing buffers.
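The compounding risk is simple probability. A minimal sketch, assuming independent failures (which real drives in one chassis only approximate):

```python
def raid0_survival(p_disk: float, n_disks: int) -> float:
    """A RAID 0 array survives only if every member disk survives."""
    return p_disk ** n_disks

# If each disk has a 95% chance of surviving a given year:
for n in (1, 2, 4, 8):
    print(f"{n} disk(s): {raid0_survival(0.95, n):.1%} chance the array survives")
```

Even at four disks the array's odds drop to roughly 81% per year, which is why RAID 0 is reserved for regenerable data.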

RAID 1: The Mirror

RAID 1 writes every block to two (or more) disks simultaneously. Capacity equals that of a single disk regardless of how many mirrors you use, but read performance improves because the array can serve different read requests from different disks in parallel.

RAID 1 is the simplest redundancy scheme and the fastest to recover from. When a disk fails, the remaining mirror continues operating at full speed with no degradation. Rebuilding after replacement is a straightforward block-for-block copy. This simplicity makes RAID 1 the standard choice for boot drives and small critical systems where fast recovery matters more than capacity efficiency.

RAID 5: Distributed Parity

RAID 5 stripes data across all disks and distributes parity blocks among them. With N disks, you get (N-1) disks worth of usable capacity, and the array survives any single disk failure. The parity blocks rotate across all disks to prevent any one disk from becoming a bottleneck.

The critical concept is the XOR parity calculation. Given data blocks A and B, the parity P equals A XOR B. If disk B is lost, its contents are recovered by computing A XOR P -- the missing value falls out of the equation. Try it yourself:

XOR Parity Calculator

[Interactive RAID 5 parity calculation and recovery demo, with a "simulate disk failure" control.]

Example with two data disks and one parity disk:

Disk 0 (Data A):  11010110  (decimal 214)
Disk 1 (Data B):  10101011  (decimal 171)
Disk 2 (Parity):  01111101  (decimal 125)

Bit-by-bit XOR operation:

A:  1 1 0 1 0 1 1 0
B:  1 0 1 0 1 0 1 1
P:  0 1 1 1 1 1 0 1

XOR rule: same bits → 0, different bits → 1.

Why XOR works for RAID:

  • Reversible: A ⊕ B = P means A = B ⊕ P
  • Commutative: A ⊕ B = B ⊕ A
  • Associative: (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C)
  • Fast: a CPU performs XOR in a single cycle

RAID 5 uses distributed parity: XOR values rotate across all disks to avoid bottlenecks.
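The same demonstration takes only a few lines of Python, using the values from the example above:

```python
a = 0b11010110          # Data A = 214 (disk 0)
b = 0b10101011          # Data B = 171 (disk 1)
p = a ^ b               # Parity  = 125 (disk 2)
assert p == 0b01111101 == 125

# Disk 1 fails: rebuild B from the surviving data and the parity.
recovered = a ^ p
assert recovered == b == 171
print("recovered Data B:", recovered)
```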

RAID 5 has an important weakness for write-heavy workloads: the write penalty. Every write to the array requires four I/O operations -- read old data, read old parity, write new data, write new parity. This makes RAID 5 roughly 4x slower for random writes compared to a single disk. It excels for read-heavy workloads like file servers and media streaming, but struggles under database transaction loads.
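The update rule behind the penalty is worth seeing: because XOR is reversible, the new parity is computed from the old data, old parity, and new data alone, without reading the other data disks. A sketch with single-byte blocks:

```python
def raid5_small_write(old_data: int, old_parity: int, new_data: int) -> tuple[int, int]:
    """Read-modify-write: two reads (old data, old parity) supplied by the
    caller, then two writes (new data, new parity) -- four I/Os per small write."""
    new_parity = old_parity ^ old_data ^ new_data   # cancel old data, fold in new
    return new_data, new_parity

# Stripe with data A, B and parity P = A ^ B; update B without touching A.
a, b, b_new = 214, 171, 99
p = a ^ b
_, p_new = raid5_small_write(b, p, b_new)
assert p_new == a ^ b_new   # parity stays consistent with the full stripe
```

A full-stripe write avoids the penalty entirely, which is why large sequential writes on RAID 5 are much cheaper than small random ones.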

Another concern is rebuild risk with large modern drives. Rebuilding a failed 8TB drive in a RAID 5 array takes many hours, during which a second failure would destroy all data. For arrays using drives larger than 2TB, RAID 6 is the safer choice.

RAID 6: Double Parity

RAID 6 extends RAID 5 with a second, independent parity calculation (typically using Reed-Solomon coding rather than simple XOR). This allows the array to survive any two simultaneous disk failures, at the cost of losing two disks worth of capacity instead of one.

The write penalty is even steeper -- six I/O operations per write instead of four -- but for the scenarios where RAID 6 is used (large archival arrays, backup storage, systems where rebuild times are measured in days), write performance is rarely the priority. RAID 6 is the standard for arrays of 8 or more disks, where the statistical likelihood of a second failure during rebuild becomes uncomfortably high.

RAID 10: Speed and Safety Combined

RAID 10 (also called RAID 1+0) combines mirroring and striping. Disks are organized into mirrored pairs, and data is striped across the pairs. This gives the read/write performance of RAID 0 with the redundancy of RAID 1 -- the array survives one failure per mirrored pair.

The capacity cost is 50%, the same as RAID 1, but the performance characteristics are superior to RAID 5 or 6, especially for random writes. There is no parity calculation, no write penalty, and rebuilds are fast because only the failed disk's mirror needs to be copied (not the entire array recomputed). RAID 10 is the standard choice for database servers, virtual machine hosts, and any workload that demands both high random I/O and reliability.
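The layout is easy to model: logical blocks stripe round-robin across the mirror pairs, and each pair holds two identical copies. The disk numbering here is illustrative:

```python
def raid10_disks_for_block(block: int, n_pairs: int) -> tuple[int, int]:
    """Return the two physical disks holding a logical block.
    Pairs are (0,1), (2,3), ...; blocks stripe round-robin across pairs."""
    pair = block % n_pairs
    return (2 * pair, 2 * pair + 1)

# A 4-disk RAID 10 has two mirror pairs:
assert raid10_disks_for_block(0, 2) == (0, 1)
assert raid10_disks_for_block(1, 2) == (2, 3)
assert raid10_disks_for_block(2, 2) == (0, 1)   # striping wraps around
```

Losing one disk from each pair is survivable; losing both disks of the same pair is not, which is the "1 per pair" tolerance in the tables.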

RAID Level Comparison

Property                 RAID 0     RAID 1  RAID 5                RAID 6   RAID 10
Minimum disks            2          2       3                     4        4
Usable capacity          100%       50%     (N-1)/N               (N-2)/N  50%
Read performance         Excellent  Good    Good                  Good     Excellent
Write performance        Excellent  Fair    Poor (write penalty)  Poorest  Good
Disk failures tolerated  0          N-1     1                     2        1 per pair
Rebuild speed            N/A        Fast    Slow (parity rebuild) Slower   Fast (mirror copy)

Interactive Capacity Calculator

RAID Capacity Calculator

[Interactive demo: compare usable storage across RAID configurations, with adjustable disk count and disk size from 1 TB to 20 TB.]

Worked example: 4 disks of 2 TB each (8 TB raw) configured as RAID 5 with distributed parity:

Usable capacity:  6 TB   (formula: (4 - 1) × 2 TB = 6 TB)
Efficiency:       75%
Fault tolerance:  1 disk
Overhead:         2 TB   (one disk's worth of distributed parity)

The same four 2 TB disks under each level (RAID 1 here mirrors across all four disks):

RAID Level  Usable  Efficiency  Fault Tolerance
RAID 0      8 TB    100%        0 disks
RAID 1      2 TB    25%         3 disks
RAID 5      6 TB    75%         1 disk
RAID 6      4 TB    50%         2 disks
RAID 10     4 TB    50%         up to 2 (1 per pair)

Actual capacity may vary slightly due to filesystem overhead and disk formatting.
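The calculator's arithmetic is straightforward to reproduce. A sketch of the per-level formulas (RAID 1 modeled as one mirror set spanning all disks, matching the table above):

```python
def usable_tb(level: str, n_disks: int, disk_tb: float) -> float:
    """Usable capacity for n equal disks of disk_tb terabytes each."""
    formulas = {
        "RAID 0":  n_disks * disk_tb,         # every disk holds data
        "RAID 1":  disk_tb,                   # one copy, the rest are mirrors
        "RAID 5":  (n_disks - 1) * disk_tb,   # one disk of distributed parity
        "RAID 6":  (n_disks - 2) * disk_tb,   # two disks of parity
        "RAID 10": (n_disks // 2) * disk_tb,  # half the disks are mirror copies
    }
    return formulas[level]

for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
    print(f"{level}: {usable_tb(level, 4, 2.0):g} TB usable of 8 TB raw")
```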

Hardware RAID vs Software RAID

RAID can be implemented either in hardware (a dedicated controller card with its own processor and memory) or in software (the operating system kernel manages the array using the system CPU).

Hardware RAID was historically preferred because early CPUs could not handle parity calculations without impacting application performance. Modern hardware controllers also include battery-backed write caches, which allow safe write-back caching -- the controller acknowledges writes immediately, buffers them in battery-protected RAM, and flushes to disk later. This eliminates the write penalty for RAID 5/6 without risking data loss.

Software RAID (Linux mdadm) has become the dominant choice for most deployments. Modern CPUs handle parity calculations with negligible overhead, and software RAID offers significant advantages: no vendor lock-in, easy migration between systems, full visibility into array state, and zero cost. The main disadvantage is the lack of battery-backed cache, making write-back caching unsafe without a UPS.

Factor              Software RAID                   Hardware RAID
Cost                Free                            Hundreds to thousands of dollars
Write cache safety  Needs UPS                       Battery-backed
Portability         Move disks to any Linux system  Requires same controller model
Controller failure  No single point of failure      Controller death can orphan data
CPU overhead        Minimal on modern hardware      Zero
Visibility          Full kernel-level monitoring    Proprietary tools required

Filesystem-Integrated RAID

Modern copy-on-write filesystems like Btrfs and ZFS integrate RAID functionality directly, offering capabilities that traditional RAID cannot match.

ZFS RAID-Z eliminates the "write hole" problem that plagues traditional RAID 5 -- a scenario where a power failure during a write leaves parity inconsistent with data. RAID-Z also provides end-to-end checksumming and self-healing: if a read detects a checksum mismatch, ZFS automatically reconstructs the correct data from parity and repairs the corrupted copy. The tradeoff is that RAID-Z vdevs cannot be expanded by adding individual disks; you must add an entirely new vdev.

Btrfs RAID offers flexible data and metadata redundancy, allowing different RAID levels for data versus metadata on the same volume. It supports online reshaping (adding and removing disks from a live filesystem) and per-file checksumming. However, Btrfs RAID 5/6 implementations remain unstable as of 2025 and should not be used for production data.

Feature                mdadm      Btrfs RAID                   ZFS RAID-Z
Data checksums         No         Yes                          Yes
Self-healing           No         Yes (RAID 1)                 Yes
Write hole protection  No         No (RAID 5/6)                Yes
Stability              Excellent  RAID 1 stable; 5/6 unstable  Excellent
Expansion flexibility  Easy       Easy                         Must add full vdev
Memory requirements    Low        Low-Medium                   High (ARC cache)

See Btrfs and ZFS for deeper exploration of these filesystem-integrated approaches.

Performance Tuning Concepts

Two parameters have the greatest impact on RAID array performance:

Chunk size determines how much data is written to each disk before moving to the next. Large chunks (256KB-1MB) favor sequential workloads like video streaming, because each disk handles large contiguous reads. Small chunks (64KB) favor random I/O workloads by distributing small operations more evenly across disks.

Filesystem alignment ensures that the filesystem's allocation units line up with the RAID stripe boundaries. Misaligned writes can straddle two stripes, doubling the I/O required. Modern mkfs tools detect RAID geometry automatically, but manual configuration is sometimes necessary for optimal results.
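Alignment can be checked arithmetically. For RAID 5, the stripe width is the chunk size times the number of data disks, and a write straddles stripes whenever it crosses a stripe boundary. A sketch with sizes in bytes (the 64 KiB chunk and 4-disk array are illustrative):

```python
def stripe_width(chunk: int, n_disks: int, parity_disks: int = 1) -> int:
    """Bytes per full stripe: chunk size times the number of data disks."""
    return chunk * (n_disks - parity_disks)

def stripes_touched(offset: int, length: int, width: int) -> int:
    """How many stripes a write at `offset` spanning `length` bytes crosses."""
    return (offset + length - 1) // width - offset // width + 1

width = stripe_width(chunk=64 * 1024, n_disks=4)              # 192 KiB stripes
assert stripes_touched(0, 64 * 1024, width) == 1              # aligned write
assert stripes_touched(width - 4096, 64 * 1024, width) == 2   # straddles two
```

The misaligned write touches two stripes and therefore pays the read-modify-write cost twice, which is exactly the doubling of I/O described above.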

Common RAID Myths Debunked

"RAID is a backup." This is the most dangerous misconception in storage administration. RAID protects against hardware failure -- nothing more. It does not protect against accidental deletion, ransomware, software corruption, fire, theft, or any scenario that affects all disks simultaneously. Always maintain separate backups, preferably offsite.

"RAID 5 is dead for large disks." This is partially true. Rebuilding a large drive takes many hours, during which a second failure would be catastrophic. However, with proactive monitoring, hot spares that trigger automatic rebuild, and regular scrubbing to detect latent errors early, RAID 5 remains viable for moderate-sized arrays. For drives larger than 2TB or arrays with more than 6 disks, RAID 6 is the more conservative choice.

"Hardware RAID is always better." On modern systems, this is rarely true. Software RAID on a modern CPU matches or exceeds hardware RAID performance for most workloads. Hardware RAID's only clear advantage is battery-backed write caching for write-intensive RAID 5/6 workloads.

Choosing the Right RAID Level

Scenario                 Best RAID Level           Rationale
Gaming PC                RAID 0                    Maximum speed; games can be reinstalled
Boot drive               RAID 1                    Simple redundancy, fast recovery
Home NAS                 RAID 5 or 6               Good capacity-to-protection ratio
Web server               RAID 10                   Fast reads, reliable under load
Database server          RAID 10                   Fast random I/O, no write penalty
Backup/archive storage   RAID 6                    Maximum protection; write speed not critical
Video editing workspace  RAID 0 + separate backup  Speed for active projects, backup for safety

Best Practices

  1. RAID is not backup. Maintain offsite backups independent of your RAID array. No RAID level protects against ransomware, accidental deletion, or fire.

  2. Monitor continuously. A degraded array running on borrowed time looks perfectly healthy to users. Automated monitoring with email alerts ensures failed disks are replaced before a second failure strikes.

  3. Keep hot spares ready. A hot spare begins rebuilding automatically the moment a disk fails, minimizing the window of vulnerability.

  4. Schedule regular scrubs. Periodic scrubbing reads every block in the array and verifies parity consistency, catching latent "bit rot" errors before they compound into data loss.

  5. Match your disks. Use drives with identical specifications (capacity, RPM, cache size) from the same product line. Mismatched drives cause the entire array to perform at the speed of the slowest member.

  6. Test your recovery procedure. Simulate a disk failure and verify that you can rebuild the array before you face a real emergency. Untested recovery plans are not plans.

Back to Filesystems Overview
