Copy-on-Write (CoW): Never Overwrite, Always Preserve

Understand Copy-on-Write (CoW) in Btrfs and ZFS. Learn how CoW enables instant snapshots, atomic writes, and data integrity.


The Traditional Problem: In-Place Updates

Traditional filesystems (ext4, XFS, FAT) use in-place updates:

  1. Read existing block
  2. Modify content
  3. Overwrite same block
  4. Old data gone forever

Problems:

  • Not atomic: Power failure = partially written block (corruption)
  • No history: Can't undo or snapshot without copying entire filesystem
  • Dangerous: One wrong write destroys data permanently

The Copy-on-Write Solution

Core Principle: Never modify data in place. Instead:

  1. Read existing block
  2. Allocate NEW block
  3. Write modified data to new block
  4. Update pointer (metadata)
  5. Old data remains untouched until no longer needed

Benefits:

  • Atomic writes: Either old state or new state (never corrupted)
  • Free snapshots: Old data already preserved!
  • Time travel: Keep references to old blocks = instant history
  • Data integrity: Never risk overwriting good data
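The five steps above can be sketched with a toy block store. This is a minimal illustration, not how any real filesystem is implemented: `BlockStore`, `allocate`, and `write_cow` are hypothetical names, and a Python dict stands in for the disk.

```python
class BlockStore:
    """Toy disk mapping block numbers to contents (hypothetical model)."""

    def __init__(self, first_free=100):
        self.blocks = {}
        self.next_free = first_free

    def allocate(self, data):
        """Write data into a fresh block and return its number."""
        number = self.next_free
        self.next_free += 1
        self.blocks[number] = data
        return number


def write_cow(store, pointers, index, data):
    """Return a new pointer list; no existing block is modified."""
    new_block = store.allocate(data)   # steps 2-3: allocate a new block, write to it
    new_pointers = list(pointers)
    new_pointers[index] = new_block    # step 4: update the pointer (metadata)
    return new_pointers                # step 5: the old block stays on disk


store = BlockStore()
old_pointers = [store.allocate(b"A"), store.allocate(b"B"), store.allocate(b"C")]
new_pointers = write_cow(store, old_pointers, 1, b"B-modified")

print(old_pointers, store.blocks[old_pointers[1]])   # old view still reads b"B"
print(new_pointers, store.blocks[new_pointers[1]])   # new view reads b"B-modified"
```

Anyone still holding `old_pointers` sees the file exactly as it was before the write, which is the property snapshots build on.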

How Copy-on-Write Works

Simple Write Operation: CoW in Action

[Interactive walkthrough: step 1 of 5 shown]

Initial State: File with 3 Blocks

    Disk block   Contents   Refs
    100          Block A    1
    101          Block B    1
    102          Block C    1

    file.txt
    Pointers: [100, 101, 102]
    Size: 12KB

  • File "file.txt" consists of 3 data blocks
  • Blocks 100, 101, 102 are stored on disk
  • Metadata points to these blocks
  • Each block has reference count = 1
  • The user wants to modify Block B (the middle block)

Key CoW Concepts

1. Write-Anywhere Allocation

Traditional: "Write block 1000 to sector 1000"
CoW: "Write data anywhere free, update pointer"

    Traditional (in-place):
      Block 1000: [old data] → [new data]   ❌ Old data lost

    CoW (write-anywhere):
      Block 1000: [old data]   ← still exists!
      Block 5280: [new data]   ← written here
      Pointer: 1000 → 5280     ✅ Old data preserved

2. Metadata Updates Are Key

CoW depends on atomic metadata updates:

  1. Allocate new block (5280)
  2. Write data to new block
  3. Update parent pointer: 1000 → 5280   ← Atomic!
  4. Old block (1000) now unreferenced

If a crash happens:

  • Before step 3: Old data still referenced (no change visible)
  • After step 3: New data referenced (change complete)
  • Never half-updated!
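The crash cases can be checked with a small simulation. This is a sketch under one stated assumption: the pointer update is modeled as a single Python assignment, standing in for whatever atomic commit a real filesystem uses. `SimulatedCrash`, `cow_update`, and `visible_after` are hypothetical names.

```python
class SimulatedCrash(Exception):
    """Models a power failure at a chosen point (hypothetical)."""


def cow_update(disk, new_data, crash_before_step=None):
    new_block = 5280                        # step 1: allocate a new block
    if crash_before_step == 2:
        raise SimulatedCrash
    disk["blocks"][new_block] = new_data    # step 2: write data to the new block
    if crash_before_step == 3:
        raise SimulatedCrash
    disk["pointer"] = new_block             # step 3: the single atomic pointer update


def visible_after(crash_before_step):
    """Run one update on a fresh disk and return what a reader sees afterwards."""
    disk = {"blocks": {1000: "old data"}, "pointer": 1000}
    try:
        cow_update(disk, "new data", crash_before_step)
    except SimulatedCrash:
        pass
    return disk["blocks"][disk["pointer"]]


results = {step: visible_after(step) for step in (2, 3, None)}
print(results)
```

Whichever step the crash interrupts, the reader sees either the complete old data or the complete new data, because nothing points at block 5280 until the final assignment.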

3. Reference Counting

Blocks are freed only when no references remain:

    Block 1000: refs=2 (original file + snapshot)
    Block 5280: refs=1 (only current file)

    Delete snapshot:
      Block 1000: refs=1 → Can't free yet
      Block 5280: refs=1 → Keep

    Delete file:
      Block 1000: refs=0 → NOW free!
      Block 5280: refs=0 → Free
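The same bookkeeping can be sketched in a few lines. This is an illustrative model only; `RefCountedStore`, `add_ref`, and `drop_ref` are hypothetical names, and real filesystems track references in on-disk structures rather than a dict.

```python
class RefCountedStore:
    """Tracks how many owners reference each block (hypothetical model)."""

    def __init__(self):
        self.refs = {}     # block number -> reference count
        self.freed = []    # blocks released, in order

    def add_ref(self, block):
        self.refs[block] = self.refs.get(block, 0) + 1

    def drop_ref(self, block):
        self.refs[block] -= 1
        if self.refs[block] == 0:      # last reference gone: block can be reused
            del self.refs[block]
            self.freed.append(block)


store = RefCountedStore()
current_file = [1000, 5280]   # the file references both blocks
snapshot = [1000]             # the snapshot shares only block 1000

for block in current_file + snapshot:
    store.add_ref(block)

for block in snapshot:        # delete the snapshot
    store.drop_ref(block)
refs_after_snapshot_delete = dict(store.refs)

for block in current_file:    # delete the file
    store.drop_ref(block)

print(refs_after_snapshot_delete, store.freed)
```

Deleting the snapshot frees nothing because the file still references block 1000; both blocks are released only when the file goes too.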

Snapshots: The Magic of CoW

With CoW, snapshots are nearly free:

Traditional (non-CoW) Snapshot:

    Copy entire filesystem: 100GB → 100GB copy
    Time: Minutes to hours
    Space: 200GB total

CoW Snapshot:

    1. Create new root pointer → same blocks
    2. Mark: "preserve current state"
    Time: Instant (milliseconds)
    Space: ≈0 bytes initially (metadata only)

After modifications:

  • Modified blocks: New copies created (CoW kicks in)
  • Unmodified blocks: Shared between original and snapshot
  • Space used = only changed data

Snapshot Space Efficiency

    Original: 100GB
    Snapshot: 0GB (just metadata)

    Modify 10GB:
      Original: points to 90GB old + 10GB new = 100GB
      Snapshot: points to 100GB old
      Total space: 110GB (not 200GB!)

    Efficiency: Only changed blocks duplicated
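The arithmetic can be reproduced by treating each gigabyte as one block. This is a rough sketch in which a snapshot is just a copy of the pointer list; `new_block` is a hypothetical allocator.

```python
import itertools

_block_ids = itertools.count()


def new_block():
    """Hand out a fresh block number (hypothetical allocator)."""
    return next(_block_ids)


original = [new_block() for _ in range(100)]   # 100 blocks, think 1GB each
snapshot = list(original)                      # snapshot: copy pointers, not data

for index in range(10):                        # modify 10GB of the original
    original[index] = new_block()              # CoW: new block; snapshot keeps the old one

shared = len(set(original) & set(snapshot))
total = len(set(original) | set(snapshot))
print(shared, total)
```

The two views share 90 blocks and occupy 110 in total, matching the 110GB figure above.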

Atomic Operations

CoW makes complex operations atomic:

Example: Rename Directory

Traditional filesystem:

  1. Update old parent: remove entry
  2. Update new parent: add entry
  3. Update directory: change ".." link
  ❌ Crash between steps = inconsistent directory tree, unless a journal protects the update

CoW filesystem:

  1. Create new metadata tree with changes
  2. Update root pointer (atomic!)
  ✅ Either all changes visible or none

Example: Database Transaction

  1. Write new data blocks (CoW)
  2. Write new index blocks (CoW)
  3. Write new metadata (CoW)
  4. Update root (atomic commit!)

  Crash before step 4: Old state intact
  Crash after step 4: New state complete
  Never inconsistent!
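The root-pointer commit used in both examples can be sketched with immutable-style dictionaries. This is a simplified model: `Filesystem` and `rename` are hypothetical names, nested dicts stand in for directory metadata, and assigning `fs.root` stands in for the atomic root update.

```python
class Filesystem:
    """Holds one mutable field: the pointer to the current root (hypothetical)."""

    def __init__(self, root):
        self.root = root


def rename(tree, src_dir, dst_dir, name):
    """Build a new tree with `name` moved; the input tree is left untouched."""
    new_src = {k: v for k, v in tree[src_dir].items() if k != name}
    new_dst = {**tree[dst_dir], name: tree[src_dir][name]}
    return {**tree, src_dir: new_src, dst_dir: new_dst}


fs = Filesystem({"a": {"report.txt": 100}, "b": {}})
old_root = fs.root

new_root = rename(fs.root, "a", "b", "report.txt")   # step 1: new metadata tree
# A crash at this point leaves fs.root pointing at old_root, fully intact.
fs.root = new_root                                   # step 2: atomic commit

print(old_root)
print(fs.root)
```

Both directory changes become visible in the same instant, and the old tree remains readable for anything that still references it.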

CoW in Different Filesystems

Btrfs

  • Full CoW: Data and metadata
  • B-tree based: All structures use CoW
  • Subvolumes: Lightweight CoW containers
  • Reflinks: Share blocks between files
  • Command: cp --reflink=always src dest (instant copy!)

ZFS

  • Full CoW: Data and metadata
  • Pooled storage: Write anywhere in pool
  • Checksums: Every block verified
  • Snapshots: Recursive across datasets
  • Clones: Writable snapshots
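Block verification can be sketched by storing a checksum next to each block pointer and checking it on every read. This is a minimal illustration, not ZFS's actual on-disk format: `write_block` and `read_block` are hypothetical helpers, and SHA-256 is used here purely for convenience.

```python
import hashlib


def write_block(disk, number, data):
    """Store data and return a pointer that carries the block's checksum."""
    disk[number] = data
    return {"block": number, "checksum": hashlib.sha256(data).hexdigest()}


def read_block(disk, pointer):
    """Return the block's data, refusing to return it if the checksum differs."""
    data = disk[pointer["block"]]
    if hashlib.sha256(data).hexdigest() != pointer["checksum"]:
        raise IOError(f"checksum mismatch in block {pointer['block']}")
    return data


disk = {}
pointer = write_block(disk, 7, b"payload")
clean_read = read_block(disk, pointer)

disk[7] = b"pay1oad"                 # simulate silent on-disk corruption
try:
    read_block(disk, pointer)
    corruption_detected = False
except IOError:
    corruption_detected = True

print(clean_read, corruption_detected)
```

Keeping the checksum in the parent pointer rather than inside the block means a corrupted block cannot vouch for itself.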

APFS (Apple)

  • CoW for metadata: Data optionally
  • Space sharing: Multiple volumes, one pool
  • Clones: Instant file copies

Performance Implications

Advantages

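  • Flexible write placement: New blocks can go anywhere free, so random writes can be batched into fewer seeks
  • SSD friendly: Writes spread across the device instead of rewriting the same blocks
  • Fast snapshots: No data copying
  • No in-place rewrite: Every write goes to freshly allocated space (at the cost of fragmentation over time; see Challenges)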

Challenges

  • Write amplification: Metadata updates cascade up tree
  • Fragmentation: Related blocks scattered
  • Space accounting: Hard to predict free space
  • Performance: Can degrade when full (>80%)
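The write amplification in the first bullet can be made concrete by counting how many nodes one leaf update copies in a pointer tree. This is a sketch with hypothetical `Node`, `build`, and `cow_update` helpers; real filesystems batch many updates into one commit, which softens this cost.

```python
class Node:
    """Tree node: interior nodes hold children, leaves hold data (hypothetical)."""

    def __init__(self, children=None, data=None):
        self.children = children or []
        self.data = data


def build(depth, fanout):
    """Build a full tree with `depth` levels of interior nodes above the leaves."""
    if depth == 0:
        return Node(data=b"leaf")
    return Node(children=[build(depth - 1, fanout) for _ in range(fanout)])


def cow_update(node, path, data, copied):
    """Return a copy of `node` with the leaf at `path` replaced.

    Every node on the path is copied; `copied` collects one entry per copy.
    """
    copied.append(1)
    if not path:
        return Node(data=data)
    children = list(node.children)               # siblings are shared, not copied
    children[path[0]] = cow_update(children[path[0]], path[1:], data, copied)
    return Node(children=children)


root = build(depth=3, fanout=4)                  # 1 + 4 + 16 + 64 = 85 nodes
copied = []
new_root = cow_update(root, [0, 1, 2], b"new", copied)

print(len(copied))                               # nodes rewritten for one leaf change
```

One changed leaf rewrites every node on the path to the root (4 of 85 here), while all other subtrees stay shared between the old and new trees.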

Optimization Tips

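    # Btrfs: Defragment (breaks CoW links, so shared data may be duplicated!)
    btrfs filesystem defragment -r /mnt

    # ZFS: Set recordsize for workload
    # (128K is the default; databases often use a smaller value matching their page size)
    zfs set recordsize=128k tank/database

    # Keep free space >20% for performance
    df -h /mnt  # Monitor usage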
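The free-space guideline can also be checked from a script. This is a small sketch using Python's standard `shutil.disk_usage`; the helper names and the 0.80 threshold are illustrative assumptions, and on Btrfs the generic numbers it reports can be less precise than the filesystem's own tools.

```python
import shutil


def used_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:          # guard against pseudo-filesystems
        return 0.0
    return usage.used / usage.total


def needs_attention(path="/", threshold=0.80):
    """True when usage exceeds the threshold (0.80 is an assumed default)."""
    return used_fraction(path) > threshold


print(f"{used_fraction('.'):.0%} used, needs attention: {needs_attention('.')}")
```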

Space Reclamation

Old blocks freed when no longer referenced:

    # Btrfs: Delete old snapshots to free space
    btrfs subvolume delete /mnt/.snapshots/old

    # ZFS: Destroy snapshots
    zfs destroy tank/data@old-snapshot

    # Both: Check space used by snapshots
    btrfs qgroup show /mnt   # requires quotas: btrfs quota enable /mnt
    zfs list -t snapshot -o space

When CoW Hurts: Disable It

Some workloads conflict with CoW:

Databases (Random Writes)

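    # Btrfs: Disable CoW for database directory
    chattr +C /var/lib/mysql  # Before creating files! Only new files inherit the flag

    # ZFS: CoW cannot be turned off; these settings only trim write overhead
    zfs set copies=1 tank/database   # the default: one copy of each block
    zfs set compression=off tank/database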

VM Disk Images

    # Btrfs: Disable CoW for VM images
    chattr +C /var/lib/libvirt/images

    # Or use nodatacow mount option
    mount -o nodatacow /dev/sda1 /mnt

Note: Disabling CoW means giving up CoW's protections for that data. On Btrfs, files without CoW are also not checksummed or compressed.

CoW vs Journaling

    Aspect              CoW (Btrfs/ZFS)          Journaling (ext4/XFS)
    ------------------  -----------------------  ---------------------
    Consistency         Always atomic            Via journal replay
    Snapshots           Free, instant            Need LVM/external
    Write pattern       Anywhere                 Mostly sequential
    Metadata overhead   Higher (tree updates)    Lower (journal only)
    Maturity            Newer                    Very mature
    Recovery            Always consistent        Replay journal
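For contrast, the journaling column can be sketched as a write-ahead log that is replayed after a crash. This is a deliberately simplified model that journals full data for every write, which real journaling filesystems often do not do by default; `journaled_write` and `replay` are hypothetical names.

```python
def journaled_write(disk, journal, block, data, crash_after_journal=False):
    """Record the intended write, then overwrite the block in place."""
    journal.append((block, data))   # 1. log the intent first
    if crash_after_journal:
        return                      # power lost before the in-place write
    disk[block] = data              # 2. overwrite the block in place
    journal.clear()                 # 3. retire the journal entry


def replay(disk, journal):
    """Recovery: reapply every logged write that was never retired."""
    for block, data in journal:
        disk[block] = data
    journal.clear()


disk = {1: "old"}
journal = []
journaled_write(disk, journal, 1, "new", crash_after_journal=True)
state_after_crash = disk[1]

replay(disk, journal)
state_after_replay = disk[1]
print(state_after_crash, state_after_replay)
```

Journaling reaches consistency by finishing interrupted work at recovery time, whereas CoW never exposes the interrupted state in the first place.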

Best Practices

  1. Keep 20% free space: CoW performance degrades when full
  2. Monitor snapshots: Delete old snapshots to reclaim space
  3. Disable CoW for databases: Use chattr +C on Btrfs
  4. Use reflinks: Instant file copies with cp --reflink
  5. Regular scrubbing: Verify checksums (Btrfs/ZFS)
  6. Balance space: btrfs balance for optimal allocation

Related Concepts

  • Journaling: Alternative consistency mechanism
  • Snapshots: CoW enables instant snapshots
  • Btrfs: Linux's CoW filesystem
  • ZFS: Advanced CoW filesystem with pooled storage
  • Data Integrity: CoW enables checksum verification

Key Takeaways

  • Never Overwrite: CoW writes new blocks, preserves old data
  • Atomic by Design: All operations either complete or don't happen
  • Snapshots are Free: No copying needed—just preserve references
  • Space Efficient: Share unchanged blocks between versions
  • Trade-offs: Some overhead for databases, needs free space
  • Modern Default: Btrfs, ZFS, APFS all use CoW for reliability
