The Traditional Problem: In-Place Updates
Traditional filesystems (ext4, XFS, FAT) use in-place updates:
- Read existing block
- Modify content
- Overwrite same block
- Old data gone forever
Problems:
- Not atomic: A power failure mid-write can leave a partially written (torn) block or half-updated metadata (corruption)
- No history: Can't undo or snapshot without copying entire filesystem
- Dangerous: One wrong write destroys data permanently
The Copy-on-Write Solution
Core Principle: Never modify data in place. Instead:
- Read existing block
- Allocate NEW block
- Write modified data to new block
- Update pointer (metadata)
- Old data remains untouched until no longer needed
Benefits:
- Atomic writes: Either old state or new state (never corrupted)
- Free snapshots: Old data already preserved!
- Time travel: Keep references to old blocks = instant history
- Data integrity: Never risk overwriting good data
How Copy-on-Write Works: A Simple Write
Initial State: File with 3 Blocks
- File "file.txt" consists of 3 data blocks (A, B, C)
- Blocks 100, 101, 102 are stored on disk
- Metadata points to these blocks
- Each block has reference count = 1
- Goal: modify Block B (the middle block, 101)
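To make the walk-through concrete, here is a minimal Python sketch of the CoW write for file.txt. It assumes a toy in-memory "disk" (a dict from block number to contents) and a bump allocator that happens to hand out block 103 next; `ToyDisk` and `cow_write` are hypothetical names for illustration, not any filesystem's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical in-memory disk: block number -> contents, plus refcounts."""
    blocks: dict = field(default_factory=dict)
    refs: dict = field(default_factory=dict)
    next_free: int = 100  # bump allocator: always hands out an unused block

    def allocate(self, data: bytes) -> int:
        """Write data into a brand-new block and return its number."""
        blockno = self.next_free
        self.next_free += 1
        self.blocks[blockno] = data
        self.refs[blockno] = 1
        return blockno

    def release(self, blockno: int) -> None:
        """Drop one reference; free the block only when none remain."""
        self.refs[blockno] -= 1
        if self.refs[blockno] == 0:
            del self.refs[blockno]
            del self.blocks[blockno]


def cow_write(disk: ToyDisk, pointers: list, index: int, data: bytes) -> list:
    """Return a NEW pointer list with one block replaced; nothing is overwritten."""
    new_blockno = disk.allocate(data)  # allocate + write the new block
    updated = list(pointers)           # copy the metadata...
    updated[index] = new_blockno       # ...and repoint the modified slot
    return updated


# file.txt: three blocks, each with reference count 1
disk = ToyDisk()
file_txt = [disk.allocate(b"A"), disk.allocate(b"B"), disk.allocate(b"C")]  # blocks 100, 101, 102

new_version = cow_write(disk, file_txt, 1, b"B-modified")  # [100, 103, 102]
old_b_still_there = disk.blocks[101]  # old block B is untouched so far

# "Commit": switch to the new pointer list, then release the replaced block
replaced = file_txt[1]
file_txt = new_version
disk.release(replaced)
```

The old block B stays readable until the new pointer list is in place; only then is its reference dropped. A real allocator could choose any free block, not just the next number.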
Key CoW Concepts
1. Write-Anywhere Allocation
Traditional: "Write block 1000 to sector 1000"
CoW: "Write data anywhere free, update pointer"
Traditional (in-place):
  Block 1000: [old data] → [new data]   ❌ Old data lost
CoW (write-anywhere):
  Block 1000: [old data]   ← still exists!
  Block 5280: [new data]   ← written here
  Pointer: 1000 → 5280     ✅ Old data preserved
2. Metadata Updates Are Key
CoW depends on atomic metadata updates:
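1. Allocate new block (5280)
2. Write data to new block (and make sure it is durable on disk)
3. Update parent pointer: 1000 → 5280 ← Atomic!
4. Old block (1000) is now unreferenced by the live file and can be freed (unless a snapshot still uses it)
In a real tree the parent is itself copied, and so on up to the root; the single atomic step is the final root pointer update.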
If crash happens:
- Before step 3: Old data still referenced (no change visible)
- After step 3: New data referenced (change complete)
- Never half-updated!
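The crash argument can be checked with a small simulation. This is a sketch under two stated assumptions: the pointer store in step 3 is atomic, and the new block's data is durable before that store (real filesystems enforce this ordering with cache flushes or write barriers). `simulate` is a hypothetical helper that stops after a given number of steps and then reads the file the way recovery would, by following the pointer.

```python
def simulate(crash_after: int) -> bytes:
    """Run the 4-step CoW update, "crash" after `crash_after` steps, then
    return what recovery sees by following the persisted pointer."""
    state = {"blocks": {1000: b"old data"}, "pointer": 1000, "new": None}

    def allocate() -> None:        # 1. pick a free block number
        state["new"] = 5280

    def write_data() -> None:      # 2. write the new block (assumed durable)
        state["blocks"][state["new"]] = b"new data"

    def switch_pointer() -> None:  # 3. single atomic store: 1000 -> 5280
        state["pointer"] = state["new"]

    def free_old() -> None:        # 4. reclaim old block (assuming no snapshot uses it)
        del state["blocks"][1000]

    steps = [allocate, write_data, switch_pointer, free_old]
    for step in steps[:crash_after]:
        step()

    # Recovery: whatever the pointer names is the file's content.
    # A half-finished block 5280 with no pointer to it is just free space.
    return state["blocks"][state["pointer"]]


outcomes = [simulate(k) for k in range(5)]
```

Crashing after 0, 1, or 2 steps yields the old data; after 3 or 4 steps, the new data. No crash point yields a mix.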
3. Reference Counting
Blocks are freed only when no references remain:
Block 1000: refs=2 (current file + snapshot; unchanged since the snapshot)
Block 5280: refs=1 (written after the snapshot; current file only)
Delete snapshot:
  Block 1000: refs=1 → Can't free yet
  Block 5280: refs=1 → Keep
Delete file:
  Block 1000: refs=0 → NOW free!
  Block 5280: refs=0 → Free
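Below is a toy Python model of the same bookkeeping. Real filesystems do not necessarily keep a literal counter per block (Btrfs tracks back-references per extent, while ZFS relies on block birth times and snapshot dead lists), so treat this as a conceptual sketch. `RefCountedStore` is a hypothetical name, and block 1001 is a hypothetical extra: the snapshot's copy of the block that 5280 replaced, included so that deleting the snapshot frees something.

```python
from collections import Counter


class RefCountedStore:
    """Toy model: each owner (file or snapshot) holds a list of block numbers."""

    def __init__(self) -> None:
        self.refs = Counter()  # block number -> number of owners referencing it
        self.owners = {}       # owner name -> list of block numbers
        self.freed = []        # blocks reclaimed, in order

    def add_owner(self, name: str, blocks: list) -> None:
        self.owners[name] = list(blocks)
        for b in blocks:
            self.refs[b] += 1

    def delete_owner(self, name: str) -> None:
        for b in self.owners.pop(name):
            self.refs[b] -= 1
            if self.refs[b] == 0:  # last reference gone: reclaim the block
                del self.refs[b]
                self.freed.append(b)


store = RefCountedStore()
store.add_owner("snapshot", [1000, 1001])  # state when the snapshot was taken
store.add_owner("file", [1000, 5280])      # 1001 was rewritten into 5280 (CoW)
refs_before = dict(store.refs)             # {1000: 2, 1001: 1, 5280: 1}

store.delete_owner("snapshot")
freed_after_snapshot = list(store.freed)   # [1001]; 1000 is still used by the file

store.delete_owner("file")                 # now 1000 and 5280 are freed too
```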
Snapshots: The Magic of CoW
With CoW, snapshots are nearly free to create:
Traditional (non-CoW) Snapshot:
Copy entire filesystem: 100GB → 100GB copy
Time: Minutes to hours
Space: 200GB total
CoW Snapshot:
1. Create new root pointer → same blocks
2. Mark: "preserve current state"
Time: Instant (milliseconds)
Space: Essentially 0 bytes initially (only a little metadata)
After modifications:
- Modified blocks: New copies created (CoW kicks in)
- Unmodified blocks: Shared between original and snapshot
- Space used = only changed data
Snapshot Space Efficiency
Original: 100GB
Snapshot: 0GB (just metadata)
Modify 10GB:
  Original: points to 90GB old + 10GB new = 100GB
  Snapshot: points to 100GB old
Total space: 110GB (not 200GB!)
Efficiency: Only changed blocks duplicated
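The arithmetic above can be reproduced with a scaled-down model in which one block stands in for 1GB. This is an illustrative sketch: `snapshot_space` is a hypothetical helper, and the allocator simply hands out fresh block ids rather than modeling real free-space management.

```python
import itertools


def snapshot_space(total_blocks: int, modified_blocks: int) -> dict:
    """Count distinct blocks on disk after a snapshot plus some CoW rewrites."""
    ids = itertools.count()                    # toy allocator: never reuses ids
    live = [next(ids) for _ in range(total_blocks)]
    snapshot = list(live)                      # a snapshot copies pointers only
    after_snapshot = len(set(live) | set(snapshot))

    for i in range(modified_blocks):           # CoW: each rewrite gets a new block
        live[i] = next(ids)

    return {
        "after_snapshot": after_snapshot,                   # distinct blocks stored
        "after_modify": len(set(live) | set(snapshot)),     # distinct blocks stored
        "shared": len(set(live) & set(snapshot)),           # blocks both versions use
    }


# Toy scale from the text: 1 block stands in for 1GB.
result = snapshot_space(total_blocks=100, modified_blocks=10)
```

Only the 10 rewritten blocks exist twice; the other 90 are referenced by both the live file system and the snapshot.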
Atomic Operations
CoW makes complex operations atomic:
Example: Rename Directory
Traditional filesystem:
1. Update old parent: remove entry
2. Update new parent: add entry
3. Update directory: change ".." link
❌ Without a journal, a crash between steps = corruption!
CoW filesystem:
1. Create new metadata tree with changes (copy only the changed nodes and their ancestors)
2. Update root pointer (atomic!)
✅ Either all changes visible or none
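A sketch of the CoW rename in Python, assuming a hypothetical in-memory tree where each directory is a dict that is never mutated once built. The helper names `lookup`, `cow_update`, and `rename` are illustrative: `cow_update` copies only the nodes on the path to the change (path copying), and the final reassignment of `fs_root` stands in for the atomic on-disk root update. Real directory entries point to inodes rather than nesting their contents, so this models the idea only.

```python
def lookup(root: dict, path: tuple) -> dict:
    """Follow a path of directory names from the root."""
    node = root
    for part in path:
        node = node[part]
    return node


def cow_update(node: dict, path: tuple, fn) -> dict:
    """Return a copy of `node` in which the directory at `path` becomes fn(dir).

    Only nodes along `path` are copied; untouched subtrees are shared
    between the old and new trees.
    """
    if not path:
        return fn(node)
    head, rest = path[0], path[1:]
    copy = dict(node)                           # copy this node only
    copy[head] = cow_update(node[head], rest, fn)
    return copy


def rename(root: dict, src_dir: tuple, dst_dir: tuple, name: str) -> dict:
    """Build a NEW tree with `name` moved from src_dir to dst_dir."""
    entry = lookup(root, src_dir)[name]
    removed = cow_update(root, src_dir, lambda d: {k: v for k, v in d.items() if k != name})
    return cow_update(removed, dst_dir, lambda d: {**d, name: entry})


old_root = {"home": {"docs": {"report.txt": "v1"}}, "tmp": {}, "var": {"log": {}}}
fs_root = old_root                               # what readers see right now

new_root = rename(fs_root, ("home",), ("tmp",), "docs")  # steps 1-3: build off to the side
fs_root = new_root                               # the single "atomic" root swap
```

Readers holding `old_root` still see the complete pre-rename tree, which is exactly what a snapshot keeps; untouched subtrees such as `var` are shared by both trees.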
Example: Database Transaction
1. Write new data blocks (CoW)
2. Write new index blocks (CoW)
3. Write new metadata (CoW)
4. Update root (atomic commit!)
Crash before step 4: Old state intact
Crash after step 4: New state complete
Never inconsistent!
CoW in Different Filesystems
Btrfs
- Full CoW: Data and metadata
- B-tree based: All structures use CoW
- Subvolumes: Lightweight CoW containers
- Reflinks: Share blocks between files
- Command: cp --reflink=always src dest (instant copy!)
ZFS
- Full CoW: Data and metadata
- Pooled storage: Write anywhere in pool
- Checksums: Every block verified
- Snapshots: Recursive across datasets
- Clones: Writable snapshots
APFS (Apple)
- CoW for metadata: Data blocks shared by clones or snapshots are copied when modified
- Space sharing: Multiple volumes, one pool
- Clones: Instant file copies
Performance Implications
Advantages
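- Write batching: Every write goes to a fresh location, so the allocator can coalesce scattered updates into large sequential writes (fewer seeks on HDDs); the cost is fragmentation over time
- SSD friendly: Writes are spread across free space (SSD firmware also does its own wear leveling)
- Fast snapshots: No data copying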
Challenges
- Write amplification: Metadata updates cascade up tree
- Fragmentation: Related blocks scattered
- Space accounting: Hard to predict free space
- Performance: Can degrade when full (>80%)
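The metadata cascade can be quantified with a rough estimate: in a CoW tree, changing one leaf rewrites that leaf and every ancestor up to the root. The sketch below assumes a uniform fanout and counts one update in isolation; `blocks_written_per_update` is a hypothetical helper, and real filesystems amortize the cost by committing many updates per transaction so that shared ancestors are written once.

```python
def blocks_written_per_update(leaf_blocks: int, fanout: int) -> int:
    """Blocks rewritten when one leaf changes in a CoW tree: the leaf plus
    every ancestor up to the root (path copying), ignoring batching."""
    if leaf_blocks < 1 or fanout < 2:
        raise ValueError("need at least one leaf and fanout >= 2")
    levels = 1                 # the leaf level itself
    nodes = leaf_blocks
    while nodes > 1:
        nodes = -(-nodes // fanout)  # ceil division: parents needed above this level
        levels += 1
    return levels


# One changed data block in a tree of a million leaves with fanout 100
amplification = blocks_written_per_update(1_000_000, 100)
```

With a million leaf blocks and a fanout of 100, a single-block change costs four block writes: the leaf plus three ancestors.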
Optimization Tips
# Btrfs: Defragment (breaks CoW links: shared extents become separate copies!)
btrfs filesystem defragment -r /mnt

# ZFS: Set recordsize for workload (default is 128k; databases usually want their page size)
zfs set recordsize=16k tank/database

# Keep free space >20% for performance
df -h /mnt   # Monitor usage
Space Reclamation
Old blocks freed when no longer referenced:
# Btrfs: Delete old snapshots to free space
btrfs subvolume delete /mnt/.snapshots/old

# ZFS: Destroy snapshots
zfs destroy tank/data@old-snapshot

# Both: Check space used by snapshots
btrfs qgroup show /mnt   # requires quotas: btrfs quota enable /mnt
zfs list -t snapshot -o space
When CoW Hurts: Disable It
Some workloads conflict with CoW:
Databases (Random Writes)
# Btrfs: Disable CoW for database directory
chattr +C /var/lib/mysql   # Before creating files! (new files inherit the flag)

# ZFS: CoW cannot be turned off; tune the dataset instead
zfs set recordsize=16k tank/database   # match the DB page size (InnoDB: 16 KiB)
VM Disk Images
# Btrfs: Disable CoW for VM images (applies to files created afterwards)
chattr +C /var/lib/libvirt/images

# Or use the nodatacow mount option (whole filesystem, newly created files only)
mount -o nodatacow /dev/sda1 /mnt
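Note: On Btrfs, disabling CoW also disables checksums and compression for that data, and its writes are no longer atomic. Snapshots still work, but the first write to each shared block after a snapshot is copied anyway.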
CoW vs Journaling
| Aspect | CoW (Btrfs/ZFS) | Journaling (ext4/XFS) |
|---|---|---|
| Consistency | Always atomic | Via journal replay |
| Snapshots | Free, instant | Need LVM/external |
| Write pattern | Anywhere (always a new location) | In place, plus sequential journal writes |
| Metadata overhead | Higher (tree updates cascade to root) | Lower (metadata journaled, then written in place) |
| Maturity | Newer | Very mature |
| Recovery | Always consistent | Replay journal |
Best Practices
- Keep 20% free space: CoW performance degrades when full
- Monitor snapshots: Delete old snapshots to reclaim space
- Disable CoW for databases: Use chattr +C on Btrfs
- Use reflinks: Instant file copies with cp --reflink
- Regular scrubbing: Verify checksums (btrfs scrub start /mnt, zpool scrub tank)
- Balance space: btrfs balance start /mnt for optimal allocation
Related Concepts
- Journaling: Alternative consistency mechanism
- Snapshots: CoW enables instant snapshots
- Btrfs: Linux's CoW filesystem
- ZFS: Advanced CoW filesystem with pooled storage
- Data Integrity: CoW enables checksum verification
Key Takeaways
- Never Overwrite: CoW writes new blocks, preserves old data
- Atomic by Design: All operations either complete or don't happen
- Snapshots are Free: No copying needed—just preserve references
- Space Efficient: Share unchanged blocks between versions
- Trade-offs: Some overhead for databases, needs free space
- Widely Deployed: Btrfs, ZFS, and APFS all use CoW for reliability
