Memory Access Patterns: Sequential vs Strided

What Are Memory Access Patterns?

Memory access patterns are one of the most critical factors affecting application performance. The way your code accesses memory determines cache efficiency, memory bandwidth utilization, and whether hardware optimizations like prefetching can help.

Key Insight: The difference between optimal and suboptimal patterns can be 10x or more in performance!

Why Access Patterns Matter

The Memory Hierarchy Gap

Modern computers have a multi-level memory hierarchy. Typical figures (they vary by CPU) are:

Level	Size	Latency	Bandwidth
L1 Cache	32-64 KB	1-4 cycles	3+ TB/s
L2 Cache	256-512 KB	10-20 cycles	1+ TB/s
L3 Cache	8-32 MB	30-70 cycles	500+ GB/s
Main Memory	8-64 GB	100-300 cycles	50-100 GB/s

The Gap: An L1 cache hit is roughly 100x faster than an access that goes all the way to main memory.

Cache Lines: The Unit of Transfer

Memory moves between levels in 64-byte cache lines (the line size on most current CPUs)
Loading one byte loads the entire 64-byte line
Spatial locality determines whether those 64 bytes are useful
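
To make the cache-line arithmetic concrete, here is a minimal sketch in C. It assumes the 64-byte line size used in this article; the macro and function names are illustrative, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; 64 bytes matches the figure used in this article. */
#define CACHE_LINE_BYTES 64u

/* Index of the cache line that contains a given byte address. */
static inline uintptr_t cache_line_index(uintptr_t addr) {
    return addr / CACHE_LINE_BYTES;
}

/* Byte offset of an address within its cache line (0..63). */
static inline size_t cache_line_offset(uintptr_t addr) {
    return (size_t)(addr % CACHE_LINE_BYTES);
}

/* How many elements of a given size fit in one cache line. */
static inline size_t elems_per_line(size_t elem_size) {
    return CACHE_LINE_BYTES / elem_size;
}
```

Two addresses with the same cache_line_index arrive in the cache together, so eight consecutive 8-byte values (or sixteen 4-byte values) cost a single line fill.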

Sequential vs Strided Access

The same loop body can run 10x slower purely because of how it walks memory. Sequential access uses every byte the cache fetches; strided access throws most of it away.

Aspect	Sequential (optimal)	Strided (suboptimal)
Pattern	Consecutive memory locations	Fixed-stride jumps through memory
Spatial locality	Uses all 64 bytes of each cache line	Loads 64 bytes, uses only a few
Cache hit rate	~87.5% with 8-byte elements (7 hits per 8 accesses)	Collapses; often a new line per access
Prefetcher	Pattern is predicted; data loaded ahead	Large strides defeat prediction
Bandwidth	Every byte transferred is used	Wastes 87.5% at stride 8 with 8-byte elements (8 of 64 bytes used)
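
The 87.5% figures in the comparison can be reproduced with a small model. The sketch below is an illustration under simplifying assumptions, not a cache simulator: it assumes a cold cache, no prefetching, a line-aligned array, 64-byte lines, and an element size that divides 64 so no element straddles two lines. All function names are hypothetical.

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64u /* assumed line size */

/* Number of distinct cache lines touched by `count` accesses to elements of
   `elem_size` bytes spaced `stride` elements apart. Offsets only increase,
   so a new line is touched exactly when the line index changes. */
size_t lines_touched(size_t count, size_t elem_size, size_t stride) {
    size_t lines = 0;
    size_t prev_line = (size_t)-1; /* sentinel: no line seen yet */
    for (size_t i = 0; i < count; i++) {
        size_t line = (i * stride * elem_size) / CACHE_LINE_BYTES;
        if (line != prev_line) {
            lines++;
            prev_line = line;
        }
    }
    return lines;
}

/* Fraction of the transferred bytes that the loop actually reads. */
double line_utilization(size_t count, size_t elem_size, size_t stride) {
    size_t lines = lines_touched(count, elem_size, stride);
    if (lines == 0) return 0.0;
    return (double)(count * elem_size) / (double)(lines * CACHE_LINE_BYTES);
}

/* Hit rate when the first access to each line misses and the rest hit. */
double cold_hit_rate(size_t count, size_t elem_size, size_t stride) {
    if (count == 0) return 0.0;
    size_t lines = lines_touched(count, elem_size, stride);
    return (double)(count - lines) / (double)count;
}
```

Under these assumptions, 1024 accesses to 8-byte elements at stride 1 touch 128 lines (hit rate 0.875, utilization 1.0), while stride 8 touches 1024 lines (hit rate 0.0, utilization 0.125).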

Common Patterns

Scenario	Sequential-friendly	Strided / poor
Matrix traversal (row-major storage)	Row by row: consecutive in memory	Column by column: strided by row width
Struct access (single field)	Struct of Arrays (SoA): sequential per field	Array of Structs (AoS): strided by struct size
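
Both scenarios can be sketched in C. The code below assumes a row-major matrix stored as a flat array and a hypothetical four-float particle struct; the names are illustrative only. Each pair of functions computes the same result and differs only in the order or layout of the memory it walks.

```c
#include <stddef.h>

/* Row-major matrix stored flat: element (r, c) lives at m[r * cols + c]. */

/* Row by row: the inner loop walks consecutive addresses. */
long sum_row_order(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Column by column: each inner step jumps cols * sizeof(int) bytes. */
long sum_col_order(const int *m, size_t rows, size_t cols) {
    long sum = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}

/* AoS element (hypothetical): reading only x skips over y, z and mass. */
struct particle { float x, y, z, mass; };

/* AoS: consecutive x values are sizeof(struct particle) bytes apart. */
float sum_x_aos(const struct particle *p, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

/* SoA: all x values sit in their own contiguous array. */
float sum_x_soa(const float *x, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

With typical 4-byte floats the AoS loop uses 4 of every 16 bytes it pulls in, while the SoA loop uses all of them. If most code reads every field of a particle together, AoS may still be the better layout.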

Hardware Prefetching

Modern CPUs include sophisticated prefetchers:

What They Do:

Detect access patterns (sequential, stride, stream)
Load data into cache before it's needed
Operate as multiple prefetch units at different cache levels (L1, L2, L3)
Adapt to the patterns they observe

Prefetcher-friendly	Prefetcher-unfriendly
Sequential access	Random access
Fixed stride (if not too large)	Large irregular strides
Stream processing	Pointer chasing
Linear traversal	Hash-table lookups
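
When the hardware prefetcher cannot predict the next address, a software hint is one possible workaround. The sketch below shows an indexed gather with a prefetch issued a few iterations ahead. It relies on __builtin_prefetch, a GCC and Clang builtin (other compilers skip the hint), and PREFETCH_DISTANCE is a hypothetical tuning value that would need measuring on real hardware.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance in iterations; tune by measurement. */
#define PREFETCH_DISTANCE 16

/* Sum data[idx[i]] for i in [0, n). The index array is read sequentially,
   but the data accesses are irregular, so we hint the future address. */
long gather_sum(const int *data, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 1 = low temporal locality. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
#endif
        sum += data[idx[i]];
    }
    return sum;
}
```

This works here because the future index is already known from idx. True pointer chasing, where the next address comes from the node just loaded, gives the code nothing to prefetch ahead of time. A hint is only a request, so the result is identical with or without it.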

Optimization Strategies

Area	Techniques
Data structure design	Contiguous arrays; SoA for partial field access; align critical data to cache-line boundaries
Algorithm design	Cache-friendly traversal order; block/tile matrix algorithms; minimize working-set size
Loop optimization	Interchange loops for sequential access; tile/block for locality; manual prefetch for irregular patterns
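
As an illustration of tiling, here is a sketch of a matrix transpose in C. A transpose is a useful test case because either the reads or the writes must be strided. The tile size of 16 is an assumption, not a recommendation, and the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical tile edge: a 16 x 16 tile of doubles is 2 KB. */
#define TILE 16

/* dst (cols x rows) = transpose of src (rows x cols); both row-major.
   Reads are sequential, but writes stride by `rows` elements. */
void transpose_naive(const double *src, double *dst, size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}

/* Same result, processed one TILE x TILE block at a time so the lines
   touched in both src and dst can stay in cache while they are reused. */
void transpose_tiled(const double *src, double *dst, size_t rows, size_t cols) {
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        size_t r1 = (r0 + TILE < rows) ? r0 + TILE : rows; /* clamp at edge */
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            size_t c1 = (c0 + TILE < cols) ? c0 + TILE : cols;
            for (size_t r = r0; r < r1; r++)
                for (size_t c = c0; c < c1; c++)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```

Whether tiling helps, and which tile size is best, depends on matrix dimensions and cache sizes, so it is worth measuring both versions on the target machine.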

Measuring Performance

Key Metrics

Metric	Definition
Cache hit rate	Hits / Total accesses × 100%
Memory bandwidth	Bytes transferred per second
Cache-line utilization	Useful bytes / 64 bytes
Prefetch accuracy	Useful prefetches / total prefetches
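
The ratio metrics above are simple to compute once you have raw counts. The sketch below assumes the counts come from hardware counters (for example the L1-dcache-loads and L1-dcache-load-misses events reported by perf) and that lines are 64 bytes; the function names are illustrative.

```c
#define CACHE_LINE_BYTES 64.0 /* assumed line size */

/* Cache hit rate in percent, from total loads and load misses. */
double cache_hit_rate_pct(unsigned long long loads, unsigned long long misses) {
    if (loads == 0) return 0.0;
    if (misses > loads) misses = loads; /* guard against noisy counters */
    return 100.0 * (double)(loads - misses) / (double)loads;
}

/* Fraction of fetched bytes that the program actually used. */
double cache_line_utilization(unsigned long long useful_bytes,
                              unsigned long long lines_fetched) {
    if (lines_fetched == 0) return 0.0;
    return (double)useful_bytes / ((double)lines_fetched * CACHE_LINE_BYTES);
}

/* Fraction of issued prefetches whose data was later used. */
double prefetch_accuracy(unsigned long long useful_prefetches,
                         unsigned long long total_prefetches) {
    if (total_prefetches == 0) return 0.0;
    return (double)useful_prefetches / (double)total_prefetches;
}
```

For example, 1000 loads with 125 misses gives a hit rate of 87.5%, the same figure as sequential access over 8-byte elements.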

Tools

Linux perf: perf stat -e L1-dcache-load-misses,L1-dcache-loads ./program
Intel VTune: vtune -collect memory-access ./program

Best Practices

Design for Sequential Access: Arrange data structures for linear traversal
Minimize Stride: Keep related data close together
Use Cache-Aware Algorithms: Block matrix multiply, tiled convolution
Profile Real Workloads: Memory patterns vary by input
Consider NUMA Effects: On multi-socket systems, memory attached to a remote node is slower to reach, so keep data close to the threads that use it

Conclusion

Memory access patterns can make or break performance. Sequential access benefits from spatial locality, full use of each cache line, and hardware prefetching. Strided access wastes bandwidth, lowers the cache hit rate, and can defeat the prefetcher. Restructuring data and loops around these patterns can yield speedups of 10x or more without changing the underlying algorithm.
