# Stack vs Heap Memory
Every C++ program uses two primary memory regions: the stack for automatic, short-lived allocations and the heap for dynamic, long-lived ones. Understanding the difference is not academic — it directly affects your program’s performance, safety, and debuggability.
## Stack Memory
The stack is a contiguous block of memory — typically 8 MB on Linux, 1 MB on Windows — that grows downward from high addresses. Every function call pushes a frame onto the stack; every return pops it. The compiler manages this entirely — no system calls, no allocator overhead, just pointer arithmetic.
### Stack Frame Anatomy

Each frame contains four things:

- Return address — where to jump when the function returns (set by the `call` instruction)
- Previous frame pointer — saved `rbp` so the caller’s frame can be restored
- Local variables — all `int x`, `char buf[256]`, etc.
- Function parameters — arguments passed by the caller (modern ABIs pass the first 6 in registers)

```cpp
void process(int n) {
    int result = 0;          // 4 bytes on stack
    char buffer[256];        // 256 bytes on stack
    double matrix[4][4];     // 128 bytes on stack
    // Total frame: ~388 bytes + return addr + saved rbp
}
```
### Why the Stack Is Fast

Stack allocation is a single instruction: `sub rsp, N`. Deallocation is `add rsp, N`. No free lists, no searching, no fragmentation. The CPU even has a dedicated return stack buffer that predicts where `ret` will jump.

```asm
; What the compiler generates:
push rbp          ; save caller's frame pointer
mov  rbp, rsp     ; set new frame pointer
sub  rsp, 388     ; allocate locals
; ... function body ...
mov  rsp, rbp     ; deallocate
pop  rbp          ; restore caller
ret               ; jump to return address
```
### Stack Limits

```sh
# Check stack size (Linux/macOS)
ulimit -s          # 8192 (8 MB)

# Set larger stack (use sparingly)
ulimit -s 65536    # 64 MB
## Heap Memory
The heap is managed by a runtime allocator (ptmalloc2 in glibc, jemalloc on FreeBSD, mimalloc as a drop-in performance alternative). Heap memory persists until you explicitly free it.
### How Allocators Work
When you call new or malloc, the allocator:
- Checks free lists — previously freed blocks organized by size
- Finds a suitable block — using first-fit, best-fit, or size-class strategy
- Splits the block if larger than requested
- Requests OS memory via `sbrk()` or `mmap()` if nothing fits
When you call delete or free:
- Marks the block as free
- Coalesces with adjacent free blocks (reduces fragmentation)
- Returns large blocks to the OS via `munmap()` (optional)
### Fragmentation
External fragmentation happens when free memory is scattered in small non-contiguous chunks. You might have 1 MB free total but no single block larger than 64 KB.
### Allocation Styles

```cpp
// Raw new/delete — you own the lifetime
int* p = new int(42);
int* arr = new int[100];
delete p;
delete[] arr;

// Smart pointers — automatic lifetime
auto unique = std::make_unique<int>(42);
auto shared = std::make_shared<MyClass>(args);

// Containers — the standard recommendation
std::vector<int> vec(100);
std::string s = "hello";
// No manual delete — RAII handles cleanup
```
## Memory Issues
### Stack Overflow

```cpp
// Infinite recursion
void infinite() { infinite(); }       // Segfault after ~8 MB

// Large local arrays
void bad() {
    int huge[10000000];               // 40 MB — way beyond 8 MB stack
}

// Fix: use heap for large data
void good() {
    std::vector<int> huge(10000000);  // heap-allocated
}
```
### Memory Leak

```cpp
void leak() {
    int* p = new int(42);
    // Missing: delete p;
}   // p goes out of scope, memory leaked forever

// Fix: RAII
void safe() {
    auto p = std::make_unique<int>(42);
}   // unique_ptr destructor calls delete
```
### Dangling Pointer

```cpp
int* dangling() {
    int local = 42;
    return &local;   // BAD: returns address of stack variable
}   // local is destroyed, pointer points to garbage

// Fix: return by value or allocate on heap
std::unique_ptr<int> safe() {
    return std::make_unique<int>(42);
}
```
### Use-After-Free

```cpp
int* p = new int(42);
delete p;
*p = 100;   // Undefined behavior — p points to freed memory

// Fix: set to nullptr after delete, or use smart pointers
```
## Performance: Stack vs Heap
Stack allocation costs ~1 nanosecond (one instruction). Heap allocation costs 80–200 nanoseconds (free-list traversal, potential system call). For a single allocation the difference is invisible; over millions in a tight loop it adds up to milliseconds versus seconds.
### When Allocation Cost Matters
- Hot inner loops — use stack arrays or pre-allocated buffers
- Real-time systems — audio, games, trading cannot tolerate allocation jitter
- Embedded systems — heap may not be available
- Data structures — linked list: one alloc per insert; vector: amortized across many
## Comparison Table
| Aspect | Stack | Heap |
|---|---|---|
| Speed | Very Fast | Slower |
| Size | Limited (~8MB) | System Memory |
| Management | Automatic | Manual |
| Fragmentation | None | Possible |
| Thread Safety | Yes (per thread) | No (needs sync) |
| Allocation | Compile-time size | Runtime size |
## Custom Allocators
When the general-purpose allocator is too slow or fragments too much:
### Arena Allocator (Bump Allocator)
Allocate by bumping a pointer. Free everything at once. Zero fragmentation, zero per-object overhead.
```cpp
class Arena {
    char* buffer;
    size_t offset = 0;
    size_t capacity;
public:
    Arena(size_t size) : buffer(new char[size]), capacity(size) {}
    ~Arena() { delete[] buffer; }

    void* allocate(size_t size) {
        size = (size + 7) & ~7;   // align to 8 bytes
        if (offset + size > capacity) throw std::bad_alloc();
        void* ptr = buffer + offset;
        offset += size;
        return ptr;
    }

    void reset() { offset = 0; }
};
```
Use cases: per-frame game allocations, compiler AST nodes, request-scoped server memory.
### Pool Allocator

Pre-allocate N objects of the same size. O(1) alloc and free via a free list.
```cpp
template<typename T, size_t N>
class Pool {
    union Block { T obj; Block* next; };
    Block storage[N];
    Block* freeList;
public:
    Pool() {
        freeList = &storage[0];
        for (size_t i = 0; i < N - 1; i++)
            storage[i].next = &storage[i + 1];
        storage[N - 1].next = nullptr;
    }

    T* allocate() {
        if (!freeList) throw std::bad_alloc();
        Block* block = freeList;
        freeList = freeList->next;
        return &block->obj;
    }

    void deallocate(T* ptr) {
        Block* block = reinterpret_cast<Block*>(ptr);
        block->next = freeList;
        freeList = block;
    }
};
```
Use cases: particle systems, ECS components, network connections.
## RAII and Ownership

RAII ties heap lifetime to stack objects. When the stack object’s destructor runs, the heap memory is freed.

```cpp
// Without RAII — leak on exception
void risky() {
    int* p = new int[1000];
    process(p);          // if this throws, p leaks
    delete[] p;
}

// With RAII — safe
void safe() {
    auto p = std::make_unique<int[]>(1000);
    process(p.get());    // if this throws, unique_ptr frees p
}
```
### Smart Pointer Decision Tree

- `unique_ptr` — single owner, zero overhead, use by default
- `shared_ptr` — multiple owners, reference-counted, 16-byte overhead
- `weak_ptr` — non-owning observer of a `shared_ptr`, breaks cycles
- Raw pointer — non-owning reference only, lifetime managed elsewhere
## Debugging Memory Issues

### AddressSanitizer (ASan)

Compile with `-fsanitize=address` — detects use-after-free, buffer overflows, and leaks at ~2x slowdown.

```sh
g++ -fsanitize=address -g -O1 program.cpp -o program
./program
```
Example ASan output for use-after-free:
```
==12345==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010
READ of size 4 at 0x602000000010 thread T0
    #0 main /home/user/program.cpp:8
freed by thread T0 here:
    #0 operator delete(void*)
    #1 main /home/user/program.cpp:7
```
### Valgrind

No recompilation needed, ~20x slowdown. Detects leaks, invalid access, and uninitialized reads.

```sh
valgrind --leak-check=full ./program
```
### Quick Reference
| Tool | Overhead | Recompile? | Detects |
|---|---|---|---|
| ASan | ~2x | Yes | Use-after-free, overflow, leaks |
| MSan | ~3x | Yes | Uninitialized reads |
| TSan | ~5x | Yes | Data races |
| Valgrind | ~20x | No | Leaks, invalid access |
## Platform Differences
| Aspect | Linux | macOS | Windows |
|---|---|---|---|
| Default stack | 8 MB | 8 MB | 1 MB |
| Default allocator | ptmalloc2 | libmalloc | NT Heap |
| Large alloc threshold | 128 KB (mmap) | 128 KB | 512 KB |
| Set stack size | `ulimit -s` | `ulimit -s` | `/STACK:size` |
Windows’s 1 MB default stack is a common surprise when porting Linux code.
## Key Takeaways

- **Stack is 100x faster than heap** — a single instruction vs allocator traversal. Prefer stack in hot paths.
- **Heap fragmentation is real** — repeated alloc/free creates gaps. Custom allocators (arena, pool) eliminate this.
- **RAII eliminates leaks** — tie heap lifetime to stack objects via smart pointers. `unique_ptr` has zero overhead.
- **Debug with sanitizers** — ASan catches use-after-free and overflows at ~2x slowdown. Use it in CI.
- **Know your platform** — Windows has a 1 MB default stack (8x smaller than Linux). Test cross-platform.
## Related Concepts
- Smart Pointers: unique_ptr, shared_ptr, weak_ptr — RAII in practice
- RAII Patterns: Resource management beyond memory — files, locks, sockets
- Memory Layout: How the OS loads your program into virtual address space
- Cache Lines: Why stack is fast — spatial locality and cache friendliness
## Further Reading
- What Every Programmer Should Know About Memory - Ulrich Drepper’s comprehensive guide to memory hierarchy
- CppReference: Memory Management - Complete reference for C++ memory facilities
- Google Sanitizers - AddressSanitizer, MemorySanitizer, ThreadSanitizer documentation
- jemalloc - High-performance allocator used by Facebook, Firefox, and Redis
