C++ Stack vs Heap Memory: A Complete Guide

Deep dive into C++ memory allocation — stack frame internals, heap allocator mechanics, fragmentation, performance benchmarks, custom allocators, RAII, and debugging with AddressSanitizer and Valgrind.

18 min read · programming · cpp · memory-mgmt · runtime

Stack vs Heap Memory

Every C++ program uses two primary memory regions: the stack for automatic, short-lived allocations and the heap for dynamic, long-lived ones. Understanding the difference is not academic — it directly affects your program’s performance, safety, and debuggability.

Stack Memory

The stack is a contiguous block of memory — typically 8 MB on Linux, 1 MB on Windows — that grows downward from high addresses. Every function call pushes a frame onto the stack; every return pops it. The compiler manages this entirely — no system calls, no allocator overhead, just pointer arithmetic.

Stack Frame Anatomy

Each frame contains four things:

  1. Return address — where to jump when the function returns (set by the call instruction)
  2. Previous frame pointer — saved rbp so the caller’s frame can be restored
  3. Local variables — all int x, char buf[256], etc.
  4. Function parameters — arguments passed by the caller (modern ABIs pass the first 6 in registers)
```cpp
void process(int n) {
    int result = 0;        // 4 bytes on stack
    char buffer[256];      // 256 bytes on stack
    double matrix[4][4];   // 128 bytes on stack
    // Total frame: ~388 bytes + return addr + saved rbp
}
```

Why the Stack Is Fast

Stack allocation is a single instruction: sub rsp, N. Deallocation is add rsp, N. No free lists, no searching, no fragmentation. The CPU even has a dedicated return stack buffer that predicts where ret will jump.

```asm
; What the compiler generates:
push rbp          ; save caller's frame pointer
mov  rbp, rsp     ; set new frame pointer
sub  rsp, 388     ; allocate locals
; ... function body ...
mov  rsp, rbp     ; deallocate
pop  rbp          ; restore caller
ret               ; jump to return address
```

Stack Limits

```sh
# Check stack size (Linux/macOS)
ulimit -s          # 8192 (8 MB)

# Set larger stack (use sparingly)
ulimit -s 65536    # 64 MB
```

Heap Memory

The heap is managed by a runtime allocator (ptmalloc2 on glibc, jemalloc on FreeBSD, mimalloc for performance). Heap memory persists until you explicitly free it.

How Allocators Work

When you call new or malloc, the allocator:

  1. Checks free lists — previously freed blocks organized by size
  2. Finds a suitable block — using first-fit, best-fit, or size-class strategy
  3. Splits the block if larger than requested
  4. Requests OS memory via sbrk() or mmap() if nothing fits

When you call delete or free:

  1. Marks the block as free
  2. Coalesces with adjacent free blocks (reduces fragmentation)
  3. Returns large blocks to OS via munmap() (optional)
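The lookup, split, and coalesce steps above can be sketched as a toy first-fit allocator over a fixed buffer. This is a deliberate simplification: real allocators like ptmalloc2 use size-class bins, thread caches, and backward coalescing via boundary tags, none of which appear here.

```cpp
#include <cstddef>
#include <cstdint>

// Toy first-fit allocator. Illustrative only: single-threaded,
// forward-coalescing only, fixed 4 KB arena instead of sbrk/mmap.
class ToyAllocator {
    struct Header { std::size_t size; bool free; Header* next; };
    alignas(std::max_align_t) char buffer_[4096];
    Header* head_;
public:
    ToyAllocator() {
        head_ = reinterpret_cast<Header*>(buffer_);
        head_->size = sizeof(buffer_) - sizeof(Header);
        head_->free = true;
        head_->next = nullptr;
    }
    void* allocate(std::size_t n) {
        n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        for (Header* h = head_; h; h = h->next) {       // 1. walk the block list
            if (!h->free || h->size < n) continue;      // 2. first block that fits
            if (h->size >= n + sizeof(Header) + 16) {   // 3. split if much larger
                auto* rest = reinterpret_cast<Header*>(
                    reinterpret_cast<char*>(h + 1) + n);
                rest->size = h->size - n - sizeof(Header);
                rest->free = true;
                rest->next = h->next;
                h->next = rest;
                h->size = n;
            }
            h->free = false;
            return h + 1;                               // payload follows the header
        }
        return nullptr;                                 // 4. real allocator: sbrk/mmap here
    }
    void deallocate(void* p) {
        Header* h = reinterpret_cast<Header*>(p) - 1;
        h->free = true;                                 // 1. mark the block free
        if (h->next && h->next->free) {                 // 2. coalesce with next block
            h->size += sizeof(Header) + h->next->size;
            h->next = h->next->next;
        }
    }
};
```

Freed blocks stay on the list and are handed back out on later requests, which is exactly why a freed pointer can be "reused" by an unrelated allocation, the root cause of use-after-free bugs discussed below.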

Fragmentation

External fragmentation happens when free memory is scattered in small non-contiguous chunks. You might have 1 MB free total but no single block larger than 64 KB.
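A small simulation makes this concrete. Model the heap as 16 equal slots (the slot count and helper names are made up for illustration): after freeing every other slot, half the heap is free in total, yet no two free slots are adjacent, so a request for even 2 contiguous slots would fail.

```cpp
#include <algorithm>
#include <array>

// Tiny heap model: 1 = allocated, 0 = free.
// Counts the longest contiguous run of free slots.
int largest_free_run(const std::array<int, 16>& heap) {
    int best = 0, run = 0;
    for (int slot : heap) {
        run = (slot == 0) ? run + 1 : 0;
        best = std::max(best, run);
    }
    return best;
}

// Counts total free slots, ignoring contiguity.
int total_free(const std::array<int, 16>& heap) {
    int n = 0;
    for (int slot : heap) n += (slot == 0);
    return n;
}
```

With the alternating pattern {1,0,1,0,...}, `total_free` reports 8 slots free while `largest_free_run` reports only 1: plenty of memory, none of it usable for a large allocation.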

Allocation Styles

```cpp
// Raw new/delete — you own the lifetime
int* p = new int(42);
int* arr = new int[100];
delete p;
delete[] arr;

// Smart pointers — automatic lifetime
auto unique = std::make_unique<int>(42);
auto shared = std::make_shared<MyClass>(args);

// Containers — the standard recommendation
std::vector<int> vec(100);
std::string s = "hello";
// No manual delete — RAII handles cleanup
```

Memory Issues

Stack Overflow

```cpp
// Infinite recursion
void infinite() { infinite(); }  // Segfault after ~8 MB

// Large local arrays
void bad() {
    int huge[10000000];  // 40 MB — way beyond 8 MB stack
}

// Fix: use heap for large data
void good() {
    std::vector<int> huge(10000000);  // heap-allocated
}
```

Memory Leak

```cpp
void leak() {
    int* p = new int(42);
    // Missing: delete p;
}  // p goes out of scope, memory leaked forever

// Fix: RAII
void safe() {
    auto p = std::make_unique<int>(42);
}  // unique_ptr destructor calls delete
```

Dangling Pointer

```cpp
int* dangling() {
    int local = 42;
    return &local;  // BAD: returns address of stack variable
}  // local is destroyed, pointer points to garbage

// Fix: return by value or allocate on heap
std::unique_ptr<int> safe() {
    return std::make_unique<int>(42);
}
```

Use-After-Free

```cpp
int* p = new int(42);
delete p;
*p = 100;  // Undefined behavior — p points to freed memory

// Fix: set to nullptr after delete, or use smart pointers
```

Performance: Stack vs Heap

Stack allocation costs ~1 nanosecond (one instruction). Heap allocation costs 80–200 nanoseconds (free-list traversal, potential system call). For a single allocation the difference is invisible; across millions of allocations in a tight loop it compounds into milliseconds versus seconds.
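The gap is easy to observe with a micro-benchmark sketch like the one below (function names are made up; the volatile sink keeps the optimizer from deleting the loops). Absolute numbers vary widely by machine, allocator, and optimization level, so treat the output as a shape, not a result.

```cpp
#include <chrono>
#include <memory>

// Time n iterations that use a small stack array.
double bench_stack_ms(int n) {
    volatile int sink = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        int local[16];                 // stack: one rsp adjustment
        local[0] = i;
        sink = sink + local[0];
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Time n iterations that heap-allocate an equivalent buffer.
double bench_heap_ms(int n) {
    volatile int sink = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        auto p = std::make_unique<int[]>(16);  // heap: allocator round trip
        p[0] = i;
        sink = sink + p[0];
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```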

When Allocation Cost Matters

  • Hot inner loops — use stack arrays or pre-allocated buffers
  • Real-time systems — audio, games, trading cannot tolerate allocation jitter
  • Embedded systems — heap may not be available
  • Data structures — linked list: one alloc per insert; vector: amortized across many
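For the hot-loop and pre-allocated-buffer cases above, the standard fix is to hoist the buffer out of the loop and reuse it. A sketch (`process_batches` and its values are hypothetical):

```cpp
#include <vector>

// Reuse one heap buffer across iterations instead of allocating per iteration.
// clear() resets size but keeps capacity, so after reserve() the loop body
// performs no heap allocations at all.
int process_batches(int batches, int items_per_batch) {
    std::vector<int> scratch;             // one buffer for the whole loop
    scratch.reserve(items_per_batch);     // pay the allocation once, up front
    int total = 0;
    for (int b = 0; b < batches; ++b) {
        scratch.clear();                  // keeps capacity, frees nothing
        for (int i = 0; i < items_per_batch; ++i)
            scratch.push_back(b + i);     // no allocation after reserve
        total += static_cast<int>(scratch.size());
    }
    return total;
}
```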

Comparison Table

| Aspect | Stack | Heap |
| --- | --- | --- |
| Speed | Very fast | Slower |
| Size | Limited (~8 MB) | System memory |
| Management | Automatic | Manual |
| Fragmentation | None | Possible |
| Thread safety | Yes (per thread) | No (needs sync) |
| Allocation | Compile-time size | Runtime size |

Custom Allocators

When the general-purpose allocator is too slow or fragments too much:

Arena Allocator (Bump Allocator)

Allocate by bumping a pointer. Free everything at once. Zero fragmentation, zero per-object overhead.

```cpp
class Arena {
    char* buffer;
    size_t offset = 0;
    size_t capacity;
public:
    Arena(size_t size) : buffer(new char[size]), capacity(size) {}
    ~Arena() { delete[] buffer; }

    void* allocate(size_t size) {
        size = (size + 7) & ~7;  // align to 8 bytes
        if (offset + size > capacity) throw std::bad_alloc();
        void* ptr = buffer + offset;
        offset += size;
        return ptr;
    }

    void reset() { offset = 0; }
};
```

Use cases: per-frame game allocations, compiler AST nodes, request-scoped server memory.

Pool Allocator

Pre-allocate N objects of same size. O(1) alloc and free via free list.

```cpp
template<typename T, size_t N>
class Pool {
    union Block { T obj; Block* next; };
    Block storage[N];
    Block* freeList;
public:
    Pool() {
        freeList = &storage[0];
        for (size_t i = 0; i < N - 1; i++)
            storage[i].next = &storage[i + 1];
        storage[N - 1].next = nullptr;
    }

    T* allocate() {
        if (!freeList) throw std::bad_alloc();
        Block* block = freeList;
        freeList = freeList->next;
        return &block->obj;
    }

    void deallocate(T* ptr) {
        Block* block = reinterpret_cast<Block*>(ptr);
        block->next = freeList;
        freeList = block;
    }
};
```

Use cases: particle systems, ECS components, network connections.

RAII and Ownership

RAII ties heap lifetime to stack objects. When the stack object’s destructor runs, the heap memory is freed.

```cpp
// Without RAII — leak on exception
void risky() {
    int* p = new int[1000];
    process(p);   // if this throws, p leaks
    delete[] p;
}

// With RAII — safe
void safe() {
    auto p = std::make_unique<int[]>(1000);
    process(p.get());  // if this throws, unique_ptr frees p
}
```

Smart Pointer Decision Tree

  • unique_ptr — single owner, zero overhead, use by default
  • shared_ptr — multiple owners, reference-counted, 16-byte overhead
  • weak_ptr — non-owning observer of shared_ptr, breaks cycles
  • Raw pointer — non-owning reference only, lifetime managed elsewhere
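One sketch of the weak_ptr bullet: a parent that owns its child, and a child that only observes its parent. If the back-pointer were a shared_ptr too, the two reference counts would never reach zero and both objects would leak.

```cpp
#include <memory>

struct Parent;

struct Child {
    std::weak_ptr<Parent> parent;   // non-owning back-reference: breaks the cycle
};

struct Parent {
    std::shared_ptr<Child> child;   // owning forward reference
};
```

After building the pair and dropping the last external `shared_ptr<Parent>`, the Parent is destroyed, which in turn destroys the Child; the weak_ptr simply expires instead of keeping anything alive.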

Debugging Memory Issues

AddressSanitizer (ASan)

Compile with -fsanitize=address — detects use-after-free, buffer overflows, leaks at ~2x slowdown.

```sh
g++ -fsanitize=address -g -O1 program.cpp -o program
./program
```

Example ASan output for use-after-free:

```
==12345==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010
READ of size 4 at 0x602000000010 thread T0
    #0 main /home/user/program.cpp:8
freed by thread T0 here:
    #0 operator delete(void*)
    #1 main /home/user/program.cpp:7
```

Valgrind

No recompilation needed, ~20x slowdown. Detects leaks, invalid access, uninit reads.

```sh
valgrind --leak-check=full ./program
```

Quick Reference

| Tool | Overhead | Recompile? | Detects |
| --- | --- | --- | --- |
| ASan | ~2x | Yes | Use-after-free, overflow, leaks |
| MSan | ~3x | Yes | Uninitialized reads |
| TSan | ~5x | Yes | Data races |
| Valgrind | ~20x | No | Leaks, invalid access |

Platform Differences

| Aspect | Linux | macOS | Windows |
| --- | --- | --- | --- |
| Default stack | 8 MB | 8 MB | 1 MB |
| Default allocator | ptmalloc2 | libmalloc | NT Heap |
| Large alloc threshold | 128 KB (mmap) | 128 KB | 512 KB |
| Set stack size | ulimit -s | ulimit -s | /STACK:size |

Windows’s 1 MB default stack is a common surprise when porting Linux code.

Key Takeaways

  1. Stack is 100x faster than heap — a single instruction vs allocator traversal. Prefer stack in hot paths.

  2. Heap fragmentation is real — repeated alloc/free creates gaps. Custom allocators (arena, pool) eliminate this.

  3. RAII eliminates leaks — tie heap lifetime to stack objects via smart pointers. unique_ptr has zero overhead.

  4. Debug with sanitizers — ASan catches use-after-free and overflows at 2x slowdown. Use it in CI.

  5. Know your platform — Windows has 1 MB stack (8x smaller than Linux). Test cross-platform.

Further Reading

  • Smart Pointers: unique_ptr, shared_ptr, weak_ptr — RAII in practice
  • RAII Patterns: Resource management beyond memory — files, locks, sockets
  • Memory Layout: How the OS loads your program into virtual address space
  • Cache Lines: Why stack is fast — spatial locality and cache friendliness
