
C++ Program Loading: From ELF to Running Process

How C++ programs are loaded — ELF segments, the _start to main() chain, dynamic linking with PLT/GOT, ASLR, real readelf/strace/proc maps output, and startup debugging.

20 min | programming, cpp, runtime, linking

From File to Process

Running ./program looks simple, but behind that one command three separate systems cooperate to turn a file on disk into a live process. The kernel reads the binary and maps its segments into virtual memory. The dynamic linker (ld-linux-x86-64.so.2) resolves shared library dependencies and patches addresses. The C runtime (crt0 / __libc_start_main) initializes the standard library, runs global constructors, and finally calls main(). If any one of these stages fails, your program never reaches its first line of code.

The ELF Binary

Every Linux executable (and shared library) uses ELF — the Executable and Linkable Format. ELF has two parallel views of the same file:

  • Linking view (sections): used by the linker at build time — .text, .data, .bss, .symtab, .rela.dyn, etc.
  • Execution view (segments): used by the kernel at load time — LOAD, INTERP, DYNAMIC, GNU_STACK, etc.

Sections are fine-grained (one per purpose). Segments group multiple sections that share the same memory permissions so the kernel can mmap them in a single call.

Here’s the ELF header from a real C++ binary (readelf -h):

    ELF Header:
      Magic:                        7f 45 4c 46 02 01 01 00 ...
      Class:                        ELF64
      Type:                         DYN (Position-Independent Executable)
      Machine:                      Advanced Micro Devices X86-64
      Entry point address:          0x1060
      Start of program headers:     64 (bytes into file)
      Number of program headers:    13

The Type: DYN means this is a position-independent executable (PIE) — it can be loaded at any address, which is essential for ASLR. The Entry point address: 0x1060 is _start, not main.

Key Segments

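LOAD segments are the ones the kernel actually maps into memory. A typical binary has three or four, grouped by permission (newer linkers often place .rodata in its own read-only segment after the code):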

  • LOAD (r--p): ELF headers + .rodata (read-only data, string literals)
  • LOAD (r-xp): .text (executable code)
  • LOAD (rw-p): .data + .bss (writable globals)

Why does memsz sometimes exceed filesz? Because the .bss section (zero-initialized globals) doesn’t need to occupy space in the file; the kernel just provides zeroed pages for it. For the writable LOAD segment, memsz - filesz is therefore roughly the size of .bss.

The INTERP segment contains a single string: /lib64/ld-linux-x86-64.so.2. This tells the kernel which dynamic linker to load and run before control reaches the program itself.

Memory Layout

Once the kernel and dynamic linker finish their work, the process has a well-defined virtual address space. You can see the real layout by reading /proc/PID/maps:

    55a3f2400000-55a3f2401000 r--p program (ELF headers)
    55a3f2401000-55a3f2402000 r-xp program (.text)
    55a3f2402000-55a3f2403000 r--p program (.rodata)
    55a3f2403000-55a3f2405000 rw-p program (.data, .bss)
    7f8c12000000-7f8c12200000 r-xp /lib/x86_64-linux-gnu/libc.so.6
    7ffca1200000-7ffca1221000 rw-p [stack]
    7ffca1304000-7ffca1306000 r-xp [vdso]

Each line shows the virtual address range, permissions (r = read, w = write, x = execute, p = private), and what occupies that region. Notice the program occupies four small mappings with different permissions, libc is mapped separately, and the stack sits near the top of the user address space. The [vdso] is a kernel-provided shared object that lets calls such as gettimeofday run entirely in user space, avoiding the cost of entering the kernel.

ASLR: Address Space Layout Randomization

Run cat /proc/self/maps twice and you’ll get different addresses each time. That’s ASLR — the kernel randomizes the base addresses of the executable, shared libraries, stack, and heap on every execution.

ASLR makes return-oriented programming (ROP) and ret2libc attacks much harder. If an attacker overflows a buffer, they can’t hardcode a jump target because the addresses are different every run; they first need an information leak. Combined with PIE binaries (which randomize the code segment too, not just the stack and libraries), ASLR makes exploitation significantly harder.

PIE vs non-PIE: a PIE binary (gcc -pie; GCC 6 added a configure-time option to make this the default, and most distributions enable it) gets its code segment randomized. A non-PIE x86-64 binary loads at a fixed address, 0x400000 by default, making code addresses predictable.

For debugging, you can disable ASLR temporarily:

setarch $(uname -m) -R ./program

Or system-wide (not recommended for production):

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

The Startup Sequence

The path from execve to your main() function involves several handoffs. Understanding this chain explains why programs can crash “before main” and how the C runtime sets up the environment your code depends on.

Why _start Exists

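The kernel never calls main(). Control reaches the ELF entry point, _start (directly for a static binary, or through the dynamic linker once it has finished its work). _start is a tiny assembly stub provided by crt1.o, which is linked into every executable. It locates argc and argv on the stack that the kernel set up during execve (envp follows argv) and passes them to __libc_start_main.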

What __libc_start_main Does

This function is the C runtime’s bootstrap. It performs a surprising amount of work:

  1. Sets up the thread-local storage (TLS) area
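  2. Registers exit-time cleanup (the dynamic linker’s finalizer, and __libc_csu_fini in glibc before 2.34) so destructors run on exit
  3. Runs the function pointers in the .init_array section, which is where global constructors run (glibc before 2.34 did this through __libc_csu_init)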
  4. Calls main(argc, argv, envp)
  5. Passes main’s return value to exit()

The Static Initialization Order Fiasco

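Global constructors run in definition order within a single translation unit, but across translation units the order is unspecified. If global A depends on global B in a different file, whether it works depends on details such as link order, and a harmless-looking build change can turn a working program into one that crashes: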

    #include <iostream>

    class Logger {
    public:
        Logger() { std::cout << "Logger init\n"; }   // runs BEFORE main
    } g_logger;

    int main() { std::cout << "main\n"; }
    // Output: Logger init, then main

This works fine in isolation, but if Logger’s constructor tries to use another global from a different translation unit that hasn’t been constructed yet, you get undefined behavior. The fix is the Construct on First Use idiom: wrap the global in a function that returns a reference to a local static.

Dynamic Linking: PLT and GOT

When your program calls a shared library function like printf, the compiler doesn’t know the final address at compile time. Instead, it generates a call through two indirection tables: the PLT (Procedure Linkage Table) and the GOT (Global Offset Table).

How Lazy Binding Works

  1. First call: the PLT stub for printf jumps to the GOT entry, which initially points back to a PLT resolver stub
  2. Resolver invokes _dl_runtime_resolve: the dynamic linker searches loaded libraries for printf, finds its address in libc.so
  3. GOT is patched: the resolver writes the real address into the GOT entry
  4. Subsequent calls: the PLT jumps to the GOT, which now contains the real address — no resolver overhead

This means the first call to each library function is slow (symbol lookup), but every call after that is a single indirect jump.

RELRO: Hardening the GOT

The GOT is writable (so the resolver can patch it), which makes it an attractive attack target. RELRO (Relocation Read-Only) protections exist in two forms:

  • Partial RELRO (-Wl,-z,relro, the GNU ld default on Linux; some distributions enable full RELRO instead): resolves .got at load time and marks it read-only, but .got.plt stays writable for lazy binding
  • Full RELRO (-Wl,-z,relro,-z,now): resolves all symbols at load time and marks the entire GOT read-only

Full RELRO increases startup time but eliminates GOT overwrite attacks.

Forcing Eager Resolution

LD_BIND_NOW=1 ./program

This forces the dynamic linker to resolve all PLT entries at load time instead of lazily, giving the same binding behavior as linking with -Wl,-z,now. It is useful for catching missing symbols early and for security hardening.

Static vs Dynamic Linking

| Aspect | Static (-static) | Dynamic (default) |
|---|---|---|
| Binary size | Large (libc included, ~1 MB+) | Small (~16 KB) |
| Startup time | Faster (no ld.so resolution) | Slower (symbol resolution) |
| Memory sharing | None (each process has copy) | Shared (one libc.so in RAM) |
| Deployment | Single file, portable | Needs matching .so files |
| Security patches | Must recompile everything | Update .so, all programs benefit |
| Use case | Containers, Go binaries, embedded | Desktop, servers, system packages |

Static linking produces a self-contained binary that works on any compatible kernel. Dynamic linking saves memory when many processes share the same library and lets you patch vulnerabilities by updating a single .so file. Most production systems use dynamic linking; containers and cross-compiled binaries often use static.

Debugging Loading

Trace System Calls

    # Trace syscalls during loading
    strace -f -e trace=execve,brk,openat,mmap,mprotect ./program 2>&1 | head -20

Real output from strace:

    execve("./program", ["./program"], 0x7ffd...) = 0
    brk(NULL) = 0x55a3f4a00000
    openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 3
    mmap(NULL, 2136936, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8c12000000
    mprotect(0x7f8c12028000, 1990656, PROT_NONE) = 0

This shows the kernel executing the binary (execve), then the dynamic linker opening libc.so.6, mapping it into memory with mmap, and setting up page protections with mprotect.

Debug the Dynamic Linker

    # Library search paths
    LD_DEBUG=libs ./program

    # Symbol resolution
    LD_DEBUG=bindings ./program

    # Everything
    LD_DEBUG=all ./program

Check Dependencies and Segments

    # Check library dependencies
    ldd ./program

    # View segments
    readelf -l program

Common Loading Errors

| Error | Cause | Fix |
|---|---|---|
| cannot open shared object file | Library not in search path | ldconfig, or set LD_LIBRARY_PATH, or install the package |
| GLIBC_2.34 not found | Binary compiled with newer glibc than target | Compile on older system, use static linking, or update target |
| Segfault before main() | Global constructor crash | Run with gdb, break on __libc_start_main, step through constructors |
| version GLIBCXX_3.4.30 not found | C++ stdlib mismatch | Update libstdc++ or use -static-libstdc++ |

Key Takeaways

  1. Loading is a 3-party collaboration — kernel maps segments, dynamic linker resolves symbols, C runtime initializes and calls main().

  2. ELF segments define memory layout — LOAD segments map file regions to memory with specific R/W/E permissions.

  3. main() is not the entry point — _start → __libc_start_main → global constructors → main(). Constructor bugs crash before main().

  4. ASLR randomizes addresses: PIE binaries get a new base every run, which makes return-oriented programming exploits much harder.

  5. Static vs dynamic is a deployment tradeoff — static = portable single binary; dynamic = shared libraries, smaller, patchable.
