From File to Process
Running ./program looks simple, but behind that one command three separate systems cooperate to turn a file on disk into a live process. The kernel reads the binary and maps its segments into virtual memory. The dynamic linker (ld-linux-x86-64.so.2 on x86-64) resolves shared library dependencies and patches addresses. The C runtime (crt1.o / __libc_start_main) initializes the standard library, runs global constructors, and finally calls main(). If any one of these stages fails, your program never reaches its first line of code.
The ELF Binary
Every Linux executable (and shared library) uses ELF — the Executable and Linkable Format. ELF has two parallel views of the same file:
- Linking view (sections), used by the linker at build time: .text, .data, .bss, .symtab, .rela.dyn, etc.
- Execution view (segments), used by the kernel at load time: LOAD, INTERP, DYNAMIC, GNU_STACK, etc.
Sections are fine-grained (one per purpose). Segments group multiple sections that share the same memory permissions so the kernel can mmap them in a single call.
Here’s the ELF header from a real C++ binary (readelf -h):
ELF Header:
  Magic:                             7f 45 4c 46 02 01 01 00 ...
  Class:                             ELF64
  Type:                              DYN (Position-Independent Executable)
  Machine:                           Advanced Micro Devices X86-64
  Entry point address:               0x1060
  Start of program headers:          64 (bytes into file)
  Number of program headers:         13
The Type: DYN means this is a position-independent executable (PIE) — it can be loaded at any address, which is essential for ASLR. The Entry point address: 0x1060 is _start, not main.
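The same fields can be read programmatically. The sketch below is one possible way to do it, assuming a Linux system that provides <elf.h>, a 64-bit ELF file, and a file whose byte order matches the host. ElfSummary and read_elf_summary are names made up for this example, not part of any library.

```cpp
#include <elf.h>      // Elf64_Ehdr, ELFMAG, EI_CLASS (Linux-specific header)
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical summary type for this example.
struct ElfSummary {
    bool is_64bit = false;
    std::uint16_t type = 0;    // ET_EXEC, ET_DYN, ...
    std::uint64_t entry = 0;   // e_entry: the address of _start, not main
    std::uint16_t phnum = 0;   // number of program headers
};

// Reads the ELF header at the start of `path`.
// Returns false if the file is unreadable, not ELF, or not ELF64.
// Assumption: the file's byte order matches the host's.
bool read_elf_summary(const std::string& path, ElfSummary& out) {
    std::ifstream file(path, std::ios::binary);
    Elf64_Ehdr header;
    if (!file.read(reinterpret_cast<char*>(&header), sizeof header)) {
        return false;
    }
    // The first four bytes must be 7f 45 4c 46 ("\x7fELF").
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0) {
        return false;
    }
    if (header.e_ident[EI_CLASS] != ELFCLASS64) {
        return false;
    }
    out.is_64bit = true;
    out.type = header.e_type;
    out.entry = header.e_entry;
    out.phnum = header.e_phnum;
    return true;
}
```

Calling read_elf_summary("/proc/self/exe", summary) inspects the running program itself; a type equal to ET_DYN corresponds to the DYN line in the readelf output above.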
Key Segments
LOAD segments are the segments the kernel actually maps into memory. A typical binary has three or four:
- LOAD (r--p): ELF headers and read-only metadata; .rodata (read-only data, string literals) goes here or into a separate read-only LOAD segment, depending on the linker
- LOAD (r-xp): .text (executable code)
- LOAD (rw-p): .data + .bss (writable globals)
Why does memsz sometimes exceed filesz? Because the .bss section (zero-initialized globals) occupies no space in the file; the kernel just maps zeroed pages for it. For the writable LOAD segment, memsz - filesz is therefore roughly the size of .bss.
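A running process can inspect its own program headers to see this difference. The following is a minimal sketch, assuming glibc (or another libc that provides dl_iterate_phdr in <link.h>). LoadSegment, main_program_load_segments, and zero_fill_bytes are hypothetical names chosen for illustration.

```cpp
#include <link.h>     // dl_iterate_phdr, dl_phdr_info, ElfW
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one PT_LOAD program header.
struct LoadSegment {
    std::uint64_t vaddr;
    std::uint64_t filesz;  // bytes backed by the file
    std::uint64_t memsz;   // bytes occupied in memory
    std::uint32_t flags;   // PF_R | PF_W | PF_X
};

namespace {
// Callback for dl_iterate_phdr. The first object reported is the main
// program, so returning nonzero stops the walk right after it.
int collect_main_program(dl_phdr_info* info, std::size_t, void* data) {
    auto* segments = static_cast<std::vector<LoadSegment>*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type == PT_LOAD) {
            segments->push_back({ph.p_vaddr, ph.p_filesz, ph.p_memsz, ph.p_flags});
        }
    }
    return 1;
}
}  // namespace

// Returns the LOAD segments of the running executable.
std::vector<LoadSegment> main_program_load_segments() {
    std::vector<LoadSegment> segments;
    dl_iterate_phdr(collect_main_program, &segments);
    return segments;
}

// Bytes the kernel zero-fills instead of reading from the file.
std::uint64_t zero_fill_bytes(const LoadSegment& s) {
    return s.memsz - s.filesz;
}
```

For most segments zero_fill_bytes returns 0; the writable segment that holds .bss is usually the one where it does not.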
The INTERP segment contains a single string, the path of the dynamic linker: /lib64/ld-linux-x86-64.so.2 on x86-64. This tells the kernel which dynamic linker to map and run before control reaches the program.
Memory Layout
Once the kernel and dynamic linker finish their work, the process has a well-defined virtual address space.
You can see the real layout by reading /proc/PID/maps:
55a3f2400000-55a3f2401000 r--p program (ELF headers)
55a3f2401000-55a3f2402000 r-xp program (.text)
55a3f2402000-55a3f2403000 r--p program (.rodata)
55a3f2403000-55a3f2405000 rw-p program (.data, .bss)
7f8c12000000-7f8c12200000 r-xp /lib/x86_64-linux-gnu/libc.so.6
7ffca1200000-7ffca1221000 rw-p [stack]
7ffca1304000-7ffca1306000 r-xp [vdso]
Each line shows the virtual address range, permissions (r = read, w = write, x = execute, p = private), and what occupies that region (real entries also carry offset, device, and inode columns, omitted here). Notice the program occupies four small mappings with different permissions, libc is mapped separately, and the stack sits near the top of the user address space. The [vdso] is a kernel-provided shared object that lets calls like gettimeofday run in user space, without the cost of entering the kernel.
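Real lines in /proc/PID/maps have the form "start-end perms offset dev inode path". As a rough sketch of how a tool might read them, the code below parses one line into a struct and collects the mappings of the current process. Mapping, parse_maps_line, and read_self_maps are illustrative names, and the parser assumes well-formed kernel output.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of /proc/PID/maps.
struct Mapping {
    std::uint64_t start = 0;
    std::uint64_t end = 0;
    std::string perms;         // e.g. "r-xp"
    std::uint64_t offset = 0;  // offset into the backing file
    std::string path;          // empty for anonymous mappings
};

// Parses "start-end perms offset dev inode [path]".
// Returns false if the fixed fields cannot be read.
bool parse_maps_line(const std::string& line, Mapping& out) {
    std::istringstream in(line);
    char dash = 0;
    std::string device;
    unsigned long long inode = 0;
    out.path.clear();
    in >> std::hex >> out.start >> dash >> out.end >> out.perms >> out.offset;
    in >> device >> std::dec >> inode;
    if (!in || dash != '-') {
        return false;
    }
    // The path is optional; skip the padding in front of it if present.
    std::getline(in >> std::ws, out.path);
    return true;
}

// Reads the mappings of the current process.
std::vector<Mapping> read_self_maps() {
    std::vector<Mapping> mappings;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        Mapping m;
        if (parse_maps_line(line, m)) {
            mappings.push_back(m);
        }
    }
    return mappings;
}
```

Filtering the result for entries whose perms contain 'x' gives a quick view of every executable region in the process.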
ASLR: Address Space Layout Randomization
Run cat /proc/self/maps twice and you’ll get different addresses each time. That’s ASLR — the kernel randomizes the base addresses of the executable, shared libraries, stack, and heap on every execution.
ASLR makes return-oriented programming (ROP) and ret2libc attacks much harder. An attacker who overflows a buffer can’t hardcode a jump target because the addresses differ on every run; they first need a separate information leak to learn where things are. Combined with PIE binaries (which randomize the code segment too, not just the stack and libraries), ASLR makes exploitation significantly harder.
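PIE vs non-PIE: a PIE binary (gcc -pie; GCC 6 added a build option that makes this the default, and most major distributions enable it) gets its code segment randomized. A non-PIE x86-64 binary loads at a fixed address chosen at link time, 0x400000 by default, making code addresses predictable.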
For debugging, you can disable ASLR temporarily:
setarch $(uname -m) -R ./program
Or system-wide (not recommended for production):
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
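To observe the randomization from inside a program, one option is to record an address from each region and compare the values across runs. This is only a sketch: AddressSample, sample_addresses, and print_addresses are hypothetical names, and converting a function pointer to an integer is assumed to work as it does on GCC and Clang.

```cpp
#include <cinttypes>  // PRIxPTR
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical record holding one address from each region.
struct AddressSample {
    std::uintptr_t code;   // a function in the executable
    std::uintptr_t stack;  // a local variable
    std::uintptr_t heap;   // a malloc'd block
};

AddressSample sample_addresses() {
    int local = 0;
    void* block = std::malloc(16);
    AddressSample s;
    s.code = reinterpret_cast<std::uintptr_t>(&sample_addresses);
    s.stack = reinterpret_cast<std::uintptr_t>(&local);
    s.heap = reinterpret_cast<std::uintptr_t>(block);
    std::free(block);  // only the numeric value is kept
    return s;
}

void print_addresses(const AddressSample& s) {
    std::printf("code  0x%" PRIxPTR "\n", s.code);
    std::printf("stack 0x%" PRIxPTR "\n", s.stack);
    std::printf("heap  0x%" PRIxPTR "\n", s.heap);
}
```

With ASLR enabled and a PIE build, all three values should change between runs. Under setarch -R they should stay the same, and in a non-PIE build the code address should stay fixed while the stack still moves.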
The Startup Sequence
The path from execve to your main() function involves several handoffs. Understanding this chain explains why programs can crash “before main” and how the C runtime sets up the environment your code depends on.
Why _start Exists
The kernel doesn’t call main(). It jumps to _start, a tiny assembly stub provided by crt1.o (or Scrt1.o for PIE builds), which the compiler driver links into every normal executable. _start picks up argc and argv from the stack (which the kernel populated during execve, with envp right after argv) and passes them, along with the address of main, to __libc_start_main.
What __libc_start_main Does
This function is the C runtime’s bootstrap. It performs a surprising amount of work:
- Sets up the thread-local storage (TLS) area
- Registers __libc_csu_fini with atexit so destructors run on exit
- Calls __libc_csu_init, which iterates over the .init_array section; this is where global constructors run (since glibc 2.34 these two helpers are gone and __libc_start_main walks .init_array itself)
- Calls main(argc, argv, envp)
- Passes main’s return value to exit()
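The ordering in the list above can be modeled with a toy function. This is not glibc's code: toy_start_main, startup_log, construct_logger, and toy_main are hypothetical names, TLS setup and atexit registration are left out, and the function returns the status where the real runtime would call exit() and never return.

```cpp
#include <string>
#include <vector>

using InitFn = void (*)();
using MainFn = int (*)(int, char**, char**);

// Shared event log, kept in a function-local static so it is
// constructed on first use.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Toy stand-in for __libc_start_main (hypothetical, heavily simplified).
int toy_start_main(MainFn main_fn, int argc, char** argv, char** envp,
                   const std::vector<InitFn>& init_array) {
    // Run every entry of the (simulated) .init_array: global constructors.
    for (InitFn init : init_array) {
        init();
    }
    // Only now does main run.
    const int status = main_fn(argc, argv, envp);
    // The real runtime passes the status to exit(); the toy just logs it.
    startup_log().push_back("exit(" + std::to_string(status) + ")");
    return status;
}

// Stand-in for a global object's constructor.
void construct_logger() {
    startup_log().push_back("constructor: g_logger");
}

// Stand-in for the program's main.
int toy_main(int argc, char**, char**) {
    startup_log().push_back("main");
    return argc - 1;
}
```

Passing construct_logger in init_array and toy_main as main_fn leaves the log with the constructor entry first, then main, then the exit entry, which is the order the real startup code enforces.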
The Static Initialization Order Fiasco
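Within a single translation unit, global constructors run in definition order, but the order across translation units is unspecified. If global A’s constructor uses global B from a different file, whether it works depends on which one the toolchain happens to initialize first, so a harmless-looking change in link order can turn a working program into a crash. A minimal global with a constructor looks like this: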
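#include <iostream>

// struct rather than class, so the constructor is public and g_logger can be built
struct Logger {
    Logger() { std::cout << "Logger init\n"; }  // runs BEFORE main
} g_logger;

int main() { std::cout << "main\n"; }
// Output: Logger init, then main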
This works fine in isolation, but if Logger’s constructor tries to use another global from a different translation unit that hasn’t been constructed yet, you get undefined behavior. The fix is the Construct on First Use idiom: wrap the global in a function that returns a reference to a local static.
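A minimal sketch of the Construct on First Use idiom follows. Config and global_config are hypothetical names added for this example, and Logger is a variant of the one above that depends on another global. Since C++11, initialization of a function-local static is guaranteed to happen exactly once, even with multiple threads.

```cpp
#include <string>

// Hypothetical dependency that could live in another translation unit.
struct Config {
    std::string name;
    Config() : name("default") {}
};

// Instead of a namespace-scope `Config g_config;`, expose an accessor.
// The object is constructed the first time the function is called.
Config& global_config() {
    static Config instance;
    return instance;
}

struct Logger {
    std::string prefix;
    // Safe even during static initialization: calling global_config()
    // forces Config to be constructed before it is used.
    Logger() : prefix(global_config().name + ": ") {}
};

Logger& global_logger() {
    static Logger instance;
    return instance;
}
```

The tradeoff is that destruction order at exit still needs care: if a destructor of one such object uses another that has already been destroyed, the same class of bug reappears at shutdown.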
Dynamic Linking: PLT and GOT
When your program calls a shared library function like printf, the compiler doesn’t know the final address at compile time. Instead, it generates a call through two indirection tables: the PLT (Procedure Linkage Table) and the GOT (Global Offset Table).
How Lazy Binding Works
- First call: the PLT stub for printf jumps to the GOT entry, which initially points back to a PLT resolver stub
- Resolver invokes _dl_runtime_resolve: the dynamic linker searches loaded libraries for printf, finds its address in libc.so
- GOT is patched: the resolver writes the real address into the GOT entry
- Subsequent calls: the PLT jumps to the GOT, which now contains the real address — no resolver overhead
This means the first call to each library function is slow (symbol lookup), but every call after that is a single indirect jump.
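The patch-on-first-call mechanism can be imitated in plain C++ with a function pointer standing in for a GOT slot. This is a toy model only: got_entry, resolver_stub, call_via_plt, and real_square are hypothetical names, and the "lookup" is hardcoded rather than a real symbol search.

```cpp
using Fn = int (*)(int);

// Stand-in for the library function the program wants to call.
int real_square(int x) { return x * x; }

int resolver_stub(int x);

// Toy GOT slot: initially points at the resolver, not the real function.
Fn got_entry = resolver_stub;

// Counts how often the slow path runs.
int resolve_count = 0;

// Toy resolver: "looks up" the symbol, patches the slot, then forwards
// the original call so the caller still gets its result.
int resolver_stub(int x) {
    ++resolve_count;
    got_entry = real_square;  // patch the GOT entry
    return got_entry(x);
}

// Toy PLT stub: always a single indirect call through the slot.
int call_via_plt(int x) { return got_entry(x); }
```

The first call_via_plt(3) goes through resolver_stub and returns 9; every later call jumps straight to real_square, and resolve_count stays at 1.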
RELRO: Hardening the GOT
The GOT is writable (so the resolver can patch it), which makes it an attractive attack target. RELRO (Relocation Read-Only) protections exist in two forms:
- Partial RELRO (default): resolves .got at load time and marks it read-only, but .got.plt stays writable for lazy binding
- Full RELRO (-Wl,-z,relro,-z,now): resolves all symbols at load time and marks the entire GOT read-only
Full RELRO increases startup time but eliminates GOT overwrite attacks.
Forcing Eager Resolution
LD_BIND_NOW=1 ./program
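This forces the dynamic linker to resolve all PLT entries at load time instead of lazily, the same binding behavior you get by linking with -Wl,-z,now. It is useful for catching missing symbols early. On its own it does not make the GOT read-only, so hardening still requires full RELRO at link time.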
Static vs Dynamic Linking
| Aspect | Static (-static) | Dynamic (default) |
|---|---|---|
| Binary size | Large (libc included, ~1 MB+) | Small (~16 KB) |
| Startup time | Faster (no ld.so resolution) | Slower (symbol resolution) |
| Memory sharing | None (each process has copy) | Shared (one libc.so in RAM) |
| Deployment | Single file, portable | Needs matching .so files |
| Security patches | Must recompile everything | Update .so, all programs benefit |
| Use case | Containers, Go binaries, embedded | Desktop, servers, system packages |
Static linking produces a self-contained binary that works on any compatible kernel. Dynamic linking saves memory when many processes share the same library and lets you patch vulnerabilities by updating a single .so file. Most production systems use dynamic linking; containers and cross-compiled binaries often use static.
Debugging Loading
Trace System Calls
# Trace syscalls during loading
strace -f -e trace=execve,brk,openat,mmap,mprotect ./program 2>&1 | head -20
Real output from strace:
execve("./program", ["./program"], 0x7ffd...) = 0
brk(NULL) = 0x55a3f4a00000
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 3
mmap(NULL, 2136936, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8c12000000
mprotect(0x7f8c12028000, 1990656, PROT_NONE) = 0
This shows the kernel executing the binary (execve), then the dynamic linker opening libc.so.6, mapping it into memory with mmap, and setting up page protections with mprotect.
Debug the Dynamic Linker
# Library search paths
LD_DEBUG=libs ./program
# Symbol resolution
LD_DEBUG=bindings ./program
# Everything
LD_DEBUG=all ./program
Check Dependencies and Segments
# Check library dependencies
ldd ./program
# View segments
readelf -l ./program
Common Loading Errors
| Error | Cause | Fix |
|---|---|---|
| cannot open shared object file | Library not in search path | ldconfig, or set LD_LIBRARY_PATH, or install the package |
| GLIBC_2.34 not found | Binary compiled with newer glibc than target | Compile on older system, use static linking, or update target |
| Segfault before main() | Global constructor crash | Run with gdb, break on __libc_start_main, step through constructors |
| version GLIBCXX_3.4.30 not found | C++ stdlib mismatch | Update libstdc++ or use -static-libstdc++ |
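When chasing a "GLIBC_x.y not found" error, it can help to print which glibc a program was built against and which one it is running on. The sketch below assumes glibc (the header <gnu/libc-version.h> does not exist on other C libraries such as musl). runtime_glibc_version and build_glibc_version are hypothetical helper names. Note that the error itself is driven by the symbol versions the binary actually references, so the build version is only an upper bound on what it may require.

```cpp
#include <gnu/libc-version.h>  // gnu_get_libc_version (glibc only)
#include <string>

// Version of the glibc the process is running against, e.g. "2.35".
std::string runtime_glibc_version() {
    return gnu_get_libc_version();
}

// Version of the glibc headers this code was compiled with.
std::string build_glibc_version() {
    return std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__);
}
```

If the build version is newer than the runtime version on the target machine, that is a strong hint the binary was compiled on a system with a newer glibc than the one it is being deployed to.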
Key Takeaways
- Loading is a 3-party collaboration: kernel maps segments, dynamic linker resolves symbols, C runtime initializes and calls main().
- ELF segments define memory layout: LOAD segments map file regions to memory with specific R/W/E permissions.
- main() is not the entry point: _start → __libc_start_main → global constructors → main(). Constructor bugs crash before main().
- ASLR randomizes addresses: PIE binaries get a new base every run, which makes return-oriented programming exploits much harder.
- Static vs dynamic is a deployment tradeoff: static = portable single binary; dynamic = shared libraries, smaller, patchable.
Related Concepts
- Stack vs Heap: How stack and heap regions are used after loading
- Symbol Resolution: How the linker matches references to definitions
- Dynamic Linking: Runtime library loading and GOT/PLT mechanics
- Virtual Memory: Address translation that makes process isolation possible
Further Reading
- How Programs Get Run: ELF Binaries — LWN deep dive into ELF loading
- A Whirlwind Tutorial on Creating Really Teensy ELF Executables — Classic article on minimal ELF
- Drepper: How to Write Shared Libraries — Ulrich Drepper’s definitive guide to ELF and dynamic linking
- The Linux Programming Interface, Ch. 41-42 — Michael Kerrisk’s comprehensive coverage of shared libraries and loading
