C++ Compiler Optimization

Summary: C++ compiler optimization lab notebook — compare optimization levels, inspect compiler rewrites, diagnose auto-vectorization, and build production verification commands.

Why Compiler Optimization Matters

The difference between -O0 and -O2 is often 5-10x in execution speed for compute-bound code. Between -O0 and -O3 with auto-vectorization, it can reach 20-40x for numerical loops. These ratios depend heavily on the workload and the hardware, so treat them as orders of magnitude rather than promises. The important skill is not memorizing flags. It is learning to ask: what rewrite did the compiler prove safe, what tradeoff did it make, and how do I verify the result?

Think of optimization as a lab workflow:

choose a starting level such as -O2
inspect the generated assembly or compiler remarks
fix source-level blockers like aliasing or dependencies
add LTO or PGO only when measurement justifies it
treat unsafe flags as explicit correctness decisions

Lab 1: Optimization Levels

Each -O level enables progressively more aggressive optimizations. The difference isn’t just “faster”: the compiler generates substantially different assembly at each level.
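A small function is enough to see the levels diverge. The sketch below uses a hypothetical sum_of_squares function and a hypothetical file name; the commands in the comments assume g++ is installed, and the differences described are typical observations, not guarantees.

```cpp
// levels.cpp (hypothetical file name). Compile the same source at several
// levels and compare the generated assembly, assuming g++ is available:
//   g++ -O0 -S levels.cpp -o levels_O0.s
//   g++ -O2 -S levels.cpp -o levels_O2.s
//   g++ -O3 -S levels.cpp -o levels_O3.s
#include <cstdint>

// Sums i*i for i in [0, n).
// Typical (not guaranteed) observations in the assembly:
//   -O0: `total` and `i` are loaded from and stored to the stack each iteration.
//   -O2: both stay in registers and the loop body shrinks to a few instructions.
//   -O3: the loop may additionally be unrolled or vectorized.
std::uint64_t sum_of_squares(std::uint32_t n) {
    std::uint64_t total = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        total += static_cast<std::uint64_t>(i) * i;
    }
    return total;
}
```

The function returns the same value at every level; only the instructions used to compute it change, which is what the assembly comparison is meant to reveal.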

-O3 Is Not Always Faster

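-O3 enables aggressive inlining and loop unrolling that increase binary size. When the hot code no longer fits comfortably in the instruction cache, the extra I-cache pressure can make -O3 slower than -O2. Always measure. The right answer is often -O2 -march=native, with the caveat that -march=native targets the build machine's CPU, so the binary may not run on older or different processors.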
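One way to act on "always measure" is a tiny timing helper built with each flag set under test. This is a minimal sketch, not a replacement for a real benchmarking framework; best_time_ms, sum_range, and g_sink are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Volatile sink: storing results here keeps the optimizer from deleting
// the work that is being timed.
volatile long long g_sink = 0;

// Hypothetical workload; replace it with the code you actually care about.
long long sum_range(int n) {
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Runs `work` `repetitions` times and returns the fastest run in milliseconds.
// The minimum is less sensitive to scheduler noise than the mean.
// Returns +infinity if `repetitions` is not positive.
template <typename Fn>
double best_time_ms(Fn&& work, int repetitions) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repetitions; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A possible use is to build the same file twice, once with -O2 and once with -O3, call best_time_ms([] { g_sink = sum_range(100000000); }, 10) in each binary, and compare the numbers. A loop as simple as sum_range may be replaced by a closed-form expression at higher levels, so a representative workload matters more than the harness.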

Lab 2: Transformation Passes

Compiler passes are the individual rewrites behind the optimization level. One pass folds constants, another removes dead code, another hoists loop-invariant work, and another rewrites expensive indexing into cheaper increments.

Use this lab to connect each human-readable source change to the mechanical compiler transformation behind it.
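The passes named above can be illustrated by writing the same function twice: once naively, and once with the rewrites applied by hand. This is a sketch with hypothetical function names; an optimizing compiler will usually perform these rewrites itself, but the exact output depends on the compiler and flags.

```cpp
// Source as a human might first write it.
// `values` must hold at least 2 * count - 1 elements (every second one is read).
int weighted_sum_naive(const int* values, int count, int gain) {
    int total = gain * 1000;                // dead store: overwritten before any read
    total = 0;
    for (int i = 0; i < count; ++i) {
        const int scale = gain * (60 * 60); // 60 * 60 folds to 3600; product is loop-invariant
        total += values[i * 2] * scale;     // index multiplication on every iteration
    }
    return total;
}

// The same function with four rewrites applied by hand:
//   constant folding       60 * 60        -> 3600
//   dead code elimination  the unused initial store is gone
//   loop-invariant motion  `scale` is computed once, before the loop
//   strength reduction     `i * 2`        -> an index that grows by 2
int weighted_sum_rewritten(const int* values, int count, int gain) {
    const int scale = gain * 3600;
    int total = 0;
    int index = 0;
    for (int i = 0; i < count; ++i, index += 2) {
        total += values[index] * scale;
    }
    return total;
}
```

Comparing the -O2 assembly of both functions is a quick check: if the compiler applied the passes, the two bodies should look very similar.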

Lab 3: Vectorization Clinic

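Auto-vectorization is where source shape matters most. Clang runs its loop vectorizer at -O2 and above; GCC vectorizes at -O3 and, since GCC 12, also at -O2 with a more conservative cost model, and -ftree-vectorize requests it explicitly. In each case the compiler tries to convert scalar loops into SIMD instructions. It succeeds only when it can prove that iterations are independent and memory accesses are safe, and when its cost model judges the SIMD version profitable.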

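When vectorization fails, the fix is usually structural: remove loop-carried dependencies, make memory access contiguous, or give the compiler a legal no-alias promise such as __restrict__ (a GCC and Clang extension; standard C++ has no restrict keyword, and the promise is only legal if the pointers really never overlap).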
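The three loops below sketch the blockers just described: a loop-carried dependency, possible aliasing, and the same loop with a no-alias promise. Function names and the LAB_RESTRICT macro are hypothetical; the report flags in the comments exist in GCC and Clang, but what each compiler actually reports for these loops may vary by version.

```cpp
// vec.cpp (hypothetical file name). Ask the compiler what it vectorized:
//   g++     -O3 -fopt-info-vec-optimized -fopt-info-vec-missed -c vec.cpp
//   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c vec.cpp
#include <cstddef>

// Hypothetical portability macro: __restrict__ on GCC/Clang, __restrict on MSVC.
#if defined(_MSC_VER)
#define LAB_RESTRICT __restrict
#else
#define LAB_RESTRICT __restrict__
#endif

// Loop-carried dependency: iteration i reads the value written by i - 1,
// so the iterations are not independent as written.
void prefix_sum(float* data, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        data[i] += data[i - 1];
    }
}

// Possible aliasing: `out` might overlap `a` or `b`, so the compiler must
// either add a runtime overlap check or stay scalar.
void add_may_alias(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// No-alias promise: the caller guarantees the three arrays never overlap.
// Breaking that promise is undefined behavior.
void add_no_alias(float* LAB_RESTRICT out, const float* LAB_RESTRICT a,
                  const float* LAB_RESTRICT b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Both add functions compute the same result when the arrays are distinct; the difference shows up only in the vectorization remarks and the generated assembly.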

Lab 4: Production Flags and Verification

Once local transformations look good, production builds usually add two more layers:

LTO: lets the optimizer see across .cpp file boundaries during linking. This enables cross-file inlining and whole-program dead function elimination.
PGO: uses representative runtime profiles to make better branch layout, inlining, and hot/cold code placement decisions.
Reports: make the compiler tell you what it optimized, missed, or spent time on.
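The three layers above map onto build commands roughly as follows. This is a sketch under assumptions: the file names util.cpp, main.cpp, and app are hypothetical, both functions are shown in one listing for brevity, and the exact profile file names depend on the compiler version.

```cpp
// Assume clamp_percent lives in util.cpp and average_percent in main.cpp.
// Without LTO the call crosses a translation-unit boundary and usually
// cannot be inlined.
//
// LTO (GCC or Clang): pass -flto at both compile and link time.
//   g++ -O2 -flto -c util.cpp main.cpp
//   g++ -O2 -flto util.o main.o -o app
//
// PGO with GCC: instrument, run on representative input, rebuild.
//   g++ -O2 -fprofile-generate util.cpp main.cpp -o app_instrumented
//   ./app_instrumented
//   g++ -O2 -fprofile-use util.cpp main.cpp -o app
//
// PGO with Clang: same idea, with an explicit merge step.
//   clang++ -O2 -fprofile-instr-generate util.cpp main.cpp -o app_instrumented
//   ./app_instrumented
//   llvm-profdata merge -output=app.profdata default.profraw
//   clang++ -O2 -fprofile-instr-use=app.profdata util.cpp main.cpp -o app
//
// Reports:
//   g++     -O2 -fopt-info-optimized -fopt-info-missed -c main.cpp
//   clang++ -O2 -Rpass='.*' -Rpass-missed='.*' -c main.cpp
//   g++ -ftime-report ...   or   clang++ -ftime-trace ...   (compile-time cost)
#include <cstddef>

// Clamps to [0, 100]. In the assumed workload the two early returns are
// rare, which is the kind of fact a PGO profile can tell the compiler.
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// Averages the clamped samples; returns 0 for an empty input.
int average_percent(const int* samples, std::size_t n) {
    if (n == 0) return 0;
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += clamp_percent(samples[i]);
    }
    return static_cast<int>(total / static_cast<long long>(n));
}
```

Adding one layer at a time and re-running the reports makes it easier to attribute a speedup, or a regression, to a specific flag.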

-ffast-math Changes Program Semantics

-ffast-math can be faster because it lets the compiler drop parts of IEEE 754 floating-point behavior: it may assume NaNs and infinities never occur, ignore the sign of zero, assume floating-point operations never trap, and reassociate expressions as if floating-point addition and multiplication were associative. Treat it as a correctness decision, not a normal performance flag.
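Two of those relaxations are easy to demonstrate. The sketch below uses hypothetical function names; the comments describe strict IEEE 754 results as produced by a default build, and what a -ffast-math build does with the same code is allowed to differ.

```cpp
// fp.cpp (hypothetical file name). Compare a default build with a relaxed one:
//   g++ -O2 fp.cpp
//   g++ -O2 -ffast-math fp.cpp
#include <cmath>

// Under strict IEEE 754 these two groupings can give different answers,
// because floating-point addition is not associative. With a = 1e30,
// b = -1e30, c = 1.0 the first returns 1.0 and the second returns 0.0,
// since 1.0 is absorbed when added to -1e30.
double sum_left_first(double a, double b, double c) {
    return (a + b) + c;
}

double sum_right_first(double a, double b, double c) {
    return a + (b + c);
}

// NaN guard. Under -ffast-math the compiler may assume NaN never occurs
// and fold this check to `true`, silently disabling the validation.
bool is_valid_reading(double x) {
    return !std::isnan(x);
}
```

If a program depends on either behavior, for example compensated summation or NaN as a missing-value marker, enabling -ffast-math changes its results rather than just its speed.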

Build the final flag set as the last lab step. Start with -O2, add one idea at a time, then verify each addition with compiler reports and benchmarks instead of assuming the flag helped.

godbolt.org is still the fastest manual check: paste one function, compare -O2 and -O3, inspect whether the branch disappeared or the loop vectorized, then decide whether the real code deserves a build-system change.