
CPython Internals: How Python Really Works Under the Hood

Deep dive into CPython internals: bytecode compilation, memory management, the GIL, object model, and garbage collection.

Abhik Sarkar

Introduction

Python is one of the most popular programming languages, but what happens when you run python script.py? This article explores the internals of CPython, the reference implementation of Python, revealing how Python code goes through bytecode compilation, how memory is managed, and why the Global Interpreter Lock (GIL) exists.

Python Execution Model

Python code goes through several stages before execution:

  1. Parsing: Source code → Abstract Syntax Tree (AST)
  2. Compilation: AST → Bytecode
  3. Execution: Bytecode → Python Virtual Machine
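All three stages can be observed from Python itself using the `ast` module and the `compile` and `exec` built-ins — a minimal sketch:

```python
import ast
import dis

source = "x = 1 + 2"

# Stage 1: parse the source into an AST
tree = ast.parse(source)
print(ast.dump(tree))

# Stage 2: compile the AST into a code object (bytecode)
code = compile(tree, filename="<demo>", mode="exec")
dis.dis(code)

# Stage 3: the Python Virtual Machine executes the bytecode
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 3
```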

From Source to Bytecode

The Compilation Pipeline

# Python source code
def greet(name):
    return f"Hello, {name}!"

result = greet("World")
print(result)

Understanding Python Bytecode

Python compiles source code to bytecode, which is executed by the Python Virtual Machine (PVM):

import dis

def add(a, b):
    return a + b

dis.dis(add)

Output:

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

Bytecode Instructions

Key bytecode instructions:

  • LOAD_FAST: Load local variable
  • LOAD_GLOBAL: Load global variable
  • STORE_FAST: Store to local variable
  • BINARY_ADD: Add two values from the stack (folded into the generic BINARY_OP in Python 3.11+)
  • CALL_FUNCTION: Call a function (replaced by CALL in Python 3.11+)
  • RETURN_VALUE: Return from function
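A quick way to see which of these instructions a given function uses is `dis.get_instructions` (exact opcode names vary by Python version, as noted above):

```python
import dis

def double(x):
    return x * 2

# Collect the opcode names this function compiles to
ops = {ins.opname for ins in dis.get_instructions(double)}
print(sorted(ops))
```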

Python Object Model

Everything is a PyObject

In CPython, every Python object is represented as a PyObject structure:

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;      // Reference count
    PyTypeObject *ob_type;     // Type pointer
} PyObject;

Type Objects

Every Python type (int, str, list, etc.) has a corresponding type object:

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name;       // Type name
    Py_ssize_t tp_basicsize;   // Instance size
    destructor tp_dealloc;     // Deallocator
    getattrfunc tp_getattr;    // Get attribute
    setattrfunc tp_setattr;    // Set attribute
    // ... many more fields
} PyTypeObject;

Object Creation

When you create an object in Python:

x = 42 # Creates a PyLongObject

CPython:

  1. Allocates memory for PyLongObject
  2. Sets reference count to 1
  3. Sets type pointer to PyLong_Type
  4. Stores the value 42

Memory Management

PyMalloc: Python's Memory Allocator

CPython uses a hierarchical memory management system:

  1. Small objects (requests of ≤ 512 bytes): PyMalloc
  2. Large objects: System malloc
  3. Memory pools: Pre-allocated blocks
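The 512-byte threshold can be illustrated with `sys.getsizeof` (a sketch; the threshold applies to the size of the allocation request, not to what the object logically contains):

```python
import sys

small = [0] * 10        # object well under 512 bytes -> served by pymalloc
large = [0] * 100_000   # far over 512 bytes -> falls through to system malloc
print(sys.getsizeof(small), sys.getsizeof(large))
```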

Memory Pools and Arenas

Arena (256 KB)
├── Pool 1 (4 KB) - 8-byte blocks
├── Pool 2 (4 KB) - 16-byte blocks
├── Pool 3 (4 KB) - 24-byte blocks
└── ... (up to 512-byte blocks)

Object Allocation Strategy

# Small integer optimization
a = 256  # Uses cached object
b = 256  # Same object as 'a'
print(a is b)  # True

c = 257  # Creates new object
d = 257  # Different object
print(c is d)  # False in the REPL (may be True inside a script, where the
               # compiler shares equal constants within one code object)

Performance Optimization

CPython caches small integers (-5 to 256) and single-character strings for performance.

Reference Counting

How Reference Counting Works

import sys

x = []        # refcount = 1
y = x         # refcount = 2
z = [x, x]    # refcount = 4
print(sys.getrefcount(x))  # Shows 5 (includes the temporary argument reference)

Reference Count Operations

// Increment reference count
Py_INCREF(obj);

// Decrement reference count
Py_DECREF(obj);  // Deallocates if refcount reaches 0

Circular References Problem

# Circular reference
class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b
b.ref = a  # Circular reference!

Garbage Collection

Generational Garbage Collection

CPython uses a generational garbage collector for circular references:

  • Generation 0: New objects
  • Generation 1: Survived one collection
  • Generation 2: Long-lived objects

How the Cyclic GC Works

CPython's primary memory reclamation mechanism is reference counting (described above). The cyclic garbage collector exists solely to handle reference cycles that reference counting cannot detect. It does not use a mark-and-sweep algorithm. Instead, the cyclic GC works as follows:

  1. Enumerate containers: Find all container objects (lists, dicts, classes, etc.) in the current generation being collected
  2. Compute gc_refs: For each container, copy its reference count into a temporary gc_refs field, then traverse each container's references and decrement gc_refs for every internally-referenced object
  3. Identify unreachable objects: Objects whose gc_refs drops to zero are potentially unreachable -- they are only kept alive by objects within the cycle, not by any external reference
  4. Collect and finalize: Unreachable objects are moved to an unreachable set, finalizers are called where needed, and the objects are deallocated

There is no compaction phase in CPython's garbage collector.
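The algorithm above can be exercised by building a cycle, dropping all external references, and triggering a collection manually:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# Build a two-node cycle, then drop the only external references
a = Node()
b = Node()
a.ref = b
b.ref = a
del a, b

# Reference counting alone cannot reclaim the cycle;
# the cyclic collector finds the unreachable objects and frees them
collected = gc.collect()
print(f"Collected {collected} objects")  # at least the two Node instances
```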

Controlling the GC

import gc

# Disable automatic collection
gc.disable()

# Manual collection
collected = gc.collect()
print(f"Collected {collected} objects")

# GC statistics
print(gc.get_stats())

# Set collection thresholds
gc.set_threshold(700, 10, 10)

The Global Interpreter Lock (GIL)

Why the GIL Exists

The GIL ensures thread safety for:

  1. Reference counting operations
  2. Memory allocation
  3. Python/C API calls

GIL Behavior

import threading
import time

def cpu_bound():
    total = 0
    for i in range(100_000_000):
        total += i
    return total

# Single thread
start = time.time()
cpu_bound()
print(f"Single thread: {time.time() - start:.2f}s")

# Multiple threads (doesn't help for CPU-bound work)
start = time.time()
threads = []
for _ in range(4):
    t = threading.Thread(target=cpu_bound)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(f"Multi-thread: {time.time() - start:.2f}s")

GIL Release Points

The GIL is released during:

  • I/O operations
  • time.sleep()
  • C extension calls (if designed to)
  • Every 5 milliseconds (the default time-based switch interval since Python 3.2, configurable via sys.setswitchinterval())
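Because sleeping threads release the GIL, I/O-style waits overlap even though CPU-bound work does not — a small sketch:

```python
import threading
import time

def io_bound():
    time.sleep(0.2)  # releases the GIL while sleeping

start = time.time()
threads = [threading.Thread(target=io_bound) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"{elapsed:.2f}s")  # close to 0.2s, not 0.8s: the sleeps overlap
```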

Optimization Techniques

Peephole Optimizer

CPython performs compile-time optimizations:

# Before optimization
if True:
    x = 1
else:
    x = 2

# After optimization (dead code eliminated)
x = 1

Constant Folding

# Compile-time evaluation
result = 2 * 3 + 4  # Becomes: result = 10
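Constant folding can be verified by inspecting the compiled code object: the folded value 10 appears in `co_consts`, and the disassembly contains no multiply or add instruction:

```python
import dis

def folded():
    return 2 * 3 + 4

# The whole expression was evaluated at compile time
print(folded.__code__.co_consts)  # contains 10
dis.dis(folded)                   # loads/returns the constant directly
```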

String Interning

# String interning
a = "hello"
b = "hello"
print(a is b)  # True (interned)

c = "hello world"
d = "hello world"
print(c is d)  # May be False (not automatically interned)

# Force interning
import sys
e = sys.intern("hello world")
f = sys.intern("hello world")
print(e is f)  # True

Function Calls and Stack Frames

Python Stack Frame

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      // Previous frame
    PyCodeObject *f_code;       // Code object
    PyObject *f_builtins;       // Builtin namespace
    PyObject *f_globals;        // Global namespace
    PyObject *f_locals;         // Local namespace
    PyObject **f_valuestack;    // Value stack
    // ... more fields
} PyFrameObject;
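Several of these C fields are exposed to Python through frame objects; `sys._getframe()` (a CPython-specific API) returns the current frame, and its `f_back` and `f_code` attributes mirror the struct members above:

```python
import sys

def inner():
    frame = sys._getframe()  # the current PyFrameObject
    # f_code names this function; f_back is the caller's frame
    return frame.f_code.co_name, frame.f_back.f_code.co_name

def outer():
    return inner()

names = outer()
print(names)  # ('inner', 'outer')
```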

Function Call Overhead

def add(a, b):
    return a + b

# Each function call involves:
# 1. A new frame object
# 2. Argument parsing
# 3. Local namespace setup
# 4. Frame cleanup on return

C Extensions Interface

Writing C Extensions

#include <Python.h>

static PyObject* fast_add(PyObject* self, PyObject* args) {
    long a, b;
    if (!PyArg_ParseTuple(args, "ll", &a, &b))
        return NULL;
    return PyLong_FromLong(a + b);
}

static PyMethodDef module_methods[] = {
    {"fast_add", fast_add, METH_VARARGS, "Add two numbers"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "fastmath",
    "Fast math operations",
    -1,
    module_methods
};

PyMODINIT_FUNC PyInit_fastmath(void) {
    return PyModule_Create(&module);
}

Using Cython for Optimization

# Pure Python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Cython version (fibonacci.pyx)
def fibonacci_cy(int n):
    if n <= 1:
        return n
    return fibonacci_cy(n-1) + fibonacci_cy(n-2)

Performance Profiling

Using cProfile

import cProfile
import pstats

def profile_code():
    # Your code here
    pass

cProfile.run('profile_code()', 'profile_stats')

# Analyze results
p = pstats.Stats('profile_stats')
p.sort_stats('cumulative')
p.print_stats(10)

Memory Profiling

from memory_profiler import profile

@profile
def memory_intensive():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

Python 3.11+ Optimizations

Adaptive Bytecode

Python 3.11 introduces adaptive bytecode that specializes based on runtime behavior:

def add_numbers(a, b):
    return a + b  # Specializes for int after seeing int inputs

# First calls: generic BINARY_OP bytecode
# After warm-up with ints: specialized BINARY_OP_ADD_INT
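On 3.11+ the specialized form can be observed with `dis.dis(..., adaptive=True)` after warming the function up (a sketch; the exact specialized opcode name shown, such as `BINARY_OP_ADD_INT`, varies across versions):

```python
import dis
import sys

def add_numbers(a, b):
    return a + b

# Warm the function up with int arguments so BINARY_OP can specialize
for _ in range(100):
    add_numbers(1, 2)

if sys.version_info >= (3, 11):
    dis.dis(add_numbers, adaptive=True)  # may show a specialized int-add opcode
```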

Frame Objects Optimization

  • Lazy frame creation
  • Reduced memory overhead
  • Faster function calls

Best Practices for Performance

  1. Use built-in functions: They're implemented in C
  2. List comprehensions: Faster than loops
  3. Local variables: Faster than global
  4. __slots__: Reduce memory for classes
  5. Profile before optimizing: Measure, don't guess
# Slower
result = []
for i in range(1000):
    if i % 2 == 0:
        result.append(i * 2)

# Faster
result = [i * 2 for i in range(1000) if i % 2 == 0]

Debugging CPython

Using gdb with Python

# Debug Python with gdb
gdb python
(gdb) run script.py
(gdb) py-bt      # Python backtrace
(gdb) py-list    # List Python source
(gdb) py-locals  # Show local variables

Inspecting Objects

import sys

def get_object_address(obj):
    return id(obj)  # in CPython, id() is the object's memory address

def inspect_pyobject(obj):
    address = id(obj)
    refcount = sys.getrefcount(obj)
    size = sys.getsizeof(obj)
    print(f"Address: 0x{address:x}")
    print(f"Refcount: {refcount}")
    print(f"Size: {size} bytes")

Conclusion

Understanding CPython internals helps you:

  • Write more efficient Python code
  • Debug performance issues
  • Understand Python's limitations
  • Make informed decisions about optimization
  • Contribute to CPython development

The journey from Python source code to execution involves complex machinery: bytecode compilation, memory management, garbage collection, and the GIL. While Python abstracts these details, knowing them makes you a better Python developer.

