What is Python Bytecode?
Python bytecode is an intermediate representation of Python source code. When you run a Python program, it's first compiled to bytecode, which is then executed by the Python Virtual Machine (PVM).
Python Bytecode Visualization
Python Source
    def add(a, b):
        return a + b

    result = add(3, 5)
How Python Bytecode Works
- Python compiles source code to bytecode before execution
- Bytecode is independent of the operating system and hardware, but specific to the interpreter version; bytecode for imported modules is cached in .pyc files
- The Python Virtual Machine (PVM) executes bytecode instructions
- Each instruction manipulates the value stack and local/global namespaces
The Compilation Process
1. Source Code to Tokens
Python first tokenizes your source code:
    # Source code
    def add(a, b):
        return a + b

    # Tokens
    NAME 'def'
    NAME 'add'
    OP '('
    NAME 'a'
    OP ','
    NAME 'b'
    OP ')'
    OP ':'
    # ...
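The token stream can be reproduced with the standard library's tokenize module. This is a minimal sketch; the names `source` and `tokens` are illustrative only. Note that keywords such as `def` are reported as NAME tokens.

```python
import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# generate_tokens() expects a readline callable that returns str lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Show the tokens of the "def" line.
for name, text in tokens[:8]:
    print(f"{name:<8} {text!r}")
```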
2. Tokens to AST
The tokens are parsed into an Abstract Syntax Tree:
    import ast
    import inspect

    def add(a, b):
        return a + b

    # Get the AST (inspect.getsource needs add to be defined in a file,
    # not typed at the interactive prompt)
    tree = ast.parse(inspect.getsource(add))
    print(ast.dump(tree))
3. AST to Bytecode
The AST is compiled to bytecode instructions:
    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # Output (Python 3.10 and earlier; 3.11+ shows BINARY_OP instead of BINARY_ADD):
    #   2           0 LOAD_FAST                0 (a)
    #               2 LOAD_FAST                1 (b)
    #               4 BINARY_ADD
    #               6 RETURN_VALUE
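The parse and compile steps can also be driven by hand with `ast.parse` and the built-in `compile`. The sketch below assumes the source is available as a string; the filename `"<example>"` and the `namespace` dictionary are placeholders chosen for illustration.

```python
import ast

source = "def add(a, b):\n    return a + b\n"

tree = ast.parse(source)                    # source text -> AST
code = compile(tree, "<example>", "exec")   # AST -> module-level code object

namespace = {}
exec(code, namespace)                       # run the bytecode; this defines add()
result = namespace["add"](3, 5)
print(result)  # 8
```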
Understanding Bytecode Instructions
Common Instructions
| Instruction | Description | Stack Effect |
|---|---|---|
| LOAD_FAST | Load local variable | Push value |
| LOAD_GLOBAL | Load global variable | Push value |
| STORE_FAST | Store to local variable | Pop value |
| BINARY_ADD | Add two values (BINARY_OP in 3.11+) | Pop 2, push 1 |
| CALL_FUNCTION | Call a function (CALL in 3.11+) | Pop args+func, push result |
| RETURN_VALUE | Return from function | Pop value |
| POP_TOP | Remove top of stack | Pop 1 |
| POP_JUMP_IF_FALSE | Conditional jump | Pop 1 |
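The stack effects in the table can be checked against the running interpreter with `dis.stack_effect`, which reports the net change in stack depth for an opcode. The helper name `effect` below is hypothetical, and only three opcodes that exist across current CPython versions are queried.

```python
import dis

def effect(opname, arg=None):
    """Return the net change in stack depth for one instruction."""
    return dis.stack_effect(dis.opmap[opname], arg)

effects = {
    "LOAD_FAST": effect("LOAD_FAST", 0),      # takes an argument (local index)
    "POP_TOP": effect("POP_TOP"),             # takes no argument
    "RETURN_VALUE": effect("RETURN_VALUE"),   # takes no argument
}
print(effects)
```

A positive number means the instruction pushes more than it pops; a negative number means the opposite.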
The Value Stack
Python's VM is stack-based. Operations manipulate a value stack:
    # Expression: a + b * c
    # Bytecode execution (instruction names from Python 3.10 and earlier):
    LOAD_FAST 0 (a)      # Stack: [a]
    LOAD_FAST 1 (b)      # Stack: [a, b]
    LOAD_FAST 2 (c)      # Stack: [a, b, c]
    BINARY_MULTIPLY      # Stack: [a, b*c]
    BINARY_ADD           # Stack: [a+b*c]
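To make the stack discipline concrete, here is a toy interpreter for the three instruction kinds used above. It is an illustrative sketch, not CPython's evaluation loop: the `run` function and the tuple-based program format are invented for this example, and the instruction names merely mimic CPython's.

```python
import operator

def run(program, local_vars):
    """Execute a tiny, hypothetical instruction list on a value stack."""
    stack = []
    binary_ops = {"BINARY_MULTIPLY": operator.mul, "BINARY_ADD": operator.add}
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])      # push the local at index arg
        elif opname in binary_ops:
            right = stack.pop()                # operands come off in reverse order
            left = stack.pop()
            stack.append(binary_ops[opname](left, right))
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return stack.pop()

program = [
    ("LOAD_FAST", 0), ("LOAD_FAST", 1), ("LOAD_FAST", 2),
    ("BINARY_MULTIPLY", None), ("BINARY_ADD", None),
]
result = run(program, [2, 3, 4])  # a=2, b=3, c=4
print(result)  # 14
```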
Bytecode Caching
.pyc Files
Python caches compiled bytecode in __pycache__ directories:
    mymodule.py
    __pycache__/
        mymodule.cpython-311.pyc   # Python 3.11 bytecode
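The cache path for a given source file can be computed with `importlib.util.cache_from_source`. A small sketch follows; the exact result depends on the interpreter's cache tag and on whether a custom cache prefix (`sys.pycache_prefix`) is configured.

```python
import importlib.util

# Map a source path to the .pyc path this interpreter would use for it.
pyc_path = importlib.util.cache_from_source("mymodule.py")
print(pyc_path)
```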
Cache Validation
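Python checks whether the source has changed before reusing a cached file:
- By default, compare the source's modification timestamp and size with the values stored in the .pyc header
- Optionally, compare a hash of the source instead (hash-based .pyc files, Python 3.7+)
- Recompile if the check fails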
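The validation mode is recorded in the .pyc header: a 4-byte magic number followed by a 4-byte flags field. The sketch below compiles a throwaway module in both modes with `py_compile` and reads the flags back. The helper `pyc_flags` and the output file names are made up for illustration.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

def pyc_flags(source_path, mode):
    """Compile source_path with the given invalidation mode; return header flags."""
    pyc_path = py_compile.compile(
        str(source_path),
        cfile=f"{source_path}.{mode.name.lower()}.pyc",  # hypothetical output name
        invalidation_mode=mode,
        doraise=True,
    )
    header = pathlib.Path(pyc_path).read_bytes()[:8]
    # Sanity check: the file starts with this interpreter's magic number.
    assert header[:4] == importlib.util.MAGIC_NUMBER
    return int.from_bytes(header[4:8], "little")

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "mymodule.py"
    src.write_text("x = 1\n")
    timestamp_flags = pyc_flags(src, py_compile.PycInvalidationMode.TIMESTAMP)
    checked_flags = pyc_flags(src, py_compile.PycInvalidationMode.CHECKED_HASH)

print(timestamp_flags, checked_flags)
```

A flags value of 0 means timestamp-based validation; a set low bit marks a hash-based file, and the second bit says whether the hash is checked against the source on import.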
Inspecting Bytecode
Using the dis Module
    import dis

    # Disassemble a function
    def factorial(n):
        if n <= 1:
            return 1
        return n * factorial(n - 1)

    dis.dis(factorial)
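When the disassembly needs to be processed rather than read, `dis.get_instructions` yields the same information as structured `Instruction` objects. A short sketch; the exact opnames printed vary between Python versions.

```python
import dis

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

instructions = list(dis.get_instructions(factorial))
opnames = [instr.opname for instr in instructions]

# Each Instruction carries its byte offset, name, and a readable argument.
for instr in instructions:
    print(instr.offset, instr.opname, instr.argrepr)
```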
Bytecode Objects
    # Access bytecode directly
    code = factorial.__code__
    print(f"Argument count: {code.co_argcount}")
    print(f"Local variables: {code.co_nlocals}")
    print(f"Stack size: {code.co_stacksize}")
    print(f"Constants: {code.co_consts}")
    print(f"Variable names: {code.co_varnames}")
    print(f"Bytecode: {code.co_code.hex()}")
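The raw `co_code` bytes can be decoded by hand, since each instruction has occupied two bytes (opcode, argument) since Python 3.6. This is a rough sketch: it ignores EXTENDED_ARG prefixes, and on 3.11+ the listing also includes inline cache slots, so `dis` remains the better tool for real work.

```python
import dis

def add(a, b):
    return a + b

raw = add.__code__.co_code  # bytes object

# Pair up (opcode, argument) bytes and translate opcodes to names.
decoded = [
    (dis.opname[raw[i]], raw[i + 1])
    for i in range(0, len(raw), 2)
]
print(decoded)
```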
Control Flow in Bytecode
Conditional Execution
    def check_positive(x):
        if x > 0:
            return "positive"
        return "non-positive"

    # Bytecode uses jumps (Python 3.10 and earlier; offsets omitted):
    #   LOAD_FAST          0 (x)
    #   LOAD_CONST         1 (0)
    #   COMPARE_OP         4 (>)
    #   POP_JUMP_IF_FALSE  (to the second LOAD_CONST)
    #   LOAD_CONST         2 ('positive')
    #   RETURN_VALUE
    #   LOAD_CONST         3 ('non-positive')
    #   RETURN_VALUE
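The jump and its target can be extracted programmatically. The sketch below filters instructions by name, which is a simplification chosen because the exact jump opcode differs between versions; for jump instructions, `dis` resolves `argval` to the target byte offset.

```python
import dis

def check_positive(x):
    if x > 0:
        return "positive"
    return "non-positive"

# Collect (offset, opname, target offset) for every jump instruction.
jumps = [
    (instr.offset, instr.opname, instr.argval)
    for instr in dis.get_instructions(check_positive)
    if "JUMP" in instr.opname
]
print(jumps)
```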
Loops
    def sum_range(n):
        total = 0
        for i in range(n):
            total += i
        return total

    # Loop bytecode uses:
    #   GET_ITER
    #   FOR_ITER
    #   JUMP_ABSOLUTE (back to loop start; JUMP_BACKWARD in 3.11+)
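A loop shows up in bytecode as a jump whose target lies before the jump itself. The following sketch finds such backward jumps without depending on the version-specific opcode name; matching on the substring "JUMP" is a heuristic used for illustration.

```python
import dis

def sum_range(n):
    total = 0
    for i in range(n):
        total += i
    return total

instructions = list(dis.get_instructions(sum_range))
opnames = [instr.opname for instr in instructions]

# A backward jump targets an offset earlier than its own position.
backward_jumps = [
    instr.opname
    for instr in instructions
    if "JUMP" in instr.opname and instr.argval < instr.offset
]
print(backward_jumps)
```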
Python 3.11+ Improvements
Adaptive Bytecode
Python 3.11 specializes bytecode based on runtime behavior:
    def add_numbers(a, b):
        return a + b

    # First calls: BINARY_OP (generic)
    # After a short warm-up with int arguments: BINARY_OP_ADD_INT (specialized)
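On Python 3.11 and later, the specialized instructions can be observed by passing `adaptive=True` to the `dis` functions. The sketch below warms the function up first; the loop count of 1000 is an arbitrary choice, and which specialized name appears (if any) depends on the interpreter version and build.

```python
import dis
import sys

def add_numbers(a, b):
    return a + b

# Warm the function up so the interpreter has a chance to specialize it.
for i in range(1000):
    add_numbers(i, 1)

if sys.version_info >= (3, 11):
    # adaptive=True shows the instructions currently in use, specialized or not.
    instructions = dis.get_instructions(add_numbers, adaptive=True)
else:
    instructions = dis.get_instructions(add_numbers)  # no specialization here
opnames = [instr.opname for instr in instructions]
print(opnames)
```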
Inline Caching
Specialized instructions store lookup results in cache entries embedded in the bytecode, directly after the instruction:
    # Before: LOAD_ATTR requires a dictionary lookup
    # After: LOAD_ATTR_SLOT uses a cached offset
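On Python 3.11 and later, `dis.dis` accepts `show_caches=True` to print the cache entries that accompany an instruction. A hedged sketch follows; the `Point` class and `get_x` function are invented for the example, and the listing format differs between versions.

```python
import dis
import io
import sys

class Point:
    def __init__(self, x):
        self.x = x

def get_x(point):
    return point.x

value = get_x(Point(3))

buffer = io.StringIO()
if sys.version_info >= (3, 11):
    # show_caches=True also prints the cache entries that follow instructions.
    dis.dis(get_x, file=buffer, show_caches=True)
else:
    dis.dis(get_x, file=buffer)  # older versions have no inline caches
listing = buffer.getvalue()
print(listing)
```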
Performance Implications
What's Fast
- Local variable access (LOAD_FAST)
- Built-in operations
- Specialized bytecode (3.11+)
What's Slow
- Global variable access (LOAD_GLOBAL)
- Attribute lookup (LOAD_ATTR)
- Function calls (CALL_FUNCTION; CALL in 3.11+)
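One way to see the cost of repeated global and attribute lookups is to time a loop against a variant that binds the function to a local name first. This is a rough sketch with arbitrary sizes; absolute numbers vary by machine, and the specialization in 3.11+ tends to narrow the gap.

```python
import math
import timeit

def with_global_lookup(values):
    total = 0.0
    for v in values:
        total += math.sqrt(v)   # global lookup + attribute lookup each iteration
    return total

def with_local_alias(values):
    sqrt = math.sqrt            # looked up once, then accessed as a local
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total

data = list(range(10_000))
for func in (with_global_lookup, with_local_alias):
    seconds = timeit.timeit(lambda: func(data), number=50)
    print(f"{func.__name__}: {seconds:.4f}s")
```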
Practical Example
    # Original code
    def process_list(items):
        result = []
        for item in items:
            if item > 0:
                result.append(item * 2)
        return result

    # More efficient: the comprehension appends with a dedicated LIST_APPEND
    # instruction instead of looking up and calling result.append each iteration
    def process_list_optimized(items):
        return [item * 2 for item in items if item > 0]
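The difference between the two versions can be confirmed by looking for the LIST_APPEND instruction. The helper `all_opnames` below is a hypothetical utility that also descends into nested code objects, because before Python 3.12 a comprehension is compiled as a separate inner function.

```python
import dis
import types

def process_list(items):
    result = []
    for item in items:
        if item > 0:
            result.append(item * 2)
    return result

def process_list_optimized(items):
    return [item * 2 for item in items if item > 0]

def all_opnames(code):
    """Collect opnames from a code object and any code objects nested in it."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_opnames(const)
    return names

loop_ops = all_opnames(process_list.__code__)
comp_ops = all_opnames(process_list_optimized.__code__)
print("LIST_APPEND" in loop_ops, "LIST_APPEND" in comp_ops)  # False True
```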
Key Takeaways
- Python compiles to bytecode before execution
- Bytecode is cached in .pyc files for faster imports
- The PVM is stack-based - operations manipulate a value stack
- Local variables are fastest - they use indexed access
- Python 3.11+ adapts bytecode based on runtime behavior
Understanding bytecode helps you write more efficient Python code and debug performance issues at a deeper level.
