Python's Dual Approach
Python uses two complementary mechanisms for memory management:
- Reference Counting (primary, immediate)
- Generational Garbage Collection (secondary, for cycles)
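A quick sketch of both mechanisms side by side (the Tracked class and the prints are illustrative): an acyclic object disappears the moment its refcount hits zero, while an object caught in a cycle survives until the cyclic collector runs.
import gc
import weakref

class Tracked:
    pass

# Reference counting: freed as soon as the last reference goes away
plain = Tracked()
probe = weakref.ref(plain)    # weak reference lets us observe liveness
del plain
print(probe() is None)        # True: refcount hit zero, freed immediately

# A self-referencing cycle: reference counting alone can never free it
gc.disable()                  # keep the automatic collector out of the way
cyclic = Tracked()
cyclic.ref = cyclic
probe = weakref.ref(cyclic)
del cyclic
print(probe() is None)        # False: still alive, kept by the cycle
gc.collect()                  # the cyclic garbage collector reclaims it
print(probe() is None)        # True
gc.enable()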
Garbage Collection
Generational Garbage Collector
- Generation 0: new objects
- Generation 1: survived one collection
- Generation 2: long-lived objects
Circular Reference Detection
Example graph: A → B → C → A and D → E → D. Objects A-B-C form a reachable cycle; objects D-E form an unreachable cycle.
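A sketch of the same situation in code, using an illustrative Obj class and a weak reference to observe what the collector actually frees:
import gc
import weakref

class Obj:
    pass

# Reachable cycle: A -> B -> C -> A, still referenced through the name "a"
a, b, c = Obj(), Obj(), Obj()
a.next, b.next, c.next = b, c, a

# Unreachable cycle: D -> E -> D, all outside references dropped
d, e = Obj(), Obj()
d.next, e.next = e, d
probe = weakref.ref(d)
del d, e

gc.collect()
print(probe() is None)          # True: the unreachable D-E cycle was reclaimed
print(a.next.next.next is a)    # True: the reachable A-B-C cycle is untouched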
GC Configuration
import gc
gc.get_threshold()
# (700, 10, 10)
gc.set_threshold(800, 20, 20)
gc.collect()  # Manual collection
Best Practices
- Break circular references explicitly
- Use weak references when appropriate
- Monitor with gc.get_stats()
- Profile with gc.set_debug()
Reference Counting
How It Works
import sys

# Object created, refcount = 1
x = []

# Reference added, refcount = 2
y = x

# Check refcount (adds a temporary reference)
print(sys.getrefcount(x))  # 3

# Remove a reference, refcount = 1
del y

# Remove the last reference, object freed
del x  # Object deallocated immediately
Advantages
- Immediate cleanup
- Predictable
- Simple implementation
Disadvantages
- Cannot handle circular references
- Overhead on every operation
- Not thread-safe (needs GIL)
Circular References Problem
Creating Circular References
# Simple circular reference
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Create a cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a  # Circular reference!

# Even after deleting the references
del a, b
# Objects still exist in memory (refcount never reaches 0)
Real-World Examples
# Parent-child relationships
class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # Circular reference

# Event handlers
class EventEmitter:
    def __init__(self):
        self.handlers = []

    def on(self, handler):
        self.handlers.append(handler)

class Handler:
    def __init__(self, emitter):
        self.emitter = emitter
        emitter.on(self.handle)  # Circular reference

    def handle(self, event):
        pass
Generational Garbage Collection
The Three Generations
import gc

# Get current thresholds
print(gc.get_threshold())  # (700, 10, 10)

# Generation 0: new objects, collected frequently
# Generation 1: survived 1 collection
# Generation 2: long-lived objects, collected rarely
Collection Algorithm
# Simplified collection process (pseudocode, not the actual CPython implementation)
def collect_generation(generation):
    # 1. Mark phase: find everything reachable from the roots
    reachable = set()
    for root in get_root_objects():
        mark_reachable(root, reachable)

    # 2. Sweep phase: free unreachable objects, keep the survivors
    survivors = []
    for obj in objects_in(generation):
        if obj in reachable:
            survivors.append(obj)
        else:
            free_object(obj)

    # 3. Promote survivors to the next, older generation
    if generation < 2:
        promote_to_next_generation(survivors)
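The real collector's bookkeeping can be watched through gc.get_count(), which returns the per-generation counters that the thresholds are compared against. A small sketch (exact numbers vary with the interpreter version and whatever else has been allocated):
import gc

gc.collect()                       # flush all generations to start clean
print(gc.get_count())              # (gen0, gen1, gen2) counters, near (0, 0, 0)

junk = [[] for _ in range(500)]    # new container objects are tracked in generation 0
print(gc.get_count())              # gen-0 counter has grown

gc.collect(0)                      # collect only generation 0
print(gc.get_count())              # gen-0 counter resets; the lists survive
                                   # (still referenced via "junk") and are promoted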
Using the gc Module
Manual Collection
import gc

# Disable automatic collection
gc.disable()

# Create circular references
for i in range(1000):
    a = []
    b = []
    a.append(b)
    b.append(a)

# Manual collection
collected = gc.collect()  # Returns number of objects collected
print(f"Collected {collected} objects")

# Re-enable automatic collection
gc.enable()
Debugging with gc
import gc

# Enable debug flags
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)

# Create an object with a circular reference
class MyClass:
    def __init__(self):
        self.ref = self

obj = MyClass()
del obj

# Force collection with debug output
gc.collect()

# Inspect gc.garbage (uncollectable objects; with DEBUG_LEAK, which
# includes DEBUG_SAVEALL, collected objects are kept here too)
garbage = gc.garbage
print(f"Objects in gc.garbage: {len(garbage)}")
GC Statistics
import gc

# Get collection stats
stats = gc.get_stats()
for i, gen_stats in enumerate(stats):
    print(f"Generation {i}:")
    print(f"  Collections: {gen_stats['collections']}")
    print(f"  Collected: {gen_stats['collected']}")
    print(f"  Uncollectable: {gen_stats['uncollectable']}")
Weak References
Breaking Cycles with weakref
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        # Use a weak reference to avoid a cycle
        child.parent = weakref.ref(self)

    def get_parent(self):
        if self.parent:
            return self.parent()  # Call the weak reference
        return None

# No circular reference problem!
parent = Node("parent")
child = Node("child")
parent.add_child(child)

del parent  # Parent can be collected
# child.get_parent() now returns None
WeakValueDictionary
import gc
import weakref

# Cache that doesn't prevent garbage collection
class Cache:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value

class MyExpensiveObject:  # placeholder for some costly-to-build object
    pass

cache = Cache()
obj = MyExpensiveObject()
cache.set("key", obj)

# Object exists in cache
assert cache.get("key") is obj

# Delete the only strong reference
del obj

# Object is automatically removed from the cache
gc.collect()
assert cache.get("key") is None
Performance Tuning
Adjusting Thresholds
import gc

# Default: (700, 10, 10)
# Format: (threshold0, threshold1, threshold2)

# More aggressive collection
gc.set_threshold(500, 5, 5)

# Less frequent collection (better performance)
gc.set_threshold(1000, 20, 20)

# Disable automatic collection entirely
gc.set_threshold(0)  # Must call gc.collect() manually
Monitoring GC Impact
import gc
import time

# gc.get_stats() reports collection counts, not timing, so measure GC
# overhead by benchmarking the same workload with and without automatic GC
for i, gen_stats in enumerate(gc.get_stats()):
    print(f"Generation {i} collections: {gen_stats['collections']}")

# Benchmark with/without GC
gc.disable()
start = time.time()
# Your code here
no_gc_time = time.time() - start

gc.enable()
start = time.time()
# Your code here
with_gc_time = time.time() - start

print(f"Without GC: {no_gc_time:.3f}s")
print(f"With GC: {with_gc_time:.3f}s")
Best Practices
1. Avoid Creating Cycles
# Bad: Circular reference
class BadNode:
    def __init__(self):
        self.parent = None
        self.children = []

# Good: Use weak references
class GoodNode:
    def __init__(self):
        self.parent = None  # Set as weakref.ref()
        self.children = []
2. Explicitly Break Cycles
class Resource:
    def __init__(self):
        self.circular_ref = self

    def cleanup(self):
        self.circular_ref = None  # Break the cycle

# Use context managers
class ManagedResource(Resource):
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.cleanup()  # Automatic cleanup
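Used with a with statement, cleanup runs when the block exits, normally or via an exception, so the cycle is broken deterministically. This usage sketch assumes the ManagedResource class above:
with ManagedResource() as res:
    ...  # use the resource
# __exit__ has already run here, so the circular reference is broken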
3. Use __slots__ for Large Numbers of Objects
# Reduces memory and GC overhead
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
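To see the memory side of this, compare the per-instance footprint against an ordinary class (PlainPoint is an illustrative counterpart; exact byte counts vary by Python version):
import sys

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p, q = PlainPoint(1, 2), SlottedPoint(1, 2)
# A regular instance carries a per-instance __dict__; a slotted one does not
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # regular instance plus its dict
print(sys.getsizeof(q))                              # slotted instance, noticeably smaller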
4. Profile Before Optimizing
import gc
import tracemalloc

# Start tracing
tracemalloc.start()
gc.collect()  # Clean slate

# Your code here
create_many_objects()

# Check memory and GC stats
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

print(f"GC collected: {gc.collect()} objects")
Common Pitfalls
1. Relying on __del__
# Problematic: __del__ with circular references
class Bad:
    def __del__(self):
        print("Cleaning up")  # May never be called!

# Better: Use context managers
class Good:
    def __enter__(self):
        return self

    def __exit__(self, *args):
        print("Cleaning up")  # Guaranteed to be called
2. Large Default Arguments
# Bad: Keeps a large object alive
def bad_function(data=large_default_object):
    pass

# Good: Create it when needed
def good_function(data=None):
    if data is None:
        data = create_large_object()
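The reason the first version leaks: default values are evaluated once, when the def statement runs, and stay referenced by the function object for as long as it exists. A small sketch (make_big is an illustrative stand-in for any expensive constructor):
def make_big():
    return list(range(1_000_000))

# Evaluated once at definition time; the list lives as long as the function does
def bad_function(data=make_big()):
    pass

print(len(bad_function.__defaults__[0]))  # 1000000: still alive and referenced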
Key Takeaways
- Reference counting handles most cleanup immediately
- Cyclic GC handles circular references in generations
- Weak references can break cycles
- gc module provides control and debugging
- Profile first before adjusting GC settings
- Context managers ensure cleanup
- __slots__ reduces memory and GC overhead
