
Python Garbage Collection

Understand CPython garbage collection: reference counting, generational GC for circular references, weak references, and gc module tuning strategies.

What Garbage Collection Means in CPython

Garbage collection answers one question: can the running program still reach this object? If the answer is yes, the object must stay alive. If the answer is no, the runtime can reclaim the object or reuse its storage.

CPython uses two mechanisms together. Reference counting is the fast path: most objects disappear as soon as their last strong reference disappears. The cyclic garbage collector is the backup path: it finds groups of container objects that point at each other but are no longer reachable from the program.

That distinction matters. Python variables are names bound to objects, not boxes that own values. Lists, dictionaries, closures, stack frames, caches, and object attributes can all hold references. The object lives while at least one live path can still reach it.

Figure: CPython Garbage Collection Guided Story (reference counting, step 1 of 5: "Create the object"). items = [] creates a list object and binds one name to it; with one strong reference (refcount 1), CPython keeps the list alive.

Reference Counting

How It Works

Every CPython object has a reference count. Binding a name, storing an object in a container, capturing it in a closure, or keeping it in a cache can increase that count. Removing one of those owners decreases it. When the count reaches zero, CPython can deallocate the object immediately.

```python
import sys

x = []                     # object created; x is one strong reference
y = x                      # second strong reference
print(sys.getrefcount(x))  # typically 3 (x, y, and the call's temporary argument); may vary by version
del y                      # one reference removed
del x                      # last reference removed: the list is freed immediately
```

sys.getrefcount() is useful for learning, but it generally reports one more reference than you might expect, because passing the object into the function creates a temporary reference.
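To observe the "freed immediately" behavior without counting references by hand, a minimal sketch can attach a weakref.finalize() callback that records when the object is reclaimed. Payload, events, and the two snapshot variables are illustrative names; the immediate callback reflects CPython's reference counting, and other Python implementations may run it later.

```python
import weakref

class Payload:
    """Illustrative object; any class that supports weak references works."""

events = []

obj = Payload()
# finalize() calls events.append("freed") once obj has been reclaimed.
weakref.finalize(obj, events.append, "freed")

alias = obj                     # a second strong reference
del obj
after_first_del = list(events)  # []: alias still keeps the object alive
del alias                       # the last strong reference disappears
after_last_del = list(events)   # ['freed'] on CPython: reclaimed right away

print(after_first_del, after_last_del)
```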

Tradeoffs

  • Immediate cleanup
  • Usually predictable lifetime for simple objects
  • Cannot reclaim isolated cycles by itself
  • Adds bookkeeping to reference-changing operations

Circular References Problem

Creating Circular References

Reference counting only sees how many references point to an object. It does not know whether those references are still reachable from running code.

```python
# Simple circular reference
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Create a cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a  # circular reference

# Even after deleting the names...
del a, b
# ...each node still references the other, so neither refcount reaches zero
```

After del a, b, no external name reaches the two nodes. Their internal next links still keep each other's reference count above zero, so the fast reference-counting path cannot free them. The cyclic collector is needed to notice that the whole group is unreachable.
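The sketch below makes that visible, assuming CPython: a weak reference (probe, an illustrative name) watches one node without keeping it alive. Automatic collection is paused so only the explicit gc.collect() call can reclaim the cycle, and the previous collector state is restored afterward.

```python
import gc
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

was_enabled = gc.isenabled()
gc.disable()  # keep automatic collection from running mid-demo
try:
    a = Node(1)
    b = Node(2)
    a.next = b
    b.next = a                               # a <-> b cycle
    probe = weakref.ref(a)                   # observe a without owning it
    del a, b
    alive_after_del = probe() is not None      # True: refcounts never hit zero
    gc.collect()
    alive_after_collect = probe() is not None  # False: the collector freed the cycle
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)
```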

Real-World Examples

```python
# Parent-child relationships
class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # circular reference

# Event handlers
class EventEmitter:
    def __init__(self):
        self.handlers = []

    def on(self, handler):
        self.handlers.append(handler)

class Handler:
    def __init__(self, emitter):
        self.emitter = emitter
        # The bound method self.handle holds self, so emitter -> handler -> emitter
        emitter.on(self.handle)  # circular reference

    def handle(self, event):
        pass
```

Generational Garbage Collection

Generations and Thresholds

CPython's cyclic collector focuses on tracked container objects that can participate in cycles, such as lists, dictionaries, instances, frames, and closures. Simple non-container objects do not need cycle detection.
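gc.is_tracked() reports whether the cyclic collector is currently tracking an object, which makes the container/non-container split easy to check. This sketch sticks to cases with stable answers; CPython may leave some containers untracked as an optimization (for example, dictionaries holding only atomic values), so results for dicts and tuples can differ. Plain is an illustrative class.

```python
import gc

class Plain:
    pass

int_tracked = gc.is_tracked(42)         # False: ints cannot reference other objects
str_tracked = gc.is_tracked("text")     # False: neither can strings
list_tracked = gc.is_tracked([])        # True: lists can hold references
obj_tracked = gc.is_tracked(Plain())    # True: instances can hold references in attributes

print(int_tracked, str_tracked, list_tracked, obj_tracked)
```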

Generations are a performance shortcut. Newly allocated tracked containers start in the youngest generation, which is checked most often. Objects that survive collections are promoted, so long-lived objects are scanned less frequently.
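gc.get_count() returns the current collection counters. A minimal sketch, assuming the first counter roughly tracks new tracked allocations since the youngest generation was last collected (the exact accounting is a CPython implementation detail): after a full collection, creating 100 tracked instances of an illustrative Box class should raise it.

```python
import gc

class Box:
    """Illustrative tracked object (instances can hold references)."""

was_enabled = gc.isenabled()
gc.disable()          # prevent an automatic collection from resetting the counter
try:
    gc.collect()      # start from an emptied youngest generation
    before = gc.get_count()
    keep = [Box() for _ in range(100)]  # 100 new tracked objects, kept alive
    after = gc.get_count()
finally:
    if was_enabled:
        gc.enable()

print(before, after)  # the first counter grows; exact numbers vary by version
```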

Most CPython versions you will meet expose three generation counters and thresholds. One detail is version-sensitive: Python 3.14.0 through 3.14.4 temporarily used a young/old model where generation 1 had no objects and threshold2 was ignored; Python 3.14.5 restores the three-generation behavior to match Python 3.13.

```python
import gc

# Typical CPython 3.13 and 3.14.5+ thresholds
print(gc.get_threshold())  # often (700, 10, 10)

# Generation 0: new objects, collected frequently
# Generation 1: survived one collection
# Generation 2: long-lived objects, collected rarely
```
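gc.collect() accepts a generation number, and functions in gc.callbacks are called with a phase ("start" or "stop") and an info dict that includes the generation being collected. This sketch (record is an illustrative name) uses them to confirm which generation each explicit collection touched. Given the version note above, the reported generations may differ on Python 3.14.0 through 3.14.4.

```python
import gc

seen = []

def record(phase, info):
    # Each callback receives the phase and an info dict describing the collection.
    if phase == "stop":
        seen.append(info["generation"])

was_enabled = gc.isenabled()
gc.disable()                 # record only our explicit collections
gc.callbacks.append(record)
try:
    gc.collect(0)            # youngest generation only
    gc.collect()             # full collection (oldest generation)
finally:
    gc.callbacks.remove(record)
    if was_enabled:
        gc.enable()

print(seen)
```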

Collection Algorithm

At a high level, CPython does not run a normal tracing mark/sweep over every object. It starts with tracked container candidates, copies each object's reference count into temporary GC bookkeeping, then traverses the candidate containers and subtracts references that come from inside the candidate set. Containers that still have an outside reference survive. Containers only reachable from each other become tentatively unreachable and can be finalized and freed.

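```python
# Simplified model of CPython's cycle detection (not the real implementation).
# refcounts: {obj: total strong references}; edges: {obj: [objects it references]}
def collect_generation(refcounts, edges):
    candidates = set(refcounts)
    # 1. Copy real refcounts into temporary bookkeeping.
    gc_refs = {obj: refcounts[obj] for obj in candidates}
    # 2. Subtract references that come from inside the candidate set.
    for obj in candidates:
        for child in edges.get(obj, ()):
            if child in candidates:
                gc_refs[child] -= 1
    # 3. A leftover count means an outside reference proves reachability...
    reachable = {obj for obj in candidates if gc_refs[obj] > 0}
    # ...and everything reachable objects point to is reachable too.
    stack = list(reachable)
    while stack:
        for child in edges.get(stack.pop(), ()):
            if child in candidates and child not in reachable:
                reachable.add(child)
                stack.append(child)
    # 4. The rest are isolated cycles that can be finalized and freed.
    return candidates - reachable

# "a" and "b" form an isolated cycle; "c" is also referenced by a variable.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
print(sorted(collect_generation(refcounts, edges)))  # ['a', 'b']
```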

Using the gc Module

Manual Collection

Most applications should leave automatic GC enabled. Manual collection is mainly useful in tests, diagnostics, or controlled batch phases where you have measured a benefit.

```python
import gc

gc.disable()  # disable automatic collection
try:
    # Create circular references
    for i in range(1000):
        a = []
        b = []
        a.append(b)
        b.append(a)
    # Manual collection; returns the number of unreachable objects found
    collected = gc.collect()
    print(f"Collected {collected} objects")
finally:
    gc.enable()  # re-enable automatic collection even if something fails
```
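If you do pause collection for a batch phase, a small context manager can restore whatever state the collector was in before, even when the block raises. gc_paused is a hypothetical helper built on the documented gc.isenabled(), gc.disable(), and gc.enable().

```python
import contextlib
import gc

@contextlib.contextmanager
def gc_paused():
    """Hypothetical helper: pause automatic GC, then restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

initial = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()               # False while the block runs
    nodes = [[i] for i in range(10_000)]  # e.g. a batch build phase
after = gc.isenabled()                    # back to the state before the block

print(inside, after)
```

Restoring the saved state, rather than calling gc.enable() unconditionally, avoids switching the collector on in a program that had deliberately disabled it.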

Debugging with gc

The gc module can show collection counts and uncollectable objects. Use debug flags temporarily because they can produce noisy output and change runtime behavior.

```python
import gc

# Create an object with a circular reference
class MyClass:
    def __init__(self):
        self.ref = self

# Print collection stats and information about uncollectable objects.
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_UNCOLLECTABLE)
try:
    obj = MyClass()
    del obj

    # Force a collection with debug output (written to stderr)
    gc.collect()

    # Objects the collector could not free are placed in gc.garbage.
    # A plain cycle like this one is collectable, so this is usually 0.
    print(f"Uncollectable objects: {len(gc.garbage)}")
finally:
    gc.set_debug(0)  # turn debug output back off
```

Use gc.DEBUG_LEAK only when you intentionally want inspection mode: it includes gc.DEBUG_SAVEALL, which saves all unreachable objects in gc.garbage instead of freeing them.
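To see inspection mode in isolation, this sketch enables only gc.DEBUG_SAVEALL, lets the collector find a self-referencing object (SelfRef is an illustrative class), and then turns the flag off and empties gc.garbage so the saved objects can be freed. Clearing gc.garbage also discards anything else stored there, so this is best run in a scratch interpreter.

```python
import gc

class SelfRef:
    def __init__(self):
        self.ref = self  # self-cycle: only the cyclic collector can free it

gc.collect()                    # free existing garbage normally first
gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects for inspection
try:
    obj = SelfRef()
    del obj
    gc.collect()
    saved = [o for o in gc.garbage if isinstance(o, SelfRef)]
finally:
    gc.set_debug(0)             # leave inspection mode
    gc.garbage.clear()          # release the saved objects

print(len(saved))  # 1: the cycle was saved instead of freed
```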

GC Statistics

```python
import gc

# Get per-generation collection stats
stats = gc.get_stats()
for i, gen_stats in enumerate(stats):
    print(f"Generation {i}:")
    print(f"  Collections: {gen_stats['collections']}")
    print(f"  Collected: {gen_stats['collected']}")
    print(f"  Uncollectable: {gen_stats['uncollectable']}")
```

Weak References

Breaking Cycles with weakref

Weak references let one object observe another without owning it. They are useful for parent links, observer lists, and caches where keeping a strong reference would accidentally extend an object's lifetime.

```python
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        # Use a weak reference to avoid a parent <-> child cycle
        child.parent = weakref.ref(self)

    def get_parent(self):
        if self.parent is not None:
            return self.parent()  # call the weak reference; None once the parent is gone
        return None

# No circular reference problem
parent = Node("parent")
child = Node("child")
parent.add_child(child)
del parent                 # the parent can be freed right away
print(child.get_parent())  # None
```
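The event-handler example earlier has the same shape as the parent link: the emitter's list holds a bound method, and a bound method holds its object. A plain weakref.ref to a bound method would die immediately, because the bound method is a temporary; weakref.WeakMethod instead references the method's object and function weakly and rebuilds the bound method on demand. This is a sketch of one variant of that example, not the only fix; emit() and the pruning of dead entries are illustrative additions.

```python
import weakref

class EventEmitter:
    def __init__(self):
        self._handlers = []

    def on(self, bound_method):
        # Store a WeakMethod so the emitter does not keep the handler's object alive.
        self._handlers.append(weakref.WeakMethod(bound_method))

    def emit(self, event):
        alive = []
        for ref in self._handlers:
            method = ref()  # None once the handler object has been freed
            if method is not None:
                method(event)
                alive.append(ref)
        self._handlers = alive  # drop entries whose object is gone
        return len(alive)

class Handler:
    def __init__(self, emitter):
        self.emitter = emitter
        self.received = []
        emitter.on(self.handle)

    def handle(self, event):
        self.received.append(event)

emitter = EventEmitter()
handler = Handler(emitter)
delivered_before = emitter.emit("ping")
received = handler.received
del handler  # nothing strong points back at the handler, so it is freed
delivered_after = emitter.emit("pong")

print(delivered_before, received, delivered_after)
```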

WeakValueDictionary

```python
import gc
import weakref

# Cache that doesn't prevent garbage collection
class MyExpensiveObject:
    pass

class Cache:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value

cache = Cache()
obj = MyExpensiveObject()
cache.set("key", obj)
assert cache.get("key") is obj  # object exists in cache

del obj       # delete the only strong reference
gc.collect()  # not needed on CPython (removal is immediate); kept for runtimes without refcounting
assert cache.get("key") is None  # entry removed automatically
```

Performance Tuning

Adjusting Thresholds

Treat GC thresholds as a last-mile tuning tool. Profile first, because changing thresholds can trade lower pause frequency for higher memory growth or longer collection bursts.

```python
import gc

# Typical CPython 3.13 and 3.14.5+: (700, 10, 10)
# Python 3.14.0-3.14.4 ignored threshold2.

# More aggressive collection: collections run more often
gc.set_threshold(500, 5, 5)

# Less frequent collection: fewer pauses, but more memory growth
# and potentially longer collections when they do run
gc.set_threshold(1000, 20, 20)

# threshold0 = 0 disables automatic collection entirely;
# you must then call gc.collect() manually
gc.set_threshold(0)
```
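To measure rather than guess, a sketch like this counts how many automatic collections run while the same cycle-heavy loop executes under two threshold0 values, using the per-generation "collections" counters from gc.get_stats(). collections_during is a hypothetical helper; it restores the original thresholds afterward because they are process-wide. The absolute numbers depend on the CPython version.

```python
import gc

def collections_during(threshold0, cycles=20_000):
    """Hypothetical helper: count automatic collections during a cycle-heavy loop."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, *old[1:])
    try:
        before = sum(s["collections"] for s in gc.get_stats())
        for _ in range(cycles):
            a = []
            b = [a]
            a.append(b)  # each iteration leaves an unreachable two-list cycle
        return sum(s["collections"] for s in gc.get_stats()) - before
    finally:
        gc.set_threshold(*old)  # always restore the process-wide setting
        gc.collect()

frequent = collections_during(700)
rare = collections_during(20_000)
print(frequent, rare)  # the larger threshold0 triggers fewer collections
```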

Monitoring GC Impact

```python
import gc
import time

def workload():
    for _ in range(1000):
        a = []
        b = [a]
        a.append(b)

start = time.perf_counter()
workload()
work_time = time.perf_counter() - start

start = time.perf_counter()
collected = gc.collect()  # automatic GC may already have freed some of these cycles
collect_time = time.perf_counter() - start

print(f"workload: {work_time:.6f}s")
print(f"gc.collect: {collect_time:.6f}s")
print(f"collected: {collected}")
```
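Timing one explicit gc.collect() says little about the automatic collections that happen during real work. gc.callbacks can time every collection, including automatic ones, as in this sketch; time_collections and pauses are illustrative names, and callback overhead slightly inflates the measured durations.

```python
import gc
import time

pauses = []
_start = None

def time_collections(phase, info):
    """Illustrative gc.callbacks hook: record how long each collection takes."""
    global _start
    if phase == "start":
        _start = time.perf_counter()
    elif phase == "stop" and _start is not None:
        pauses.append((info["generation"], time.perf_counter() - _start))
        _start = None

gc.callbacks.append(time_collections)
try:
    for _ in range(50_000):
        a = []
        b = [a]
        a.append(b)  # leave cycles behind so automatic collections run
    gc.collect()     # include one explicit full collection
finally:
    gc.callbacks.remove(time_collections)

worst = max(duration for _, duration in pauses)
print(f"{len(pauses)} collections, worst pause {worst:.6f}s")
```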

Best Practices

1. Avoid Creating Cycles

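```python
import weakref

# Bad: parent.children holds the child and child.parent holds the parent,
# so every parent/child pair forms a cycle
class BadNode:
    def __init__(self):
        self.parent = None
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # strong back-reference

# Good: the back-reference is weak, so no cycle is created
class GoodNode:
    def __init__(self):
        self.parent = None  # holds a weakref.ref() once attached
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        child.parent = weakref.ref(self)
```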

2. Explicitly Break Cycles

```python
class Resource:
    def __init__(self):
        self.circular_ref = self  # self-cycle

    def cleanup(self):
        self.circular_ref = None  # break the cycle explicitly

# Use a context manager so cleanup runs at the end of the block
class ManagedResource(Resource):
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.cleanup()  # automatic cleanup, even if the block raises
```

3. Use __slots__ for Large Numbers of Objects

```python
# Removes the per-instance __dict__, reducing memory for many small objects
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
```
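A quick check of what __slots__ changes: instances get fixed attribute storage instead of a per-instance __dict__, and assigning an attribute that is not listed in __slots__ raises AttributeError. This sketch only demonstrates those two documented effects; it does not measure memory.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_dict = hasattr(p, "__dict__")  # False: no per-instance dictionary

try:
    p.z = 3          # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True  # undeclared attributes are refused

print(has_dict, rejected)
```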

4. Profile Before Optimizing

```python
import gc
import tracemalloc

def create_many_objects():
    # Hypothetical stand-in for "your code here"
    return [{"id": i} for i in range(100_000)]

tracemalloc.start()  # start tracing allocations
gc.collect()         # clean slate

objects = create_many_objects()

# Check memory and GC stats
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

print(f"GC collected: {gc.collect()} objects")
tracemalloc.stop()
```

Common Pitfalls

1. Relying on __del__

```python
# Fragile: cleanup timing depends on object lifetime
class Bad:
    def __del__(self):
        print("Cleaning up")

# Better: make cleanup deterministic with a context manager
class Good:
    def __enter__(self):
        return self

    def __exit__(self, *args):
        print("Cleaning up")
```

Finalizers can make cycles harder to reason about, and they do not express exactly when a resource should be released. For files, locks, sockets, database connections, and similar resources, prefer with blocks so cleanup happens at the end of the block.
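The ordering guarantee of a with block is easy to check: __exit__ runs when the block ends, even when it ends with an exception, and before any surrounding except clause handles that exception. Connection here is a hypothetical resource that only records events.

```python
events = []

class Connection:
    """Hypothetical resource used only to show when cleanup runs."""

    def __enter__(self):
        events.append("open")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("close")
        return False  # do not suppress the exception

try:
    with Connection():
        events.append("work")
        raise RuntimeError("boom")
except RuntimeError:
    events.append("handled")

print(events)  # ['open', 'work', 'close', 'handled']
```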

2. Large Default Arguments

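```python
def create_large_object():
    # Hypothetical stand-in for an expensive, memory-heavy object
    return [0] * 1_000_000

large_default_object = create_large_object()

# Bad: the default is evaluated once, at definition time,
# and stays alive as long as the function does
def bad_function(data=large_default_object):
    pass

# Good: create the object only when needed
def good_function(data=None):
    if data is None:
        data = create_large_object()
```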
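Default values live in the function's __defaults__ tuple, so they share the function's lifetime. This sketch, assuming CPython's immediate reference-counting cleanup, builds the function in a factory (make_function and Big are hypothetical names) so the default is the only thing keeping the object alive, then watches it with a weak reference.

```python
import weakref

class Big:
    """Stand-in for a large object (hypothetical)."""

def make_function():
    big = Big()
    probe = weakref.ref(big)

    def bad_function(data=big):  # default evaluated once and stored on the function
        return data

    return bad_function, probe

func, probe = make_function()
alive_while_defined = probe() is func.__defaults__[0]  # the default keeps Big alive
del func
freed_after_del = probe() is None  # dropping the function released its default

print(alive_while_defined, freed_after_del)
```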
