Why the Object Model Matters
Every value in Python -- every integer, string, list, function, class, and even None -- is an object. This is not just a language design philosophy; it is a concrete implementation reality. Understanding how CPython represents objects in memory explains why Python behaves the way it does: why variables work like name tags rather than boxes, why small integers share identity, why is and == differ, and why some operations are fast while others are slow.
The object model is the foundation upon which all of Python is built. Once you understand it, patterns that seemed arbitrary -- mutable default arguments, the behavior of copy vs deepcopy, the cost of attribute access -- become logical consequences of a simple, unified design.
Python Object Model
Python Code
x = 42Memory Layout
C Implementation
Click "Show C Structure" to view implementation
PyLongObject
- • Every Python object starts with PyObject_HEAD
- • ob_refcnt tracks reference count for memory management
- • ob_type points to the type object (metaclass)
- • Additional fields store the actual data
Common PyObject Operations
Reference Counting
Py_INCREF(obj);
Py_DECREF(obj);Type Checking
PyLong_Check(obj);
Py_TYPE(obj);Object Creation
PyLong_FromLong(42);
PyUnicode_FromString("hello");The Universal Blueprint: PyObject
At the C level, every Python object begins with the same two-field header. No matter how complex the object -- a giant dictionary, a compiled function, an entire module -- it starts with these same two pieces of information:
- A reference count -- how many names currently point to this object
- A type pointer -- a pointer to another object that describes what this object is
Think of every Python object as a labeled box sitting in a warehouse. The reference count is a tally sheet taped to the box, tracking how many name tags point to it. The type pointer is a label saying "this box contains an integer" or "this box contains a list." The warehouse manager (CPython's memory system) checks the tally sheet, and when it drops to zero, the box gets recycled.
This uniform structure is what makes Python's dynamism possible. Because every object has a type pointer, Python can always ask "what are you?" at runtime. Because every object has a reference count, Python can manage memory automatically without requiring the programmer to free anything manually.
Fixed-Size vs Variable-Size Objects
Some objects always occupy the same amount of memory. An integer object (ignoring arbitrary precision for a moment) has a fixed layout. But a list can hold 3 items or 3 million. For these variable-length objects, CPython adds a third field to the header: a size field that records how many items the object currently contains.
This distinction divides all Python objects into two families:
| Family | Header Fields | Examples |
|---|---|---|
| Fixed-size (PyObject) | Reference count, type pointer | None, True, float, small int |
| Variable-size (PyVarObject) | Reference count, type pointer, item count | list, tuple, str, bytes, dict |
Variables Are Name Tags, Not Boxes
In many languages, a variable is a named container that holds a value. In Python, a variable is a name tag attached to an object. The object exists independently in memory; the variable is just a way to find it.
When you write x = 42, Python does not put the number 42 into a container called x. Instead, it creates an integer object holding 42 (or reuses an existing one) and attaches the name x to it. When you then write y = x, no copying occurs. Python simply attaches a second name tag, y, to the same object. Both names now refer to the same box in the warehouse, and the reference count on that box increments from 1 to 2.
This explains several behaviors that surprise newcomers:
- Mutable aliasing: If
xandypoint to the same list, appending throughxis visible throughy. They are two name tags on the same box. - Rebinding vs mutating:
x = [1, 2, 3]followed byx = [4, 5, 6]does not change the first list. It moves the name tagxto a new box. The old box's reference count drops by one. isvs==: Theisoperator checks whether two names point to the same box (same object identity). The==operator opens both boxes and compares their contents.
Reference Counting: Automatic Memory Management
Every time a name is attached to an object, its reference count increases. Every time a name is removed -- through del, reassignment, or a variable going out of scope -- the count decreases. When the count reaches zero, no name tags point to the box anymore, and CPython immediately deallocates it.
import sys x = [1, 2, 3] # refcount = 1 (just x) y = x # refcount = 2 (x and y) del y # refcount = 1 (just x) print(sys.getrefcount(x)) # Shows 2 (x + temporary reference from getrefcount itself)
This approach is simple and predictable: objects are freed the instant they become unreachable. However, reference counting alone cannot handle circular references -- two objects that point to each other but are otherwise unreachable. For these cases, CPython supplements reference counting with a periodic cycle detector (the gc module) that identifies and collects reference cycles.
Type Objects: Objects That Describe Objects
The type pointer in every object's header points to a type object -- itself a Python object with its own reference count and type pointer. The type object is like a blueprint: it defines what operations the object supports, how large it is in memory, and how to create and destroy instances.
When you call type(42), Python follows the integer object's type pointer and returns the type object it finds: <class 'int'>. When you call type(int), you follow that object's type pointer and arrive at <class 'type'> -- the metaclass. And type(type) loops back to itself. This is the root of Python's type hierarchy.
The Type Hierarchy
The relationship between objects, types, and the metaclass forms a clean lattice:
| Expression | Result | Meaning |
|---|---|---|
type(42) | <class 'int'> | 42 is an instance of int |
type(int) | <class 'type'> | int is an instance of type (a metaclass instance) |
type(type) | <class 'type'> | type is its own metaclass (the bootstrap root) |
isinstance(True, int) | True | bool is a subclass of int |
issubclass(int, object) | True | everything inherits from object |
Every type object carries a rich collection of slots -- function pointers that implement the type's behavior. When you write a + b, Python does not look up a method by name. It goes to a's type object, finds the slot for addition, and calls the function pointer stored there. This slot-based dispatch is what makes Python's operator overloading and special methods work.
Special Methods and Their Slots
Python's special methods (__init__, __repr__, __add__, etc.) are the programmer-facing interface to the type object's internal slots. When you define __add__ on a class, CPython installs a function pointer in the corresponding slot of that class's type object.
| Python Special Method | Internal Slot | Purpose |
|---|---|---|
__init__ | tp_init | Initialize a new instance |
__new__ | tp_new | Allocate and create a new instance |
__del__ | tp_dealloc | Clean up before deallocation |
__repr__ | tp_repr | Developer-facing string representation |
__add__ | nb_add (via tp_as_number) | Addition operator |
__len__ | sq_length (via tp_as_sequence) | Length for sequences |
__getattr__ | tp_getattro | Attribute access |
__call__ | tp_call | Make the object callable |
Operations are grouped into protocols. Numeric operations (add, multiply, negate) live in the number protocol. Sequence operations (length, indexing, slicing) live in the sequence protocol. Mapping operations (key-based access) live in the mapping protocol. A type opts into a protocol by populating the corresponding group of slots.
The Descriptor Protocol: How Attribute Access Really Works
When you access obj.x, Python does not simply look up x in a dictionary. It follows a carefully defined sequence called the descriptor protocol, which is the mechanism behind properties, class methods, static methods, and even ordinary method binding.
The lookup proceeds in this order:
- Data descriptors on the type: If the class (or any base class) has an attribute
xthat defines both__get__and__set__, it takes priority. Properties are data descriptors. - Instance dictionary: If the object has
xin its own__dict__, that value is returned. - Non-data descriptors on the type: If the class has an attribute
xthat defines__get__but not__set__, it is used. Functions are non-data descriptors, which is how they become bound methods. - AttributeError: If none of the above found
x, Python raises an error.
This three-tier system explains why properties can override instance attributes (they are data descriptors with higher priority) and why assigning to an instance attribute shadows a class-level function (the instance dict sits between data and non-data descriptors in the lookup order).
Method Resolution Order: Navigating Inheritance
When a class inherits from multiple parents, Python must decide which parent's method to use. The Method Resolution Order (MRO) is a linearization of the inheritance graph computed using the C3 algorithm. It guarantees that:
- A class always appears before its parents
- If a class inherits from A then B, A is checked before B
- The order is consistent across the entire hierarchy
The MRO matters because attribute lookup on a class walks the MRO from left to right, checking each class's namespace until it finds the attribute. You can inspect any class's MRO through its __mro__ attribute.
Memory Layout and Optimization
The Cost of Flexibility
By default, every instance of a user-defined class carries a __dict__ -- a full dictionary object for storing arbitrary attributes. This is powerful (you can add any attribute to any instance at runtime) but expensive: each dictionary consumes significant memory, and dictionary lookups are slower than direct memory access.
Slots: Trading Flexibility for Efficiency
Defining __slots__ on a class tells CPython to allocate fixed storage for the named attributes instead of a per-instance dictionary. The attributes are stored directly in the object's memory layout at known offsets, enabling fast, direct access.
| Aspect | Default (__dict__) | With __slots__ |
|---|---|---|
| Memory per instance | ~200+ bytes (dict overhead) | Only the fields themselves (~8 bytes each) |
| Attribute access speed | Dictionary hash lookup | Direct memory offset |
| Can add arbitrary attributes | Yes | No (only declared slots) |
| Supports weak references | Yes (via __weakref__) | Only if __weakref__ is in slots |
For classes with millions of instances (data points, graph nodes, pixel records), __slots__ can reduce memory usage by 40-60% and measurably speed up attribute access.
Singleton Objects: None, True, and False
Some objects exist as singletons -- exactly one instance exists for the entire runtime. None, True, and False are all singleton objects. Every use of None anywhere in a Python program refers to the exact same object in memory, which is why x is None is the idiomatic way to check for None: it is an identity check, not an equality check, and it is both faster and more correct.
CPython also caches small integers (typically -5 through 256) as singletons. This is why a = 5; b = 5; a is b returns True -- both names point to the same pre-allocated integer object. For larger integers, CPython creates new objects, so is comparisons become unreliable. This is an implementation detail, not a language guarantee, and is why == should always be used for value comparisons.
Performance Implications
Understanding the object model reveals why certain Python patterns are faster than others:
| Operation | Relative Speed | Why |
|---|---|---|
| Local variable access | Fastest | Stored by index in a C array on the frame |
Attribute with __slots__ | Fast | Direct memory offset, no hash lookup |
Attribute via __dict__ | Moderate | Dictionary hash table lookup |
| Global variable access | Slower | Two dictionary lookups (local namespace miss, then global) |
Deeply chained access (a.b.c.d) | Slowest | Descriptor protocol runs at each dot |
A common optimization pattern is to cache attribute lookups in local variables within tight loops. Assigning method = obj.method before a loop avoids repeating the descriptor protocol thousands of times.
Key Takeaways
-
Every Python value is a PyObject with a reference count and a type pointer. This uniform structure enables dynamic typing, introspection, and automatic memory management.
-
Variables are name tags, not boxes. Assignment attaches a name to an existing object; it does not copy data. Multiple names can point to the same object, and the reference count tracks how many do.
-
Reference counting provides immediate cleanup. Objects are deallocated the instant their reference count drops to zero. A supplementary cycle detector handles circular references.
-
Type objects define behavior through slots. Python's special methods (
__add__,__len__,__getattr__) map to function pointer slots in the type object. Protocols group related slots together. -
The descriptor protocol governs attribute access. The three-tier lookup (data descriptors, instance dict, non-data descriptors) explains properties, method binding, and attribute shadowing.
-
__slots__trades flexibility for performance. Replacing the per-instance dictionary with fixed-offset storage saves memory and speeds up attribute access for high-volume objects. -
Singletons and caching are implementation details.
None,True,False, and small integers are shared objects. Useisfor identity checks on singletons, but always use==for value comparisons.
