
Python Object Model Internals

Learn how CPython implements PyObject, type objects, and the unified object model. Explore reference counting, memory layout, and Python internals.


Why the Object Model Matters

Every value in Python -- every integer, string, list, function, class, and even None -- is an object. This is not just a language design philosophy; it is a concrete implementation reality. Understanding how CPython represents objects in memory explains why Python behaves the way it does: why variables work like name tags rather than boxes, why small integers share identity, why is and == differ, and why some operations are fast while others are slow.

The object model is the foundation upon which all of Python is built. Once you understand it, patterns that seemed arbitrary -- mutable default arguments, the behavior of copy vs deepcopy, the cost of attribute access -- become logical consequences of a simple, unified design.

Python Object Model

Consider what happens when you write:

x = 42

CPython allocates a PyLongObject with the following memory layout:

| Field | Value | Meaning |
|---|---|---|
| ob_refcnt | 1 | Reference count |
| ob_type | &PyLong_Type | Type pointer |
| ob_size | 1 | Number of digits |
| ob_digit[0] | 42 | Actual value |

Total size: 28 bytes.

Key points about the PyLongObject layout:

  • Every Python object starts with PyObject_HEAD
  • ob_refcnt tracks the reference count for memory management
  • ob_type points to the type object
  • Additional fields store the actual data

Common PyObject Operations

Reference counting:

Py_INCREF(obj);
Py_DECREF(obj);

Type checking:

PyLong_Check(obj);
Py_TYPE(obj);

Object creation:

PyLong_FromLong(42);
PyUnicode_FromString("hello");

The Universal Blueprint: PyObject

At the C level, every Python object begins with the same two-field header. No matter how complex the object -- a giant dictionary, a compiled function, an entire module -- it starts with these same two pieces of information:

  1. A reference count -- how many names currently point to this object
  2. A type pointer -- a pointer to another object that describes what this object is

Think of every Python object as a labeled box sitting in a warehouse. The reference count is a tally sheet taped to the box, tracking how many name tags point to it. The type pointer is a label saying "this box contains an integer" or "this box contains a list." The warehouse manager (CPython's memory system) checks the tally sheet, and when it drops to zero, the box gets recycled.

This uniform structure is what makes Python's dynamism possible. Because every object has a type pointer, Python can always ask "what are you?" at runtime. Because every object has a reference count, Python can manage memory automatically without requiring the programmer to free anything manually.

Fixed-Size vs Variable-Size Objects

Some objects always occupy the same amount of memory. An integer object (ignoring arbitrary precision for a moment) has a fixed layout. But a list can hold 3 items or 3 million. For these variable-length objects, CPython adds a third field to the header: a size field that records how many items the object currently contains.

This distinction divides all Python objects into two families:

| Family | Header Fields | Examples |
|---|---|---|
| Fixed-size (PyObject) | Reference count, type pointer | None, float, complex |
| Variable-size (PyVarObject) | Reference count, type pointer, item count | int, list, tuple, str, bytes |
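The difference is observable from Python with sys.getsizeof, which reports an object's footprint in bytes. This is a rough sketch; exact byte counts vary by CPython version and platform, but the comparisons hold:

```python
import sys

# Fixed-size: every float occupies the same number of bytes,
# regardless of its value.
print(sys.getsizeof(1.0) == sys.getsizeof(12345.678))  # True

# Variable-size: a tuple's footprint grows with its item count.
small = sys.getsizeof((1,))
large = sys.getsizeof((1, 2, 3, 4))
print(large > small)  # True
```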

Variables Are Name Tags, Not Boxes

In many languages, a variable is a named container that holds a value. In Python, a variable is a name tag attached to an object. The object exists independently in memory; the variable is just a way to find it.

When you write x = 42, Python does not put the number 42 into a container called x. Instead, it creates an integer object holding 42 (or reuses an existing one) and attaches the name x to it. When you then write y = x, no copying occurs. Python simply attaches a second name tag, y, to the same object. Both names now refer to the same box in the warehouse, and the reference count on that box increments from 1 to 2.

This explains several behaviors that surprise newcomers:

  • Mutable aliasing: If x and y point to the same list, appending through x is visible through y. They are two name tags on the same box.
  • Rebinding vs mutating: x = [1, 2, 3] followed by x = [4, 5, 6] does not change the first list. It moves the name tag x to a new box. The old box's reference count drops by one.
  • is vs ==: The is operator checks whether two names point to the same box (same object identity). The == operator opens both boxes and compares their contents.
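All three behaviors can be seen in a short session:

```python
x = [1, 2, 3]
y = x                      # second name tag on the same list object
y.append(4)                # mutate through one name...
print(x)                   # [1, 2, 3, 4] -- visible through the other

print(x is y)              # True: same box, two tags
print(x == [1, 2, 3, 4])   # True: equal contents
print(x is [1, 2, 3, 4])   # False: a fresh list is a different object
```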

Reference Counting: Automatic Memory Management

Every time a name is attached to an object, its reference count increases. Every time a name is removed -- through del, reassignment, or a variable going out of scope -- the count decreases. When the count reaches zero, no name tags point to the box anymore, and CPython immediately deallocates it.

import sys

x = [1, 2, 3]  # refcount = 1 (just x)
y = x          # refcount = 2 (x and y)
del y          # refcount = 1 (just x)
print(sys.getrefcount(x))  # Shows 2 (x + temporary reference from getrefcount itself)

This approach is simple and predictable: objects are freed the instant they become unreachable. However, reference counting alone cannot handle circular references -- two objects that point to each other but are otherwise unreachable. For these cases, CPython supplements reference counting with a periodic cycle detector (the gc module) that identifies and collects reference cycles.
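A sketch of the cycle detector at work, using a hypothetical Node class. Reference counting alone would leak these objects; gc.collect() reclaims them:

```python
import gc

class Node:
    pass

gc.collect()       # flush any pre-existing garbage first

a = Node()
b = Node()
a.partner = b      # a -> b
b.partner = a      # b -> a: a reference cycle

del a, b           # the cycle keeps both refcounts above zero
unreachable = gc.collect()  # the cycle detector finds and frees them
print(unreachable >= 2)     # True: at least the two Node objects were collected
```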

Type Objects: Objects That Describe Objects

The type pointer in every object's header points to a type object -- itself a Python object with its own reference count and type pointer. The type object is like a blueprint: it defines what operations the object supports, how large it is in memory, and how to create and destroy instances.

When you call type(42), Python follows the integer object's type pointer and returns the type object it finds: <class 'int'>. When you call type(int), you follow that object's type pointer and arrive at <class 'type'> -- the metaclass. And type(type) loops back to itself. This is the root of Python's type hierarchy.

The Type Hierarchy

The relationship between objects, types, and the metaclass forms a clean lattice:

| Expression | Result | Meaning |
|---|---|---|
| type(42) | <class 'int'> | 42 is an instance of int |
| type(int) | <class 'type'> | int is an instance of type (a metaclass instance) |
| type(type) | <class 'type'> | type is its own metaclass (the bootstrap root) |
| isinstance(True, int) | True | bool is a subclass of int |
| issubclass(int, object) | True | everything inherits from object |

Every type object carries a rich collection of slots -- function pointers that implement the type's behavior. When you write a + b, Python does not look up a method by name. It goes to a's type object, finds the slot for addition, and calls the function pointer stored there. This slot-based dispatch is what makes Python's operator overloading and special methods work.
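A minimal sketch with a hypothetical Meters class shows that operator dispatch goes through the type's slot, not the instance:

```python
class Meters:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):   # installed into the type's addition slot
        return Meters(self.n + other.n)

a, b = Meters(2), Meters(3)
print((a + b).n)  # 5: a + b dispatches through type(a)'s slot

# The slot lives on the type, not the instance: attaching a new
# __add__ to the instance does not change operator dispatch.
a.__add__ = lambda other: Meters(0)
print((a + b).n)  # still 5
```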

Special Methods and Their Slots

Python's special methods (__init__, __repr__, __add__, etc.) are the programmer-facing interface to the type object's internal slots. When you define __add__ on a class, CPython installs a function pointer in the corresponding slot of that class's type object.

| Python Special Method | Internal Slot | Purpose |
|---|---|---|
| __init__ | tp_init | Initialize a new instance |
| __new__ | tp_new | Allocate and create a new instance |
| __del__ | tp_dealloc | Clean up before deallocation |
| __repr__ | tp_repr | Developer-facing string representation |
| __add__ | nb_add (via tp_as_number) | Addition operator |
| __len__ | sq_length (via tp_as_sequence) | Length for sequences |
| __getattribute__ | tp_getattro | Attribute access |
| __call__ | tp_call | Make the object callable |

Operations are grouped into protocols. Numeric operations (add, multiply, negate) live in the number protocol. Sequence operations (length, indexing, slicing) live in the sequence protocol. Mapping operations (key-based access) live in the mapping protocol. A type opts into a protocol by populating the corresponding group of slots.
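A hypothetical Range3 class sketches how defining the right special methods opts a type into the sequence protocol:

```python
class Range3:
    """A minimal sequence: length and item-access methods fill
    the corresponding sequence-protocol slots on the type."""
    def __len__(self):
        return 3
    def __getitem__(self, i):
        if not 0 <= i < 3:
            raise IndexError(i)
        return i * 10

r = Range3()
print(len(r))    # 3: len() calls the length slot
print(r[2])      # 20: indexing calls the item slot
print(list(r))   # [0, 10, 20]: with no __iter__, iteration
                 # falls back to indexing until IndexError
```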

The Descriptor Protocol: How Attribute Access Really Works

When you access obj.x, Python does not simply look up x in a dictionary. It follows a carefully defined sequence called the descriptor protocol, which is the mechanism behind properties, class methods, static methods, and even ordinary method binding.

The lookup proceeds in this order:

  1. Data descriptors on the type: If the class (or any base class) has an attribute x that defines both __get__ and __set__, it takes priority. Properties are data descriptors.
  2. Instance dictionary: If the object has x in its own __dict__, that value is returned.
  3. Non-data descriptors on the type: If the class has an attribute x that defines __get__ but not __set__, it is used. Functions are non-data descriptors, which is how they become bound methods.
  4. AttributeError: If none of the above finds x, Python calls __getattr__ as a last resort (if the class defines it); otherwise it raises AttributeError.

This three-tier system explains why properties can override instance attributes (they are data descriptors with higher priority) and why assigning to an instance attribute shadows a class-level function (the instance dict sits between data and non-data descriptors in the lookup order).
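A small sketch of the priority order, using a hypothetical class C. Writing into the instance __dict__ directly cannot shadow a property, but it does shadow a plain function:

```python
class C:
    @property
    def x(self):          # data descriptor: defines __get__ and __set__
        return "property"
    def m(self):          # plain function: non-data descriptor
        return "method"

c = C()
c.__dict__["x"] = "instance"     # sneak past the property's __set__
print(c.x)                       # property: data descriptor outranks __dict__

c.__dict__["m"] = lambda: "shadowed"
print(c.m())                     # shadowed: __dict__ outranks non-data descriptors
```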

Method Resolution Order: Navigating Inheritance

When a class inherits from multiple parents, Python must decide which parent's method to use. The Method Resolution Order (MRO) is a linearization of the inheritance graph computed using the C3 algorithm. It guarantees that:

  • A class always appears before its parents
  • If a class inherits from A then B, A is checked before B
  • The order is consistent across the entire hierarchy

The MRO matters because attribute lookup on a class walks the MRO from left to right, checking each class's namespace until it finds the attribute. You can inspect any class's MRO through its __mro__ attribute.
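A quick sketch with a hypothetical diamond hierarchy shows all three guarantees at once:

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']: D before its parents,
# B before C (left-to-right), and every chain ends at object
```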

Memory Layout and Optimization

The Cost of Flexibility

By default, every instance of a user-defined class carries a __dict__ -- a full dictionary object for storing arbitrary attributes. This is powerful (you can add any attribute to any instance at runtime) but expensive: each dictionary consumes significant memory, and dictionary lookups are slower than direct memory access.

Slots: Trading Flexibility for Efficiency

Defining __slots__ on a class tells CPython to allocate fixed storage for the named attributes instead of a per-instance dictionary. The attributes are stored directly in the object's memory layout at known offsets, enabling fast, direct access.

| Aspect | Default (__dict__) | With __slots__ |
|---|---|---|
| Memory per instance | ~200+ bytes (dict overhead) | Only the fields themselves (~8 bytes each) |
| Attribute access speed | Dictionary hash lookup | Direct memory offset |
| Can add arbitrary attributes | Yes | No (only declared slots) |
| Supports weak references | Yes (via __weakref__) | Only if __weakref__ is in __slots__ |

For classes with millions of instances (data points, graph nodes, pixel records), __slots__ can reduce memory usage by 40-60% and measurably speed up attribute access.
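A sketch comparing two hypothetical point classes (exact byte counts vary by CPython version, but the relationships hold):

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")   # fixed storage, no per-instance dictionary
    def __init__(self, x, y):
        self.x, self.y = x, y

d = PointDict(1.0, 2.0)
s = PointSlots(1.0, 2.0)

print(hasattr(d, "__dict__"))  # True
print(hasattr(s, "__dict__"))  # False: no dictionary was allocated

# The slotted instance alone is smaller than the dict-backed
# instance plus its attribute dictionary.
print(sys.getsizeof(s) < sys.getsizeof(d) + sys.getsizeof(d.__dict__))  # True

try:
    s.z = 3.0                  # only declared slots are allowed
except AttributeError:
    print("no slot named z")
```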

Singleton Objects: None, True, and False

Some objects exist as singletons -- exactly one instance exists for the entire runtime. None, True, and False are all singleton objects. Every use of None anywhere in a Python program refers to the exact same object in memory, which is why x is None is the idiomatic way to check for None: it is an identity check, not an equality check, and it is both faster and more correct.

CPython also caches small integers (typically -5 through 256) as singletons. This is why a = 5; b = 5; a is b returns True -- both names point to the same pre-allocated integer object. For larger integers, CPython creates new objects, so is comparisons become unreliable. This is an implementation detail, not a language guarantee, and is why == should always be used for value comparisons.
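A short session illustrating the caching behavior (CPython-specific; other implementations may differ). The large values are built with int() at runtime to avoid constant folding:

```python
a = 5
b = 5
print(a is b)        # True: CPython caches small ints (-5 through 256)

x = None
print(x is None)     # True: the idiomatic identity check for the singleton

big1 = int("10000000000")
big2 = int("10000000000")
print(big1 == big2)  # True: values are equal
# big1 is big2 is an implementation detail that may be False --
# never use identity for value comparisons.
```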

Performance Implications

Understanding the object model reveals why certain Python patterns are faster than others:

| Operation | Relative Speed | Why |
|---|---|---|
| Local variable access | Fastest | Stored by index in a C array on the frame |
| Attribute with __slots__ | Fast | Direct memory offset, no hash lookup |
| Attribute via __dict__ | Moderate | Dictionary hash table lookup |
| Global variable access | Slower | Dictionary lookup in the globals dict, then builtins on a miss |
| Deeply chained access (a.b.c.d) | Slowest | Descriptor protocol runs at each dot |

A common optimization pattern is to cache attribute lookups in local variables within tight loops. Assigning method = obj.method before a loop avoids repeating the descriptor protocol thousands of times.
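A minimal sketch of the pattern; both functions produce the same result, but the second resolves the attribute once instead of once per iteration:

```python
data = list(range(1000))

def with_lookup():
    out = []
    for x in data:
        out.append(x)        # resolves out.append on every iteration
    return out

def with_cached():
    out = []
    append = out.append      # one lookup, reused a thousand times
    for x in data:
        append(x)
    return out

print(with_cached() == with_lookup())  # True: same result, fewer lookups
```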

Key Takeaways

  1. Every Python value is a PyObject with a reference count and a type pointer. This uniform structure enables dynamic typing, introspection, and automatic memory management.

  2. Variables are name tags, not boxes. Assignment attaches a name to an existing object; it does not copy data. Multiple names can point to the same object, and the reference count tracks how many do.

  3. Reference counting provides immediate cleanup. Objects are deallocated the instant their reference count drops to zero. A supplementary cycle detector handles circular references.

  4. Type objects define behavior through slots. Python's special methods (__add__, __len__, __getattr__) map to function pointer slots in the type object. Protocols group related slots together.

  5. The descriptor protocol governs attribute access. The three-tier lookup (data descriptors, instance dict, non-data descriptors) explains properties, method binding, and attribute shadowing.

  6. __slots__ trades flexibility for performance. Replacing the per-instance dictionary with fixed-offset storage saves memory and speeds up attribute access for high-volume objects.

  7. Singletons and caching are implementation details. None, True, False, and small integers are shared objects. Use is for identity checks on singletons, but always use == for value comparisons.
