What is Shared Memory?
multiprocessing.shared_memory is a Python 3.8+ feature that enables multiple processes to access the same block of memory directly. Unlike pipes or queues that serialize data between processes, shared memory provides raw byte-level access—making it extremely fast but also requiring careful synchronization to avoid race conditions and data corruption.
In typical multiprocessing, each process has its own isolated memory space. When processes need to communicate, data must be serialized (pickled), sent through a pipe or queue, and deserialized—creating significant overhead. Shared memory bypasses this by allowing multiple processes to read and write to the same physical memory region.
Creating and Using Shared Memory
The SharedMemory class creates a named block of memory that can be accessed by any process that knows its name:
```python
from multiprocessing import shared_memory

# Process A: Create shared memory block
shm = shared_memory.SharedMemory(create=True, size=1024, name="my_shared_block")
print(f"Created: {shm.name}, Size: {shm.size} bytes")

# Write data to shared memory
data = b"Hello from Process A!"
shm.buf[:len(data)] = data

# Process B: Attach to existing shared memory
shm_b = shared_memory.SharedMemory(name="my_shared_block")
message = bytes(shm_b.buf[:21]).decode('utf-8')
print(f"Read: {message}")  # Output: Hello from Process A!

# IMPORTANT: Cleanup (covered in detail later)
shm_b.close()
shm.close()
shm.unlink()  # Only creator should unlink!
```
Memory Layout of SharedMemory
A SharedMemory object wraps a raw block of bytes allocated in the operating system's shared memory region. The buf attribute provides a memoryview for direct byte manipulation.
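Because buf is a plain memoryview, you can lay out structured binary data in the block with the standard struct module instead of raw byte slices. A minimal sketch (the field layout and values here are illustrative, not part of the SharedMemory API):

```python
import struct
from multiprocessing import shared_memory

# Allocate a small block and pack a little-endian header into it:
# a uint32 count followed by a float64 value ("<Id").
shm = shared_memory.SharedMemory(create=True, size=64)
try:
    struct.pack_into("<Id", shm.buf, 0, 42, 3.14)
    count, value = struct.unpack_from("<Id", shm.buf, 0)
    print(count, value)  # 42 3.14
finally:
    shm.close()
    shm.unlink()
```

Any process that attaches to the block and knows the format string can decode the same fields with struct.unpack_from.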
The Danger: Race Conditions
The biggest danger with shared memory is race conditions: multiple processes reading and writing the same memory location simultaneously, leading to data corruption or undefined behavior. The Global Interpreter Lock (GIL) offers no protection here; it only serializes threads within a single process, not access from other processes.
⚠️ RACE CONDITION occurs when two or more processes access shared data concurrently, and at least one access is a write. The final state depends on the unpredictable timing of process execution—a recipe for bugs that are nearly impossible to reproduce and debug.
Using SharedMemory with NumPy
One of the most powerful uses of shared memory is with NumPy arrays. By creating a NumPy array that uses shared memory as its buffer, multiple processes can operate on the same numerical data without copying.
```python
import numpy as np
from multiprocessing import shared_memory

def create_shared_array(shape, dtype=np.float64):
    """Create a NumPy array backed by shared memory."""
    size = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=size)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return arr, shm

def attach_shared_array(name, shape, dtype=np.float64):
    """Attach to an existing shared NumPy array."""
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return arr, shm

# Process A: Create shared array
arr, shm = create_shared_array((1000, 1000))
arr[:] = np.random.rand(1000, 1000)
print(f"Shared memory name: {shm.name}")
print(f"Array sum: {arr.sum():.2f}")
```
Safe Synchronization Methods
To safely use shared memory, you must use synchronization primitives. Python's multiprocessing module provides several options.
Memory Management: close() vs unlink()
Proper cleanup is critical with shared memory. Failing to clean up properly can leave orphaned shared memory blocks in your system, wasting resources until reboot.
⚠️ RESOURCE LEAK occurs when shared memory is not properly cleaned up. On POSIX systems (Linux/macOS), shared memory persists until explicitly unlinked—even after your program exits!
Best Practice: Context Manager Pattern
```python
from contextlib import contextmanager
from multiprocessing import shared_memory

@contextmanager
def managed_shared_memory(name=None, create=False, size=0):
    """Context manager for automatic shared memory cleanup."""
    shm = None
    try:
        if create:
            shm = shared_memory.SharedMemory(create=True, size=size, name=name)
        else:
            shm = shared_memory.SharedMemory(name=name)
        yield shm
    finally:
        if shm is not None:
            shm.close()
            if create:
                try:
                    shm.unlink()
                except FileNotFoundError:
                    pass  # Already unlinked

# Usage - automatic cleanup guaranteed!
with managed_shared_memory(create=True, size=1024) as shm:
    shm.buf[0:5] = b"Hello"
    print(f"Name: {shm.name}")
# ✓ Automatically closed and unlinked here!
```
Common Pitfalls and Solutions
| Pitfall | Problem | Solution |
|---|---|---|
| No Synchronization | Race conditions corrupt data | Always use Lock, RLock, or Condition |
| Forgetting unlink() | Shared memory leaks | Use context managers or try/finally |
| Multiple unlink() | FileNotFoundError | Only creator should unlink |
| Wrong size | Data truncation or segfault | Calculate exact byte size needed |
| Endianness issues | Wrong values on different CPUs | Use explicit byte order ('little' or 'big') |
| Using after close() | ValueError or crash | Don't access buf after close() |
| Name collisions | FileExistsError | Use unique names or let Python generate |
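The endianness row deserves a concrete illustration: the block is raw bytes, so if another machine or language may read it, always pick an explicit byte order rather than relying on the native one. A sketch using int.to_bytes / int.from_bytes (the value 1000 is arbitrary):

```python
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=8)
try:
    # Write a 32-bit integer with an explicit little-endian byte order.
    shm.buf[0:4] = (1000).to_bytes(4, "little")

    # Reading it back with the same byte order recovers the value...
    n = int.from_bytes(bytes(shm.buf[0:4]), "little")
    print(n)  # 1000

    # ...while the wrong byte order silently yields garbage.
    wrong = int.from_bytes(bytes(shm.buf[0:4]), "big")
    print(wrong)  # 3892510720
finally:
    shm.close()
    shm.unlink()
```

The same principle applies to struct format strings ('<' vs '>') and to NumPy dtypes ('<i4' vs '>i4').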
Complete API Reference
SharedMemory(name=None, create=False, size=0)
Purpose: Create or attach to shared memory block
Parameters:
- `name` - Unique identifier (auto-generated if None and create=True)
- `create` - True to create new, False to attach to existing
- `size` - Bytes to allocate (only when create=True)
Raises: FileExistsError (create existing), FileNotFoundError (attach nonexistent)
shm.buf
Type: memoryview
Purpose: Direct access to shared memory bytes
Usage: shm.buf[0:10] = b"0123456789"
Warning: Do not access after close()!
shm.name
Type: str (read-only)
Purpose: Unique identifier for the shared memory block
Usage: Pass to other processes so they can attach
shm.size
Type: int (read-only)
Purpose: Total bytes allocated
Note: May be larger than requested (page alignment)
shm.close()
Purpose: Release this process's access to shared memory
Required: Every process MUST call this
Effect: buf becomes invalid, block still exists
shm.unlink()
Purpose: Request deletion of shared memory block
Careful: Only creator should call this
Effect: The name is removed immediately; the memory itself is freed once every attached process has closed it. (On Windows, unlink() has no effect; the block is freed when the last process detaches.)
Raises: FileNotFoundError if already unlinked
Real-World Example: Parallel Image Processing
```python
import numpy as np
from multiprocessing import shared_memory, Process, Lock
import time

def process_chunk(shm_name, shape, dtype, start_row, end_row, lock):
    """Process a chunk of the image (e.g., apply grayscale)."""
    # Attach to shared memory
    shm = shared_memory.SharedMemory(name=shm_name)
    img = np.ndarray(shape, dtype=dtype, buffer=shm.buf)

    # Process our assigned rows (no lock needed - non-overlapping regions!)
    for row in range(start_row, end_row):
        for col in range(shape[1]):
            # Convert RGB to grayscale
            r, g, b = img[row, col, 0], img[row, col, 1], img[row, col, 2]
            gray = int(0.299 * r + 0.587 * g + 0.114 * b)
            img[row, col] = [gray, gray, gray]

    # Report progress (needs lock for shared print)
    with lock:
        print(f"Processed rows {start_row}-{end_row}")

    shm.close()

def parallel_grayscale(image, num_workers=4):
    """Convert image to grayscale using multiple processes."""
    height, width, channels = image.shape

    # Create shared memory for the image
    shm = shared_memory.SharedMemory(create=True, size=image.nbytes)
    shared_img = np.ndarray(image.shape, dtype=image.dtype, buffer=shm.buf)
    shared_img[:] = image  # Copy data to shared memory

    lock = Lock()
    processes = []
    rows_per_worker = height // num_workers

    # Spawn worker processes
    for i in range(num_workers):
        start_row = i * rows_per_worker
        end_row = height if i == num_workers - 1 else (i + 1) * rows_per_worker
        p = Process(
            target=process_chunk,
            args=(shm.name, image.shape, image.dtype, start_row, end_row, lock)
        )
        processes.append(p)
        p.start()

    # Wait for all processes to complete
    for p in processes:
        p.join()

    # Copy result back
    result = shared_img.copy()

    # Cleanup
    shm.close()
    shm.unlink()

    return result

# Usage
if __name__ == "__main__":
    # Create a test image (1920x1080 RGB)
    test_image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

    start = time.time()
    result = parallel_grayscale(test_image, num_workers=4)
    elapsed = time.time() - start
    print(f"Processed {test_image.size:,} pixels in {elapsed:.2f}s")
```
💡 Key Design Insight: This example avoids locks during processing by giving each worker a non-overlapping region. The only lock is for coordinating print statements. This pattern—partitioning work to avoid shared writes—is the most efficient way to use shared memory.
Key Takeaways
1. **Shared memory is fast but dangerous:** Zero-copy access means no serialization overhead, but you're responsible for synchronization.
2. **Always synchronize writes:** Use Lock, RLock, or Condition when multiple processes write to the same memory region.
3. **Partition work when possible:** The fastest approach is giving each process non-overlapping regions, eliminating the need for locks.
4. **Clean up properly:** Every process must call `close()`, and the creator must call `unlink()` to prevent resource leaks.
5. **Use context managers:** Wrap shared memory access in try/finally or custom context managers for automatic cleanup.
6. **NumPy integration is powerful:** Backing NumPy arrays with shared memory enables efficient parallel numerical computing.
7. **Test for race conditions:** Run with many iterations; race conditions are non-deterministic and may not appear in short tests.