Python Global Interpreter Lock (GIL)

What is the GIL?

The Global Interpreter Lock, usually called the GIL, is a lock inside CPython that allows only one thread at a time to execute Python bytecode in a single Python process.

That definition is accurate, but it is not the best way to build intuition. A lock sounds like an arbitrary limitation until you understand what CPython is trying to protect.

Think of a running Python program as a workshop:

• Python objects are the parts on the workbench.
• Threads are workers trying to use those parts.
• CPython's interpreter is the machinery that updates objects, calls functions, and runs bytecode.
• The GIL is a single shop pass. Only the worker holding the pass can operate the Python machinery.

Other workers can still wait, sleep, perform I/O, or run native code that releases the pass. But while ordinary Python bytecode is being executed, one worker is in charge.

The core idea

The GIL does not mean Python has only one thread. It means that, inside one CPython process, only one thread at a time runs Python bytecode. This is why CPU-bound Python threads take turns, while I/O-bound threads can still overlap waiting time.

For a broader look at how the GIL fits into CPython's architecture, see the CPython internals deep dive.

[Interactive diagram: Global Interpreter Lock (GIL). Panels: GIL Status, Thread Execution (Threads 1 to 4), When GIL is Released, CPU-Bound Tasks, I/O-Bound Tasks.]
Working Around the GIL

• Use multiprocessing for CPU-bound parallelism
• Use asyncio for I/O-bound concurrency
• Write performance-critical code in C extensions that release the GIL
• Consider alternative Python implementations without a GIL (Jython, IronPython); note that PyPy has its own GIL
• Use concurrent.futures for high-level parallelism

The Problem the GIL Solves

CPython manages most Python objects using reference counting. Every object keeps a small counter that says how many live references point to it.

For example:

items = []
other_name = items

Both items and other_name refer to the same list. CPython increments the list's reference count when another name points to it. When a reference disappears, CPython decrements the count. When the count reaches zero, the object can be freed.
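The counter can be observed from Python with sys.getrefcount. The sketch below is only an illustration: the helper name refcount_delta is made up for this example, and the absolute numbers that getrefcount reports are implementation details of CPython, so the code looks only at how much the count changes.

```python
import sys

def refcount_delta():
    """Return how much the list's reference count grows after aliasing it."""
    items = []
    before = sys.getrefcount(items)
    other_name = items          # a second name bound to the same list
    after = sys.getrefcount(items)
    assert other_name is items  # both names refer to one object
    return after - before

print(refcount_delta())
```

The absolute values returned by sys.getrefcount are usually higher than you might expect because the call itself can hold a temporary reference, which is why the example compares two readings instead of printing one.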

This is simple and fast, but it creates a threading problem. Updating a reference count is not a magical single action. At the machine level it is closer to:

1. Read the current count.
2. Add or subtract one.
3. Write the new count back.

If two threads update the same object's count at the same time, they can step on each other.

A Tiny Race

# Imagine an object whose reference count is 7.

# Thread A wants to add a reference:
#   read 7
#   compute 8
#   write 8

# Thread B also wants to add a reference:
#   read 7
#   compute 8
#   write 8

# Correct final count: 9
# Possible final count without synchronization: 8

That wrong count is serious. If CPython thinks an object has fewer references than it really has, it may free memory while something still uses it. If it thinks an object has more references than it really has, it may leak memory.

The GIL prevents this class of corruption by making sure only one thread is manipulating Python objects through the interpreter at a time.
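The interleaving from the tiny race can be replayed deterministically, without real threads, to see where the update is lost. This is a hand-written simulation with made-up function names, not how CPython stores reference counts.

```python
def lost_update(start):
    """Replay the racy interleaving step by step, with no real threads."""
    count = start
    a_read = count        # Thread A reads the count
    b_read = count        # Thread B reads the same value before A writes
    count = a_read + 1    # Thread A writes start + 1
    count = b_read + 1    # Thread B overwrites it with start + 1 again
    return count

def synchronized_update(start):
    """Each read-modify-write finishes before the next one begins."""
    count = start
    count = count + 1     # Thread A: read, add, write
    count = count + 1     # Thread B: sees A's result, then adds its own
    return count

print(lost_update(7), synchronized_update(7))  # prints: 8 9
```

Serializing the two updates, which is what holding a single lock achieves, is enough to make the second version correct.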

Why One Big Lock?

CPython could have used many smaller locks: one for each object, dictionary, list, reference count, module, allocator, and internal structure. That is possible, but it has costs:

• every object operation would need more locking logic
• single-threaded Python could get slower
• C extensions would need stricter rules
• deadlocks and subtle races would become easier to introduce
• the interpreter would be much harder to maintain

The GIL is the simpler trade-off: one process-wide interpreter lock protects a large amount of CPython's internal state.

This is why the GIL is often described as a historical compromise. It makes CPython simpler and keeps single-threaded code fast, but it limits CPU parallelism for Python threads.

How a Python Thread Runs

A Python thread does not hold the GIL indefinitely. It runs in short slices.

The simplified loop looks like this:

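# Simplified pseudocode, not real CPython source.
acquire_gil()
while thread_has_work():
    execute_one_instruction()

    # Since Python 3.2 this check is time-based (about every 5 ms by
    # default), not a fixed number of bytecode instructions.
    if gil_drop_requested():
        release_gil()   # let a waiting thread take a turn
        acquire_gil()   # block until it is this thread's turn again
release_gil()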

In real CPython, the details are more sophisticated than this pseudocode, but the mental model is the same: a thread gets a turn, runs Python bytecode, and eventually gives other threads a chance.
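In CPython 3.2 and later, the length of a turn is expressed as a time interval rather than an instruction count, and it can be read through the sys module. The snippet below is a small sketch that inspects the value; the documented default is 5 milliseconds, but the code prints whatever the running interpreter reports.

```python
import sys

# How long a thread may keep running before CPython asks it to give up
# the GIL when other threads are waiting (value is in seconds).
interval = sys.getswitchinterval()
print(f"switch interval: {interval * 1000:.1f} ms")

# The interval can be tuned with sys.setswitchinterval, although that is
# rarely necessary. Here the current value is simply re-applied.
sys.setswitchinterval(interval)
```

A shorter interval makes threads more responsive at the cost of more switching overhead; it does not make CPU-bound threads run in parallel.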

Step by Step

1. A thread is ready to run Python code.
2. It asks for the GIL.
3. If no other thread holds it, the thread runs Python bytecode.
4. Other Python threads wait for their turn.
5. The running thread may release the GIL when it blocks, performs I/O, sleeps, or reaches a scheduling check.
6. Another thread can acquire the GIL and continue.

This is concurrency, not necessarily parallelism. Multiple tasks are in progress, but only one thread is executing Python bytecode at any instant in that process.

When the GIL is Released

The GIL is not held for every kind of work. CPython releases it around operations that may block or that can run safely outside the interpreter.

Common release points:

• waiting for files, sockets, or database responses
• time.sleep()
• many blocking system calls
• some C extension operations
• many NumPy operations that run native loops
• periodic interpreter checks so other Python threads get a turn

This distinction explains almost every practical GIL rule.

CPU-Bound vs I/O-Bound Work

The GIL matters differently depending on what your threads spend time doing.

CPU-Bound: Threads Take Turns

CPU-bound Python code spends most of its time executing Python bytecode. Counting, parsing, transforming objects, walking lists, and running pure-Python loops are typical examples.

If four threads all run CPU-bound Python code, they compete for the same GIL. They do not run Python bytecode on four CPU cores at the same time. They take turns.

import threading
import time

N = 50_000_000

def count_down(n):
    total = 0
    while n:
        total += n
        n -= 1
    return total

# Baseline: run the same workload four times in one thread.
start = time.perf_counter()
for _ in range(4):
    count_down(N)
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Same total work, split across four threads.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"four threads: {time.perf_counter() - start:.2f}s")

Both runs do the same total amount of work. On a standard CPython build with the GIL, the four-thread version is typically no faster than the sequential loop, and sometimes slower, because threads add scheduling overhead while still sharing one interpreter lock.

I/O-Bound: Waiting Can Overlap

I/O-bound code spends much of its time waiting for something outside Python: a network response, disk read, database query, subprocess, or timer.

When a thread waits for I/O, CPython can release the GIL. Another thread can run Python code while the first thread is blocked in the operating system.

import threading
import time

def wait_for_io():
    time.sleep(1)

start = time.perf_counter()
wait_for_io()
wait_for_io()
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=wait_for_io) for _ in range(2)]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"threaded: {time.perf_counter() - start:.2f}s")

The threaded version finishes in about one second, not two, because both threads spend most of their time sleeping outside Python bytecode execution.

The Desk Analogy

Think of CPU-bound work as writing calculations by hand at a single desk. Even if four people are in the room, only the person at the desk can write. Passing the pen around does not make the calculation four times faster.

Think of I/O-bound work as placing phone calls. One person can dial and wait on hold, then hand the desk to someone else. Many calls can be in progress because most of the time is spent waiting outside the desk.

That is the GIL in practice:

• CPU-heavy Python threads fight over the desk.
• I/O-heavy threads can make progress because waiting does not need the desk.

What the GIL Does Not Mean

The GIL is often explained badly, so it helps to separate the myths from the actual rule.

Myth: Python cannot use multiple cores

Python can use multiple cores. Use multiple processes, native extensions, NumPy, PyTorch, Cython, Rust, C, or a Python implementation without the same GIL design. The GIL specifically limits multiple threads in one CPython process from executing Python bytecode in parallel.

Myth: Threading is useless in Python

Threading is useful for I/O-heavy programs: web crawlers, network clients, background file work, blocking SDKs, and programs that wait on external services.

Myth: The GIL makes every Python program slow

Most Python programs are single-threaded or I/O-bound. The GIL is mainly painful when you expect CPU-bound Python threads to scale across cores.

Myth: The GIL prevents race conditions in my code

No. The GIL protects CPython internals. Your application state can still have race conditions.

import threading

balance = 0

def deposit():
    global balance
    for _ in range(100_000):
        balance += 1

t1 = threading.Thread(target=deposit)
t2 = threading.Thread(target=deposit)

t1.start(); t2.start()
t1.join(); t2.join()

print(balance)

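Do not rely on the GIL as your application lock. The statement balance += 1 is a separate load, add, and store, so a thread switch between those steps can lose an update. Whether this script actually prints less than 200000 depends on the CPython version and on timing, but nothing guarantees the correct total. Use threading.Lock, queue.Queue, database transactions, or higher-level concurrency primitives when shared state matters.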
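One way to make the deposit example safe is to guard the shared counter with a threading.Lock. The sketch below wraps the state in a small Account class and a run_deposits helper; both names are invented for this illustration.

```python
import threading

class Account:
    """A counter whose updates are protected by an explicit lock."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def deposit(self, amount=1):
        # Hold the lock for the whole read-modify-write step.
        with self._lock:
            self.balance += amount

def run_deposits(n_threads, per_thread):
    account = Account()

    def worker():
        for _ in range(per_thread):
            account.deposit()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(run_deposits(2, 100_000))  # prints: 200000
```

The lock makes each increment atomic from the point of view of other threads, so the total no longer depends on when the interpreter switches threads.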

Choosing the Right Tool

The practical question is not "How do I remove the GIL?" It is "What kind of work am I doing?"

If the Work is CPU-Bound Python

Use multiprocessing or ProcessPoolExecutor. Each process has its own Python interpreter and its own GIL, so the operating system can run those processes on different CPU cores.

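from concurrent.futures import ProcessPoolExecutor

def score_document(doc):
    # Stand-in for expensive pure-Python scoring (placeholder workload).
    return sum(ord(ch) for ch in doc)

if __name__ == "__main__":
    # The guard is needed on platforms that start workers with "spawn".
    documents = ["alpha", "beta", "gamma"]  # example input
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(score_document, documents))
    print(scores)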

This is the usual answer for CPU-heavy pure Python.

If the Work is I/O-Bound and Blocking

Use threads when you are calling blocking libraries that do not provide async APIs.

from concurrent.futures import ThreadPoolExecutor

def fetch_user(user_id):
    # blocking_client is a placeholder for any synchronous SDK or HTTP client.
    return blocking_client.get_user(user_id)

with ThreadPoolExecutor(max_workers=32) as pool:
    users = list(pool.map(fetch_user, user_ids))

Threads work well here because most time is spent waiting for I/O, not holding the GIL.
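For a version that runs as written, the blocking call can be simulated with time.sleep. In this sketch fetch_user and fetch_all are illustrative stand-ins, and the 0.1 second delay is an arbitrary substitute for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_user(user_id):
    # Stand-in for a blocking network call; sleeping releases the GIL.
    time.sleep(0.1)
    return {"id": user_id}

def fetch_all(user_ids, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(fetch_user, user_ids))

start = time.perf_counter()
users = fetch_all(range(8))
elapsed = time.perf_counter() - start
print(f"fetched {len(users)} users in {elapsed:.2f}s")
```

With eight workers, the eight simulated calls wait at the same time, so the elapsed time should be close to one delay instead of eight.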

If the Work is High-Concurrency I/O

Use asyncio when the libraries you need support async operations and you want thousands of concurrent tasks without thousands of OS threads.

import asyncio

async def fetch_user(session, user_id):
    # session is assumed to be an aiohttp-style client session with a base URL.
    async with session.get(f"/users/{user_id}") as response:
        return await response.json()

async def main(session, user_ids):
    tasks = [fetch_user(session, user_id) for user_id in user_ids]
    return await asyncio.gather(*tasks)

Asyncio does not remove the GIL. It avoids needing many threads for I/O concurrency.
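A self-contained variant of the same pattern can use asyncio.sleep in place of a real network call. The coroutine names and the 0.1 second delay are assumptions made for this sketch.

```python
import asyncio

async def fetch_user(user_id):
    # Stand-in for an awaitable network call.
    await asyncio.sleep(0.1)
    return {"id": user_id}

async def main(user_ids):
    tasks = [fetch_user(user_id) for user_id in user_ids]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)

users = asyncio.run(main(range(100)))
print(len(users))  # prints: 100
```

All 100 tasks run on a single thread. They overlap because each one yields control to the event loop while it waits.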

If the Work Runs in Native Code

Libraries such as NumPy, OpenCV, PyTorch, TensorFlow, and many compression or cryptography libraries often do heavy work in C, C++, CUDA, or other native code. Some of those operations release the GIL while they run.

import numpy as np

a = np.random.random((2000, 2000))
b = np.random.random((2000, 2000))

# The heavy matrix multiply runs in optimized native code.
c = a @ b

In this case, Python starts the operation, but the hot loop is not ordinary Python bytecode.
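The same effect can be seen with the standard library alone. The hashlib documentation states that the GIL is released while hashing data larger than 2047 bytes, so threads that hash large buffers can overlap. The following sketch uses an arbitrary buffer size and a made-up helper name, and it checks only that the threaded results match the sequential ones; any speedup depends on the build and the number of cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_hex(data):
    # The C implementation can release the GIL while hashing large buffers.
    return hashlib.sha256(data).hexdigest()

# Four independent 8 MiB buffers (arbitrary example data).
chunks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(sha256_hex, chunks))

sequential = [sha256_hex(chunk) for chunk in chunks]
print(threaded == sequential)  # prints: True
```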

How to Diagnose a GIL Problem

Do not assume the GIL is the bottleneck just because a program uses threads. Ask three questions:

1. Are multiple threads spending most of their time in Python code?
2. Is CPU usage stuck around one core while other cores are idle?
3. Does replacing threads with processes improve throughput?

If the answer to all three is yes, the GIL may be the limit. If the program waits on network, disk, database, locks, or external services, the bottleneck may be elsewhere.

For CPU profiling, start simple:

import cProfile

def workload():
    # Put the threaded workload here.
    pass

cProfile.run("workload()")

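Keep in mind that cProfile records only the thread that calls it, so work done inside other threads does not appear in the report unless those threads are profiled as well. Then compare a thread-based version with a process-based version for the CPU-heavy portion.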
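A small timing helper makes that comparison repeatable. The names cpu_task and timed_map, and the workload sizes, are assumptions chosen for this sketch; it times a sequential baseline against a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_task(n):
    # Pure-Python loop: holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i
    return total

def timed_map(executor_cls, fn, inputs, workers=4):
    """Run fn over inputs with the given executor class; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(fn, inputs))
    return time.perf_counter() - start, results

inputs = [200_000] * 4

start = time.perf_counter()
baseline = [cpu_task(n) for n in inputs]
baseline_seconds = time.perf_counter() - start

thread_seconds, thread_results = timed_map(ThreadPoolExecutor, cpu_task, inputs)
print(f"sequential: {baseline_seconds:.3f}s, threads: {thread_seconds:.3f}s")
```

The same helper accepts concurrent.futures.ProcessPoolExecutor as executor_cls, provided the call is placed under an if __name__ == "__main__": guard. If the process-based run is clearly faster than the thread-based run for the same inputs, that points toward the GIL as the limit.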

Practical Decision Table

Workload	Good choice	Why
Pure Python number crunching	`multiprocessing`, `ProcessPoolExecutor`	Separate processes can run on separate cores
Blocking HTTP/database/file calls	`threading`, `ThreadPoolExecutor`	Threads can overlap time spent waiting
Many network tasks with async libraries	`asyncio`	Efficient concurrency without many OS threads
NumPy/PyTorch/OpenCV operations	native library APIs	Heavy work runs outside Python bytecode
Shared mutable state	`queue.Queue`, locks, actors, processes	The GIL is not an application-level data model
Web server request handling	threads, async workers, multiple processes	Choice depends on framework and workload

Future of the GIL

The GIL is changing, but slowly and carefully.

PEP 703: Optional No-GIL CPython

PEP 703 set out the path toward a free-threaded CPython build. Python 3.13 shipped the first free-threaded builds as an experimental option, and the Python 3.14 documentation describes free threading as a supported, opt-in build in which the GIL can be disabled. The default Python build still uses the GIL, and importing a third-party extension module that has not declared support for free threading can cause the interpreter to re-enable it.
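To find out which kind of interpreter a program is running on, the build configuration and the runtime state can both be queried. This is a hedged sketch: gil_status is a made-up helper, sys._is_gil_enabled() is an underscore-prefixed function that exists only on Python 3.13 and later, and the code falls back to assuming the GIL is on when it is missing.

```python
import sys
import sysconfig

def gil_status():
    """Report whether this is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 for free-threaded builds; otherwise 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # On interpreters older than 3.13 the function is absent and the GIL is always on.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}

print(gil_status())
```

On a free-threaded build the two values can disagree: the build supports running without the GIL, yet the GIL may have been re-enabled at runtime.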

The transition is difficult because the GIL is tied to:

• CPython object memory management
• C extension compatibility
• single-thread performance
• packaging and deployment expectations
• decades of ecosystem assumptions

For most production Python users, the practical guidance remains the same: choose the right concurrency model for the workload.

Subinterpreters

Subinterpreters are another direction. The idea is to run multiple interpreters inside one process, each with more isolated state. Python 3.12 gave each interpreter the option of its own GIL (PEP 684), and later work focuses on exposing subinterpreters to Python code and tightening the isolation between interpreter instances.

This is not a drop-in replacement for threads in every program, but it shows the direction: more ways to get parallelism while preserving CPython's safety and compatibility.

Mental Checklist

When you see Python threads, ask:

• Are they doing CPU-bound Python work?
• Are they mostly waiting for I/O?
• Is the hot loop actually inside a native extension?
• Do the threads share mutable Python objects?
• Would processes or async tasks express the workload more clearly?

Details matter more than memorizing "Python has a GIL."

Key Takeaways

Essential GIL Concepts

• Scope: The GIL affects threads inside one CPython process.

• Protection: It protects CPython object and interpreter internals.

• CPU work: Pure-Python CPU threads take turns instead of running bytecode in parallel.

• I/O work: Threads can overlap waiting because blocking operations often release the GIL.

• Processes: Use processes for CPU-bound Python parallelism.

• Async: Use asyncio for many concurrent I/O tasks when async libraries are available.

• Native code: C extensions can release the GIL and use optimized parallel code.

• Future: Free-threaded CPython is available as an opt-in build, but compatibility and deployment still matter.

The GIL is not a random flaw bolted onto Python. It is a design trade-off in CPython: simpler interpreter internals and strong compatibility in exchange for limited CPU parallelism with threads. Once you know whether your work is CPU-bound, I/O-bound, or native-extension-heavy, the right concurrency tool becomes much easier to choose.