Linux Namespaces: The Foundation of Container Isolation

The Illusion of Isolation

When you run a Docker container, it feels like a lightweight virtual machine - it has its own hostname, its own process tree starting from PID 1, its own network interfaces, and its own filesystem. But there's no hypervisor, no separate kernel. How does Linux create this illusion?

The answer is namespaces - a kernel feature that partitions system resources so that different processes see different views of the system.

Analogy: The Truman Show

Think of namespaces like the town in The Truman Show. Truman believes he lives in a normal world, but everything he sees - the sky, the buildings, the people - is actually a constructed set. The "real world" exists outside, but Truman can't see or interact with it.

Similarly, a process in a namespace sees a constructed view of system resources. It believes it's PID 1 with full control, but the "real" system exists in the parent namespace.

The Seven Namespace Types

Linux has seven long-established namespace types, each isolating a different aspect of the system (an eighth, the time namespace, was added in kernel 5.6). The PID namespace is shown here as an example of what a namespace isolates and how the host and namespace views differ.

Process ID Namespace

Flag: CLONE_NEWPID (since kernel 2.6.24)

Isolates process ID number space. Processes in different PID namespaces can have the same PID.

What gets isolated:

Process ID numbering (PIDs start from 1 inside)
Process visibility (/proc filesystem view)
Signal delivery between namespaces
Parent-child process relationships

$ docker run alpine ps aux # Shows only container processes

View Comparison

Host view (all system processes visible):

PID	Command
1	systemd
1000	containerd
1001	container-init
1002	app

Key insight: Namespaces don't provide resource limits (that's cgroups) or security (that's capabilities/seccomp). They only provide isolation - making resources invisible between namespace boundaries.

How Namespaces Work

The Kernel's Perspective

Every process in Linux has a task_struct containing pointers to its namespace memberships. When a process makes a syscall that depends on isolated resources (like listing PIDs or network interfaces), the kernel consults these pointers to determine what the process should see.

// Simplified view of task_struct namespace membership
struct task_struct {
    // ... other fields ...
    const struct cred *cred;  // Credentials; cred->user_ns is the user namespace
    struct nsproxy *nsproxy;  // Points to the set of other namespaces
};

struct nsproxy {
    struct uts_namespace *uts_ns;
    struct ipc_namespace *ipc_ns;
    struct mnt_namespace *mnt_ns;
    struct pid_namespace *pid_ns_for_children;  // PID namespace for new children
    struct net *net_ns;
    struct cgroup_namespace *cgroup_ns;
};
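The same memberships are visible from user space: each entry under /proc/<pid>/ns/ is a symlink whose target has the form type:[inode]. The following is a minimal sketch that reads those links; the function name read_namespaces and the proc_root parameter are illustrative choices, not part of any library.

```python
import os
import re

# Link targets look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {entry_name: namespace_inode} for one process.

    proc_root is a hypothetical parameter that makes the function easy to
    point at a test directory instead of the real /proc.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable
        match = NS_LINK.match(target)
        if match:
            result[name] = int(match.group("inode"))
    return result
```

On a Linux host, calling read_namespaces() with no arguments lists the namespaces of the current process; two processes that report the same inode for a type are members of the same namespace.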

Creating Namespaces

There are three ways for processes to enter namespaces:

Syscall	Usage	Description
`clone()`	Process creation	Create child in new namespace(s)
`unshare()`	Current process	Move calling process to new namespace(s); for a PID namespace, only its future children move
`setns()`	Existing namespace	Join an existing namespace by fd
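All three syscalls select namespace types through CLONE_NEW* flag bits. As a rough sketch, the flags can be combined and passed to unshare(2) through libc; the helper names namespace_flags and unshare below are illustrative, and the call itself usually needs root or a user namespace to succeed.

```python
import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def namespace_flags(*kinds):
    """OR together the CLONE_NEW* bits for the requested namespace types."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(*kinds):
    """Call unshare(2) via libc; raises OSError (often EPERM) on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_flags(*kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For example, unshare("uts", "ipc") would give the calling process a private hostname and private IPC objects, provided it has the required privileges.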

Namespace Commands

# List all namespaces on the system
lsns

# Create new namespace and run command
unshare --pid --fork --mount-proc bash

# Enter existing namespace (requires namespace fd or PID)
nsenter --target 1234 --pid --net bash

# See what namespaces a process belongs to
ls -la /proc/$$/ns/

# Create persistent namespace
ip netns add mynetns
ip netns exec mynetns ip addr

PID Namespace Deep Dive

The PID namespace is perhaps the most iconic - it's what makes container processes appear to start from PID 1. The same process has a different PID depending on which namespace observes it: the host sees its host-level PID, while processes inside the namespace see PIDs that start at 1.

[Interactive demo: PID Namespace in Action. A Host View panel shows what the host kernel sees; a Namespace View panel shows what processes inside the namespace see.]

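Why PID 1 matters: In a PID namespace, PID 1 is special. Signals sent from inside the namespace reach it only if it has installed a handler for them, and from an ancestor namespace only SIGKILL and SIGSTOP are delivered unconditionally. Orphaned processes are re-parented to it, so it must reap them when they exit. This is why containers need a proper init process!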
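The reaping duty of an init process can be sketched in a few lines. This is a simplified illustration, not a production init: it only waits for direct children of the current process (a real init would also receive re-parented orphans and forward signals), and the names reap_all_children and demo are made up for this example.

```python
import os


def reap_all_children():
    """Block until every child of this process has exited.

    Returns {pid: exit_code}; a child killed by a signal is recorded as the
    negated signal number.
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            return reaped  # no children left to wait for
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)


def demo():
    """Fork two short-lived children and reap them, as an init would."""
    for code in (0, 3):
        if os.fork() == 0:
            os._exit(code)  # child: exit immediately, skipping cleanup
    return sorted(reap_all_children().values())
```

Without a loop like this in PID 1, exited children stay in the process table as zombies for the lifetime of the container.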

Key PID Namespace Properties

Hierarchical structure: PID namespaces form a tree. Parent namespaces can see all child namespace PIDs, but not vice versa.
PID translation: Each process has a PID in every ancestor namespace up to the root. The kernel maintains this mapping.
Init process (PID 1): The first process in a PID namespace becomes init. It has special signal handling - processes inside the namespace can only send it signals it has installed handlers for, while an ancestor namespace can always deliver SIGKILL or SIGSTOP.
Orphan adoption: When a parent dies, children are re-parented to the namespace's init process (PID 1), not the host's init.
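The hierarchy and PID translation described above can be illustrated with a toy model. This is not how the kernel stores the mapping; the classes PidNamespace and Process and the starting PID of 1000 for the host are assumptions chosen for readability.

```python
import itertools


class PidNamespace:
    """Toy PID namespace: a link to its parent and its own PID counter."""

    def __init__(self, parent=None, first_pid=1):
        self.parent = parent
        self._next_pid = itertools.count(first_pid)

    def self_and_ancestors(self):
        ns = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Process:
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace
        # One PID is allocated in the process's own namespace and in every
        # ancestor namespace up to the root.
        self.pids = {ns: next(ns._next_pid) for ns in namespace.self_and_ancestors()}

    def pid_seen_from(self, observer_ns):
        """PID as seen from observer_ns, or None if invisible from there."""
        return self.pids.get(observer_ns)


host = PidNamespace(first_pid=1000)   # hypothetical: host PIDs continue at 1000
container = PidNamespace(parent=host)
init = Process("container-init", container)
app = Process("app", container)
print(init.pid_seen_from(container), init.pid_seen_from(host))
print(app.pid_seen_from(container), app.pid_seen_from(host))
```

A process created directly in the host namespace has no entry for the container namespace, which models why children cannot see their parent namespace's processes.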

# Demonstrate PID namespace
$ unshare --pid --fork --mount-proc bash
# Now inside new PID namespace

$ echo $$
1  # We are PID 1!

$ ps aux
USER  PID  COMMAND
root    1  bash
# Only our processes visible

Network Namespaces

Network namespaces provide isolated network stacks. This is how containers get their own IP addresses and port bindings. A container-like topology is built from namespaces, a bridge, and veth pairs that connect them.

[Interactive demo: Network Namespace Simulator. Create namespaces, add a bridge, connect them with veth pairs, and watch packets flow.]

Host Network Stack

Interface	Address	Type
eth0	192.168.1.100	physical
lo	127.0.0.1	loopback

How Docker does it: Each container gets its own network namespace. A bridge (docker0) connects containers via veth pairs. Port mapping uses iptables NAT rules to forward traffic from the host to the container's namespace.
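The bridge-plus-veth layout can be sketched as a sequence of iproute2 commands. The helper below only generates the command strings and does not run them; the bridge name br0, the interface names, and the 10.0.0.2/24 address are hypothetical values, and this is an approximation of the pattern rather than Docker's actual implementation (port mapping via NAT is left out).

```python
import shlex


def bridge_topology_commands(ns_name, bridge="br0", ns_addr="10.0.0.2/24"):
    """Return shell commands that wire one network namespace to a bridge.

    Interface names are limited to 15 characters, so ns_name should be short.
    """
    veth_host = f"veth-{ns_name}"  # end that stays in the host namespace
    veth_ns = "ceth0"              # end that moves into the new namespace
    commands = [
        ["ip", "netns", "add", ns_name],
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "add", veth_host, "type", "veth", "peer", "name", veth_ns],
        ["ip", "link", "set", veth_ns, "netns", ns_name],
        ["ip", "link", "set", veth_host, "master", bridge],
        ["ip", "link", "set", bridge, "up"],
        ["ip", "link", "set", veth_host, "up"],
        ["ip", "netns", "exec", ns_name, "ip", "addr", "add", ns_addr, "dev", veth_ns],
        ["ip", "netns", "exec", ns_name, "ip", "link", "set", veth_ns, "up"],
        ["ip", "netns", "exec", ns_name, "ip", "link", "set", "lo", "up"],
    ]
    return [shlex.join(command) for command in commands]


for line in bridge_topology_commands("demo"):
    print(line)
```

Running the printed commands requires root. Generating them as data first makes it easy to review the topology before touching the host's network configuration.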

Container Networking Patterns

Docker and other container runtimes use network namespaces in several configurations:

Mode	Description	Use Case
Bridge	Container connects to bridge via veth	Default, provides NAT
Host	Container shares host's network namespace	Maximum performance
None	Only loopback interface	Security isolation
Container	Share another container's network namespace	Pod-like sharing (Kubernetes)

Mount Namespace: Filesystem Views

The mount namespace was the first namespace type (2002), originally just called "namespace". It isolates the mount table, allowing different processes to see different filesystem hierarchies.

Key Use Cases

Container root filesystem: Each container has its own / using pivot_root() or chroot()
Bind mounts: Mount host directories into container without affecting host view
tmpfs for /tmp: Give each container isolated temporary storage
Hiding sensitive paths: Don't mount /etc/shadow into containers

Mount Propagation

Mounts can be configured to propagate (or not) between namespaces:

Type	Behavior
shared	Mounts propagate bidirectionally
slave	Mounts propagate from master to slave only
private	No propagation
unbindable	No propagation, and the mount cannot be bind-mounted
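The propagation type of each mount is recorded in the optional fields of /proc/<pid>/mountinfo (shared:N, master:N, unbindable; no tag means private). The parser below is a small sketch of reading those tags from one line; the function name propagation_tags is an illustrative choice.

```python
def propagation_tags(mountinfo_line):
    """Return the propagation types recorded on one mountinfo line."""
    fields = mountinfo_line.split()
    separator = fields.index("-")   # optional fields end at the lone "-"
    optional = fields[6:separator]  # the first six fields are fixed
    tags = []
    for field in optional:
        if field.startswith("shared:"):
            tags.append("shared")
        elif field.startswith("master:"):
            tags.append("slave")
        elif field == "unbindable":
            tags.append("unbindable")
    return tags or ["private"]


sample = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
print(propagation_tags(sample))
```

A mount can carry both a shared: and a master: tag, in which case it receives events from its master and also propagates to its own peer group; the function returns both tags for such a line.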

User Namespace: Unprivileged Containers

User namespaces (completed in kernel 3.8) are arguably the most powerful type. They map UIDs/GIDs between namespaces, enabling rootless containers.

The Magic of User Namespaces

Inside the namespace: Process runs as root (UID 0) with full capabilities

On the host: Process runs as unprivileged user (e.g., UID 100000)

Even if the container process escapes, it has only the privileges of that unprivileged host user!

# Root inside the namespace, still an unprivileged user outside
$ id
uid=1000(alice) gid=1000(alice)

$ unshare --user --map-root-user bash
$ id
uid=0(root) gid=0(root)

# But on the host, still alice!
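The mapping itself lives in /proc/<pid>/uid_map, where each line holds three numbers: the first UID inside the namespace, the first UID outside, and the length of the range. The sketch below translates an inside UID to its outside UID; the function name map_uid_to_host and the sample ranges are illustrative.

```python
def map_uid_to_host(uid_map_text, inside_uid):
    """Translate a UID inside a user namespace to the UID outside it.

    uid_map_text holds lines of "inside_start outside_start length".
    Returns None when inside_uid falls in no mapped range.
    """
    for line in uid_map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        inside_start, outside_start, length = map(int, parts)
        if inside_start <= inside_uid < inside_start + length:
            return outside_start + (inside_uid - inside_start)
    return None


# Sample map: 65536 IDs starting at host UID 100000.
rootless_map = "0 100000 65536\n"
print(map_uid_to_host(rootless_map, 0))
```

With a single-entry map such as "0 1000 1", root inside the namespace corresponds to UID 1000 outside, which matches the alice example above.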

Namespace Lifecycle

Creation

A namespace is created when clone() or unshare() is called with the corresponding CLONE_NEW* flag; the new child (or the calling process) becomes its first member. Namespaces are reference-counted kernel objects.

Persistence

A namespace persists as long as:

At least one process is a member
A bind mount of /proc/[pid]/ns/[type] holds a reference
An open file descriptor refers to it

Destruction

When the last reference is released, the namespace and its resources are cleaned up. For network namespaces, this means all virtual interfaces are destroyed.

# Create persistent network namespace
ip netns add myns
# Creates /var/run/netns/myns bind mount

# Namespace persists even with no processes
ip netns exec myns ip link
# lo only

# Delete when done
ip netns del myns

Security Considerations

Important Security Notes

Namespaces provide isolation, not security. They hide resources but don't prevent access if the boundary is breached.

Complete container security requires multiple layers:

Namespaces (isolation)
cgroups (resource limits)
Seccomp (syscall filtering)
Capabilities (privilege restriction)
SELinux/AppArmor (mandatory access control)

Common Escape Vectors

Shared kernel: All containers share the host kernel - kernel exploits affect everyone
Privileged containers: --privileged grants all capabilities and access to host devices, and turns off the default seccomp and AppArmor confinement
Sensitive mounts: Mounting /proc, /sys, or device files can provide escape paths
CAP_SYS_ADMIN: This capability enables many namespace-breaking operations
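One way to check for the CAP_SYS_ADMIN risk is to inspect the CapEff bitmask in /proc/<pid>/status. The following sketch parses that field from status text; the helper names and the two sample masks are illustrative, while bit 21 is the CAP_SYS_ADMIN number from <linux/capability.h>.

```python
CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Extract the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(status_text, cap=CAP_SYS_ADMIN):
    """True if the given capability bit is set in the effective set."""
    return bool((effective_caps(status_text) >> cap) & 1)


# Sample inputs: a mask with every low bit set and a restricted mask.
full = "Name:\tbash\nCapEff:\t000001ffffffffff\n"
restricted = "Name:\tapp\nCapEff:\t00000000a80425fb\n"
print(has_capability(full), has_capability(restricted))
```

Passing the contents of /proc/self/status from inside a container shows whether that container's processes hold the capability.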

Practical Examples

Manual Container-like Isolation

# Create all namespaces except user (requires root)
unshare --pid --net --mount --uts --ipc --fork bash

# Set hostname (UTS namespace)
hostname my-container

# Mount new proc (after PID namespace)
mount -t proc proc /proc

# Now we have basic container-like isolation!

Inspecting Container Namespaces

# Find container's init process
docker inspect --format '{{.State.Pid}}' mycontainer
# Returns: 12345

# List its namespaces
ls -la /proc/12345/ns/
# lrwxrwxrwx 1 root root 0 ... cgroup -> 'cgroup:[4026532583]'
# lrwxrwxrwx 1 root root 0 ... ipc -> 'ipc:[4026532517]'
# ...

# Enter the container's namespaces manually
nsenter --target 12345 --all bash
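To check programmatically whether two processes share a namespace, the device and inode numbers of their /proc/<pid>/ns/<type> entries can be compared. The sketch below assumes read access to both entries (inspecting other users' processes typically needs root); the function names and the proc_root parameter are illustrative.

```python
import os


def same_namespace(path_a, path_b):
    """True if two namespace files refer to the same namespace.

    Two /proc/<pid>/ns/<type> paths name the same namespace when both the
    device ID and the inode number match.
    """
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def shares_namespace(pid_a, pid_b, kind, proc_root="/proc"):
    """Compare one namespace type (e.g. "net") between two processes."""
    return same_namespace(
        os.path.join(proc_root, str(pid_a), "ns", kind),
        os.path.join(proc_root, str(pid_b), "ns", kind),
    )
```

For instance, shares_namespace(1, 12345, "net") would report whether the container process from the example above uses the host's network namespace, as it does in host networking mode.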

Essential Takeaways

1. Seven namespace types isolate different resources: PID, network, mount, user, UTS, IPC, cgroup

2. Containers are just processes in separate namespaces - no hypervisor needed

3. clone(), unshare(), setns() are the syscalls for namespace management

4. PID namespace makes container init PID 1 with special signal handling

5. Network namespaces plus veth pairs enable container networking

6. User namespaces enable rootless containers (root inside, unprivileged outside)

7. Namespaces isolate but don't limit resources (that's cgroups) or filter syscalls (that's seccomp)

8. Use lsns, nsenter, unshare commands to explore and manipulate namespaces
