Linux Namespaces: The Foundation of Container Isolation

A guide to Linux namespaces for container isolation, covering PID, network, mount, and user namespaces.

The Illusion of Isolation

When you run a Docker container, it feels like a lightweight virtual machine - it has its own hostname, its own process tree starting from PID 1, its own network interfaces, and its own filesystem. But there's no hypervisor, no separate kernel. How does Linux create this illusion?

The answer is namespaces - a kernel feature that partitions system resources so that different processes see different views of the system.

Analogy: The Truman Show

Think of namespaces like the town in The Truman Show. Truman believes he lives in a normal world, but everything he sees - the sky, the buildings, the people - is actually a constructed set. The "real world" exists outside, but Truman can't see or interact with it.

Similarly, a process in a namespace sees a constructed view of system resources. It believes it's PID 1 with full control, but the "real" system exists in the parent namespace.

The Seven Namespace Types

Linux has seven widely used namespace types (PID, network, mount, user, UTS, IPC, and cgroup), each isolating a different aspect of the system. Kernel 5.6 added an eighth, the time namespace, which this article does not cover. The PID namespace serves as the example here:

Process ID Namespace

Flag: CLONE_NEWPID (since kernel 2.6.24)

Isolates the process ID number space. Processes in different PID namespaces can have the same PID.

What gets isolated:
  • Process ID numbering (PIDs start from 1 inside)
  • Process visibility (/proc filesystem view)
  • Signal delivery between namespaces
  • Parent-child process relationships

    $ docker run alpine ps aux   # Shows only container processes

View comparison (host view, all system processes visible):

    PID   COMMAND
    1     systemd
    1000  containerd
    1001  container-init
    1002  app

Key insight: Namespaces don't provide resource limits (that's cgroups) or security (that's capabilities/seccomp). They only provide isolation - making resources invisible between namespace boundaries.

How Namespaces Work

The Kernel's Perspective

Every process in Linux has a task_struct containing pointers to its namespace memberships. When a process makes a syscall that depends on isolated resources (like listing PIDs or network interfaces), the kernel consults these pointers to determine what the process should see.

    // Simplified view of task_struct namespace membership
    struct task_struct {
        // ... other fields ...
        const struct cred *cred;   // cred->user_ns is the task's user namespace
        struct nsproxy *nsproxy;   // Points to the rest of the namespace set
    };

    struct nsproxy {
        struct uts_namespace    *uts_ns;
        struct ipc_namespace    *ipc_ns;
        struct mnt_namespace    *mnt_ns;
        struct pid_namespace    *pid_ns_for_children;  // PID namespace for new children
        struct net              *net_ns;
        struct cgroup_namespace *cgroup_ns;
    };
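
From user space, these memberships are visible as symbolic links under /proc/[pid]/ns/. Two processes share a namespace when the corresponding links carry the same inode number. The following is a minimal Python sketch of that check; the helper names (parse_ns_link, namespace_ids, shared_namespaces) are made up for illustration and are not part of any library.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_ids(pid="self"):
    """Map each entry in /proc/<pid>/ns to its namespace inode number.

    Returns an empty dict when the directory cannot be read
    (non-Linux system, or no permission to inspect that process).
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return ids
    for entry in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        ids[entry] = inode
    return ids


def shared_namespaces(pid_a, pid_b):
    """Return the sorted entry names whose inode matches for both processes."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return sorted(name for name in a if a[name] == b.get(name))


if __name__ == "__main__":
    for name, inode in sorted(namespace_ids().items()):
        print(f"{name:20} {inode}")
```

Calling shared_namespaces(1, 12345) as root on a container host would typically show which namespaces the container's init process still shares with the host's init.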

Creating Namespaces

There are three ways for processes to enter namespaces:

Syscall     Usage               Description
clone()     Process creation    Create child in new namespace(s)
unshare()   Current process     Move calling process to new namespace(s)
setns()     Existing namespace  Join an existing namespace by fd
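
As a rough illustration of unshare(), the Python sketch below forks a child, calls unshare(2) through ctypes, and checks whether the child's /proc/self/ns/uts link changed. The function name unshare_in_child is invented for this example. Python 3.12 also exposes os.unshare, but ctypes is used here on the assumption that an older interpreter may be in use. Whether the call succeeds depends on the system: creating a user namespace without privileges can be disabled by configuration, in which case the function simply returns False.

```python
import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWCGROUP = 0x02000000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def unshare_in_child(flags, ns_name="uts"):
    """Fork, call unshare(flags) in the child, and report whether the
    child's /proc/self/ns/<ns_name> link changed as a result.

    The parent process is never moved, so the caller stays in its
    original namespaces whatever happens in the child.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            link = f"/proc/self/ns/{ns_name}"
            before = os.readlink(link)
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(flags) == 0 and os.readlink(link) != before:
                status = 0
        except Exception:
            status = 1
        finally:
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    ok = unshare_in_child(CLONE_NEWUSER | CLONE_NEWUTS)
    print("new UTS namespace created in child:", ok)
```

Combining CLONE_NEWUSER with another flag is what lets an unprivileged process create the second namespace: the new user namespace is created first and grants the capabilities needed for the rest.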

Namespace Commands

    # List all namespaces on the system
    lsns

    # Create a new PID namespace and run a command in it
    unshare --pid --fork --mount-proc bash

    # Enter the namespaces of an existing process (by PID)
    nsenter --target 1234 --pid --net bash

    # See which namespaces a process belongs to
    ls -la /proc/$$/ns/

    # Create a persistent network namespace
    ip netns add mynetns
    ip netns exec mynetns ip addr

PID Namespace Deep Dive

The PID namespace is perhaps the most iconic - it's what makes container processes appear to start from PID 1. The same process has a different PID depending on which namespace is observing it.

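Why PID 1 matters: In a PID namespace, PID 1 is special. Signals sent from inside the namespace reach it only if it has installed a handler for them, so it cannot be killed from within; an ancestor namespace can still stop it with SIGKILL or SIGSTOP. Orphaned processes are re-parented to it, and when it exits the kernel kills every other process in the namespace. This is why containers need a proper init process!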
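
To make the init responsibilities concrete, here is a minimal Python sketch of what a small container init does: start one main child, forward termination signals to it, and reap any child that exits. The function name run_as_init is hypothetical, and this is a simplified illustration under those assumptions rather than a replacement for a real init.

```python
import os
import signal
import sys


def run_as_init(argv):
    """Run argv as a child, forward SIGTERM/SIGINT to it, reap exited
    children, and return the main child's exit code."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError as exc:
            print(f"exec failed: {exc}", file=sys.stderr)
        finally:
            os._exit(127)  # reached only if exec failed

    def forward(signum, _frame):
        # Pass the signal on instead of dying ourselves.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # child already gone

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {s: signal.signal(s, forward) for s in forwarded}
    exit_code = 1
    try:
        # Block until the main child exits; other children that exit
        # first (for example re-parented orphans) are reaped on the way.
        while True:
            pid, status = os.wait()
            if pid == child:
                if os.WIFEXITED(status):
                    exit_code = os.WEXITSTATUS(status)
                else:
                    exit_code = 128 + os.WTERMSIG(status)
                break
        # Collect any remaining zombies without blocking.
        while True:
            try:
                pid, _ = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break  # no children left
            if pid == 0:
                break  # children exist but none has exited yet
    finally:
        for signum, handler in previous.items():
            signal.signal(signum, handler)
    return exit_code
```

A shell running as PID 1 usually does neither the forwarding nor the reaping, which is why small init helpers are commonly placed in front of the application process.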

Key PID Namespace Properties

  1. Hierarchical structure: PID namespaces form a tree. Parent namespaces can see all child namespace PIDs, but not vice versa.

  2. PID translation: Each process has a PID in every ancestor namespace up to the root. The kernel maintains this mapping.

  3. Init process (PID 1): The first process in a PID namespace becomes init. It has special signal handling: a signal sent from inside the namespace is delivered only if init has installed a handler for it, while SIGKILL and SIGSTOP sent from an ancestor namespace always take effect.

  4. Orphan adoption: When a parent dies, children are re-parented to the namespace's init process (PID 1), not the host's init.

    # Demonstrate PID namespace
    $ unshare --pid --fork --mount-proc bash

    # Now inside new PID namespace
    $ echo $$
    1                      # We are PID 1!

    $ ps aux
    USER  PID  COMMAND
    root    1  bash        # Only our processes visible
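
The PID translation described above can be observed from user space: since kernel 4.1, /proc/[pid]/status contains an NSpid line listing the process's PID in each nested PID namespace, starting with the namespace of the procfs being read. The Python sketch below parses that line; the function names are illustrative only.

```python
def nspid_chain(status_text):
    """Return the PIDs from the NSpid line of a /proc/<pid>/status dump,
    outermost namespace first, or [] if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid="self"):
    """Read and parse NSpid for a live process; [] if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return nspid_chain(status_file.read())
    except OSError:
        return []


if __name__ == "__main__":
    print("NSpid chain for this process:", read_nspid())
```

For a container's init process inspected from the host, the chain would have two entries: its host PID followed by 1.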

Network Namespaces

Network namespaces provide isolated network stacks. This is how containers get their own IP addresses and port bindings. A typical container-like topology consists of namespaces connected to a bridge with veth pairs.

How Docker does it: Each container gets its own network namespace. A bridge (docker0) connects containers via veth pairs. Port mapping uses iptables NAT rules to forward traffic from the host to the container's namespace.
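
The bridge-plus-veth layout can be sketched as a sequence of ip commands. The Python snippet below only builds and prints that sequence as a dry run; it does not execute anything. All names and addresses (demo, br-demo, veth-host, veth-ns, 10.200.0.0/24) are assumptions chosen for illustration, not Docker's actual values, and the iptables NAT rules for port mapping are left out.

```python
import shlex


def bridge_topology_commands(ns_name="demo", bridge="br-demo",
                             host_if="veth-host", ns_if="veth-ns",
                             bridge_cidr="10.200.0.1/24",
                             ns_cidr="10.200.0.2/24"):
    """Return the ip(8) commands that connect one network namespace
    to a bridge through a veth pair. Nothing is executed."""
    gateway = bridge_cidr.split("/")[0]
    in_ns = ["ip", "netns", "exec", ns_name]
    return [
        # Namespace and bridge
        ["ip", "netns", "add", ns_name],
        ["ip", "link", "add", bridge, "type", "bridge"],
        ["ip", "addr", "add", bridge_cidr, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
        # veth pair: one end stays on the host, the other moves inside
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        ["ip", "link", "set", ns_if, "netns", ns_name],
        ["ip", "link", "set", host_if, "master", bridge],
        ["ip", "link", "set", host_if, "up"],
        # Configure the end that now lives inside the namespace
        in_ns + ["ip", "addr", "add", ns_cidr, "dev", ns_if],
        in_ns + ["ip", "link", "set", ns_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        in_ns + ["ip", "route", "add", "default", "via", gateway],
    ]


if __name__ == "__main__":
    for command in bridge_topology_commands():
        print(shlex.join(command))
```

Running the printed commands requires root. Keeping the sketch as a dry run makes it easy to review the sequence before touching the host's network configuration.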

Container Networking Patterns

Docker and other container runtimes use network namespaces in several configurations:

Mode       Description                                 Use Case
Bridge     Container connects to bridge via veth       Default, provides NAT
Host       Container shares host's network namespace   Maximum performance
None       Only loopback interface                     Security isolation
Container  Share another container's network namespace Pod-like sharing (Kubernetes)

Mount Namespace: Filesystem Views

The mount namespace was the first namespace type (2002), originally just called "namespace". It isolates the mount table, allowing different processes to see different filesystem hierarchies.

Key Use Cases

  1. Container root filesystem: Each container has its own / using pivot_root() or chroot()
  2. Bind mounts: Mount host directories into container without affecting host view
  3. tmpfs for /tmp: Give each container isolated temporary storage
  4. Hiding sensitive paths: Don't mount /etc/shadow into containers

Mount Propagation

Mounts can be configured to propagate (or not) between namespaces:

Type        Behavior
shared      Mounts propagate bidirectionally
slave       Mounts propagate from master to slave only
private     No propagation
unbindable  Cannot be bind-mounted
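
The propagation type of each mount can be read from /proc/self/mountinfo, where optional fields such as shared:N, master:N, or unbindable appear between the mount options and the "-" separator. The Python sketch below classifies a single mountinfo line on that basis; the function name and the "shared+slave" label for mounts carrying both tags are choices made for this example.

```python
def propagation_type(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line by its optional fields.

    Fields 1-6 are fixed (IDs, device, root, mount point, options);
    everything after them up to the "-" separator is optional.
    """
    fields = mountinfo_line.split()
    separator = fields.index("-")
    tags = {field.split(":")[0] for field in fields[6:separator]}
    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags   # member of a peer group
    slave = "master" in tags    # receives events from a master group
    if shared and slave:
        return "shared+slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"            # no propagation tags at all


if __name__ == "__main__":
    try:
        with open("/proc/self/mountinfo") as mountinfo:
            for line in mountinfo:
                mount_point = line.split()[4]
                print(f"{propagation_type(line):12} {mount_point}")
    except OSError:
        print("mountinfo is not available on this system")
```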

User Namespace: Unprivileged Containers

User namespaces became usable by unprivileged processes in kernel 3.8 and are arguably the most powerful namespace type. They map UIDs/GIDs between namespaces, enabling rootless containers.

The Magic of User Namespaces

Inside the namespace: Process runs as root (UID 0) with full capabilities

On the host: Process runs as unprivileged user (e.g., UID 100000)

Even if the container process escapes, it holds only the privileges of that unprivileged host user!

    # Root inside the namespace, still alice outside
    $ id
    uid=1000(alice) gid=1000(alice)

    $ unshare --user --map-root-user bash
    $ id
    uid=0(root) gid=0(root)
    # But on the host, still alice!
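
The mapping itself lives in /proc/[pid]/uid_map (and gid_map), where each line holds three numbers: the first ID inside the namespace, the first ID outside, and the length of the range. The Python sketch below translates an inside ID to its outside counterpart; the function names are illustrative.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(part) for part in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_outside_id(ranges, inside_id):
    """Translate an ID seen inside the namespace to the ID outside it.

    Returns None when the ID is not covered by any mapped range.
    """
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


if __name__ == "__main__":
    # A range like the one described above: root inside is UID 100000 outside.
    rootless = parse_id_map("0 100000 65536\n")
    print("inside 0    ->", to_outside_id(rootless, 0))
    print("inside 1000 ->", to_outside_id(rootless, 1000))
```

With the single-line map "0 1000 1" that unshare --map-root-user writes for a user with UID 1000, only root inside the namespace is mapped, and it corresponds to UID 1000 outside.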

Namespace Lifecycle

Creation

Namespaces are created implicitly when the first process enters them (via clone() or unshare()). They're reference-counted kernel objects.

Persistence

A namespace persists as long as:

  • At least one process is a member
  • A bind mount of its /proc/[pid]/ns/[type] file exists
  • An open file descriptor refers to it
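
The file descriptor case can be sketched in a few lines of Python: opening a /proc/[pid]/ns/[type] file yields a descriptor that refers to the namespace itself, and the descriptor's inode number is the same ID shown in the symbolic link. The helper names are made up for this example.

```python
import os


def open_namespace(pid="self", ns_type="net"):
    """Open /proc/<pid>/ns/<ns_type> read-only and return the fd,
    or None if the file is unavailable."""
    try:
        return os.open(f"/proc/{pid}/ns/{ns_type}", os.O_RDONLY)
    except OSError:
        return None


def namespace_inode(fd):
    """Return the namespace ID (inode number) behind an open namespace fd."""
    return os.fstat(fd).st_ino


if __name__ == "__main__":
    fd = open_namespace()
    if fd is None:
        print("namespace files are not available on this system")
    else:
        # While fd stays open, the namespace cannot be destroyed.
        print("holding net namespace", namespace_inode(fd))
        os.close(fd)  # dropping the reference
```

A descriptor obtained this way is also what setns() expects, which is how tools such as nsenter join a namespace.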

Destruction

When the last reference is released, the namespace and its resources are cleaned up. For network namespaces, this means all virtual interfaces are destroyed.

    # Create persistent network namespace
    ip netns add myns
    # Creates a bind mount at /var/run/netns/myns

    # Namespace persists even with no processes
    ip netns exec myns ip link   # lo only

    # Delete when done
    ip netns del myns

Security Considerations

Important Security Notes

Namespaces provide isolation, not security. They hide resources but don't prevent access if the boundary is breached.

Complete container security requires multiple layers:

  • Namespaces (isolation)
  • cgroups (resource limits)
  • Seccomp (syscall filtering)
  • Capabilities (privilege restriction)
  • SELinux/AppArmor (mandatory access control)

Common Escape Vectors

  1. Shared kernel: All containers share the host kernel - kernel exploits affect everyone
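  2. Privileged containers: --privileged grants all capabilities and access to host devices, and lifts the default seccomp and AppArmor confinement, so most of the protection around the namespaces is gone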
  3. Sensitive mounts: Mounting /proc, /sys, or device files can provide escape paths
  4. CAP_SYS_ADMIN: This capability enables many namespace-breaking operations
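
One quick way to audit the last point is to check a process's effective capability mask, which /proc/[pid]/status exposes as the hexadecimal CapEff field. The Python sketch below tests a single bit in that mask; the function names are illustrative, and the mask values in the usage note are sample inputs rather than measurements.

```python
CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Return the CapEff mask from a /proc/<pid>/status dump as an int."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(status_text, cap_bit):
    """True if the given capability bit is set in the effective set."""
    return bool((effective_caps(status_text) >> cap_bit) & 1)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status_file:
            status = status_file.read()
        print("CAP_SYS_ADMIN:", has_capability(status, CAP_SYS_ADMIN))
    except (OSError, ValueError):
        print("capability information is not available on this system")
```

For example, a mask of 000001ffffffffff has bit 21 set, while a reduced mask such as 00000000a80425fb does not.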

Practical Examples

Manual Container-like Isolation

    # Create all namespaces except user and cgroup (requires root)
    unshare --pid --net --mount --uts --ipc --fork bash

    # Set hostname (UTS namespace)
    hostname my-container

    # Mount a fresh proc so tools like ps see the new PID namespace
    mount -t proc proc /proc

    # Now we have basic container-like isolation!

Inspecting Container Namespaces

    # Find container's init process
    docker inspect --format '{{.State.Pid}}' mycontainer
    # Returns: 12345

    # List its namespaces
    ls -la /proc/12345/ns/
    # lrwxrwxrwx 1 root root 0 ... cgroup -> 'cgroup:[4026532583]'
    # lrwxrwxrwx 1 root root 0 ... ipc -> 'ipc:[4026532517]'
    # ...

    # Enter the container's namespaces manually
    nsenter --target 12345 --all bash

Essential Takeaways

1. Seven namespace types isolate different resources: PID, network, mount, user, UTS, IPC, cgroup
2. Containers are just processes in separate namespaces - no hypervisor needed
3. clone(), unshare(), setns() are the syscalls for namespace management
4. PID namespace makes container init PID 1 with special signal handling
5. Network namespaces plus veth pairs enable container networking
6. User namespaces enable rootless containers (root inside, unprivileged outside)
7. Namespaces isolate but don't limit resources (that's cgroups) or filter syscalls (that's seccomp)
8. Use lsns, nsenter, unshare commands to explore and manipulate namespaces
