Linux Networking Stack: From Packets to Applications

Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.

Best viewed on desktop for optimal interactive experience

The Internet Post Office

Every time you browse a website, your computer performs an intricate dance involving multiple layers of wrapping, addressing, and routing. Think of it like a multinational postal system:

πŸ“¬ The Network as a Postal System

  • Your message → The letter content (HTTP request, file data)
  • TCP envelope → Tracking number + delivery confirmation (ensures nothing gets lost)
  • IP envelope → Street address (tells routers where to send it)
  • Ethernet envelope → Local mailroom routing (MAC addresses for the local network)
  • Security checkpoint → iptables/netfilter (inspects and filters every package)

Let's open each envelope and see how your data actually travels from application to wire.


Packet Encapsulation: The Russian Nesting Doll

When you send data, it gets wrapped in headers at each layerβ€”like putting a letter in progressively larger envelopes. Watch the encapsulation process:

Packet Encapsulation Journey

Watch how your data gets wrapped in headers as it travels through the network stack. Each layer adds its own "envelope" with routing and control information.

(Interactive demo: your application writes "Hello World" (12 bytes) to a socket, and the data flows down through the Application/HTTP, Transport/TCP, Network/IP, and Data Link/Ethernet layers, each adding its own header before the bytes go on the wire.)

Layer Responsibilities

Each layer has a specific job:

Layer        Protocol   What It Adds        Why
Application  HTTP       Request/response    Your actual data
Transport    TCP        Ports + sequencing  Which app, reliable delivery
Network      IP         IP addresses        Where to route globally
Data Link    Ethernet   MAC addresses       Where on local network
πŸ’‘
Why So Much Overhead?

For a 12-byte "Hello World", you add 54 bytes of headersβ€”that's 82% overhead! This is why small packets are inefficient, and why protocols like HTTP/2 batch multiple requests together. The overhead is the same whether you send 12 bytes or 1,400 bytes.


The TCP Handshake: Establishing Trust

Before any data flows, TCP performs a "three-way handshake" to establish a reliable connection. This is one of the most asked-about networking concepts:

TCP Three-Way Handshake

Before any data can flow, TCP establishes a reliable connection through this handshake. Watch the sequence numbersβ€”they're how TCP tracks and orders every byte.

(Interactive demo: step through the five stages of the handshake, watching the client start in CLOSED and the server in LISTEN.)
Why Three Packets?

The three-way handshake ensures both sides can send AND receive:

  • SYN: Client proves it can send
  • SYN-ACK: Server proves it can receive AND send
  • ACK: Client proves it can receive

Client                                  Server
  |                                       |
  | ------- SYN (seq=1000) --------->     |  Client proves: "I can send"
  |                                       |
  | <--- SYN-ACK (seq=2000, ack=1001) --- |  Server proves: "I can receive AND send"
  |                                       |
  | ------- ACK (ack=2001) --------->     |  Client proves: "I can receive"
  |                                       |
  |        Connection ESTABLISHED         |

The sequence numbers (1000, 2000, etc.) are how TCP tracks every byteβ€”they're essential for detecting lost or reordered packets.


Socket Programming: The Application Interface

Sockets are the API between applications and the network stack. Under the hood, every socket operation is a system call that transitions from user space into the kernel.

Socket Lifecycle Demo

Sockets are just file descriptorsβ€”integers that reference kernel data structures. Watch the system calls that create, configure, and use them.

(Interactive demo: choose the server or client role and step through the eight system calls of the socket lifecycle, starting from CLOSED before any socket exists.)
File Descriptors Are Just Integers

Every socket is a file descriptorβ€”just an integer that indexes into the kernel's per-process file table. This is Unix's "everything is a file" philosophy in action:

  • fd 0, 1, 2 = stdin, stdout, stderr
  • fd 3+ = your sockets, files, pipes
  • accept() returns a new fd for each client

Socket Types

// TCP Socket (reliable, ordered delivery)
int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);

// UDP Socket (fast, no guarantees)
int udp_sock = socket(AF_INET, SOCK_DGRAM, 0);

// Raw Socket (direct IP access, requires root)
int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

// Unix Domain Socket (local IPC, no network overhead)
int unix_sock = socket(AF_UNIX, SOCK_STREAM, 0);

TCP Server Pattern

#include <sys/socket.h>
#include <netinet/in.h>

int create_server(int port) {
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);

    // Allow port reuse (avoid "Address already in use")
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_addr.s_addr = INADDR_ANY,
        .sin_port = htons(port)
    };
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 128);  // Backlog queue

    while (1) {
        int client = accept(server_fd, NULL, NULL);
        handle_client(client);
        close(client);
    }
}

High-Performance I/O with epoll

For handling thousands of connections:

#include <sys/epoll.h>

#define MAX_EVENTS 64

int epfd = epoll_create1(0);

struct epoll_event ev = {
    .events = EPOLLIN | EPOLLET,  // Edge-triggered
    .data.fd = socket_fd
};
epoll_ctl(epfd, EPOLL_CTL_ADD, socket_fd, &ev);

// Event loop
struct epoll_event events[MAX_EVENTS];
while (1) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < n; i++) {
        handle_event(events[i].data.fd);
    }
}

Routing and Forwarding

The kernel decides where to send each packet based on its routing table. Understanding how routes are selected is crucial for network debugging.

Routing Decision Demo

The kernel uses longest prefix matching to select routes. Watch how it evaluates each route and picks the most specific match.

(Interactive demo: enter a destination IP, e.g. 8.8.8.8 (binary 00001000 00001000 00001000 00001000), and watch the kernel test it against each route.)

Routing Table (sorted by prefix length):

192.168.1.100/32   dev eth0      direct           Host Route
192.168.1.0/24     dev eth0      direct           Local Network
172.17.0.0/16      dev docker0   direct           Docker Network
10.0.0.0/8         dev vpn0      via 10.0.0.1     VPN Network
0.0.0.0/0          dev eth0      via 192.168.1.1  Default Gateway
Longest Prefix Wins

When multiple routes match, the kernel picks the most specific one (longest prefix). This is why:

  • /32 host routes override network routes
  • /24 beats /16 beats /8
  • 0.0.0.0/0 is the defaultβ€”matches everything but loses to any specific route

Viewing Routes

# Modern way
ip route show

# Output:
# default via 192.168.1.1 dev eth0
# 192.168.1.0/24 dev eth0 scope link
# 172.17.0.0/16 dev docker0 scope link

Adding Routes

# Static route to a network
ip route add 10.0.0.0/8 via 192.168.1.1 dev eth0

# Default gateway
ip route add default via 192.168.1.1

# Source-based routing (advanced)
ip rule add from 192.168.2.0/24 table custom
ip route add default via 10.0.0.1 table custom

Enable IP Forwarding (Router Mode)

# Temporarily
echo 1 > /proc/sys/net/ipv4/ip_forward

# Permanently
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p

Netfilter: The Kernel's Security Checkpoint

Every packet entering or leaving your system passes through netfilterβ€”a core part of the Linux kernel architecture. This is what iptables (and its successor nftables) uses for packet filtering.

Netfilter Packet Flow

Watch packets traverse the netfilter hooks. Different packet types take different pathsβ€” understanding this is key to writing correct iptables rules.

(Interactive demo: send test packets through the five netfilter hooks and watch the path each one takes; a packet addressed to a local process travels PREROUTING → INPUT.)

Example iptables rules on the INPUT chain:

-m state --state ESTABLISHED,RELATED   ACCEPT   (allow established connections)
-p tcp --dport 22                      ACCEPT   (allow SSH)
-p tcp --dport 80                      ACCEPT   (allow HTTP)
-i lo                                  ACCEPT   (allow loopback)

Understanding Netfilter Hooks

Packets traverse different hooks depending on their destination:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Local Process β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ Network ──► PREROUTING ──► INPUT β”‚ β–² β”‚ β”‚ (local destination) β”‚ β”‚ └──────► FORWARD ─────────┴──► POSTROUTING ──► Network β”‚ β–² β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ (forwarded packets) OUTPUT ────────────► POSTROUTING ──► Network β–² β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β” β”‚ Local Process β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Common iptables Rules

# Allow established connections (most efficient rule first!)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from specific network
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT

# Drop everything else (default deny)
iptables -P INPUT DROP
⚠️
Rule Order Matters!

iptables processes rules top-to-bottomβ€”first match wins. If you put -j DROP before -j ACCEPT, packets will be dropped before reaching the accept rule. Always put your most-matched rules (like ESTABLISHED) first for performance.


Performance Tuning

Network performance depends heavily on memory managementβ€”the kernel allocates buffers (sk_buff structures) for every packet in flight. The most critical tuning parameter is the TCP buffer size.

TCP Buffer Tuning Simulator

TCP buffers must be sized correctly for your network conditions. The Bandwidth-Delay Product (BDP) tells you the optimal buffer size.

(Interactive simulator: adjust bandwidth, round-trip latency, and buffer size to see how they interact.)

Example: at 100 Mbps with a 50 ms round-trip latency, the Bandwidth-Delay Product is 100 Mbps × 50 ms ≈ 610 KB. That is the amount of data "in flight" at any moment when the link is fully utilized. With only a 128 KB buffer (131,072 bytes), achieved throughput caps at about 21 Mbps, just 21% of the link: the buffer is too small. These sysctl settings correspond to that 128 KB cap:
# Set maximum buffer sizes
sysctl -w net.core.rmem_max=131072
sysctl -w net.core.wmem_max=131072

# Set TCP buffer auto-tuning range
sysctl -w net.ipv4.tcp_rmem="4096 87380 131072"
sysctl -w net.ipv4.tcp_wmem="4096 65536 131072"
Why Buffer Size Matters for High-Latency Links

TCP can only have one buffer's worth of data in flight before waiting for ACKs. On high-latency links (like satellite), this becomes the bottleneck:

  • 64KB buffer + 600ms RTT = max 853 Kbps (throttled!)
  • BDP-sized buffer = full bandwidth utilization
  • Too large = wasted memory, possible bufferbloat

TCP Buffer Tuning Commands

# Increase buffer sizes for high-bandwidth links
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# Use BBR congestion control (Google's algorithm)
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq

Network Interface Tuning

# Increase ring buffer
ethtool -G eth0 rx 4096 tx 4096

# Enable offloading
ethtool -K eth0 gso on gro on tso on

# Receive Packet Steering (multi-core)
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

Network Namespaces

Linux can create isolated network environmentsβ€”this is how containers get their own network stack. Network namespaces work alongside other namespace types (PID, mount, user namespaces) to provide full container isolation.

# Create namespace
ip netns add mycontainer

# Create virtual ethernet pair
ip link add veth0 type veth peer name veth1

# Move one end into namespace
ip link set veth1 netns mycontainer

# Configure host side
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up

# Configure container side
ip netns exec mycontainer ip addr add 10.0.0.2/24 dev veth1
ip netns exec mycontainer ip link set veth1 up
ip netns exec mycontainer ip link set lo up

# Test connectivity
ping 10.0.0.2

Debugging Tools

Essential Commands

# Active connections
ss -tunap

# Packet capture
tcpdump -i eth0 'tcp port 80'

# Route tracing
traceroute google.com
mtr google.com  # Better interactive version

# Performance testing
iperf3 -s       # Server
iperf3 -c host  # Client

Common Pitfalls

🚨 Common Networking Mistakes

1. Small buffers on high-latency links. Default 64KB buffers throttle satellite/VPN connections; calculate BDP and size buffers accordingly.

2. iptables rule ordering mistakes. Putting DROP before ACCEPT, or putting expensive rules before ESTABLISHED. First match wins!

3. Forgetting to enable ip_forward. Linux won't route packets between interfaces unless net.ipv4.ip_forward=1.

4. Blocking I/O for high-concurrency servers. Use epoll/kqueue for thousands of connections; blocking I/O means one thread per connection.

5. Ignoring connection state tracking overhead. conntrack uses memory per connection; high-traffic servers may need to tune nf_conntrack_max.


Key Takeaways

🎯 What You Should Remember

Encapsulation

Data gets wrapped in headers at each layer. Each layer only understands its own headerβ€”that's the beauty of layering.

TCP Handshake

Three packets establish that both sides can send AND receive. Sequence numbers track every byte.

Sockets Are File Descriptors

Just integers indexing into kernel tables. accept() returns a NEW fd per client while the server keeps listening.

Longest Prefix Wins

Routing uses longest prefix matching. /32 beats /24 beats /16. Default route (0.0.0.0/0) is the fallback.

Netfilter Paths

Local traffic: PREROUTING β†’ INPUT. Forwarded traffic: PREROUTING β†’ FORWARD β†’ POSTROUTING. Outgoing: OUTPUT β†’ POSTROUTING.

Buffer = BDP

TCP buffers should match the Bandwidth-Delay Product for full throughput on high-latency links.

Understanding the networking stack empowers you to debug connectivity issues, build high-performance servers, and secure your systems. Every web request, every SSH session, every API call traverses this intricate system.


If you found this explanation helpful, consider sharing it with others.
