Sitemap

A visual representation of the site structure to help you navigate through the content.

Site Structure

Main landing page with introduction and recent articles

About/about

Learn more about me, my background, and expertise

Speaking/speaking

My talks, presentations, and speaking engagements

Articles/articles

Collection of articles I've written on various topics

Numerical sensitivity/articles/numerical-sensitivity

Article content

Sam multi mask ambiguity/articles/sam-multi-mask-ambiguity

Article content

Visualizing yolov11/articles/visualizing-yolov11

Article content

H264 implementation applications/articles/h264-implementation-applications

Article content

H264 transform quantization/articles/h264-transform-quantization

Article content

H264 fundamentals/articles/h264-fundamentals

Article content

Zettel/articles/zettel

Article content

Compiling pytorch kernel/articles/compiling-pytorch-kernel

Article content

View size not compatible/articles/view-size-not-compatible

Article content

Gpu boot errors/articles/gpu-boot-errors

Article content

H264 interactive guide/articles/h264-interactive-guide

Article content

Ggml structure/articles/ggml-structure

Article content

Quantization deep dive/articles/quantization-deep-dive

Article content

How tensorrt works/articles/how-tensorrt-works

Article content

Kernel fusion/articles/kernel-fusion

Article content

Visualizing yolov5/articles/visualizing-yolov5

Article content

Cpython internals/articles/cpython-internals

Article content

Cpp compilation process/articles/cpp-compilation-process

Article content

Cpp linking in depth/articles/cpp-linking-in-depth

Article content

Cpp loading runtime/articles/cpp-loading-runtime

Article content

Registry pattern/articles/registry-pattern

Article content

Magic numbers/articles/magic-numbers

Article content

Image encoding/articles/image-encoding

Article content

Text encoding/articles/text-encoding

Article content

Papers/papers

Research papers and publications

Visual instruction tuning/papers/visual-instruction-tuning

Paper content

Vit object detection/papers/vit-object-detection

Paper content

Yolo/papers/yolo

Paper content

Efficientnet/papers/efficientnet

Paper content

Faster rcnn/papers/faster-rcnn

Paper content

Sam/papers/sam

Paper content

DETR/papers/DETR

Paper content

Blip2/papers/blip2

Paper content

Image worth 16x16/papers/image-worth-16x16

Paper content

Optimizing transformer inference/papers/optimizing-transformer-inference

Paper content

Surf/papers/surf

Paper content

Swin transformer/papers/swin-transformer

Paper content

Clip/papers/clip

Paper content

Deeplearning go brr/papers/deeplearning-go-brr

Paper content

Attention is all you need/papers/attention-is-all-you-need

Paper content

Data movement transformer/papers/data-movement-transformer

Paper content

Deep residual learning/papers/deep-residual-learning

Paper content

Concepts/concepts

Interactive explanations of machine learning concepts

initramfs: The Initial RAM Filesystem Explained/concepts/linux/initramfs-boot-process

Learn how initramfs enables Linux boot by loading essential drivers before the root filesystem mounts. Explore early userspace initialization.

Linux kernel architecture explained. Learn syscalls, protection rings, user vs kernel space, and what happens when you run a command.

Explore the inner workings of RAM through beautiful animations and interactive visualizations. Understand memory cells, addressing, and the memory hierarchy.

Python Bytecode Compilation/concepts/python/bytecode-compilation

Explore CPython bytecode compilation from source to .pyc files. Learn the dis module, PVM stack operations, and Python 3.11+ adaptive specialization.

High Bandwidth Memory (HBM)/concepts/gpu/hbm-memory

High Bandwidth Memory (HBM) architecture: 3D-stacked DRAM with TSV technology powering NVIDIA GPUs and AI accelerators with TB/s bandwidth.

GPU Memory Hierarchy & Optimization/concepts/gpu/memory-hierarchy

Master the GPU memory hierarchy from registers to global memory, and understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance.

Compare NVLink vs PCIe bandwidth for multi-GPU training. Learn GPU topologies, NVSwitch, and choose between NCCL, Gloo, and MPI for distributed deep learning.

Filesystems: The Digital DNA of Data Storage/concepts/linux/filesystems-overview

Explore Linux filesystems through interactive visuals. Learn VFS, compare ext4 vs Btrfs vs ZFS, and understand file operations.

Python Memory Management/concepts/python/memory-management

Deep dive into CPython memory management: PyMalloc arenas, object pools, reference counting, and optimization techniques like __slots__ and generators.

NVIDIA Unified Virtual Memory/concepts/gpu/unified-memory

NVIDIA Unified Virtual Memory (UVM): on-demand page migration, memory oversubscription, and simplified CPU-GPU memory management.

Learn how filesystem journaling prevents data loss during crashes. Explore write-ahead logging and recovery in ext4 and XFS.

Understand Linux inodes - the metadata structures behind every file. Learn about hard links, soft links, and inode limits.

Global Interpreter Lock (GIL)/concepts/python/global-interpreter-lock

Understand the CPython Global Interpreter Lock (GIL): thread switching, CPU vs I/O workloads, multiprocessing workarounds, and the PEP 703 no-GIL future.

Page Migration & Fault Handling/concepts/gpu/page-migration

CUDA page migration and fault handling between CPU and GPU memory. Learn TLB management, DMA transfers, and memory optimization.

Understand Copy-on-Write (CoW) in Btrfs and ZFS. Learn how CoW enables instant snapshots, atomic writes, and data integrity.

FUSE: Filesystem in Userspace Explained/concepts/linux/fuse-filesystem

Learn FUSE (Filesystem in Userspace) for building custom filesystems. Understand how NTFS-3G, SSHFS, and cloud storage work.

Python Object Model/concepts/python/object-model

Learn how CPython implements PyObject, type objects, and the unified object model. Explore reference counting, memory layout, and Python internals.

ext4: The Linux Workhorse Filesystem/concepts/linux/ext4-filesystem

Explore ext4, the default Linux filesystem with journaling, extents, and proven reliability. Learn how ext4 protects your data.

Filesystem Snapshots: Time Travel for Your Data/concepts/linux/filesystem-snapshots

How modern filesystems create instant snapshots. Explore Btrfs/ZFS snapshot mechanics, rollback operations, and backup strategies interactively.

Python Garbage Collection/concepts/python/garbage-collection

Understand CPython garbage collection: reference counting, generational GC for circular references, weak references, and gc module tuning strategies.

CPU Pipeline Architecture/concepts/computer-architecture/cpu-pipeline-detailed

Deep dive into CPU pipeline architecture covering 5-stage RISC pipelines, data hazards, control hazards, superscalar execution, and out-of-order processing.

Master Linux mount options like noatime and async for performance tuning and security hardening. Interactive guide to fstab configuration.

NTFS Filesystem: The Master File Table/concepts/linux/ntfs-filesystem

Understand how NTFS organizes files through the Master File Table (MFT), including the key distinction between resident and non-resident file storage.

Python Optimization Techniques/concepts/python/python-optimization

Python performance optimization guide: CPython peephole optimizer, lru_cache, profiling with cProfile, and Python 3.11+ adaptive bytecode specialization.

Contrastive Learning/concepts/embeddings/contrastive-learning

Master contrastive learning for vector embeddings: how InfoNCE loss and self-supervised techniques train models to create high-quality semantic representations.

Btrfs: Modern Copy-on-Write Filesystem/concepts/linux/btrfs-filesystem

Learn Btrfs with built-in snapshots, RAID, and compression. Explore copy-on-write, subvolumes, and self-healing on Linux.

Filesystem Data Integrity: Detecting Silent Corruption/concepts/linux/filesystem-integrity

Understand how modern filesystems use checksums to detect silent data corruption that traditional filesystems miss entirely.

__slots__ Optimization/concepts/python/slots-optimization

Master Python __slots__ for 40-50% memory reduction and faster attribute access. Learn CPython descriptor protocol, inheritance patterns, and best practices.

Cross-Lingual Alignment/concepts/embeddings/cross-lingual-alignment

Learn cross-lingual embedding alignment techniques like VecMap and MUSE for multilingual vector retrieval and zero-shot language transfer in search systems.

NVIDIA Device Files in /dev//concepts/gpu/nvidia-device-files

Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.

ZFS: The Ultimate Filesystem/concepts/linux/zfs-filesystem

Master ZFS filesystem with pooled storage, RAID-Z, snapshots, and checksums. Learn enterprise-grade data integrity on Linux.

Green Threads vs OS Threads: Understanding Concurrency Models/concepts/python/green-threads-vs-os-threads

Compare Python green threads vs OS threads. Learn asyncio coroutines, gevent, context switching costs, and when to use each concurrency model.

Domain Adaptation/concepts/embeddings/domain-adaptation

Domain adaptation for embeddings: transfer learning to fine-tune retrieval models across domains while preventing catastrophic forgetting.

XFS: High-Performance Parallel Filesystem/concepts/linux/xfs-filesystem

XFS filesystem internals: allocation groups, extent-based allocation, and delayed allocation for high-performance parallel I/O.

Python asyncio: Mastering Asynchronous Programming/concepts/python/asyncio-event-loop

Deep dive into Python's asyncio library, understanding event loops, coroutines, tasks, and async/await patterns with interactive visualizations.

Binary Embeddings/concepts/embeddings/binary-embeddings

Learn how binary embeddings use 1-bit quantization for ultra-compact vector representations, enabling billion-scale similarity search with 32x memory reduction.

FAT32 & exFAT: Universal Filesystems/concepts/linux/fat-filesystems

Learn FAT32 and exFAT filesystems for cross-platform USB drives and SD cards. Understand file size limits and compatibility.

Python Shared Memory/concepts/python/shared-memory

Master Python multiprocessing.shared_memory for zero-copy IPC. Learn synchronization, NumPy integration, and race condition prevention patterns.

Hybrid Retrieval Systems/concepts/embeddings/hybrid-retrieval-systems

Build hybrid retrieval systems combining BM25 sparse search with dense vector embeddings using reciprocal rank fusion for superior semantic search performance.

RAID storage visualized: RAID 0, 1, 5, 6, and 10 levels explained. Learn how they work, when to use them, and disk failure recovery.

Memory Controllers: The Brain Behind RAM Management/concepts/memory/memory-controllers

Learn how memory controllers manage CPU-RAM data flow. Interactive demos of channels, ranks, banks, and command scheduling for optimal bandwidth.

BM25 Algorithm/concepts/embeddings/bm25-algorithm

Master the BM25 algorithm, the probabilistic ranking function powering Elasticsearch and Lucene for keyword-based document retrieval and search systems.

Linux Process Management: Fork, Exec, and Beyond/concepts/linux/process-management

Master Linux process management through interactive visualizations. Understand process lifecycle, fork/exec operations, zombies, orphans, and CPU scheduling.

Distributed Parallelism in Deep Learning/concepts/gpu/distributed-parallelism

GPU distributed parallelism: Data Parallel (DDP), Tensor Parallel, Pipeline Parallel, and ZeRO optimization for training large AI models.

Explore Linux memory management through interactive visualizations. Understand virtual memory, page tables, TLB, swapping, and memory allocation.

Linux system calls visualized: how user programs communicate with the kernel, protection rings, context switching, and syscall performance.

Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.

Linux Boot Process: From Power-On to Login/concepts/linux/boot-process

Visualize the complete Linux boot sequence from BIOS/UEFI to login. Learn how GRUB, kernel, and systemd work together with interactive visualizations.

Linux Init Systems: From SysV to systemd/concepts/linux/init-systems

Compare Linux init systems through interactive visualizations. Understand the evolution from SysV Init to systemd, service management, and boot orchestration.

Master Linux kernel modules through interactive visualizations. Learn how to load, unload, develop, and debug kernel modules that extend Linux functionality.

Master Linux namespaces for container isolation. Learn PID, network, mount, and user namespaces with interactive demos.

Compare Wayland vs X11 display servers on Linux. Learn about architecture, performance, security, and modern graphics stack.

Master cgroups to limit CPU, memory, and I/O for process groups. Understand cgroups v1 vs v2, the hierarchical structure, and how containers use them.

Discover how containers work by combining namespaces, cgroups, and OverlayFS. Build a mental model of Docker internals through interactive visualizations.

Learn nvidia-modeset for display configuration on Linux. Understand kernel mode-setting, DRM integration, and GPU drivers.

CUDA Multi-Process Service (MPS)/concepts/gpu/cuda-mps

Learn CUDA Multi-Process Service (MPS) for GPU sharing. Enable concurrent kernel execution from multiple processes and maximize GPU utilization.

Understanding TCP/IP Protocol Stack/concepts/networking/tcp-ip

Explore the TCP/IP protocol stack, packet encapsulation, and how data travels through network layers from application to physical transmission.

Flynn's Classification: Taxonomy of Computer Architectures/concepts/computer-architecture/flynns-classification

Explore Flynn's Classification of computer architectures through interactive visualizations of SISD, SIMD, MISD, and MIMD systems.

CPU Pipelines & Branch Prediction: Modern Processor Architecture/concepts/computer-architecture/cpu-pipelines

Explore CPU pipeline stages, instruction-level parallelism, pipeline hazards, and branch prediction through interactive visualizations.

Hazard Detection: Pipeline Dependencies and Solutions/concepts/computer-architecture/hazard-detection

Master pipeline hazards through interactive visualizations of data dependencies, control hazards, structural conflicts, and advanced detection mechanisms.

Master thread safety concepts through interactive visualizations of race conditions, mutexes, atomic operations, and deadlock scenarios.

Convolution Operation: The Foundation of CNNs/concepts/deep-learning/convolution-operation

Interactive guide to convolution in CNNs: visualize sliding windows, kernels, stride, padding, and feature detection with step-by-step demos.

Cross-Entropy Loss/concepts/deep-learning/cross-entropy-loss

Understand cross-entropy loss for classification: interactive demos of binary and multi-class CE, the -log(p) curve, softmax gradients, and focal loss.

Dilated Convolutions: Expanding Receptive Fields Efficiently/concepts/deep-learning/dilated-convolutions

Understand dilated (atrous) convolutions: how dilation rates expand receptive fields exponentially without extra parameters and how to avoid gridding artifacts.

Feature Pyramid Networks/concepts/deep-learning/feature-pyramid-networks

Learn how Feature Pyramid Networks build multi-scale feature representations through top-down pathways and lateral connections for robust object detection.

Receptive Field/concepts/deep-learning/receptive-field

Understand receptive fields in CNNs — how convolutional layers expand their field of view, the gap between theoretical and effective receptive fields, and strategies for controlling RF growth.

VAE Latent Space: Understanding Variational Autoencoders/concepts/deep-learning/vae-latent-space

Explore VAE latent space in deep learning. Learn variational autoencoder encoding, decoding, interpolation, and the reparameterization trick.

Master virtual memory and TLB address translation with interactive demos. Learn page tables, page faults, and memory management optimization.

CPU Cache Lines: The Unit of Memory Transfer/concepts/memory/cpu-cache-lines

Learn how CPU cache lines transfer data between memory and cache. Understand spatial locality and optimize memory access patterns for better performance.

Memory Access Patterns: Sequential vs Strided/concepts/memory/memory-access-patterns

Master sequential vs strided memory access patterns. Learn how cache efficiency and hardware prefetching affect application performance.

Memory Interleaving: Parallel Memory Access/concepts/memory/memory-interleaving

Discover how memory interleaving distributes addresses across banks for parallel access. Boost memory bandwidth in DDR5 and GPU systems.

NUMA Architecture: Non-Uniform Memory Access/concepts/memory/numa-architecture

Explore NUMA architecture and memory locality in multi-socket systems. Understand local vs remote memory access latency and optimization strategies.

Understanding NVIDIA Kubernetes GPU Operator/concepts/gpu/kubernetes-operator

Automate NVIDIA GPU management in Kubernetes with the GPU Operator. Deploy drivers, device plugins, and monitoring as DaemonSets.

Understanding CUDA Contexts/concepts/gpu/cuda-context

Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.

CLS Token in Vision Transformers/concepts/attention/cls-token

Learn how the CLS token acts as a global information aggregator in Vision Transformers, enabling whole-image classification through attention mechanisms.

Hierarchical Attention in Vision Transformers/concepts/attention/hierarchical-attention

Explore how hierarchical attention lets Vision Transformers (ViT) build multi-scale representations of an image across successive stages.

Multi-Head Attention in Vision Transformers/concepts/attention/multihead-attention

Explore how multi-head attention lets Vision Transformers (ViT) attend to several representation subspaces of an image in parallel.

Positional Embeddings in Vision Transformers/concepts/attention/positional-embeddings-vit

Explore how positional embeddings let Vision Transformers (ViT) retain the spatial arrangement of image patches, which self-attention alone does not encode.

Interactive Look: Self-Attention in Vision Transformers/concepts/attention/self-attention-vit

Explore how self-attention enables Vision Transformers (ViT) to understand images by capturing global context, with CNN comparison.

Transparent Huge Pages (THP): Reducing TLB Pressure/concepts/memory/transparent-huge-pages

Learn how Transparent Huge Pages (THP) reduces TLB misses by promoting 4KB to 2MB pages. Understand performance benefits and memory bloat tradeoffs.

ALiBi: Attention with Linear Biases/concepts/attention/alibi

Learn ALiBi, the position encoding method that adds linear biases to attention scores for exceptional length extrapolation in transformers.

MHA vs GQA vs MQA: Choosing the Right Attention/concepts/attention/attention-comparison

Compare Multi-Head, Grouped-Query, and Multi-Query Attention mechanisms to understand their trade-offs and choose the optimal approach for your use case.

Attention Sinks: Stable Streaming LLMs/concepts/attention/attention-sinks

Learn about attention sinks, where LLMs concentrate attention on initial tokens, and how preserving them enables streaming inference.

Cross-Attention: Bridging Different Modalities/concepts/attention/cross-attention

Understand cross-attention, the mechanism that enables transformers to align and fuse information from different sources, sequences, or modalities.

Grouped-Query Attention (GQA)/concepts/attention/grouped-query-attention

Learn how Grouped-Query Attention (GQA) balances Multi-Head quality with Multi-Query efficiency for faster LLM inference.

Linear Attention Approximations/concepts/attention/linear-attention-approximations

Explore linear complexity attention mechanisms including Performer, Linformer, and other efficient transformers that scale to very long sequences.

Masked and Causal Attention/concepts/attention/masked-attention

Learn how masked attention enables autoregressive generation and prevents information leakage in transformers and language models.

Multi-Query Attention (MQA)/concepts/attention/multi-query-attention

Learn Multi-Query Attention (MQA), the optimization that shares keys and values across attention heads for massive memory savings.

Rotary Position Embeddings (RoPE)/concepts/attention/rotary-position-embeddings

Learn Rotary Position Embeddings (RoPE), the elegant position encoding using rotation matrices, powering LLaMA, Mistral, and modern LLMs.

Scaled Dot-Product Attention/concepts/attention/scaled-dot-product

Master scaled dot-product attention, the fundamental transformer building block. Learn why scaling is crucial for stable training.

Sliding Window Attention/concepts/attention/sliding-window-attention

Sliding Window Attention for long sequences: local context windows enable O(n) complexity, used in Mistral and Longformer models.

Sparse Attention Patterns/concepts/attention/sparse-attention-patterns

Explore sparse attention mechanisms that reduce quadratic complexity to linear or sub-quadratic, enabling efficient processing of long sequences.

SoA vs AoS: Data Layout Optimization/concepts/computer-architecture/soa-vs-aos

Master Structure of Arrays (SoA) vs Array of Structures (AoS) data layouts for optimal cache efficiency, SIMD vectorization, and GPU memory coalescing.

Contrastive Loss/concepts/deep-learning/contrastive-loss

Understand contrastive loss for representation learning: interactive demos of InfoNCE, triplet loss, and embedding space clustering with temperature tuning.

Dropout Regularization/concepts/deep-learning/dropout

Understand dropout regularization: how randomly silencing neurons prevents overfitting, the inverted dropout trick, and when to use each dropout variant.

Focal Loss: Focusing on Hard Examples/concepts/deep-learning/focal-loss

Learn focal loss for deep learning: down-weight easy examples, focus on hard ones. Interactive demos of gamma, alpha balancing, and RetinaNet.

He/Kaiming Initialization/concepts/deep-learning/he-initialization

Learn He (Kaiming) initialization for ReLU neural networks: understand why ReLU needs special weight initialization, visualize variance flow, and see dead neurons in action.

KL Divergence/concepts/deep-learning/kl-divergence

Learn KL divergence for machine learning: measure distribution differences in VAEs, knowledge distillation, and variational inference with interactive visualizations.

MSE and MAE Loss Functions/concepts/deep-learning/mse-mae

Interactive guide to MSE vs MAE for regression: explore outlier sensitivity, gradient behavior, and Huber loss with visualizations.

Xavier/Glorot Initialization/concepts/deep-learning/xavier-initialization

Learn Xavier (Glorot) initialization: how it balances forward signals and backward gradients to enable stable deep network training with tanh and sigmoid.

Understanding NVIDIA Persistence Daemon/concepts/gpu/nvidia-persistence-daemon

Eliminating GPU initialization latency through nvidia-persistenced - a userspace daemon that maintains GPU driver state for optimal startup performance.

ANN Algorithms Comparison/concepts/embeddings/ann-comparison

Compare major approximate nearest neighbor algorithms side by side: HNSW, IVF-PQ, LSH, Annoy, and ScaNN. Find the best approach for your use case.

HNSW: Hierarchical Navigable Small World/concepts/embeddings/hnsw-search

Interactive visualization of HNSW - the graph-based algorithm that powers modern vector search with logarithmic complexity.

Vector Index Structures/concepts/embeddings/index-structures

Explore the fundamental data structures powering vector databases: trees, graphs, hash tables, and hybrid approaches for efficient similarity search.

Learn how IVF-PQ combines clustering and compression to enable billion-scale vector search with minimal memory footprint.

LSH: Locality Sensitive Hashing/concepts/embeddings/lsh-search

Explore how LSH uses probabilistic hash functions to find similar vectors in sub-linear time, perfect for streaming and high-dimensional data.

Vector Quantization Techniques/concepts/embeddings/vector-quantization

Master vector compression techniques from scalar to product quantization. Learn how to reduce memory usage by 10-100× while preserving search quality.

Long Polling: The Patient Connection/concepts/networking/long-polling

Learn HTTP long polling - a server-side technique that holds connections open until data arrives. Achieve near real-time updates with standard protocols.

Short Polling: The Impatient Client Pattern/concepts/networking/short-polling

Learn short polling in networking - a simple HTTP pattern for periodic data fetching. See why 70-90% of requests waste bandwidth and when to use alternatives.

Master WebSocket protocol for real-time bidirectional communication over TCP. Learn handshakes, frames, and building low-latency web applications.

Adaptive Tiling: Efficient Visual Token Generation/concepts/deep-learning/adaptive-tiling

Learn adaptive tiling in vision transformers: dynamically partition images based on visual complexity to reduce token counts by up to 80% while preserving detail where it matters.

Emergent Abilities in Large Language Models/concepts/deep-learning/emergent-abilities

Explore emergent abilities in large language models: sudden capabilities that appear at scale thresholds, phase transitions, and the mirage debate, with interactive visualizations.

Prompt Engineering/concepts/deep-learning/prompt-engineering

Master prompt engineering for large language models: from basic composition to Chain-of-Thought, few-shot, and advanced techniques with interactive visualizations.

Deep dive into how different prompt components influence model behavior across transformer layers, from surface patterns to abstract reasoning.

Neural Scaling Laws/concepts/deep-learning/scaling-laws

Explore neural scaling laws in deep learning: power law relationships between model size, data, and compute that predict AI performance, with interactive visualizations.

Visual Complexity Analysis: Smart Image Processing/concepts/deep-learning/visual-complexity-analysis

Learn visual complexity analysis in deep learning - how neural networks measure entropy, edges, and saliency for adaptive image processing.

Cross-Encoder vs Bi-Encoder/concepts/embeddings/cross-encoder-vs-bi-encoder

Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.

Dense Embeddings Space Explorer/concepts/embeddings/dense-embeddings

Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.

Matryoshka Embeddings/concepts/embeddings/matryoshka-embeddings

Matryoshka embeddings: nested representations enabling dimension reduction by simple truncation without model retraining for flexible retrieval.

Multi-Vector Late Interaction/concepts/embeddings/multi-vector-late-interaction

Explore ColBERT and other multi-vector retrieval models that use fine-grained token-level matching for superior search quality.

Quantization Effects Simulator/concepts/embeddings/quantization-effects

Embedding quantization simulator: explore memory-accuracy trade-offs from float32 to int8 and binary representations for retrieval.

Sparse vs Dense Embeddings/concepts/embeddings/sparse-vs-dense

Compare lexical (BM25/TF-IDF) and semantic (BERT) retrieval approaches, understanding their trade-offs and hybrid strategies.

Context Windows: The Memory Limits of LLMs/concepts/llms/context-windows

Interactive visualization of LLM context windows - sliding windows, expanding contexts, and attention patterns that define model memory limits.

Flash Attention: IO-Aware Exact Attention/concepts/llms/flash-attention

Interactive Flash Attention visualization - the IO-aware algorithm achieving memory-efficient exact attention through tiling and kernel fusion.

Interactive KV cache visualization - how key-value caching in LLM transformers enables fast text generation without quadratic recomputation.

Tokenization: Converting Text to Numbers/concepts/llms/tokenization

Interactive exploration of tokenization methods in LLMs - BPE, SentencePiece, and WordPiece. Understand how text becomes tokens that models can process.

The Vision-Language Alignment Problem/concepts/multimodal/alignment-problem

How vision-language models align visual and text representations using contrastive learning, cross-modal attention, and CLIP-style training.

The Modality Gap/concepts/multimodal/modality-gap

The modality gap in CLIP and vision-language models: why image and text embeddings occupy separate regions despite contrastive training.

Multimodal Scaling Laws/concepts/multimodal/scaling-laws

Discover how multimodal vision-language models like CLIP, ALIGN, and LLaVA scale with data, parameters, and compute following Chinchilla-style power laws.

Master LoRA, bottleneck adapters, and prefix tuning for parameter-efficient fine-tuning of vision-language models like LLaVA with minimal compute and memory.

Client-Server Communication: Polling vs WebSockets/concepts/networking/client-server-communication

Learn client-server communication patterns including short polling, long polling, and WebSockets. Compare HTTP protocols for real-time web applications.

C++ AST & Parsing/concepts/cpp/ast-parsing

Explore how C++ code is parsed into an Abstract Syntax Tree (AST). Learn lexical analysis, tokenization, and syntax parsing for systems programming.

C++ Compilation Overview/concepts/cpp/compilation

Understand the complete C++ compilation pipeline from source code to object files. Learn preprocessing, parsing, code generation, and optimization stages.

C++ Dynamic Linking/concepts/cpp/dynamic-linking

Master C++ dynamic linking and runtime library loading. Learn shared libraries, position-independent code, dlopen, and systems-level library management.

C++ Linking Overview/concepts/cpp/linking

How C++ object files are linked into executables. Learn symbol resolution, static vs dynamic linking, and linker optimization.

C++ Program Loading/concepts/cpp/loading

Understand how C++ programs are loaded and executed by the operating system. Learn ELF format, process creation, memory mapping, and runtime initialization.

Memory Management & RAII in C++/concepts/cpp/memory-raii

Learn Resource Acquisition Is Initialization (RAII) - the cornerstone of C++ memory management. Understand automatic resource cleanup and exception safety.

Modern C++ Features (C++11 and Beyond)/concepts/cpp/modern-cpp-features

Explore modern C++ features including auto, lambdas, ranges, and coroutines. Learn how C++11/14/17/20 transformed the language.

Object-Oriented Programming in C++/concepts/cpp/oop-inheritance

Master C++ OOP concepts including inheritance, polymorphism, virtual functions, and modern object-oriented design principles with interactive examples.

C++ Compiler Optimization/concepts/cpp/optimization

C++ compiler optimization: loop unrolling, inlining, dead code elimination. Learn GCC and Clang optimization flags and techniques.

Pointers & References in C++/concepts/cpp/pointers-references

Master C++ pointers and references through interactive visualizations. Learn memory addressing, dereferencing, smart pointers, and avoid common pitfalls.

C++ Preprocessor/concepts/cpp/preprocessor

C++ preprocessor visualized: macros, header guards, conditional compilation, and #include directives explained interactively.

Smart Pointers in Modern C++/concepts/cpp/smart-pointers

Master C++11 smart pointers through interactive examples. Learn unique_ptr, shared_ptr, and weak_ptr with reference counting visualizations.

C++ Stack vs Heap/concepts/cpp/stack-heap

C++ stack vs heap memory allocation visualized. Learn LIFO stack frames, dynamic heap allocation, and memory management patterns.

C++ Symbol Resolution/concepts/cpp/symbol-resolution

C++ symbol resolution explained: how linkers fix undefined references, name mangling, weak vs strong symbols, and common linking errors.

Templates & STL in C++/concepts/cpp/templates-stl

Master C++ templates and the Standard Template Library. Learn generic programming, template metaprogramming, and STL containers and algorithms.

Gradient Flow in Deep Networks/concepts/deep-learning/gradient-flow

Learn how gradients propagate through deep neural networks during backpropagation. Understand vanishing and exploding gradient problems with interactive visualizations.

NCCL: High-Performance Multi-GPU Communication/concepts/gpu/nccl-communication

Master NVIDIA NCCL for multi-GPU deep learning. Learn AllReduce, ring algorithms, and GPU-Direct communication for efficient distributed training on CUDA.

DataParallel vs DistributedDataParallel/concepts/pytorch/data-parallel

Compare PyTorch DataParallel vs DistributedDataParallel for multi-GPU training. Learn GIL limitations, NCCL AllReduce, and DDP best practices.

PyTorch DataLoader Pipeline/concepts/pytorch/dataloader-pipeline

Understanding how PyTorch DataLoader moves data from disk through CPU to GPU, including Dataset, Sampler, Workers, and Collate components.

Understanding num_workers/concepts/pytorch/num-workers

Deep dive into the PyTorch DataLoader num_workers parameter: how parallel workers prefetch data, optimal configuration, and common pitfalls.

Pinned Memory and DMA Transfers/concepts/pytorch/pin-memory

Understanding PyTorch pin_memory for faster CPU to GPU data transfers using DMA (Direct Memory Access) and page-locked memory.

ASFF: Adaptive Spatial Feature Fusion/concepts/computer-vision/asff

Learning where to fuse multi-scale features with per-pixel, per-level fusion weights. ASFF challenges FPN's uniform fusion assumption.

RoI Pooling, RoI Align & Deformable RoI Pooling/concepts/computer-vision/roi-pooling

Understanding region-based feature extraction for object detection, from quantized pooling to sub-pixel alignment and adaptive sampling

Anchor-Based vs Anchor-Free Object Detection/concepts/computer-vision/anchor-based-vs-anchor-free

Compare anchor-based vs anchor-free object detection: Faster R-CNN and RetinaNet anchors vs FCOS and CenterNet point-based methods.

Understanding how neural architecture search discovers optimal feature pyramid architectures that outperform hand-designed alternatives

Modern Object Detection: DETR and Transformer-Based Approaches/concepts/computer-vision/modern-object-detection

Understanding end-to-end object detection with transformers, from DETR's object queries to bipartite matching and attention-based localization

NMS & Soft-NMS: Removing Duplicate Detections/concepts/computer-vision/nms-soft-nms

Understanding Non-Maximum Suppression algorithms for object detection post-processing, from greedy NMS to soft variants

NAdam: Nesterov-Accelerated Adam/concepts/deep-learning/nadam

Understand the NAdam optimizer that fuses Adam adaptive learning rates with Nesterov look-ahead momentum for faster, smoother convergence in deep learning.

Visual Complexity Analysis for Token Allocation/concepts/computer-vision/visual-complexity-analysis

Learn how visual complexity analysis optimizes vision transformer token allocation using edge detection, FFT, and entropy metrics.

NVIDIA Tensor Cores explained: mixed-precision matrix operations delivering 10x speedups for AI training and inference on CUDA GPUs.

Layer Normalization/concepts/deep-learning/layer-normalization

Learn layer normalization for transformers and sequence models: how normalizing across features enables batch-independent training with interactive visualizations.

Internal Covariate Shift/concepts/deep-learning/internal-covariate-shift

Understand internal covariate shift in deep learning: why layer input distributions change during training, how it slows convergence, and how batch normalization fixes it.

Batch Normalization/concepts/deep-learning/batch-normalization

Learn batch normalization in deep learning: how normalizing layer inputs accelerates training, improves gradient flow, and acts as regularization with interactive visualizations.

Skip Connections/concepts/deep-learning/skip-connections

Learn how skip connections and residual learning enable training of very deep neural networks. Understand the ResNet revolution with interactive visualizations.

CPU Performance & Optimization/concepts/computer-architecture/cpu-optimization

CPU performance optimization: memory hierarchy, cache blocking, SIMD vectorization, and profiling tools for modern processors.

C++ Virtual Tables & Inheritance/concepts/cpp/virtual-tables-inheritance

C++ virtual tables (vtables) explained. Learn virtual dispatch, single/multiple inheritance, RTTI, and object memory layout visually.

Graph Attention Networks (GAT)/concepts/graph/graph-attention-networks

Adaptive attention-based aggregation for graph neural networks - multi-head attention, learned weights, and interpretable graph learning

Graph Centrality & Metrics/concepts/graph/graph-centrality

Understanding node importance through centrality measures, shortest paths, hop distances, clustering coefficients, and fundamental graph metrics

Graph Convolutional Networks (GCN)/concepts/graph/graph-convolutional-networks

Learn Graph Convolutional Networks (GCN) with spectral theory, message passing, and node classification for geometric deep learning.

Graph Embeddings/concepts/graph/graph-embeddings

Learning low-dimensional vector representations of graphs through random walks, DeepWalk, Node2Vec, and skip-gram models

Graph Pooling Methods/concepts/graph/graph-pooling

Hierarchical graph coarsening techniques - TopK, SAGPool, DiffPool, and readout operations for graph-level representations

Mixture of Experts (MoE)/concepts/llms/mixture-of-experts

Understanding sparse mixture of experts models - architecture, routing mechanisms, load balancing, and efficient scaling strategies for large language models

GPU Streaming Multiprocessor (SM)/concepts/gpu/shared-multiprocessor

Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy

Uses/uses

Tools, software, and hardware I use

Resume/resume

My professional experience and qualifications

Bookmarks/bookmarks

A curated collection of articles and resources I find valuable

Consulting/consulting

Services and consulting offerings

Thank You/thank-you

Confirmation page after form submissions

Sitemap/sitemap

Visual representation of the site structure
