Sitemap

A visual representation of the site structure to help you navigate through the content.

Site Structure

Home/

Main landing page with introduction and recent articles

About/about

Learn more about me, my background, and expertise

Speaking/speaking

My talks, presentations, and speaking engagements

Articles/articles

Collection of articles I've written on various topics

Numerical sensitivity/articles/numerical-sensitivity

Article content

SAM multi-mask ambiguity/articles/sam-multi-mask-ambiguity

Article content

Visualizing YOLOv11/articles/visualizing-yolov11

Article content

H.264 implementation applications/articles/h264-implementation-applications

Article content

H.264 transform quantization/articles/h264-transform-quantization

Article content

H.264 fundamentals/articles/h264-fundamentals

Article content

Zettel/articles/zettel

Article content

Compiling PyTorch kernel/articles/compiling-pytorch-kernel

Article content

View size not compatible/articles/view-size-not-compatible

Article content

GPU boot errors/articles/gpu-boot-errors

Article content

H.264 interactive guide/articles/h264-interactive-guide

Article content

GGML structure/articles/ggml-structure

Article content

Quantization deep dive/articles/quantization-deep-dive

Article content

How TensorRT works/articles/how-tensorrt-works

Article content

Kernel fusion/articles/kernel-fusion

Article content

Visualizing YOLOv5/articles/visualizing-yolov5

Article content

CPython internals/articles/cpython-internals

Article content

C++ compilation process/articles/cpp-compilation-process

Article content

C++ linking in depth/articles/cpp-linking-in-depth

Article content

C++ loading runtime/articles/cpp-loading-runtime

Article content

Registry pattern/articles/registry-pattern

Article content

Magic numbers/articles/magic-numbers

Article content

Image encoding/articles/image-encoding

Article content

Text encoding/articles/text-encoding

Article content

Papers/papers

Research papers and publications

Visual instruction tuning/papers/visual-instruction-tuning

Paper content

ViT object detection/papers/vit-object-detection

Paper content

YOLO/papers/yolo

Paper content

EfficientNet/papers/efficientnet

Paper content

Faster R-CNN/papers/faster-rcnn

Paper content

SAM/papers/sam

Paper content

DETR/papers/DETR

Paper content

BLIP-2/papers/blip2

Paper content

Image worth 16x16/papers/image-worth-16x16

Paper content

Optimizing transformer inference/papers/optimizing-transformer-inference

Paper content

SURF/papers/surf

Paper content

Swin Transformer/papers/swin-transformer

Paper content

CLIP/papers/clip

Paper content

Deeplearning go brr/papers/deeplearning-go-brr

Paper content

Attention is all you need/papers/attention-is-all-you-need

Paper content

Data movement transformer/papers/data-movement-transformer

Paper content

Deep residual learning/papers/deep-residual-learning

Paper content

Concepts/concepts

Interactive explanations of machine learning and systems concepts

initramfs: The Initial RAM Filesystem Explained/concepts/linux/initramfs-boot-process

Learn how initramfs enables Linux boot by loading essential drivers before the root filesystem mounts. Explore early userspace initialization.

Linux Kernel Architecture: How Your OS Actually Works/concepts/linux/kernel-architecture

Linux kernel architecture explained. Learn syscalls, protection rings, user vs kernel space, and what happens when you run a command.

How RAM Works: Interactive Deep Dive into Computer Memory/concepts/memory/how-ram-works

Explore the inner workings of RAM through beautiful animations and interactive visualizations. Understand memory cells, addressing, and the memory hierarchy.

Python Bytecode Compilation/concepts/python/bytecode-compilation

Explore CPython bytecode compilation from source to .pyc files. Learn the dis module, PVM stack operations, and Python 3.11+ adaptive specialization.

High Bandwidth Memory (HBM)/concepts/gpu/hbm-memory

High Bandwidth Memory (HBM) architecture: 3D-stacked DRAM with TSV technology powering NVIDIA GPUs and AI accelerators with TB/s bandwidth.

GPU Memory Hierarchy & Optimization/concepts/gpu/memory-hierarchy

Master the GPU memory hierarchy from registers to global memory, and understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance.

Multi-GPU Communication: NVLink vs PCIe, NCCL, and Distributed Training/concepts/gpu/multi-gpu-communication

Compare NVLink vs PCIe bandwidth for multi-GPU training. Learn GPU topologies, NVSwitch, and choose between NCCL, Gloo, and MPI for distributed deep learning.

Filesystems: The Digital DNA of Data Storage/concepts/linux/filesystems-overview

Explore Linux filesystems through interactive visuals. Learn VFS, compare ext4 vs Btrfs vs ZFS, and understand file operations.

Python Memory Management/concepts/python/memory-management

Deep dive into CPython memory management: PyMalloc arenas, object pools, reference counting, and optimization techniques like __slots__ and generators.

NVIDIA Unified Virtual Memory/concepts/gpu/unified-memory

NVIDIA Unified Virtual Memory (UVM): on-demand page migration, memory oversubscription, and simplified CPU-GPU memory management.

Filesystem Journaling: How Write-Ahead Logging Prevents Data Loss/concepts/linux/filesystem-journaling

Learn how filesystem journaling prevents data loss during crashes. Explore write-ahead logging and recovery in ext4 and XFS.

Inodes: The Hidden Metadata That Powers Every File/concepts/linux/inodes

Understand Linux inodes - the metadata structures behind every file. Learn about hard links, soft links, and inode limits.

Global Interpreter Lock (GIL)/concepts/python/global-interpreter-lock

Understand the CPython Global Interpreter Lock (GIL): thread switching, CPU vs I/O workloads, multiprocessing workarounds, and the PEP 703 no-GIL future.

Page Migration & Fault Handling/concepts/gpu/page-migration

CUDA page migration and fault handling between CPU and GPU memory. Learn TLB management, DMA transfers, and memory optimization.

Copy-on-Write (CoW): Never Overwrite, Always Preserve/concepts/linux/copy-on-write

Understand Copy-on-Write (CoW) in Btrfs and ZFS. Learn how CoW enables instant snapshots, atomic writes, and data integrity.

FUSE: Filesystem in Userspace Explained/concepts/linux/fuse-filesystem

Learn FUSE (Filesystem in Userspace) for building custom filesystems. Understand how NTFS-3G, SSHFS, and cloud storage work.

Python Object Model/concepts/python/object-model

Learn how CPython implements PyObject, type objects, and the unified object model. Explore reference counting, memory layout, and Python internals.

ext4: The Linux Workhorse Filesystem/concepts/linux/ext4-filesystem

Explore ext4, the default Linux filesystem with journaling, extents, and proven reliability. Learn how ext4 protects your data.

Filesystem Snapshots: Time Travel for Your Data/concepts/linux/filesystem-snapshots

How modern filesystems create instant snapshots. Explore Btrfs/ZFS snapshot mechanics, rollback operations, and backup strategies interactively.

Python Garbage Collection/concepts/python/garbage-collection

Understand CPython garbage collection: reference counting, generational GC for circular references, weak references, and gc module tuning strategies.

CPU Pipeline Architecture/concepts/computer-architecture/cpu-pipeline-detailed

Deep dive into CPU pipeline architecture covering 5-stage RISC pipelines, data hazards, control hazards, superscalar execution, and out-of-order processing.

Mount Options: Fine-Tuning Filesystem Behavior and Performance/concepts/linux/mount-options

Master Linux mount options like noatime and async for performance tuning and security hardening. Interactive guide to fstab configuration.

NTFS Filesystem: The Master File Table/concepts/linux/ntfs-filesystem

Understand how NTFS organizes files through the Master File Table (MFT), including the key distinction between resident and non-resident file storage.

Python Optimization Techniques/concepts/python/python-optimization

Python performance optimization guide: CPython peephole optimizer, lru_cache, profiling with cProfile, and Python 3.11+ adaptive bytecode specialization.

Contrastive Learning/concepts/embeddings/contrastive-learning

Master contrastive learning for vector embeddings: how InfoNCE loss and self-supervised techniques train models to create high-quality semantic representations.

Btrfs: Modern Copy-on-Write Filesystem/concepts/linux/btrfs-filesystem

Learn Btrfs with built-in snapshots, RAID, and compression. Explore copy-on-write, subvolumes, and self-healing on Linux.

Filesystem Data Integrity: Detecting Silent Corruption/concepts/linux/filesystem-integrity

Understand how modern filesystems use checksums to detect silent data corruption that traditional filesystems miss entirely.

__slots__ Optimization/concepts/python/slots-optimization

Master Python __slots__ for 40-50% memory reduction and faster attribute access. Learn CPython descriptor protocol, inheritance patterns, and best practices.

Cross-Lingual Alignment/concepts/embeddings/cross-lingual-alignment

Learn cross-lingual embedding alignment techniques like VecMap and MUSE for multilingual vector retrieval and zero-shot language transfer in search systems.

NVIDIA Device Files in /dev//concepts/gpu/nvidia-device-files

Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.

ZFS: The Ultimate Filesystem/concepts/linux/zfs-filesystem

Master ZFS filesystem with pooled storage, RAID-Z, snapshots, and checksums. Learn enterprise-grade data integrity on Linux.

Green Threads vs OS Threads: Understanding Concurrency Models/concepts/python/green-threads-vs-os-threads

Compare Python green threads vs OS threads. Learn asyncio coroutines, gevent, context switching costs, and when to use each concurrency model.

Domain Adaptation/concepts/embeddings/domain-adaptation

Domain adaptation for embeddings: transfer learning to fine-tune retrieval models across domains while preventing catastrophic forgetting.

XFS: High-Performance Parallel Filesystem/concepts/linux/xfs-filesystem

XFS filesystem internals: allocation groups, extent-based allocation, and delayed allocation for high-performance parallel I/O.

Python asyncio: Mastering Asynchronous Programming/concepts/python/asyncio-event-loop

Deep dive into Python's asyncio library, understanding event loops, coroutines, tasks, and async/await patterns with interactive visualizations.

Binary Embeddings/concepts/embeddings/binary-embeddings

Learn how binary embeddings use 1-bit quantization for ultra-compact vector representations, enabling billion-scale similarity search with 32x memory reduction.

FAT32 & exFAT: Universal Filesystems/concepts/linux/fat-filesystems

Learn FAT32 and exFAT filesystems for cross-platform USB drives and SD cards. Understand file size limits and compatibility.

Python Shared Memory/concepts/python/shared-memory

Master Python multiprocessing.shared_memory for zero-copy IPC. Learn synchronization, NumPy integration, and race condition prevention patterns.

Hybrid Retrieval Systems/concepts/embeddings/hybrid-retrieval-systems

Build hybrid retrieval systems combining BM25 sparse search with dense vector embeddings using reciprocal rank fusion for superior semantic search performance.

RAID: Redundant Arrays for Speed and Safety/concepts/linux/raid-storage

RAID storage visualized: RAID 0, 1, 5, 6, and 10 levels explained. Learn how they work, when to use them, and disk failure recovery.

Memory Controllers: The Brain Behind RAM Management/concepts/memory/memory-controllers

Learn how memory controllers manage CPU-RAM data flow. Interactive demos of channels, ranks, banks, and command scheduling for optimal bandwidth.

BM25 Algorithm/concepts/embeddings/bm25-algorithm

Master the BM25 algorithm, the probabilistic ranking function powering Elasticsearch and Lucene for keyword-based document retrieval and search systems.

Linux Process Management: Fork, Exec, and Beyond/concepts/linux/process-management

Master Linux process management through interactive visualizations. Understand process lifecycle, fork/exec operations, zombies, orphans, and CPU scheduling.

Distributed Parallelism in Deep Learning/concepts/gpu/distributed-parallelism

GPU distributed parallelism: Data Parallel (DDP), Tensor Parallel, Pipeline Parallel, and ZeRO optimization for training large AI models.

Linux Memory Management: Virtual Memory, Paging, and Beyond/concepts/linux/memory-management

Explore Linux memory management through interactive visualizations. Understand virtual memory, page tables, TLB, swapping, and memory allocation.

Linux System Calls: The User-Kernel Interface/concepts/linux/system-calls

Linux system calls visualized: how user programs communicate with the kernel, protection rings, context switching, and syscall performance.

Linux Networking Stack: From Packets to Applications/concepts/linux/networking-stack

Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.

Linux Boot Process: From Power-On to Login/concepts/linux/boot-process

Visualize the complete Linux boot sequence from BIOS/UEFI to login. Learn how GRUB, kernel, and systemd work together with interactive visualizations.

Linux Init Systems: From SysV to systemd/concepts/linux/init-systems

Compare Linux init systems through interactive visualizations. Understand the evolution from SysV Init to systemd, service management, and boot orchestration.

Linux Kernel Modules: Extending the Kernel at Runtime/concepts/linux/kernel-modules

Master Linux kernel modules through interactive visualizations. Learn how to load, unload, develop, and debug kernel modules that extend Linux functionality.

Linux Namespaces: The Foundation of Container Isolation/concepts/linux/namespaces

Master Linux namespaces for container isolation. Learn PID, network, mount, and user namespaces with interactive demos.

Wayland vs X11: Modern Display Server Architecture/concepts/linux/wayland-x11

Compare Wayland vs X11 display servers on Linux. Learn about architecture, performance, security, and modern graphics stack.

Linux cgroups: Resource Limits for Processes/concepts/linux/cgroups

Master cgroups to limit CPU, memory, and I/O for process groups. Understand cgroups v1 vs v2, the hierarchical structure, and how containers use them.

Containers Under the Hood: From Primitives to Docker/concepts/linux/containers

Discover how containers work by combining namespaces, cgroups, and OverlayFS. Build a mental model of Docker internals through interactive visualizations.

Understanding nvidia-modeset: Kernel Mode-Setting for NVIDIA GPUs/concepts/linux/nvidia-modeset

Learn nvidia-modeset for display configuration on Linux. Understand kernel mode-setting, DRM integration, and GPU drivers.

CUDA Multi-Process Service (MPS)/concepts/gpu/cuda-mps

Learn CUDA Multi-Process Service (MPS) for GPU sharing. Enable concurrent kernel execution from multiple processes and maximize GPU utilization.

Understanding TCP/IP Protocol Stack/concepts/networking/tcp-ip

Explore the TCP/IP protocol stack, packet encapsulation, and how data travels through network layers from application to physical transmission.

Flynn's Classification: Taxonomy of Computer Architectures/concepts/computer-architecture/flynns-classification

Explore Flynn's Classification of computer architectures through interactive visualizations of SISD, SIMD, MISD, and MIMD systems.

CPU Pipelines & Branch Prediction: Modern Processor Architecture/concepts/computer-architecture/cpu-pipelines

Explore CPU pipeline stages, instruction-level parallelism, pipeline hazards, and branch prediction through interactive visualizations.

Hazard Detection: Pipeline Dependencies and Solutions/concepts/computer-architecture/hazard-detection

Master pipeline hazards through interactive visualizations of data dependencies, control hazards, structural conflicts, and advanced detection mechanisms.

Thread Safety: Concurrent Programming Fundamentals/concepts/cpp/thread-safety

Master thread safety concepts through interactive visualizations of race conditions, mutexes, atomic operations, and deadlock scenarios.

Convolution Operation: The Foundation of CNNs/concepts/deep-learning/convolution-operation

Interactive guide to convolution in CNNs: visualize sliding windows, kernels, stride, padding, and feature detection with step-by-step demos.

Cross-Entropy Loss/concepts/deep-learning/cross-entropy-loss

Understand cross-entropy loss for classification: interactive demos of binary and multi-class CE, the -log(p) curve, softmax gradients, and focal loss.

Dilated Convolutions: Expanding Receptive Fields Efficiently/concepts/deep-learning/dilated-convolutions

Understand dilated (atrous) convolutions: how dilation rates expand receptive fields exponentially without extra parameters and how to avoid gridding artifacts.

Feature Pyramid Networks/concepts/deep-learning/feature-pyramid-networks

Learn how Feature Pyramid Networks build multi-scale feature representations through top-down pathways and lateral connections for robust object detection.

Receptive Field/concepts/deep-learning/receptive-field

Understand receptive fields in CNNs — how convolutional layers expand their field of view, the gap between theoretical and effective receptive fields, and strategies for controlling RF growth.

VAE Latent Space: Understanding Variational Autoencoders/concepts/deep-learning/vae-latent-space

Explore VAE latent space in deep learning. Learn variational autoencoder encoding, decoding, interpolation, and the reparameterization trick.

Virtual Memory & TLB: Complete Guide to Address Translation/concepts/memory/virtual-memory

Master virtual memory and TLB address translation with interactive demos. Learn page tables, page faults, and memory management optimization.

CPU Cache Lines: The Unit of Memory Transfer/concepts/memory/cpu-cache-lines

Learn how CPU cache lines transfer data between memory and cache. Understand spatial locality and optimize memory access patterns for better performance.

Memory Access Patterns: Sequential vs Strided/concepts/memory/memory-access-patterns

Master sequential vs strided memory access patterns. Learn how cache efficiency and hardware prefetching affect application performance.

Memory Interleaving: Parallel Memory Access/concepts/memory/memory-interleaving

Discover how memory interleaving distributes addresses across banks for parallel access. Boost memory bandwidth in DDR5 and GPU systems.

NUMA Architecture: Non-Uniform Memory Access/concepts/memory/numa-architecture

Explore NUMA architecture and memory locality in multi-socket systems. Understand local vs remote memory access latency and optimization strategies.

Understanding NVIDIA Kubernetes GPU Operator/concepts/gpu/kubernetes-operator

Automate NVIDIA GPU management in Kubernetes with the GPU Operator. Deploy drivers, device plugins, and monitoring as DaemonSets.

Understanding CUDA Contexts/concepts/gpu/cuda-context

Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.

CLS Token in Vision Transformers/concepts/attention/cls-token

Learn how the CLS token acts as a global information aggregator in Vision Transformers, enabling whole-image classification through attention mechanisms.

Hierarchical Attention in Vision Transformers/concepts/attention/hierarchical-attention

Explore how hierarchical attention enables Vision Transformers (ViT) to build multi-scale representations by attending within local windows and merging patches across stages.

Multi-Head Attention in Vision Transformers/concepts/attention/multihead-attention

Explore how multi-head attention enables Vision Transformers (ViT) to attend to several representation subspaces in parallel.

Positional Embeddings in Vision Transformers/concepts/attention/positional-embeddings-vit

Explore how positional embeddings enable Vision Transformers (ViT) to process sequential data by encoding relative positions.

Interactive Look: Self-Attention in Vision Transformers/concepts/attention/self-attention-vit

Explore how self-attention enables Vision Transformers (ViT) to understand images by capturing global context, with CNN comparison.

Transparent Huge Pages (THP): Reducing TLB Pressure/concepts/memory/transparent-huge-pages

Learn how Transparent Huge Pages (THP) reduces TLB misses by promoting 4KB to 2MB pages. Understand performance benefits and memory bloat tradeoffs.

ALiBi: Attention with Linear Biases/concepts/attention/alibi

Learn ALiBi, the position encoding method that adds linear biases to attention scores for exceptional length extrapolation in transformers.

MHA vs GQA vs MQA: Choosing the Right Attention/concepts/attention/attention-comparison

Compare Multi-Head, Grouped-Query, and Multi-Query Attention mechanisms to understand their trade-offs and choose the optimal approach for your use case.

Attention Sinks: Stable Streaming LLMs/concepts/attention/attention-sinks

Learn about attention sinks, where LLMs concentrate attention on initial tokens, and how preserving them enables streaming inference.

Cross-Attention: Bridging Different Modalities/concepts/attention/cross-attention

Understand cross-attention, the mechanism that enables transformers to align and fuse information from different sources, sequences, or modalities.

Grouped-Query Attention (GQA)/concepts/attention/grouped-query-attention

Learn how Grouped-Query Attention (GQA) balances Multi-Head quality with Multi-Query efficiency for faster LLM inference.

Linear Attention Approximations/concepts/attention/linear-attention-approximations

Explore linear complexity attention mechanisms including Performer, Linformer, and other efficient transformers that scale to very long sequences.

Masked and Causal Attention/concepts/attention/masked-attention

Learn how masked attention enables autoregressive generation and prevents information leakage in transformers and language models.

Multi-Query Attention (MQA)/concepts/attention/multi-query-attention

Learn Multi-Query Attention (MQA), the optimization that shares keys and values across attention heads for massive memory savings.

Rotary Position Embeddings (RoPE)/concepts/attention/rotary-position-embeddings

Learn Rotary Position Embeddings (RoPE), the elegant position encoding using rotation matrices, powering LLaMA, Mistral, and modern LLMs.

Scaled Dot-Product Attention/concepts/attention/scaled-dot-product

Master scaled dot-product attention, the fundamental transformer building block. Learn why scaling is crucial for stable training.

Sliding Window Attention/concepts/attention/sliding-window-attention

Sliding Window Attention for long sequences: local context windows of fixed size make attention cost linear in sequence length, O(n), as used in Mistral and Longformer models.

Sparse Attention Patterns/concepts/attention/sparse-attention-patterns

Explore sparse attention mechanisms that reduce quadratic complexity to linear or sub-quadratic, enabling efficient processing of long sequences.

SoA vs AoS: Data Layout Optimization/concepts/computer-architecture/soa-vs-aos

Master Structure of Arrays (SoA) vs Array of Structures (AoS) data layouts for optimal cache efficiency, SIMD vectorization, and GPU memory coalescing.

Contrastive Loss/concepts/deep-learning/contrastive-loss

Understand contrastive loss for representation learning: interactive demos of InfoNCE, triplet loss, and embedding space clustering with temperature tuning.

Dropout Regularization/concepts/deep-learning/dropout

Understand dropout regularization: how randomly silencing neurons prevents overfitting, the inverted dropout trick, and when to use each dropout variant.

Focal Loss: Focusing on Hard Examples/concepts/deep-learning/focal-loss

Learn focal loss for deep learning: down-weight easy examples, focus on hard ones. Interactive demos of gamma, alpha balancing, and RetinaNet.

He/Kaiming Initialization/concepts/deep-learning/he-initialization

Learn He (Kaiming) initialization for ReLU neural networks: understand why ReLU needs special weight initialization, visualize variance flow, and see dead neurons in action.

KL Divergence/concepts/deep-learning/kl-divergence

Learn KL divergence for machine learning: measure distribution differences in VAEs, knowledge distillation, and variational inference with interactive visualizations.

MSE and MAE Loss Functions/concepts/deep-learning/mse-mae

Interactive guide to MSE vs MAE for regression: explore outlier sensitivity, gradient behavior, and Huber loss with visualizations.

Xavier/Glorot Initialization/concepts/deep-learning/xavier-initialization

Learn Xavier (Glorot) initialization: how it balances forward signals and backward gradients to enable stable deep network training with tanh and sigmoid.

Understanding NVIDIA Persistence Daemon/concepts/gpu/nvidia-persistence-daemon

Eliminating GPU initialization latency through nvidia-persistenced - a userspace daemon that maintains GPU driver state for optimal startup performance.

ANN Algorithms Comparison/concepts/embeddings/ann-comparison

Compare approximate nearest neighbor algorithms side by side: HNSW, IVF-PQ, LSH, Annoy, and ScaNN. Find the best approach for your use case.

HNSW: Hierarchical Navigable Small World/concepts/embeddings/hnsw-search

Interactive visualization of HNSW - the graph-based algorithm that powers modern vector search with logarithmic complexity.

Vector Index Structures/concepts/embeddings/index-structures

Explore the fundamental data structures powering vector databases: trees, graphs, hash tables, and hybrid approaches for efficient similarity search.

IVF-PQ: Inverted File with Product Quantization/concepts/embeddings/ivf-pq

Learn how IVF-PQ combines clustering and compression to enable billion-scale vector search with minimal memory footprint.

LSH: Locality Sensitive Hashing/concepts/embeddings/lsh-search

Explore how LSH uses probabilistic hash functions to find similar vectors in sub-linear time, perfect for streaming and high-dimensional data.

Vector Quantization Techniques/concepts/embeddings/vector-quantization

Master vector compression techniques from scalar to product quantization. Learn how to reduce memory usage by 10-100× while preserving search quality.

Long Polling: The Patient Connection/concepts/networking/long-polling

Learn HTTP long polling - a server-side technique that holds connections open until data arrives. Achieve near real-time updates with standard protocols.

Short Polling: The Impatient Client Pattern/concepts/networking/short-polling

Learn short polling in networking - a simple HTTP pattern for periodic data fetching. See why 70-90% of requests waste bandwidth and when to use alternatives.

WebSockets: Real-Time Bidirectional Communication/concepts/networking/websocket

Master WebSocket protocol for real-time bidirectional communication over TCP. Learn handshakes, frames, and building low-latency web applications.

Adaptive Tiling: Efficient Visual Token Generation/concepts/deep-learning/adaptive-tiling

Learn adaptive tiling in vision transformers: dynamically partition images based on visual complexity to reduce token counts by up to 80% while preserving detail where it matters.

Emergent Abilities in Large Language Models/concepts/deep-learning/emergent-abilities

Explore emergent abilities in large language models: sudden capabilities that appear at scale thresholds, phase transitions, and the mirage debate, with interactive visualizations.

Prompt Engineering/concepts/deep-learning/prompt-engineering

Master prompt engineering for large language models: from basic composition to Chain-of-Thought, few-shot, and advanced techniques with interactive visualizations.

Prompt Influence Flow: How Instructions Propagate Through Model Layers/concepts/deep-learning/prompt-influence-flow

Deep dive into how different prompt components influence model behavior across transformer layers, from surface patterns to abstract reasoning.

Neural Scaling Laws/concepts/deep-learning/scaling-laws

Explore neural scaling laws in deep learning: power law relationships between model size, data, and compute that predict AI performance, with interactive visualizations.

Visual Complexity Analysis: Smart Image Processing/concepts/deep-learning/visual-complexity-analysis

Learn visual complexity analysis in deep learning - how neural networks measure entropy, edges, and saliency for adaptive image processing.

Cross-Encoder vs Bi-Encoder/concepts/embeddings/cross-encoder-vs-bi-encoder

Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.

Dense Embeddings Space Explorer/concepts/embeddings/dense-embeddings

Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.

Matryoshka Embeddings/concepts/embeddings/matryoshka-embeddings

Matryoshka embeddings: nested representations enabling dimension reduction by simple truncation without model retraining for flexible retrieval.

Multi-Vector Late Interaction/concepts/embeddings/multi-vector-late-interaction

Explore ColBERT and other multi-vector retrieval models that use fine-grained token-level matching for superior search quality.

Quantization Effects Simulator/concepts/embeddings/quantization-effects

Embedding quantization simulator: explore memory-accuracy trade-offs from float32 to int8 and binary representations for retrieval.

Sparse vs Dense Embeddings/concepts/embeddings/sparse-vs-dense

Compare lexical (BM25/TF-IDF) and semantic (BERT) retrieval approaches, understanding their trade-offs and hybrid strategies.

Context Windows: The Memory Limits of LLMs/concepts/llms/context-windows

Interactive visualization of LLM context windows - sliding windows, expanding contexts, and attention patterns that define model memory limits.

Flash Attention: IO-Aware Exact Attention/concepts/llms/flash-attention

Interactive Flash Attention visualization - the IO-aware algorithm achieving memory-efficient exact attention through tiling and kernel fusion.

KV Cache: The Secret to Fast LLM Inference/concepts/llms/kv-cache

Interactive KV cache visualization - how key-value caching in LLM transformers enables fast text generation without quadratic recomputation.

Tokenization: Converting Text to Numbers/concepts/llms/tokenization

Interactive exploration of tokenization methods in LLMs - BPE, SentencePiece, and WordPiece. Understand how text becomes tokens that models can process.

The Vision-Language Alignment Problem/concepts/multimodal/alignment-problem

How vision-language models align visual and text representations using contrastive learning, cross-modal attention, and CLIP-style training.

The Modality Gap/concepts/multimodal/modality-gap

The modality gap in CLIP and vision-language models: why image and text embeddings occupy separate regions despite contrastive training.

Multimodal Scaling Laws/concepts/multimodal/scaling-laws

Discover how multimodal vision-language models like CLIP, ALIGN, and LLaVA scale with data, parameters, and compute following Chinchilla-style power laws.

Vision-Language Adapters: Parameter-Efficient Multimodal Fine-tuning/concepts/multimodal/vision-language-adapters

Master LoRA, bottleneck adapters, and prefix tuning for parameter-efficient fine-tuning of vision-language models like LLaVA with minimal compute and memory.

Client-Server Communication: Polling vs WebSockets/concepts/networking/client-server-communication

Learn client-server communication patterns including short polling, long polling, and WebSockets. Compare HTTP protocols for real-time web applications.

C++ AST & Parsing/concepts/cpp/ast-parsing

Explore how C++ code is parsed into an Abstract Syntax Tree (AST). Learn lexical analysis, tokenization, and syntax parsing for systems programming.

C++ Compilation Overview/concepts/cpp/compilation

Understand the complete C++ compilation pipeline from source code to object files. Learn preprocessing, parsing, code generation, and optimization stages.

C++ Dynamic Linking/concepts/cpp/dynamic-linking

Master C++ dynamic linking and runtime library loading. Learn shared libraries, position-independent code, dlopen, and systems-level library management.

C++ Linking Overview/concepts/cpp/linking

How C++ object files are linked into executables. Learn symbol resolution, static vs dynamic linking, and linker optimization.

C++ Program Loading/concepts/cpp/loading

Understand how C++ programs are loaded and executed by the operating system. Learn ELF format, process creation, memory mapping, and runtime initialization.

Memory Management & RAII in C++/concepts/cpp/memory-raii

Learn Resource Acquisition Is Initialization (RAII) - the cornerstone of C++ memory management. Understand automatic resource cleanup and exception safety.

Modern C++ Features (C++11 and Beyond)/concepts/cpp/modern-cpp-features

Explore modern C++ features including auto, lambdas, ranges, and coroutines. Learn how C++11/14/17/20 transformed the language.

Object-Oriented Programming in C++/concepts/cpp/oop-inheritance

Master C++ OOP concepts including inheritance, polymorphism, virtual functions, and modern object-oriented design principles with interactive examples.

C++ Compiler Optimization/concepts/cpp/optimization

C++ compiler optimization: loop unrolling, inlining, dead code elimination. Learn GCC and Clang optimization flags and techniques.

Pointers & References in C++/concepts/cpp/pointers-references

Master C++ pointers and references through interactive visualizations. Learn memory addressing, dereferencing, and smart pointers, and how to avoid common pitfalls.

C++ Preprocessor/concepts/cpp/preprocessor

C++ preprocessor visualized: macros, header guards, conditional compilation, and #include directives explained interactively.

Smart Pointers in Modern C++/concepts/cpp/smart-pointers

Master C++11 smart pointers through interactive examples. Learn unique_ptr, shared_ptr, and weak_ptr with reference counting visualizations.

C++ Stack vs Heap/concepts/cpp/stack-heap

C++ stack vs heap memory allocation visualized. Learn LIFO stack frames, dynamic heap allocation, and memory management patterns.

C++ Symbol Resolution/concepts/cpp/symbol-resolution

C++ symbol resolution explained: how linkers fix undefined references, name mangling, weak vs strong symbols, and common linking errors.

Templates & STL in C++/concepts/cpp/templates-stl

Master C++ templates and the Standard Template Library. Learn generic programming, template metaprogramming, and STL containers and algorithms.

Gradient Flow in Deep Networks/concepts/deep-learning/gradient-flow

Learn how gradients propagate through deep neural networks during backpropagation. Understand vanishing and exploding gradient problems with interactive visualizations.

NCCL: High-Performance Multi-GPU Communication/concepts/gpu/nccl-communication

Master NVIDIA NCCL for multi-GPU deep learning. Learn AllReduce, ring algorithms, and GPUDirect communication for efficient distributed training on CUDA.

DataParallel vs DistributedDataParallel/concepts/pytorch/data-parallel

Compare PyTorch DataParallel vs DistributedDataParallel for multi-GPU training. Learn GIL limitations, NCCL AllReduce, and DDP best practices.

PyTorch DataLoader Pipeline/concepts/pytorch/dataloader-pipeline

Understanding how PyTorch DataLoader moves data from disk through CPU to GPU, including Dataset, Sampler, Workers, and Collate components.

Understanding num_workers/concepts/pytorch/num-workers

Deep dive into PyTorch DataLoader num_workers parameter: how parallel workers prefetch data, optimal configuration, and common pitfalls.

Pinned Memory and DMA Transfers/concepts/pytorch/pin-memory

Understanding PyTorch pin_memory for faster CPU to GPU data transfers using DMA (Direct Memory Access) and page-locked memory.

ASFF: Adaptive Spatial Feature Fusion/concepts/computer-vision/asff

Learning where to fuse multi-scale features with per-pixel, per-level fusion weights. ASFF challenges FPN's uniform fusion assumption.

RoI Pooling, RoI Align & Deformable RoI Pooling/concepts/computer-vision/roi-pooling

Understanding region-based feature extraction for object detection, from quantized pooling to sub-pixel alignment and adaptive sampling

Anchor-Based vs Anchor-Free Object Detection/concepts/computer-vision/anchor-based-vs-anchor-free

Compare anchor-based vs anchor-free object detection: Faster R-CNN and RetinaNet anchors vs FCOS and CenterNet point-based methods.

NAS-FPN: Learning to Design Feature Pyramid Networks/concepts/computer-vision/nas-fpn

Understanding how neural architecture search discovers optimal feature pyramid architectures that outperform hand-designed alternatives

Modern Object Detection: DETR and Transformer-Based Approaches/concepts/computer-vision/modern-object-detection

Understanding end-to-end object detection with transformers, from DETR's object queries to bipartite matching and attention-based localization

NMS & Soft-NMS: Removing Duplicate Detections/concepts/computer-vision/nms-soft-nms

Understanding Non-Maximum Suppression algorithms for object detection post-processing, from greedy NMS to soft variants

NAdam: Nesterov-Accelerated Adam/concepts/deep-learning/nadam

Understand the NAdam optimizer that fuses Adam adaptive learning rates with Nesterov look-ahead momentum for faster, smoother convergence in deep learning.

Visual Complexity Analysis for Token Allocation/concepts/computer-vision/visual-complexity-analysis

Learn how visual complexity analysis optimizes vision transformer token allocation using edge detection, FFT, and entropy metrics.

Tensor Cores: Accelerating Deep Learning/concepts/gpu/tensor-cores

NVIDIA Tensor Cores explained: mixed-precision matrix operations delivering up to 10x speedups for AI training and inference on CUDA GPUs.

Layer Normalization/concepts/deep-learning/layer-normalization

Learn layer normalization for transformers and sequence models: how normalizing across features enables batch-independent training with interactive visualizations.

Internal Covariate Shift/concepts/deep-learning/internal-covariate-shift

Understand internal covariate shift in deep learning: why layer input distributions change during training, how it slows convergence, and how batch normalization fixes it.

Batch Normalization/concepts/deep-learning/batch-normalization

Learn batch normalization in deep learning: how normalizing layer inputs accelerates training, improves gradient flow, and acts as regularization with interactive visualizations.

Skip Connections/concepts/deep-learning/skip-connections

Learn how skip connections and residual learning enable training of very deep neural networks. Understand the ResNet revolution with interactive visualizations.

CPU Performance & Optimization/concepts/computer-architecture/cpu-optimization

CPU performance optimization: memory hierarchy, cache blocking, SIMD vectorization, and profiling tools for modern processors.

C++ Virtual Tables & Inheritance/concepts/cpp/virtual-tables-inheritance

C++ virtual tables (vtables) explained. Learn virtual dispatch, single/multiple inheritance, RTTI, and object memory layout visually.

Graph Attention Networks (GAT)/concepts/graph/graph-attention-networks

Adaptive attention-based aggregation for graph neural networks - multi-head attention, learned weights, and interpretable graph learning

Graph Centrality & Metrics/concepts/graph/graph-centrality

Understanding node importance through centrality measures, shortest paths, hop distances, clustering coefficients, and fundamental graph metrics

Graph Convolutional Networks (GCN)/concepts/graph/graph-convolutional-networks

Learn Graph Convolutional Networks (GCN) with spectral theory, message passing, and node classification for geometric deep learning.

Graph Embeddings/concepts/graph/graph-embeddings

Learning low-dimensional vector representations of graphs through random walks, DeepWalk, Node2Vec, and skip-gram models

Graph Pooling Methods/concepts/graph/graph-pooling

Hierarchical graph coarsening techniques - TopK, SAGPool, DiffPool, and readout operations for graph-level representations

Mixture of Experts (MoE)/concepts/llms/mixture-of-experts

Understanding sparse mixture of experts models - architecture, routing mechanisms, load balancing, and efficient scaling strategies for large language models

GPU Streaming Multiprocessor (SM)/concepts/gpu/shared-multiprocessor

Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy

Uses/uses

Tools, software, and hardware I use

Resume/resume

My professional experience and qualifications

Bookmarks/bookmarks

A curated collection of articles and resources I find valuable

Consulting/consulting

Services and consulting offerings

Thank You/thank-you

Confirmation page after form submissions

Sitemap/sitemap

Visual representation of the site structure