Quantization Effects Simulator
Embedding quantization simulator: explore the memory-accuracy trade-offs of moving retrieval embeddings from float32 to int8 and binary representations.
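As a rough illustration of what the simulator explores, here is a minimal NumPy sketch (random toy vectors, an illustrative corpus size and scaling scheme, not the simulator's actual code) comparing float32 cosine search against int8 scalar quantization and binary sign quantization:

```python
# Toy sketch of embedding quantization for retrieval (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 384)).astype(np.float32)   # pretend corpus embeddings
query = rng.normal(size=(384,)).astype(np.float32)

def top_k(scores, k=10):
    return set(np.argsort(-scores)[:k])

# float32 baseline: cosine similarity on L2-normalized vectors
docs_n = (docs / np.linalg.norm(docs, axis=1, keepdims=True)).astype(np.float32)
q_n = (query / np.linalg.norm(query)).astype(np.float32)
base = top_k(docs_n @ q_n)

# int8: symmetric scalar quantization with one shared scale (a simple scheme)
scale = np.abs(docs_n).max() / 127.0
docs_i8 = np.clip(np.round(docs_n / scale), -127, 127).astype(np.int8)
q_i8 = np.clip(np.round(q_n / scale), -127, 127).astype(np.int8)
int8_hits = top_k(docs_i8.astype(np.int32) @ q_i8.astype(np.int32))

# binary: keep only the sign bit, rank by (negative) Hamming distance
docs_bin = docs_n > 0
q_bin = q_n > 0
bin_hits = top_k(-(docs_bin != q_bin).sum(axis=1))

print("bytes/vector  float32:", docs_n[0].nbytes,
      " int8:", docs_i8[0].nbytes,
      " binary:", docs_bin.shape[1] // 8)
print("top-10 overlap  int8:", len(base & int8_hits), "/10",
      " binary:", len(base & bin_hits), "/10")
```

Per vector, int8 is a 4x storage reduction and binary a 32x reduction relative to float32; how much of the original top-k ranking survives is the accuracy side of the trade-off the simulator visualizes.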
The entries below describe companion interactive visualizations covering related retrieval and attention concepts.

Compare lexical (BM25/TF-IDF) and semantic (BERT) retrieval approaches, understand their trade-offs, and see how hybrid strategies combine the two.
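One common hybrid strategy is a weighted blend of normalized lexical and semantic scores; the sketch below uses made-up scores and an arbitrary alpha weight purely for illustration:

```python
# Toy sketch of a hybrid retrieval strategy: blend lexical (BM25) and
# semantic (embedding cosine) scores with a weight alpha.
import numpy as np

def minmax(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

bm25_scores   = np.array([12.1, 3.4, 0.0, 7.8])     # pretend lexical scores per doc
cosine_scores = np.array([0.21, 0.63, 0.55, 0.30])  # pretend semantic scores per doc

alpha = 0.5  # 1.0 = purely lexical, 0.0 = purely semantic
hybrid = alpha * minmax(bm25_scores) + (1 - alpha) * minmax(cosine_scores)
print("ranking (best first):", np.argsort(-hybrid))
```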
Interactive visualization of LLM context windows: sliding windows, expanding contexts, and the attention patterns that define a model's memory limits.
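For example, a sliding-window causal mask restricts each token to a fixed number of recent positions; a minimal sketch (window size chosen arbitrarily):

```python
# Toy sketch of a sliding-window causal attention mask (illustrative):
# token i may attend to tokens in [i - window + 1, i], never to the future.
import numpy as np

def sliding_window_mask(seq_len, window):
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
```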
Interactive Flash Attention visualization: the IO-aware algorithm that achieves memory-efficient exact attention through tiling and kernel fusion.
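A toy NumPy sketch of the tiling idea, keeping running softmax statistics per key block so the full attention matrix is never materialized (plain CPU code for clarity, not the fused GPU kernel):

```python
# Toy sketch of tiled attention with an online softmax for one query vector.
import numpy as np

def attention_tiled(q, K, V, block=64):
    d = q.shape[-1]
    m = -np.inf                 # running max of scores (numerical stability)
    l = 0.0                     # running softmax denominator
    acc = np.zeros(V.shape[1])  # running weighted sum of values
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Kb @ q / np.sqrt(d)           # scores for this tile only
        m_new = max(m, s.max())
        p = np.exp(s - m_new)
        correction = np.exp(m - m_new) if np.isfinite(m) else 0.0
        l = l * correction + p.sum()
        acc = acc * correction + p @ Vb
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
K = rng.normal(size=(256, 64)); V = rng.normal(size=(256, 64)); q = rng.normal(size=(64,))
s = K @ q / np.sqrt(64)
p = np.exp(s - s.max()); ref = (p / p.sum()) @ V
print(np.allclose(attention_tiled(q, K, V), ref))  # exact result, computed tile by tile
```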
Interactive KV cache visualization: how key-value caching in transformer LLMs enables fast text generation without quadratic recomputation of the prefix at every step.
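A toy single-head sketch of the caching pattern, using random weights and token embeddings; real models cache keys and values per layer and per head, but the update is the same idea:

```python
# Toy sketch of KV caching in autoregressive decoding (illustrative only):
# each step computes K/V for the newest token and appends them to a cache,
# so attention at step t touches t cached entries instead of recomputing all.
import numpy as np

rng = np.random.default_rng(0)
d = 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x_t):
    """x_t: embedding of the newest token, shape (d,)."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)   # only the new token's key/value are computed
    v_cache.append(x_t @ Wv)
    K = np.stack(k_cache); V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max()); weights /= weights.sum()
    return weights @ V          # attention output for the new token

for t in range(5):
    out = decode_step(rng.normal(size=(d,)))
print("cached keys:", len(k_cache), "output shape:", out.shape)
```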
Interactive exploration of tokenization methods in LLMs: BPE, SentencePiece, and WordPiece. Understand how text becomes the tokens a model can process.
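A minimal sketch of how BPE learns merges, run on a tiny made-up word list (real tokenizers add end-of-word markers, byte-level fallback, and large frequency-weighted corpora, all omitted here):

```python
# Toy sketch of byte-pair encoding (BPE) training: repeatedly merge the
# most frequent adjacent symbol pair in the corpus.
from collections import Counter

def bpe_merges(words, num_merges=10):
    corpus = Counter(tuple(w) for w in words)  # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        merges.append(best)
        new_corpus = Counter()
        for word, freq in corpus.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1]); i += 2
                else:
                    merged.append(word[i]); i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_merges(["low", "lower", "lowest", "newest", "widest"], num_merges=5))
```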