Context Windows: The Memory Limits of LLMs
Interactive visualization of LLM context windows - sliding windows, expanding contexts, and attention patterns that define model memory limits.
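As a minimal sketch of the sliding-window idea, the NumPy snippet below builds the boolean attention mask for a fixed window; the function name and demo sizes are illustrative assumptions, not taken from the visualization itself. Each token attends only to itself and the most recent tokens, which is what bounds per-token memory.

```python
# Hypothetical sketch of a sliding-window attention mask.
# `window` is the assumed window size; real models combine this
# with other patterns (e.g. global tokens).
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry [i, j] is True where query i may attend to key j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i               # never attend to future tokens
    in_window = (i - j) < window  # keep only the most recent `window` keys
    return causal & in_window

# Each row attends to at most `window` positions, so per-token cost
# stays constant as the sequence grows.
print(sliding_window_mask(6, 3).astype(int))
```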
Related visualizations in this series cover architecture, training, and optimization techniques for large language models and transformers.
Interactive Flash Attention visualization - how the IO-aware algorithm achieves memory-efficient exact attention through tiling and kernel fusion.
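To make the tiling idea concrete, here is a rough NumPy sketch of attention computed block by block with an online softmax, the trick that lets Flash Attention avoid materializing the full score matrix. The block size, shapes, and single-head setup are assumptions for the demo; the real algorithm fuses these steps into a single GPU kernel.

```python
# Illustrative sketch (not the CUDA kernel): tiled attention with an
# online softmax. Keys/values are processed in blocks so the full
# L x L score matrix never exists in memory at once.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    L, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(V, dtype=np.float64)
    row_max = np.full(L, -np.inf)   # running max per query row
    row_sum = np.zeros(L)           # running softmax denominator
    for start in range(0, L, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = (Q @ Kb.T) * scale                 # (L, B) score block
        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)      # rescale old statistics
        p = np.exp(scores - new_max[:, None])       # block probabilities
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ Vb
        row_max = new_max
    return out / row_sum[:, None]

# Matches naive softmax attention up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
scores = Q @ K.T / np.sqrt(16)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
naive = (weights / weights.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), naive)
```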
Interactive KV cache visualization - how key-value caching in transformer LLMs enables fast text generation without quadratic recomputation.
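A minimal sketch of the caching idea, assuming single-head decoding one token at a time: each step appends one key/value pair and attends over everything cached so far, so a step costs O(L) instead of recomputing the whole O(L²) prefix. The `KVCache` class and dimensions here are hypothetical, not any particular library's API.

```python
# Hypothetical KV cache for single-head attention during decoding.
# Past keys/values are stored once and reused at every later step.
import numpy as np

class KVCache:
    def __init__(self, d_model: int):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def step(self, q, k, v):
        """Append this step's key/value, then attend over all cached pairs."""
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])
        scores = self.keys @ q / np.sqrt(q.shape[0])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

cache = KVCache(d_model=8)
rng = np.random.default_rng(1)
for _ in range(5):                     # five decode steps
    q, k, v = rng.standard_normal((3, 8))
    out = cache.step(q, k, v)
print(out.shape)                       # (8,)
```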
Interactive exploration of tokenization methods in LLMs - BPE, SentencePiece, and WordPiece. Understand how text becomes tokens that models can process.
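For the BPE piece specifically, a toy trainer might look like the following. The whitespace pretokenization and character-level start symbols are simplifying assumptions; production tokenizers add byte-level fallback and precomputed merge ranks.

```python
# Toy BPE trainer: repeatedly merge the most frequent adjacent pair.
# Corpus handling here is a deliberate simplification.
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    words = [list(w) for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        for w in words:
            i = 0
            while i < len(w) - 1:
                if w[i] == a and w[i + 1] == b:
                    w[i:i + 2] = [a + b]   # merge the pair in place
                else:
                    i += 1
    return merges

# Learns (l,o), then (lo,w), then (low,e) on this tiny corpus.
print(train_bpe("low lower lowest low low", num_merges=3))
```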
Understanding sparse mixture-of-experts (MoE) models - architecture, routing mechanisms, load balancing, and efficient scaling strategies for large language models.
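A hedged sketch of top-k routing, the core of a sparse MoE layer: a learned gate scores the experts for each token, only the top k actually run, and their outputs are mixed by renormalized gate weights. The expert count, k, and one-matrix experts are illustrative assumptions; real systems add capacity limits and an auxiliary load-balancing loss.

```python
# Illustrative top-k expert routing for a sparse MoE layer.
# Experts here are tiny one-matrix functions, an assumption for the demo.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; mix outputs by gate weight."""
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = topk[t]
        probs = softmax(logits[t, chosen])       # renormalize over the k
        for p, e in zip(probs, chosen):
            out[t] += p * experts[e](x[t])
    return out

rng = np.random.default_rng(2)
d, n_experts = 8, 4
# Default-argument trick captures a distinct weight matrix per expert.
experts = [lambda h, W=rng.standard_normal((d, d)): np.tanh(h @ W)
           for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
tokens = rng.standard_normal((5, d))
print(moe_layer(tokens, gate_w, experts).shape)  # (5, 8)
```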