MHA vs GQA vs MQA: Choosing the Right Attention
Compare Multi-Head, Grouped-Query, and Multi-Query Attention mechanisms to understand their trade-offs and choose the optimal approach for your use case.
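All three variants compute the same scaled dot-product attention; they differ only in how many key/value heads serve the query heads. Below is a minimal PyTorch sketch (the `Attention` class and all dimensions are illustrative assumptions, not any library's API) where a single `num_kv_heads` knob selects the variant: equal to the number of query heads for MHA, one for MQA, and anything in between for GQA.

```python
# Minimal sketch of MHA / GQA / MQA as one parameterized layer.
# Simplified on purpose: no causal masking, KV caching, or RoPE.
#   num_kv_heads == num_heads      -> Multi-Head Attention (MHA)
#   1 < num_kv_heads < num_heads   -> Grouped-Query Attention (GQA)
#   num_kv_heads == 1              -> Multi-Query Attention (MQA)
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_heads % num_kv_heads == 0
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = d_model // num_heads
        self.q_proj = nn.Linear(d_model, num_heads * self.head_dim, bias=False)
        # K and V projections shrink as num_kv_heads shrinks.
        self.k_proj = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of query heads shares one K/V head: repeat K/V to match.
        groups = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        # Scaled dot-product attention over all heads at once.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        out = F.softmax(scores, dim=-1) @ v
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

mha = Attention(d_model=512, num_heads=8, num_kv_heads=8)  # Multi-Head
gqa = Attention(d_model=512, num_heads=8, num_kv_heads=2)  # Grouped-Query
mqa = Attention(d_model=512, num_heads=8, num_kv_heads=1)  # Multi-Query
x = torch.randn(1, 16, 512)
print(mha(x).shape, gqa(x).shape, mqa(x).shape)  # each: (1, 16, 512)
```

Because the K and V projections shrink with `num_kv_heads`, GQA and MQA also shrink the KV cache proportionally; that, rather than compute, is where most of the inference savings come from.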
Related reading: more machine learning concepts for transformers, with clear explanations and practical insights.
- Learn ALiBi, the position encoding method that adds linear biases to attention scores for exceptional length extrapolation in transformers.
- Learn about attention sinks, where LLMs concentrate attention on initial tokens, and how preserving them enables streaming inference.
- Understand cross-attention, the mechanism that enables transformers to align and fuse information from different sources, sequences, or modalities.
- Learn how Grouped-Query Attention (GQA) balances Multi-Head quality with Multi-Query efficiency for faster LLM inference.
- Explore linear-complexity attention mechanisms, including Performer, Linformer, and other efficient transformers that scale to very long sequences.
- Learn how masked attention enables autoregressive generation and prevents information leakage in transformers and language models.
- Learn Multi-Query Attention (MQA), the optimization that shares keys and values across attention heads for massive memory savings.
- Learn Rotary Position Embeddings (RoPE), the elegant position encoding built on rotation matrices that powers LLaMA, Mistral, and other modern LLMs.
- Master scaled dot-product attention, the fundamental transformer building block, and learn why scaling is crucial for stable training.
- Sliding Window Attention for long sequences: local context windows enable O(n) complexity, as used in Mistral and Longformer.
- Explore sparse attention mechanisms that reduce quadratic complexity to linear or sub-quadratic, enabling efficient processing of long sequences.
- Deep dive into how different prompt components influence model behavior across transformer layers, from surface patterns to abstract reasoning.
- Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.
- Interactive visualization of LLM context windows: sliding windows, expanding contexts, and the attention patterns that define model memory limits.
- Interactive KV cache visualization: how key-value caching in LLM transformers enables fast text generation without quadratic recomputation (a sizing sketch follows this list).
- Interactive exploration of tokenization methods in LLMs: BPE, SentencePiece, and WordPiece, and how text becomes tokens that models can process.
- Understand end-to-end object detection with transformers, from DETR's object queries to bipartite matching and attention-based localization.
- Learn layer normalization for transformers and sequence models: how normalizing across features enables batch-independent training, with interactive visualizations.
- Understand sparse mixture-of-experts models: architecture, routing mechanisms, load balancing, and efficient scaling strategies for large language models.
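The KV cache item above notes why the head count matters at inference time; here is the back-of-envelope version. The cache stores one key and one value vector per layer, per KV head, per token, so its size scales linearly with the number of KV heads. The model dimensions below are illustrative assumptions, not any specific released model:

```python
# Approximate KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * seq_len * bytes per element (2 for fp16/bf16).
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Assumed model shape: 32 layers, 32 query heads, head_dim 128, 8K context.
layers, heads, head_dim, seq_len = 32, 32, 128, 8192
for name, kv_heads in [("MHA", heads), ("GQA", 8), ("MQA", 1)]:
    gib = kv_cache_bytes(layers, kv_heads, head_dim, seq_len) / 2**30
    print(f"{name}: {kv_heads:2d} KV heads -> {gib:.2f} GiB per sequence")
# MHA: 4.00 GiB, GQA (8 KV heads): 1.00 GiB, MQA: 0.12 GiB
```

Under these assumptions, moving from MHA to GQA or MQA cuts per-sequence cache memory by 4x to 32x, which translates directly into larger batches and longer contexts on the same hardware; this trade-off is much of why GQA has become the common middle ground in recent open models.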