Attention Sinks: Stable Streaming LLMs
Learn about attention sinks, where LLMs concentrate attention on initial tokens, and how preserving them enables streaming inference.
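The core idea can be sketched as a KV-cache eviction policy: always keep the first few "sink" tokens plus a sliding window of the most recent tokens, discarding the middle. Below is a minimal illustrative sketch (the function name and parameters are hypothetical, not from a specific library) of which cache positions such a policy would retain:

```python
def sink_cache_indices(seq_len: int, num_sinks: int = 4, window: int = 8) -> list[int]:
    """Return the KV-cache positions kept under a sink + sliding-window policy.

    Hypothetical helper for illustration: keeps the first `num_sinks` tokens
    (the attention sinks) and the most recent `window` tokens, evicting the rest.
    """
    if seq_len <= num_sinks + window:
        # Cache fits entirely; nothing is evicted.
        return list(range(seq_len))
    # Sink tokens at the front, plus the trailing recency window.
    return list(range(num_sinks)) + list(range(seq_len - window, seq_len))


# After 20 generated tokens with 4 sinks and a window of 8,
# positions 4..11 are evicted while 0..3 and 12..19 remain.
kept = sink_cache_indices(20, num_sinks=4, window=8)
```

Because the sink tokens absorb a disproportionate share of attention mass, evicting everything else from the cache keeps generation stable at an effectively unbounded stream length, while a plain sliding window (no sinks) degrades sharply once the initial tokens fall out of the cache.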
17 min read · Concept