Pooling Strategies
How a transformer’s per-token outputs become one embedding: CLS, mean, max, last-token, and attention pooling — what each does and when to use it.
5 min readConcept
Explore machine learning concepts related to mean-pooling. Clear explanations and practical insights.
How a transformer’s per-token outputs become one embedding: CLS, mean, max, last-token, and attention pooling — what each does and when to use it.