Embeddings & Retrieval

Dense and sparse embeddings, quantization, and vector search for semantic retrieval.

19 concepts

All Embeddings & Retrieval Concepts

January 21, 2025

Dense Embeddings

How dense embeddings turn meaning into geometry: word2vec, GloVe, and contextual models, vector arithmetic, cosine similarity, and where the field is heading.

embeddings word2vec glove bert semantic-search vector-space

January 21, 2025

Sparse vs Dense vs Hybrid Retrieval: BM25, BERT, and Reranking Compared

How sparse retrieval (BM25/TF-IDF), dense retrieval (BERT-style embeddings), and hybrid systems that combine both compare on recall, semantic understanding, computational cost, and operational complexity for modern search.

sparse-embeddings dense-embeddings bm25 tfidf bert hybrid-search retrieval

August 16, 2024

BM25 Algorithm for Text Retrieval

Master the BM25 algorithm, the probabilistic ranking function powering Elasticsearch and Lucene for keyword-based document retrieval and search systems.

bm25 retrieval ranking sparse-retrieval tf-idf search

June 3, 2026

Pooling Strategies

How a transformer’s per-token outputs become one embedding: CLS, mean, max, last-token, and attention pooling — what each does and when to use it.

embeddings pooling sentence-embeddings mean-pooling cls-token

June 2, 2026

Contrastive Learning

Master contrastive learning for vector embeddings: how InfoNCE loss and self-supervised techniques train models to create high-quality semantic representations.

contrastive-learning self-supervised representation-learning infonce simclr

January 21, 2025

Matryoshka Embeddings

Matryoshka embeddings: nested representations enabling dimension reduction by simple truncation without model retraining for flexible retrieval.

matryoshka embeddings dimension-reduction multi-scale efficient-retrieval

August 16, 2024

Domain Adaptation for Embeddings

Domain adaptation for embeddings: fine-tuning retrieval models for new domains while limiting catastrophic forgetting.

domain-adaptation transfer-learning fine-tuning distribution-shift

August 16, 2024

Cross-Lingual Alignment

Learn cross-lingual embedding alignment techniques like VecMap and MUSE for multilingual vector retrieval and zero-shot language transfer in search systems.

cross-lingual multilingual alignment translation vecmap

January 21, 2025

Cross-Encoder vs Bi-Encoder

Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.

cross-encoder bi-encoder retrieval reranking neural-search transformers

January 21, 2025

Multi-Vector Late Interaction

Explore ColBERT and other multi-vector retrieval models that use fine-grained token-level matching for superior search quality.

colbert retrieval multi-vector late-interaction dense-retrieval search

August 16, 2024

Hybrid Retrieval Systems

Build hybrid retrieval systems that combine BM25 sparse search with dense vector embeddings, using reciprocal rank fusion to merge the two rankings.

hybrid-retrieval fusion sparse-dense search ranking

January 21, 2025

Quantization Effects Simulator

Embedding quantization simulator: explore memory-accuracy trade-offs from float32 to int8 and binary representations for retrieval.

quantization embeddings compression int8 binary optimization

January 23, 2025

Vector Quantization Techniques

Master vector compression techniques from scalar to product quantization. Learn how to reduce memory usage by 10-100× while preserving search quality.

embeddings quantization compression pq scalar-quantization optimization

August 16, 2024

Binary Embeddings for Fast Search

Learn how binary embeddings use 1-bit quantization for ultra-compact vector representations, enabling billion-scale similarity search with a 32× memory reduction relative to float32.

binary-embeddings quantization hashing compression retrieval

January 23, 2025

Vector Index Structures

Explore the fundamental data structures powering vector databases: trees, graphs, hash tables, and hybrid approaches for efficient similarity search.

embeddings index data-structures trees graphs databases

January 23, 2025

HNSW vs IVF-PQ vs LSH: Approximate Nearest Neighbor Algorithms Compared

How HNSW, IVF-PQ, and LSH compare for approximate nearest neighbor (ANN) search — recall, latency, memory, build cost, and update characteristics — with Annoy, ScaNN, and DiskANN included for completeness.

embeddings search ann comparison benchmarks algorithms

January 23, 2025