Dense Embeddings Space Explorer

Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.

Dense embeddings revolutionized NLP by representing words and sentences as continuous vectors in high-dimensional space, where semantic similarity corresponds to geometric proximity.

What Are Dense Embeddings?

Dense embeddings are continuous vector representations where:

  • Every dimension has a value (unlike sparse representations)
  • Semantically similar items lie close together in the space
  • Vector arithmetic captures relationships between words
  • Dimensionality is typically 50-1,000; large language models use several thousand
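
To make the contrast with sparse representations concrete, here is a small sketch that compares a one-hot vector with a dense vector for the same word. The vocabulary and every number in it are made up for illustration; real dense vectors are learned from data.

```python
import numpy as np

# Toy vocabulary; the words and all numbers below are illustrative only.
vocab = ["cat", "dog", "car", "tree", "house"]

def one_hot(word):
    """Sparse representation: one dimension per vocabulary word."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Dense representation: a few dimensions, every one of them populated.
dense = {
    "cat": np.array([0.8, 0.1, 0.3]),
    "dog": np.array([0.7, 0.2, 0.4]),
    "car": np.array([-0.5, 0.9, 0.1]),
}

sparse_cat, sparse_dog = one_hot("cat"), one_hot("dog")

# One-hot vectors of different words are always orthogonal, so they carry
# no notion of similarity; dense vectors can overlap.
sparse_overlap = float(sparse_cat @ sparse_dog)
dense_overlap = float(dense["cat"] @ dense["dog"])
nonzero_fraction = np.count_nonzero(sparse_cat) / sparse_cat.size
```

With a realistic vocabulary of hundreds of thousands of words, the one-hot vector would be almost entirely zeros, while the dense vector stays a few hundred values long.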

Key Concepts

1. Word Embeddings Evolution

The progression of embedding techniques:

Model      Year   Key Innovation                Dimensions
Word2Vec   2013   Skip-gram/CBOW                50-300
GloVe      2014   Global matrix factorization   50-300
FastText   2016   Subword information           100-300
BERT       2018   Contextual embeddings         768
GPT-3      2020   Scale + few-shot              12,288

2. Training Objectives

Different models use different objectives:

Word2Vec Skip-gram:

$$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$$
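
The skip-gram objective can be sketched in a few lines: enumerate every (center, context) pair within a window of size c, then average the negative log-probabilities over the T positions. This is a minimal illustration, assuming a full softmax over a five-word toy vocabulary with random vectors; the names `W_in` and `W_out` are chosen here for the example, and practical Word2Vec training replaces the full softmax with negative sampling or hierarchical softmax.

```python
import numpy as np

def skipgram_pairs(tokens, c):
    """Yield (center, context) pairs for every offset -c <= j <= c, j != 0."""
    for t in range(len(tokens)):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                yield tokens[t], tokens[t + j]

def skipgram_loss(tokens, W_in, W_out, c):
    """Average negative log-likelihood J(theta) using a full softmax."""
    total = 0.0
    for center, context in skipgram_pairs(tokens, c):
        scores = W_out @ W_in[center]             # one score per vocabulary word
        scores -= scores.max()                    # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())
        total -= log_probs[context]
    return total / len(tokens)                    # the 1/T factor

rng = np.random.default_rng(0)
vocab_size, dim = 5, 8
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))    # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))   # context-word vectors
tokens = [0, 1, 2, 3, 4, 1, 2]                          # a toy corpus as word ids

pairs = list(skipgram_pairs(tokens, c=2))
loss = skipgram_loss(tokens, W_in, W_out, c=2)
```

Because the vectors start near zero, every context word is roughly equally likely and the loss sits close to its untrained value; training would lower it by pulling co-occurring words together.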

GloVe:

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$
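
As a rough sketch of the GloVe loss, the code below evaluates the weighted squared error on a tiny co-occurrence matrix. The weighting function follows the form in the GloVe paper (x_max = 100, alpha = 0.75); the co-occurrence counts and the random vectors are made up, and the function names are chosen for this example.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): grows as (x / x_max)^alpha and is capped at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_tilde, b, b_tilde):
    """Weighted least-squares loss over the nonzero co-occurrence counts."""
    i, j = np.nonzero(X)                          # log X_ij is undefined at 0
    pred = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j]
    err = pred - np.log(X[i, j])
    return float((glove_weight(X[i, j]) * err ** 2).sum())

# Toy co-occurrence counts for a 3-word vocabulary (made-up numbers).
X = np.array([[0.0, 4.0, 1.0],
              [4.0, 0.0, 200.0],
              [1.0, 200.0, 0.0]])
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))         # word vectors w_i
W_tilde = rng.normal(scale=0.1, size=(3, 4))   # context vectors w~_j
b = np.zeros(3)
b_tilde = np.zeros(3)

loss = glove_loss(X, W, W_tilde, b, b_tilde)
```

The cap in f keeps very frequent pairs (here the count of 200) from dominating the sum, while rare pairs such as the count of 1 contribute with a small weight.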

3. Cosine Similarity

The standard metric for comparing embeddings:

$$\text{similarity}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \cdot \sqrt{\sum_{i=1}^{n} v_i^2}}$$
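
The formula translates directly into NumPy. The sketch below is one straightforward way to write it, with three small inputs that show the range of the metric.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])             # perpendicular vectors
opposite = cosine_similarity([1, 1], [-1, -1])             # opposite directions
```

Parallel vectors score 1 regardless of length, perpendicular vectors score 0, and opposite vectors score -1, which is why the metric compares direction rather than magnitude.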

Vector Arithmetic

The Famous Analogy

The most celebrated property of word embeddings:

king - man + woman ≈ queen

This works because embeddings encode relationships:

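  • king - man leaves an offset that roughly encodes "royalty"
  • Adding woman applies that royalty offset to the female side
  • The vocabulary word nearest to the result (excluding the three input words) is queen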

More Examples

    # Relationships captured by arithmetic
    paris - france + italy ≈ rome
    bigger - big + small ≈ smaller
    walking - walk + swim ≈ swimming
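
The arithmetic can be tried on a toy example. The vectors below are hand-built so that the axes mean (royalty, gender, food); this is purely illustrative, since learned embeddings do not have interpretable axes. The `analogy` helper is a name chosen for this sketch.

```python
import numpy as np

# Hand-built toy vectors; axes are (royalty, gender, food). Real embeddings
# are learned and their dimensions are not interpretable like this.
vectors = {
    "king":  np.array([0.9,  0.8, 0.0]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.9, 0.0]),
    "woman": np.array([0.1, -0.9, 0.1]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def analogy(a, b, c, vectors):
    """Return the word closest (by cosine) to a - b + c, skipping a, b and c."""
    target = vectors[a] - vectors[b] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue                      # input words are excluded by convention
        sim = target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

result = analogy("king", "man", "woman", vectors)
```

Excluding the input words matters in practice: without that step the nearest neighbor of the result is often one of the inputs themselves.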

Implementation Details

Creating Word Embeddings

    import numpy as np
    from gensim.models import Word2Vec

    # Train Word2Vec
    sentences = [["cat", "sat", "mat"], ["dog", "stood", "rug"]]
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # Skip-gram

    # Get embeddings
    cat_vector = model.wv['cat']
    dog_vector = model.wv['dog']

    # Compute similarity
    similarity = model.wv.similarity('cat', 'dog')

Finding Nearest Neighbors

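    import numpy as np

    def cosine_similarity(u, v):
        """Cosine of the angle between two vectors"""
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def find_nearest(embedding, embeddings, k=5):
        """Find k nearest neighbors using cosine similarity"""
        similarities = []
        for word, vec in embeddings.items():
            sim = cosine_similarity(embedding, vec)
            similarities.append((word, sim))

        # Sort by similarity, most similar first
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:k]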

Sentence Embeddings

Moving from words to sentences:

Average Pooling

Simple but effective:

    # word_embs maps each word to its vector (for example, model.wv)
    sentence_emb = np.mean([word_embs[word] for word in sentence], axis=0)

Weighted Average

Using TF-IDF or importance weights:

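    # compute_tfidf is a placeholder that returns one weight per word in the sentence
    weights = compute_tfidf(sentence)
    word_vecs = [word_embs[word] for word in sentence]
    sentence_emb = np.average(word_vecs, weights=weights, axis=0)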
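
Both pooling strategies can be sketched together. The example below assumes a tokenized toy corpus and made-up 2-d word vectors; `compute_idf` and `sentence_embedding` are names chosen for this sketch. The smoothed IDF formula is the same one scikit-learn uses by default. Because a repeated word contributes once per occurrence, weighting each occurrence by IDF amounts to a TF-IDF weighting.

```python
import math
import numpy as np

def compute_idf(corpus):
    """Smoothed inverse document frequency for every word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}

def sentence_embedding(sentence, word_vectors, idf=None):
    """Mean of word vectors, optionally IDF-weighted; unknown words are skipped."""
    words = [w for w in sentence if w in word_vectors]
    vecs = np.array([word_vectors[w] for w in words])
    weights = [idf.get(w, 1.0) for w in words] if idf else None
    return np.average(vecs, axis=0, weights=weights)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
word_vectors = {                       # made-up 2-d vectors
    "the": np.array([0.0, 0.0]),
    "cat": np.array([1.0, 0.0]),
    "sat": np.array([0.0, 1.0]),
}
idf = compute_idf(corpus)
plain = sentence_embedding(["the", "cat", "sat"], word_vectors)
weighted = sentence_embedding(["the", "cat", "sat"], word_vectors, idf)
```

In the weighted version the ubiquitous word "the" counts for less, so the sentence vector moves toward the content words. A sentence with no known words would need separate handling, which this sketch leaves out.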

Sentence-BERT

Specialized models for sentence embeddings:

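    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('all-MiniLM-L6-v2')

    # Input is a list of raw strings, not token lists
    sentences = ["The cat sat on the mat.", "A dog stood on the rug."]
    embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence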

Applications

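1. Semantic Search

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # Index documents
    doc_embeddings = model.encode(documents)

    # Search
    query_embedding = model.encode([query])      # shape (1, dim)
    similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
    top_k = np.argsort(similarities)[::-1][:k]   # indices of the k best matches, best first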
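
The ranking step can be tried without loading a model. The sketch below uses stand-in vectors in place of real document and query embeddings; `top_k_search` is a name chosen for this example.

```python
import numpy as np

def top_k_search(query_vec, doc_matrix, k):
    """Return (indices, scores) of the k rows most cosine-similar to the query."""
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    scores = docs @ query                       # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]        # highest score first
    return order, scores[order]

# Stand-in embeddings (made up); in practice these come from model.encode(...).
doc_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query_vec = np.array([1.0, 0.2, 0.0])

indices, scores = top_k_search(query_vec, doc_matrix, k=2)
```

Normalizing the document matrix once at indexing time, rather than on every query as done here for brevity, is the usual design choice for larger collections.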

2. Clustering

    from sklearn.cluster import KMeans

    # Cluster embeddings
    kmeans = KMeans(n_clusters=10)
    clusters = kmeans.fit_predict(embeddings)

3. Classification

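    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Use embeddings as features (get_embedding is a placeholder for any text-to-vector function)
    X = np.array([get_embedding(text) for text in texts])
    classifier = LogisticRegression()
    classifier.fit(X, labels)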

Visualization Techniques

t-SNE Projection

Reduce dimensions for visualization:

    from sklearn.manifold import TSNE

    tsne = TSNE(n_components=2, perplexity=30)
    embeddings_2d = tsne.fit_transform(embeddings)

UMAP

Faster alternative to t-SNE:

    import umap

    reducer = umap.UMAP(n_components=2)
    embeddings_2d = reducer.fit_transform(embeddings)

Common Pitfalls

1. Bias in Embeddings

Word embeddings can encode societal biases:

    # Problematic associations
    doctor - man + woman ≈ nurse             # Gender bias
    programmer - man + woman ≈ homemaker     # Occupation bias
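
One simple way to quantify such an association is to project word vectors onto a gender direction such as he - she. The sketch below uses made-up 3-d vectors, not values from a trained model, and `gender_projection` is a name chosen for this example; it illustrates the measurement only and is not a debiasing method.

```python
import numpy as np

# Made-up 3-d vectors for illustration only; they are not from a trained model.
vectors = {
    "he":     np.array([ 1.0, 0.0, 0.2]),
    "she":    np.array([-1.0, 0.0, 0.2]),
    "doctor": np.array([ 0.3, 0.9, 0.1]),
    "nurse":  np.array([-0.4, 0.8, 0.1]),
    "table":  np.array([ 0.0, 0.1, 0.9]),
}

def gender_projection(word, vectors):
    """Cosine between a word vector and the he - she direction."""
    direction = vectors["he"] - vectors["she"]
    vec = vectors[word]
    return float(vec @ direction / (np.linalg.norm(vec) * np.linalg.norm(direction)))

scores = {w: gender_projection(w, vectors) for w in ("doctor", "nurse", "table")}
```

A positive score leans toward "he" and a negative score toward "she"; an occupation word with a score far from zero is a sign that the training corpus tied it to one gender.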

2. Out-of-Vocabulary Words

Handling unknown words:

  • Use subword tokenization (FastText)
  • Fall back to character embeddings
  • Use contextual models (BERT)
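
The subword idea behind FastText can be sketched as follows: wrap the word in boundary markers and collect its character n-grams, so that an unseen word still shares pieces with known words. The 3-to-6 range follows the FastText paper; `char_ngrams` is a name chosen for this example, and the hashing of n-grams into buckets that the real library performs is left out.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in < and > boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams

trigrams = char_ngrams("where", n_min=3, n_max=3)
# trigrams == ['<wh', 'whe', 'her', 'ere', 're>']

# An unseen word overlaps with a known one through shared n-grams.
shared = set(char_ngrams("unhappiness")) & set(char_ngrams("happiness"))
```

A vector for an out-of-vocabulary word is then built by combining the vectors of its n-grams, which is why misspellings and rare inflections still land near related words.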

3. Polysemy

A single vector per word cannot separate different senses:

  • "bank" (financial) vs "bank" (river)
  • Solution: Contextual embeddings (BERT, GPT)

Performance Considerations

Memory Usage

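  • Word2Vec: ~1.2 GB for 1M words × 300 dims stored as float32
  • BERT-base: ~440 MB of float32 weights (110M parameters), plus a forward pass per input
  • Storage: float16 halves the footprint; int8 quantization cuts it to a quarter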
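
The memory figures follow from simple arithmetic: number of vectors × dimensions × bytes per value. The sketch below works through the numbers and also measures the rounding error introduced by a float16 cast on random data; `table_size_gb` is a name chosen for this example.

```python
import numpy as np

def table_size_gb(n_words, dim, bytes_per_value):
    """Size of an n_words x dim embedding table in gigabytes (10^9 bytes)."""
    return n_words * dim * bytes_per_value / 1e9

float32_gb = table_size_gb(1_000_000, 300, 4)   # 4 bytes per value
float16_gb = table_size_gb(1_000_000, 300, 2)   # 2 bytes per value
int8_gb = table_size_gb(1_000_000, 300, 1)      # 1 byte per value

# Casting to float16 halves storage at the cost of a small rounding error.
rng = np.random.default_rng(0)
emb32 = rng.normal(size=(1000, 300)).astype(np.float32)
emb16 = emb32.astype(np.float16)
max_error = float(np.abs(emb32 - emb16.astype(np.float32)).max())
```

For similarity search the float16 rounding error is usually negligible compared with the gaps between neighbors, though it is worth checking on your own data before switching.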

Speed Optimization

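    # Batch operations: one matrix product scores every query against every document
    # (this equals cosine similarity when the rows are L2-normalized)
    similarities = np.dot(query_embs, doc_embs.T)

    # Approximate nearest neighbor
    from annoy import AnnoyIndex

    index = AnnoyIndex(embedding_dim, 'angular')
    for i, vec in enumerate(embeddings):
        index.add_item(i, vec)
    index.build(10)  # 10 trees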

Modern Developments

1. Contextual Embeddings

BERT and GPT models provide context-dependent embeddings:

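    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = AutoModel.from_pretrained('bert-base-uncased')

    # Different embeddings for same word in different contexts
    inputs1 = tokenizer("The bank is closed", return_tensors="pt")
    inputs2 = tokenizer("The river bank is muddy", return_tensors="pt")

    # last_hidden_state has shape (batch, tokens, 768); index 0 is [CLS], so
    # "bank" is token 2 in the first sentence and token 3 in the second
    bank1 = model(**inputs1).last_hidden_state[0, 2]
    bank2 = model(**inputs2).last_hidden_state[0, 3]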

2. Multilingual Embeddings

Cross-lingual understanding:

  • mBERT: 104 languages
  • XLM-R: 100 languages
  • LaBSE: Language-agnostic sentence embeddings

3. Multimodal Embeddings

Combining text and vision:

  • CLIP: Text-image alignment
  • ALIGN: Noisy data training
  • Flamingo: Few-shot multimodal

Best Practices

  1. Choose the right model:

    • Static embeddings for speed
    • Contextual for accuracy
    • Domain-specific when available
  2. Normalize embeddings:

    normalized = embedding / np.linalg.norm(embedding)
  3. Use appropriate similarity metrics:

    • Cosine when only direction matters and magnitude should be ignored
    • Euclidean when absolute distance in the space matters
    • Dot product for efficiency; on unit-normalized vectors it equals cosine
  4. Consider fine-tuning:

    • Domain adaptation improves performance
    • Contrastive learning for specific tasks
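
The link between normalization and the choice of metric can be checked numerically. The sketch below uses two random vectors and verifies that, after L2 normalization, the dot product equals the cosine similarity and the squared Euclidean distance equals 2 - 2·cos, so all three metrics rank neighbors identically.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=50), rng.normal(size=50)

cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# After L2 normalization the dot product is the cosine similarity ...
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
dot_normalized = u_hat @ v_hat

# ... and squared Euclidean distance is a monotone function of it:
# ||u_hat - v_hat||^2 = 2 - 2 * cos(u, v)
sq_distance = np.sum((u_hat - v_hat) ** 2)
```

This is why many vector indexes store unit-normalized embeddings: the cheaper dot product can then stand in for cosine similarity without changing the results.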

References

  • Mikolov et al. "Efficient Estimation of Word Representations in Vector Space"
  • Pennington et al. "GloVe: Global Vectors for Word Representation"
  • Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers"
  • Reimers & Gurevych "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"