Silhouette Score: Per-Point Clustering Evaluation

How the silhouette score measures clustering quality for every individual point — comparing intra-cluster cohesion to nearest-cluster separation, with per-point diagnostics that work for arbitrary cluster shapes.


What Silhouette Score Measures

The silhouette score answers a per-point question: is this data point closer to its own cluster or to the nearest neighboring cluster? Unlike the Calinski-Harabasz index, which produces one number for the entire clustering, the silhouette score computes a value for every individual point, revealing exactly where cluster assignments are strong and where they break down. Proposed by Peter Rousseeuw in 1987, it remains one of the most widely used internal clustering evaluation metrics precisely because of this granularity.

Each point's score ranges from −1 to +1. A score near +1 means the point is deep inside its cluster, far from any neighbor. Near 0 means it sits on the boundary between two clusters. Below 0 means the point is likely assigned to the wrong cluster — it is actually closer to a different cluster on average. This per-point granularity is the silhouette's defining advantage.

Mathematical Definition

For a point x_i in cluster C_k:

a(i) = (1 / (|C_k| − 1)) · Σ_{x_j ∈ C_k, j ≠ i} d(x_i, x_j)

This is the average distance to all other points in the same cluster — the intra-cluster distance.

b(i) = min_{C_l ≠ C_k} (1 / |C_l|) · Σ_{x_j ∈ C_l} d(x_i, x_j)

This is the average distance to points in the nearest other cluster — the nearest-cluster distance.

The silhouette score for point x_i is:

s(i) = (b(i) − a(i)) / max(a(i), b(i))

The overall silhouette score is the mean across all points: S = (1/n) · Σ_{i=1}^{n} s(i).
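These definitions translate directly into code. A minimal NumPy sketch, computing a(i), b(i), and s(i) exactly as defined above (the `silhouette_values` helper and the toy data are illustrative, not from a library):

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), computed
    directly from the definitions (O(n^2) pairwise distances)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full pairwise distance matrix d(x_i, x_j).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        if own.sum() == 1:  # singleton cluster: s(i) conventionally 0
            continue
        # a(i): mean distance to the other members of its own cluster
        # (D[i, i] = 0, so dividing the sum by |C_k| - 1 is correct).
        a = D[i, own].sum() / (own.sum() - 1)
        # b(i): smallest mean distance to any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two well-separated 1-D clusters: every point should score near +1.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
scores = silhouette_values(X, labels)
print(scores.mean())
```

For the first point, a = mean(0.1, 0.2) = 0.15 and b = mean(5.0, 5.1, 5.2) = 5.1, so s ≈ (5.1 − 0.15) / 5.1 ≈ 0.97, matching the "deep inside its cluster" case described above.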

Exploring Silhouette Interactively

Each point is colored by its silhouette score — teal for confident assignments, red for potential misclassifications. Switch between presets to see how cluster geometry affects per-point scores.

Silhouette Score Explorer

In the default preset, the average silhouette is 0.83, the minimum per-point score is 0.65, and no points score negative — every point is dramatically closer to its own cluster than to any other. The strong teal coloring across all clusters indicates unambiguous cluster assignments.

Anatomy of a Single Point's Score

The power of the silhouette lies in its per-point decomposition. For any point, you can trace exactly why it scores well or poorly by examining its two distances. A core point deep inside a tight cluster will have small a (nearby same-cluster neighbors) and large b (distant other-cluster points), yielding s close to 1. A boundary point will have similar a and b, yielding s near 0. A misclassified point will have a > b — it is closer to the wrong cluster.

This decomposition makes silhouette uniquely useful for debugging. When a clustering produces a mediocre average score, you can inspect the worst-scoring points to understand whether the problem is boundary ambiguity, cluster overlap, or outright misassignment. No other standard internal metric provides this level of diagnostic detail.
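scikit-learn exposes this per-point decomposition as `silhouette_samples`. A sketch of the debugging workflow described above, assuming scikit-learn is available (the synthetic data and cluster centers are made up for illustration, with two blobs deliberately overlapping):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Two overlapping blobs plus one distant blob, so some points score poorly.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [2.5, 0], [10, 10]],
                  cluster_std=1.0, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

s = silhouette_samples(X, labels)      # one silhouette value per point
worst = np.argsort(s)[:5]              # the five worst-assigned points
for i in worst:
    print(f"point {i}: cluster {labels[i]}, s(i) = {s[i]:+.3f}")
print(f"fraction of negative scores: {(s < 0).mean():.2%}")
```

Inspecting the worst-scoring points like this shows whether they cluster on the boundary between the two overlapping blobs (scores near 0) or are outright misassigned (scores below 0).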

Click a Point to Explore

Click any point to see its silhouette score calculated step by step — blue lines show intra-cluster distances (a), red lines show nearest-cluster distances (b).

Silhouette Plots and k-Selection

The silhouette plot is the canonical visualization for this metric. Points are sorted by score within each cluster, producing “knife shapes.” A good clustering shows wide, uniform knives — all clusters have consistently high scores. A poor clustering shows thin, jagged knives with negative tails. The average silhouette across k values identifies the optimal cluster count, but the plot's shape matters as much as the number — uniform widths across clusters indicate balanced, well-separated groups.

When using silhouette for k-selection, look beyond the average. A clustering with k = 3 and average silhouette 0.55 where all clusters score uniformly is often preferable to k = 2 with average 0.60 where one cluster scores 0.85 and the other scores 0.35. The plot reveals this imbalance immediately, while the average alone would mislead you into choosing fewer clusters.
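A k-selection sweep that tracks both the average and the worst cluster's average, per the advice above, might look like this (a sketch; the blob centers are chosen for illustration so that three clusters is the right answer):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Three well-separated synthetic blobs (centers chosen for illustration).
X, _ = make_blobs(n_samples=500, centers=[[0, 0], [6, 0], [3, 5]],
                  cluster_std=0.8, random_state=42)

results = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    s = silhouette_samples(X, labels)
    per_cluster = [s[labels == c].mean() for c in range(k)]
    # Track the worst cluster's average, not just the global mean.
    results[k] = (s.mean(), min(per_cluster))
    print(f"k={k}: avg={s.mean():.3f}, worst cluster avg={min(per_cluster):.3f}")

best_k = max(results, key=lambda k: results[k][0])
print("best k by average silhouette:", best_k)
```

Reporting `min(per_cluster)` alongside the mean is what catches the imbalanced case described above, where one strong cluster props up the average.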

Silhouette Analysis Across k

The silhouette plot (left) shows per-point scores grouped by cluster. The line chart (right) summarizes the average. Clean “knife shapes” at k=3 confirm optimal clustering.

At k = 3, the average silhouette is 0.72 and the worst cluster's average is 0.65.

k = 3: Clean, uniform knife shapes across all three clusters — each cluster has consistently high silhouette scores with no negative values. This is the signature of well-separated, correctly-identified clusters.

Strengths and Limitations

The silhouette score has several meaningful advantages. It works with arbitrary cluster shapes because it relies on pairwise distances rather than centroids — no assumption about convexity or globular geometry. Its per-point scores enable diagnosis of specific misassignments, something no other standard internal metric provides. The bounded [-1, +1] range is immediately interpretable without context. And it works with any distance metric — Euclidean, cosine, Manhattan, or custom domain-specific distances.

However, the silhouette has real limitations. Its O(n2) computation from pairwise distances makes it impractical for datasets above roughly 50,000 points without approximation or sampling. The average silhouette can mask problems — one excellent cluster and one terrible cluster might average to a “decent” score. It does not account for density differences between clusters, so a sparse cluster far from a dense cluster may score well despite poor internal cohesion. And for very high-dimensional data, distance concentration effects can compress the score range, reducing its discriminative power.
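For the scalability limitation, scikit-learn's `silhouette_score` accepts a `sample_size` argument that scores a random subset instead of all n points. A sketch under that assumption (the dataset and cluster count are made up):

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# An exact silhouette on 100,000 points needs ~10^10 pairwise distances;
# sample_size evaluates the score on a random subset instead.
X, _ = make_blobs(n_samples=100_000, centers=5, random_state=0)
labels = MiniBatchKMeans(n_clusters=5, n_init=3,
                         random_state=0).fit_predict(X)

approx = silhouette_score(X, labels, sample_size=2_000, random_state=0)
print(f"sampled silhouette (2,000 of 100,000 points): {approx:.3f}")
```

Fixing `random_state` makes the sampled estimate reproducible, which matters when comparing runs across k values.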

Comparing Clustering Metrics

Each internal clustering metric makes different geometric assumptions and trades off speed against diagnostic depth.

| | Calinski-Harabasz | Silhouette Score | Davies-Bouldin |
|---|---|---|---|
| Formula | SS_B / SS_W (normalized) | (b − a) / max(a, b) | avg_i max_j (σ_i + σ_j) / d_ij |
| Better when | Higher | Higher | Lower |
| Range | [0, ∞) | [−1, 1] | [0, ∞) |
| Complexity | O(n·k) | O(n²) | O(n·k) |
| Convexity bias | Assumes convex | Shape-agnostic | Assumes convex |
| Best for | Fast k-selection with k-means | Diagnosing individual point assignments | Worst-case cluster overlap detection |
Use Silhouette when...
  • You need per-point diagnostics (which points are misassigned?)
  • Clusters may be non-convex (crescents, rings, arbitrary shapes)
  • You need a bounded, interpretable score (−1 to +1)
Consider alternatives when...
  • Dataset is large (O(n²) becomes prohibitive) — use CH instead
  • You only need to compare k values, not diagnose points — use CH
  • You want cluster-level worst-case analysis — use Davies-Bouldin
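All three metrics ship in `sklearn.metrics`, so they are cheap to compute side by side on the same clustering. A sketch (the synthetic data is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.7, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

ch = calinski_harabasz_score(X, labels)   # higher is better, unbounded
sil = silhouette_score(X, labels)         # higher is better, in [-1, 1]
db = davies_bouldin_score(X, labels)      # lower is better, >= 0
print(f"CH = {ch:.1f}, silhouette = {sil:.3f}, DB = {db:.3f}")
```

Note the direction differences in the comments: CH and silhouette reward higher values while Davies-Bouldin rewards lower ones, a common source of confusion when reading results.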

Key Takeaways

  1. Silhouette measures per-point fit — s(i) = (b − a) / max(a, b), where a is the intra-cluster distance and b is the nearest-cluster distance. Scores range from −1 (wrong cluster) to +1 (perfect fit).
  2. Silhouette plots reveal cluster quality visually — wide, uniform “knife shapes” indicate well-separated clusters. Negative tails signal misassigned boundary points.
  3. Shape-agnostic but O(n²) — unlike centroid-based metrics (CH, DB), silhouette works for arbitrary geometries. But pairwise distance computation limits scalability to moderate dataset sizes.
  4. Per-point diagnostics are the key advantage — no other standard clustering metric tells you which specific points are problematic. Use silhouette when you need to understand why a clustering fails, not just whether it does.
