What Silhouette Score Measures
The silhouette score answers a per-point question: is this data point closer to its own cluster or to the nearest neighboring cluster? Unlike the Calinski-Harabasz index, which produces one number for the entire clustering, the silhouette score computes a value for every individual point, revealing exactly where cluster assignments are strong and where they break down. Proposed by Peter Rousseeuw in 1987, it remains one of the most widely used internal clustering evaluation metrics precisely because of this granularity.
Each point's score ranges from −1 to +1. A score near +1 means the point is deep inside its cluster, far from any neighbor. Near 0 means it sits on the boundary between two clusters. Below 0 means the point is likely assigned to the wrong cluster — it is actually closer to a different cluster on average. This per-point granularity is the silhouette's defining advantage.
Mathematical Definition
For a point xᵢ in cluster Cₖ, first compute:

a(i) = (1 / (|Cₖ| − 1)) Σ d(xᵢ, xⱼ), summed over all other points xⱼ in Cₖ

This is the average distance to all other points in the same cluster — the intra-cluster distance.

b(i) = min over Cₗ ≠ Cₖ of (1 / |Cₗ|) Σ d(xᵢ, xⱼ), summed over the points xⱼ in Cₗ

This is the average distance to points in the nearest other cluster — the nearest-cluster distance.

The silhouette score for point xᵢ is:

s(i) = (b(i) − a(i)) / max(a(i), b(i))

The overall silhouette score is the mean across all points: S = (1/n) Σᵢ s(i).
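These definitions can be computed directly from a pairwise distance matrix. The sketch below is a brute-force illustration (assuming numpy and Euclidean distance; the function name `silhouette_values` is ours, not a library API):

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point silhouette scores s(i), computed straight from the definition.

    X: (n, d) array of points; labels: length-n array of cluster ids.
    Illustrative sketch, not an optimized implementation.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix — the O(n^2) step.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False
        if not same.any():        # singleton cluster: s(i) conventionally 0
            continue
        a = D[i, same].mean()     # mean distance to own cluster
        # b: smallest mean distance to any other cluster
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores
```

On two well-separated blobs every score lands near +1, while deliberately mislabeling a point drives its score negative — exactly the behavior the definition predicts.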
Exploring Silhouette Interactively
Each point is colored by its silhouette score — teal for confident assignments, red for potential misclassifications. Switch between presets to see how cluster geometry affects per-point scores.
Anatomy of a Single Point's Score
The power of the silhouette lies in its per-point decomposition. For any point, you can trace exactly why it scores well or poorly by examining its two distances. A core point deep inside a tight cluster will have small a (nearby same-cluster neighbors) and large b (distant other-cluster points), yielding s close to 1. A boundary point will have similar a and b, yielding s near 0. A misclassified point will have a > b — it is closer to the wrong cluster.
This decomposition makes silhouette uniquely useful for debugging. When a clustering produces a mediocre average score, you can inspect the worst-scoring points to understand whether the problem is boundary ambiguity, cluster overlap, or outright misassignment. No other standard internal metric provides this level of diagnostic detail.
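The decomposition is easy to inspect in code. A minimal sketch (assuming numpy and Euclidean distance; `explain_point` is a hypothetical helper, and every cluster is assumed to have at least two points):

```python
import numpy as np

def explain_point(X, labels, i):
    """Decompose point i's silhouette into its two distances.

    Returns (a, b, s) so you can see *why* the point scores as it does:
    a = mean distance to its own cluster, b = mean distance to the
    nearest other cluster, s = (b - a) / max(a, b).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(X - X[i], axis=1)   # distances from point i to all points
    own = (labels == labels[i])
    own[i] = False
    a = d[own].mean()
    b = min(d[labels == c].mean() for c in set(labels) if c != labels[i])
    return a, b, (b - a) / max(a, b)
```

A point sitting between two 1-D clusters gets a and b of similar size and a score near 0; flip its label and a exceeds b, pushing the score negative.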
Click a Point to Explore
Click any point to see its silhouette score calculated step by step — blue lines show intra-cluster distances (a), red lines show nearest-cluster distances (b).
Silhouette Plots and k-Selection
The silhouette plot is the canonical visualization for this metric. Points are sorted by score within each cluster, producing “knife shapes.” A good clustering shows wide, uniform knives — all clusters have consistently high scores. A poor clustering shows thin, jagged knives with negative tails. The average silhouette across k values identifies the optimal cluster count, but the plot's shape matters as much as the number — uniform widths across clusters indicate balanced, well-separated groups.
When using silhouette for k-selection, look beyond the average. A clustering with k = 3 and average silhouette 0.55 where all clusters score uniformly is often preferable to k = 2 with average 0.60 where one cluster scores 0.85 and the other scores 0.35. The plot reveals this imbalance immediately, while the average alone would mislead you into choosing fewer clusters.
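Averaging per-point scores gives the single number used for k-selection. A brute-force sketch (numpy, Euclidean distance; in practice you would score the label vector produced by your clustering algorithm at each candidate k — the `kmeans_labels` mapping in the comment is hypothetical):

```python
import numpy as np

def mean_silhouette(X, labels):
    """Average silhouette for one labeling, via the full O(n^2) distance matrix."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    out = []
    for i in range(len(X)):
        own = (labels == labels[i])
        own[i] = False
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        out.append((b - a) / max(a, b))
    return float(np.mean(out))

# Typical k-selection loop (labels from your clusterer, e.g. k-means):
# scores = {k: mean_silhouette(X, kmeans_labels[k]) for k in (2, 3, 4)}
```

On three evenly spaced 1-D clusters, the correct three-cluster labeling scores higher than a forced two-cluster merge — but as the text above notes, check the per-cluster plot as well as this average.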
Silhouette Analysis Across k
The silhouette plot (left) shows per-point scores grouped by cluster. The line chart (right) summarizes the average. Clean “knife shapes” at k=3 confirm optimal clustering.
k = 3: Clean, uniform knife shapes across all three clusters — each cluster has consistently high silhouette scores with no negative values. This is the signature of well-separated, correctly-identified clusters.
Strengths and Limitations
The silhouette score has several meaningful advantages. It works with arbitrary cluster shapes because it relies on pairwise distances rather than centroids — no assumption about convexity or globular geometry. Its per-point scores enable diagnosis of specific misassignments, something no other standard internal metric provides. The bounded [-1, +1] range is immediately interpretable without context. And it works with any distance metric — Euclidean, cosine, Manhattan, or custom domain-specific distances.
However, the silhouette has real limitations. Its O(n²) computation from pairwise distances makes it impractical for datasets above roughly 50,000 points without approximation or sampling. The average silhouette can mask problems — one excellent cluster and one terrible cluster might average to a “decent” score. It does not account for density differences between clusters, so a sparse cluster far from a dense cluster may score well despite poor internal cohesion. And for very high-dimensional data, distance concentration effects can compress the score range, reducing its discriminative power.
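One common workaround for the O(n²) cost is to score only a random subsample. A hedged sketch (numpy, Euclidean distance; `sampled_mean_silhouette` is our name for it, not a library function): each sampled point is still compared against the full dataset, so its a and b are exact — only the average is approximated.

```python
import numpy as np

def sampled_mean_silhouette(X, labels, sample_size=1000, seed=0):
    """Estimate the mean silhouette from a random subsample of points.

    O(sample_size * n) instead of O(n^2): we never build the full
    pairwise distance matrix, just one distance row per sampled point.
    """
    rng = np.random.default_rng(seed)
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    out = []
    for i in idx:
        d = np.linalg.norm(X - X[i], axis=1)   # O(n) per sampled point
        own = (labels == labels[i])
        own[i] = False
        a = d[own].mean()
        b = min(d[labels == c].mean() for c in set(labels) if c != labels[i])
        out.append((b - a) / max(a, b))
    return float(np.mean(out))
```

For comparison, scikit-learn's `silhouette_score` exposes the same idea through its `sample_size` parameter.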
Comparing Clustering Metrics
Each internal clustering metric makes different geometric assumptions and trades off speed against diagnostic depth.
| Property | Calinski-Harabasz | Silhouette Score | Davies-Bouldin |
|---|---|---|---|
| Formula | SS_B / SS_W (normalized) | (b - a) / max(a, b) | avg max (σi+σj)/dij |
| Better When | Higher | Higher | Lower |
| Range | [0, ∞) | [-1, 1] | [0, ∞) |
| Complexity | O(n·k) | O(n²) | O(n·k) |
| Convexity Bias | Assumes convex | Shape-agnostic | Assumes convex |
| Best For | Fast k-selection with k-means | Diagnosing individual point assignments | Worst-case cluster overlap detection |
Use Silhouette when...
- You need per-point diagnostics (which points are misassigned?)
- Clusters may be non-convex (crescents, rings, arbitrary shapes)
- You need a bounded, interpretable score (−1 to +1)
Consider alternatives when...
- Dataset is large (O(n²) becomes prohibitive) — use CH instead
- You only need to compare k values, not diagnose points — use CH
- You want cluster-level worst-case analysis — use Davies-Bouldin
Key Takeaways
- Silhouette measures per-point fit — s(i) = (b - a) / max(a, b), where a is intra-cluster distance and b is nearest-cluster distance. Scores range from −1 (wrong cluster) to +1 (perfect fit).
- Silhouette plots reveal cluster quality visually — wide, uniform “knife shapes” indicate well-separated clusters. Negative tails signal misassigned boundary points.
- Shape-agnostic but O(n²) — unlike centroid-based metrics (CH, DB), silhouette works for arbitrary geometries. But pairwise distance computation limits scalability to moderate dataset sizes.
- Per-point diagnostics are the key advantage — no other standard clustering metric tells you which specific points are problematic. Use silhouette when you need to understand why a clustering fails, not just whether it does.
Related Concepts
- Calinski-Harabasz Index — Fast variance-ratio metric for convex clusters
- Davies-Bouldin Index — Worst-case cluster similarity analysis
