Calinski-Harabasz Index: The Variance Ratio Criterion

How the Calinski-Harabasz index evaluates clustering quality by measuring the ratio of between-cluster to within-cluster variance — fast, intuitive, and ideal for k-selection with convex clusters.


What Calinski-Harabasz Measures

The CH index (also called the Variance Ratio Criterion) answers a simple question: are your clusters compact and well-separated? It computes the ratio of between-cluster scatter to within-cluster scatter, normalized by degrees of freedom. Higher values mean better-defined clusters. Proposed by Calinski and Harabasz in 1974, it remains one of the most widely used internal clustering evaluation metrics.

Think of it as a signal-to-noise ratio for clustering. The “signal” is how far apart the cluster centers are from the overall data center (between-cluster variance). The “noise” is how spread out the points are within each cluster (within-cluster variance). A good clustering maximizes the signal relative to the noise.

Mathematical Definition

The CH index is defined as:

CH = (SSB / (k - 1)) / (SSW / (n - k))

where k is the number of clusters and n is the total number of data points.

Between-cluster scatter SSB measures how far cluster centroids are from the global centroid:

SSB = Σ_{j=1}^{k} n_j ‖μ_j - μ‖²

Within-cluster scatter SSW measures how spread out points are within each cluster:

SSW = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} ‖x_i - μ_j‖²

The (k-1) and (n-k) terms normalize for the number of clusters and samples, making CH comparable across different values of k.
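These formulas can be computed directly with NumPy. The sketch below uses a hypothetical toy dataset (two tight clusters of three points each, so n = 6 and k = 2) and mirrors each formula term by term:

```python
import numpy as np

# Hypothetical toy data: two clusters of 3 points each (n = 6, k = 2)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

n, k = len(X), 2
mu = X.mean(axis=0)  # global centroid

ssb = ssw = 0.0
for j in range(k):
    Cj = X[labels == j]
    mu_j = Cj.mean(axis=0)                         # cluster centroid
    ssb += len(Cj) * np.sum((mu_j - mu) ** 2)      # n_j * ||mu_j - mu||^2
    ssw += np.sum((Cj - mu_j) ** 2)                # sum of ||x_i - mu_j||^2

ch = (ssb / (k - 1)) / (ssw / (n - k))
print(ssb, ssw, ch)  # SSB = 300.0, SSW ≈ 2.667, CH = 450.0
```

The two clusters are far apart (large SSB) and very tight (small SSW), so the ratio is large, exactly the "signal-to-noise" behavior described above.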

Exploring CH Interactively

Switch between preset scenarios to see how cluster arrangement affects the CH index. Watch how SSB and SSW shift as clusters move closer, overlap, or take non-convex shapes.

CH Index Playground

Explore how cluster geometry affects the Calinski-Harabasz index.

In the default scenario (three tight, well-separated clusters), SS_B = 882.2 and SS_W = 34.7, giving a high CH index of 1485.6. SS_B dominates because the centroids are far from the global center, while SS_W is small because points cluster tightly.

How the Calculation Works

The computation follows a clear sequence. First, compute the global centroid from all data points. Then compute each cluster's centroid. SSB accumulates the squared distance from each cluster centroid to the global centroid, weighted by cluster size — larger clusters contribute more. SSW accumulates the squared distance from every point to its own cluster centroid. Finally, the ratio is normalized by degrees of freedom.

The degrees-of-freedom normalization is what makes CH fair across different values of k. Without it, increasing k would almost always increase SSB (more centroids spread further from the global center) and decrease SSW (smaller clusters are tighter). The (k-1) in the numerator and (n-k) in the denominator correct for this, penalizing unnecessary splits.
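The sequence above translates into a short function. This is a sketch assuming scikit-learn and NumPy are available; it cross-checks the hand-rolled version against scikit-learn's built-in `calinski_harabasz_score` on a synthetic blob dataset:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

def ch_index(X, labels):
    """Calinski-Harabasz index, following the sequence described above."""
    n, k = len(X), len(np.unique(labels))
    mu = X.mean(axis=0)                              # 1. global centroid
    ssb = ssw = 0.0
    for j in np.unique(labels):
        Cj = X[labels == j]
        mu_j = Cj.mean(axis=0)                       # 2. per-cluster centroid
        ssb += len(Cj) * np.sum((mu_j - mu) ** 2)    # 3. size-weighted between-scatter
        ssw += np.sum((Cj - mu_j) ** 2)              # 4. within-scatter
    return (ssb / (k - 1)) / (ssw / (n - k))         # 5. degrees-of-freedom ratio

X, y = make_blobs(n_samples=300, centers=3, random_state=0)
print(ch_index(X, y))
print(calinski_harabasz_score(X, y))  # should agree with the hand-rolled version
```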

How CH Is Calculated

Step-by-step decomposition of the variance ratio on a simple 2-cluster example.


Selecting k with CH

The most common use of CH is selecting the optimal number of clusters. Run the clustering algorithm for k = 2, 3, 4, …, compute CH at each k, and pick the k that maximizes the score. The peak indicates where adding another cluster no longer improves the separation-to-spread ratio.

Dividing SSB by (k - 1) naturally penalizes over-splitting — adding a cluster that doesn't meaningfully reduce SSW will decrease CH. This built-in regularization makes CH more robust than raw inertia for k-selection, where the “elbow” can be ambiguous.
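In code, the sweep is a short loop. A minimal sketch assuming scikit-learn, using a hypothetical synthetic dataset with four well-separated blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Hypothetical data: four well-separated blobs (true k = 4)
centers = [[0, 0], [8, 8], [-8, 8], [8, -8]]
X, _ = make_blobs(n_samples=500, centers=centers,
                  cluster_std=0.8, random_state=42)

# Run k-means for k = 2..8 and record CH at each k
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the peak should land at the true cluster count, 4
```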

Selecting k with CH Index

Click each k value to explore. CH peaks at the optimal cluster count.

In this example the score peaks at k = 3 with CH = 452.7, matching the true cluster count. Adding more clusters beyond this point doesn’t meaningfully reduce within-cluster variance (SS_W), so the ratio drops.

Strengths and Limitations

CH has several practical advantages. It runs in O(n · k) time — no pairwise distance matrix needed, unlike the Silhouette Score's O(n²). The signal-to-noise interpretation is intuitive and easy to explain to stakeholders. It works well for the convex, globular clusters produced by algorithms like k-means and GMM. And the degrees-of-freedom normalization makes it fair across different k values, so you can directly compare scores without additional correction.

However, CH has meaningful limitations. It assumes convex cluster shapes — the centroid of a crescent-shaped cluster lies in empty space, so SSW is inflated and SSB is misleading. It is sensitive to cluster size imbalance — a few small outlier clusters far from center can inflate SSB disproportionately. It cannot evaluate a single cluster (k = 1 is undefined because of the (k-1) denominator). And it does not provide per-point diagnostics — you get one number for the entire clustering, with no way to identify which individual points are poorly assigned.

Comparing Clustering Metrics

CH is one of several internal evaluation metrics. Each makes different geometric assumptions.

| Metric | Formula | Better when | Range | Complexity | Convexity bias | Best for |
|---|---|---|---|---|---|---|
| Calinski-Harabasz | SS_B / SS_W (normalized) | Higher | [0, ∞) | O(n·k) | Assumes convex | Fast k-selection with k-means |
| Silhouette Score | (b - a) / max(a, b) | Higher | [-1, 1] | O(n²) | Shape-agnostic | Diagnosing individual point assignments |
| Davies-Bouldin | avg_i max_j (σ_i + σ_j) / d_ij | Lower | [0, ∞) | O(n·k) | Assumes convex | Worst-case cluster overlap detection |
Use CH when...
  - You need fast computation (O(n·k))
  - Working with convex, globular clusters (k-means, GMM)
  - Comparing different k values on the same dataset

Consider alternatives when...
  - Clusters are non-convex (crescents, rings) — use Silhouette
  - You need per-point diagnostics — use Silhouette
  - You want worst-case analysis — use Davies-Bouldin
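All three metrics are one call each in scikit-learn, so it is cheap to compute them side by side. A sketch on a hypothetical non-convex "two moons" dataset, the kind of shape where CH's convexity assumption bites:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

# Non-convex crescents: centroid-based metrics can be misleading here
X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print("CH (higher is better):       ", calinski_harabasz_score(X, labels))
print("Silhouette (higher is better):", silhouette_score(X, labels))
print("Davies-Bouldin (lower better):", davies_bouldin_score(X, labels))
```

Note the opposite orientations: CH and Silhouette reward higher scores, Davies-Bouldin rewards lower ones, so the three numbers are not directly comparable with each other.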

Key Takeaways

  1. CH = between-cluster variance / within-cluster variance — higher means better-separated, more compact clusters. It is a signal-to-noise ratio for clustering quality.
  2. Use CH for fast k-selection — compute CH at each k, pick the peak. O(n · k) complexity makes it practical for large datasets where Silhouette's O(n²) is prohibitive.
  3. Beware of non-convex clusters — CH uses centroids, which misrepresent the geometry of crescents, rings, or other non-convex shapes. Use Silhouette Score for arbitrary geometries.
  4. Degrees of freedom matter — the (k-1) and (n-k) normalization prevents trivial score inflation from adding empty clusters, making CH a fair metric for comparing different k values.

If you found this explanation helpful, consider sharing it with others.
