Graph Pooling Methods

Hierarchical graph coarsening techniques: TopK, SAGPool, DiffPool, MinCutPool, and readout operations for graph-level representations

What is Graph Pooling?

Graph pooling reduces graph size while preserving important structural and feature information. Similar to pooling in CNNs, it creates hierarchical representations by progressively coarsening the graph through node clustering or selection.

Pooling Methods

TopK Pooling

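Mechanism: Score each node with a learned projection of its features, then keep the k highest-scoring nodes and the edges among them
Advantages: Simple, efficient, parameter-light
Limitations: Dropping nodes also drops their edges, so the coarsened graph can become disconnected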
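
As a minimal sketch of the selection step described above, the NumPy code below scores nodes by projecting features onto a vector, keeps the top-scoring fraction, and slices the adjacency matrix. The function name `topk_pool`, the fixed vector `p` (standing in for a learned parameter), and the tanh gate are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def topk_pool(x, adj, p, ratio=0.5):
    """Keep the ceil(ratio * n) highest-scoring nodes and the edges among them."""
    n = x.shape[0]
    k = max(1, int(np.ceil(ratio * n)))
    scores = x @ p / np.linalg.norm(p)        # one scalar score per node
    idx = np.sort(np.argsort(-scores)[:k])    # top-k node indices, original order
    gate = np.tanh(scores[idx])[:, None]      # gating lets gradients reach p in a real model
    x_pooled = x[idx] * gate
    adj_pooled = adj[np.ix_(idx, idx)]        # induced subgraph on the kept nodes
    return x_pooled, adj_pooled, idx

# Path graph 0-1-2-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]])
p = np.array([1.0, 0.0])  # hypothetical stand-in for the learned projection vector

x_pooled, adj_pooled, idx = topk_pool(x, adj, p, ratio=0.5)
print(idx)         # [0 3]
print(adj_pooled)  # all zeros: nodes 0 and 3 were not neighbors
```

The kept nodes 0 and 3 share no edge, so this small case also shows the disconnection limitation noted above.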

SAGPool (Self-Attention Graph Pooling)

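Mechanism: Compute node importance scores with a graph convolution (the self-attention scores), then keep the top-scoring nodes
Advantages: Structure-aware selection, since scores depend on both node features and topology
Trade-offs: Higher computational cost than TopK because of the extra graph convolution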
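
The sketch below illustrates the structure-aware scoring idea under simple assumptions: scores come from a single GCN-style layer with one output channel, followed by the same top-k selection as TopK pooling. The names `gcn_scores` and `sag_pool` and the fixed weight `theta` (standing in for a learned parameter) are hypothetical.

```python
import numpy as np

def gcn_scores(x, adj, theta):
    """One GCN-style layer with a single output channel: one score per node."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.tanh(a_norm @ x @ theta)

def sag_pool(x, adj, theta, ratio=0.5):
    k = max(1, int(np.ceil(ratio * x.shape[0])))
    scores = gcn_scores(x, adj, theta)
    idx = np.sort(np.argsort(-scores)[:k])
    return x[idx] * scores[idx][:, None], adj[np.ix_(idx, idx)], idx

# Star graph: node 0 is the hub, nodes 1-3 are leaves
adj = np.zeros((4, 4))
adj[0, 1:] = adj[1:, 0] = 1.0
x = np.ones((4, 1))          # identical features on every node
theta = np.array([1.0])      # hypothetical stand-in for the learned weight

scores = gcn_scores(x, adj, theta)
x_pooled, adj_pooled, idx = sag_pool(x, adj, theta, ratio=0.25)
print(idx)  # [0]
```

Because every node has the same features, a feature-only score would tie; here the hub wins purely because of its position in the graph.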

DiffPool (Differentiable Pooling)

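Mechanism: Learn a soft assignment matrix that maps each node to one of k clusters; the clusters become the nodes of the coarsened graph
Advantages: End-to-end differentiable; no hard node selection, so every node contributes to the pooled graph
Challenges: Dense assignment matrix and dense adjacency; memory grows as O(n²) when the cluster count is a fixed fraction of n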
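
A minimal sketch of the coarsening step, assuming the assignment logits are already available: in DiffPool they would come from a GNN, but random values are used here purely for illustration. The names `diff_pool`, `z`, and `logits` are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def diff_pool(z, adj, logits):
    s = softmax(logits, axis=1)     # (n, k) soft assignment, each row sums to 1
    x_pooled = s.T @ z              # (k, d) cluster features
    adj_pooled = s.T @ adj @ s      # (k, k) dense coarsened adjacency
    return x_pooled, adj_pooled, s

rng = np.random.default_rng(0)
n, d, k = 6, 4, 2
upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
adj = (upper + upper.T).astype(float)     # random symmetric adjacency, no self-loops
z = rng.normal(size=(n, d))               # node embeddings (random placeholders)
logits = rng.normal(size=(n, k))          # assignment logits (random placeholders)

x_pooled, adj_pooled, s = diff_pool(z, adj, logits)
print(x_pooled.shape, adj_pooled.shape)   # (2, 4) (2, 2)
```

Both `s` and `adj_pooled` are dense even when the input graph is sparse, which is where the memory cost noted above comes from.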

MinCutPool

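Mechanism: Learn soft cluster assignments by minimizing a continuous relaxation of the normalized minimum cut objective
Advantages: Preserves cluster structure
Considerations: Needs an auxiliary orthogonality loss to avoid degenerate assignments, which complicates optimization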
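
The two loss terms can be sketched as follows, assuming a soft assignment matrix `s` of shape (n, k) and an unnormalized adjacency matrix. The function name `mincut_losses` is illustrative, and the example uses hand-written hard assignments rather than learned ones.

```python
import numpy as np

def mincut_losses(adj, s):
    """Relaxed normalized-cut loss and orthogonality loss for an assignment s."""
    k = s.shape[1]
    deg = np.diag(adj.sum(axis=1))
    cut_loss = -np.trace(s.T @ adj @ s) / np.trace(s.T @ deg @ s)
    ss = s.T @ s
    # Frobenius distance between the normalized s^T s and a scaled identity
    ortho_loss = np.linalg.norm(ss / np.linalg.norm(ss) - np.eye(k) / np.sqrt(k))
    return cut_loss, ortho_loss

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

s_good = np.repeat(np.eye(2), 3, axis=0)   # one cluster per triangle
s_bad = np.tile([1.0, 0.0], (6, 1))        # every node in cluster 0

cut_good, ortho_good = mincut_losses(adj, s_good)
cut_bad, ortho_bad = mincut_losses(adj, s_bad)
print(cut_good, ortho_good)
print(cut_bad, ortho_bad)
```

In this example the degenerate assignment reaches the lowest possible cut loss of -1, and only the orthogonality term penalizes it, which is why the two losses are used together.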

Hierarchical Architecture

Level 0: Original Graph (n nodes)
    ↓ Pool (ratio=0.5)
Level 1: Coarsened Graph (n/2 nodes)
    ↓ Pool (ratio=0.5)
Level 2: Abstract Graph (n/4 nodes)
    ↓ Global Pool
Level 3: Graph Representation (1 vector)

Readout Operations

Global Readout

Mean: Average all node features
Max: Take maximum across nodes
Sum: Add features across all nodes (the result scales with graph size)
Attention: Weighted sum with learned weights
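
The four readouts above can be sketched in a few lines of NumPy. The `readout` function and the scoring vector `w` (standing in for learned attention parameters) are illustrative assumptions.

```python
import numpy as np

def readout(x, mode="mean", w=None):
    """Collapse an (n, d) node feature matrix into a single d-dimensional vector."""
    if mode == "mean":
        return x.mean(axis=0)
    if mode == "max":
        return x.max(axis=0)
    if mode == "sum":
        return x.sum(axis=0)
    if mode == "attention":
        logits = x @ w                           # one scalar score per node
        alpha = np.exp(logits - logits.max())
        alpha = alpha / alpha.sum()              # softmax over nodes
        return alpha @ x                         # weighted sum of node features
    raise ValueError(f"unknown readout mode: {mode}")

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
print(readout(x, "mean"))  # [3. 2.]
print(readout(x, "max"))   # [5. 4.]
print(readout(x, "sum"))   # [9. 6.]

w = np.zeros(2)            # zero scores give uniform attention weights
print(readout(x, "attention", w))  # same as the mean here
```

All four are invariant to node ordering, which is what makes them usable as graph-level summaries.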

Hierarchical Readout

Concatenate representations from multiple levels:

h_graph = [h_level0 || h_level1 || h_level2]
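
A minimal sketch of the concatenation, assuming the node features at every level share the same width and that a mean readout is applied per level. The random matrices stand in for the outputs of successive pooling layers, and `hierarchical_readout` is a hypothetical name.

```python
import numpy as np

def hierarchical_readout(levels):
    """Mean-pool each level's node features and concatenate the results."""
    return np.concatenate([h.mean(axis=0) for h in levels])

rng = np.random.default_rng(0)
# Placeholder features for levels with 8, 4, and 2 nodes, 16 channels each
levels = [rng.normal(size=(n, 16)) for n in (8, 4, 2)]

h_graph = hierarchical_readout(levels)
print(h_graph.shape)  # (48,)
```

The output width grows with the number of levels, so a downstream classifier must be sized for 3 × 16 inputs in this case.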

Applications

Graph Classification: Molecular property or protein function prediction
Graph Regression: Continuous molecular property prediction (e.g., solubility)
Graph Generation: Hierarchical graph synthesis
Graph Clustering: Community detection

Best Practices

Choose pooling ratio based on graph size and task
Use auxiliary losses (link prediction, entropy) for DiffPool
Combine multiple readout strategies
Monitor information loss across levels
Consider graph connectivity preservation
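
The auxiliary losses suggested for DiffPool above can be sketched as follows, assuming a soft assignment matrix `s` of shape (n, k). The link prediction term compares the adjacency matrix with `s @ s.T`, and the entropy term averages the per-node assignment entropy. Function names and the toy graph are illustrative assumptions.

```python
import numpy as np

def link_prediction_loss(adj, s):
    """Frobenius norm of A - S S^T: linked nodes should share clusters."""
    return float(np.linalg.norm(adj - s @ s.T))

def entropy_loss(s, eps=1e-12):
    """Mean row entropy of S: low values mean near one-hot assignments."""
    return float(np.mean(-np.sum(s * np.log(s + eps), axis=1)))

# Two 2-node cliques, adjacency written with self-loops
adj = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)

s_hard = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # matches the cliques
s_soft = np.full((4, 2), 0.5)                                     # uninformative

print(link_prediction_loss(adj, s_hard), entropy_loss(s_hard))
print(link_prediction_loss(adj, s_soft), entropy_loss(s_soft))
```

The assignment that matches the clique structure drives both terms toward zero, while the uniform assignment is penalized by both.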
