What is Graph Pooling?
Graph pooling reduces graph size while preserving important structural and feature information. Similar to pooling in CNNs, it creates hierarchical representations by progressively coarsening the graph through node clustering or selection.
Pooling Methods
TopK Pooling
- Mechanism: Select top-k nodes by learned scores
- Advantages: Simple, efficient, parameter-light
- Limitations: May disconnect graph structure
SAGPool (Self-Attention Graph Pooling)
- Mechanism: Use self-attention to compute node importance
- Advantages: Structure-aware selection
- Trade-offs: Higher computational cost
DiffPool (Differentiable Pooling)
- Mechanism: Learn soft cluster assignments
- Advantages: End-to-end differentiable, preserves gradients
- Challenges: Dense assignment matrix, O(n²) memory
MinCutPool
- Mechanism: Minimize normalized cut objective
- Advantages: Preserves cluster structure
- Considerations: Complex optimization, orthogonality constraints
Hierarchical Architecture
Level 0: Original Graph (n nodes) ↓ Pool (ratio=0.5) Level 1: Coarsened Graph (n/2 nodes) ↓ Pool (ratio=0.5) Level 2: Abstract Graph (n/4 nodes) ↓ Global Pool Level 3: Graph Representation (1 vector)
Readout Operations
Global Readout
- Mean: Average all node features
- Max: Take maximum across nodes
- Sum: Aggregate all features
- Attention: Weighted sum with learned weights
Hierarchical Readout
Concatenate representations from multiple levels:
h_graph = [h_level0 || h_level1 || h_level2]
Applications
- Graph Classification: Molecular property prediction
- Graph Regression: Protein function prediction
- Graph Generation: Hierarchical graph synthesis
- Graph Clustering: Community detection
Best Practices
- Choose pooling ratio based on graph size and task
- Use auxiliary losses (link prediction, entropy) for DiffPool
- Combine multiple readout strategies
- Monitor information loss across levels
- Consider graph connectivity preservation
Related concepts
Learn adaptive tiling in vision transformers: dynamically partition images based on visual complexity to reduce token counts while preserving detail.
Learn batch normalization in deep learning: how normalizing layer inputs accelerates training, improves gradient flow, and acts as regularization.
BatchNorm normalizes over the batch and spatial axes; LayerNorm normalizes over the channel and spatial axes for each sample. The choice changes whether your model trains stably with batch=1, depends on batch composition at inference, and behaves consistently across train and eval.
How the Calinski-Harabasz index evaluates clustering quality by measuring the ratio of between-cluster to within-cluster variance — fast, intuitive, and ideal for k-selection with convex clusters.
Understanding complete, dimensional, and cluster collapse — the failure modes that every self-supervised method must prevent. Learn why collapse happens and how contrastive, asymmetric, regularization, and masking approaches solve it.
Understand contrastive loss for representation learning: interactive demos of InfoNCE, triplet loss, and embedding space clustering with temperature tuning.
