Overview
Object detection has been dominated by two paradigms: anchor-based methods that define pre-set reference boxes, and anchor-free methods that predict objects directly from feature points. While anchor-based detectors like Faster R-CNN pioneered modern detection, anchor-free approaches like FCOS and CenterNet have emerged as simpler alternatives with competitive accuracy.
The choice between these paradigms affects everything from hyperparameter complexity to inference speed. Understanding their differences is essential for choosing the right approach for your detection task.
Key Concepts
- Anchor Boxes: Pre-defined reference boxes of various scales and aspect ratios placed at each spatial location in the feature map. The detector predicts adjustments to these anchors.
- IoU Matching: The process of assigning ground truth boxes to anchors based on Intersection over Union overlap. High IoU → positive sample, low IoU → negative sample.
- Per-Pixel Prediction: An anchor-free approach in which each feature point inside a ground truth box directly predicts the object's bounding box without reference anchors.
- Center-ness Score: A quality score in FCOS that down-weights predictions from points far from the object center, reducing low-quality detections.
- Keypoint Detection: CenterNet's approach of detecting objects as single center points in a heatmap, then regressing the object size at each peak.
- Label Assignment: The strategy for deciding which predictions should match which ground truth boxes during training. It has evolved from static IoU thresholds to dynamic methods.
The Anchor-Based Paradigm
Anchor-based detection revolutionized the field starting with Faster R-CNN in 2015. The key idea: place multiple pre-defined reference boxes (anchors) at each spatial location, then train the network to classify which anchors contain objects and refine their coordinates.
Understanding Anchor Boxes
Key Insight: With 9 anchors per location (3 scales × 3 ratios) on a 7×7 feature map grid, an anchor-based detector pre-defines 7 × 7 × 9 = 441 reference boxes across the image. The network learns to predict which anchors contain objects and how to refine their coordinates.
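As a rough sketch of where the 441 figure comes from, the snippet below enumerates 3 scales × 3 ratios at every cell of a 7×7 grid. The function name `make_anchors`, the stride of 32, and the specific scale and ratio values are illustrative assumptions, not the settings of any particular detector.

```python
import math

def make_anchors(grid_size=7, stride=32,
                 scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes centered on each grid cell."""
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center of this grid cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 7 * 7 * 9 = 441
```

The anchor count grows linearly with the number of grid cells, which is why real feature maps quickly reach thousands of anchors per image.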
How Anchor Matching Works
During training, each anchor must be assigned as a positive (contains object), negative (background), or ignored sample. This assignment uses IoU (Intersection over Union) between anchors and ground truth boxes.
IoU Formula: IoU(A, B) = area(A ∩ B) / area(A ∪ B)
Intersection over Union measures the overlap between two boxes, here an anchor (or predicted box) and a ground truth box. It ranges from 0 (no overlap) to 1 (identical boxes).
Training Strategy: Only positive anchors contribute to localization loss. Negative anchors help the network learn what is not an object. Ignored anchors prevent confusing gradients from ambiguous cases.
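A minimal sketch of IoU and threshold-based assignment is shown below, assuming boxes in (x1, y1, x2, y2) format. The helper names `iou` and `assign_anchor` are hypothetical, and the sketch omits the extra rule some detectors use of always marking the best-matching anchor for each ground truth box as positive.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label one anchor by its best IoU against all ground truth boxes."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best < neg_thr:
        return "negative"
    return "ignored"  # ambiguous overlap, excluded from the loss

gts = [(0, 0, 10, 10)]
print(assign_anchor((0, 0, 10, 10), gts))    # exact overlap
print(assign_anchor((5, 0, 15, 10), gts))    # partial overlap, IoU = 1/3
print(assign_anchor((20, 20, 30, 30), gts))  # no overlap
```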
Key characteristics of anchor-based detection:
- Pre-defined boxes: Typically 9 anchors per location (3 scales × 3 ratios)
- Offset prediction: Network predicts (dx, dy, dw, dh) adjustments
- IoU thresholds: Detector-specific, for example 0.7 for positive and 0.3 for negative in Faster R-CNN's region proposal network
- Dense predictions: Thousands of anchors evaluated per image
Representative detectors: Faster R-CNN (2015), SSD (2016), RetinaNet (2017), YOLOv3 (2018)
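The (dx, dy, dw, dh) offsets listed above are commonly parameterized as in Faster R-CNN: center shifts are scaled by the anchor size, and width and height are predicted in log space. The sketch below assumes that parameterization and (x1, y1, x2, y2) boxes; `encode` and `decode` are illustrative names, and real implementations often add normalization constants that are left out here.

```python
import math

def encode(anchor, gt):
    """Compute (dx, dy, dw, dh) regression targets for a ground truth box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + gw / 2, gt[1] + gh / 2
    return ((gcx - acx) / aw, (gcy - acy) / ah,
            math.log(gw / aw), math.log(gh / ah))

def decode(anchor, deltas):
    """Apply predicted (dx, dy, dw, dh) offsets to an anchor."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 1.0, 14.0, 9.0)
deltas = encode(anchor, gt)
print(decode(anchor, deltas))  # recovers gt up to floating-point error
```

Zero offsets leave the anchor unchanged, so the network only has to learn corrections relative to each reference box.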
The Anchor-Free Paradigm
Anchor-free methods emerged as a simpler alternative. Instead of pre-defined reference boxes, they predict objects directly from feature points. Two main approaches have gained prominence: FCOS (distance regression) and CenterNet (keypoint detection).
FCOS: Distance-Based Regression
FCOS treats every feature point inside a ground truth box as a positive sample. Each point predicts four distances: (l, t, r, b) - left, top, right, bottom distances to the box edges.
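The distance targets can be sketched in a few lines, assuming boxes in (x1, y1, x2, y2) format and a point given in image coordinates. The function names `fcos_targets` and `fcos_decode` are hypothetical.

```python
def fcos_targets(point, box):
    """Return (l, t, r, b) distances from a point to the box edges,
    or None if the point is not strictly inside the box."""
    x, y = point
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None  # point lies outside the box: not a positive sample
    return (l, t, r, b)

def fcos_decode(point, dists):
    """Rebuild the box from a point and its predicted distances."""
    x, y = point
    l, t, r, b = dists
    return (x - l, y - t, x + r, y + b)

box = (10, 20, 110, 80)
print(fcos_targets((30, 40), box))               # (20, 20, 80, 40)
print(fcos_decode((30, 40), (20, 20, 80, 40)))   # (10, 20, 110, 80)
```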
Center-ness Formula: centerness = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )
The score is 1 at the box center and approaches 0 toward the edges, so it is higher when the point is closer to the box center.
FPN Level Assignment: Each point is assigned to a pyramid level based on max(l, t, r, b), for example max(l, t, r, b) = 100px.
FCOS Advantage: No anchor boxes needed! Every feature point inside a ground truth box can predict the object by regressing the four distances directly. The center-ness branch helps suppress low-quality predictions.
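The center-ness score described above is defined in the FCOS paper as the square root of the product of the left/right and top/bottom distance ratios. A minimal sketch, assuming all four distances are positive (that is, the point lies inside the box):

```python
import math

def centerness(l, t, r, b):
    """FCOS center-ness: 1.0 at the box center, near 0 at the edges."""
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)

print(centerness(50, 30, 50, 30))  # point at the exact center
print(centerness(20, 20, 80, 40))  # off-center point scores lower
```

At inference time, FCOS multiplies this score with the classification score, so off-center predictions rank lower before NMS.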
FCOS innovations:
- No anchors: Every point inside GT predicts directly
- Center-ness branch: Suppresses low-quality edge predictions
- Multi-scale FPN assignment: Objects assigned to appropriate pyramid level based on size
- Still requires NMS: Post-processing to remove duplicate detections
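The multi-scale FPN assignment in the list above can be illustrated with the regression ranges reported in the FCOS paper (0, 64, 128, 256, 512, ∞ for levels P3 to P7). Treating each range as half-open is a simplification made here, and the `fpn_level` helper is a hypothetical name; tune the ranges to the object sizes in your own dataset.

```python
# Regression ranges per pyramid level, following the FCOS paper defaults.
FPN_RANGES = [
    ("P3", 0, 64),
    ("P4", 64, 128),
    ("P5", 128, 256),
    ("P6", 256, 512),
    ("P7", 512, float("inf")),
]

def fpn_level(l, t, r, b):
    """Pick the pyramid level responsible for a point's regression target."""
    longest = max(l, t, r, b)
    for name, low, high in FPN_RANGES:
        if low <= longest < high:
            return name
    return None

print(fpn_level(20, 20, 100, 40))  # max distance 100 falls in the P4 range
```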
CenterNet: Keypoint Detection
CenterNet takes a different approach: detect objects as single center points in a heatmap. Gaussian peaks mark object centers, and the network regresses (width, height) at each peak.
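To make the training target concrete, the sketch below renders a Gaussian peak at each object center and takes the element-wise maximum where peaks overlap. The fixed `sigma` is an assumption for brevity; CenterNet derives the spread from each object's size. `render_heatmap` is a hypothetical helper name.

```python
import math

def render_heatmap(centers, height, width, sigma=1.0):
    """Build a single-class heatmap with a Gaussian peak per (cx, cy) center."""
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-dist_sq / (2 * sigma ** 2))
                # Overlapping Gaussians keep the larger value, not the sum.
                heat[y][x] = max(heat[y][x], value)
    return heat

heat = render_heatmap([(2, 3)], height=6, width=6)
print(heat[3][2])  # value at the object center
print(heat[3][3])  # one pixel away, lower than the center
```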
Why No NMS?
Each object center produces a single peak in the heatmap, so local maxima extraction yields one detection per object without NMS. The exception is objects whose centers coincide at heatmap resolution, which merge into one peak.
CenterNet Insight: Objects are represented as single points (their centers). The network predicts a heatmap where peaks indicate object centers, plus size regression at each location. This elegantly avoids both anchor design and NMS post-processing.
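Peak extraction can be sketched as a 3×3 local-maximum test followed by a size lookup at each peak. The threshold value, the dictionary-based `sizes` lookup, and the helper names below are illustrative assumptions; the toy heatmap stands in for a network output.

```python
def find_peaks(heat, threshold=0.3):
    """Return (x, y, score) for cells that are maximal in their 3x3 window."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = heat[y][x]
            if value < threshold:
                continue
            window = [heat[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            if value >= max(window):
                peaks.append((x, y, value))
    return peaks

def decode_boxes(peaks, sizes):
    """Turn peaks plus regressed (width, height) into (x1, y1, x2, y2, score)."""
    boxes = []
    for x, y, score in peaks:
        bw, bh = sizes[(x, y)]
        boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2, score))
    return boxes

heat = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.1, 0.0],
    [0.1, 0.2, 0.4, 0.3, 0.1],
    [0.0, 0.1, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
sizes = {(1, 1): (2.0, 2.0), (3, 3): (4.0, 2.0)}  # hypothetical size regression
peaks = find_peaks(heat)
print(decode_boxes(peaks, sizes))
```

The cell at (2, 2) scores 0.4 but is not kept, because a neighbor within its 3×3 window scores higher; this is the mechanism that replaces NMS.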
CenterNet advantages:
- No anchors, no NMS: Peaks naturally yield one detection per object
- Simple pipeline: Just heatmap + size regression
- Works for multiple tasks: Pose estimation, 3D detection with the same framework
- Clean design: Minimal hyperparameters
Side-by-Side Comparison
Anchor-Based: Multiple anchors per location, matched by IoU
Anchor-Free: Every point inside the ground truth box predicts distances directly
Evolution Insight: Both paradigms achieve similar accuracy today. Anchor-free methods simplify the detection pipeline by removing anchor design, while anchor-based methods benefit from years of optimization. Modern detectors like YOLOX draw on both lines of work, pairing an anchor-free head with SimOTA dynamic label assignment.
| Aspect | Anchor-Based | Anchor-Free |
|---|---|---|
| Reference boxes | Pre-defined anchors (K per location) | None - direct prediction |
| Box prediction | Predict offsets (dx, dy, dw, dh) | Predict distances or center+size |
| Training assignment | IoU-based matching | Per-pixel or keypoint |
| Hyperparameters | Anchor scales, ratios, IoU thresholds | FPN ranges, center sampling |
| Post-processing | NMS required | NMS (FCOS) or NMS-free (CenterNet) |
| Design complexity | More complex (anchor design) | Simpler (no anchors to tune) |
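For reference, the NMS post-processing step in the table can be sketched as a greedy loop: keep the highest-scoring box, then drop any remaining box that overlaps a kept one too much. Detections are assumed to be (x1, y1, x2, y2, score) tuples, and the 0.5 IoU threshold is an illustrative default.

```python
def iou(a, b):
    """Intersection over Union using the first four values of each tuple."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thr=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2, score) tuples."""
    keep = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(det, kept) <= iou_thr for kept in keep):
            keep.append(det)
    return keep

detections = [
    (0, 0, 10, 10, 0.9),
    (1, 1, 11, 11, 0.8),    # near-duplicate of the first box
    (20, 20, 30, 30, 0.7),  # separate object
]
print(nms(detections))
```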
Evolution of Label Assignment
How we assign ground truth to predictions has evolved significantly. Modern methods move beyond simple IoU thresholds to dynamic, learning-based assignment.
Static IoU Assignment (2015-2017): Fixed IoU thresholds (0.7/0.3) determine positive/negative samples. Innovation: simple and interpretable. Limitation: sensitive to IoU threshold choice.
Evolution Trend: Label assignment has evolved from simple fixed rules to dynamic, learning-based approaches. Modern methods like SimOTA and set prediction achieve better assignment by considering the global optimization of all predictions together.
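To make global assignment concrete, the sketch below performs one-to-one matching by choosing the pairing of predictions to ground truth boxes with the lowest total cost. DETR solves this with the Hungarian algorithm; the brute-force search over permutations here is a stand-in that only works for tiny inputs, and the cost values are made up for illustration.

```python
from itertools import permutations

def match_one_to_one(cost):
    """cost[p][g] is the cost of assigning prediction p to ground truth g.
    Assumes at least as many predictions as ground truths.
    Returns the (prediction, ground_truth) pairs with minimal total cost."""
    n_preds, n_gts = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    for pred_ids in permutations(range(n_preds), n_gts):
        total = sum(cost[p][g] for g, p in enumerate(pred_ids))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_pairs

# Three predictions, two ground truth boxes (hypothetical costs).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
pairs = match_one_to_one(cost)
print(pairs)  # prediction 2 stays unmatched and is trained as background
```

Because each ground truth box gets exactly one prediction, duplicate detections are penalized during training rather than filtered afterwards.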
Assignment strategy progression:
- Static IoU (2015-2017): Fixed thresholds (0.7/0.3), simple but sensitive to threshold choice
- ATSS (2019): Adaptive selection based on anchor-GT statistics
- OTA/SimOTA (2021): Optimal transport formulation for global assignment
- Set Prediction (2020+): Hungarian matching for one-to-one assignment (DETR)
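The adaptive selection in ATSS can be sketched for a single ground truth box and a single pyramid level: take the k anchors whose centers are closest to the object center, then use the mean plus standard deviation of their IoUs as the positive threshold. Using the population standard deviation, a single level, and the helper names below are simplifying assumptions.

```python
import math
import statistics

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def atss_assign(anchors, gt, k=9):
    """Return (indices of positive anchors, adaptive IoU threshold)."""
    gcx, gcy = center(gt)

    def dist_to_gt(i):
        cx, cy = center(anchors[i])
        return math.hypot(cx - gcx, cy - gcy)

    # Step 1: candidates are the k anchors closest to the object center.
    candidates = sorted(range(len(anchors)), key=dist_to_gt)[:k]
    ious = [iou(anchors[i], gt) for i in candidates]
    # Step 2: the threshold adapts to this object's IoU statistics.
    threshold = statistics.mean(ious) + statistics.pstdev(ious)
    positives = []
    for i, value in zip(candidates, ious):
        cx, cy = center(anchors[i])
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if value >= threshold and inside:
            positives.append(i)
    return positives, threshold

anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (5, 5, 15, 15), (20, 20, 30, 30)]
positives, threshold = atss_assign(anchors, gt=(0, 0, 10, 10), k=3)
print(positives, round(threshold, 3))
```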
Real-World Applications
- Real-time Detection: When inference speed is critical
- High Accuracy Requirements: When detection accuracy matters most
- Dense Object Scenes: Many small, overlapping objects
- Variable Object Shapes: Objects with extreme aspect ratios
- New Domain Transfer: Applying detectors to medical imaging or satellite imagery
- End-to-End Systems: When you want to eliminate NMS post-processing
Advantages & Limitations
Advantages
- ✓ Both paradigms achieve state-of-the-art accuracy today
- ✓ Anchor-free methods have simpler pipelines with fewer hyperparameters
- ✓ CenterNet eliminates NMS entirely for cleaner inference
- ✓ Modern methods like YOLOX pair an anchor-free head with dynamic label assignment
- ✓ Anchor-free methods transfer to new domains without anchor redesign
- ✓ FCOS center-ness down-weights low-quality predictions far from object centers
Limitations
- × Anchor-based detection requires careful anchor design per dataset
- × IoU threshold sensitivity can hurt anchor-based performance
- × FCOS still requires NMS post-processing
- × CenterNet may miss closely overlapping objects
- × Set prediction (DETR) has slow training convergence
- × No clear winner: the choice depends on specific requirements
Best Practices
- Start with Modern Anchor-Free: YOLOX or similar modern anchor-free detectors offer good accuracy with simpler setup. No anchor tuning required.
- Use ATSS-style Assignment: If using anchors, adaptive assignment (ATSS) replaces fixed IoU thresholds with per-object statistics, which reduces threshold sensitivity and brings accuracy close to anchor-free detectors.
- Consider CenterNet for Speed: If you need NMS-free detection and can tolerate a slight accuracy drop on crowded scenes, CenterNet simplifies deployment.
- Match FPN Levels to Object Sizes: For anchor-free methods, ensure proper FPN level assignment based on object scale ranges in your dataset.
- Don't Over-Tune Anchors: If anchor-based accuracy isn't improving, switching to anchor-free may be more effective than anchor optimization.
- Evaluate on Your Data: Neither paradigm universally wins. Test both on your specific dataset and deployment constraints.
When to Choose Which
| Scenario | Recommended Approach | Reason |
|---|---|---|
| New project, good GPU | Anchor-free (YOLOX) | Simpler setup, competitive accuracy |
| Existing anchor-based codebase | Keep anchors + ATSS | Minimal changes, removes threshold sensitivity |
| Need NMS-free inference | CenterNet or DETR | Cleaner pipeline, easier deployment |
| Dense small objects | Anchor-based + careful tuning | More anchors can capture more objects |
| Domain transfer | Anchor-free | No domain-specific anchor redesign |
| Research/new architectures | Anchor-free | Easier to modify and experiment |
The Convergence
Modern detectors increasingly blur the line between paradigms:
- YOLOX: Anchor-free head with SimOTA dynamic assignment
- YOLOv8: Anchor-free with distribution-based regression
- RT-DETR: Real-time transformer detector with set prediction
The trend is toward simpler, more adaptive methods that achieve high accuracy without manual anchor design. The "best" approach continues to evolve as new techniques emerge.
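The distribution-based regression mentioned for YOLOv8 can be illustrated as follows: instead of predicting one number per box side, the head predicts logits over a set of discrete distance bins, and the distance is the expected value under the softmax distribution. This is a sketch in the style of Distribution Focal Loss; the bin count and the idea that bins are in units of the feature stride are assumptions here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_distance(logits):
    """Distance to one box side as the mean of a discrete distribution."""
    probs = softmax(logits)
    return sum(index * p for index, p in enumerate(probs))

print(expected_distance([0.0, 0.0, 0.0, 0.0]))    # uniform over bins 0..3
print(expected_distance([-9.0, -9.0, -9.0, 9.0])) # mass concentrated on bin 3
```

A sharp distribution signals a confident edge location, while a flat one signals ambiguity, which a single regressed number cannot express.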
Further Reading
- Feature Pyramid Networks for Object Detection - FPN foundation
- Focal Loss for Dense Object Detection - RetinaNet
- FCOS: Fully Convolutional One-Stage Object Detection - FCOS paper
- Objects as Points - CenterNet paper
- Bridging the Gap Between Anchor-based and Anchor-free Detection - ATSS paper
- YOLOX: Exceeding YOLO Series in 2021 - YOLOX with SimOTA
- End-to-End Object Detection with Transformers - DETR paper
