Anchor-Based vs Anchor-Free Object Detection

Overview

Object detection has been dominated by two paradigms: anchor-based methods that define pre-set reference boxes, and anchor-free methods that predict objects directly from feature points. While anchor-based detectors like Faster R-CNN pioneered modern detection, anchor-free approaches like FCOS and CenterNet have emerged as simpler alternatives with competitive accuracy.

The choice between these paradigms affects everything from hyperparameter complexity to inference speed. Understanding their differences is essential for choosing the right approach for your detection task.

Key Concepts

Anchor Boxes

Pre-defined reference boxes of various scales and aspect ratios placed at each spatial location in the feature map. The detector predicts adjustments to these anchors.

IoU Matching

The process of assigning ground truth boxes to anchors based on Intersection over Union overlap. High IoU → positive sample, low IoU → negative sample; anchors between the two thresholds are typically ignored.

Per-Pixel Prediction

Anchor-free approach where each feature point inside a ground truth box directly predicts the object's bounding box without reference anchors.

Center-ness Score

A quality score in FCOS that down-weights predictions from points far from the object center, reducing low-quality detections.

Keypoint Detection

CenterNet's approach of detecting objects as single center points in a heatmap, then regressing the object size at each peak.

Label Assignment

The strategy for deciding which predictions should match which ground truth boxes during training. Evolved from static IoU to dynamic methods.

The Anchor-Based Paradigm

Anchor-based detection took off with Faster R-CNN (2015), whose Region Proposal Network introduced anchor boxes. The key idea: place multiple pre-defined reference boxes (anchors) at each spatial location, then train the network to classify which anchors contain objects and refine their coordinates.
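
As a concrete illustration, the sketch below tiles 3 scales × 3 aspect ratios over a feature map. The base size of 32 pixels, the scale values, and the convention that "ratio" means h/w are illustrative assumptions rather than any specific detector's settings, and the function names are hypothetical.

```python
import math


def anchors_at(cx, cy, base_size=32.0, scales=(1.0, 2.0, 4.0), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    Illustrative defaults (assumed): each anchor has area (base_size * scale) ** 2
    and aspect ratio h / w equal to `ratio`, giving 3 x 3 = 9 anchors.
    """
    boxes = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:
            w = math.sqrt(area / ratio)  # then w * h == area and h / w == ratio
            h = w * ratio
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def grid_anchors(feat_h, feat_w, stride, **kwargs):
    """Tile anchors over every cell of a feat_h x feat_w feature map."""
    all_boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of cell (i, j) in input-image coordinates.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            all_boxes.extend(anchors_at(cx, cy, **kwargs))
    return all_boxes


print(len(grid_anchors(2, 3, stride=16)))  # 2 * 3 cells * 9 anchors = 54
```

Even this toy 2×3 map yields 54 anchors; a real feature map multiplies that into the tens of thousands.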

How Anchor Matching Works

During training, each anchor must be assigned as a positive (contains object), negative (background), or ignored sample. This assignment uses IoU (Intersection over Union) between anchors and ground truth boxes.
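
A minimal sketch of static IoU matching, assuming boxes as (x1, y1, x2, y2) tuples and the 0.7/0.3 thresholds of Faster R-CNN's RPN. The extra rule that gives each ground truth box its best-overlapping anchor also comes from Faster R-CNN; the function names and label constants are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


POSITIVE, NEGATIVE, IGNORE = 1, 0, -1


def assign_anchors(anchors, gts, pos_thr=0.7, neg_thr=0.3):
    """Return one (label, gt_index) pair per anchor; gt_index is -1 unless positive."""
    labels = []
    for a in anchors:
        ious = [iou(a, g) for g in gts]
        best_iou = max(ious, default=0.0)
        if best_iou >= pos_thr:
            labels.append((POSITIVE, ious.index(best_iou)))
        elif best_iou < neg_thr:
            labels.append((NEGATIVE, -1))
        else:
            labels.append((IGNORE, -1))  # between thresholds: contributes no loss
    # Faster R-CNN also makes the highest-IoU anchor of each GT positive,
    # so objects with no anchor above pos_thr still get a training sample.
    for k, g in enumerate(gts):
        ious = [iou(a, g) for a in anchors]
        if ious and max(ious) > 0:
            labels[ious.index(max(ious))] = (POSITIVE, k)
    return labels


gts = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
print(assign_anchors(anchors, gts))  # [(1, 0), (-1, -1), (0, -1)]
```

The middle anchor has IoU 0.5, which falls between the thresholds, so it is ignored rather than forced into either class.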

Key characteristics of anchor-based detection:

Pre-defined boxes: Typically 9 anchors per location (3 scales × 3 ratios)
Offset prediction: Network predicts (dx, dy, dw, dh) adjustments
IoU thresholds: 0.7 for positive and 0.3 for negative in Faster R-CNN's RPN; RetinaNet uses 0.5 and 0.4
Dense predictions: Tens of thousands of anchors per image (roughly 20,000 for a 1000×600 image in Faster R-CNN)
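
The offset prediction follows the Faster R-CNN box parameterization: center shifts normalized by the anchor's width and height, plus log-space size ratios. This is a minimal sketch of encoding a ground truth box against an anchor and decoding it back; it leaves out the fixed per-coordinate weights that many codebases apply to the deltas, and the helper names are hypothetical.

```python
import math


def _center_size(box):
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(anchor, gt):
    """Regression targets (dx, dy, dw, dh) of a GT box relative to an anchor."""
    ax, ay, aw, ah = _center_size(anchor)
    gx, gy, gw, gh = _center_size(gt)
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(anchor, deltas):
    """Invert encode(): apply predicted offsets to an anchor to get a box."""
    ax, ay, aw, ah = _center_size(anchor)
    dx, dy, dw, dh = deltas
    cx, cy = ax + dx * aw, ay + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor, gt = (0.0, 0.0, 10.0, 10.0), (2.0, 4.0, 14.0, 12.0)
deltas = encode(anchor, gt)    # (0.3, 0.3, log 1.2, log 0.8)
print(decode(anchor, deltas))  # recovers gt up to floating-point error
```

Normalizing by anchor size makes the targets scale-invariant, so a small and a large anchor that are both slightly off produce similar deltas.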

Representative detectors: Faster R-CNN (2015), SSD (2016), RetinaNet (2017), YOLOv3 (2018)

The Anchor-Free Paradigm

Anchor-free methods emerged as a simpler alternative. Instead of pre-defined reference boxes, they predict objects directly from feature points. Two main approaches have gained prominence: FCOS (distance regression) and CenterNet (keypoint detection).

FCOS: Distance-Based Regression

FCOS treats every feature point inside a ground truth box as a positive sample. Each point predicts four distances: (l, t, r, b) - left, top, right, bottom distances to the box edges.
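
A minimal sketch of the per-point targets, assuming boxes in (x1, y1, x2, y2) image coordinates and points already mapped from the feature map to the image. The minimal-area rule for a point that lies inside several boxes is the one the FCOS paper uses; the helper names are hypothetical.

```python
def fcos_targets(px, py, box):
    """(l, t, r, b) distances from point (px, py) to the edges of box (x1, y1, x2, y2).

    Returns None when the point is not strictly inside the box (a negative sample).
    """
    x1, y1, x2, y2 = box
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    return (l, t, r, b) if min(l, t, r, b) > 0 else None


def _area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def assign_point(px, py, gt_boxes):
    """Index of the GT box a point regresses to, or -1 for background.

    A point inside several boxes is ambiguous; FCOS picks the smallest box.
    """
    inside = [i for i, g in enumerate(gt_boxes) if fcos_targets(px, py, g) is not None]
    if not inside:
        return -1
    return min(inside, key=lambda i: _area(gt_boxes[i]))


def fcos_decode(px, py, ltrb):
    """Turn predicted distances back into an (x1, y1, x2, y2) box."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)


print(fcos_targets(30, 20, (10, 10, 50, 40)))  # (20, 10, 20, 20)
```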

FCOS design choices:

No anchors: Every point inside GT predicts directly
Center-ness branch: Suppresses low-quality edge predictions
Multi-scale FPN assignment: Objects assigned to appropriate pyramid level based on size
Still requires NMS: Post-processing to remove duplicate detections
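
The center-ness target and the FPN level rule each take only a few lines. The formula below is the one given in the FCOS paper, and the size ranges (0, 64, 128, 256, 512, ∞ pixels for P3 through P7) are its defaults; at test time, the original FCOS multiplies each predicted center-ness by the classification score before NMS. Function names are hypothetical, and sending ties at a range boundary to the lower level is an assumption of this sketch.

```python
import math

# FCOS's default size ranges for P3..P7: a positive location on level i must
# satisfy lo <= max(l, t, r, b) <= hi for that level's (lo, hi).
FPN_RANGES = [(0, 64), (64, 128), (128, 256), (256, 512), (512, math.inf)]


def centerness(l, t, r, b):
    """Center-ness target: 1.0 at the box center, falling toward 0 near an edge."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def fpn_level(l, t, r, b, ranges=FPN_RANGES):
    """Index into `ranges` (0 means P3) for a location's regression targets."""
    m = max(l, t, r, b)
    for level, (lo, hi) in enumerate(ranges):
        if lo <= m <= hi:
            return level  # boundary values go to the lower level in this sketch
    return -1


print(centerness(10, 10, 10, 10))  # 1.0 at the exact center
print(fpn_level(100, 20, 30, 40))  # 1, i.e. P4, since the largest distance is 100
```

A point one-tenth of the way across a box horizontally (l=1, r=9) but vertically centered gets center-ness 1/3, so its score is cut by two-thirds before ranking.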

CenterNet: Keypoint Detection

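CenterNet takes a different approach: detect each object as a single center point. The network predicts a per-class heatmap, trained against Gaussian peaks rendered at ground truth centers, and at each peak it regresses the object's (width, height) plus a small offset that recovers the precision lost to the output stride.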
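
A sketch of how the training heatmap for one class could be rendered. It assumes an output stride of 4 (CenterNet's usual setting) and, for simplicity, a fixed Gaussian sigma; the real method derives sigma from each object's size. Names are hypothetical.

```python
import math


def render_heatmap(out_h, out_w, centers, stride=4, sigma=1.0):
    """Training heatmap for one class on a stride-downsampled output grid.

    centers: object centers in input-image pixels. sigma is fixed here for
    simplicity (an assumption); CenterNet derives it from each object's size.
    """
    heat = [[0.0] * out_w for _ in range(out_h)]
    for cx, cy in centers:
        # Low-resolution peak location; the lost fraction (cx / stride - px)
        # is what CenterNet's offset head learns to predict.
        px, py = int(cx // stride), int(cy // stride)
        for y in range(out_h):
            for x in range(out_w):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                heat[y][x] = max(heat[y][x], g)  # overlapping objects: element-wise max
    return heat


heat = render_heatmap(8, 8, [(10, 10)])
print(heat[2][2], round(heat[2][3], 4))  # 1.0 at the peak, 0.6065 one cell away
```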

CenterNet advantages:

No anchors, no NMS: A 3×3 max-pool over the heatmap keeps only local peaks, so each object yields a single detection
Simple pipeline: Just heatmap + size regression
Works for multiple tasks: Pose estimation, 3D detection with the same framework
Clean design: Minimal hyperparameters
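
At inference, peak extraction and box decoding fit in one function. This sketch assumes plain nested lists for one class's heatmap and for the size and offset heads; the 0.3 score cutoff is an assumption, while the 3×3 local-maximum test and the top-100 cap mirror what CenterNet describes.

```python
def decode_centernet(heat, sizes, offsets, stride=4, thresh=0.3, top_k=100):
    """Turn one class's heatmap into scored (x1, y1, x2, y2) boxes.

    heat[y][x]: peak score; sizes[y][x]: predicted (w, h) in input pixels;
    offsets[y][x]: predicted sub-cell (ox, oy). thresh is an assumed cutoff.
    """
    rows, cols = len(heat), len(heat[0])
    dets = []
    for y in range(rows):
        for x in range(cols):
            score = heat[y][x]
            window = [heat[j][i]
                      for j in range(max(0, y - 1), min(rows, y + 2))
                      for i in range(max(0, x - 1), min(cols, x + 2))]
            # Keep only 3x3 local maxima: this stands in for box-level NMS.
            if score < thresh or score < max(window):
                continue
            w, h = sizes[y][x]
            ox, oy = offsets[y][x]
            cx, cy = (x + ox) * stride, (y + oy) * stride
            dets.append((score, (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    dets.sort(key=lambda d: d[0], reverse=True)
    return dets[:top_k]


heat = [[0.0] * 5 for _ in range(5)]
heat[2][2], heat[2][3], heat[0][4] = 0.9, 0.5, 0.6  # 0.5 sits next to the 0.9 peak
sizes = [[(8.0, 8.0)] * 5 for _ in range(5)]
offsets = [[(0.0, 0.0)] * 5 for _ in range(5)]
print(decode_centernet(heat, sizes, offsets))
```

The 0.5 response is suppressed because its 3×3 window contains the 0.9 peak, which is how CenterNet avoids duplicate boxes without comparing box overlaps.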

Side-by-Side Comparison

Aspect	Anchor-Based	Anchor-Free
Reference boxes	Pre-defined anchors (K per location)	None - direct prediction
Box prediction	Predict offsets (dx, dy, dw, dh)	Predict distances or center+size
Training assignment	IoU-based matching	Per-pixel or keypoint
Hyperparameters	Anchor scales, ratios, IoU thresholds	FPN ranges, center sampling
Post-processing	NMS required	NMS (FCOS) or NMS-free (CenterNet)
Design complexity	More complex (anchor design)	Simpler (no anchors to tune)
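
Anchor-based detectors and FCOS both finish with non-maximum suppression, the step CenterNet avoids. Below is a minimal greedy NMS sketch over (x1, y1, x2, y2) boxes; the 0.5 IoU threshold is an assumed default, and in practice NMS is usually run per class.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```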

Evolution of Label Assignment

How we assign ground truth to predictions has evolved significantly. Modern methods move beyond simple IoU thresholds to dynamic, learning-based assignment.

Assignment strategy progression:

Static IoU (2015-2017): Fixed thresholds (0.7/0.3), simple but sensitive to threshold choice
ATSS (2019): Adaptive selection that sets each object's IoU threshold from the statistics (mean + standard deviation) of its closest candidate anchors
OTA/SimOTA (2021): Optimal transport formulation for global assignment
Set Prediction (2020+): Hungarian matching for one-to-one assignment (DETR)
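
A simplified ATSS sketch for one ground truth box on a single pyramid level; the full method picks k candidates per level and resolves anchors claimed by several objects by highest IoU. It follows the paper's recipe: take the k anchors whose centers are closest to the GT center, set the threshold to the mean plus standard deviation of their IoUs, and keep candidates at or above it whose centers lie inside the box. k=9 is the paper's default; using the sample standard deviation and the function names are assumptions here.

```python
import math
import statistics


def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _center(box):
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def atss_positives(anchors, gt, k=9):
    """Indices of anchors this simplified ATSS marks positive for one GT."""
    gc = _center(gt)
    # 1. Candidates: the k anchors whose centers are closest to the GT center.
    candidates = sorted(range(len(anchors)),
                        key=lambda i: math.dist(_center(anchors[i]), gc))[:k]
    if not candidates:
        return []
    # 2. Adaptive threshold from the candidates' IoU statistics.
    ious = [iou(anchors[i], gt) for i in candidates]
    spread = statistics.stdev(ious) if len(ious) > 1 else 0.0
    threshold = statistics.mean(ious) + spread
    # 3. Keep candidates above the threshold whose centers lie inside the GT.
    positives = []
    for i, v in zip(candidates, ious):
        cx, cy = _center(anchors[i])
        if v >= threshold and gt[0] < cx < gt[2] and gt[1] < cy < gt[3]:
            positives.append(i)
    return sorted(positives)


gt = (0, 0, 20, 20)
anchors = [(0, 0, 20, 20), (5, 5, 25, 25), (40, 40, 60, 60), (10, 0, 30, 20)]
print(atss_positives(anchors, gt, k=3))  # [0]
```

Because the threshold adapts to each object's candidate IoUs, there is no global 0.7/0.3 pair to tune; k is essentially the only knob.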

Real-World Applications

Real-time Detection

When inference speed is critical

Use anchor-free (CenterNet, YOLOX) for simpler pipelines and faster inference

High Accuracy Requirements

When detection accuracy matters most

Modern anchor-free (YOLOX, YOLOv8) and anchor-based (e.g., YOLOv7) detectors achieve similar top accuracy

Dense Object Scenes

Many small, overlapping objects

Anchor-based with carefully tuned anchors may capture more objects

Variable Object Shapes

Objects with extreme aspect ratios

Anchor-free avoids anchor mismatch issues

New Domain Transfer

Applying to medical imaging, satellite imagery

Anchor-free often generalizes better without domain-specific anchor tuning

End-to-End Systems

Want to eliminate NMS post-processing

CenterNet or DETR-style set prediction

Advantages & Limitations

Advantages

✓Both paradigms achieve state-of-the-art accuracy today
✓Anchor-free methods have simpler pipelines with fewer hyperparameters
✓CenterNet eliminates NMS entirely for cleaner inference
✓Modern methods like YOLOX combine the best of both worlds
✓Anchor-free methods often transfer more easily to new domains, with no anchors to re-tune
✓FCOS center-ness suppresses low-quality boxes predicted far from object centers

Limitations

×Anchor-based requires careful anchor design per dataset
×IoU threshold sensitivity can hurt anchor-based performance
×FCOS still requires NMS post-processing
×CenterNet cannot separate objects whose centers fall in the same output cell (rare on COCO, more common in crowded scenes)
×Set prediction (DETR) has slow training convergence
×No clear winner - choice depends on specific requirements

Best Practices

Start with Modern Anchor-Free: YOLOX or similar modern anchor-free detectors offer good accuracy with simpler setup. No anchor tuning required.
Use ATSS-style Assignment: If using anchors, adaptive assignment (ATSS) largely removes IoU-threshold sensitivity; the ATSS authors found it closes most of the gap between anchor-based and anchor-free detectors.
Consider CenterNet for Speed: If you need NMS-free detection and can tolerate a slight accuracy drop on crowded scenes, CenterNet simplifies deployment.
Match FPN Levels to Object Sizes: For anchor-free methods, ensure proper FPN level assignment based on object scale ranges in your dataset.
Don't Over-Tune Anchors: If anchor-based accuracy isn't improving, switching to anchor-free may be more effective than anchor optimization.
Evaluate on Your Data: Neither paradigm universally wins. Test both on your specific dataset and deployment constraints.

When to Choose Which

Scenario	Recommended Approach	Reason
New project, good GPU	Anchor-free (YOLOX)	Simpler setup, competitive accuracy
Existing anchor-based codebase	Keep anchors + ATSS	Minimal changes, removes threshold sensitivity
Need NMS-free inference	CenterNet or DETR	Cleaner pipeline, easier deployment
Dense small objects	Anchor-based + careful tuning	More anchors can capture more objects
Domain transfer	Anchor-free	No domain-specific anchor redesign
Research/new architectures	Anchor-free	Easier to modify and experiment

The Convergence

Modern detectors increasingly blur the line between paradigms:

YOLOX: Anchor-free head with SimOTA dynamic assignment
YOLOv8: Anchor-free with distribution-based regression
RT-DETR: Real-time transformer detector with set prediction
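
Set-prediction detectors such as DETR and RT-DETR replace many-to-one assignment with a one-to-one matching between predictions and ground truth that minimizes a total matching cost. The sketch below brute-forces that matching over a small, hypothetical cost matrix. It finds the same optimum as the Hungarian algorithm but is exponential, so real implementations (DETR's, for instance) call scipy.optimize.linear_sum_assignment instead.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Minimum-total-cost one-to-one matching of ground truths to predictions.

    cost[g][p] is the (hypothetical) cost of assigning ground truth g to
    prediction p; requires len(cost) <= len(cost[0]). Unmatched predictions
    are trained as "no object". Brute force: exponential, illustration only.
    """
    n_gt, n_pred = len(cost), len(cost[0])
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[g][perm[g]] for g in range(n_gt))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(g, best_perm[g]) for g in range(n_gt)]


# Greedy matching would pair GT 0 with prediction 0 (cost 0.1) and leave
# GT 1 with cost 0.9; the optimal matching swaps them (0.2 + 0.15).
print(one_to_one_match([[0.1, 0.2], [0.15, 0.9]]))  # [(0, 1), (1, 0)]
```

Since each object is matched to exactly one prediction during training, duplicates are penalized directly, which is why these detectors can drop NMS.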

The trend is toward simpler, more adaptive methods that achieve high accuracy without manual anchor design. The "best" approach continues to evolve as new techniques emerge.

Key Takeaways

Anchor-Based vs Anchor-Free Essentials

• Anchor-based: pre-defined reference boxes at every location; the network predicts offsets to them

• Anchor-free: predict objects directly from feature points — no reference boxes to design

• IoU matching: anchor-based training assigns positives/negatives by Intersection-over-Union thresholds

• FCOS: per-pixel distance regression with a center-ness branch — still needs NMS

• CenterNet: objects as heatmap peaks plus size regression; anchor-free and NMS-free

• Label assignment: evolved from static IoU → ATSS → OTA/SimOTA → set prediction

• Convergence: YOLOX, YOLOv8, and RT-DETR blur the line with dynamic, anchor-free designs

• No clear winner: the right choice depends on your accuracy, speed, and domain constraints

Anchor-based detectors made modern detection possible, but the field is moving toward simpler, anchor-free designs with adaptive label assignment. Both reach state-of-the-art accuracy today, so the decision comes down to pipeline simplicity, inference speed, and how much anchor tuning your dataset would demand.