Anchor-Based vs Anchor-Free Object Detection

Compare anchor-based vs anchor-free object detection: Faster R-CNN and RetinaNet anchors vs FCOS and CenterNet point-based methods.


Overview

Object detection has been dominated by two paradigms: anchor-based methods that define pre-set reference boxes, and anchor-free methods that predict objects directly from feature points. While anchor-based detectors like Faster R-CNN pioneered modern detection, anchor-free approaches like FCOS and CenterNet have emerged as simpler alternatives with competitive accuracy.

The choice between these paradigms affects everything from hyperparameter complexity to inference speed. Understanding their differences is essential for choosing the right approach for your detection task.

Key Concepts

Anchor Boxes

Pre-defined reference boxes of various scales and aspect ratios placed at each spatial location in the feature map. The detector predicts adjustments to these anchors.

IoU Matching

The process of assigning ground truth boxes to anchors based on Intersection over Union overlap. High IoU → positive sample, low IoU → negative sample.

Per-Pixel Prediction

Anchor-free approach where each feature point inside a ground truth box directly predicts the object's bounding box without reference anchors.

Center-ness Score

A quality score in FCOS that down-weights predictions from points far from the object center, reducing low-quality detections.

Keypoint Detection

CenterNet's approach of detecting objects as single center points in a heatmap, then regressing the object size at each peak.

Label Assignment

The strategy for deciding which predictions are matched to which ground truth boxes during training. It has evolved from static IoU thresholds to dynamic methods.

The Anchor-Based Paradigm

Anchor-based detection revolutionized the field starting with Faster R-CNN in 2015. The key idea: place multiple pre-defined reference boxes (anchors) at each spatial location, then train the network to classify which anchors contain objects and refine their coordinates.

Understanding Anchor Boxes

[Interactive figure: Anchor Boxes at Each Location. A 7×7 feature map grid with 9 anchors per cell (3 scales: 0.5×, 1×, 1.5×; 3 aspect ratios: 0.5:1, 1:1, 2:1), for 7 × 7 × 9 = 441 anchors in total.]

Key Insight: In this 7×7 example, an anchor-based detector pre-defines 7 × 7 × 9 = 441 reference boxes across the image; real detectors, with larger feature maps, evaluate thousands. The network learns to predict which anchors contain objects and how to refine their coordinates.
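The anchor layout described above can be sketched in a few lines. This is a minimal illustration, not any particular detector's generator: the `stride` and `base` pixel values are hypothetical, and the ratio is assumed to mean width / height.

```python
from itertools import product

def make_anchors(grid=7, stride=32, base=64,
                 scales=(0.5, 1.0, 1.5), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors for every cell of a grid x grid feature map.

    stride and base are hypothetical pixel values chosen for illustration.
    """
    anchors = []
    for row, col in product(range(grid), range(grid)):
        # Center of this feature-map cell in image coordinates.
        cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            side = base * scale          # side of the square anchor at this scale
            w = side * ratio ** 0.5      # assumed convention: ratio = width / height
            h = side / ratio ** 0.5      # area stays side * side for every ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 7 * 7 * 9 = 441
```

Keeping the area constant across ratios is one common design choice; it means scale and shape can be tuned independently.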

How Anchor Matching Works

During training, each anchor must be assigned as a positive (contains object), negative (background), or ignored sample. This assignment uses IoU (Intersection over Union) between anchors and ground truth boxes.

[Interactive figure: IoU-Based Anchor Matching. Dragging the ground truth box recolors each anchor by its IoU overlap: positive/foreground (IoU ≥ 0.7), ignored/skipped (0.3 < IoU < 0.7), negative/background (IoU ≤ 0.3).]

IoU Formula
IoU = Area(A ∩ B) / Area(A ∪ B)

Intersection over Union measures overlap between predicted and ground truth boxes

Training Strategy: Only positive anchors contribute to localization loss. Negative anchors help the network learn what is not an object. Ignored anchors prevent confusing gradients from ambiguous cases.
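The IoU formula and the three-way assignment can be written out as a small sketch. The function names are illustrative, boxes are assumed to be (x1, y1, x2, y2) tuples, and the extra Faster R-CNN rule that the best-overlapping anchor for each ground truth box is also marked positive is left out for brevity.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground truth box.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best <= neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

gt_boxes = [(0, 0, 100, 100)]
anchors = [(0, 0, 100, 100), (0, 0, 100, 50), (200, 200, 300, 300), (5, 5, 105, 105)]
labels = label_anchors(anchors, gt_boxes)
print(labels)  # [1, -1, 0, 1]
```

The second anchor overlaps the ground truth with IoU 0.5, which falls between the two thresholds, so it is ignored rather than forced into either class.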

Key characteristics of anchor-based detection:

  • Pre-defined boxes: Typically 9 anchors per location (3 scales × 3 ratios)
  • Offset prediction: Network predicts (dx, dy, dw, dh) adjustments
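  • IoU thresholds: 0.7 for positive and 0.3 for negative in Faster R-CNN's region proposal network; other detectors choose different values (RetinaNet uses 0.5 and 0.4)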
  • Dense predictions: Thousands of anchors evaluated per image
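To make the offset prediction concrete, here is a sketch of how (dx, dy, dw, dh) can be turned back into a box. It assumes the parameterization used by the R-CNN family (center shifts scaled by anchor size, log-space width and height); individual detectors may add scaling factors or clamping that are not shown here.

```python
import math

def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) to an (x1, y1, x2, y2) anchor."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah          # shift the center
    w, h = aw * math.exp(dw), ah * math.exp(dh)    # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def encode(anchor, gt):
    """Inverse of decode: the regression target for a matched anchor."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    dx = ((gt[0] + gt[2]) / 2 - (ax1 + ax2) / 2) / aw
    dy = ((gt[1] + gt[3]) / 2 - (ay1 + ay2) / 2) / ah
    return (dx, dy, math.log(gw / aw), math.log(gh / ah))

anchor = (0.0, 0.0, 100.0, 100.0)
box = decode(anchor, (0.1, 0.0, math.log(2), 0.0))
print([round(v, 3) for v in box])  # [-40.0, 0.0, 160.0, 100.0]
```

Zero deltas return the anchor unchanged, which is why a well-placed anchor gives the regressor an easy starting point.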

Representative detectors: Faster R-CNN (2015), SSD (2016), RetinaNet (2017), YOLOv3 (2018)

The Anchor-Free Paradigm

Anchor-free methods emerged as a simpler alternative. Instead of pre-defined reference boxes, they predict objects directly from feature points. Two main approaches have gained prominence: FCOS (distance regression) and CenterNet (keypoint detection).

FCOS: Distance-Based Regression

FCOS treats every feature point inside a ground truth box as a positive sample. Each point predicts four distances (l, t, r, b): the distances from the point to the left, top, right, and bottom edges of the box.
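As a rough sketch of the regression target, the four distances follow directly from the point and the box coordinates. The box and point below are hypothetical values chosen so that the distances come out to 80, 60, 100 and 80 pixels; the function names are illustrative.

```python
def fcos_targets(point, box):
    """Return (l, t, r, b) for a point inside an (x1, y1, x2, y2) box, else None."""
    x, y = point
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None  # outside the box (or on its edge): not a positive sample
    return (l, t, r, b)

def decode_box(point, ltrb):
    """Recover the (x1, y1, x2, y2) box from a point and its predicted distances."""
    x, y = point
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)

box = (100, 100, 280, 240)       # hypothetical ground truth box
point = (180, 160)               # hypothetical feature point, in image coordinates
targets = fcos_targets(point, box)
print(targets)                   # (80, 60, 100, 80)
print(decode_box(point, targets))  # (100, 100, 280, 240)
```

Because every distance is positive for a point inside the box, no reference shape is needed: the point itself anchors the prediction.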

[Interactive figure: FCOS Per-Pixel Prediction. Clicking a point inside the bounding box on the feature grid shows its (l, t, r, b) distance regression.]

Example point: l = 80px, t = 60px, r = 100px, b = 80px, giving a center-ness score of 0.77 (higher when the point is closer to the box center).

FPN level assignment, based on max(l, t, r, b) in pixels:

  • P3: 0-64
  • P4: 64-128
  • P5: 128-256
  • P6: 256-512
  • P7: >512

For the example point, max(l, t, r, b) = 100px, so it is assigned to P4.

Center-ness Formula
centerness = √(min(l,r)/max(l,r) × min(t,b)/max(t,b))

Down-weights predictions from points far from the object center, reducing low-quality detections
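The formula can be checked against the example values from this section (l = 80, t = 60, r = 100, b = 80). The sketch below also applies the FPN ranges listed above; treating the range boundaries as inclusive on both ends is an assumption, since the ranges as listed share their endpoints.

```python
import math

# (level, lower bound, upper bound) in pixels, as listed in this section.
FPN_RANGES = [("P3", 0, 64), ("P4", 64, 128), ("P5", 128, 256),
              ("P6", 256, 512), ("P7", 512, math.inf)]

def centerness(l, t, r, b):
    """sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)); 1.0 at the exact center."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def fpn_level(l, t, r, b):
    """Pick the first pyramid level whose range contains max(l, t, r, b)."""
    largest = max(l, t, r, b)
    for name, low, high in FPN_RANGES:
        if low <= largest <= high:   # assumed boundary handling
            return name
    return None

score = centerness(80, 60, 100, 80)
print(round(score, 2))              # 0.77
print(fpn_level(80, 60, 100, 80))   # P4
```

A point at the exact center has l = r and t = b, so both ratios are 1 and the score is 1.0; the score falls toward 0 as the point approaches any edge.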

FCOS Advantage: No anchor boxes needed! Every feature point inside a ground truth box can predict the object by regressing the four distances directly. The center-ness branch helps suppress low-quality predictions.

FCOS innovations:

  • No anchors: Every point inside GT predicts directly
  • Center-ness branch: Suppresses low-quality edge predictions
  • Multi-scale FPN assignment: Objects assigned to appropriate pyramid level based on size
  • Still requires NMS: Post-processing to remove duplicate detections
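For reference, the NMS step mentioned in the last bullet is usually a greedy procedure along these lines. This is a plain sketch with an assumed IoU threshold of 0.5, not the optimized batched version found in detection libraries.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the best box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with IoU of about 0.82, so it is treated as a duplicate of the higher-scoring detection and dropped.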

CenterNet: Keypoint Detection

CenterNet takes a different approach: detect objects as single center points in a heatmap. Gaussian peaks mark object centers, and the network regresses (width, height) at each peak.
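A training heatmap of this kind can be sketched by placing a Gaussian at each object center and keeping the element-wise maximum where Gaussians overlap. The grid size, centers and fixed `sigma` values below are hypothetical; CenterNet itself adapts the spread to the object size.

```python
import math

def draw_gaussian(heatmap, center, sigma):
    """Splat an unnormalized Gaussian (peak value 1.0) at an integer (x, y) center."""
    cx, cy = center
    for y, row in enumerate(heatmap):
        for x in range(len(row)):
            value = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            # Keep the larger value so nearby objects do not erase each other.
            row[x] = max(row[x], value)
    return heatmap

heatmap = [[0.0] * 8 for _ in range(8)]          # 8x8 grid for one object class
draw_gaussian(heatmap, (2, 2), sigma=1.0)        # small object
draw_gaussian(heatmap, (5, 6), sigma=1.5)        # larger object, wider peak

print(heatmap[2][2], heatmap[6][5])              # 1.0 1.0
print(round(heatmap[2][3], 3))                   # 0.607
```

Cells next to a center receive values just below 1.0, which softens the penalty for predictions that land one cell away from the true center.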

[Interactive figure: CenterNet Keypoint Detection. An input image containing a car, a person, and a ball is detected as keypoints, with no anchors and no NMS required.]

Detection pipeline:

  1. Input image
  2. Generate heatmap
  3. Find peaks
  4. Regress size
  5. Final detection

Detected objects in the example: Car, center (100, 100); Person, center (280, 140); Ball, center (200, 220).

Why No NMS? Each object produces exactly one peak in the heatmap. Local maxima extraction naturally yields one detection per object!

CenterNet Insight: Objects are represented as single points (their centers). The network predicts a heatmap where peaks indicate object centers, plus size regression at each location. This elegantly avoids both anchor design and NMS post-processing.
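The peak-finding and size-regression steps can be sketched as a 3×3 local-maximum test followed by a lookup of the predicted size. The heatmap values, the output stride of 4, and the object sizes below are hypothetical; only the three centers are taken from the example in this section, and the sub-pixel offset that CenterNet also predicts is omitted.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for cells that are the maximum of their 3x3 neighborhood."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(rows):
        for x in range(cols):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighborhood = [heatmap[ny][nx]
                            for ny in range(max(0, y - 1), min(rows, y + 2))
                            for nx in range(max(0, x - 1), min(cols, x + 2))]
            if value >= max(neighborhood):
                peaks.append((x, y, value))
    return peaks

def decode_peaks(peaks, sizes, stride=4):
    """Turn peaks plus predicted (w, h) into (x1, y1, x2, y2, score) boxes."""
    boxes = []
    for x, y, score in peaks:
        w, h = sizes[(x, y)]
        cx, cy = x * stride, y * stride        # back to image coordinates
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes

heatmap = [[0.0] * 80 for _ in range(64)]      # 80x64 heatmap, i.e. a 320x256 image
heatmap[25][25] = 0.9                          # car
heatmap[25][26] = 0.6                          # shoulder of the car peak, not a maximum
heatmap[35][70] = 0.8                          # person
heatmap[55][50] = 0.7                          # ball
sizes = {(25, 25): (120, 60), (70, 35): (40, 100), (50, 55): (30, 30)}  # hypothetical

peaks = find_peaks(heatmap)
boxes = decode_peaks(peaks, sizes)
print([(x * 4, y * 4) for x, y, _ in peaks])   # [(100, 100), (280, 140), (200, 220)]
print(boxes[0])                                # (40.0, 70.0, 160.0, 130.0, 0.9)
```

The cell next to the car peak clears the score threshold but is not a local maximum, so it is discarded without any box-overlap computation.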

CenterNet advantages:

  • No anchors, no NMS: Peaks naturally yield one detection per object
  • Simple pipeline: Just heatmap + size regression
  • Works for multiple tasks: Pose estimation, 3D detection with the same framework
  • Clean design: Minimal hyperparameters

Side-by-Side Comparison

[Interactive figure: Paradigm Comparison. Side-by-side view of the same ground truth box under both paradigms: anchor-based (multiple anchors per location, matched by IoU) and anchor-free (every point inside the ground truth box predicts distances directly).]

Anchor-Based Detectors: Faster R-CNN (2015), SSD (2016), RetinaNet (2017), YOLOv3 (2018)

Anchor-Free Detectors: CornerNet (2018), FCOS (2019), CenterNet (2019), YOLOX (2021)

Evolution Insight: Both paradigms achieve similar accuracy today. Anchor-free methods simplify the detection pipeline by removing anchor design, while anchor-based methods benefit from years of optimization. Modern detectors such as YOLOX pair an anchor-free head with dynamic label assignment (SimOTA).

| Aspect | Anchor-Based | Anchor-Free |
|---|---|---|
| Reference boxes | Pre-defined anchors (K per location, e.g. 9) | None - direct prediction |
| Box prediction | Predict offsets (dx, dy, dw, dh) | Predict distances (l, t, r, b) or center + size |
| Training assignment | IoU-based matching | Per-pixel or keypoint |
| Hyperparameters | Anchor scales, ratios, IoU thresholds | FPN ranges, center sampling |
| Post-processing | NMS required | NMS (FCOS) or NMS-free (CenterNet) |
| Design complexity | More complex (anchor design) | Simpler (no anchors to tune) |

Evolution of Label Assignment

How we assign ground truth to predictions has evolved significantly. Modern methods move beyond simple IoU thresholds to dynamic, learning-based assignment.

[Interactive figure: Evolution of Label Assignment. A four-step carousel showing how detection training assignment strategies have evolved over time; step 1 of 4 is shown.]

Static IoU (2015-2017): Fixed IoU thresholds (0.7/0.3) determine positive/negative samples.

  • Innovation: Simple and interpretable
  • Limitation: Sensitive to IoU threshold choice
  • Representative detectors: Faster R-CNN, SSD, RetinaNet

Evolution Trend: Label assignment has evolved from simple fixed rules to dynamic, learning-based approaches. Modern methods like SimOTA and set prediction achieve better assignment by considering the global optimization of all predictions together.

Assignment strategy progression:

  1. Static IoU (2015-2017): Fixed thresholds (0.7/0.3), simple but sensitive to threshold choice
  2. ATSS (2019): Adaptive selection based on anchor-GT statistics
  3. OTA/SimOTA (2021): Optimal transport formulation for global assignment
  4. Set Prediction (2020+): Hungarian matching for one-to-one assignment (DETR)
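To illustrate the step from fixed to adaptive thresholds, here is a simplified ATSS-style sketch for a single ground truth box: take the k anchors whose centers are closest to the box center, set the threshold to the mean plus the standard deviation of their IoUs, and keep candidates above it whose centers lie inside the box. The single-level candidate selection, the population standard deviation, and the example boxes are simplifying assumptions; the published method selects candidates per pyramid level.

```python
import statistics

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def atss_assign(anchors, gt, k=9):
    """Return (positive anchor indices, adaptive IoU threshold) for one GT box."""
    gcx, gcy = center(gt)

    def dist2(i):
        cx, cy = center(anchors[i])
        return (cx - gcx) ** 2 + (cy - gcy) ** 2

    candidates = sorted(range(len(anchors)), key=dist2)[:k]   # k nearest centers
    ious = [iou(anchors[i], gt) for i in candidates]
    threshold = statistics.mean(ious) + statistics.pstdev(ious)
    positives = []
    for i, overlap in zip(candidates, ious):
        cx, cy = center(anchors[i])
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if overlap >= threshold and inside:
            positives.append(i)
    return positives, threshold

gt = (0, 0, 100, 100)
anchors = [(0, 0, 100, 100), (10, 10, 110, 110), (40, 40, 140, 140), (200, 200, 300, 300)]
positives, threshold = atss_assign(anchors, gt, k=3)
print(positives, round(threshold, 3))  # [0] 0.954
```

The threshold here is derived from the candidates themselves rather than set by hand, which is the property that removes the sensitivity noted for static IoU assignment.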

Real-World Applications

  • Real-time Detection (when inference speed is critical): Use anchor-free (CenterNet, YOLOX) for simpler pipelines and faster inference
  • High Accuracy Requirements (when detection accuracy matters most): Modern anchor-free detectors (YOLOX, YOLOv8) and well-tuned anchor-based detectors achieve similar top accuracy
  • Dense Object Scenes (many small, overlapping objects): Anchor-based with carefully tuned anchors may capture more objects
  • Variable Object Shapes (objects with extreme aspect ratios): Anchor-free avoids anchor mismatch issues
  • New Domain Transfer (applying to medical imaging, satellite imagery): Anchor-free often generalizes better without domain-specific anchor tuning
  • End-to-End Systems (when you want to eliminate NMS post-processing): CenterNet or DETR-style set prediction

Advantages & Limitations

Advantages

  • Both paradigms achieve state-of-the-art accuracy today
  • Anchor-free methods have simpler pipelines with fewer hyperparameters
  • CenterNet eliminates NMS entirely for cleaner inference
  • Modern methods like YOLOX combine an anchor-free head with dynamic label assignment
  • Anchor-free methods often transfer to new domains without domain-specific anchor tuning
  • FCOS center-ness effectively handles ambiguous predictions

Limitations

  • Anchor-based requires careful anchor design per dataset
  • IoU threshold sensitivity can hurt anchor-based performance
  • FCOS still requires NMS post-processing
  • CenterNet may miss closely overlapping objects
  • Set prediction (DETR) has slow training convergence
  • No clear winner - choice depends on specific requirements

Best Practices

  • Start with Modern Anchor-Free: YOLOX or similar modern anchor-free detectors offer good accuracy with simpler setup. No anchor tuning required.
  • Use ATSS-style Assignment: If using anchors, adaptive assignment (ATSS) removes threshold sensitivity and matches anchor-free performance.
  • Consider CenterNet for Speed: If you need NMS-free detection and can tolerate a slight accuracy drop in crowded scenes, CenterNet simplifies deployment.
  • Match FPN Levels to Object Sizes: For anchor-free methods, ensure proper FPN level assignment based on object scale ranges in your dataset.
  • Don't Over-Tune Anchors: If anchor-based accuracy isn't improving, switching to anchor-free may be more effective than anchor optimization.
  • Evaluate on Your Data: Neither paradigm universally wins. Test both on your specific dataset and deployment constraints.

When to Choose Which

| Scenario | Recommended Approach | Reason |
|---|---|---|
| New project, good GPU | Anchor-free (YOLOX) | Simpler setup, competitive accuracy |
| Existing anchor-based codebase | Keep anchors + ATSS | Minimal changes, removes threshold sensitivity |
| Need NMS-free inference | CenterNet or DETR | Cleaner pipeline, easier deployment |
| Dense small objects | Anchor-based + careful tuning | More anchors can capture more objects |
| Domain transfer | Anchor-free | No domain-specific anchor redesign |
| Research/new architectures | Anchor-free | Easier to modify and experiment |

The Convergence

Modern detectors increasingly blur the line between paradigms:

  • YOLOX: Anchor-free head with SimOTA dynamic assignment
  • YOLOv8: Anchor-free with distribution-based regression
  • RT-DETR: Real-time transformer detector with set prediction

The trend is toward simpler, more adaptive methods that achieve high accuracy without manual anchor design. The "best" approach continues to evolve as new techniques emerge.
