Exploring Plain Vision Transformer Backbones for Object Detection
Investigating how well plain, non-hierarchical Vision Transformers work as backbones for object detection, and proposing minimal adaptations, such as a simple feature pyramid built from the backbone's single-scale output and window attention with a few cross-window blocks, that make them competitive with hierarchical backbones.
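As a rough illustration of the core idea, here is a minimal PyTorch sketch (not the paper's reference code) of building a multi-scale feature pyramid from the single stride-16 feature map a plain ViT produces; the layer choices and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFeaturePyramid(nn.Module):
    """Turn a plain ViT's single stride-16 feature map into multi-scale maps
    (strides 4, 8, 16, 32) using deconvolutions and pooling. Sketch only."""

    def __init__(self, dim=768, out_dim=256):
        super().__init__()
        # stride 16 -> 4: two successive 2x upsampling deconvolutions
        self.up4 = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, kernel_size=2, stride=2),
        )
        # stride 16 -> 8: one 2x upsampling deconvolution
        self.up8 = nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2)
        # 1x1 convolutions project every level to a common channel width
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_dim, kernel_size=1) for c in (dim // 4, dim // 2, dim, dim)]
        )

    def forward(self, x):
        # x: (B, dim, H/16, W/16), the last feature map of the ViT backbone
        feats = [self.up4(x), self.up8(x), x, F.max_pool2d(x, kernel_size=2)]
        return [lat(f) for lat, f in zip(self.lateral, feats)]


# Example: a 1024x1024 image gives a 64x64 stride-16 map from a ViT-B backbone
pyramid = SimpleFeaturePyramid(dim=768, out_dim=256)
outs = pyramid(torch.randn(1, 768, 64, 64))
print([tuple(o.shape) for o in outs])  # strides 4, 8, 16, 32
```

A standard detection head (e.g., Mask R-CNN) would then consume these four maps in place of an FPN built from a hierarchical backbone.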
Introducing DETR, a novel end-to-end object detection framework that leverages Transformers to directly predict a set of object bounding boxes.
Vision Transformer (ViT) explained: how splitting images into 16x16 patches enables pure transformer architecture for state-of-the-art image recognition.
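For reference, a minimal sketch of that patchify step: a 16x16 convolution with stride 16 is the common equivalent of cutting the image into non-overlapping 16x16 patches and linearly projecting each one to a token (positional embeddings and the class token are omitted here).

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into 16x16 patches and project each patch to a token.
    A 16x16 conv with stride 16 equals flattening each patch and applying a
    shared linear layer."""

    def __init__(self, patch=16, in_chans=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch, stride=patch)

    def forward(self, img):
        # img: (B, 3, H, W) -> tokens: (B, N, dim) with N = (H/16) * (W/16)
        x = self.proj(img)                   # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, N, dim)


tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]) -- 14*14 patches for a 224x224 image
```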
Survey of transformer inference optimization: pruning, quantization, knowledge distillation, neural architecture search, and hardware acceleration.
Swin Transformer: hierarchical Vision Transformer using shifted windows for efficient image classification, object detection, and segmentation.
Deep dive into the Transformer architecture that revolutionized NLP. Understand self-attention, multi-head attention, and positional encoding.
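A compact sketch of the scaled dot-product self-attention at the heart of that architecture, with tensor shapes noted in the comments; a full multi-head version would split the model dimension into heads and add a learned output projection. The weight matrices here are stand-ins for illustration, not a particular library's API.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a token sequence x: (B, N, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                         # each (B, N, d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])   # (B, N, N)
    weights = scores.softmax(dim=-1)                            # each row sums to 1
    return weights @ v                                          # (B, N, d)


d = 64
x = torch.randn(2, 10, d)
w_q, w_k, w_v = [torch.randn(d, d) / d**0.5 for _ in range(3)]
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 10, 64])
```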
Analysis of transformer performance bottlenecks caused by data movement. Learn optimization strategies for memory-bound operations on GPUs.
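A back-of-the-envelope way to see why such operations are memory-bound is to compare an operation's arithmetic intensity (FLOPs per byte moved) against the GPU's compute-to-bandwidth ratio. The hardware figures below are approximate, A100-class values used only for illustration.

```python
# Rough roofline-style check: an op is memory-bound when its arithmetic
# intensity (FLOPs per byte moved) falls below the hardware's ridge point.
peak_flops = 312e12          # ~peak FP16 tensor throughput (approximate)
mem_bw = 1.5e12              # ~HBM bandwidth in bytes/s (approximate)
ridge = peak_flops / mem_bw  # ~208 FLOPs per byte

# Elementwise add of two fp16 tensors with n elements: a typical memory-bound op.
n = 1 << 26
flops = n                        # one add per element
bytes_moved = 3 * n * 2          # read two inputs, write one output, 2 bytes each
intensity = flops / bytes_moved  # ~0.17 FLOPs/byte, far below the ridge point
print(f"intensity={intensity:.2f} FLOPs/B, ridge~{ridge:.0f} -> memory-bound")
```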