2022
BEiT: BERT Pre-Training of Image Transformers
How BEiT bridges BERT and vision by predicting discrete visual tokens from masked image patches. It is the first masked image modeling approach for Vision Transformers, and its base-size model reaches 83.2% top-1 accuracy on ImageNet-1K.
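The pre-training objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the patch size (16), vocabulary size (8192), and mask ratio (0.4) are assumed settings; uniform random masking stands in for the paper's blockwise masking; and the visual token ids and the logits are random placeholders for the outputs of an image tokenizer and a Vision Transformer. The helper names `patchify`, `random_mask`, and `masked_token_loss` are hypothetical.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, ratio, rng):
    """Boolean mask with round(ratio * num_patches) True entries.

    Assumption: uniform random masking, used here in place of blockwise masking.
    """
    mask = np.zeros(num_patches, dtype=bool)
    k = int(round(ratio * num_patches))
    mask[rng.choice(num_patches, size=k, replace=False)] = True
    return mask

def masked_token_loss(logits, token_ids, mask):
    """Mean cross-entropy between logits and token ids, over masked positions only."""
    z = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    picked = log_probs[np.arange(len(token_ids)), token_ids]
    return float(-picked[mask].mean())

rng = np.random.default_rng(0)
vocab_size = 8192                                            # assumed vocabulary size
image = rng.random((224, 224, 3))
patches = patchify(image, 16)                                # 14 x 14 grid of patches
mask = random_mask(len(patches), 0.4, rng)

# Placeholder for the image tokenizer: one discrete visual token per patch.
token_ids = rng.integers(0, vocab_size, size=len(patches))

# Corrupt the input: masked patches are replaced (zeros stand in for a
# learnable mask embedding).
corrupted = patches.copy()
corrupted[mask] = 0.0

# Placeholder for the Transformer: per-patch logits over the visual vocabulary.
logits = rng.normal(size=(len(patches), vocab_size))

loss = masked_token_loss(logits, token_ids, mask)
print(patches.shape, int(mask.sum()), loss)
```

The key design point is that the loss is computed only at masked positions and the targets are discrete token ids rather than raw pixels, which is what makes the objective analogous to BERT's masked language modeling.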