2022
MAE: Masked Autoencoders Are Scalable Self-Supervised Learners
How masking 75% of image patches and reconstructing the missing pixels creates a scalable self-supervised learner: it trains ViT-H to 87.8% on ImageNet-1K, runs 3.5× faster than encoding every patch, and requires no labels.
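A minimal sketch of the core masking step described above: a random 75% of the patch tokens are dropped before the encoder, so it processes only the remaining quarter. The patch count (196, i.e. a 224×224 image split into 16×16 patches) and the NumPy-based helper below are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Keep a random subset of patches; the MAE encoder sees only these.

    patches: (num_patches, dim) array of embedded image patches.
    Returns (visible, keep_idx, mask), where mask[i] is True for
    patches that were hidden and must be reconstructed in pixel space.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))  # fraction of patches kept visible
    perm = rng.permutation(n)           # random shuffle of patch indices
    keep_idx = np.sort(perm[:n_keep])   # indices of visible patches
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False              # False = visible, True = masked
    return patches[keep_idx], keep_idx, mask

# A 224x224 image with 16x16 patches yields 14*14 = 196 patches;
# at a 75% mask ratio the encoder processes only 49 of them.
patches = np.zeros((196, 768))
visible, keep_idx, mask = random_masking(patches)
print(visible.shape)  # (49, 768)
```

Because the encoder runs on only 25% of the tokens, and self-attention cost grows superlinearly in sequence length, this is where the training speedup comes from.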
