BYOL: Bootstrap Your Own Latent
How self-supervised learning works without negative pairs — a predictor and momentum target network are all you need to prevent representation collapse.
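A minimal sketch of that mechanism, assuming PyTorch (the encoder, layer sizes, and the `BYOL` class here are illustrative choices, not the paper's exact architecture): the online network adds a predictor on top of its projector, the target network is an exponential moving average (EMA) of the online weights, and the loss is a symmetrized negative cosine similarity between the online prediction and the stop-gradiented target projection.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BYOL(nn.Module):
    """Minimal BYOL sketch: online encoder+projector+predictor, EMA target, no negatives."""

    def __init__(self, encoder, feat_dim=512, proj_dim=256, tau=0.996):
        super().__init__()
        self.tau = tau  # EMA decay for the target network
        self.online = nn.Sequential(encoder, nn.Linear(feat_dim, proj_dim))
        # The predictor exists only on the online branch; together with the
        # stop-gradient on the target branch, it prevents representation collapse.
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim)
        )
        self.target = copy.deepcopy(self.online)
        for p in self.target.parameters():
            p.requires_grad = False  # target is updated only via EMA, never by gradients

    @torch.no_grad()
    def update_target(self):
        # target <- tau * target + (1 - tau) * online, applied parameter-wise
        for po, pt in zip(self.online.parameters(), self.target.parameters()):
            pt.mul_(self.tau).add_((1 - self.tau) * po)

    def loss(self, view1, view2):
        # Symmetrized negative cosine similarity between the online prediction
        # of one augmented view and the target projection of the other.
        def one_side(a, b):
            p = F.normalize(self.predictor(self.online(a)), dim=-1)
            with torch.no_grad():  # stop-gradient on the target branch
                z = F.normalize(self.target(b), dim=-1)
            return 2 - 2 * (p * z).sum(dim=-1).mean()
        return one_side(view1, view2) + one_side(view2, view1)
```

In a training step, you would compute `model.loss(view1, view2)` on two augmentations of the same batch, backpropagate through the online branch only, step the optimizer, then call `model.update_target()` to refresh the EMA weights.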