BEiT: BERT Pre-Training of Image Transformers
How BEiT bridges BERT and vision by predicting discrete visual tokens from masked image patches — the first masked image modeling approach for Vision Transformers, achieving 83.2% on ImageNet-1K.
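The masked-token objective described above can be sketched in a few lines. This is an illustrative numpy stand-in, not BEiT's implementation: the patch count, codebook size, and roughly-40% mask ratio follow the paper's base configuration, while the logits are random placeholders for the Vision Transformer head and the tokens stand in for the pretrained dVAE tokenizer's output.

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches = 196   # 14x14 grid: 224x224 image, 16x16 patches
vocab_size = 8192   # discrete visual-token codebook size (dVAE)
mask_ratio = 0.4    # roughly 40% of patches are masked

# Ground-truth visual tokens per patch (stand-in for the tokenizer).
tokens = rng.integers(0, vocab_size, size=num_patches)

# Randomly select patches to mask (BEiT uses blockwise masking;
# uniform sampling here keeps the sketch short).
mask = np.zeros(num_patches, dtype=bool)
mask[rng.choice(num_patches, int(mask_ratio * num_patches), replace=False)] = True

# Stand-in for the Transformer's per-patch logits over the codebook.
logits = rng.standard_normal((num_patches, vocab_size))

# Masked image modeling loss: cross-entropy at masked positions only.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[mask, tokens[mask]].mean()
```

Only the masked positions contribute to the loss, so the model must infer each missing patch's visual token from its visible context rather than copy it through.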
How DINOv2 combines DINO self-distillation with iBOT masked prediction at scale on curated data (LVD-142M), producing the strongest open-source frozen visual features across classification, segmentation, depth, and retrieval.
How I-JEPA learns visual representations by predicting abstract feature representations of masked image regions — no pixel reconstruction, no augmentation — achieving 81.7% linear probe accuracy with ViT-H.
How V-JEPA 2 scales self-supervised video learning to 1M+ hours with mask denoising and 3D-RoPE, then extends to V-JEPA 2-AC — an action-conditioned world model that enables zero-shot robotic planning from just 62 hours of unlabeled video.
How self-supervised learning works without negative pairs — a predictor and momentum target network are all you need to prevent representation collapse.
How self-distillation with no labels produces Vision Transformer attention maps that automatically segment objects — without any pixel-level supervision.
How masking 75% of image patches and reconstructing pixels creates a scalable self-supervised learner that trains ViT-H to 87.8% on ImageNet-1K — 3.5× faster than full encoding, no labels required.
How a momentum-updated encoder and a dictionary queue make contrastive learning practical — large dictionaries with consistent keys, no large-batch requirement.
How a simple framework — augmentation, shared encoder, projection head, and contrastive loss — set a new standard for self-supervised visual representation learning.
How V-JEPA learns powerful video representations by predicting masked spatiotemporal regions in embedding space rather than reconstructing pixels, achieving state-of-the-art frozen features with superior label efficiency.
How variance, invariance, and covariance regularization enables self-supervised representation learning without negative pairs or momentum encoders.
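The variance/invariance/covariance idea in the last entry is compact enough to sketch directly. This is a minimal numpy illustration of the three terms, not the paper's PyTorch implementation; the coefficient values and `eps` are assumptions matching commonly cited defaults.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two batches of embeddings (batch, dim)."""
    n, d = z_a.shape

    # Invariance: the two views of the same image should embed alike.
    inv = ((z_a - z_b) ** 2).mean()

    # Variance: hinge loss keeping each dimension's std above 1,
    # which prevents all embeddings collapsing to a point.
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = np.maximum(0.0, 1.0 - std_a).mean() + np.maximum(0.0, 1.0 - std_b).mean()

    # Covariance: push off-diagonal covariance entries toward zero
    # so dimensions carry decorrelated information.
    def off_diag_cov(z):
        zc = z - z.mean(axis=0)
        c = (zc.T @ zc) / (n - 1)
        off = c - np.diag(np.diag(c))
        return (off ** 2).sum() / d

    cov = off_diag_cov(z_a) + off_diag_cov(z_b)
    return sim_w * inv + var_w * var + cov_w * cov

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))
# Identical views: the invariance term vanishes, leaving only the
# variance and covariance regularizers.
total = vicreg_loss(z, z.copy())
```

No negative pairs and no momentum encoder appear anywhere in the loss; collapse is ruled out purely by the variance and covariance regularizers.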