world-model Papers | Abhik Sarkar

2025

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

self-supervised-learning video-understanding world-model robotics vision-transformer

How V-JEPA 2 scales self-supervised video learning to more than 1M hours of video with a mask-denoising objective and 3D-RoPE, then extends to V-JEPA 2-AC, an action-conditioned world model that enables zero-shot robotic planning from just 62 hours of unlabeled robot video.

