2025
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
How V-JEPA 2 scales self-supervised video pretraining to over 1M hours of video with a mask-denoising objective and 3D-RoPE, then extends to V-JEPA 2-AC, an action-conditioned world model that enables zero-shot robotic planning after post-training on just 62 hours of unlabeled robot video.
