2023
BLIP-2: Efficient Vision-Language Pre-training
Computer Vision, Natural Language Processing, Deep Learning, Multimodal Learning, BLIP-2, Vision-Language Models
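BLIP-2 keeps a pre-trained image encoder and a pre-trained LLM frozen and trains only a lightweight Querying Transformer (Q-Former) that bridges them, making vision-language pre-training efficient while reporting state-of-the-art results on several vision-language tasks at the time of its release.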
