This section defines the task of zero-shot visual question answering (VQA) and explores the use of pre-trained vision-language models (PVLMs) such as BLIP-2, highlighting its Querying Transformer (Q-Former), the component that bridges the modality gap between the frozen image encoder and the frozen language model for cross-modal understanding.
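As a concrete illustration of zero-shot VQA with BLIP-2, the sketch below queries a publicly available checkpoint through the Hugging Face Transformers API. The checkpoint name, example image URL, and question prompt are illustrative choices, not taken from the paper; the "Question: ... Answer:" prompt format follows the common zero-shot usage of BLIP-2, and no VQA fine-tuning is involved.

```python
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Illustrative checkpoint; any BLIP-2 variant (OPT- or Flan-T5-based) can be substituted.
checkpoint = "Salesforce/blip2-opt-2.7b"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint, torch_dtype=dtype).to(device)

# Example image (COCO validation sample); replace with any image of interest.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Zero-shot VQA: the question is posed as a text prompt to the frozen language model,
# conditioned on the Q-Former's visual query tokens.
prompt = "Question: how many cats are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)

generated_ids = model.generate(**inputs, max_new_tokens=10)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(answer)
```

Because the image encoder and language model stay frozen, only the Q-Former's learned query tokens carry visual information into the language model, which is what makes prompting the model with unseen questions possible without task-specific training.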
This paper is part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.