Abstract: The field of Large Visual-Language Models (LVLMs) has made significant strides in integrating visual recognition and language understanding. However, its application in multimodal ...
Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...