Abstract: The field of Large Visual-Language Models (LVLMs) has made significant strides in integrating visual recognition and language understanding. However, their application in multimodal ...
Abstract: Visual grounding tasks aim to localize image regions based on natural language references. In this work, we explore whether generative VLMs predominantly trained on image-text data could be ...
In vision-language models (VLMs), visual tokens typically incur significant computational overhead despite their lower information density compared to text tokens. To address this, ...