A Method for Visual Spatial Description Based on Large Language Model Fine-tuning
Recommendations
RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
Visual Spatial Description (VSD) is an emerging image-to-text task which aims at generating descriptions of the spatial relationships between given objects in an image. In this paper, we apply Retrieval-Augmented Generation (RAG) technology in guiding ...
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two ...
A Novel Lightweight Audio-visual Saliency Model for Videos
Audio information has not been considered an important factor in visual attention models regardless of many psychological studies that have shown the importance of audio information in the human visual perception system. Since existing visual attention ...
Published In

- General Chairs:
- Jianfei Cai,
- Mohan Kankanhalli,
- Balakrishnan Prabhakaran,
- Susanne Boll,
- Program Chairs:
- Ramanathan Subramanian,
- Liang Zheng,
- Vivek K. Singh,
- Pablo Cesar,
- Lexing Xie,
- Dong Xu
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article