Poster: Translating Vision into Words: Advancing Object Recognition with Visual-Language Models
Abstract
Information
Published In
- Chairs: Tadashi Okoshi, JeongGil Ko
- Program Chair: Robert LiKamWa
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Short-paper
Article Metrics
- Total Citations: 0
- Total Downloads: 31
- Downloads (Last 12 months): 31
- Downloads (Last 6 weeks): 1