It is our great pleasure to welcome you to the 2023 ACM Multimedia Workshop - MMIR 2023. The emergence of multimodal learning offers a feasible path toward multimodal IR. Over recent decades, with the rapid development of deep learning techniques, multimodal learning has seen remarkable success. Deep multimodal learning is commonly defined as the use of deep neural techniques to model and learn from multiple sources of data, or modalities. In the context of IR, deep multimodal learning has shown great potential to improve the performance and broaden the application scope of retrieval systems, e.g., by enabling better understanding and processing of diverse types of data. The MMIR'23 workshop complements the main conference by placing its major focus on multimodal IR. The workshop aims to extend existing work in this direction by bringing together and facilitating the community of researchers and practitioners. At the same time, we aim to encourage an exchange of perspectives and solutions between industry and academia, bridging the gap between academic design guidelines and industry best practices regarding multimodal IR.
Proceeding Downloads
Metaverse Retrieval: Finding the Best Metaverse Environment via Language
In recent years, the metaverse has sparked an increasing interest across the globe and is projected to reach a market size of more than $1,000B by 2030. This is due to its many potential applications in highly heterogeneous fields, such as entertainment ...
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
- Avinash Anand,
- Raj Jaiswal,
- Pijush Bhuyan,
- Mohit Gupta,
- Siddhesh Bangar,
- Md. Modassir Imam,
- Rajiv Ratn Shah,
- Shin'ichi Satoh
The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various ...
Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator
Knowledge-based Clinical Decision Support Systems can provide precise and interpretable results for prescription recommendation. Many existing knowledge-based prescription recommendation systems take into account multi-modal historical medical events to ...
Boon: A Neural Search Engine for Cross-Modal Information Retrieval
Visual-Semantic Embedding (VSE) networks can help search engines understand the meaning behind visual content and associate it with relevant textual information, leading to accurate search results. VSE networks can be used in cross-modal search engines ...
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend ...
Dynamic Network for Language-based Fashion Retrieval
Language-based fashion image retrieval, as a kind of composed image retrieval, presents a substantial challenge in the domain of multi-modal retrieval. This task aims to retrieve the target fashion item in the gallery given a reference image and a ...
On Popularity Bias of Multimodal-aware Recommender Systems: A Modalities-driven Analysis
Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base ...
Index Terms
- Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval