DOI: 10.1145/3606040
MMIR '23: Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval
ACM2023 Proceeding
  • General Chairs:
  • Wei Ji,
  • Yinwei Wei,
  • Zhedong Zheng,
  • Hao Fei,
  • Tat-seng Chua
Publisher:
  • Association for Computing Machinery, New York, NY, United States
Conference:
MM '23: The 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 2 November 2023
ISBN:
979-8-4007-0271-6
Published:
29 October 2023
Abstract

It is our great pleasure to welcome you to the 2023 ACM Multimedia Workshop - MMIR 2023. The emergence of multimodal learning offers a feasible path for multimodal IR. In recent decades, with the rapid development of deep learning techniques, multimodal learning has achieved notable success. Deep multimodal learning is the use of deep neural techniques to model and learn from multiple sources of data, or modalities. In the context of IR, deep multimodal learning has shown great potential to improve the performance and application scope of retrieval systems, e.g., by enabling better understanding and processing of diverse types of data. The MMIR'23 workshop complements the main conference by placing its major focus on multimodal IR. The workshop aims to extend existing work in this direction by bringing together and fostering the community of researchers and practitioners. At the same time, we aim to encourage an exchange of perspectives and solutions between industry and academia, bridging the gap between academic design guidelines and industrial best practices in multimodal IR.

SESSION: Workshop Presentations
research-article
Open Access
Metaverse Retrieval: Finding the Best Metaverse Environment via Language

In recent years, the metaverse has sparked an increasing interest across the globe and is projected to reach a market size of more than $1000B by 2030. This is due to its many potential applications in highly heterogeneous fields, such as entertainment ...

research-article
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various ...

research-article
Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator

Knowledge based Clinical Decision Support Systems can provide precise and interpretable results for prescription recommendation. Many existing knowledge based prescription recommendation systems take into account multi-modal historical medical events to ...

research-article
Open Access
Boon: A Neural Search Engine for Cross-Modal Information Retrieval

Visual-Semantic Embedding (VSE) networks can help search engines understand the meaning behind visual content and associate it with relevant textual information, leading to accurate search results. VSE networks can be used in cross-modal search engines ...

research-article
Video Referring Expression Comprehension via Transformer with Content-conditioned Query

Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend ...

research-article
Dynamic Network for Language-based Fashion Retrieval

Language-based fashion image retrieval, as a kind of composed image retrieval, presents a substantial challenge in the domain of multi-modal retrieval. This task aims to retrieve the target fashion item in the gallery given a reference image and a ...

research-article
Open Access
On Popularity Bias of Multimodal-aware Recommender Systems: A Modalities-driven Analysis

Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base ...

Contributors
  • National University of Singapore
  • Monash University
  • University of Macau
  • National University of Singapore

Index Terms

  1. Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval
      Index terms have been assigned to the content through auto-classification.