It is our great pleasure to welcome you to the 2023 ACM Multimedia Workshop - MMIR 2023. The emergence of multimodal learning offers a feasible path toward multimodal IR. Over recent decades, with the rapid development of deep learning techniques, multimodal learning has seen remarkable success. Deep multimodal learning is commonly defined as the use of deep neural techniques to model and learn from multiple sources of data, or modalities. In the context of IR, deep multimodal learning has shown great potential to improve the performance and broaden the application scope of retrieval systems, e.g., by enabling better understanding and processing of diverse types of data. The MMIR'23 workshop complements the main conference by placing its major focus on multimodal IR. The workshop aims to extend existing work in this direction by bringing together and facilitating the community of researchers and practitioners. At the same time, we aim to encourage an exchange of perspectives and solutions between industry and academia, bridging the gap between academic design guidelines and industry best practices regarding multimodal IR.
Proceeding Downloads
Metaverse Retrieval: Finding the Best Metaverse Environment via Language
In recent years, the metaverse has sparked an increasing interest across the globe and is projected to reach a market size of more than $1,000B by 2030. This is due to its many potential applications in highly heterogeneous fields, such as entertainment ...
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
- Avinash Anand,
- Raj Jaiswal,
- Pijush Bhuyan,
- Mohit Gupta,
- Siddhesh Bangar,
- Md. Modassir Imam,
- Rajiv Ratn Shah,
- Shin'ichi Satoh
The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various ...
Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator
Knowledge-based Clinical Decision Support Systems can provide precise and interpretable results for prescription recommendation. Many existing knowledge-based prescription recommendation systems take into account multi-modal historical medical events to ...
Boon: A Neural Search Engine for Cross-Modal Information Retrieval
Visual-Semantic Embedding (VSE) networks can help search engines understand the meaning behind visual content and associate it with relevant textual information, leading to accurate search results. VSE networks can be used in cross-modal search engines ...
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend ...
Dynamic Network for Language-based Fashion Retrieval
Language-based fashion image retrieval, as a kind of composed image retrieval, presents a substantial challenge in the domain of multi-modal retrieval. This task aims to retrieve the target fashion item in the gallery given a reference image and a ...
On Popularity Bias of Multimodal-aware Recommender Systems: A Modalities-driven Analysis
Multimodal-aware recommender systems (MRSs) exploit multimodal content (e.g., product images or descriptions) as items' side information to improve recommendation accuracy. While most of such methods rely on factorization models (e.g., MFBPR) as base ...
Index Terms
- Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval