Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2018

Abstract

By offering a natural way to seek information, multimodal dialogue systems are attracting increasing attention in domains such as retail and travel. However, most existing dialogue systems are limited to the textual modality and cannot be easily extended to capture the rich semantics of the visual modality, such as product images. For example, in the fashion domain, the visual appearance of clothes and matching styles plays a crucial role in understanding the user's intention. Without considering these, the dialogue agent may fail to generate desirable responses for users. In this paper, we present a Knowledge-aware Multimodal Dialogue (KMD) model to address the limitation of text-based dialogue systems. It gives special consideration to the semantics and domain knowledge revealed in visual content, and has three key components. First, we build a taxonomy-based learning module to capture the fine-grained semantics in images (e.g., the category and attributes of a product). Second, we propose an end-to-end neural conversational model to generate responses based on the conversation history, visual semantics, and domain knowledge. Lastly, to avoid inconsistent dialogues, we adopt a deep reinforcement learning method that accounts for future rewards to optimize the neural conversational model. We perform an extensive evaluation on a multi-turn task-oriented dialogue dataset in the fashion domain. Experimental results show that our method significantly outperforms state-of-the-art methods, demonstrating the efficacy of modeling the visual modality and domain knowledge for dialogue systems.

Keywords

Multimodal dialogue, Domain knowledge, Fashion

Discipline

Artificial Intelligence and Robotics

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Publication

Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 2018 October 22 - 26

First Page

801

Last Page

809

ISBN

9781450356657

Identifier

10.1145/3240508.3240605

Publisher

Association for Computing Machinery

City or Country

United States

Citation

LIAO, Lizi; MA, Yunshan; HE, Xiangnan; HONG, Richang; and CHUA, Tat-Seng. Knowledge-aware multimodal dialogue systems. (2018). Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 2018 October 22 - 26. 801-809.
Available at: https://ink.library.smu.edu.sg/sis_research/7573

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1145/3240508.3240605

Included in

Artificial Intelligence and Robotics Commons

Research Collection School Of Computing and Information Systems

Knowledge-aware multimodal dialogue systems