
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts


Abstract:

In the absence of dictionaries, translators, or grammars, it is still possible to learn some of the words of a new language by listening to spoken descriptions of images. If several images, each containing a particular visually salient object, each co-occur with a particular sequence of speech sounds, we can infer that those speech sounds are a word whose definition is the visible object. A multimodal word discovery system accepts, as input, a database of spoken descriptions of images (or a set of corresponding phone transcriptions) and learns a mapping from waveform segments (or phone strings) to their associated image concepts. In this article, four multimodal word discovery systems are demonstrated: three models based on statistical machine translation (SMT) and one based on neural machine translation (NMT). The systems are trained with phonetic transcriptions, MFCCs, and multilingual bottleneck (MBN) features. At the phone level, the SMT models outperform the NMT model, achieving a 61.6% F1 score on the phone-level word discovery task on Flickr30k. At the audio level, we compare our models with the existing ES-KMeans algorithm for word discovery and present some of the challenges in multimodal spoken word discovery.
Page(s): 1560-1573
Date of Publication: 20 May 2020
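
As a concrete illustration of the SMT-style approach the abstract describes, the sketch below trains an IBM Model 1 aligner by EM to associate phones in a caption's transcription with the visual concepts of the paired image, then segments a new phone string by grouping contiguous phones aligned to the same concept. The toy captions, the NULL-concept convention, and the run-based segmentation heuristic are assumptions made for illustration, not the paper's exact models or data.

from collections import defaultdict

# Toy parallel data: (phone transcription of a spoken caption, visual
# concepts detected in the paired image). Illustrative data only.
pairs = [
    ("k ae t s ae t".split(), ["CAT"]),
    ("dh ax k ae t r ae n".split(), ["CAT"]),
    ("ax d ao g r ae n".split(), ["DOG"]),
    ("d ao g z jh ah m p".split(), ["DOG"]),
]

NULL = "<NULL>"  # dummy concept absorbing phones that describe no object


def train_ibm1(pairs, iterations=25):
    # EM for IBM Model 1: estimate t(phone | concept) from co-occurrence.
    t = defaultdict(lambda: 0.01)  # near-uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected phone/concept counts
        total = defaultdict(float)  # expected concept marginals
        for phones, concepts in pairs:
            cands = [NULL] + concepts
            for p in phones:
                z = sum(t[(p, c)] for c in cands)  # normalizer
                for c in cands:
                    count[(p, c)] += t[(p, c)] / z  # E-step: soft alignment
                    total[c] += t[(p, c)] / z
        for (p, c), n in count.items():
            t[(p, c)] = n / total[c]  # M-step: renormalize per concept
    return t


def discover(phones, concepts, t):
    # Align each phone to its most probable concept; a maximal run of
    # phones aligned to the same non-NULL concept is a hypothesized word.
    cands = [NULL] + concepts  # NULL first so ties fall to "no concept"
    best = [max(cands, key=lambda c: t[(p, c)]) for p in phones]
    segments, start = [], 0
    for i in range(1, len(phones) + 1):
        if i == len(phones) or best[i] != best[start]:
            if best[start] != NULL:
                segments.append((" ".join(phones[start:i]), best[start]))
            start = i
    return segments


t = train_ibm1(pairs)
# Prints contiguous phone spans hypothesized as words for CAT, e.g. a
# span covering "k ae t"; the exact output depends on the EM settings.
print(discover("k ae t s l iy p s".split(), ["CAT"], t))

The NULL concept plays the same role here as the NULL word in classical SMT alignment: it soaks up function phones and background speech so that only phones genuinely predictive of a visual concept are segmented out as candidate words.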

