Vision and Language Integration Meets Multimedia Fusion: Proceedings of ACM Multimedia 2016 Workshop

Published: 01 October 2016
DOI: 10.1145/2964284.2980537

Abstract

Multimodal information fusion, at both the signal and the semantic level, is a core part of most multimedia applications, including multimedia indexing, retrieval, and summarization. Early or late fusion of modality-specific processing results has been addressed in multimedia prototypes since their earliest days, through methodologies ranging from rule-based approaches to information-theoretic models and machine learning. Vision and language are two of the predominant modalities being fused, and they have attracted special attention in long-running international challenges such as TRECVID and ImageCLEF. During the last decade, vision-language semantic integration has also attracted attention from research communities that have not traditionally been interdisciplinary, such as Computer Vision and Natural Language Processing. This is because one modality can greatly assist the processing of another by providing cues for disambiguation, complementary information, and noise/error filtering. The recent boom in deep learning methods has opened up new directions in the joint modelling of visual and co-occurring verbal information in multimedia discourse. The workshop on Vision and Language Integration Meets Multimedia Fusion was held on October 16, 2016 in Amsterdam, the Netherlands, during the workshop weekend of the ACM Multimedia 2016 Conference and the European Conference on Computer Vision (ECCV 2016). The proceedings contain seven selected long papers, which were presented orally at the workshop, and three abstracts of the invited keynote speeches. The papers and abstracts discuss data collection, representation learning, deep learning approaches, matrix and tensor factorization methods, and graph-based clustering as applied to the fusion of multimedia data.
A variety of applications is presented, including image captioning, news summarization, video hyperlinking, sub-shot segmentation of user-generated video, cross-modal classification, cross-modal question answering, and the detection of misleading metadata in user-generated video. The workshop was organized and supported by the EU COST Action iV&L Net, the European Network on Integrating Vision and Language: Combining Computer Vision and Language Processing for Advanced Search, Retrieval, Annotation and Description of Visual Data (IC1307, 2014-2018).

Cited By

  • (2019) Multimodal Learning toward Micro-Video Understanding. Synthesis Lectures on Image, Video, and Multimedia Processing 9:4 (1-186). DOI: 10.2200/S00938ED1V01Y201907IVM020. Online publication date: 17-Sep-2019
  • (2018) Integrating Vision and Language for First-Impression Personality Analysis. IEEE MultiMedia 25:2 (24-33). DOI: 10.1109/MMUL.2018.023121162. Online publication date: Apr-2018

Information

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2016

Author Tags

  1. combining computer vision and natural language processing
  2. cross-modal and multimodal processing of visual and language data

Qualifiers

  • Abstract

Funding Sources

  • EU COST IC 1307

Conference

MM '16
Sponsor:
MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1
Reflects downloads up to 20 Feb 2025
