Vision and Language Integration Meets Multimedia Fusion: Proceedings of ACM Multimedia 2016 Workshop

Authors: Marie-Francine Moens, Katerina Pastra, Kate Saenko, Tinne Tuytelaars

MM '16: Proceedings of the 24th ACM international conference on Multimedia

Page 1493

https://doi.org/10.1145/2964284.2980537

Published: 01 October 2016

Abstract

Multimodal information fusion, at both the signal and the semantic level, is a core part of most multimedia applications, including multimedia indexing, retrieval, summarization and others. Early or late fusion of modality-specific processing results has been addressed in multimedia prototypes since their earliest days, through methodologies including rule-based approaches, information-theoretic models, and machine learning. Vision and language are two of the predominant modalities being fused, and they have attracted special attention in international challenges with a long history of results, such as TRECVID, ImageCLEF and others. During the last decade, vision-language semantic integration has attracted attention from traditionally non-interdisciplinary research communities, such as Computer Vision and Natural Language Processing, because one modality can greatly assist the processing of another by providing cues for disambiguation, complementary information, and noise/error filtering. The latest boom of deep learning methods has opened up new directions in the joint modelling of visual and co-occurring verbal information in multimedia discourse.

The workshop on Vision and Language Integration Meets Multimedia Fusion was held on October 16, 2016 in Amsterdam, the Netherlands, during the workshop weekend of the ACM Multimedia 2016 Conference and the European Conference on Computer Vision (ECCV 2016). The proceedings contain seven selected long papers, which were presented orally at the workshop, and three abstracts of the invited keynote speeches. The papers and abstracts discuss data collection, representation learning, deep learning approaches, matrix and tensor factorization methods, and graph-based clustering with regard to the fusion of multimedia data. A variety of applications is presented, including image captioning, summarization of news, video hyperlinking, sub-shot segmentation of user-generated video, cross-modal classification, cross-modal question answering, and the detection of misleading metadata in user-generated video. The workshop is organized and supported by the EU COST action iV&L Net, the European Network on Integrating Vision and Language: Combining Computer Vision and Language Processing for Advanced Search, Retrieval, Annotation and Description of Visual Data (IC 1307, 2014-2018).

Index Terms

Vision and Language Integration Meets Multimedia Fusion: Proceedings of ACM Multimedia 2016 Workshop
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
  2. World Wide Web
    1. Web mining

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia

October 2016

1542 pages

ISBN:9781450336031

DOI:10.1145/2964284

General Chairs: Alan Hanjalic (Delft University of Technology), Cees Snoek (Qualcomm Research Netherlands / University of Amsterdam), Marcel Worring (University of Amsterdam)

Moderator: Dick Bulterman (CWI / VU University Amsterdam)

Program Chairs: Benoit Huet (EURECOM), Aisling Kelliher (Virginia Tech), Yiannis Kompatsiaris (CERTH-ITI), Jin Li (Microsoft)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Funding Sources

EU COST IC 1307

Conference

MM '16

Sponsor:

SIGMM

MM '16: ACM Multimedia Conference

October 15 - 19, 2016

Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 2

Total Downloads: 150

Downloads (Last 12 months): 3

Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Feb 2025
