research-article

Multimodal summarization of complex sentences

Authors:

Naushad UzZaman,

Jeffrey P. Bigham,

James F. Allen

IUI '11: Proceedings of the 16th international conference on Intelligent user interfaces

Pages 43 - 52

https://doi.org/10.1145/1943403.1943412

Published: 13 February 2011

Abstract

In this paper, we introduce the idea of automatically illustrating complex sentences as multimodal summaries that combine pictures, structure and simplified compressed text. By including text and structure in addition to pictures, multimodal summaries provide additional clues of what happened, who did it, to whom and how, to people who may have difficulty reading or who are looking to skim quickly. We present ROC-MMS, a system for automatically creating multimodal summaries (MMS) of complex sentences by generating pictures, textual summaries and structure. We show that pictures alone are insufficient to help people understand most sentences, especially for readers who are unfamiliar with the domain. An evaluation of ROC-MMS in the Wikipedia domain illustrates both the promise and challenge of automatically creating multimodal summaries.

Index Terms

Multimodal summarization of complex sentences
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Human-centered computing

Published In

IUI '11: Proceedings of the 16th international conference on Intelligent user interfaces

February 2011

504 pages

ISBN:9781450304191

DOI:10.1145/1943403

General Chairs: Pearl Pu (EPFL, Switzerland), Michael Pazzani (Rutgers University, USA)

Program Chairs: Elisabeth André (Augsburg University, Germany), Doug Riecken (CCLS Columbia University, USA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IUI '11: 16th International Conference on Intelligent User Interfaces

February 13 - 16, 2011

Palo Alto, CA, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%
