Research article
DOI: 10.1145/1943403.1943412

Multimodal summarization of complex sentences

Published: 13 February 2011

Abstract

In this paper, we introduce the idea of automatically illustrating complex sentences as multimodal summaries that combine pictures, structure, and simplified, compressed text. By including text and structure in addition to pictures, multimodal summaries give additional clues about what happened, who did it, to whom, and how, to people who may have difficulty reading or who want to skim quickly. We present ROC-MMS, a system that automatically creates multimodal summaries (MMS) of complex sentences by generating pictures, textual summaries, and structure. We show that pictures alone are insufficient to help people understand most sentences, especially readers who are unfamiliar with the domain. An evaluation of ROC-MMS in the Wikipedia domain illustrates both the promise and the challenge of automatically creating multimodal summaries.
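The abstract describes a multimodal summary as three parts shown together: compressed text, structure (what happened, who did it, to whom, and how), and pictures. As a minimal sketch of that combination, the Python data structure below holds one sentence's summary and lays it out as plain text. The class name `MultimodalSummary`, its field names, the slot labels, and the example sentence are hypothetical illustrations, not ROC-MMS's actual representation; the paper's generation steps (compression, structure extraction, picture selection) are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MultimodalSummary:
    """Hypothetical container for one sentence's multimodal summary.

    The field names are illustrative assumptions, not ROC-MMS's schema.
    """
    source_sentence: str           # the original complex sentence
    compressed_text: str           # simplified, compressed version of it
    action: str                    # what happened
    agent: Optional[str] = None    # who did it
    patient: Optional[str] = None  # to whom (or to what)
    manner: Optional[str] = None   # how
    pictures: List[str] = field(default_factory=list)  # image files or URLs

    def structure(self) -> Dict[str, str]:
        """Return only the structural slots that are filled, in reading order."""
        slots = {
            "who": self.agent,
            "did what": self.action,
            "to whom/what": self.patient,
            "how": self.manner,
        }
        return {role: value for role, value in slots.items() if value}

    def render_text(self) -> str:
        """Lay out compressed text, then one line per filled slot, then pictures."""
        lines = [self.compressed_text]
        lines += [f"{role}: {value}" for role, value in self.structure().items()]
        if self.pictures:
            lines.append("pictures: " + ", ".join(self.pictures))
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sentence; not taken from the paper's evaluation data.
    summary = MultimodalSummary(
        source_sentence=(
            "In 1492, Christopher Columbus, sailing under the Spanish crown, "
            "crossed the Atlantic Ocean with three ships."
        ),
        compressed_text="Columbus crossed the Atlantic in 1492.",
        action="crossed",
        agent="Columbus",
        patient="the Atlantic Ocean",
        manner="with three ships",
        pictures=["columbus.jpg", "ship.jpg"],
    )
    print(summary.render_text())
```

Keeping the compressed text and the role slots next to the pictures mirrors the abstract's finding that pictures alone are insufficient for most sentences. Dropping empty slots at render time lets a partially analyzed sentence still yield a usable summary.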




Information

Published In

IUI '11: Proceedings of the 16th international conference on Intelligent user interfaces
February 2011
504 pages
ISBN:9781450304191
DOI:10.1145/1943403
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AAC
  2. MMS
  3. ROC MMS
  4. augmentative and alternative communication
  5. automatic illustration
  6. illustration
  7. multimodal summarization
  8. pictorial representation
  9. picture
  10. sentence compression
  11. summarization
  12. text-to-picture
  13. visualization

Qualifiers

  • Research-article

Conference

IUI '11

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%


Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 23 Feb 2025


Cited By

  • (2024) Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Computing Surveys 56:10 (1-42). DOI: 10.1145/3656580. Online publication date: 22-Jun-2024
  • (2023) A Survey on Multi-modal Summarization. ACM Computing Surveys 55:13s (1-36). DOI: 10.1145/3584700. Online publication date: 13-Jul-2023
  • (2023) Scientific document processing: challenges for modern learning methods. International Journal on Digital Libraries 24:4 (283-309). DOI: 10.1007/s00799-023-00352-7. Online publication date: 24-Mar-2023
  • (2022) Research on Multimodal Summarization by Integrating Visual and Text Modal Information. 2022 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA) (882-889). DOI: 10.1109/AEECA55500.2022.9919012. Online publication date: 20-Aug-2022
  • (2022) The State of the Art Text Summarization Techniques. Applied Computational Technologies (434-447). DOI: 10.1007/978-981-19-2719-5_41. Online publication date: 15-May-2022
  • (2021) Multi-Modal Supplementary-Complementary Summarization using Multi-Objective Optimization. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (818-828). DOI: 10.1145/3404835.3462877. Online publication date: 11-Jul-2021
  • (2020) Multi-Modal Summary Generation using Multi-Objective Optimization. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (1745-1748). DOI: 10.1145/3397271.3401232. Online publication date: 25-Jul-2020
  • (2020) Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (1-11). DOI: 10.1145/3313831.3376519. Online publication date: 21-Apr-2020
  • (2020) Text-Image-Video Summary Generation Using Joint Integer Linear Programming. Advances in Information Retrieval (190-198). DOI: 10.1007/978-3-030-45442-5_24. Online publication date: 8-Apr-2020
  • (2019) Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Transactions on Visualization and Computer Graphics 25:7 (2482-2504). DOI: 10.1109/TVCG.2018.2834341. Online publication date: 1-Jul-2019
