research-article

Exploiting User Comments for Document Summarization with Matrix Factorization

Authors:

Minh-Tien Nguyen,

Tran Viet Cuong,

Nguyen Xuan Hoai

SoICT '19: Proceedings of the 10th International Symposium on Information and Communication Technology

Pages 118 - 124

https://doi.org/10.1145/3368926.3369699

Published: 04 December 2019

Abstract

Social media gives readers a new way to discuss the content of an event mentioned in a Web document by posting relevant comments. These comments provide additional information that can enrich the main document. This paper introduces a new model that integrates user comments into the summarization process. While prior methods assume the same number of topics for the sentences and the comments of a document, we argue that sentences and comments should each have their own topics while also sharing common hidden topics, in terms of the same or inferred words. Based on this, we define a new objective function that jointly combines sentences and comments to achieve global optimization. The objective function is optimized by our non-negative matrix factorization algorithm to find the weights of the sentence matrix and the comment matrix, which are then used to rank sentences and comments. Experimental results on two datasets, in English and Vietnamese, show that our model achieves promising results for single-document summarization.
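The abstract does not give the objective function itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a term-by-sentence matrix X_s and a term-by-comment matrix X_c, factorizes them jointly with standard Lee-Seung multiplicative updates for the Frobenius loss, and uses a zero mask so that some topics are shared while others are private to sentences or to comments. The function names, the topic counts, the toy data, and the scoring rule (topic weight times column weight) are all assumptions made for illustration.

```python
import numpy as np


def joint_nmf(X_s, X_c, k_shared=2, k_sent=1, k_com=1,
              n_iter=200, eps=1e-9, seed=0):
    """Approximate [X_s | X_c] by W @ H with shared and private topics.

    Hypothetical sketch: rows of H are topics, columns are sentences
    followed by comments. Masked entries of H start at zero, and
    multiplicative updates keep them at zero.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_s = X_s.shape
    n_c = X_c.shape[1]
    X = np.hstack([X_s, X_c])
    k = k_shared + k_sent + k_com

    # mask[t, j] == 1 means column j is allowed to use topic t.
    mask = np.ones((k, n_s + n_c))
    mask[k_shared + k_sent:, :n_s] = 0.0           # sentences skip comment-only topics
    mask[k_shared:k_shared + k_sent, n_s:] = 0.0   # comments skip sentence-only topics

    W = rng.random((n_terms, k)) + eps
    H = (rng.random((k, n_s + n_c)) + eps) * mask

    errors = []
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for ||X - W H||_F^2.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, mask, errors


def rank_columns(H, n_s):
    """Rank sentences and comments by topic-weighted column scores."""
    topic_weight = H.sum(axis=1) / H.sum()   # relative importance of each topic
    scores = topic_weight @ H                # one score per sentence or comment
    sent_rank = np.argsort(-scores[:n_s])    # best sentence first
    com_rank = np.argsort(-scores[n_s:])     # best comment first
    return sent_rank, com_rank


# Toy term counts (made up): 6 terms, 4 sentences, 3 comments.
X_s = np.array([[2, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 2, 0, 1],
                [0, 0, 3, 1],
                [1, 0, 0, 2],
                [0, 1, 1, 0]], dtype=float)
X_c = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 2],
                [1, 1, 0],
                [0, 2, 1],
                [2, 0, 0]], dtype=float)

W, H, mask, errors = joint_nmf(X_s, X_c)
sent_rank, com_rank = rank_columns(H, n_s=X_s.shape[1])
print("sentence ranking:", sent_rank)
print("comment ranking:", com_rank)
```

Because the updates are multiplicative, an entry of H that starts at zero stays at zero, which is what lets a simple mask stand in for the shared versus private topic structure here. The paper's own objective function and update rules may differ from this sketch.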

Cited By

Wang J, Rao Y, Shi X, Chen X. (2021). Picture Preview Generation for Interactive Educational Resources. Complexity, 2021, 1-14. Online publication date: 12-May-2021.
https://doi.org/10.1155/2021/5536867
Wang J, Xie L, Shi X, Chen X, Li X, Rao Y. (2020). Preview Generation for Mathematical Interactive Educational Resources in Netpad. 2020 8th International Conference on Digital Home (ICDH), 221-226. Online publication date: Sep-2020.
https://doi.org/10.1109/ICDH51081.2020.00045

Index Terms

Exploiting User Comments for Document Summarization with Matrix Factorization
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization

Published In

SoICT '19: Proceedings of the 10th International Symposium on Information and Communication Technology

December 2019

551 pages

ISBN:9781450372459

DOI:10.1145/3368926

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SOICT: School of Information and Communication Technology - HUST
NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

UTEHY.L.2019.53

Conference

SoICT 2019

SoICT 2019: The Tenth International Symposium on Information and Communication Technology

December 4 - 6, 2019

Ha Long Bay, Hanoi, Viet Nam

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%
