Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization

Park, Sun; Lee, Ju-Hong; Kim, Deok-Hwan; Ahn, Chan-Min

doi:10.1007/978-3-540-69507-3_66

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4362)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Computer Science

Abstract

In this paper, a new summarization method, which uses non-negative matrix factorization (NMF) and K-means clustering, is introduced to extract meaningful sentences from multiple documents. The proposed method can improve the quality of document summaries because the semantic features calculated by NMF reflect the inherent semantics of the documents well, and the semantic variables derived by NMF allow the sentences most relevant to the given topic to be extracted efficiently. In addition, it uses K-means clustering to remove noise, so that biased inherent semantics of the documents are not reflected in the summaries. We perform detailed experiments with the well-known DUC test dataset. The experimental results demonstrate that the proposed method performs better than other methods using LSA, K-means, and NMF.
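The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a raw term-by-sentence count matrix, factors it with the standard Lee-Seung multiplicative updates, and ranks sentences against a topic query. The K-means noise-removal step is omitted, and all function names and the scoring rule are hypothetical choices made for illustration.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences, entries are raw counts."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w]][j] += 1.0
    return vocab, A

def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Factor non-negative A (m x n) into W (m x r) and H (r x n) with the
    Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H

def frobenius_error(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))

def rank_sentences(sentences, query, r=2):
    """Return sentence indices, most query-relevant first (hypothetical rule:
    weight each semantic feature by its overlap with the query terms)."""
    vocab, A = term_sentence_matrix(sentences)
    W, H = nmf(A, r)
    words = query.lower().split()
    q = [float(words.count(w)) for w in vocab]
    # Relevance of each semantic feature (a column of W) to the query.
    rel = [sum(W[i][k] * q[i] for i in range(len(vocab))) for k in range(r)]
    # Sentence score: feature relevance weighted by the sentence's
    # semantic variables (its column of H).
    scores = [sum(rel[k] * H[k][j] for k in range(r))
              for j in range(len(sentences))]
    return sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)

if __name__ == "__main__":
    docs = [
        "matrix factorization finds semantic features",
        "clustering removes noisy sentences",
        "the weather was pleasant today",
        "semantic features from matrix factorization rank sentences",
    ]
    for idx in rank_sentences(docs, "matrix factorization"):
        print(idx, docs[idx])
```

In this sketch the columns of `W` play the role of semantic features and the columns of `H` the role of semantic variables; a real system would add term weighting, stop-word handling, and the clustering-based noise filter described in the abstract.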

Author information

Authors and Affiliations

Department of Computer Science & Information Engineering, Inha University, Incheon, Korea
Sun Park & Chan-Min Ahn
Department of Computer Science & Information Engineering, Inha University, Incheon, Korea
Ju-Hong Lee
Department of Electronics Engineering, Inha University,
Deok-Hwan Kim

Editor information

Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, František Plášil

Cite this paper

Park, S., Lee, JH., Kim, DH., Ahn, CM. (2007). Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_66

DOI: https://doi.org/10.1007/978-3-540-69507-3_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69506-6
Online ISBN: 978-3-540-69507-3
eBook Packages: Computer Science, Computer Science (R0)
