Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis

Yeh, Jen-Yuan; Ke, Hao-Ren; Yang, Wei-Pang

doi:10.1007/3-540-36227-4_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2555)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

In this paper, two novel approaches are proposed for extracting the important sentences of a document to create its summary. The first is a corpus-based approach using feature analysis. It introduces three new ideas: 1) employing ranked position to emphasize the significance of sentence position, 2) reshaping the word unit to estimate keyword importance more accurately, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis with text relationship maps to interpret the conceptual structure of a document. Both approaches are applied to Chinese text summarization and were evaluated on a corpus of 100 political articles from New Taiwan Weekly; at a compression ratio of 30%, they achieved average recalls of 52.0% and 45.6%, respectively.
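The second approach can be illustrated with a minimal sketch that is not the authors' implementation. It assumes the following: sentences arrive already segmented into words (Chinese word segmentation is not shown), the term-by-sentence matrix uses raw term frequencies, two sentences are linked when the cosine similarity of their LSA vectors exceeds a fixed threshold, and the summary keeps the most-linked sentences in the spirit of Salton et al.'s "bushy path". The function names `lsa_trm_summary` and `recall`, as well as the parameters `k`, `threshold` and `ratio`, are hypothetical. The abstract does not state the paper's actual weighting scheme, number of latent dimensions or link threshold.

```python
import numpy as np


def lsa_trm_summary(sentences, k=2, threshold=0.3, ratio=0.3):
    """Pick sentence indices via an LSA-smoothed text relationship map (sketch).

    sentences: list of token lists (word segmentation assumed done upstream).
    k: number of latent dimensions kept (hypothetical default).
    threshold: cosine similarity above which two sentences are linked (hypothetical).
    ratio: compression ratio, i.e. the fraction of sentences to keep.
    """
    n = len(sentences)
    if n == 0:
        return []
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix with raw term frequencies (weighting is an assumption).
    A = np.zeros((len(vocab), n))
    for j, sent in enumerate(sentences):
        for w in sent:
            A[index[w], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    # Semantic sentence vectors: columns of S_k V_k^T, transposed to one row per sentence.
    vecs = (np.diag(S[:k]) @ Vt[:k, :]).T

    # Cosine similarity between every pair of semantic sentence vectors.
    norms = np.linalg.norm(vecs, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    unit = vecs / norms[:, None]
    sim = unit @ unit.T

    # Text relationship map: link pairs above the threshold, excluding self-links.
    links = (sim > threshold) & ~np.eye(n, dtype=bool)
    degree = links.sum(axis=1)

    # Keep the most-linked ("bushy") sentences; ties go to the earlier sentence.
    m = max(1, round(ratio * n))
    ranked = sorted(range(n), key=lambda i: (-degree[i], i))
    return sorted(ranked[:m])


def recall(selected, reference):
    """Fraction of reference (manually chosen) sentences that were also selected."""
    ref = set(reference)
    return len(ref & set(selected)) / len(ref) if ref else 0.0


if __name__ == "__main__":
    # Toy, pre-segmented sentences: three on politics, two on weather.
    sents = [
        ["election", "vote", "party"],
        ["party", "vote", "campaign"],
        ["election", "campaign", "vote"],
        ["weather", "rain"],
        ["rain", "typhoon"],
    ]
    picked = lsa_trm_summary(sents, k=2, threshold=0.3, ratio=0.4)
    print(picked)                  # [0, 1]
    print(recall(picked, [0, 2]))  # 0.5
```

Using the columns of S_k V_k^T instead of the full reconstruction U_k S_k V_k^T gives the same cosine similarities, because U_k has orthonormal columns and therefore preserves inner products. In the toy run, the three political sentences link to each other and outrank the two weather sentences. The `recall` helper mirrors the evaluation measure reported in the abstract: the share of manually selected sentences that also appear in the system summary.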

Author information

Authors and Affiliations

Department of Computer & Information Science, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
Jen-Yuan Yeh & Wei-Pang Yang
Digital Library & Information Section of Library, National Chiao-Tung University, 1001 Ta Hsueh Rd., 30050, Hsinchu, Taiwan, R.O.C.
Hao-Ren Ke

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Ee-Peng Lim, Schubert Foo & Chris Khoo
University of Arizona, USA
Hsinchun Chen
Virginia Tech, USA
Edward Fox
University of Mysore, Mysore
Shalini Urs
IEI-CNR, Italy
Costantino Thanos

About this paper

Cite this paper

Yeh, JY., Ke, HR., Yang, WP. (2002). Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis. In: Lim, E.P., et al. Digital Libraries: People, Knowledge, and Technology. ICADL 2002. Lecture Notes in Computer Science, vol 2555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36227-4_8

DOI: https://doi.org/10.1007/3-540-36227-4_8
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00261-1
Online ISBN: 978-3-540-36227-2
eBook Packages: Springer Book Archive