Clustering and Visualization in a Multi-lingual Multi-document Summarization System

Chen, Hsin-Hsi; Kuo, June-Jei; Su, Tsei-Chun

doi:10.1007/3-540-36618-0_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2633)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Measuring the similarity of words, sentences, and documents is one of the major issues in multi-lingual multi-document summarization. This paper presents five strategies for computing multilingual sentence similarity. The experimental results show that sentence alignment that ignores word position and order within a sentence performs best. In addition, two strategies are proposed for multilingual document clustering. The two-phase strategy (translation after clustering) outperforms the one-phase strategy (translation before clustering). Deferring translation until sentence clustering, which reduces the propagation of translation errors, is the most promising approach. Moreover, three strategies are proposed for sentence clustering. Complete link within a cluster has the best performance; however, subsumption-based clustering offers lower computational complexity with similar performance. Finally, two visualization models (focusing and browsing), which take the users’ language preference into account, are proposed.
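As a rough illustration of two ideas named in the abstract, the sketch below pairs an order-independent sentence similarity with complete-link sentence clustering. It is not the authors' implementation: the abstract does not give their similarity formulas, so the Dice coefficient over word sets, the function names `sentence_similarity` and `complete_link_clusters`, and the `threshold` parameter are all assumptions made for this example. It also assumes the sentences have already been tokenized and mapped into a single language.

```python
from itertools import combinations


def sentence_similarity(a, b):
    """Order-free similarity: Dice coefficient over the two word sets.

    Assumption: Dice overlap stands in for the paper's own measures,
    which the abstract does not specify.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def complete_link_clusters(sentences, threshold):
    """Agglomerative clustering with the complete-link criterion.

    Two clusters are merged only if their *least* similar pair of
    sentences still reaches `threshold` (a hypothetical parameter).
    Returns clusters as lists of sentence indices.
    """
    clusters = [[i] for i in range(len(sentences))]

    def link(c1, c2):
        # Complete link: similarity of the worst-matching pair.
        return min(
            sentence_similarity(sentences[i], sentences[j])
            for i in c1
            for j in c2
        )

    while len(clusters) > 1:
        # Find the pair of clusters with the highest complete-link score.
        score, (p, q) = max(
            (
                (link(clusters[p], clusters[q]), (p, q))
                for p, q in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[0],
        )
        if score < threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    sentences = [
        ["the", "president", "visited", "taipei"],
        ["taipei", "visited", "the", "president"],  # same words, new order
        ["stock", "prices", "fell", "sharply"],
    ]
    print(complete_link_clusters(sentences, threshold=0.5))
```

Because the similarity works on word sets, the first two example sentences score as identical despite their different word order, so they end up in the same cluster while the third stays apart. The complete-link check compares every pair across two clusters, which hints at why the abstract contrasts it with a cheaper subsumption-based alternative.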

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Hsin-Hsi Chen, June-Jei Kuo & Tsei-Chun Su

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Chen, HH., Kuo, JJ., Su, TC. (2003). Clustering and Visualization in a Multi-lingual Multi-document Summarization System. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_19

DOI: https://doi.org/10.1007/3-540-36618-0_19
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive
