Clustering and Visualization in a Multi-lingual Multi-document Summarization System

  • Conference paper
Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2633))

Abstract

Measuring the similarity of words, sentences, and documents is one of the major issues in multi-lingual multi-document summarization. This paper presents five strategies for computing multilingual sentence similarity. The experimental results show that sentence alignment that ignores word position and order within a sentence performs best. In addition, two strategies are proposed for multilingual document clustering. The two-phase strategy (translation after clustering) is better than the one-phase strategy (translation before clustering). Deferring translation until sentence clustering, which reduces the propagation of translation errors, is the most promising approach. Moreover, three strategies are proposed for sentence clustering. Complete link within a cluster performs best; however, subsumption-based clustering offers lower computational complexity with similar performance. Finally, two visualization models (i.e., focusing and browsing), which take the users' language preferences into account, are proposed.
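As a rough illustration of two ideas named in the abstract, the sketch below pairs an order-free sentence similarity with complete-link sentence clustering. It is not the paper's method: the Dice word-overlap measure, the threshold value, and the function names `sentence_similarity` and `complete_link_clusters` are all assumptions made for this example, and the translation step is omitted, so the input sentences are assumed to be tokenized and already in one language.

```python
from itertools import combinations


def sentence_similarity(a, b):
    """Order-free similarity: Dice coefficient over the two word sets.

    Assumption for illustration only; word position and order are ignored.
    """
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def complete_link_clusters(sentences, threshold):
    """Agglomerative clustering with a complete-link criterion.

    Two clusters are merged only if their LEAST similar cross-cluster
    pair still reaches `threshold` (a hypothetical parameter).
    Returns a list of clusters, each a list of sentence indices.
    """
    clusters = [[i] for i in range(len(sentences))]
    while True:
        best = None  # (link strength, index x, index y)
        for x, y in combinations(range(len(clusters)), 2):
            link = min(
                sentence_similarity(sentences[i], sentences[j])
                for i in clusters[x]
                for j in clusters[y]
            )
            if link >= threshold and (best is None or link > best[0]):
                best = (link, x, y)
        if best is None:
            return clusters
        _, x, y = best
        # x < y always holds, so deleting y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]


sentences = [
    ["the", "cat", "sat"],
    ["sat", "the", "cat"],          # same words, different order
    ["stock", "prices", "fell"],
    ["prices", "fell", "sharply"],
]
clusters = complete_link_clusters(sentences, threshold=0.5)
```

Because the similarity works on word sets, the first two sentences score as identical despite their different word order. The complete-link rule keeps clusters tight, at the cost of comparing every cross-cluster pair on each pass.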

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, HH., Kuo, JJ., Su, TC. (2003). Clustering and Visualization in a Multi-lingual Multi-document Summarization System. In: Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_19

  • DOI: https://doi.org/10.1007/3-540-36618-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-01274-0

  • Online ISBN: 978-3-540-36618-8

  • eBook Packages: Springer Book Archive
