Exploring Clustering for Multi-document Arabic Summarisation

  • Conference paper
Information Retrieval Technology (AIRS 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7097)

Abstract

In this paper we explore clustering for multi-document Arabic summarisation. For our evaluation we use an Arabic version of the DUC-2002 dataset that we previously generated using Google Translate. We explore how sentence-level clustering can be applied both to multi-document summarisation and to redundancy elimination within that process. We vary the parameter settings, including the cluster size and the selection model applied in the extractive summarisation process. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. We compare our results with the top five systems in the DUC-2002 multi-document summarisation task.
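As a rough illustration of sentence-level clustering used for extractive selection and redundancy elimination, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenisation, raw term-frequency vectors, cosine similarity, plain k-means seeded with the first k sentences, and a selection model that keeps the one sentence closest to each cluster centroid. All function names are hypothetical, and real Arabic text would likely need normalisation and stemming that are omitted here.

```python
import math
from collections import Counter


def vectorise(sentence):
    """Bag-of-words term-frequency vector (whitespace tokenisation only)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def cluster_sentences(sentences, k, iterations=10):
    """Plain k-means over sentences, using cosine similarity.

    Assumption: the first k sentences seed the centroids, which keeps the
    sketch deterministic; the paper's initialisation is not known here.
    """
    vectors = [vectorise(s) for s in sentences]
    k = min(k, len(vectors))
    centroids = vectors[:k]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids, vectors


def summarise(sentences, k):
    """Keep one sentence per cluster: the one closest to the centroid.

    Near-duplicate sentences tend to share a cluster, so picking a single
    representative per cluster also removes redundancy.
    """
    assignment, centroids, vectors = cluster_sentences(sentences, k)
    chosen = []
    for c in range(len(centroids)):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(
                max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            )
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarise(sentences, k)` on the pooled sentences of a document set returns at most `k` sentences. In this sketch the cluster count `k` doubles as the summary length, which is only one of several possible selection models.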

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

El-Haj, M., Kruschwitz, U., Fox, C. (2011). Exploring Clustering for Multi-document Arabic Summarisation. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_50

  • DOI: https://doi.org/10.1007/978-3-642-25631-8_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25630-1

  • Online ISBN: 978-3-642-25631-8

  • eBook Packages: Computer Science (R0)
