Exploring Clustering for Multi-document Arabic Summarisation

El-Haj, Mahmoud; Kruschwitz, Udo; Fox, Chris

doi:10.1007/978-3-642-25631-8_50

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7097)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

In this paper we explore clustering for multi-document Arabic summarisation. For our evaluation we use an Arabic version of the DUC-2002 dataset that we previously generated using Google Translate. We explore how clustering (at the sentence level) can be applied to multi-document summarisation as well as for redundancy elimination within this process. We use different parameter settings including the cluster size and the selection model applied in the extractive summarisation process. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. The results we achieve are compared with the top five systems in the DUC-2002 multi-document summarisation task.
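
As a rough illustration of the idea described in the abstract, the sketch below clusters sentences pooled from several documents and keeps one representative per cluster, which also drops near-duplicate sentences. It is not the authors' implementation: the use of k-means with cosine similarity, whitespace tokenisation (real Arabic text would need proper normalisation and stemming), seeding with the first k sentences, the "closest to centroid" selection rule, and the simplified unigram ROUGE-1 function are all assumptions made for this example. All function names are hypothetical.

```python
import math
from collections import Counter

def vectorise(sentence):
    """Bag-of-words term-frequency vector (whitespace tokens; an assumption)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def kmeans(vectors, k, iterations=10):
    """Plain k-means using cosine similarity; seeds are the first k vectors."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

def summarise(sentences, k):
    """Keep the sentence closest to each cluster centroid, in original order."""
    vectors = [vectorise(s) for s in sentences]
    assignment, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(max(members,
                              key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]

def rouge1(candidate, reference):
    """Simplified unigram-overlap (recall, precision), in the spirit of ROUGE-1."""
    cand, ref = vectorise(candidate), vectorise(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    return recall, precision

# Hypothetical sentences pooled from two documents on two topics.
sentences = [
    "the flood hit the city",
    "the election results were announced",
    "the flood hit the town",
    "election results were announced today",
]
summary = summarise(sentences, k=2)
```

With k set to 2, the four example sentences fall into a flood cluster and an election cluster, and the summary keeps one sentence from each. Varying k and the selection rule corresponds loosely to the cluster size and selection model parameters mentioned in the abstract.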

Author information

Authors and Affiliations

Computer Science and Electronic Engineering, University of Essex, United Kingdom
Mahmoud El-Haj, Udo Kruschwitz & Chris Fox

Editor information

Editors and Affiliations

Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20182, Dubai, United Arab Emirates
Mohamed Vall Mohamed Salem
Faculty of Engineering and IT, Dubai International Academic City, Block 11, 1st and 2nd Floor, P.O. Box 345015, Dubai, United Arab Emirates
Khaled Shaalan
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Farhad Oroumchian
Department of Electrical and Computer Engineering, University of Tehran, Faculty of Engineering, North Kargar Street, P.O. Box 14395-515, Tehran, Iran
Azadeh Shakery
Faculty of Computer Science and Engineering, University of Wollongong, Dubai Knowledge Village, P.O. Box 20183, Dubai, United Arab Emirates
Halim Khelalfa

About this paper

Cite this paper

El-Haj, M., Kruschwitz, U., Fox, C. (2011). Exploring Clustering for Multi-document Arabic Summarisation. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_50

DOI: https://doi.org/10.1007/978-3-642-25631-8_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25630-1
Online ISBN: 978-3-642-25631-8
eBook Packages: Computer Science, Computer Science (R0)
