Abstract
In this paper we explore clustering for multi-document Arabic summarisation. For our evaluation we use an Arabic version of the DUC-2002 dataset that we previously generated using Google Translate. We explore how sentence-level clustering can be applied both to multi-document summarisation and to redundancy elimination within that process. We vary the parameter settings, including the cluster size and the selection model applied in the extractive summarisation process. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. We compare our results with those of the top five systems in the DUC-2002 multi-document summarisation task.
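The general idea of sentence-level clustering for extractive summarisation can be illustrated with a minimal sketch. It is not the system described in the paper: the term-frequency vectors, the cosine-based k-means, the naive seeding with the first k sentences, the "closest to centroid" selection rule, and all function names are assumptions chosen for illustration. The overlap function is a simple clipped unigram precision and recall, not the official ROUGE package. Real Arabic input would likely also need normalisation and stemming, which are omitted here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so Arabic script is matched as well.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(vectors, k, iterations=20):
    """Plain k-means using cosine similarity.

    Assumption: the first k vectors serve as seeds, which keeps the sketch
    deterministic but is not a recommended initialisation.
    """
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no sentence changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids


def summarise(sentences, k):
    """Keep one sentence per cluster: the one most similar to its centroid.

    Near-duplicate sentences tend to share a cluster, so taking a single
    representative per cluster also acts as redundancy elimination.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    assignment, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(len(centroids)):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]  # restore document order


def unigram_precision_recall(candidate, reference):
    """Clipped unigram overlap between a candidate and a reference summary."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall
```

Calling `summarise(sentences, k)` on the pooled sentences of a document set returns at most `k` sentences in their original order, and `unigram_precision_recall` can then score that output against a reference summary. In this sketch the number of clusters directly controls summary length; other selection models, such as taking sentences only from the largest cluster, would replace the loop in `summarise`.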
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
El-Haj, M., Kruschwitz, U., Fox, C. (2011). Exploring Clustering for Multi-document Arabic Summarisation. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_50
DOI: https://doi.org/10.1007/978-3-642-25631-8_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25630-1
Online ISBN: 978-3-642-25631-8
eBook Packages: Computer Science (R0)