Abstract
Text summarization is the process of condensing a large body of textual information into a clear and concise form. In this article, we present a detailed comparative study of extractive methods for automatic text summarization on Hindi and English datasets of news articles. We consider 13 summarization techniques, namely TextRank, LexRank, Luhn, LSA, Edmundson, ChunkRank, TGraph, UniRank, NN-ED, NN-SE, FE-SE, SummaRuNNer, and MMR-SE, and evaluate their performance using various performance metrics, such as precision, recall, F1, cohesion, non-redundancy, readability, and significance. A thorough analysis, organized in eight parts, exhibits the strengths and limitations of these methods, the effect of summary length on performance, the impact of a document's language, and other factors. The outcomes are evaluated with a standard summary evaluation tool (ROUGE) and with extensive programmatic evaluation in Python 3.5 under the Anaconda environment.
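To make the extract-then-evaluate pipeline concrete, the following is a minimal Python sketch under stated assumptions. It pairs a deliberately simple word-frequency sentence scorer (a generic baseline, not one of the 13 methods compared in the study) with ROUGE-N precision, recall, and F1 computed from clipped n-gram overlap. All function names, the punctuation set, and the whitespace tokenizer are hypothetical choices for illustration; the official ROUGE package offers options such as stemming and stopword removal that are omitted here.

```python
import re
from collections import Counter

# Hypothetical punctuation set; "।" is the Devanagari danda used as a full stop in Hindi.
PUNCT = ".,;:!?\"'()[]{}।"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Whitespace splitting is used rather than a regex word class, because
    Python's word class does not match Devanagari vowel signs (matras),
    which would break Hindi words apart.
    """
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def split_sentences(text):
    # Split after ".", "!", "?" or the danda when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?।])\s+", text.strip()) if s]


def frequency_summary(text, k=2):
    """Baseline extractor: keep the k sentences with the highest mean word frequency."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Stable sort, so ties keep document order; then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: precision, recall and F1 from clipped n-gram overlap."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # Counter "&" keeps the minimum count per n-gram
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    doc = "Cats purr. Cats and dogs play. Birds fly south."
    print(frequency_summary(doc, k=1))  # Cats purr.
    print(rouge_n("the cat lay on the mat", "the cat sat on the mat"))
```

Taking the per-n-gram minimum through Counter intersection is what "clipping" means here: a candidate that repeats a reference word is not credited more times than the word occurs in the reference. Note that raw frequency scoring without a stopword list tends to favour sentences full of function words, which is one reason the compared methods use richer features.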
- Hans Peter Luhn. 1958. The automatic creation of literature abstracts. IBM J. Res. Dev. 2, 2, 159--165.
- Dipanjan Das and Andre F. T. Martins. 2007. A survey on automatic text summarization. Lit. Survey Lang. Stat. 4, 192--195.
- Ehsan Shareghi and Leila Sharif Hassanabadi. 2008. Text summarization with harmony search algorithm-based sentence extraction. In Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology. ACM. 226--231.
- K. Sankar and L. Sobha. 2009. An approach to text summarization. In Proceedings of the 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies. ACL. 53--60.
- Daraksha Parveen, Mohsen Mesgar, and Michael Strube. 2016. Generating coherent summaries of scientific articles using coherence patterns. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 773--783.
- Pradeepika Verma and Hari Om. 2019. MCRMR: Maximum coverage and relevancy with minimal redundancy-based multi-document summarization. Expert Syst. Appl. 120, 43--56.
- Harold P. Edmundson. 1969. New methods in automatic extracting. J. ACM 16, 2, 264--285.
- Gunes Erkan and Dragomir R. Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artific. Intell. Res. 22, 457--479.
- Josef Steinberger and Karel Jezek. 2004. Using latent semantic analysis in text summarization and summary evaluation. In Proceedings of the International Conference on Information System Implementation and Modeling (ISIM’04). 93--100.
- Rafael Ferreira, Luciano de Souza Cabral, Rafael Dueire Lins, Gabriel Pereira e Silva, Fred Freitas, George D. C. Cavalcanti, Rinaldo Lima, Steven J. Simske, and Luciano Favaro. 2013. Assessing sentence scoring techniques for extractive text summarization. Expert Syst. Appl. 40, 14, 5755--5764.
- Sandeep Sripada, Venu Gopal Kasturi, and Gautam Kumar Parai. 2005. Multi-document extraction-based summarization. CS 224N, Final Project. https://nlp.stanford.edu/courses/cs224n/2010/reports/ssandeep-venuk-gkparai.pdf.
- Xiaojun Wan. 2010. Towards a unified approach to simultaneous single-document and multi-document summarizations. In Proceedings of the 23rd International Conference on Computational Linguistics. ACL. 1137--1145.
- Janara Christensen, Stephen Soderland, and Oren Etzioni. 2013. Towards coherent multi-document summarization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1163--1173.
- Daraksha Parveen, Hans-Martin Ramsl, and Michael Strube. 2015. Topical coherence for graph-based extractive summarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1949--1954.
- Pradeepika Verma and Hari Om. 2019. Collaborative ranking-based text summarization using a metaheuristic approach. In Proceedings of the Emerging Technologies in Data Mining and Information Security. Springer. 417--426.
- Hayato Kobayashi, Masaki Noguchi, and Taichi Yatsuka. 2015. Summarization based on embedding distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL. 1984--1989.
- Jianpeng Cheng and Mirella Lapata. 2016. Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252.
- Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network-based sequence model for extractive summarization of documents. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI’17). 3075--3081.
- Rasim M. Alguliev, Ramiz M. Aliguliyev, Makrufa S. Hajirahimova, and Chingiz A. Mehdiyev. 2011. MCMR: Maximum coverage and minimum redundant text summarization model. Expert Syst. Appl. 38, 12, 14514--14522.
- Rasim M. Alguliev, Ramiz M. Aliguliyev, and Nijat R. Isazade. 2013. Multiple documents summarization based on evolutionary optimization algorithm. Expert Syst. Appl. 40, 5, 1675--1689.
- Atif Khan, Naomie Salim, and Yogan Jaya Kumar. 2015. A framework for multi-document abstractive summarization based on semantic role labelling. Appl. Soft Comput. 30, 737--747.
- Razieh Abbasi-ghalehtaki, Hassan Khotanlou, and Mansour Esmaeilpour. 2016. Fuzzy evolutionary cellular learning automata model for text summarization. Swarm Evolution. Comput. 30, 11--26.
- Rasmita Rautray and Rakesh Chandra Balabantaray. 2017. Cat swarm optimization-based evolutionary framework for multi document summarization. Physica A: Stat. Mech. Appl. 477, 174--186.
- Pradeepika Verma and Hari Om. 2019. A variable dimension optimization approach for text summarization. In Proceedings of the Harmony Search and Nature Inspired Optimization Algorithms. Springer. 687--696.
- Vishal Gupta and Gurpreet Singh Lehal. 2010. A survey of text summarization extractive techniques. J. Emerg. Technol. Web Intell. 2, 3, 258--268.
- Mahak Gambhir and Vishal Gupta. 2017. Recent automatic text summarization techniques: A survey. Artific. Intell. Rev. 47, 1, 1--66.
- N. Moratanch and S. Chitrakala. 2016. A survey on abstractive text summarization. In Proceedings of the Conference on Circuit, Power and Computing Technologies (ICCPCT’16). IEEE. 1--7.
- Christopher C. Yang and Kar Wing Li. 2003. Automatic construction of English/Chinese parallel corpora. J. Amer. Soc. Info. Sci. Technol. 54, 8, 730--742.
- Eduard Hovy and Chin-Yew Lin. 1998. Automated text summarization and the SUMMARIST system. In Proceedings of the Association for Computational Linguistics Workshop. ACL. 13--15.
- Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- Chin-Yew Lin. 2004. Looking for a few good metrics: Automatic summarization evaluation. How many samples are enough? In Proceedings of NII Testbeds and Community for Information Access Research.
- Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics. ACL. 340--348.
- Feng Jin, Minlie Huang, and Xiaoyan Zhu. 2010. A comparative study on ranking and selection strategies for multi-document summarization. In Proceedings of the 23rd International Conference on Computational Linguistics. ACL. 525--533.
- Eleni Galiotou, Nikitas Karanikolas, and Christodoulos Tsoulloftas. 2013. On the effect of stemming algorithms on extractive summarization: A case study. In Proceedings of the 17th Panhellenic Conference on Informatics. ACM. 300--304.
- P. M. Dhanya and M. Jathavedan. 2013. Comparative study of text summarization in Indian languages. Int. J. Comput. Appl. 75, 6.
- K. Vimal Kumar, Divakar Yadav, and Arun Sharma. 2015. Graph-based technique for Hindi text summarization. In Information Systems Design and Intelligent Applications. Springer, New Delhi, 301--310.
- K. Vimal Kumar and Divakar Yadav. 2015. An improvised extractive approach to Hindi text summarization. In Information Systems Design and Intelligent Applications. Springer, New Delhi, 291--300.
- C. Sunitha, A. Jaya, and Amal Ganesh. 2016. A study on abstractive summarization techniques in Indian languages. Procedia Comput. Sci. 87, 25--31.
- Pradeepika Verma and Hari Om. 2016. Extraction-based text summarization methods on user’s review data: A comparative study. In Proceedings of the Conference on Smart Trends for Information Technology and Computer Communications. Springer, Singapore. 346--354.
- Inderjeet Mani and Mark T. Maybury. 1999. Advances in Automatic Text Summarization. MIT Press.
- Jade Goldstein and Jaime Carbonell. 1998. Summarization: (1) using MMR for diversity-based reranking and (2) evaluating summaries. In Proceedings of the Association for Computational Linguistics Workshop. ACL. 181--195.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.
- Michael Alexander Kirkwood Halliday and Ruqaiya Hasan. 2014. Cohesion in English. Routledge.
- Houda Oufaida, Omar Nouali, and Philippe Blache. 2014. Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization. J. King Saud Univ.-Comput. Info. Sci. 26, 4, 450--461.
- Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Mark Kantrowitz. 2000. Multi-document summarization by sentence extraction. In Proceedings of the NAACL-ANLP Workshop on Automatic Summarization. ACL. 40--48.
- Ondrej Bojar, Vojtech Diatka, Pavel Rychly, Pavel Stranak, Vit Suchomel, Ales Tamchyna, and Daniel Zeman. 2014. HindEnCorp: Hindi-English and Hindi-only corpus for machine translation. In Proceedings of the Language Resources and Evaluation Conference (LREC’14). 3550--3555.
- William H. DuBay. 2004. The Principles of Readability. ERIC. Online Submission. https://files.eric.ed.gov/fulltext/ED490073.pdf.
- Ray R. Larson. 2010. Introduction to information retrieval. J. Amer. Soc. Info. Sci. Technol. 4, 852--853.
Index Terms
- A Comparative Analysis on Hindi and English Extractive Text Summarization