A Comparative Analysis on Hindi and English Extractive Text Summarization

Authors:
Pradeepika Verma, Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad, India
Sukomal Pal, Indian Institute of Technology (Banaras Hindu University) Varanasi, Varanasi, India
Hari Om, Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad, India

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 3, Article No. 30, pp. 1–39. https://doi.org/10.1145/3308754

Published: 09 May 2019

Abstract

Text summarization is the process of transforming the information in a large document into a clear and concise form. In this article, we present a detailed comparative study of various extractive methods for automatic text summarization on Hindi and English datasets of news articles. We consider 13 summarization techniques, namely TextRank, LexRank, Luhn, LSA, Edmundson, ChunkRank, TGraph, UniRank, NN-ED, NN-SE, FE-SE, SummaRuNNer, and MMR-SE, and we evaluate them using several performance metrics: precision, recall, F₁, cohesion, non-redundancy, readability, and significance. The analysis is organized into eight parts that show the strengths and limitations of these methods, how performance varies with summary length, the impact of a document's language, and other factors. A standard summary evaluation tool (ROUGE) and extensive programmatic evaluation using Python 3.5 in an Anaconda environment are used to evaluate the outcomes.
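To make the precision, recall, and F₁ metrics named in the abstract concrete, the following is a minimal sketch of a ROUGE-1-style unigram overlap score in Python. It is not the ROUGE toolkit and not the authors' evaluation code; the function name `rouge1_scores` and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def rouge1_scores(candidate, reference):
    """Return unigram-overlap (precision, recall, F1) for two summaries.

    Hypothetical helper for illustration; tokenization by whitespace is an
    assumption and ignores stemming, stop words, and punctuation handling.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a word is matched at most as many times as it
    # occurs in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge1_scores("the cat sat on the mat", "the cat lay on the mat")
    print("precision={:.3f} recall={:.3f} f1={:.3f}".format(p, r, f))
```

Because the sketch splits on whitespace only, it can be applied to Devanagari text as well as English, although a real evaluation would normally use language-specific tokenization and the standard ROUGE package.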

Index Terms

A Comparative Analysis on Hindi and English Extractive Text Summarization
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 18, Issue 3
September 2019
386 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3305347
Editor:
Nianwen Xue
Brandeis University, Waltham, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 May 2019
- Accepted: 1 January 2019
- Revised: 1 October 2018
- Received: 1 September 2017

Author Tags
ROUGE
Text summarization
graph-based techniques
latent semantic analysis
meta-heuristic-based techniques
neural networks-based techniques
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 31
- Total Downloads: 556
- Downloads (Last 12 months): 108
- Downloads (Last 6 weeks): 7